Creating a Quality-of-Storage-Service Model
1. Install small-scale Storage Area Networks (SANs), a new technology to improve connectivity between multiple servers and storage devices, for targeted applications to identify key ROI metrics on those basic installations and help in planning the transition to larger-scale SAN implementations.
2. Reduce risk by defining and implementing best practices for rolling out and managing more scalable enterprise-class SANs, which yield economies of scale by sharing the storage infrastructure flexibly across many Storage Accounts.
3. Put Storage Account security methodologies in place to provide assurances that an organization's data cannot be maliciously or accidentally accessed by another set of users.
4. Create a new highly specialized role of Storage Administrator within the data-center team with up-to-date skills in storage technology and clearly defined accountability to meet the defined levels of QoSS, as well as cost and security expectations.
5. Select a standard set of centralized storage management and data availability solutions that enable the implementation of QoSS metrics using the defined best practices in the heterogeneous enterprise class SAN.
6. Assess and deploy emerging Storage Virtualization architectures, which can dramatically simplify the management processes for large amounts of storage by providing logical abstraction of the physical infrastructure and reduce the need to manage the storage or server platforms on a vendor-specific basis.
7. Implement charge-back tools and processes, which allow Storage Administrators to generate reports on costs and delivered service metrics associated with QoSS levels for each Storage Account to ensure accountability and accurate billing.
A QoSS model can provide significant efficiencies in addressing the escalating costs of storage and can optimize the impact of highly skilled IT staff. By combining new processes, tools, and technology, deploying a QoSS model affects the bottom line by:
* Raising the visibility of the generally hidden storage administrative costs by using a billing or charge-back model for each storage account based on levels of service
* Reducing delays for strategic IT projects due to storage-skills shortages by relying on a specialized storage administrator team with up-to-date storage expertise
* Removing inefficiencies by leveraging the "economies of scale" of centralized procurement and sharing of consolidated server and storage resources
* Enabling flexibility to optimize the storage costs and characteristics to the needs of a specific application or organization, which may change over time, by empowering the storage administrators to configure various levels of service without a high degree of manual reconfiguration.
* Facilitating implementation of high levels of storage service -- e.g., very fast application Time-to-Recovery or data availability that survives even catastrophic geographic failures -- that might otherwise be impossible because of costs or complexity.
Time to Recovery
Time-to-Recovery refers to the rate at which applications can be back online with full access to their data in the event of a network, hardware, or soft application failure. Time-to-Recovery metrics can range from hours, for applications or services such as intranet Web servers not critical to business operation, to seconds for e-commerce or business-to-business applications that directly affect a company's revenue stream. To make the management of application recovery time for many Storage Accounts efficient, the Storage Administrator needs to avoid a high degree of manual and platform-specific operations, such as physical storage or server reconfiguration. Elements of recovery time that must be managed include server and OS recovery, reconnecting the application to its storage, restarting the application, and reconnecting the application to the network.
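The recovery elements listed above add up to a single figure that can be compared with an agreed target. The following is a minimal sketch in Python, assuming hypothetical phase durations and targets; the names `RecoveryPhases`, `meets_target`, and `slowest_phase` are illustrative and do not come from any product.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPhases:
    """Seconds spent in each recovery element named in the text.

    All values supplied by callers are hypothetical measurements.
    """
    server_and_os: float
    reconnect_storage: float
    restart_application: float
    reconnect_network: float

    def total_seconds(self) -> float:
        # Assumes the phases run one after another, so the times add up.
        return (self.server_and_os + self.reconnect_storage
                + self.restart_application + self.reconnect_network)


def meets_target(phases: RecoveryPhases, target_seconds: float) -> bool:
    """True if the whole recovery fits inside the agreed Time-to-Recovery."""
    return phases.total_seconds() <= target_seconds


def slowest_phase(phases: RecoveryPhases) -> str:
    """Name of the phase that contributes the most recovery time."""
    durations = vars(phases)
    return max(durations, key=durations.get)


# Hypothetical tape-based recovery: the data restore dominates.
tape_restore = RecoveryPhases(
    server_and_os=1800, reconnect_storage=7200,
    restart_application=600, reconnect_network=120,
)
print(tape_restore.total_seconds())         # 9720
print(meets_target(tape_restore, 4 * 3600)) # True for a four-hour target
print(meets_target(tape_restore, 60))       # False for a one-minute target
print(slowest_phase(tape_restore))          # reconnect_storage
```

Breaking the total into phases shows where automation would pay off most for a given Storage Account.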
When Time-to-Recovery is measured in hours or days, traditional backup and restore strategies can be used, in which a copy of the application's data is stored on tape and restored to the server when a failure occurs. This leaves the task of restarting the application to other processes or tools. More advanced backup and restore strategies use "tape virtualization" with automated tape libraries or other forms of storage virtualization to implement continuous incremental backups and point-in-time data snapshots on tape to minimize the labor and amount of data required to restore the application's data to online disk.
Managing Time-to-Recovery as a key QoSS metric enables the data-center team to deliver on the most visible of service-level agreements: application up-time. Using a combination of centralized backup and high-availability tools, the Storage Administrator can simplify the full range of application recovery processes for all storage accounts and deliver on the most demanding recovery time expectations for business-critical applications. Choosing tools that operate over heterogeneous server and storage platforms reduces training costs and the likelihood of administrative mistakes. SANs and storage virtualization enable the scalability of high-availability clustering to many servers and large heterogeneous storage configurations.
Time to Capacity
Time-to-Capacity is the speed at which new storage can be assigned and made usable, or "provisioned," for a Storage Account's applications. Applications are initially assigned enough storage space to satisfy current needs, plus excess capacity for planned growth. For business-critical applications, the tendency is to overprovision excess capacity because the time to assign new storage may be so long that the application will crash before the operation can be completed. This obviously leads to gross inefficiencies in the use and purchasing of capital equipment. As with Time-to-Recovery, Time-to-Capacity needs can range from several hours or days to less than a minute--e.g., for financial applications at year's end. In all cases, the storage administrator must be able to provision storage with performance and reliability characteristics matched to the needs of the application, and the process should be done with minimal manual intervention or application downtime.
A key element in ensuring that Time-to-Capacity can be delivered within an hour or less is the use of server-resident storage-virtualization technologies, such as logical volume management, that sit just "beneath" the application or file system. Logical volume management is used to dynamically add the newly provisioned storage to the application while it remains online, avoiding the hours or even days required for planned application downtime.
Managing Time-to-Capacity enables the Storage Administrator to avoid over-purchasing of capital storage equipment and to deal quickly with changes in the storage needs of a business-critical application. By leveraging the scale of SAN infrastructure and the simplicity provided by storage-virtualization technologies, the Storage Administrator has the tools to act quickly to avoid system crashes due to lack of space and to proactively assign storage that best meets the needs of each application.
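The link between Time-to-Capacity and overprovisioning can be made concrete with simple arithmetic: the spare space an application must hold is roughly its growth rate multiplied by the time needed to provision more. The sketch below assumes a constant growth rate, which real workloads rarely have; the function names and figures are hypothetical.

```python
def hours_until_full(capacity_gb: float, used_gb: float,
                     growth_gb_per_hour: float) -> float:
    """Hours before the volume runs out of space at a constant growth rate."""
    if growth_gb_per_hour <= 0:
        return float("inf")  # not growing, so it never fills
    return max(capacity_gb - used_gb, 0.0) / growth_gb_per_hour


def required_headroom_gb(growth_gb_per_hour: float,
                         time_to_capacity_hours: float) -> float:
    """Spare space needed to keep running while new storage is provisioned."""
    return growth_gb_per_hour * time_to_capacity_hours


def must_provision_now(capacity_gb: float, used_gb: float,
                       growth_gb_per_hour: float,
                       time_to_capacity_hours: float) -> bool:
    """True if waiting any longer risks filling up before new storage arrives."""
    remaining = hours_until_full(capacity_gb, used_gb, growth_gb_per_hour)
    return remaining <= time_to_capacity_hours


# Hypothetical volume: 500 GB, 440 GB used, growing 2 GB per hour.
print(hours_until_full(500, 440, 2))        # 30.0
print(required_headroom_gb(2, 72))          # 144 GB if provisioning takes 3 days
print(required_headroom_gb(2, 1))           # 2 GB if provisioning takes 1 hour
print(must_provision_now(500, 440, 2, 72))  # True
print(must_provision_now(500, 440, 2, 1))   # False
```

In this illustration, shortening Time-to-Capacity from three days to one hour cuts the headroom that must be purchased in advance from 144 GB to 2 GB.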
Application Performance Guarantee
The Storage Administrator must also provide guarantees that the storage infrastructure can meet the specific performance needs of the applications in each storage account. The metrics used for each application can vary widely. For example, a transactional database used for e-commerce may require minimum latency and have very random data-access patterns, while a financial ERP system may need to handle significant write traffic for updates from many sources during the last days of a fiscal quarter. Some of these guarantees may also be "real-time"--for example, requiring a measure in terms of minimum levels of data-access bandwidth--or the Storage Account may be more concerned with a sustainable minimum average over time. In all cases, these application performance characteristics must be carefully defined and agreed to by the Storage Administrator with each Storage Account to set expectations properly, and the perceived performance must be quantified from the application's perspective.
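The two styles of guarantee described above, a real-time floor and a sustained average, can be checked against the same set of measurements. This is a minimal sketch, assuming bandwidth samples in MB/s collected at the application; the function name and thresholds are hypothetical.

```python
from statistics import mean


def check_bandwidth_guarantee(samples_mb_s, floor_mb_s=None,
                              min_average_mb_s=None):
    """Evaluate bandwidth samples against the two guarantee styles.

    floor_mb_s       -- "real-time" guarantee: no sample may fall below it
    min_average_mb_s -- sustained guarantee: the mean must stay at or above it
    A guarantee left as None is treated as not agreed and reported as met.
    """
    if not samples_mb_s:
        raise ValueError("at least one sample is required")
    result = {"floor_ok": True, "average_ok": True}
    if floor_mb_s is not None:
        result["floor_ok"] = min(samples_mb_s) >= floor_mb_s
    if min_average_mb_s is not None:
        result["average_ok"] = mean(samples_mb_s) >= min_average_mb_s
    return result


# Hypothetical samples with one short dip to 40 MB/s.
samples = [80, 95, 40, 110, 100]
print(check_bandwidth_guarantee(samples, floor_mb_s=50, min_average_mb_s=80))
# {'floor_ok': False, 'average_ok': True}
```

The same workload passes the average-based agreement and fails the real-time one, which is why the type of guarantee needs to be settled with each Storage Account up front.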
Application storage performance in a QoSS model built on SANs today can be monitored as a composite of many sources of information. Real-time and historical reports and events are gathered independently from SAN fabrics, storage arrays, I/O cards, storage management software, and applications. The Storage Administrator reviews these various sources of information to determine whether they are meeting the specific needs of each Storage Account and how best to optimize the environment. SANs can be used to establish multiple paths from an application to its data, increasing total bandwidth available, and storage routes can be configured to prioritize certain data paths to minimize latency impact. For high levels of performance tuning, server-resident storage-virtualization technologies that can monitor the access patterns of the application are used to optimize server I/O resources. Today the Storage Administrator will generally have to over-allocate storage-performance resources assigned to the application because of the fragmented nature of performance management tools. The coming generation of SAN management software will better correlate the "end-to-end" information gathered from the paths from the application through to the disk and enable performance management of this entire path.
Managing performance guarantees at the application level provides the best measure for accountability of the centralized storage infrastructure. By using tools that provide a view of performance factors for every element of the path from the application to storage, as well as a data-center-wide view, the Storage Administrator can best identify true performance bottlenecks for performance-sensitive applications. In addition, the most critical applications can be provided a "right of way" by configuring the end-to-end storage path as needed to prioritize data access.
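An end-to-end view of the path makes bottlenecks easy to find: a path can deliver no more than its slowest element, and independent paths add their bandwidth together. The sketch below is an idealized model, assuming paths that share no saturated element; the element names and MB/s figures are hypothetical.

```python
def path_bandwidth(path: dict) -> float:
    """A path delivers no more than its slowest element (MB/s)."""
    return min(path.values())


def bottleneck(path: dict) -> str:
    """Name of the element that limits the path."""
    return min(path, key=path.get)


def total_bandwidth(paths: list) -> float:
    """Aggregate bandwidth over several paths.

    Idealized: assumes the paths do not share a saturated element.
    """
    return sum(path_bandwidth(p) for p in paths)


# Two hypothetical paths from one application to its data.
path_a = {"io_card": 100, "fabric_switch": 200, "array_port": 80}
path_b = {"io_card": 100, "fabric_switch": 200, "array_port": 100}

print(path_bandwidth(path_a))             # 80
print(bottleneck(path_a))                 # array_port
print(total_bandwidth([path_a, path_b]))  # 180
```

Without the per-element view, the administrator sees only the 80 MB/s result and cannot tell which component to upgrade or reroute around.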
Survivability of Data
The value of stored data may vary for each storage account, although in today's business climate nearly all stored data is critical to business operation in some way. Each storage account, however, will make a determination of the value of this data and the events it should be able to survive. This data survivability can be thought of in terms of three metrics: currency of data that can be recovered; the scope of hardware, software, and administrative failures that can be survived; and recoverability from facility failures. All these metrics require redundancy of data but in various forms and locations. Currency of data is measured in terms of the lag that the redundant data has behind the current application data. Hardware scope is defined in terms of specific events that can be survived, such as storage subsystem failure, server crash, or accidental database corruption.
A wide scope of failures can be addressed using a combination of high-availability clustering (see Time to Recovery) and system-recovery tools that can roll back the state of a database or file system to a specific point in time. Ensuring that data can survive facility failures requires an off-site copy of data. Although this can be done manually by transporting tapes to an off-site facility, SAN and WAN infrastructures can connect data centers to each other over metropolitan and continental distances. This allows a backup or online data mirroring operation to write directly to a remote storage device, dramatically reducing the labor and transportation costs in moving the data off-site. Because this wide-area infrastructure is an upfront capital investment, a high level of data survivability for a Storage Account can then be implemented at a modest marginal cost.
Managing data survivability as a QoSS metric ensures that all Storage Accounts take stock of how important data is to their business, raising the awareness of potential loss of a critical asset. By leveraging the economies of scale that the large-scale SANs provide, high levels of data survivability options exist without dramatic incremental costs in infrastructure. And by using a range of tools from traditional periodic backups to sophisticated high-availability clustering and replication, the most efficient means can be used to meet the data survivability needs of each Storage Account on the centralized infrastructure.
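The three survivability metrics lend themselves to a simple record that can be matched against each available protection method. This is a minimal sketch; the class names, lag values, and the failure coverage attributed to each method are hypothetical illustrations, not measurements of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protection:
    """What a protection method offers (all values are illustrative)."""
    name: str
    lag_seconds: float    # currency: how far the copy trails live data
    survives: frozenset   # scope: failure events the method covers
    off_site: bool        # facility: is a copy held at another site?


@dataclass(frozen=True)
class Requirement:
    """What a Storage Account has asked for."""
    max_lag_seconds: float
    must_survive: frozenset
    needs_off_site: bool


def satisfies(method: Protection, req: Requirement) -> bool:
    """True if the method meets all three survivability metrics."""
    return (method.lag_seconds <= req.max_lag_seconds
            and req.must_survive <= method.survives  # subset test
            and (method.off_site or not req.needs_off_site))


nightly_tape = Protection(
    "nightly tape backup, kept on site", 24 * 3600,
    frozenset({"storage subsystem failure", "accidental database corruption"}),
    off_site=False,
)
remote_mirror = Protection(
    "online mirror to a remote data center", 5,
    frozenset({"storage subsystem failure", "server crash"}),
    off_site=True,
)
req = Requirement(3600, frozenset({"storage subsystem failure"}), True)

candidates = [m.name for m in (nightly_tape, remote_mirror) if satisfies(m, req)]
print(candidates)  # ['online mirror to a remote data center']
```

A requirement that also listed accidental database corruption would match neither method in this example, which points toward combining several tools for one account.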
Enabling QoSS: Accountability For Storage Account Security And Reduced Total Cost Of Storage
Moving to this new QoSS model requires buy-in both from the departments or organizations that own each storage account and from senior IT executives. To enable the success of this model, two key factors must be addressed by the QoSS processes and tools: managing Storage Account Security and demonstrating a reduced Total Cost of Storage. Storage Account Security, which becomes even more important if an outsourced service provider is used for the Storage Administrator function, is necessary to assure the owners of the storage account that there is no accidental or malicious access to their data. Reducing Total Cost of Storage is required to ensure that the company is truly gaining efficiencies from this new model. Both factors are a key part of the accountability process as this Storage Administrator role is put in place.
The QoSS model implies a split model for managing storage in which the Storage Administrator manages the "global" storage resources in the data center, while each Storage Account is assigned a specific portion of these resources via the storage-virtualization services. The owners of the Storage Account may request some level of administrative control over their resources, whether it is as simple as periodic reports or some ability to modify the storage configuration within the account. However, they should not have any visibility into resources not explicitly assigned to their account, nor should they be able to perform operations on their resources that might accidentally damage another account's data. First-generation storage account security solutions dedicate physical storage arrays and SAN fabric topologies to each Storage Account and often use specialized software or hardware that acts as an access "client" for each server in the account. Although this ensures storage security, it does not permit the desired efficiencies of sharing the SAN and storage infrastructure in a flexible way. State-of-the-art SANs can now allow portions of the SAN interconnect and devices to be allocated to a Storage Account, enabling a cost-effective way of sharing the data-center resources. To maximize security in this partitioned environment, the management access points of the SAN devices must be isolated from the users of the Storage Account, and a "storage firewall" must be set up at the point of access into the centralized SAN, ensuring that those users cannot get visibility into any other resources.
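The visibility rule for a partitioned SAN can be expressed as a small access check at the point of entry. This is a minimal sketch of the idea only: the class name `StorageFirewall`, the resource labels, and the assumption that each resource belongs to exactly one account are illustrative, and real SAN partitioning mechanisms are considerably more involved.

```python
class StorageFirewall:
    """Limits each Storage Account to its explicitly assigned resources."""

    def __init__(self, assignments: dict):
        # Assumption for this sketch: a resource belongs to one account only,
        # so an overlapping assignment is rejected up front.
        owners = {}
        for account, resources in assignments.items():
            for resource in resources:
                if resource in owners:
                    raise ValueError(
                        f"{resource} is assigned to both "
                        f"{owners[resource]} and {account}"
                    )
                owners[resource] = account
        self._assignments = {a: frozenset(r) for a, r in assignments.items()}

    def visible_resources(self, account: str) -> frozenset:
        """An account sees only what was assigned to it, and nothing else."""
        return self._assignments.get(account, frozenset())

    def authorize(self, account: str, resource: str) -> bool:
        """Allow an operation only on the account's own resources."""
        return resource in self.visible_resources(account)


# Hypothetical accounts and volume labels.
firewall = StorageFirewall({
    "finance": {"volume-01", "volume-02"},
    "web": {"volume-03"},
})
print(firewall.authorize("finance", "volume-01"))  # True
print(firewall.authorize("web", "volume-01"))      # False
print(sorted(firewall.visible_resources("web")))   # ['volume-03']
```

Unknown accounts receive an empty set, so the default is no visibility.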
To demonstrate a reduced Total Cost of Storage, the Storage Administrator must implement a reporting and charge-back process in the terms set out by the QoSS agreements. Higher levels of Time-to-Capacity or very demanding Application Performance Guarantees require incremental resources in storage infrastructure at the expense of another Storage Account, and thus result in an increased charge-back to the account. To generate this report, the Storage Administrator must monitor the level of service delivered in addition to the capital infrastructure and manpower required to manage each account. This level of billing ensures that both the Storage Account owner and the IT executives get visibility into the costs required to manage the desired level of QoSS, resulting in more careful analysis of requirements. Ideally, the tools used to manage the QoSS environment should produce customized reports that can easily map back to specific Storage Accounts and the QoSS agreements to minimize the administrative overhead required for accounting.
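A charge-back report of this kind combines an infrastructure cost that depends on the agreed service level with the labor spent on the account. The sketch below assumes a simple per-gigabyte rate for each service tier plus an hourly administration rate; the tier names, rates, and account figures are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical monthly rates; a real agreement would define its own.
TIER_RATE_PER_GB = {"basic": 0.50, "business": 1.25, "critical": 3.00}
ADMIN_RATE_PER_HOUR = 75.0


@dataclass
class AccountUsage:
    account: str
    allocated_gb: float
    service_tier: str   # the QoSS level agreed with the Storage Account
    admin_hours: float  # administrator time spent on the account


def monthly_charge(usage: AccountUsage) -> float:
    """Infrastructure cost at the tier's rate plus administration labor."""
    infrastructure = usage.allocated_gb * TIER_RATE_PER_GB[usage.service_tier]
    labor = usage.admin_hours * ADMIN_RATE_PER_HOUR
    return infrastructure + labor


def charge_back_report(usages) -> dict:
    """Map each Storage Account to its charge, rounded to cents."""
    return {u.account: round(monthly_charge(u), 2) for u in usages}


report = charge_back_report([
    AccountUsage("finance", 400, "critical", 6),
    AccountUsage("intranet", 1000, "basic", 2),
])
print(report)  # {'finance': 1650.0, 'intranet': 650.0}
```

In this illustration the finance account pays more for less than half the capacity, which is the kind of visibility into service-level costs that the charge-back process is meant to provide.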
Robin Purohil is vice president of product management, Availability Products, VERITAS Software (Mountain View, CA).
|Title Annotation:||Industry Trend or Event|
|Publication:||Computer Technology Review|
|Date:||Apr 1, 2001|