Will continuous data protection make tape-based backups obsolete?
The promise of CDP is that it can protect any data, anywhere in the data center, and recover a consistent version of the data as it existed at a specific point in time. This article explores CDP technology and offers guidance on choosing a data protection solution.
Today's businesses are faced with an ever-increasing amount of data that threatens to undermine their existing storage management solutions. Creating a copy of yesterday's changed data and running it to tape is no longer adequate to support the real-time requirements of today's business. Critical data changes occur throughout the day, and to protect this data, customers are increasingly turning to technologies such as CDP to offer improved recovery times.
How Did CDP Get Started?
Continuous data protection came into use about three years ago, and many vendors soon touted their solutions as CDP. This caused a lot of confusion in the industry, as each vendor claimed to have the only "true" CDP solution. To mitigate this, the CDP special interest group (CDP-SIG) of the SNIA Data Management Forum was created in January 2005. The mission of the CDP-SIG is to be the leading authority and resource on CDP solutions and to facilitate and promote CDP interoperability.
A CDP product is one that continuously monitors an object for changes and preserves copies of all prior versions of the object. The user can view and access these prior versions as required. The time to perform recovery drops from hours or days to seconds or minutes. The backup window is no longer a problem because there is no longer the concept of a backup window. There are two approaches to CDP object protection: file-system-centric or block-based. Historically, the file-system approach to CDP started in the Windows environment, where most applications used files to hold their data. Block-based CDP started in the UNIX (and now Linux) community, where database applications traditionally bypassed the file system and operated directly at the disk/block level.
File-system CDP products are typically found in Microsoft Windows environments, and usually offer a file-system or explorer interface for their configuration and recovery operations. Recovery is typically end-user driven and is either provided through an extension to the file system or as a shared file system based on NFS or CIFS that can be mounted to a recovery server.
Block-based CDP products started in intelligent arrays such as those from EMC and HDS and soon moved to host-based or appliance-based platforms. These CDP products operate as a layered feature of the underlying storage infrastructure, and usually operate independently of the host's file system and volume manager. Recovery is typically driven by the storage or database administrator, is provided through capabilities outside of the platform being protected, and is managed by the CDP implementation.
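To make the block-based idea concrete, here is a minimal sketch of a journal that keeps every write instead of overwriting the previous contents. It is a toy model only, not any vendor's design: the class name `BlockJournal`, its methods, and the integer timestamps are all assumptions made for illustration.

```python
from bisect import bisect_right


class BlockJournal:
    """Toy block-based CDP journal: every write is kept, none is overwritten."""

    def __init__(self):
        # block number -> list of (timestamp, data), appended in time order
        self._history = {}

    def write(self, timestamp, block, data):
        """Record a write; timestamps are assumed non-decreasing per block."""
        self._history.setdefault(block, []).append((timestamp, data))

    def read_as_of(self, timestamp, block):
        """Return the block's contents as of the given time, or None if unwritten."""
        versions = self._history.get(block, [])
        times = [t for t, _ in versions]
        idx = bisect_right(times, timestamp)  # number of writes at or before timestamp
        return versions[idx - 1][1] if idx else None

    def image_as_of(self, timestamp):
        """Reconstruct a point-in-time image of every block written by then."""
        image = {}
        for block in self._history:
            data = self.read_as_of(timestamp, block)
            if data is not None:
                image[block] = data
        return image


journal = BlockJournal()
journal.write(100, 0, b"alpha")
journal.write(105, 1, b"bravo")
journal.write(110, 0, b"ALPHA-corrupt")  # a later, unwanted change to block 0

print(journal.image_as_of(107))  # {0: b'alpha', 1: b'bravo'}
```

Because the journal sits below the file system, it knows nothing about files or applications; it can only answer "what did these blocks contain at time T?", which is why recovery in this model falls to a storage or database administrator.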
What About Recovery?
There are two general principles that govern all recovery: the recovery point objective (RPO) and the recovery time objective (RTO). The RPO defines how much data you are willing to lose when you recover data. For example, if you back up twice a day, your RPO would be 12 hours, which is the maximum window of data loss that could occur between backup images. The RTO defines how long it will take to recover your business processes from a data failure. This includes not only the data recovery, but also restarting the servers or applications that depend on that data. These recovery considerations must also be applied to local and remote recovery strategies.
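The arithmetic behind these two objectives can be sketched in a few lines. The function names and the sample durations are assumptions for illustration; the RPO calculation assumes evenly spaced backups, and the RTO estimate assumes the recovery phases run one after another.

```python
def worst_case_rpo_hours(backups_per_day):
    """Worst-case data-loss window in hours, assuming evenly spaced backups."""
    if backups_per_day <= 0:
        raise ValueError("backups_per_day must be positive")
    return 24 / backups_per_day


def estimated_rto_minutes(data_restore, server_restart, app_restart):
    """Total recovery time in minutes, assuming the phases run sequentially."""
    return data_restore + server_restart + app_restart


print(worst_case_rpo_hours(2))            # 12.0
print(estimated_rto_minutes(90, 10, 20))  # 120
```

Doubling the backup frequency only halves the RPO, which is why a product that captures every change is needed before the RPO can approach zero.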
A true CDP product protects every data change, so the RPO approaches zero. On the other hand, with so many recovery points available, how you choose the recovery point affects your recovery time.
Unless a CDP product includes some application level awareness, all that the administrator knows about a CDP image is the date and time it was taken. It may take many recovery iterations to narrow down to the correct image, and some CDP vendors provide additional tools to help.
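One way to keep the number of recovery iterations small is a binary search over the available timestamps. The sketch below is illustrative only: `is_consistent` stands in for whatever check the administrator runs against a mounted image (a hypothetical callback, not a real product API), and the search is only valid under the assumption that every image taken after the first bad one is also bad.

```python
def latest_good_image(timestamps, is_consistent):
    """Return the newest timestamp whose image passes the check, or None.

    Assumes timestamps are sorted ascending and that the data went bad
    at a single point, so all later images fail the check too.
    """
    lo, hi = 0, len(timestamps) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_consistent(timestamps[mid]):
            best = timestamps[mid]  # good image; look for a newer good one
            lo = mid + 1
        else:
            hi = mid - 1            # bad image; the fault happened earlier
    return best


probes = []

def check(timestamp):
    """Hypothetical consistency check: data was corrupted at time 637."""
    probes.append(timestamp)
    return timestamp < 637

images = list(range(1000))  # one recoverable image per time unit
print(latest_good_image(images, check))  # 636
```

With 1,000 candidate images, this approach needs at most 10 trial recoveries rather than hundreds. If corruption is intermittent, the monotonic assumption fails and the search may stop at the wrong image.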
One way to determine if CDP is right for you is to ask yourself a set of qualifying questions. For example, are you worried about meeting the business SLAs established by the CIO? Are you being asked to measure the business impact of downtime, or are you looking to modify or improve your backup site and strategy? Do you have rapidly changing data that is critical to business operations, and are you worried about shrinking backup windows to protect that data? If you answered yes to one or more of these questions, you should be seriously investigating CDP technology.
What About Tape Backup or Archiving?
The need for an off-site archive is greater these days than ever. New mandates in North America and Europe are requiring companies to retain off-site copies of their business data in a form that can be quickly retrieved. While traditional tape backups can be used to meet this mandate, many companies are now investigating newer technologies such as CDP, content addressable storage and virtual tape libraries to improve their RTO. They may also be replicating their data to an off-site disaster recovery (DR) site.
Companies that have invested heavily in backup technology don't need to throw away their investment when they implement CDP. A CDP product complements existing backup products by providing additional online, near-instantaneous recovery of recently changed information.
A CDP product will present a copy of data from any point in time. An important use for this copy is as the source for a traditional backup system. For example, a simple process would present a copy of the data to the backup server, which backs it up and then discards the CDP copy. If the backup were to fail, it would be a simple matter to recreate the image and restart the backup. Additionally, you can use CDP technology to provide a richer backup environment, for example by auditing the data before the backup is performed.
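The present, back up, discard cycle described above might be scripted along the following lines. This is a sketch under stated assumptions: `present_image`, `run_backup` and `discard_image` are hypothetical callables that a site would wire to its own CDP and backup tooling, and a failed backup is assumed to surface as an `OSError`.

```python
def backup_from_cdp(point_in_time, present_image, run_backup, discard_image,
                    max_attempts=3):
    """Back up a CDP point-in-time copy, recreating the image after a failure.

    Returns the number of attempts used. All three callables are
    hypothetical hooks, not part of any real CDP or backup product.
    """
    for attempt in range(1, max_attempts + 1):
        image = present_image(point_in_time)  # recreate the same image each time
        try:
            run_backup(image)
            return attempt
        except OSError:
            pass                              # retry with a fresh image
        finally:
            discard_image(image)              # the CDP copy is always thrown away
    raise RuntimeError(f"backup failed after {max_attempts} attempts")


calls = {"backups": 0, "discarded": 0}

def present_image(timestamp):
    return {"as_of": timestamp}

def flaky_backup(image):
    calls["backups"] += 1
    if calls["backups"] == 1:
        raise OSError("tape drive not ready")  # simulated first-run failure

def discard_image(image):
    calls["discarded"] += 1

attempts = backup_from_cdp("2006-09-01T02:00", present_image,
                           flaky_backup, discard_image)
print(attempts, calls)  # 2 {'backups': 2, 'discarded': 2}
```

Because the image is rebuilt from the same point in time on every attempt, a retried backup captures exactly the data the failed one would have, regardless of what the application wrote in the meantime.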
What To Look for in a CDP Product?
There are several features a mature CDP product brings to the market. These include:
* Support of heterogeneous storage and server environments. Today's customers are refusing to be locked into a single vendor for their storage and server solution. Users should select a CDP product that doesn't restrict them to only a subset of their possible storage and server environments.
* Awareness of applications and their environments. Application recovery is becoming more complex and time-consuming; users should choose a product that integrates application specifics into the CDP recovery process.
* Non-invasive to the application or server that is being protected. A CDP product should attempt to minimize any impact on the application's I/O throughput or CPU load. This is best done by keeping the CDP footprint on the application server to a minimum, and moving any 'heavy-lifting' to an external server or appliance.
* Out-of-band solution. An out-of-band CDP solution does not impede the flow of data from the server to storage. Hence, a failure of the CDP product will not affect your application's access to storage.
* Built on a scalable, reliable platform. If the CDP product is hosted on an appliance platform, the user should have the ability to add additional appliances that can scale their CDP capacity as the data protection needs grow.
* Supports a federated application environment. Many of today's complex applications (such as SAP R/3) use servers and storage that span multiple hosts. Customers should choose a CDP product that supports these systems, as it provides the user with a consistent, federated image for recovery.
* Supports business policies and SLAs. Companies assign different values to their different applications. A CDP product that is flexible in its support of differing protection and recovery policies can provide a better overall solution.
* Can be extended by use of APIs/CLIs. Look for a CDP product that has functionality that can be easily extended by the customer to meet business needs.
* Tightly integrated with business continuity technologies. A CDP product that supports application clusters and remote replication provides a stronger solution than a CDP product that only provides a stand-alone solution.
There are many resources available on the internet that discuss CDP technology, starting with the SNIA website. Additional resources include industry conferences such as Storage Networking World, the Network Storage Conference, and the Gartner Storage Summit. Finally, take advantage of the wealth of information on CDP on various vendor websites such as Kashya.
Gary Archer is the senior manager for product marketing at Kashya, Inc. (San Jose, CA) and is their representative on the SNIA CDP-SIG.
Publication: Computer Technology Review
Date: Sep 1, 2006