Data protection strategies: are they too complex?

Data protection, disaster recovery, and security are possibly the three most critical issues facing the I.T. industry today. Clearly, the impact of terrorist activity, natural disasters, a much higher fiscal responsibility, and the globalization of our economy have repeatedly tied the value of data to the survival of most businesses. Though determining the monetary value of data remains difficult and varies significantly from business to business, knowing the relative value of data for a given business is becoming a more common practice and enables businesses to select the most appropriate high-availability strategy for their storage infrastructures. The choices are many, and they have tradeoffs that must be considered. For reference purposes, four distinct levels of classifying data are most commonly used. These levels indicate which backup and recovery technology may be optimally suited and most cost-effective for each level.

[ILLUSTRATION OMITTED]

Data Protection Considerations

Data replication covers a wide range of methods to create additional copies of data either locally or in a remote location.
Understanding the growing list of options available to implement a high-availability storage strategy is essential, but the options can seem confusing and existing solutions entail many tradeoffs. Nonetheless, a successful implementation of data replication techniques can significantly improve the likelihood of surviving a business disaster.

Replication Options

Replication refers to the process of creating multiple copies of data at local locations, remote locations, or both. There are several techniques available to implement replication.

1. Backup/restore is the most traditional disaster recovery method: data (usually a complete file) is written from primary disk to either disk or tape for backup, and from tape or disk back to primary disk for recovery. These approaches are sometimes referred to as D2D2T for disk-to-disk-to-tape, D2D for disk-to-disk, or D2T for disk-to-tape. In most cases, traditional backup slows or even stops the application being backed up. Data can be backed up locally or remotely. Some more advanced application-specific backup modules allow open files and databases to be backed up non-disruptively, without stopping or interrupting the application.

Tradeoffs exist when choosing a backup strategy. Backing up full disk volumes or files may become very time-consuming and difficult to schedule. In addition to full backups, incremental and differential backups represent further options. A differential backup copies everything that has changed since the last full backup, so the data that was backed up on the previous differential backup is also backed up on the next one. That is why differentials often grow in size each day between full backups. Daily backups get gradually larger but are easier to restore: a full restore requires only the last full backup and the last differential. An incremental backup copies only the data that has changed since the previous backup, whether that was a full or an incremental one. This reduces the amount of data backed up and therefore shortens the "backup window." A full restore, however, takes longer and is generally more complex, as each incremental must be restored in turn to bring all files to their last known state. Often a full backup is performed weekly while an incremental backup is performed daily.

2. Mirroring is implemented as a block-for-block replica of a file, a logical unit, or a physical disk volume, normally using disks for all copies.
Once the mirrored data element is established by copying the original data element, the mirror is maintained by replicating all write operations in two (or more) places, creating identical copies. The choices multiply, as the storage administrator must choose between asynchronous and synchronous mirroring, and tradeoffs exist in each case. In synchronous mirroring, both the source and the target devices must acknowledge that the write is complete before the next write can occur. This slows application performance but keeps the mirrored elements synchronized as mirror images of each other. In asynchronous mirroring, the source and target devices do not have to synchronize their writes, and the second and subsequent writes occur independently. Asynchronous mirroring is therefore faster than synchronous mirroring, but the secondary copies are slightly out of sync with the primary copy. This is sometimes referred to as a fuzzy copy. In practice, the secondary data element is rarely more than one minute behind the primary copy.
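The difference between the two mirroring modes can be illustrated with a small Python sketch. It is a toy model only, not any vendor's implementation: the class name `MirroredVolume`, its methods, and the in-memory dictionaries standing in for disk blocks are all assumptions made for illustration.

```python
from collections import deque


class MirroredVolume:
    """Toy model of a primary volume with one mirror (block number -> data)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet applied to the mirror

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            # Synchronous: the write is not complete until the mirror has it.
            self.secondary[block] = data
        else:
            # Asynchronous: acknowledge now, ship to the mirror later.
            self.pending.append((block, data))

    def drain(self):
        """Apply queued writes to the mirror in their original order."""
        while self.pending:
            block, data = self.pending.popleft()
            self.secondary[block] = data

    def in_sync(self):
        return self.primary == self.secondary


# Hypothetical usage: the same write against each mode.
sync_vol = MirroredVolume(synchronous=True)
sync_vol.write(0, b"ledger-v1")

async_vol = MirroredVolume(synchronous=False)
async_vol.write(0, b"ledger-v1")
lag_before_drain = len(async_vol.pending)   # writes the mirror has not seen yet
was_fuzzy = not async_vol.in_sync()         # the "fuzzy copy" window
async_vol.drain()                           # mirror catches up
```

In the synchronous case the mirror is identical the moment `write` returns; in the asynchronous case the mirror trails by whatever is still queued, which is the exposure a business accepts in exchange for faster writes.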
Mirroring is also commonly referred to as RAID-1. It is used for many mission-critical applications and is the fastest way to recover data, since a restore can complete in no more than a few seconds by switching over to a mirrored copy.

3. Snapshot copy is another disk high-availability feature and has gained popularity as it provides a less expensive means for businesses to gain some of the benefits of disk mirroring. With snapshot copy, only one complete copy of the data exists at a time. When write operations occur, the changed areas are saved in a separate area or partition of disk storage specifically reserved for snapshot activity. Here the old value of the affected area or block is saved in case the new block(s) are corrupted. Tradeoffs exist here also. Every change to the primary copy of data generates additional write operations to the area of disk storage designated to contain the snapshots. This activity adds IO overhead and increases disk storage consumption. The storage administrator will also need to manage the number and currency of snapshots. Snapshots provide data protection from intrusion and data corruption but not from a device failure.

4. Point-In-Time (PIT) copy is the fourth type of replication technology. PIT copy provides a view of data at a specific point in time. It eliminates the need to shut down an application for backup and enables a continuous mode of providing data protection. Like a series of still images, PIT copies are complete data images taken at specified points in time. PIT copies enable an administrator to go back in time to restore data from a stable state prior to when a corruption or other disruption occurred. Gaining in popularity, this is the best method of protection against human error, software problems, viruses, intrusions, and data corruption. Again, tradeoffs exist. The more frequently PIT copies are taken, the more storage and IO workload they consume and the more time it takes to determine which copy is the correct one to restore from.

5. Journaling is another method to enable data recovery, in which every write and update operation is written to another device that may or may not be the same as the primary device. Unlike mirroring, however, the secondary copy is a sequential history of write events. All write operations are queued to the secondary device, or journal device, which may be disk or tape. Usually an asynchronous approach, journaling uses metadata to associate each write operation with the location in primary storage where the data belongs. Tradeoffs exist in journaling. In a typical IT environment, approximately 20% of all IOs are writes. Journaling can reconstruct a full volume if the journal was initially created as a mirrored copy of the primary volume. If the journal copy was not created as a mirrored copy, journaling cannot recover from a physical device failure.
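The way a journal rebuilds a volume from a mirrored baseline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `JournalEntry` fields, the `replay` function, and the use of a sequence number in place of a timestamp are hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    sequence: int  # position in the sequential write history
    block: int     # metadata: where in primary storage the data belongs
    data: bytes


def replay(baseline, journal, up_to_sequence):
    """Rebuild a volume from a baseline copy plus journaled writes.

    Writes are applied in sequence order and replay stops after
    `up_to_sequence`, so a restore can stop short of a bad write.
    """
    volume = dict(baseline)  # never modify the baseline copy itself
    for entry in sorted(journal, key=lambda e: e.sequence):
        if entry.sequence > up_to_sequence:
            break
        volume[entry.block] = entry.data
    return volume


# Hypothetical history: the third write corrupts block 0.
baseline = {0: b"A0", 1: b"B0"}
journal = [
    JournalEntry(1, 0, b"A1"),
    JournalEntry(2, 1, b"B1"),
    JournalEntry(3, 0, b"CORRUPT"),
]
before_corruption = replay(baseline, journal, up_to_sequence=2)
latest = replay(baseline, journal, up_to_sequence=3)
```

Without the baseline there is nothing to apply the writes to, which is why a journal that did not start from a mirrored copy cannot stand in for a failed device.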
Journals are typically kept as a continuous history for 2 to 4 days, covering the period in which a data recovery action is most likely to occur. Journals also speed up the recovery process and reduce the backup window. Journals are especially good for protecting against intrusion and data corruption, enabling restores to go back in time to a point before the corruption occurred. Normally, journaling accompanies one of the above replication strategies to build a complete recovery strategy for a file, data set, or database.

Where to Replicate?

Replication can be implemented at the server, the switch or network, or in the storage subsystem. Again, there are choices and tradeoffs. For server replication, the replication function runs on the server(s), and the servers almost always need to run the same operating system. Data is usually transmitted between servers using an IP network. The storage devices do not need to be the same on each server; however, host compute resources are consumed. Switch- or network-based replication supports both heterogeneous servers and storage devices and doesn't consume host resources. This may be the most costly approach, as proprietary hardware and software are normally required.
Storage subsystem-based replication requires the devices to be the same or from the same vendor, but the switch and server can be different. This is the most proprietary implementation of replication.

If you think this is confusing and there are too many choices, you're definitely not alone. Why does a high-availability solution for disk require someone to select from full, incremental, differential, point-in-time, synchronous or asynchronous mirrors, snapshot copies, or journals running on either the host, the switch, or in the storage subsystem? Why does using tape for backup often seem so simple by comparison? Storage management for disk is complex, and businesses have often decided to choose the simpler approach after sustaining three years of downsizing and cutbacks.

The replication technologies are in place to deliver high-availability data protection capabilities. Choose one that best suits your business. As data becomes more valuable every day, implement some type of data protection strategy. Doing nothing is a strategy, just not a very good one.
Data Classification   Description

Mission Critical      Up to 15% of online data. Fundamental data required
                      for business survival in the event of a disaster.
                      Normally mirrored to disk and also backed up to tape
                      in a different geographic location.

Vital                 About 20% of online data. Data used in normal business
                      processes but may not be immediately needed for a
                      disaster recovery. Normally backed up to tape and/or
                      replicated to lower cost disk storage.

Sensitive             About 25% of online data. Data used in normal business
                      processes that has an alternative source or can be
                      reconstructed and may not be needed for hours or days
                      after a disaster. Normally backed up to tape.

Non-critical          Typically 40% of online data. Data that is not needed
                      for disaster recovery. Easily reconstructed or
                      duplicated from prior backup copies.
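
As a rough planning aid, the classification table can be turned into a capacity estimate per tier. The Python sketch below assumes the table's approximate shares (15, 20, 25, and 40 percent) apply as-is; real shares vary by business, and the names `TIERS` and `capacity_by_tier` are hypothetical.

```python
# Approximate share of online data and usual protection per tier,
# taken from the classification table (rough figures, not fixed rules).
TIERS = {
    "Mission Critical": (15, "mirror to disk and back up to tape at a remote site"),
    "Vital": (20, "back up to tape and/or replicate to lower cost disk"),
    "Sensitive": (25, "back up to tape"),
    "Non-critical": (40, "reconstruct or duplicate from prior backup copies"),
}


def capacity_by_tier(total_tb):
    """Split a total online capacity (in TB) across the four tiers."""
    return {name: total_tb * pct / 100 for name, (pct, _) in TIERS.items()}


# Hypothetical example: a site with 200 TB of online data.
plan = capacity_by_tier(200)
for name, tb in plan.items():
    print(f"{name}: {tb:.0f} TB -> {TIERS[name][1]}")
```

Sizing the tiers first makes the cost of each option visible: only the Mission Critical share needs the most expensive protection, while the largest share needs none beyond existing backups.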