Understanding Online Archiving

Online archiving provides more efficient and faster access, plus major disk space savings without performance penalties.

System administrators have historically relied on offline archiving for data backup and storage. In a typical scenario, offline archiving is a manual process for moving data to media that are no longer connected to the system environment. When that data must be retrieved, a similar manual process brings it back onto the system so that it can be used.

The manual method has drawbacks beyond the time it takes. It hurts user productivity while system administrators wait for an operator to locate the right tape and load it. Locating the desired files can also be a challenge with potentially hundreds or even thousands of on-site and off-site tapes. Once the files are found, the data on the tapes may be corrupted, because oxidation gives the tape medium an uncertain shelf life. Finally, offline archiving increases management and helpdesk costs because it is a very labor-intensive operation.

The other traditional approach, nearline archiving, involves moving the data to slower media such as robotic tape and laser or magneto-optical jukeboxes. Nearline archiving is also referred to as Hierarchical Storage Management (HSM). Retrieving data from nearline archiving devices is slow, but much faster than retrieving it from offline archives, since the process is not manual.

An HSM system selects files through a policy procedure and archives them. Archiving is a multi-step process that includes compressing the data and then moving the files to the nearline storage device. Additionally, when a user or application attempts to access an archived file, a time lag occurs. The HSM finds the device and media where the file is located, and then instructs the device to load the appropriate media. Once the media is loaded, the HSM retrieves the file and decompresses it, at which point the file is available.

Issues the system administrator faces in nearline archiving include configuring the system for optimum storage, that is, archiving only the least-needed data. Additionally, the HSM system must operate as desired without regularly degrading performance. For example, suppose an HSM system is configured and files are migrated to nearline devices. A "performance hit," or lag time, is incurred whenever a particular file is accessed and brought back to the online system. If the HSM system is not properly configured, one of two situations can occur.
First, the system administrator does not archive enough data because he or she is not sure whether it will be needed or whether the lag time is acceptable. Second, too much data is archived, and lag time results each time a file is accessed. A case in point is an application that requires a nearline-archived file every three months. On each occasion, the file is retrieved from a tape robotics system and brought back onto the system, and lag time is incurred. Here's how this scenario plays out: after 60 days the file is moved off the system and, 30 days later, it is moved back on. As a result of this highly unproductive movement, most system administrators opt for the first extreme of not archiving enough data.

Then there is the cost of nearline archiving, which is a highly complex system. Both the hardware and the software are expensive. However, the highest cost incurred with nearline archiving, or HSM, is management: HSM is complex to configure and to manage well.

Without archiving data, system administrators will eventually run out of disk space. Each time this occurs, the system is brought down and new hardware is installed. Then it is configured and the data is reloaded. The downtime and management are very expensive.
(This scenario assumes that the hardware was already purchased and delivered. If not, the cost of managing this system skyrockets.) Also, the more pieces of hardware, the greater the opportunity for failure. The table shows that the average disk drive Mean-Time-Between-Failure (MTBF) is five years for one disk. With 60 disks, MTBF is one month, and with 180 disks, it is 10 days.

Online Archiving

Online archiving for Unix and Windows NT environments is now making its entrance to resolve the storage and backup issues that plague system administrators. Online archiving refers to taking data that is not used on a regular basis and storing it efficiently on direct access systems: disk drives or enterprise storage systems connected via SCSI, fiber, or other cabling. Additional hardware is not required in an online archiving environment. More importantly, in addition to efficient data storage, the hallmark of online archiving is high-speed access when the data is needed. Key benefits to the system administrator are reduced backup time and reduced hard-drive requirements, which in turn translate into reduced management, maintenance, and support expenditures.

Online archiving comes at a time when cost of ownership continues to escalate dramatically. Take, for example, a $10,000 hardware investment. Industry experts say the cost of running a piece of hardware like a disk drive is $5 to $7 annually for every dollar spent on hardware. Therefore, for a $10,000 investment, the annual cost is $50,000 to $70,000. The five-year cost of ownership for that $10,000 hardware investment is a quarter million dollars or more, including the cost of the hardware. However, online archiving helps the system administrator avoid those expenses by providing a more efficient way to store data. Files continue to reside on the direct access disks. Hence, data availability is increased and access time is greatly reduced compared to offline and nearline archiving.

There is also a performance gain during normal backup procedures, for two reasons. First, compressed files remain compressed on the backup tape. This reduces the time and resources required to move the data back online. Second, there is less data to travel through disks into computer memory and then to tape devices. A probable additional benefit is fewer network bottlenecks during network backup or when using Network-Attached Storage (NAS).

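The cost-of-ownership arithmetic in this section can be checked with a few lines of Python. This is only a sketch: the function name and parameters are hypothetical, and the only inputs taken from the article are the $10,000 investment, the $5 to $7 annual running cost per hardware dollar, and the five-year horizon.

```python
def five_year_cost(hardware_cost, annual_cost_per_dollar, years=5):
    """Total cost of ownership: purchase price plus yearly running costs.

    annual_cost_per_dollar is the article's rule of thumb ($5 to $7 of
    running cost per year for every dollar spent on hardware).
    """
    annual_running_cost = hardware_cost * annual_cost_per_dollar
    return hardware_cost + annual_running_cost * years


# Low and high ends of the article's range for a $10,000 investment.
print(five_year_cost(10_000, 5))  # 260000
print(five_year_cost(10_000, 7))  # 360000
```

At the low end of the range the total comes to $260,000, which matches the article's figure of roughly a quarter million dollars; at the high end it reaches $360,000.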
Set Policy And Forget It

When online archiving is used, the user sets the storage policy within seconds by specifying file characteristics: for example, file extension, size, name, last-modified date, or owner. Then he or she can forget about it. Online archiving dynamically compresses the data according to the preset policy. The file remains online for immediate access, and the compression is transparent to users and applications. When the user accesses the file, online archiving retrieves the data twice as fast as it was compressed. The user can juggle disk space online to make room for new files, emergency tasks, or test databases. Also, files can be compressed to delay moving them to HSM tape storage. The user has compression ratios of up to 99 percent at his or her command, so more files can be stored without adding disks, which helps stay under budget and keep a safe lead on today's dramatically growing data.

In effect, an online archiving system is automatic, and all its key operations are transparent to the user. When data is compressed automatically, all file characteristics remain exactly the same. When users access compressed files, the files are automatically decompressed. When users finish with the data, the file is left uncompressed for better performance. The file is recompressed later, when it again meets the archiving policy.
The rationale behind this is that users accessing a file will likely use it multiple times: appending to it, updating it, or just reviewing it.

System administrators can also tune compression to trade off speed against compressibility. When the archiving policy is set up, part of that procedure is deciding whether to optimize for speed or for compressibility. The user can set compression policy characteristics to uphold specific performance levels. As far as performance is concerned, files compressed at any ratio take less time to transfer to NFS (Network File System) drives or to write to tape. Faster restore reduces the time spent on system reloads and disaster recovery.

Online And HSM

Online archiving is complementary to nearline archiving. For instance, a system administrator may currently have 500GB of disk space. He or she knows that a nearline system will be needed one day soon, but does not want to engage it yet because system administration resources aren't available. Here online archiving gives system administrators a stepping-stone to archiving capability without the difficulties involved in purchasing and installing a Hierarchical Storage Management system. Plus, an online archiving system can serve as an educational tool for understanding what archiving is really about before moving into the complications of an HSM. Online archiving works with an HSM system as a first-line archiving step.
For example, a file that has not been used for 30 days is archived online. If that file still hasn't been touched after six, nine, or 12 months, it is moved to the nearline device. The system administrator gains the efficiencies of online archiving before the file is moved off. This gives system administrators an additional step that they don't have with an HSM system alone.

System administrators who already have an HSM system in use would naturally question the need for online archiving. The most significant advantage of adding it is the ability to store and access valuable files that are periodically needed without incurring costly performance penalties. On the other hand, administrators who don't have an HSM system but are rapidly accumulating data, adding enormous numbers of disk drives, and worrying about the eventual HSM purchase and installation can opt for online archiving. This involves software installation without hardware considerations. The approach relieves system administrators of concerns about disk space usage, about archiving the right amount of data, and about performance lag time. The benefit of online archiving is disk space savings without performance hits.

In summary, online archiving provides the following benefits:

1. Reduced system administration costs through the reduction of disk space for archived files. This saves $50,000 to $70,000 annually per $10,000 saved in hardware.
2. Reduced backup media and time, since fewer bytes are used.
3. Reduced occurrence of out-of-disk-space failures and reduced future disaster-recovery time, which increases uptime.
4. Reduced number of disk drives, which decreases the risk of hardware failures.
5. Increased archiving capacity and performance through the addition of a stepping-stone before nearline archiving is needed.

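The policy-driven flow described under "Set Policy And Forget It" and the 30-day and six-month tiering described under "Online And HSM" can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the class names, field names, tier labels, and default thresholds (extensions, minimum size, 180 days for nearline) are all hypothetical, and "not used" is assumed to mean time since last access.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    last_used: datetime


@dataclass
class ArchivePolicy:
    # Hypothetical policy characteristics; the article lists extension,
    # size, name, last-modified date, and owner as possible criteria.
    extensions: tuple = (".log", ".dat")
    min_size_bytes: int = 1_000_000
    online_archive_after_days: int = 30   # compress in place after 30 idle days
    nearline_after_days: int = 180        # assumed six-month HSM threshold


def choose_tier(info, policy, now):
    """Return the storage tier a file belongs in under the policy."""
    matches = (info.name.endswith(policy.extensions)
               and info.size_bytes >= policy.min_size_bytes)
    if not matches:
        return "online"            # policy does not apply; leave untouched
    idle = now - info.last_used
    if idle >= timedelta(days=policy.nearline_after_days):
        return "nearline"          # hand the file over to the HSM system
    if idle >= timedelta(days=policy.online_archive_after_days):
        return "online-archived"   # compressed, but still on direct access disk
    return "online"
```

A file that matches the policy and has been idle for 45 days would be classified as "online-archived", while the same file idle for 200 days would be classified as "nearline".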
Paul Wang is the president of Solution-Soft (San Jose, CA).
MTBF as Disks Increase

Number of Disks    Mean-Time-Between-Failure
1                  5 years
60                 1 month
180                10 days
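The figures in the table are consistent with dividing the single-disk MTBF by the number of disks. The sketch below reproduces them under that assumption, which the article does not state explicitly: identical disks that fail independently at a constant rate. The function name is hypothetical.

```python
def group_mtbf_days(single_disk_mtbf_years, disk_count):
    """Approximate time between failures, in days, across a group of disks.

    Assumes identical disks failing independently at a constant rate, so
    the expected time to the next failure anywhere in the group is the
    single-disk MTBF divided by the number of disks.
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    days_per_year = 365
    return single_disk_mtbf_years * days_per_year / disk_count


for disks in (1, 60, 180):
    print(disks, round(group_mtbf_days(5, disks), 1))
# 1 1825.0
# 60 30.4
# 180 10.1
```

The results, about 30 days for 60 disks and about 10 days for 180 disks, agree with the table's "1 month" and "10 days".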