
File systems and storage.

Structured data storage gets most of the press: this is the realm of storage area networks, virtualization and automated provisioning. But this kind of well-behaved data makes up only 20% of enterprise data, an important 20% but still a distinct minority. That leaves 80% of enterprise data as semi-structured (e-mail, combined database and file systems) and unstructured (word processing, spreadsheets, presentations, images). These files take up a large amount of storage capacity and can be difficult to manage, but they contain a lot of business-critical information. Jeff Erramouspe, president and CEO of Deepfile, said, "There's a lot of attention paid to data center-centric data. There's very little information paid to the file system. That tends to be spreadsheets and PowerPoint files and Word files, or Access databases that are used by individuals or groups or departments. It could be creative data, things like source code if you looked at a software development company. This is a highly under-managed area."

File system-based storage management is evolving to meet the needs of unstructured data. Dave Howard, president of Colorado Software Architects, decided to base the company's storage management development efforts on file systems because of their potential and opportunities. "If the file system is intelligent enough, any application will work with it. Frankly, it was fertile ground because not a lot had been done in that area."

Sun Microsystems has been firmly in this camp since its acquisition of SAM-FS, which archives files onto secondary media but keeps them online and immediately accessible. Suzan Szollar, Sun's product line manager in marketing, said, "In terms of direction, I think we're looking at an environment where we're distributing more and more of the file system across the SAN and the network, and also being able to work in heterogeneous environments."

Opportunities for file-based storage management exist throughout vertical markets. Some of the sectors with the highest file management needs include:

* Broadcast and video on demand

* Medical/healthcare

* Government/military/aerospace

* Education

* Oil and gas

* Manufacturing

* Life sciences

* Telco.

Erramouspe said, "You would never think of treating your structured data the way you treat your unstructured data. There are five times as much unstructured as structured, but we rarely manage it. Everyone has DBAs, but who has file system managers?"

According to Marty Ward, director of product marketing, high availability and storage management at Veritas, file-based storage management development continues to build on traditional volume management and file system management services. During the 1990s, developers worked on cluster-based file storage management, such as integrating structured and unstructured data through clustered volume management services. This work flowed into today's file-based storage resource management (SRM) research, integrating file and block-based backup and recovery, and tiered storage.


Erramouspe said that even the largest companies haven't managed their files and file systems well. "The reason they haven't gotten it right is they're trying to manage the wrong thing. They're trying to manage disk and volumes and partitions. Our view is they need to manage their files, and if they manage their files well, everything else will take care of itself." SRM is one way of simplifying and extending management resources to file systems.

SRM sounds simple enough--a way of managing storage resources. However, the specifics differ wildly and range across physical and logical layers. All SRM packages monitor and report, some add automation and control features, and most are increasingly tied to application-specific service requirements.

Physical layer SRM: Works at the fabric and/or device level. Example: Monitors and reports on port connections and alerts the storage administrator to network congestion or failure.

Logical layer SRM: Manages stored data at the logical level: files, file systems, volumes and volume groups. Example: Tracks and reports on the amount of space that an application's data has consumed.

Some logical-level SRM packages already report on volume and disk usage and other broad categories, but file-based SRM adds file-level reporting. Sample queries might include the one hundred fastest-growing files on a network, the one hundred oldest files on an array, or one hundred files that haven't been accessed for over a month but aren't part of a critical dataset. This kind of detailed query allows administrators to make intelligent decisions about file archiving and retirement, and to identify orphaned files.
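To make those sample queries concrete, here is a minimal Python sketch of file-level reporting. It is an illustration only, not how any vendor's SRM product works: the names FileRecord, scan, oldest_files, stale_files and fastest_growing are hypothetical, "oldest" is assumed to mean oldest modification time, and the "critical dataset" is approximated by a list of path prefixes to exclude.

```python
import heapq
import os
import stat
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    size: int     # bytes
    mtime: float  # last modification, seconds since the epoch
    atime: float  # last access, seconds since the epoch


def scan(root):
    """Yield a FileRecord for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                yield FileRecord(path, info.st_size, info.st_mtime, info.st_atime)


def oldest_files(records, n=100):
    """The n files with the oldest modification time."""
    return heapq.nsmallest(n, records, key=lambda r: r.mtime)


def stale_files(records, days=30, now=None, exclude_prefixes=()):
    """Files not accessed for `days` days, skipping excluded (critical) paths."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    excluded = tuple(exclude_prefixes)
    return [r for r in records
            if r.atime < cutoff and not r.path.startswith(excluded)]


def fastest_growing(previous_sizes, current_sizes, n=100):
    """Rank files by bytes gained between two scans (dicts of path -> size)."""
    growth = ((path, size - previous_sizes.get(path, 0))
              for path, size in current_sizes.items())
    return heapq.nlargest(n, growth, key=lambda item: item[1])
```

Note that the fastest-growing query needs two scans taken at different times, and that access times are only meaningful on file systems that actually record them.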

Backup and Recovery

Backup and recovery has traditionally been file-based, but has become more challenging because file volumes have grown so much larger. Chris Van Wagoner, director of product marketing at CommVault, said, "The backup world in general, and CommVault in particular, has looked to block level technologies to provide better performance." Basing backup even on incremental file changes can be time-consuming: backup applications must check each individual file for modifications since the last backup. At an average of a sixtieth of a second per file, a single check is quick, but when the backup application must consider 60 million files the total makes it prohibitive to maintain a reasonable backup window. Instead, backup applications can run faster (and shrink their backup windows) by looking at changed blocks.
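The arithmetic behind that backup window problem is easy to check. The short Python sketch below uses only the two figures quoted above (60 million files, one sixtieth of a second per check) and assumes the checks run one after another; the function name scan_time_seconds is hypothetical.

```python
def scan_time_seconds(file_count, checks_per_second):
    """Time to check every file for changes, one file at a time."""
    return file_count / checks_per_second


if __name__ == "__main__":
    seconds = scan_time_seconds(60_000_000, 60)
    hours = seconds / 3600
    days = seconds / 86400
    # Prints: 1,000,000 s = 277.8 hours = 11.6 days
    print(f"{seconds:,.0f} s = {hours:.1f} hours = {days:.1f} days")
```

Under those assumptions, the change check alone would run for more than eleven days before a single byte is copied, which is far outside any nightly backup window.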

However, block-based backup loses individual file definitions. This is a problem when a restore application must locate individual files on a backup. The restore application needs to know what form the data is in. Is it in a machine-readable format? Has it been written by a backup product? If so, was it a block-level or file-level transfer? Has the data been copied through snapshots, replication or mirroring? If the backup is block-based, the short answer is: Who knows? There is no file structure index to tell the restore application. This also impacts semi-structured applications such as CAD/CAM, which store image information in databases. Backup applications must preserve and synchronize file-based images with their database index entries, or they can't be restored.

One fix would be to add block-to-file mapping to backup products, similar to what a file system would have. Data protection would run at the block level, but would also preserve file intelligence. Van Wagoner added, "Those are the kinds of things that are coming along that will give the end user the best of both worlds: the performance and speed of block by block, and the logical recovery based on the files."
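One way to picture block-to-file mapping is as an index of extents, each recording which file owns a run of contiguous blocks. The Python sketch below is a simplified illustration under stated assumptions, not a description of any shipping backup product: Extent and BlockMap are hypothetical names, extents are assumed not to overlap, and real file systems track far more metadata.

```python
import bisect
from typing import NamedTuple


class Extent(NamedTuple):
    start: int   # first block number of the extent
    length: int  # number of contiguous blocks
    path: str    # file that owns these blocks


class BlockMap:
    """Index from block number to owning file (non-overlapping extents)."""

    def __init__(self, extents):
        self._extents = sorted(extents)  # ordered by starting block
        self._starts = [e.start for e in self._extents]

    def file_for_block(self, block):
        """Return the path owning `block`, or None if it is unallocated."""
        i = bisect.bisect_right(self._starts, block) - 1
        if i >= 0:
            extent = self._extents[i]
            if block < extent.start + extent.length:
                return extent.path
        return None

    def files_for_blocks(self, blocks):
        """Group changed block numbers by the file that owns them."""
        result = {}
        for block in sorted(blocks):
            path = self.file_for_block(block)
            if path is not None:
                result.setdefault(path, []).append(block)
        return result
```

With an index like this saved alongside a block-level backup, a restore tool could translate "give me this file" into the set of blocks to read back, which is the file intelligence that a purely block-based copy lacks.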

Ward also believes that file-based and block-based data will converge. "We're seeing the structured data stored in the same place as the unstructured data. This gives easier allocation and easier management of the data out to the application. The ease of management comes into play on the backend, by backing up data onto the same storage network."

Tiered Storage

Data lifecycle management, which is the process of managing data from creation to retirement, doesn't exist yet in a full-blown version. Existing tools, including tiered storage, do help to manage the cycle. Tiered storage, which operates in both file- and block-based environments, uses policies to classify data by its characteristics. Depending on those parameters, it then assigns data protection levels and matches archiving procedures to the policy. For example, IT might direct its tiered storage management software to archive an HR application onto on-line ATA disk, where it is transparently available to end-users. The same software can migrate year-old Word documents from a senior management or legal group onto long-term tape storage. This allows IT to optimize storage by the value of the file data, and to save time by efficiently migrating data to secondary storage systems.
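A policy of that kind can be sketched as an ordered list of rules in which the first match decides the tier. The Python below is a hypothetical illustration of the two examples just given; the names FileInfo, Rule, POLICY and assign_tier, the group labels, the tier labels and the 365-day threshold are all assumptions made for the sketch, not features of any particular product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FileInfo:
    path: str
    owner_group: str  # e.g. "hr", "legal" (labels assumed for illustration)
    extension: str    # lower-case, including the dot
    age_days: int     # days since the file was last modified


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[FileInfo], bool]
    tier: str


# Rules are evaluated in order; the first one that matches wins.
POLICY = [
    Rule("year-old Word documents from management or legal go to tape",
         lambda f: (f.owner_group in {"senior-management", "legal"}
                    and f.extension == ".doc"
                    and f.age_days >= 365),
         "tape"),
    Rule("HR application data is archived to on-line ATA disk",
         lambda f: f.owner_group == "hr",
         "ata-disk"),
]

DEFAULT_TIER = "primary"  # anything unmatched stays on primary storage


def assign_tier(file_info, policy=POLICY):
    """Return the storage tier chosen by the first matching rule."""
    for rule in policy:
        if rule.matches(file_info):
            return rule.tier
    return DEFAULT_TIER
```

Keeping the rules as data rather than hard-coded branches means a retention policy can be changed without touching the migration logic.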

File-based tiered storage particularly benefits verticals with petabytes of file storage, such as geological exploration, medical MRI files, and massive genome studies in the life sciences. Tiered storage is also helpful when dealing with compliance issues, where companies must be able to locate and quickly restore their archived data. Nor does it stop there. Bob Bingham, chief marketing officer of TeraCloud, said about compliance, "It's bigger than just regulatory issues. Every company has their own criteria in which they want to manage their infrastructure. It's a balance between quality of service and cost. There are some companies that never want to delete a file. And to maintain that quality of service and best practices they're going to have a much higher cost."

Ultimately, data management may come together under a utility computing model. Utility storage spans the world of storage software and hardware: working across physical and logical layers; SAN, NAS and DAS; and block- and file-based data. By tying together unstructured, semi-structured and structured data management, it's possible to virtualize and automate provisioning across all enterprise storage.

Bingham said, "At the end of the day, applications and systems come and go, but the data ends up being the company's most important asset. The infrastructure is the means to an end, the data is the end to itself."
COPYRIGHT 2003 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Special SAN Section
Author:Chudnow, Christine Taylor
Publication:Computer Technology Review
Date:Jul 1, 2003