CAS: storage for fixed content; getting the most out of all your information. (Storage Networking).

A study conducted by the University of California at Berkeley, entitled "How Much Information," stated that the world will produce nearly 12 billion gigabytes of information this year, of which more than half is unchanging digital assets, otherwise known as "fixed content." Fixed content is retained for active reference and value, and it takes many forms: critical business, legal, and reference documents; X-rays; email attachments; check images; broadcast content; satellite imagery; and much more. Unlike databases or files, which are constantly changed or updated, fixed content derives its value from the combined attributes of expanded use, authenticity, and long life.

Once relegated to storage archives or file cabinets, fixed content is being driven online, fueled by internal needs, regulatory requirements, digitization across virtually all industries, and the desire to turn this content into new services and revenue streams. However, the business value of fixed content remains largely untapped, because securely storing and managing large amounts of it online for years or decades has been cost-prohibitive, especially when organizations try to give unlimited numbers of users access at Web speeds.

Traditional disk storage systems with block or file access schemes are well suited to the tens to hundreds of terabytes of data typical of collaborative applications. But these systems cannot easily or cost-effectively scale to, and manage, massive fixed content repositories that can reach hundreds of terabytes to petabytes in size. Traditional storage is also challenged by the need to balance data placement and capacity scaling against the need to authenticate data over the content's entire life, whether that life lasts months, years, or decades. To solve the problem of managing and accessing large amounts of fixed content, a new category of networked storage has emerged: "content addressed storage," or CAS.

Pioneered by EMC Corporation, Centera is the industry's first implementation of CAS. CAS is optimized for managing, sharing, and protecting fixed content over its life, just as SAN has been optimized for block data and NAS has been optimized for files.

The content explosion is increasing storage capacity requirements in some industries by 100% or more. Industries where CAS will have the most immediate impact include:

Healthcare: CAS effectively eliminates the traditional barriers to widespread distribution and online availability of crucial digitized medical information such as X-rays, MRIs, and medical records. CAS keeps management costs flat as digital X-rays and other large medical images accumulate, while ensuring long-term retention and authenticity of these images.

Financial Services: CAS addresses two major needs: 1) adherence to stringent regulations that require long-term content integrity, and 2) cost-effective online access to financial information. Online access with assured content integrity allows the information to be repurposed to improve customer service and to deliver new revenue-producing services and products.

Film, Broadcast, and Media: Video, film, and audio content are the media and entertainment industry's key assets, but only if their reuse and sale can be managed and protected. A CAS system is an excellent digital asset repository because it simultaneously addresses long-term retention, protected ease of use, and verified content authenticity.

Behind the Technology

EMC's Centera is an integrated software and hardware solution purpose-built for the storage needs of fixed content. The vast majority of customer value comes from Centera's software, which dramatically improves the ease of use and management of fixed content.

When an object is stored, Centera calculates a 128-bit claim check from the object's binary representation and translates the result into a unique 27-character identifier called the content address. The content address is derived from, and unique to, that individual piece of content. Content addressing distinguishes Centera from other storage technologies, all of which are based on location addressing, because it eliminates the need to understand and manage the physical or logical location of information on the storage medium.
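The claim-check step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Centera's implementation: MD5 is assumed as the 128-bit function, and unpadded RFC 4648 base-32 is a hypothetical stand-in for Centera's own 27-character encoding, which the article does not describe, so the sketch yields 26 characters.

```python
import base64
import hashlib


def content_address(data: bytes) -> str:
    """Derive a location-independent address from an object's bytes."""
    # Assumption: MD5 stands in for the 128-bit claim check.
    digest = hashlib.md5(data).digest()  # 16 bytes = 128 bits
    # Assumption: unpadded base-32 stands in for Centera's 27-character
    # encoding, so this sketch yields 26 characters.
    return base64.b32encode(digest).decode("ascii").rstrip("=")


addr = content_address(b"scanned check #1001")
print(len(addr))  # 26
```

Because the address depends only on the bytes, two applications on different platforms that store the same object compute the same address.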

Centera links the fixed content object to the application and user through an intermediate data structure called a c-clip descriptor file (CDF), which contains time-stamp information, any application-specified metadata, and the content address of the stored object. The application holds the CDF's content address, not the content object's, as its virtual "claim check."
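The CDF indirection can be modeled the same way: the descriptor is itself hashed, and its address becomes the application's claim check. The JSON layout, the field names, and the helper `make_cdf` are hypothetical; the article lists what a CDF contains but not its format. The address function again assumes MD5 with base-32 encoding.

```python
import base64
import hashlib
import json
import time


def content_address(data: bytes) -> str:
    # Hypothetical stand-in: MD5 (128 bits) encoded as unpadded base-32.
    return base64.b32encode(hashlib.md5(data).digest()).decode("ascii").rstrip("=")


def make_cdf(blob_address: str, metadata: dict) -> bytes:
    """Build a hypothetical descriptor file: timestamp, metadata, blob address."""
    cdf = {
        "timestamp": int(time.time()),
        "metadata": metadata,
        "content_address": blob_address,
    }
    # sort_keys gives a stable byte layout, so the CDF hashes consistently.
    return json.dumps(cdf, sort_keys=True).encode("utf-8")


blob = b"<x-ray image bytes>"
cdf_bytes = make_cdf(content_address(blob), {"patient_id": "12345", "modality": "X-ray"})
claim_check = content_address(cdf_bytes)  # the value the application keeps
```

Since the timestamp and metadata differ per store request, each CDF gets its own address even when the underlying object is the same.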

The advantages of content addressing:

Assured content authenticity: A content object can have only one content address. Any change to content is detected because it results in a different content address.
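Authenticity checking follows directly: recompute the address and compare it with the one on record. The sketch assumes MD5 with base-32 encoding, and `is_authentic` is a hypothetical helper. Detection relies on accidental collisions being vanishingly unlikely; MD5 is no longer considered collision-resistant against deliberate attacks, so a modern design would likely prefer a stronger function such as SHA-256.

```python
import base64
import hashlib


def content_address(data: bytes) -> str:
    # Hypothetical stand-in for the 128-bit claim check (MD5, base-32).
    return base64.b32encode(hashlib.md5(data).digest()).decode("ascii").rstrip("=")


def is_authentic(data: bytes, expected_address: str) -> bool:
    """Recompute the address; a changed byte yields a different address."""
    return content_address(data) == expected_address


original = b"Policy 42: coverage effective 2002-10-01"
addr = content_address(original)
tampered = original.replace(b"2002", b"2003")
print(is_authentic(original, addr))  # True
print(is_authentic(tampered, addr))  # False
```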

A globally unique, location-independent identifier: Referring to content by its content address yields a reference that does not depend on where the content is stored. The content address is independent of operating systems, file systems, and applications. For example, a user can store an object in Centera through a document management application running on Windows NT and retrieve it through a legal application running on Sun Solaris.

Single-instance store: Centera keeps only one copy of a content object, plus its mirror. If, for example, an application stores the same content object on behalf of twenty different users, each user's CDF carries different metadata, but the object itself is stored only once, because mathematically a given piece of content has only one content address. The result is unprecedented capacity savings and simplified management, one of the reasons the TCO associated with Centera is so attractive.
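A toy in-memory store shows why duplicate writes cost no extra capacity: objects are keyed by content address, so a second write of identical bytes finds the key already present. `SingleInstanceStore` is a hypothetical name, the address function assumes MD5 with base-32, and the mirror copy is omitted for brevity.

```python
import base64
import hashlib


def content_address(data: bytes) -> str:
    # Hypothetical stand-in: MD5 digest encoded as unpadded base-32.
    return base64.b32encode(hashlib.md5(data).digest()).decode("ascii").rstrip("=")


class SingleInstanceStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        addr = content_address(data)
        # setdefault skips the write if this content is already stored.
        self.blobs.setdefault(addr, data)
        return addr


store = SingleInstanceStore()
attachment = b"quarterly-report.pdf bytes"
# Twenty clients store the same attachment; each gets its own metadata record.
cdfs = [{"client": n, "content_address": store.put(attachment)} for n in range(20)]
print(len(cdfs), len(store.blobs))  # 20 1
```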

Centera's implementation of content addressing protects the system from unauthorized access: The content address does not map to a data path, file name, or data type. A user has no "account" on Centera, and there is no way to browse. The only way to access content on Centera is through an application that holds the CDF's 27-character, content-derived alphanumeric address.

A single CDF can reference multiple digital assets, such as a collection of files and folders.
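The two properties above, access only by CDF address and one CDF naming many assets, can be combined in a sketch. `ClipStore`, `write_clip`, and `read_clip` are hypothetical names, the JSON descriptor format is an assumption, and the store simply offers no method to list its contents, which models the no-browse policy rather than enforcing any security.

```python
import base64
import hashlib
import json


def content_address(data: bytes) -> str:
    # Hypothetical stand-in: MD5 digest encoded as unpadded base-32.
    return base64.b32encode(hashlib.md5(data).digest()).decode("ascii").rstrip("=")


class ClipStore:
    """Toy store with no listing or browsing: lookup by address only."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def _put(self, data: bytes) -> str:
        addr = content_address(data)
        self._objects[addr] = data
        return addr

    def write_clip(self, files: dict[str, bytes]) -> str:
        # Store each asset, then a CDF naming them all; return the CDF's address.
        refs = {name: self._put(data) for name, data in files.items()}
        cdf = json.dumps({"files": refs}, sort_keys=True).encode("utf-8")
        return self._put(cdf)

    def read_clip(self, clip_address: str) -> dict[str, bytes]:
        # Unknown addresses raise KeyError; there is no way to enumerate.
        cdf = json.loads(self._objects[clip_address])
        return {name: self._objects[addr] for name, addr in cdf["files"].items()}


store = ClipStore()
clip = store.write_clip({"case/brief.doc": b"brief", "case/exhibit-a.tif": b"scan"})
print(sorted(store.read_clip(clip)))  # ['case/brief.doc', 'case/exhibit-a.tif']
```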

The Centera CAS Solution

Centera employs a unique RAIN (redundant array of independent nodes) architecture. Each node contains a CPU and large-capacity disk drives and runs the Centera operating environment, CentraStar, the heart and soul of Centera. CentraStar runs on every node and automates content placement. A fully populated six-foot cabinet provides approximately 10 terabytes of mirrored content storage (20 terabytes of raw storage). Multiple cabinets can be configured into a single environment, enabling Centera to scale to more than a petabyte of capacity. Centera can also be configured to replicate content automatically from one Centera to a remote Centera while still mirroring it locally. Storing content in different geographical areas has important disaster recovery benefits. For example, if an outage leaves an organization's local IT resources unable to function, replication lets the organization quickly access its data at the remote site with virtually no downtime.
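The capacity figures reduce to simple arithmetic: mirroring halves raw capacity, and each cabinet adds roughly 10 TB of usable space. This sketch treats the article's approximate per-cabinet figure as exact, uses decimal units (1 PB = 1,000 TB), and `cabinets_needed` is a hypothetical helper.

```python
import math

# Assumption: the article's approximate figure, treated as exact.
USABLE_TB_PER_CABINET = 10
# Every object is mirrored, so raw capacity is twice the usable capacity.
RAW_TB_PER_CABINET = 2 * USABLE_TB_PER_CABINET


def cabinets_needed(target_usable_tb: float) -> int:
    """Whole cabinets required to hold a given amount of mirrored content."""
    return math.ceil(target_usable_tb / USABLE_TB_PER_CABINET)


print(RAW_TB_PER_CABINET)     # 20
print(cabinets_needed(1000))  # 100
```

So reaching a petabyte of mirrored content implies on the order of a hundred cabinets in this simplified model.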

CentraStar is designed to handle both minor hiccups and major issues without human intervention, allowing Centera to self-configure, self-heal, and self-manage. For example, adding capacity is a straightforward, totally non-disruptive process requiring minimal time and effort: Centera simply "auto-discovers" new capacity and uses it as it becomes available. At installation, the system administrator need only specify how many nodes will serve as access nodes, based on bandwidth requirements, and configure their IP addresses. CentraStar handles the rest, automatically configuring the internal IP addresses of Centera's nodes and switches. CentraStar also decides where content is physically stored within Centera. New content goes to the two least busy nodes that have available space; two nodes because every stored content object has a mirror.
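The placement rule in the last sentence can be sketched as a selection over nodes. The `Node` fields and the `load` metric are assumptions, and `place` is a hypothetical helper; the article says only that CentraStar picks the two least busy nodes with available space.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: float     # hypothetical busy-ness metric, e.g. pending requests
    free_tb: float  # remaining capacity


def place(nodes: list[Node], size_tb: float) -> tuple[Node, Node]:
    """Pick the two least busy nodes with room: one for the object, one for its mirror."""
    candidates = [n for n in nodes if n.free_tb >= size_tb]
    if len(candidates) < 2:
        raise RuntimeError("not enough nodes with free space to mirror this object")
    primary, mirror = sorted(candidates, key=lambda n: n.load)[:2]
    primary.free_tb -= size_tb
    mirror.free_tb -= size_tb
    return primary, mirror


cluster = [Node("n1", 0.7, 1.0), Node("n2", 0.2, 0.0), Node("n3", 0.4, 2.0), Node("n4", 0.1, 1.5)]
p, m = place(cluster, 0.5)
print(p.name, m.name)  # n4 n3  (n2 is less busy than n3 but full)
```

Appending a new `Node` to `cluster` makes it eligible on the next call, a loose analogue of auto-discovery.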

When it is not storing, retrieving, or deleting content objects, CentraStar performs content scrubbing, recalculating the content addresses of all stored objects. If it finds an inconsistency, CentraStar automatically re-creates the affected object from its mirror within Centera. CentraStar also works autonomously to protect content and ensure uninterrupted operation in the event of a drive, node, connection, or power failure, keeping all content mirrored and protected. For example, if a drive or node fails, CentraStar locates all unmirrored content and automatically generates a new content mirror within the Centera system.
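Scrubbing and re-mirroring can be modeled with nodes as in-memory dictionaries mapping content addresses to bytes. This is a toy under several assumptions: MD5 with base-32 as the address function, scrubbing run on demand rather than during idle time, and the least-full surviving node chosen as the new mirror target. The article does not say how CentraStar picks repair sources or targets, and `scrub` and `remirror` are hypothetical names.

```python
import base64
import hashlib


def content_address(data: bytes) -> str:
    # Hypothetical stand-in: MD5 digest encoded as unpadded base-32.
    return base64.b32encode(hashlib.md5(data).digest()).decode("ascii").rstrip("=")


# Toy model: each node maps content address -> stored bytes.
Nodes = dict[str, dict[str, bytes]]


def scrub(nodes: Nodes) -> list[tuple[str, str]]:
    """Recompute every address; repair corrupt copies from a good mirror."""
    repaired = []
    for node_name, objects in nodes.items():
        for addr, data in objects.items():
            if content_address(data) == addr:
                continue  # copy is intact
            for other_name, other in nodes.items():
                good = other.get(addr)
                if other_name != node_name and good is not None and content_address(good) == addr:
                    objects[addr] = good  # overwrite the damaged copy
                    repaired.append((node_name, addr))
                    break
    return repaired


def remirror(nodes: Nodes, failed: str) -> list[str]:
    """Drop a failed node, then give every singly held object a new mirror."""
    nodes.pop(failed)
    recreated = []
    all_addrs = {addr for objs in nodes.values() for addr in objs}
    for addr in sorted(all_addrs):
        holders = [n for n, objs in nodes.items() if addr in objs]
        spare = [n for n in nodes if n not in holders]
        if len(holders) == 1 and spare:
            target = min(spare, key=lambda n: len(nodes[n]))  # least-full survivor
            nodes[target][addr] = nodes[holders[0]][addr]
            recreated.append(addr)
    return recreated


blob = b"satellite tile 17"
a = content_address(blob)
nodes: Nodes = {"n1": {a: blob}, "n2": {a: b"satellite tile 1?"}, "n3": {}}
print([name for name, _ in scrub(nodes)])  # ['n2']
print(remirror(nodes, "n1") == [a])        # True
```

After the run, the corrupt copy on `n2` has been restored from `n1`, and once `n1` fails, `n3` receives the new mirror.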

EMC Centera's self-managed scalability to petabytes, plug and play installation, lights-out operation, and ability to ensure long-term content authenticity make it a truly unique solution, prompting the creation of an entirely new category of storage in CAS. Given the flexibility and platform-independence of Centera, the new business uses for an organization's fixed content are endless.

Tom Heiser is vice president and general manager of content-addressed storage at EMC (Hopkinton, Mass.)
COPYRIGHT 2002 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2002, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:content addressed storage
Author:Heiser, Tom
Publication:Computer Technology Review
Geographic Code:1USA
Date:Oct 1, 2002