CAS: storage for fixed content; getting the most out of all your information. (Storage Networking)

A study conducted by the University of California at Berkeley, entitled "How Much Information," stated that the world will produce nearly 12 billion gigabytes of information this year, of which more than half is unchanging digital assets, otherwise known as "fixed content." Fixed content is retained for active reference and value, and it takes many forms, such as critical business, legal, and reference documents; X-rays, email attachments, check images, broadcast content, satellite imagery, and much more. Unlike databases or files, which are constantly updated, fixed content derives its value from the combined attributes of expanded use, authenticity, and long life.

Once relegated to storage archives or file cabinets, fixed content is being driven online. This is fueled by internal needs, regulatory requirements, digitization across virtually all industries, and the desire to leverage this content into new services and revenue streams.

However, the business value of fixed content is largely untapped today, because securely storing and managing large amounts of fixed content online for years or decades has been cost prohibitive, especially when organizations attempt to provide access to unlimited numbers of users at Web speeds. Traditional disk storage systems with block or file access schemes are well suited for storage of the tens to hundreds of terabytes of data typical of collaborative applications.
But these systems lack the ability to easily and cost-effectively scale and manage massive fixed content repositories that can reach hundreds of terabytes to petabytes in size. Balancing the logistics of data placement and capacity scaling with the need to authenticate data over the content's life, no matter how long that life may be (months, years or decades), also presents a challenge to traditional storage.

To solve the quandary of managing and accessing large amounts of fixed content, a new category of networked storage has emerged: "content addressed storage," or CAS. Pioneered by EMC Corporation, Centera is the industry's first implementation of CAS. CAS is optimized for managing, sharing, and protecting fixed content over its life, just as SAN has been optimized for block data and NAS has been optimized for files.

The content explosion is increasing storage capacity requirements in some industries by 100% or more.
Some industries where CAS will have the most immediate impact are:

Healthcare: CAS effectively eliminates the traditional barriers to widespread distribution and online availability of crucial digitized medical information such as X-rays, MRIs, and medical records. CAS enables management costs to remain flat as digital X-rays and other large medical images accumulate, while ensuring long-term retention and authenticity of these digital images.

Financial Services: CAS addresses two major needs: 1) adherence to stringent regulations that require long-term content integrity, and 2) cost-effective online access to financial information with assured content integrity, which allows the information to be re-purposed to improve customer services and deliver new revenue-producing services and products.

Film, Broadcast, and Media: Video, film, and audio content are the media and entertainment industry's key assets, but only if their reuse and sale can be managed and protected. A CAS system is an excellent digital asset repository because it simultaneously addresses the issues of long-term retention, protected ease of use, and verified content authenticity.

Behind the Technology

EMC's Centera is an integrated software and hardware solution purpose-built to deal with the storage needs of fixed content. The vast majority of customer value comes from Centera's software, which dramatically improves the ease of use and management of fixed content. When an object is stored, Centera calculates a 128-bit claim check from the object's binary representation. Centera then translates the 128-bit result into a unique 27-character identifier, called the content address.
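The step of turning an object's bytes into a fixed-length identifier can be illustrated with a minimal Python sketch. This is not Centera's algorithm: the article does not name the hash function or the text encoding, so the 128-bit BLAKE2b digest and the base-32 encoding below are assumptions chosen only to show the idea. The result here is 26 characters long, not Centera's 27-character format.

```python
import base64
import hashlib


def content_address(data: bytes) -> str:
    """Derive a location-independent identifier from an object's bytes."""
    # Assumption: BLAKE2b truncated to 16 bytes stands in for the unnamed
    # 128-bit calculation described in the article.
    digest = hashlib.blake2b(data, digest_size=16).digest()
    # Assumption: base-32 gives an alphanumeric string (26 characters for
    # 128 bits); Centera's actual 27-character encoding is not specified.
    return base64.b32encode(digest).decode("ascii").rstrip("=")


original = b"chest X-ray, patient 1234"
altered = b"chest X-ray, patient 1235"

address = content_address(original)
print(address, len(address))
print(content_address(original) == address)  # same bytes give the same address
print(content_address(altered) == address)   # changed bytes give a new address
```

Because the identifier depends only on the bytes, the same object yields the same address on any platform, and any change to the object shows up as a different address.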
The content address is derived from, and is unique to, that individual piece of content. Content addressing distinguishes Centera from other storage technologies (all of which are based on location addressing) because it eliminates the need to understand and manage the physical or logical location of information on the storage medium. Centera links the fixed content object to the application and user via an intermediate data structure, called a C-Clip descriptor file (CDF), that contains time-stamp information, any application-specified metadata, and the content address for the stored object. It is the CDF's content address, not the content object's content address, that the application holds as the virtual "claim check."

The advantages of content addressing:

Assured content authenticity: A content object can have only one content address. Any change to content is detected because it results in a different content address.

A globally unique, location-independent identifier: Using a content address to address the content results in a location-independent reference to the content. The content address is independent of operating systems, file systems, and applications. For example, a user can store an object in Centera using a document management application running on Windows NT, and retrieve it through a legal application running on Sun Solaris.

Single-instance store: Centera maintains only one copy of a content object and a mirror. If, for example, an application attempts to store the same content object for twenty different users, the metadata within each user's CDF would be different, but the object itself would still be stored only once, because mathematically only one content address exists for a single piece of content. The result is unprecedented capacity savings and simplified management, just one of the reasons why the TCO associated with Centera is so attractive.

Centera's implementation of content addressing protects the system from unauthorized access: The content address does not map to a data path, file name, or data type. A user has no "account" on Centera and there is no way to browse. The only way to access content on Centera is through an application, which must have the CDF's 27-character content-derived alphanumeric address. A single CDF can reference multiple digital assets, such as a collection of files and folders.

The Centera CAS Solution

Centera employs a unique RAIN (redundant array of independent nodes) architecture. Each node consists of a CPU and large-capacity disk drives, and runs CentraStar, the Centera operating environment and the heart and soul of Centera. The CentraStar operating environment automates content placement in Centera and runs on every node. A fully populated six-foot cabinet provides approximately 10 terabytes of mirrored content storage (20 terabytes of raw storage). Multiple cabinets can be configured into a single environment, enabling Centera to scale up to more than a petabyte of capacity.

Centera also can be configured so that content is automatically replicated from one Centera to a remote Centera while still being mirrored locally. This ability to store content in different geographical areas has important disaster recovery benefits. For example, in the event of an outage, replication of content would allow an organization whose local IT resources no longer function to quickly access data at the remote site with virtually no downtime.

CentraStar is designed to handle minor hiccups and major issues without human intervention. It allows Centera to self-configure, self-heal, and self-manage.
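To make the descriptor file and single-instance store described above more concrete, here is a toy in-memory sketch in Python. It is not an EMC API: the `ContentStore` class, its method names, the JSON descriptor layout, and the hex BLAKE2b address are all hypothetical stand-ins. It shows an application holding the descriptor's address as its claim check while twenty users' descriptors share one stored object.

```python
import hashlib
import json
import time


def address_of(data: bytes) -> str:
    # Assumption: a 128-bit BLAKE2b digest in hex stands in for the content address.
    return hashlib.blake2b(data, digest_size=16).hexdigest()


class ContentStore:
    """Toy in-memory store keyed by content address (hypothetical, not an EMC API)."""

    def __init__(self):
        self.blobs = {}  # content address -> bytes (objects and descriptors alike)

    def _put(self, data: bytes) -> str:
        addr = address_of(data)
        self.blobs.setdefault(addr, data)  # identical bytes are kept only once
        return addr

    def store(self, data: bytes, metadata: dict) -> str:
        """Store an object; return the address of its descriptor (the claim check)."""
        descriptor = {
            "timestamp": time.time(),
            "metadata": metadata,
            "content_address": self._put(data),
        }
        return self._put(json.dumps(descriptor, sort_keys=True).encode("utf-8"))

    def retrieve(self, claim_check: str) -> bytes:
        """Follow descriptor to object, verifying the object against its address."""
        descriptor = json.loads(self.blobs[claim_check])
        data = self.blobs[descriptor["content_address"]]
        if address_of(data) != descriptor["content_address"]:
            raise ValueError("content no longer matches its address")
        return data


store = ContentStore()
report = b"quarterly statement"
checks = [store.store(report, {"user": f"user{i}"}) for i in range(20)]
print(len(set(checks)))   # 20 distinct descriptors, one per user
print(len(store.blobs))   # 21 entries: 20 descriptors plus 1 shared object
```

The only way into this toy store is a claim check; there is no path or file name to browse, which mirrors the access model the article describes.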
Adding capacity is a straightforward process requiring minimal time and effort, and is totally non-disruptive. Centera simply "auto-discovers" and uses new capacity as it becomes available.

Centera configures and manages itself. At installation, the system administrator need only specify the desired number of nodes to be used as access nodes, based on bandwidth requirements, and configure their IP addresses. CentraStar handles the rest, automatically configuring the internal IP addresses of Centera's nodes and switches.

CentraStar also takes responsibility for where content is physically stored within Centera. New content goes to the two nodes that are least busy and have available space; two nodes are used because every stored content object has a content mirror. When not storing, retrieving or deleting content objects, CentraStar performs content scrubbing by recalculating the content addresses for all stored objects. If any inconsistency is found, CentraStar will automatically replicate the content object from its mirror residing within Centera.

CentraStar also works autonomously to protect content and ensure uninterrupted operation in the event of a drive, node, connection or power failure. It ensures that all content is always mirrored and protected. For example, if a drive or node fails, CentraStar locates all unmirrored content and automatically generates a new content mirror within the Centera system.
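The placement and scrubbing behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CentraStar's implementation: the `Node` class, the `load` and `capacity` fields, and the `place` and `scrub` functions are hypothetical, and a hex BLAKE2b digest stands in for the content address.

```python
import hashlib


def address_of(data: bytes) -> str:
    # Assumption: a 128-bit BLAKE2b digest in hex as a stand-in content address.
    return hashlib.blake2b(data, digest_size=16).hexdigest()


class Node:
    """Hypothetical storage node: a name, a load figure, and stored objects."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity  # maximum number of objects (toy unit)
        self.load = 0             # pending requests; lower means less busy
        self.objects = {}         # content address -> bytes


def place(nodes, data: bytes) -> str:
    """Write an object and its mirror to the two least busy nodes with free space."""
    candidates = [n for n in nodes if len(n.objects) < n.capacity]
    if len(candidates) < 2:
        raise RuntimeError("need two nodes with free space for object and mirror")
    addr = address_of(data)
    for node in sorted(candidates, key=lambda n: n.load)[:2]:
        node.objects[addr] = data
    return addr


def scrub(nodes) -> int:
    """Recompute every address; repair mismatched copies from a healthy mirror."""
    repaired = 0
    for node in nodes:
        for addr, data in list(node.objects.items()):
            if address_of(data) == addr:
                continue  # this copy still matches its address
            # Look for a copy on another node that still hashes to the address.
            for other in nodes:
                copy = other.objects.get(addr)
                if other is not node and copy is not None and address_of(copy) == addr:
                    node.objects[addr] = copy
                    repaired += 1
                    break
    return repaired


nodes = [Node("n1", 10), Node("n2", 10), Node("n3", 10)]
nodes[0].load = 5                              # n1 is the busiest node
addr = place(nodes, b"check image 0042")       # lands on n2 and n3
nodes[1].objects[addr] = b"corrupted"          # simulate damage on n2
print(scrub(nodes))                            # 1 copy repaired from n3's mirror
```

Because the address is recomputed from the bytes, the scrub needs no separate checksum table: the key under which a copy is stored is itself the integrity check.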
Centera's self-managed scalability to petabytes, plug-and-play installation, lights-out operation, and ability to ensure long-term content authenticity make it a truly unique solution, prompting the creation of an entirely new category of storage in CAS. Given the flexibility and platform independence of Centera, the new business uses for an organization's fixed content are endless.

www.emc.com

Tom Heiser is vice president and general manager of content-addressed storage at EMC (Hopkinton, Mass.)