Virtual storage and real confusion: a big disconnect between what vendors offer and what users want.
Sorting the Blocks
All computer data is ultimately made up of blocks, the units of information that move between servers and their storage. Applications make sense of blocks by assembling them into files with added structure such as file headers, which in turn requires more sophisticated processing at the server and storage levels. Virtualizing file-based data therefore requires file systems that can interpret file requests and store file metadata, while virtualizing block-based data works closer to the raw disk level. Block-level storage virtualization distributes blocks of data into virtual storage pools across multiple storage devices, which helps to manage space and control devices at the hardware level. File-level virtualization manages the stored objects--the files--which may be scattered across different physical storage devices.
SANs are ideal for sharing block data over open systems networks such as Fibre Channel or iSCSI. Block-level virtualization generally refers to aggregating the storage devices in a storage area network, increasing the storage space available to applications while improving flexibility and ease of management. Block-level virtualization technologies often include data-protection features such as replication, snapshots, and local and remote mirroring. Because the SAN offloads these compute-intensive operations from the local network, the result is better storage consolidation and improved storage resource management (SRM). Physically consolidating storage helps control over-provisioning and improves ROI, while simplified SRM allows storage administrators to efficiently manage shared block storage as a single volume. By separating storage management and allocation from the physical hardware and specific application servers, storage administrators can manage and control escalating storage costs.
Although SANs are by nature shared storage environments, virtualization does not automatically follow. Even in a storage area network, there are technical limitations to servers sharing their storage devices over Fibre Channel or IP. Servers still communicate with their storage devices using SCSI protocols and must meet certain SCSI requirements. For example, servers do not want to be confronted by a large pool of amorphous storage. They expect to see specific targets with addresses containing a target ID and logical unit number (LUN). In addition, some hosts will grab any LUN they can see, regardless of what the storage administrator intended. Such hosts may also compete for the same device, unwittingly overwriting each other's data. Because of this, storage administrators commonly zone their Fibre Channel switches to block other servers from seeing the same devices. A related approach is called LUN masking, which renders certain LUNs invisible to other servers even in the same zone. This keeps the storage intact and secure, but whittles away at the SAN's reason for existence: the ability to share storage among application servers.
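LUN masking as described above amounts to an access table keyed by each server's initiator identity. The sketch below models that idea in Python; the WWNs, table layout, and function name are purely illustrative and do not reflect any vendor's actual interface.

```python
# Hypothetical sketch of LUN masking: each host initiator (identified by
# its WWN) may discover only the LUNs an administrator assigns to it.
# All names and addresses here are illustrative.

masking_table = {
    # initiator WWN            -> set of visible LUNs on the target
    "10:00:00:00:c9:2b:aa:01": {0, 1},   # Server A sees two LUNs
    "10:00:00:00:c9:2b:aa:02": {2},      # Server B sees one LUN
}

def visible_luns(initiator_wwn):
    """Return the LUNs this initiator may discover; all others stay hidden."""
    return masking_table.get(initiator_wwn, set())

# Server B cannot see (or grab) LUN 0, so it cannot overwrite Server A's data.
print(visible_luns("10:00:00:00:c9:2b:aa:02"))  # {2}
```

In a real fabric the same policy is enforced by the array controller or switch rather than by host software, but the lookup is conceptually this simple.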
To get around these limitations, block virtualization presents virtual layers that sit between the servers and physical storage devices. Actual blocks may be stored across different storage devices, while storage administrators create virtual devices by virtually partitioning a single disk, or by aggregating multiple disks to widen the storage pool. The servers no longer see (and try to grab) specific physical targets, but instead "discover" logical volumes for their exclusive use. The servers send their data directly to the virtual volumes, happily treating them as direct-attached property. In fact, these logical volumes are highly flexible.
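The core of this virtual layer is an address map: the server addresses logical blocks, and the virtualization engine resolves each one to a physical device and block. Here is a minimal sketch, assuming a logical volume built by concatenating extents from several disks; the device names and extent sizes are hypothetical.

```python
# Illustrative block-level virtualization map: a logical volume is a
# concatenation of extents drawn from several physical disks.
# Each extent: (device name, starting physical block, length in blocks).
extents = [
    ("disk0", 0,   1000),   # logical blocks 0-999 live on disk0
    ("disk1", 500, 2000),   # logical blocks 1000-2999 on disk1, from block 500
    ("disk2", 0,   1000),   # logical blocks 3000-3999 on disk2
]

def resolve(logical_block):
    """Map a logical block number to its physical (device, block) pair."""
    offset = logical_block
    for device, start, length in extents:
        if offset < length:
            return device, start + offset
        offset -= length
    raise ValueError("logical block beyond end of volume")

print(resolve(0))     # ('disk0', 0)
print(resolve(1500))  # ('disk1', 1000)
```

Growing the volume is then just a matter of appending another extent to the table, which is why administrators can expand logical volumes without the server noticing.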
For example, Fujitsu Softek's Virtualization application, which is built on a DataCore engine, builds a transparent layer between the application server and storage devices. The virtualized layer shows the application server a set of devices optimized for its needs. Meanwhile the virtualization engine maps the virtual devices to actual physical devices. As is common with these types of virtualization schemes, the engine also uses advanced caching, local and remote mirroring, and snapshot capabilities to optimize and protect the virtualized storage.
File-level virtualization requires global distributed file systems, or what passes for global file systems: proxy-like file systems that push data through a centralized server. These file systems process file requests from different operating systems and translate them into common commands for the target storage device. File virtualization (which can work in DAS and NAS as well as SAN) provides a virtual file system layer between files and block-level storage. This virtual file system allows the administrator to manage files on virtual volumes, while in fact the files are scattered across different physical devices. Note that simply using a distributed file system does not equal virtualization--for example, NAS appliances and clusters often use such file systems to allow different applications or operating systems to access the same files. But if the global file system additionally allows the storage administrator to create and administer a storage pool, it is a virtualization technology.
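The distinction is easier to see in miniature: clients work with one namespace of paths, while a metadata map records which physical device actually holds each file. The toy model below, with hypothetical class, device, and policy names, sketches that separation.

```python
# Toy model of file-level virtualization: one virtual namespace of files,
# backed by a metadata map of where each file physically lives.
# Device names and the placement policy are purely illustrative.

class VirtualFileSystem:
    def __init__(self):
        self.location = {}                     # file path -> holding device
        self.devices = {"nas1": [], "nas2": []}

    def create(self, path):
        # Simple placement policy: put the file on the least-loaded device.
        device = min(self.devices, key=lambda d: len(self.devices[d]))
        self.devices[device].append(path)
        self.location[path] = device

    def whereis(self, path):
        """The admin (not the client) can ask where a file really lives."""
        return self.location[path]

vfs = VirtualFileSystem()
for p in ["/pool/a.txt", "/pool/b.txt", "/pool/c.txt"]:
    vfs.create(p)

# Clients only ever use /pool paths; the files are spread across devices.
print(vfs.whereis("/pool/a.txt"), vfs.whereis("/pool/b.txt"))
```

A plain distributed file system stops at shared access; what makes this a virtualization technology, per the article's test, is that the administrator controls the pool behind the namespace.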
File- and block-based virtualization are complementary, and it's not unusual to wed the two in converged environments. Two such implementations are virtual file systems in SANs and storage clusters. Block-based virtualization is already big on the SAN, and competition rages around where to locate the virtualization engine: single hosts or arrays, single-vendor or multi-vendor domains, or SAN-wide implementations. (The question there is in-band or out-of-band virtualization engines.) But SAN-based file serving suffers from a unique circumstance: SAN application servers are limited to a single storage device, represented by a unique LUN, so file virtualization systems must create file-based storage pools by striping multiple LUNs into a single logical volume.
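Striping multiple LUNs into one logical volume, as described above, is a deterministic address calculation: logical blocks are dealt round-robin in fixed-size stripe units across the LUNs. A minimal sketch, with an assumed stripe-unit size and hypothetical LUN names:

```python
# Sketch of striping several LUNs into a single logical volume. Logical
# blocks are dealt round-robin in fixed-size stripe units across the LUNs,
# so the pool grows by adding LUNs. Parameters are illustrative.

STRIPE_UNIT = 64                        # blocks per stripe unit (assumed)
LUNS = ["lun0", "lun1", "lun2", "lun3"]

def stripe(logical_block):
    """Map a logical block to (LUN, block offset within that LUN)."""
    unit, offset = divmod(logical_block, STRIPE_UNIT)
    lun_index = unit % len(LUNS)          # which LUN holds this stripe unit
    unit_on_lun = unit // len(LUNS)       # how many earlier units on that LUN
    return LUNS[lun_index], unit_on_lun * STRIPE_UNIT + offset

print(stripe(0))    # ('lun0', 0)
print(stripe(64))   # ('lun1', 0)
print(stripe(300))  # ('lun0', 108)
```

Because consecutive stripe units land on different LUNs, large sequential transfers also spread their I/O load across devices, a side benefit beyond simple pooling.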
Some SAN-based virtual file systems already exist, including Sistina's Linux storage cluster software. More storage companies are developing virtual file systems for multiple operating systems, which would allow users to store their files on the SAN as well as ease file management for storage administrators. For example, IBM's Storage Tank project would virtualize enterprise storage by adding an installable file system to storage system clients such as Windows 2000, Linux, and Unix. Files enter the SAN from the IP network, where Storage Tank logs metadata for file attributes and locations, and enables file locking.
Storage clusters also benefit from virtual file systems, which allow the administrator to share storage devices among cluster servers. A storage cluster consists of a storage area network, shared storage devices, a cluster file system, and a volume manager running on the storage cluster servers. Veritas uses its Cluster Server to enable storage area network clusters and shared storage. Combined with Volume Manager, Cluster Server virtualizes multiple attached storage devices by assigning them to a single disk group in the cluster.
For example, two clustered servers may share two 100GB disks in a storage cluster on the SAN. Server A runs an Oracle database and needs 90GB of storage space from one disk in the storage cluster. Server B also runs an Oracle database but only needs 30GB from its disk. Server A is in constant danger of overflowing its disk, while Server B is seriously under-utilizing its own. With virtualized storage, Server A can expand its database onto the second disk without compromising Server B's existing storage allocation.
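The arithmetic behind that scenario is worth making explicit. The figures below match the example; the code itself is just an illustration of why pooling removes the per-disk ceiling.

```python
# The storage-cluster example in numbers: with virtualization, both servers
# draw from one pooled disk group rather than from fixed individual disks.

disks_gb = [100, 100]          # two shared 100GB disks in the cluster
pool_gb = sum(disks_gb)        # virtualized into one 200GB disk group

used_a, used_b = 90, 30        # current Oracle database sizes (GB)
free_gb = pool_gb - used_a - used_b
print(free_gb)                 # 80GB left in the shared pool

# Suppose Server A's database grows by 40GB: impossible against its own
# 100GB disk (90 + 40 > 100), but fine against the pool (130 + 30 <= 200),
# and Server B's 30GB allocation is untouched.
assert used_a + 40 > disks_gb[0]
assert (used_a + 40) + used_b <= pool_gb
```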
There is still great confusion in the marketplace over virtualization technology, and with good reason--dozens of different vendors refer to their products as the way to virtualize storage, even though their approaches differ radically from one another. (The resulting confusion and hype have produced a high level of customer anxiety, which some analysts charmingly refer to as "FUD"--fear, uncertainty, and doubt.) End users want virtualization to mean:
* The ability to create logical pools of storage for all storage configurations--DAS, NAS or SAN, NAS front-ends/SAN back-ends, clusters, and disk or tape arrays.
* The ability to pool storage regardless of the applications, file systems, or operating systems involved.
* The ability to virtualize and manage storage across heterogeneous storage environments.
Many companies offer portions of the above wish list, but market clarity will come only as vendors move toward the comprehensive solution customers want. There are indications this is happening, such as IBM's sweeping storage initiatives, EMC's announced dedication to SNIA storage standards, Veritas' and Fujitsu Softek's heterogeneous virtualization engines, and Sun's recent acquisition of virtualization technology. The more comprehensive these virtualization technologies are, the better they'll meet user expectations.
RELATED ARTICLE: Common Virtualization Terms
Virtualization can be confusing, but common definitions help to explain it
Array: A group of disk drives and at least one controller that stripes data and redundancy across the drives according to RAID algorithms. (Enterprise tape libraries are similar in scope, and companies such as StorageTek have virtualized tape for several years now.)
Virtualization: A simplified view of storage derived from complex lower level views. The system presents this logical, or virtual, layer to the user.
SAN (Storage Area Network): A network of storage devices and host CPUs that allows block-level storage sharing. SANs can also share files by adding virtual file system capabilities.
Host CPU: A computer system that consumes storage resources, typically by running file systems, databases, and/or applications.
Virtual Array: A disk array with virtualization capability and the capacity to implement RAID algorithms.
Virtualization in the SAN: SAN virtualization may be limited to a host CPU, an individual array, or single-vendor domains. More comprehensive virtualization engines typically virtualize multi-vendor arrays, and may reside either in-band (within the SAN) or out-of-band (in the outside network).
Author: Chudnow, Christine Taylor
Publication: Computer Technology Review
Article Type: Industry Overview
Date: Nov 1, 2002