Applications For Shared Data Clusters

This article is the second in a two-part series. The first part appeared in the January issue of CTR.

A shared data cluster is a computer cluster in which multiple computers can access data in the same file system concurrently. This is a significant departure from the traditional availability ("shared nothing") cluster, in which each system's storage is essentially private, but can be rolled over to another system on demand.

Part one of this article described how organizations that have done the work of creating a Storage Area Network (SAN) can leverage their investments with shared data clusters. Potential benefits from shared data clusters include:

* Storage consolidation/simplified management, as there is one primary source of data to maintain. (For data availability purposes, this data may be mirrored.)
* Improved availability of applications through faster failover.
* Simplified scalability: to handle increased demand, you can simply add more processors accessing the same data.

A shared data cluster requires shared volume and file system software that can ensure file system integrity when multiple nodes access the data concurrently. Last month's article described the challenges of shared file systems in general, and described the VERITAS SANPoint Foundation Suite HA, one software solution that meets the challenges of data sharing in clusters of computers.

This article describes how to achieve these benefits in specific system applications. The examples described in this article can be moved to a shared data configuration relatively easily (without application rewrites), and derive significant benefits from operating in a shared data environment. This is not an exhaustive list of applications; instead, these are prototypical applications for shared data solutions.

Continuous Availability For Enterprise File Servers

Many organizations adopt some kind of clustering strategy to enhance the availability of enterprise file servers. In a common configuration, two or more servers are connected to the same clients and the same storage, but serve separate file systems. If one server fails, another recognizes the failure and picks up the failed server's work: it mounts the file systems belonging to the failed server and starts responding to its clients. This simple configuration is illustrated in the left side of Figure 1.

In this shared nothing cluster, each file system is bound to a specific server. System administrators must still manage each file system's usage and capacity on an individual, system-by-system basis.

Using a shared file system, illustrated on the right side of Figure 1, creates a much more flexible configuration. Both file systems are shared on the SAN; both servers can serve any file system. In this case, clients can choose a file server based on server load or other considerations, or the cluster could use load balancing techniques to direct client requests to the least busy server. This immediately offers better overall performance by reducing server overload conditions. This configuration offers other benefits:

* Simplified administration. Administrators do not have to manage the addition of storage on a server-by-server basis, or decide which servers should serve new file systems.
* Improved availability. If one file server fails, file service is not interrupted. The file system does not need to be restarted or remounted; requests are filled by the remaining file server.
* Improved quality of service. Pooling storage resources in shared file systems may make it easier to put more file systems on mirrored volumes or other highly available storage configurations.

The rest of the applications described in this article implement shared data clusters.

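The least-busy-server policy mentioned for the shared file server configuration can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular load balancer or the SANPoint software works; the `FileServer` class, its `active_requests` counter, and the `pick_server` and `dispatch` helpers are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class FileServer:
    """Hypothetical model of one file server in the cluster."""
    name: str
    active_requests: int = 0
    healthy: bool = True


def pick_server(servers):
    """Return the healthy server with the fewest active requests."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        raise RuntimeError("no file server available")
    return min(candidates, key=lambda s: s.active_requests)


def dispatch(servers):
    """Assign one client request to the least busy server and return its name."""
    server = pick_server(servers)
    server.active_requests += 1
    return server.name
```

Because every server can reach every file system in the shared configuration, marking a server as unhealthy is enough to move its clients elsewhere; in this sketch no file system has to be remounted before `pick_server` can choose another node.
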
Web Servers

Web serving is an increasingly common application, and one in which some form of clustering is already common and often business-critical. In a shared nothing web cluster, multiple web servers share common client access, but each accesses its own "read-mostly" copy of the data. A load balancing facility like Cisco LocalDirector directs requests to the least loaded web server, optimizing resource utilization. To handle site growth, the site can simply add another web server and another copy of the data to the web cluster. If the application is "read-mostly", a back end application must periodically resynchronize the data copies.

The shared nothing scenario, with each server accessing its own copy of the web data, is illustrated on the left side of Figure 2. If one server fails, the workload is distributed to the remaining servers.

The main limitation of this model is an administrative one: the site administrator must maintain many copies of the site data. When site content changes, the updates must roll out at the same time, so clients experience consistent behavior. Managing site-wide updates requires considerable administrative effort. Maintaining many copies of the data increases the overall cost of ownership for the site, both in terms of incremental storage and administrative overhead.

By implementing a shared data cluster (illustrated on the right side of Figure 2), web sites can greatly simplify administration and scaling. In a shared data cluster, all web servers access a common data image. This means that clients see the same data, regardless of which server handles a request. Administrators no longer need to maintain multiple data copies.

As demand increases, the administrator can add computing capacity as web servers, or storage capacity for increased data or I/O performance demands--but can make these decisions independently of each other. Overall, it is simpler to add capacity to this configuration than to a shared nothing cluster. For data protection, this one logical copy may be mirrored to separate devices.

Video Post Production

Video post production is a particularly challenging type of workflow application. In a workflow application, a single piece of work passes between individuals (or computers) to complete a transition to a finished state. In video post production, the initial product is one or more raw, digitized images. A series of specialists transform these images with content, scene transitions, titles, light and color--each taking the work from a previous specialist in the chain and making their own changes.

The extra challenge in video post production is that the files being handled are very large, typically multiple gigabytes in size. Moving these files through the steps, between workstations, is time-consuming, whether it happens by network transfers or on tape. The left side of Figure 3 shows a simplified version of a workflow application.

A shared file system, indicated in the right side of Figure 3, creates an easy-to-manage platform for this kind of application. The SAN interconnects the computers and the data, eliminating network- or tape-based transfers of copies.
The shared file system arbitrates access to data, so that video objects in use by one workstation are not available to the next. The result is faster processes with a reduced potential for "handling" errors--attributes that are highly valued in the deadline-sensitive film and video industries.

Minimizing Backup Impact

Maintaining backups is one of the most onerous administrative tasks in the IT environment. Administrators find themselves in a Catch-22 situation--the more the business relies on the data, the more it needs frequent backups to ensure fast recoverability. At the same time, the greater the reliance on the data, the smaller the tolerance for any backup window or potential performance impact from backups. There are two fundamental problems, regardless of what backup technology you use:

* Creating a consistent "point in time" image of a large file system or database without restricting access to the data while the image is being created.
* Performing backups of large amounts of data without disrupting performance (through server I/O) or client traffic.

Storage management vendors are continually developing and enhancing technologies to mitigate these problems--including incremental backup, hot database backups, and more recently LAN-free backup.

LAN-free backup eliminates data transfer load from the client network by allowing a client to direct a server to back up data directly to a local or remote device. This is particularly effective in Storage Area Networks, as several interconnected servers can share tape devices serially.

VERITAS also has developed a series of technologies for creating fast snapshots of existing file systems. These include VERITAS FlashBackup and the VERITAS Storage Checkpoint capabilities of the VERITAS File System. A storage checkpoint creates a point in time image of a database by maintaining a "before" image copy of each file system block modified after the checkpoint is created. Any unmodified blocks are read from the primary file system; if a block has been modified, its original content resides in the storage checkpoint. A storage checkpoint takes only seconds to create. A Block Level Incremental Backup uses a Storage Checkpoint to determine which blocks in an Oracle database have changed and writes only the changed blocks to a backup.

SANPoint Foundation Suite HA allows off-host backups from a different server in the cluster using a Storage Checkpoint as the point in time image. This minimizes any impact on the production system and, by reducing backup impact, enables more frequent backups for recoverability and quality of service.

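The "before image" mechanism behind a storage checkpoint is a form of copy-on-write, and a toy model may make the description above more concrete. The sketch below is not the VERITAS implementation; the `Volume` class and its method names are hypothetical, and blocks are modeled as entries in a Python list rather than as disk blocks.

```python
class Volume:
    """Toy block store with a single copy-on-write checkpoint (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # current ("primary") block contents
        self.checkpoint = None       # block number -> before image

    def create_checkpoint(self):
        # Creating the checkpoint copies no data, which is why it is fast.
        self.checkpoint = {}

    def write(self, n, data):
        # Save the original content the first time a block changes
        # after the checkpoint was created.
        if self.checkpoint is not None and n not in self.checkpoint:
            self.checkpoint[n] = self.blocks[n]
        self.blocks[n] = data

    def read_checkpoint(self, n):
        # Modified blocks come from the saved before image;
        # unmodified blocks are read from the primary copy.
        if self.checkpoint is None:
            raise RuntimeError("no checkpoint has been created")
        if n in self.checkpoint:
            return self.checkpoint[n]
        return self.blocks[n]

    def changed_blocks(self):
        # Block numbers modified since the checkpoint: the set a
        # block-level incremental backup would need to copy.
        if self.checkpoint is None:
            raise RuntimeError("no checkpoint has been created")
        return sorted(self.checkpoint)
```

In this model a backup server could read a consistent image through `read_checkpoint` while the production server keeps writing, and `changed_blocks` gives the list of blocks an incremental backup would write, under the assumption that the checkpoint was created at the time of the previous backup.
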
A Development Environment

Software developers themselves, a vital part of the new economy, work in an environment that can benefit significantly from shared data and file systems. In many ways, their needs are similar to those of most work environments: they want to keep working productively, without delays or outages. They need to protect critical data, and to reduce or limit the administrative effort needed to maintain these environments.

In most development environments, dozens of developers access a common set of source code files. To test their work, the developers need to build and test the entire product, or components of the product, with their changes, typically using build servers dedicated to that effort. Compilation is a resource-intensive activity. If many developers start compiling at the same time, it can create heavy demands on the build servers and slow compilation. Likewise, the build servers must be available for the developers to test their work.

By adopting a cluster of build servers sharing common source files, a development team can balance loads between servers to provide better overall performance, and can continue operating if any single build server fails. Because developers typically work in their own home directories, file access is naturally partitioned, which reduces the potential for the delays that could occur if multiple servers requested access to the same files.

This configuration simplifies administration for the development team; because everyone accesses a single, master source code directory, there is only one logical copy of the source code to manage. (Most development environments use a source code control system to "check out" and "check in" data for versioning control.)
Consolidating all of the developers' home directories in a single location also makes it easier to ensure that critical work is backed up regularly, reducing the exposure to data loss. And availability is enhanced with multiple build servers accessing the same shared data. Should one server fail, the others remain available for compilation. Using a shared file system, failovers are very fast, as the file system is not tied to any one node and can be accessed by another available node in the cluster (Fig 4).

This configuration also optimizes file access and network resources, as file traffic does not travel over the LAN, and users access data at Fibre Channel speeds. This provides better performance than a standard NFS file server.

As the applications in this article indicate, the potential uses for shared data clusters are widespread. Single host applications can benefit from the enhanced availability of a shared file system, as well as the capability to off-load backup operations to another server. Cooperative applications such as file servers can be made to work with multiple hosts for improved scalability and performance. For multi-host applications such as web applications running on several servers concurrently, a single shared file system available to all application instances improves manageability and scalability. A shared file system allows these applications to grow through the addition of servers, and also improves availability by enabling them to redistribute load when a server fails simply by reassigning network addresses.

If you have implemented or are considering implementing a Storage Area Network, then you should consider how best to optimize this architecture for existing and future applications. Data sharing solutions such as the VERITAS SANPoint Foundation Suite HA can leverage the shared storage in a SAN by sharing data residing on the SAN.

Paul Massiglia is the technical director, engineering, at VERITAS Software (Colorado Springs, CO) and Anne Janzer is a technical marketing consultant based in Mountain View, CA.