Building a 24 X 7 database.

The Risk of Downtime

Over the last few years, corporations have invested billions of dollars to integrate the automation of core business systems into large Enterprise Resource Planning (ERP) applications. Increasing reliance on the availability of ERP environments and the advent of "around the world, around the clock" business transactions via e-commerce expose organizations to great risk. Losing access to the ERP system or the e-commerce application for an extended period of time may cause the entire business to collapse. This, however, is only one part of the picture. Most people in decision-making positions are totally dependent on the application to achieve their productivity goals. Successful enterprises reduce the risk of downtime while improving day-to-day application response time by combining multiple state-of-the-art technologies.

Your Definition of 24x7

When evaluating the need for high availability, you must first define what 24x7 means to your organization. Answer the following questions to determine your 24x7 needs:

1. Can you assign a dollar amount for application downtime?
Consider direct and indirect costs.
2. Do all application components/processes share the same level of importance? Different parts of the application often bear different levels of mission-criticality. For your company, data entry may be available only between 8:00 a.m. and 5:00 p.m., or a particular batch job may need to be completed before 7:00 a.m. Such requirements may mean that although the criticality of the entire application does not warrant the substantial cost associated with higher availability, some investment should be made to minimize the chance of certain failures occurring during certain time frames. For example, with 2,400 data entry clerks working from 8:00 to 5:00, it is essential that the application does not experience more than two minutes of downtime during these hours. However, after 6:00 p.m., the criticality of this data entry application diminishes until the next morning.

3. Is the availability of the application more important than data consistency? In financial applications, data integrity is paramount. Under no circumstances should a committed transaction be lost, even if it means more downtime for recovery. In many order entry systems, conversely, it is more important that the application remain available at all times even if a few orders are lost.
This is especially true for e-commerce style applications. For many online shopping applications, it is essential to have the system accept new orders at all times. If the system goes down, another system must take over, even if it means that several transactions may be lost. The cost of downtime due to recovery is greater than the cost of lost transactions.

4. When can you do application and database upgrades? Determine "quiet times" in which availability of the system is not critical. Are they long enough to accommodate an upgrade? How often does a quiet period occur--nightly? Every weekend? Monthly? On national holidays? A different solution may be required for systems that are maintained at night versus systems that do not have a reprieve. If your systems do not have quiet times, some administrative time should still be allocated.

Performance's Impact on Availability

Downtime is not the only concern when considering high availability--for many organizations, stringent processing needs force IT leaders to consider performance. On one hand, online users require good response time from many short transactions. On the other hand, large batch jobs (e.g., reports and complex extracts) entail high throughput of a handful of very large transactions. These conflicting needs cause response time to fluctuate, decreasing the reliability and availability of the application. This is especially true with applications that provide services directly to end users and consumers, such as an e-commerce application.

Redundancy is the Key to Availability

The logical solution for increased availability is to maintain the data in more than one place. This enables both high availability and one of the best techniques for improving application response time--separating batch reporting and extract processing from OLTP (online transaction processing).
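Questions 1 and 2 of the checklist above lend themselves to simple arithmetic. The following is a minimal Python sketch, not a costing model: the 2,400 clerks, the 8:00 to 5:00 window and the two-minute limit come from the example in question 2, while the hourly rate, the class name and the function name are hypothetical values chosen only for illustration. It counts idle staff time alone and ignores indirect costs.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """A time window with its own downtime tolerance."""
    name: str
    hours: float             # length of the window in hours
    max_downtime_min: float  # downtime tolerated inside the window, in minutes

    def required_availability(self) -> float:
        """Fraction of the window during which the application must be up."""
        total_minutes = self.hours * 60
        return 1.0 - self.max_downtime_min / total_minutes


def idle_staff_cost(minutes: float, staff: int, rate_per_hour: float) -> float:
    """Direct cost of staff sitting idle during an outage (indirect costs excluded)."""
    return minutes * staff * rate_per_hour / 60


# Figures from the data entry example in the text.
business_day = Window("data entry, 8:00 to 5:00", hours=9, max_downtime_min=2)
availability = business_day.required_availability()

# Hypothetical loaded labor rate of 30 dollars per hour per clerk.
cost_of_two_minutes = idle_staff_cost(2, staff=2400, rate_per_hour=30.0)

print(f"{business_day.name}: required availability {availability:.2%}")
print(f"Idle-staff cost of a two-minute outage: ${cost_of_two_minutes:,.2f}")
```

Running the same calculation for the evening window, with a much larger downtime tolerance, shows why a single availability target for the whole application can overstate or understate the real requirement.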
The criteria for a comprehensive high availability and high performance solution include:
* Minimal impact on the availability and performance of the primary system
* Full copy of the primary database--primarily for reporting and extracts--that is accessible even when there is no emergency
* The copy of the database should be an up-to-date image of the primary database
* Capacity to become the primary database (failover) in case of disaster
* Failover to the secondary database should be very fast, without data loss
* After the disaster, the solution should enable switching back to the primary system
* Ability to modify some aspects of the copy database to accommodate the different processing on it and the ability to reverse them when a failover occurs, e.g., construction of special indexes to support reporting needs
* The copy will not require its own database administration in addition to the administration of the primary system
* Redundancy in CPU as well as in the database
* Remote location of the secondary system

Range of Common Solutions

There is a wide range of solutions to the high availability problem. The most common methods are:
* Local disk mirroring and/or RAID--This solution provides protection against many disk-related failures, but the mirror is usually not breakable under normal circumstances. Once broken, the mirror becomes stale relative to the copy that is still operational. To resync (or re-silver), many disk mirror solutions perform a complete copy of the data from the operational copy to the stale copy. If the database is large, this process can take a very long time.
Other disk mirroring techniques, such as those provided by EMC and Veritas, provide for a delta refresh, which is much faster. Local disk mirroring does not provide protection against a local disaster. It also lacks protection against physical block corruption by Oracle or accidental loss of data due to a DBA error (such as dropping or truncating a production table).
* Oracle standby database--This solution provides some protection against a catastrophe that makes the primary database unavailable; however, Oracle's standby database has some shortcomings.
* Local clustering--Local clustering is a hardware solution that enables multiple computers to share a set of disks. Applications on these computers can freely migrate between the machines in the cluster using a technology known as "floating IP addresses." Unfortunately, the Oracle database relies on persistent memory structures, so when a switch happens, the database has to be brought down and restarted. This solution provides good protection against most common failures. However, since there is only one copy of the database, the disks themselves still need protection. Moreover, since there is only one copy of the database, any physical block corruption or accidental dropping of a database object will cause the application to fail. Finally, with a local cluster, there is no provision for improving performance through load sharing.
* Remote disk mirroring--Two types of remote disk mirroring exist: synchronous and asynchronous. With asynchronous mirroring, the primary system does not wait for the data to be committed to the remote disk. For Oracle databases, however, asynchronous mirroring can allow structural corruption in the mirrored database that would prevent a DBA from opening the remote database.
For this reason, most remote mirroring implementations use the synchronous method, wherein the application waits for the data to be committed to both the local and the remote disk. To prevent slowing the primary system, however, this method requires high bandwidth between the source and destination. Most sites use the remote disk mirroring from EMC with one or more T3 lines.
* Replication--Replication provides a live remote database both to reduce the workload of the primary system and for failover when a disaster happens. However, Oracle replication is resource-intensive and has a substantial impact on the primary system. Moreover, because of the many limitations imposed by Oracle replication, it cannot successfully replicate many of today's applications--particularly those at large ERP sites. SharePlex for Oracle from Quest Software overcomes these problems. SharePlex is a comprehensive and efficient solution that can support replication for most ERP sites. The live database on the remote site does require database administration, and application of patches to the application is not straightforward.
* Local clustering with Oracle Parallel Server--Oracle Parallel Server (OPS) offers another alternative for high availability systems.
Using this facility, many instances of Oracle running on different hardware can access the same database on shared disks. This permits the hardware that would be allocated for a standby system to be actively used in production. Concurrent access to data by the different instances is managed by the Distributed Lock Manager (DLM), an application that ensures that an instance always accesses a consistent block from the database. The DLM causes the instance holding a dirty data block to flush it to disk, so the reading instance can access a clean copy. The difficulty in using OPS for highly available solutions is that the application needs to be designed so that transferring blocks between instances (pinging) is minimized. If not, application performance can be severely degraded. Also, with OPS, there is only one copy of the database, and it is not protected from disk failures, block corruption or human errors such as accidental table drops.

The Integrated High Availability and High Performance Solution

From the brief description of the high availability options above, it is clear that no single solution can support all the requirements listed earlier.
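The claim that no single option is sufficient can be checked mechanically once the survey is written down as data. The Python sketch below is one reading of the prose above, not vendor data: the hazard names, the coverage sets and the function names are assumptions made for illustration, and the Oracle standby database is omitted because its shortcomings are not spelled out in the text.

```python
from itertools import combinations

# Hazards mentioned in the survey of solutions.
HAZARDS = {"disk failure", "host failure", "site disaster",
           "block corruption", "DBA error"}

# Hazards each option addresses, as interpreted from the descriptions
# above (an assumption for illustration, not a product specification).
COVERS = {
    "local mirroring/RAID": {"disk failure"},
    "local clustering": {"host failure"},
    "remote disk mirroring": {"disk failure", "host failure", "site disaster"},
    "logical replication": {"site disaster", "block corruption", "DBA error"},
    "Oracle Parallel Server": {"host failure"},
}


def gaps(options):
    """Return the hazards left uncovered by the given set of options."""
    covered = set().union(*(COVERS[name] for name in options))
    return HAZARDS - covered


def smallest_full_covers():
    """Return the smallest combinations of options that leave no gap."""
    for size in range(1, len(COVERS) + 1):
        hits = [combo for combo in combinations(sorted(COVERS), size)
                if not gaps(combo)]
        if hits:
            return hits
    return []


for name in COVERS:
    print(f"{name}: uncovered -> {sorted(gaps([name]))}")
print("Smallest full covers:", smallest_full_covers())
```

Under these assumed coverage sets every single option leaves at least one hazard open, and the smallest combination without a gap pairs a physical remote mirror with a logical replica. Changing the sets changes the answer, so the table is worth filling in from your own failure history.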
Through a combination of hardware and software, EMC provides the disaster recovery components: Symmetrix disk drives on the production (source) and disaster recovery (target) systems, with SRDF (Symmetrix Remote Data Facility) running between these disks. A third mirror is created via EMC's TimeFinder software and is maintained as an accessible, up-to-date reporting instance by Quest Software's SharePlex. SharePlex minimizes the impact on the source system, the source instance and the network. This enables businesses to eliminate report processing overhead without replacing it with replication overhead. SharePlex includes:
* SharePlex for Oracle--provides high-performance Oracle replication that maintains a continually updated, accessible target instance
* SharePlex reconcile option--enables SharePlex to continue optimally after an EMC refresh

The Combined Solution

EMC's SRDF performs disk-level replication of the entire environment from the source system to the target system. In synchronous mode, no change is made to the source system that isn't also made to the target, guaranteeing that the target system's second mirror is an exact replica of the production server. If the production server fails, a secondary system is available with the same data, and the disk drives need only be mounted in order for business to continue despite the outage of the primary server. EMC's TimeFinder creates the Oracle database for the "third mirror," quickly generating the copy that SharePlex then maintains by replicating changes to selected tables and sequences from the source system to the third mirror.
In addition to greatly reducing the time required to create the initial image for the third mirror, TimeFinder refreshes the environment surrounding the Oracle tables and sequences periodically. TimeFinder facilitates the propagation of DDL changes, changes to stored procedures, and changes to applications, refreshing the image of the third mirror based on the image of the second mirror (which exactly reflects the primary system).

Benefits of the Proposed Solution

In the standard scenario in which the Reconcile Option is used, a customer has Symmetrix disk drives on the source and target systems, with EMC's SRDF between these disks. A third mirror is created via EMC's TimeFinder, which is maintained as an accessible, up-to-date reporting instance by Quest Software's SharePlex. This combination affords many benefits:

* Easy initialization for 24x7 shops: SharePlex requires two matching copies of data with which to start replication. SRDF and TimeFinder, in conjunction with SharePlex, make this requirement easy to fulfill because SRDF and TimeFinder can create an initial replica instance.
SharePlex's integration module for EMC will reconcile that instance with the information contained within SharePlex's queue files.
* Reduced disaster recovery time: Since SharePlex maintains an available standby instance, if a disaster occurs on the source system, this standby instance can be used until the application can be restarted on the disaster recovery copy maintained by SRDF. This can reduce the application downtime significantly.
* More disaster recovery protection: SRDF, like every other mirroring solution, is prone to block corruption errors and to human errors such as accidentally dropping a production table. Since the replica maintained by SharePlex is a logical replica of the database, it provides protection against both block corruption (which can surface as Oracle internal error ORA-00600) and accidental erroneous DDL.
* Fast, easy migration without downtime: SharePlex can replicate between different versions of Oracle. When you need to upgrade an Oracle version, you can perform the upgrade in a secondary system. SharePlex keeps the data current on the upgraded database. Once the upgrade has been fully tested, you can use EMC's fast refresh capability to upgrade your production system to the new version. SharePlex minimizes downtime to the production system.
* Flexible configurations with WAN support: SharePlex can replicate between instances on the same system, to instances on a local area network, or to remote instances through a wide area network. Additionally, through a cascading scenario, SharePlex can replicate from one system to several and from those onto further systems. With this type of configuration, a global company could replicate from New York directly to Boston, D.C., Atlanta, and London, and then have the London office replicate to Madrid, Paris, Rome, Brussels, and Berlin, limiting the traffic across the ocean while keeping all remote offices up to date.
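The saving from the cascading layout in the New York and London example can be counted directly. This is a small Python sketch under stated assumptions: the office names and the two-tier topology come from the example above, while the grouping of offices by continent and all function names are introduced here for illustration and say nothing about how SharePlex itself is configured.

```python
# Cascading topology from the example: each key replicates to its targets.
CASCADE = {
    "New York": ["Boston", "D.C.", "Atlanta", "London"],
    "London": ["Madrid", "Paris", "Rome", "Brussels", "Berlin"],
}

# Assumption for illustration: offices on the North American side of the ocean.
NORTH_AMERICA = {"New York", "Boston", "D.C.", "Atlanta"}


def links(topology):
    """Flatten a topology into (source, target) replication links."""
    return [(src, dst) for src, targets in topology.items() for dst in targets]


def crosses_ocean(src, dst):
    """True when the two offices sit on opposite sides of the Atlantic."""
    return (src in NORTH_AMERICA) != (dst in NORTH_AMERICA)


def direct_topology(topology, root):
    """Same offices, but every one is fed straight from the root system."""
    return {root: [dst for _, dst in links(topology)]}


def ocean_crossings(topology):
    """Count the replication links that cross the ocean."""
    return sum(crosses_ocean(src, dst) for src, dst in links(topology))


cascade_crossings = ocean_crossings(CASCADE)
direct_crossings = ocean_crossings(direct_topology(CASCADE, "New York"))

print("Transatlantic links, cascading:", cascade_crossings)  # 1
print("Transatlantic links, direct:", direct_crossings)      # 6
```

The trade-off is an extra hop for the European offices: their data is only as current as the London instance, so the intermediate system becomes part of their availability picture.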