2005 storage year in review
Hardly surprising. These pressures threaten the protection and recovery of fast-growing data volumes. Stored data once grew at 30 to 50 percent per year; it is now growing at 60 to 80 percent per year, and some industries are experiencing 100 percent growth. Storage hardware costs have fallen, but storage management costs are rising sharply.
Given a storage market valued at $50 billion, how did storage vendors react in 2005? The advances ran the gamut from established technologies to newer technologies just coming into their own. Some of the most important and interesting of these developing technologies include backup and recovery, archiving, tiered storage, storage networking, interconnects, CDP, NAS, iSCSI SANs, virtualization, security and encryption, and virtual tape.
Backup and recovery
Backup and recovery have been with us since the beginning of the written word. Once a piece of data is recorded--be it the Code of Hammurabi, a Shakespeare folio, or critical client records--how do you protect it so it can be recovered and consulted as needed? The Code and Shakespearean folios are well protected and viewed in climate-controlled museums, but for electronic data the threat of loss is ever-present.
The threat runs through all sizes of business. Enterprise and mid-market companies are particularly affected because of their volume of data and the pressures of compliance, governance, and litigation. But SMBs are hardly immune: their business survival can depend on dependable, consistent backup and recovery operations.
Unstructured data is complicating the issue. Structured data volumes are growing, but the number of emails and unstructured files is exploding. Meanwhile, backup windows are shrinking to nothing while customer service demands and expectations keep rising. All of these pressures bear on backup and recovery--with an emphasis on recovery. A major trend in 2005 was the growing demand for recovery. Backup doesn't go away, but the only real reason to back up is being able to get the data back again. This means being able to search more powerfully and restore much faster in response to data loss, increased regulation, and legal discovery.
This means disk, since restoring individual files and email from backup tapes is extremely difficult. Tape was never meant to be a searchable repository. It remains good for long-term archiving, but tape is built for volume-level restores after events like a downed server.
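The contrast between granular disk recovery and volume-level tape restore can be made concrete with a small sketch. The function below walks a disk-based backup area and indexes every file by name, so restoring one file or email becomes a simple lookup rather than a sequential scan of tape volumes. The paths and structure are illustrative assumptions, not any vendor's product.

```python
import os
import time

def build_backup_index(backup_root):
    """Walk a disk-based backup area and index every file by name.

    With an index like this, locating a single file is a dictionary
    lookup; a tape volume would have to be mounted and scanned
    sequentially. Layout is a hypothetical example.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            index.setdefault(name, []).append({
                "path": full,
                "size": os.path.getsize(full),
                "mtime": time.ctime(os.path.getmtime(full)),
            })
    return index

def find_file(index, name):
    """Granular recovery: locate every backed-up copy of one file."""
    return index.get(name, [])
```

A restore operation would consult `find_file` and copy the newest (or a chosen) version back into place, which is exactly the per-file search capability tape lacks.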
Network-attached storage (NAS) is more common than ever, and so is the need to manage it. Users need increasingly large capacity in their NAS infrastructure, and are experiencing an average of 40 percent annual growth in data stored on NAS. Enterprises in particular suffer without good NAS management, with hundreds of terabytes of NAS storage deployed and running mission-critical applications. Smaller environments, too, such as architecture and design firms, may not have large staffs, but they do have large stores of active data, and are less likely than enterprises to have SANs to share the load.
Users are also demanding ways to manage and automate NAS and its costs, including provisioning data, managing it throughout its lifecycle, and controlling both capital and ongoing costs. Data mobility is particularly challenging in view of provisioning and storage management needs. Enabling technologies like virtualization are crucial in this environment, where data management should be automated and transparent. Virtualization layers allow businesses to automatically balance filer provisioning, and to boost utilization by balancing and leveraging capacity.
NAS vendor development plans include expanding NAS management benefits globally, improving efficiencies with NAS-based backup, and optimally locating data. Users are also involved in tiering NAS storage, most commonly done today in two tiers, with an occasional three-tier structure. In these cases, the second tier is usually cost-effective online storage like SATA or SAS, and the third tier is an archiving medium such as tape or CAS (content-addressable storage).
Archiving remains a separate technology from backup, although there is still some confusion in the marketplace and it is possible to use backup as an archiving engine. Archiving involves making conscious decisions about data priority at changing points in its lifecycle. Businesses are making a conscious effort to treat data for what it is--the information that drives the business.
Jim Geis, Director of Storage Solutions at Forsythe, said of the challenges of managing information, "I tend to separate storage and information management. Storage is basically independent of the storage management tools. There are so many alternatives and options in how you can access your information, and on top of that you have to keep it for a gazillion years. Like HIPAA, they're supposed to keep records until two years after the patient's death! That may take a while." In this challenging environment, it's vital that businesses archive data for compliance, legal protection, and information value.
The move to intelligent archiving requires the use of disk in archiving schemes. Classic off-site tape can still work well for long-term backup and disaster recovery, but tape was never designed for granular searching and recovery. Businesses can use disk in long-term archives where data can be categorized and indexed for recovery operations. Along with disk-based replication to different locations, businesses can protect their information better than ever before.
Glenn Groshans, a Director of Product Marketing at Symantec, peered into his crystal ball. "What we see over the long haul is that backup and archiving are merging so you have a single policy for data protection over time. It is true that today they're separate applications for separate things, but ultimately they'll be in one storage management policy including replication, SRM, backup and recovery, and archiving. These are four separate operations now, but they will be singly managed."
Security and encryption
In an increasingly connected world, data security grows even more important. Physical loss is still an issue with tapes in transit: just ask Bank of America what it thinks about that. And physical and digital theft also looms large. Given the threats to sensitive information, security and encryption are hot topics in protecting storage.
Security is becoming synonymous with encryption. Many businesses still rely on password protection, but this method does not provide nearly the security companies need. Password-recovery technology is readily available and very simple to use--which means it's equally easy to crack the passwords. The key to security is encrypting data so it cannot be read without the corresponding key.
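To see why password protection alone is weak, consider a minimal brute-force sketch. A short password falls to exhaustive search almost instantly, whereas data encrypted under a 128-bit key cannot be searched exhaustively in any practical timeframe. The search space below (lowercase, up to four characters) is an illustrative assumption chosen so the demo runs in under a second.

```python
import hashlib
import itertools
import string

def crack_password(target_hash, max_len=4):
    """Exhaustively try short lowercase passwords until one matches.

    Illustrates the article's point: password 'protection' over a
    small search space is trivially reversible, unlike data encrypted
    under a strong key.
    """
    for length in range(1, max_len + 1):
        for combo in itertools.product(string.ascii_lowercase, repeat=length):
            guess = "".join(combo)
            if hashlib.sha256(guess.encode()).hexdigest() == target_hash:
                return guess
    return None
```

Running `crack_password(hashlib.sha256(b"key").hexdigest())` recovers `"key"` after only a few thousand hash attempts; a 128-bit keyspace would require roughly 3 x 10^38 attempts.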
Presently, the highest official level of encryption in the U.S. is the Advanced Encryption Standard (AES). AES is specified in Federal Information Processing Standards publication 197 (FIPS 197), adopted by the National Institute of Standards and Technology (NIST) in 2001. Even stronger levels of encryption are in development, which should keep pace with work on attacking 128-bit encryption. As yet there is no practical technique that can break a 128-bit key (in less than a year, anyway), although it is possible to break poor implementations of the standard.
Virtualization and lifecycle management
The market has been buzzing about virtualization for about half a decade, but the technology is finally reaching maturity and general acceptance. Virtualization technologies can help solve significant business problems: increasing performance, improving recovery time, decreasing the cost of backup, simplifying storage pool operations, and cutting failover time.
For example, NAS virtualization rates are doubling every year as administrators work to provision multiple filers. Without virtualization, one NAS filer may sit at 12 percent utilization and another at 80 percent, with no good way to share capacity--and administrators commonly over-provision active filers to ensure enough headroom. Virtualization now provides storage pools and the means to automatically provision and balance files between networked NAS devices.
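The balancing decision a virtualization layer makes can be sketched in a few lines: place each new share on the least-utilized filer that can hold it, so no filer idles at 12 percent while another strains at 80. The filer names, data structure, and figures below are illustrative assumptions, not any product's API.

```python
def place_share(filers, size_gb):
    """Provision a new share on the least-utilized filer in the pool.

    `filers` maps filer name -> {"capacity": GB, "used": GB}. A NAS
    virtualization layer makes this placement automatically and
    transparently; this is a toy version of that policy.
    """
    # Consider only filers with enough free space for the share.
    candidates = [
        (f["used"] / f["capacity"], name)
        for name, f in filers.items()
        if f["capacity"] - f["used"] >= size_gb
    ]
    if not candidates:
        raise RuntimeError("pool exhausted: no filer can hold %d GB" % size_gb)
    # Pick the lowest utilization ratio and record the allocation.
    _, best = min(candidates)
    filers[best]["used"] += size_gb
    return best
```

A real layer would also migrate existing files between filers to rebalance, but the core idea--treat many filers as one pool and choose placement by utilization--is the same.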
There are still mindset challenges around the technology. Administrators have been conditioned to not move data, and migration and consolidation are traditionally huge events requiring a big battle plan. Virtualization allows administrators to eliminate that complexity and scope in favor of painless daily migration. This also enables operations like real time capacity management and data tiering, and virtualization also has the advantage of working relatively well in a heterogeneous setting. The virtualization movement is cutting across industries with large capacity and migration needs, including manufacturing, high performance computing environments, healthcare, and financial services.
Development continues on relocating open and active files; at present, virtualization works best on closed files. This makes virtualization technologies ideal for tiered storage schemes and automating lifecycle-based migration. Many administrators have all but given up on manually managing data throughout its lifecycle because the operation is time-demanding, resource-intensive, and never-ending. Technologies like virtualization lift the manual burden and truly enable portions of ILM in storage environments. Claus Mikkelsen, Chief Scientist at Hitachi Data Systems, remarked, "Administrators will ultimately rely on better storage management products, because the manual effort is not worth it. People trust their storage management tools more because they have to, and because the tools are better."
This is a good thing because businesses everywhere face massive and accelerating storage growth even in the short-term. For example, many businesses keep four to six copies of given files, and sometimes more. There might be one copy in primary storage, another in a replicated environment, a third in an online archive, a fourth in a tape archive and a fifth in CAS. And that only includes replicated and archived data, not email or attachments that are commonly distributed multiple times throughout the corporation. Managing all of these copies is a real challenge, but must be done to manage storage growth and data recovery.
Virtualization at both the block and file level remains a prime enabling technology in corporate-wide data management. More and more vendors are supporting both within the same array, which supports the philosophy of multiple storage targets as a pooled repository.
The concept of ILM continues to see traction in the marketplace, not yet with a full implementation but with individual tools. One such tool is the ability to accomplish mass nearline storage using cost-effective SATA. Nearline disk-based storage offers high capacity storage targets for online archiving and backup.
The advent of Serial Attached SCSI (SAS) is also an important trend in this space, since SAS increases performance while remaining cost-effective. SAS affects both the connectivity and drive sides of storage, and can rival traditional, expensive Fibre Channel drives for HPC environments. SAS drives come close to Fibre Channel speeds at a much lower price point, making SAS ideal for high-end applications with large capacity needs. The emergence of SAS technology will continue to be a major trend in 2006.
This type of nearline disk is ideal for solving challenges typically associated with magnetic tape technology. Tape offers neither random nor fast access, and cannot achieve the same high availability as online disk. And with disk prices dropping, this type of disk-based nearline architecture can provide high capacity and availability at a low cost per gigabyte.
The next step is more challenging. With capacity and performance increasing at low cost, it's now possible to observe data lifecycles by aligning data to optimal storage targets. Looking at applications, age, and usefulness, how valuable is given data to the business at specific points in time? And knowing that priority, what is the right price, and where is the right place, to store the data? As you might expect, this type of prioritizing is a challenge. Given the need to manage large and growing data volumes, users are increasingly asking for comprehensive approaches to data prioritization and lifecycle management. Point products already exist, but users are looking for continuing development in enterprise-wide, heterogeneous lifecycle management tools. Over time, the real management gap--and the true cost of storage--lies in how a business manages its information relative to its resources.
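A lifecycle policy of the kind described above can be reduced to a simple mapping from a file's age and activity to a storage tier, matching the two- and three-tier structures discussed earlier. The tier names and the 30- and 180-day thresholds below are illustrative assumptions, not an industry standard.

```python
def assign_tier(age_days, referenced_recently):
    """Map a file's age and activity to a storage tier.

    A toy lifecycle policy: active data stays on fast primary disk,
    aging data moves to cost-effective nearline SATA, and cold data
    lands in the archive tier (tape or CAS). Thresholds are
    illustrative assumptions.
    """
    if referenced_recently or age_days <= 30:
        return "tier1-primary-fc"      # fast Fibre Channel disk
    if age_days <= 180:
        return "tier2-nearline-sata"   # cost-effective online disk
    return "tier3-archive"             # tape or CAS
```

Real ILM tools add application awareness and per-business-unit policies, but even this sketch shows why automation matters: applying such a rule by hand across millions of files is exactly the never-ending manual effort administrators have given up on.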
Xyratex's Lisa Hart, VP of Marketing said, "People talk about ILM or tiered storage. It's here today, but we're at the very beginning of it. The challenge going forward in customer implementations and vendor delivery is being able to efficiently manage the implementation that we can do today."
Fibre Channel SANs remain the technology of choice in the data center, but iSCSI SANs undeniably have their place. The technology is scaling and growing rapidly, with IDC forecasting a compound deployment rate of 150% a year.
It hasn't been easy. iSCSI vendors pictured themselves as the David to Fibre Channel's Goliath. David hasn't slung any fatal stones--Fibre Channel SANs remain very strong--but as of 2005, iSCSI SANs began selling briskly into marketplaces including the upper tier of the mid-market. The primary driver for iSCSI SAN adoption is its familiar topology and simplicity. Mid-market businesses often have no IT staff dedicated to managing storage; simplicity is a key driver for them, and systems must be easy to deploy and run. The enterprise is also adopting iSCSI SANs, but more for advanced features and capabilities. For instance, an enterprise might deploy a series of iSCSI SANs as links in a campus SAN, using network nodes over existing infrastructure. For the enterprise, simplicity may be a side benefit, but high availability and scalability, along with leveraging existing investment, are the top concerns. In both markets, cost-effectiveness is also a plus, since iSCSI uses commoditized components like Ethernet. iSCSI SANs are also increasingly powerful and scalable, and can be easier to operate in heterogeneous environments than Fibre Channel.
The future of iSCSI SANs holds the promise of high networking speeds propelled by the advent of 10Gb Ethernet. This will bring iSCSI to a performance level where it can serve high-end applications and intensive computing environments. 10Gb Ethernet is not necessary to build high-speed SANs for the mid-market, but the enterprise will make use of that speed threshold.
Although 10Gb Ethernet speeds will outpace existing 4Gb Fibre Channel, 4Gb FC has seen a very strong uptake in a short period of time. Fast Fibre Channel remains extremely well suited to high performance computing environments that need high bandwidth capabilities and rock solid performance. Verticals where the product does well include oil and gas exploration, post-production processing for movies, and other high-end applications.
InfiniBand came on strong in 2005 ... at last. According to Mitch Seigel, Senior Director of Corporate Communications at LSI Logic, "Around 2000, InfiniBand was going to be the Next Big Thing, the successor to Fibre Channel. But then InfiniBand got hit hard by the dotcom failures. Lots of innovative development seemed to stop as customers decided they didn't really need it. But development did continue behind the scenes because it was great for supercomputing connections. Now in 2005, InfiniBand is reemerging because these HPC sites want InfiniBand connections, preferably a full InfiniBand infrastructure."
Several factors contribute to InfiniBand's renewed promise:
* InfiniBand has lasting traction in HPC.
* InfiniBand's low latency is excellent for clustering. This characteristic makes it ideal for grid computing and blade processing environments.
* Native InfiniBand technology for storage undergirds demanding storage markets like commercial data centers.
Virtual tape libraries (VTL)
Disk-based backup and archiving are undoubtedly important to data protection, but many applications and routines still require the presence of tape. Meeting this challenge is virtual tape, which appears as tape to an application but offers advantages similar to disk's. Virtual tape has distinct advantages over physical tape, including lower overhead, since no one needs to change tape cartridges. Integrity and reliability also improve, since virtual tape is much more tolerant of errors than physical tape. In environments that maintain traditional backups and backup windows, virtual tape shrinks those windows back down to manageable levels.
VTLs can also cost-effectively fill the CDP (continuous data protection) bill, allowing administrators to use familiar tape technologies and backup routines while deploying continuous data protection. In 2005, the virtual tape market moved from an evangelist sell to a mainstream product as many enterprises adopted VTLs to meet SLAs, backup window requirements and compliance regulations. Since VTLs are a non-disruptive technology, businesses can replace their physical tape library with a VTL and experience significantly greater backup and recovery speeds plus the ability to keep tape online. VTLs enable the enterprise--especially verticals like financial services and healthcare--to retain familiar tape storage environments while increasing speed, performance and capacity. It is also easier to search and restore from online virtual tapes at the granular level.
Development trends in VTLs include a continuing need to increase capacity without crossing an upper cost limit. A good deal of capacity development centers on content compression techniques, in order to retain the same footprint at around the same cost. VTL vendors are also working to refine search and recovery capabilities, driven by discovery costs and the need to restore individual files after inadvertent deletion. Added value is the name of the VTL game.
Continuous Data Protection (CDP)
Continuous Data Protection continuously intercepts and records all changed data, essentially serving as a never-ending backup. Because the process is (you guessed it) continuous, administrators are not confined to recovering from previously scheduled points in time. This offers very flexible Recovery Point Objectives (RPO) and faster Recovery Time Objectives (RTO). The enterprise has had similar types of operations for years with online replication and snapshots, but the concept of online data protection is largely new to the mid-market and its reliance on MS Windows.
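The mechanism CDP relies on--intercepting every write and journaling it with a timestamp--can be sketched in a few lines. Because the journal holds every change, a file can be rolled back to any moment, not just to scheduled backup points. The class and method names below are illustrative, not any vendor's API; it also assumes writes arrive in time order.

```python
import bisect

class ChangeJournal:
    """Minimal sketch of continuous data protection for one file.

    Every write is intercepted and appended to a journal with its
    timestamp, enabling restore to an arbitrary point in time
    (flexible RPO) rather than to fixed backup points.
    """
    def __init__(self):
        self._times = []     # monotonically increasing write times
        self._contents = []  # file content after each write

    def record_write(self, timestamp, content):
        """Intercept a write: journal the new content with its time."""
        self._times.append(timestamp)
        self._contents.append(content)

    def restore_as_of(self, timestamp):
        """Return the file as it existed at any chosen moment."""
        i = bisect.bisect_right(self._times, timestamp)
        if i == 0:
            return None  # file did not exist yet at that time
        return self._contents[i - 1]
```

Production CDP journals block-level deltas rather than full copies and prunes the journal over time, but the recovery model--pick any timestamp, replay the journal up to it--is the essential difference from scheduled backup.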
Lacking the enterprise's heavy investment in online data protection products, the mid-market is happily adopting Windows-based CDP products for its mission-critical and high-capacity environments. Cost-effective CDP handles unstructured files and email, which makes it very appropriate for mid-market Windows email environments--a huge pain point for companies struggling to manage huge and growing email data stores. Technologies like CDP allow these companies to efficiently back up very large data volumes without requiring a backup window or large amounts of processing power. CDP can also enable simple tiered storage schemes, allowing administrators to keep an active email archive online on SATA disk with an off-site copy of the archive.
2005 ... and counting
HDS' Mikkelsen summed up the past year in storage. "2005 is notable for a lot of maturing technologies. One thing we're always trying to do in the storage industry is to remove obstacles for customers to buy our products. We addressed complexity issues, document growth and virtualization. The next big challenge is to address the issues around lifecycle management. A lot of the issues have been solved; what needs to be addressed is how you manage information at the application level. At the storage level we're agnostic, that's just bits and bytes. But if you look at migration requirements, there is still potential for ISV software to drive those products.
"Intelligence should be at the edge."
Publication: Computer Technology Review
Date: Jan 1, 2006