2007 year in review: reducing the challenges of overwhelming data growth.
Data de-duplication was a concept relatively unknown at the beginning of 2007, yet it ended the year as a $100 million market in its own right. Known also as single instance store and duplicate data reduction, de-duplication identifies and stores only unique data at the sub-file level. If a data string or chunk has already been stored in the system, it is referenced by a pointer rather than stored a second, third, fourth or nth time.
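The mechanism described above can be sketched in a few lines of code. This is a minimal illustration, not any vendor's implementation: it uses fixed-size chunks and SHA-256 fingerprints, where production systems typically use variable-size chunking and more elaborate indexing. All names (`DedupeStore`, `put`, `get`) are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096

def chunk(data):
    """Split a byte stream into fixed-size chunks (a simplified stand-in
    for the variable-size chunking real systems use)."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

class DedupeStore:
    """Toy single-instance store: each unique chunk is kept exactly once;
    a stored file is just a list of chunk fingerprints (pointers)."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes (unique data only)
        self.files = {}   # file name -> ordered list of fingerprints

    def put(self, name, data):
        refs = []
        for c in chunk(data):
            fp = hashlib.sha256(c).hexdigest()
            # Store the chunk only if this fingerprint is new;
            # otherwise the existing copy is simply referenced.
            self.chunks.setdefault(fp, c)
            refs.append(fp)
        self.files[name] = refs

    def get(self, name):
        """Reassemble a file by following its chunk pointers."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def dedupe_ratio(self):
        """Logical bytes referenced vs. physical bytes actually stored."""
        logical = sum(len(self.chunks[fp])
                      for refs in self.files.values() for fp in refs)
        physical = sum(len(c) for c in self.chunks.values())
        return logical / physical
```

Storing the same backup image twice consumes the space of one copy plus a second list of pointers, which is why repetitive workloads such as nightly full backups de-duplicate so well.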
Reducing disk capacity needs by 20:1 or more is where de-duplication technology changes the game: all the benefits of disk-based backup and archive can be achieved with significantly less disk capacity, at a cost similar to tape. Backups and archives can also be retained for longer periods at little additional cost, supporting ever more stringent regulatory requirements and eDiscovery needs. By implementing a de-duplication solution, IT organizations can shrink backup windows, shorten RTOs, and gain quick, reliable access to archives when needed, while also gaining the ability to cost-effectively replicate data offsite for disaster protection, even in bandwidth-constrained environments.
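To make the 20:1 figure concrete, here is a brief worked example with hypothetical numbers (a 10 TB data set retained as 30 daily full backups); the figures are illustrative, not from the article:

```python
# Hypothetical retention scenario: 30 daily full backups of 10 TB.
full_backup_tb = 10
retained_copies = 30
dedupe_ratio = 20  # the 20:1 reduction cited above

logical_tb = full_backup_tb * retained_copies   # capacity on plain disk
physical_tb = logical_tb / dedupe_ratio         # capacity after de-dupe

print(f"Plain disk: {logical_tb} TB, de-duplicated: {physical_tb} TB")
```

Under these assumptions, 300 TB of retained backup data fits in 15 TB of physical disk, which is what brings disk retention into the same cost range as tape.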
However, 2007 also brought a lack of education and a great deal of hype around de-duplication technology. Storage vendors touted ever larger 'de-dupe' ratios in an effort to demonstrate technological superiority. The danger is that de-dupe ratios alone do not measure, or even indicate, the final benefits of a solution. IT managers need to look at the larger picture and evaluate storage systems with de-duplication on three additional criteria: protection, performance and scope.
Protection is important because the more effective your de-duplication, the greater the impact of data loss. RAID protection alone is insufficient with de-duped data. IT managers need to consider storage technologies with new methods of protection that provide higher levels of data resiliency without taxing the precious storage space gained from the de-duplication process.
Performance penalties caused by de-duplication can outweigh its storage benefits. Most storage products lose performance during the de-duplication process. Whether performance is helped or hurt is a function of how de-duplication has been incorporated into the storage system. Technology varies, and most vendors have added de-duplication as a 'bolted-on' capability to existing products. IT should consider new storage solutions built from the ground up, such as grid-based arrays, whose de-duplication technology not only avoids degrading performance but actually improves it as the de-dupe ratio grows.
Finally, IT managers know that a storage system, regardless of its de-duplication ratio, must be able to scale to address the data growth that will continue in 2008 and beyond. Most systems today offer only siloed de-duplication due to limited scalability. The massive scale-out capabilities achievable with grid-based storage systems not only simplify management but also eliminate silos of de-duped data.
Karen Dutch is the general manager of Advanced Storage Products for NEC Corporation of America. www.necam.com
Publication: Computer Technology Review, Jan 1, 2008