File sharing over the WAN. (Storage Networking).
These global organizations now wish to implement wide area network (WAN) based storage consolidation, to reduce storage resource management costs while also enabling remote employees to collaborate on projects just as if they were all on a local network. Given the current state of technology, however, enterprises are forced to limit the deployment of real time file sharing and networks to within each LAN, resulting in disconnected islands of file sharing throughout the enterprise.
Corporations have tried to fix this problem in various ways. For large files or large groups of files, companies may transfer the files overnight using File Transfer Protocol (FTP). For ad hoc collaboration between employees, people might attach files to email messages. While these solutions are marginally acceptable, they each suffer from the same problem: the file sharing does not happen in real time, with everyone working off the same version of the data. Rather than sharing a single copy among many users, these methods duplicate files and propagate private copies to each user. The moment a file is emailed or FTP'ed, two or more potentially out-of-sync copies are created. The dissimilar versions need to be manually reconciled, usually after every set of revisions by the people working on the project.
The pain users feel sharing files over the WAN causes frustration and reduces productivity. In response, users have developed elaborate workaround solutions. The waste in end user time is only one of the costs. As increasingly divergent versions of employee work product are saved in storage resources throughout the corporation, a storage resource management problem is created for the IT staff. Not only is the overall amount of storage managed greater than necessary, but the proliferation of copies also creates disaster recovery, backup and restore difficulties. These issues result in greater capital outlay and put pressure on the bottom line.
These approaches have been used because there has been no better way to accomplish the task. A better way would be to use the same protocols and methods over the wide area as the local. This would allow people and applications to work together as they always have, coherently and in real time.
Network File System (NFS) and Common Internet File System (CIFS) are the network file sharing protocols used by Unix and Microsoft Windows operating systems, respectively. However, experiments in extending NFS and CIFS over the wide area have met with great frustration.
These traditional file-sharing protocols were designed to share files over a local network, but perform poorly and prove unreliable in a WAN setting. Reading and writing files using standard file sharing protocols over WANs is orders of magnitude slower than over LANs. While this difference is dramatic, it is also a best-case scenario. In the real world, wide area networks are far less reliable than local area networks. The response times, both for opening and for writing files, are invariably unacceptably slow.
That is because these protocols are "chatty": the client and server frequently send messages back and forth to stay synchronized. In addition, each message must receive an answer before the next message is sent, further slowing the response time. There is a common misconception that adding bandwidth to the WAN will solve this problem. Bandwidth is only one variable, and not the most significant. The large delay in sharing files over long distances is actually caused by the speed of light coupled with the synchronous nature of the protocols.
The limitation imposed by the speed of light (plus the additional delays introduced by switching equipment) causes latency. In other words, it's not the size of the "pipe" that matters; it's how long the trip takes that creates the delay. For example, while one can fit more people in a 767 than in a Gulfstream, both will still take the same amount of time to cross the country.
Remote file sharing protocols like CIFS and NFS are especially sensitive to latency. Each message incurs a latency penalty because each message causes the protocol to wait for a response. Although the latency penalty for each individual message and response can be small for WAN circuits (20-30ms for a local campus, 120ms for NYC to UK access), the many messages and responses exchanged while reading or writing even a small file add up to make the total delay quite large.
For example, saving an edited 2MB MS Word document normally takes only four seconds over a LAN. The same operation could suffer a latency penalty of more than two minutes over a very fast 1MB WAN circuit that has an 80ms delay--a delay that most users would consider to be unacceptable.
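The arithmetic behind this example can be sketched with a simple model: total time is the time to push the bits through the link plus one round-trip wait for every message that must be answered before the next is sent. The sketch below is illustrative only. It assumes the "1MB" circuit means roughly 1 megabit per second, treats the 80ms delay as the round-trip time, and uses a made-up count of 1,500 round trips, which is not a measured figure for NFS or CIFS.

```python
def transfer_time(file_bytes, bandwidth_bps, rtt_s, round_trips):
    """Estimate seconds to move a file when every round trip blocks the next one."""
    serialization = file_bytes * 8 / bandwidth_bps  # time to push the bits onto the link
    waiting = round_trips * rtt_s                   # time spent idle, waiting for replies
    return serialization + waiting


FILE_BYTES = 2 * 1024 * 1024  # the 2MB document from the text
RTT_S = 0.080                 # the 80ms delay from the text (assumed to be round trip)
ROUND_TRIPS = 1500            # ASSUMPTION: illustrative count, not a measured value

# ASSUMPTION: the "1MB" circuit is read as 1 megabit per second.
slow_link = transfer_time(FILE_BYTES, 1_000_000, RTT_S, ROUND_TRIPS)
# The same transfer with 100 times the bandwidth and the same latency.
fast_link = transfer_time(FILE_BYTES, 100_000_000, RTT_S, ROUND_TRIPS)

print(f"1 Mbps link:   {slow_link:.1f} s")
print(f"100 Mbps link: {fast_link:.1f} s")
```

Under these assumptions the waiting term alone is about two minutes, and multiplying the bandwidth by 100 removes only the serialization term, which is why adding bandwidth does little for a synchronous protocol.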
Now there is new technology that allows read/write file sharing over the WAN at near LAN speeds. In order to be useful, this technology must work with existing network infrastructure. Since users share files directly using NFS or CIFS or use programs that run over NFS or CIFS, any new technology that seeks to improve file sharing must interoperate with these standards.
There are several technologies that need to be integrated to accomplish this functionality. A WAN-optimized protocol must be used over the WAN segment. A protocol translator at each site is required to convert NFS or CIFS operations for WAN transport, and back again, at the remote site. These new NFS- and CIFS-compatible protocols include features such as streaming, bidirectional distributed differencing, compression, encryption and file caching.
Streaming protocols are well known for their ability to eliminate most of the message passing delays that slow file transfers. They are the key to countering latency penalties. Bidirectional distributed differencing minimizes bandwidth-monopolizing file transfers by transmitting only the differences between file versions. Programs such as Microsoft Word, Excel and PowerPoint, as well as CAD/CAM design programs and graphic and multimedia editors, rewrite the entire file to disk on any change. The protocol must be capable of intercepting the data written before it is transferred over the WAN and extracting the differences. These are then sent over the WAN and applied to the file on the other end in a way that is transparent to users and applications.
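The differencing idea can be illustrated with a deliberately simplified sketch. It compares fixed-size blocks at fixed offsets by hash and ships only the blocks that changed. This is not the algorithm of any particular product; the block size and the function names are hypothetical, and a real implementation would also need to handle insertions that shift every later block (for example with rolling checksums).

```python
import hashlib

BLOCK = 4096  # hypothetical block size, chosen only for illustration


def block_hashes(data):
    """Hash each fixed-size block of the copy held at the far end."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]


def make_delta(old_hashes, new):
    """Return (new_length, {block_index: bytes}) holding only the changed blocks."""
    changed = {}
    for idx, start in enumerate(range(0, len(new), BLOCK)):
        chunk = new[start:start + BLOCK]
        if idx >= len(old_hashes) or hashlib.sha256(chunk).digest() != old_hashes[idx]:
            changed[idx] = chunk
    return len(new), changed


def apply_delta(old, delta):
    """Rebuild the new version from the old copy plus the changed blocks."""
    new_length, changed = delta
    # Truncate or zero-pad the old copy to the new length, then patch blocks in place.
    out = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for idx, chunk in changed.items():
        out[idx * BLOCK: idx * BLOCK + len(chunk)] = chunk
    return bytes(out)


# Usage: a 10-block file with a single byte edited needs only one block sent.
old_copy = bytes(range(256)) * 160
edited = bytearray(old_copy)
edited[5000] ^= 0xFF
new_copy = bytes(edited)

delta = make_delta(block_hashes(old_copy), new_copy)
print(len(delta[1]), "of", len(old_copy) // BLOCK, "blocks sent")
```

Even though the application rewrote the whole file, only the block containing the edit would cross the WAN in this sketch.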
Compression is another way to reduce bandwidth consumption and increase file transfer speed. Advanced protocols are file-type-aware and thus take no action on those files that have already been compressed (attempting to compress these files further would actually increase the file size and thus file transfer time). Because WANs often use the public Internet for transport, building optional encryption into the protocol is useful.
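A file-type-aware compression step might look something like the sketch below. The extension list is an illustrative assumption and far from complete, the function names are hypothetical, and zlib stands in for whatever compressor a real protocol would use. As a second safeguard, the sketch also sends the original bytes whenever compression fails to shrink them.

```python
import os
import zlib

# ASSUMPTION: an illustrative, incomplete set of formats that are already compressed.
ALREADY_COMPRESSED = {".zip", ".gz", ".jpg", ".jpeg", ".png", ".mp3", ".mpg"}


def encode_for_wan(filename, payload):
    """Return (was_compressed, bytes_to_send) for one file payload."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in ALREADY_COMPRESSED:
        return False, payload      # skip the wasted effort entirely
    packed = zlib.compress(payload)
    if len(packed) >= len(payload):
        return False, payload      # compression did not help; send as-is
    return True, packed


def decode_from_wan(was_compressed, data):
    """Undo encode_for_wan at the receiving site."""
    return zlib.decompress(data) if was_compressed else data


# Usage: repetitive document text shrinks, an archive is passed through untouched.
text = b"quarterly report " * 1000
flag, wire = encode_for_wan("report.doc", text)
print(flag, len(text), len(wire))
```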
File caching allows file reads to be performed at LAN speeds on all file accesses after the first. Much like Web caching technologies of content delivery networks, once a file is transferred from the remote file server to the local cache, it resides locally, ready to be served to users instantly. This caching functionality requires storage at the remote site. Cache storage does not need storage management and can be implemented as an appliance.
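The read-caching behavior described above can be sketched as a small read-through cache with least-recently-used eviction. The class and the `fetch_remote` callable are hypothetical names standing in for the slow WAN read, and the sketch deliberately ignores cache coherency (invalidating a cached copy when the file changes at the central site), which a real system would have to handle.

```python
from collections import OrderedDict


class ReadCache:
    """Read-through cache: the first read crosses the WAN, later reads are local."""

    def __init__(self, fetch_remote, capacity=128):
        self.fetch_remote = fetch_remote  # hypothetical callable: path -> bytes
        self.capacity = capacity          # maximum number of cached files
        self.entries = OrderedDict()      # path -> bytes, oldest first
        self.wan_reads = 0                # how many reads actually crossed the WAN

    def read(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)    # mark as most recently used
            return self.entries[path]
        data = self.fetch_remote(path)        # slow path: go to the remote file server
        self.wan_reads += 1
        self.entries[path] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used file
        return data


# Usage: a dict stands in for the remote file server.
remote_server = {"/projects/plan.doc": b"draft 1", "/projects/budget.xls": b"numbers"}
cache = ReadCache(remote_server.__getitem__, capacity=2)
cache.read("/projects/plan.doc")
cache.read("/projects/plan.doc")
print(cache.wan_reads)
```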
Because of the inherent unreliability of WANs, this protocol must also behave well when the network fails. It might also include a distributed lock manager and write-back caching for higher performance. Write-back caching (also used in disk drives and RAID controllers) provides the same benefit for writes as read caching (described above) does for reads. It allows all writes to approach native LAN speeds, many times an order of magnitude faster than writes without this feature.
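A minimal sketch of write-back caching, assuming a hypothetical `send_to_server` callable for the WAN write: the local write is acknowledged immediately, and the data is pushed to the remote server later. Writes that fail because the link is down stay queued for a later retry, which is one simple way to behave well when the network fails. The sketch leaves out locking and durability; a real appliance would need to protect pending writes against its own failure.

```python
class WriteBackCache:
    """Acknowledge writes locally, then push them across the WAN later."""

    def __init__(self, send_to_server):
        self.send_to_server = send_to_server  # hypothetical callable: (path, data)
        self.dirty = {}                       # path -> latest data not yet sent

    def write(self, path, data):
        # Returns at local speed; only the newest version of each file is kept.
        self.dirty[path] = data

    def flush(self):
        """Send pending writes; return how many are still waiting afterwards."""
        for path in list(self.dirty):
            try:
                self.send_to_server(path, self.dirty[path])
            except OSError:
                continue                      # WAN unavailable: keep it for a retry
            del self.dirty[path]
        return len(self.dirty)


# Usage: a dict stands in for the remote file server.
server = {}
wb = WriteBackCache(server.__setitem__)
wb.write("/projects/plan.doc", b"draft 1")
wb.write("/projects/plan.doc", b"draft 2")  # supersedes the first write locally
pending_before = dict(wb.dirty)
remaining = wb.flush()
print(remaining, server)
```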
Because this protocol tunnels NFS and CIFS over the WAN, a protocol translator storage cache is required at both ends. This translator "speaks" the WAN-optimized protocol over the WAN, but NFS and CIFS to fileservers and client computers. No software load or other infrastructure changes are needed to realize these WAN file-sharing benefits.
Storage-caching protocols enable coherent, consistent and high-performance wide area access to consolidated NAS and fileserver resources, delivering LAN speed access to data over the WAN. Enterprises can now implement WAN-based storage consolidation to reduce storage resource management costs while also enabling remote employees to collaborate on projects just as if they were all on a local network.
Trevor Hughes is vice president of product management at Tacit Networks (South Plainfield, N.J.).
Publication: Computer Technology Review
Date: Mar 1, 2003