Data preservation vs. collection: How much is too much?
It's easy to understand why companies are so worried about preserving their data these days. Sanctions for incorrect data preservation and failure to preserve have skyrocketed in the past three years. As the number of regulatory and internal investigations continues to climb, the realm of relevant data that falls under preservation expands in tandem.
Companies see these pressures and preserve everything in sight, erring on the side of over-inclusivity and caution. Swayed by the impact and perception of federal rule amendments on collection, they decide they must also collect it all. The result, as we all know, is a massive amount of data that is extremely expensive to host, process and review.
It doesn't have to be this way. The mandate to preserve does not necessarily translate into a mandate to collect and process everything. There shouldn't be a one-size-fits-all approach. The key is to implement a program centered on smart, targeted collection and to build around the details and needs of each case. Rather than simply collecting an unintelligible, heaping mass of data and facts, you should try to build a curated collection that is logical, manageable and genuinely useful.
We need to take a step back, consider the facts, and harness knowledge within our organizations, as well as analytical tools, to collect and review in a smarter, more efficient way. Here are a few things to keep in mind as you build your plan:
Trust your employees
Trust your employees and include them in the data identification process; they will often know where the data is and can point you in the right direction. If you have a reason to doubt them or if you think they're hiding something, then by all means collect everything. Otherwise, talk to your custodians with outside counsel, ask targeted questions, trust and collaborate. If efficiency is what you're after, then this is a no-brainer: an organized custodian can get you to the facts and necessary data faster than a blanket search of their whole mailbox or hard drive.
This doesn't mean self-collection. Inside counsel and outside counsel need to be key participants and help direct the identification process. Use a script and a checklist to be consistent and to be sure you look in all of the places where data could be stored. Keep this checklist to document the discussion. Then coordinate the collection with IT or operations team members who can facilitate the proper collection of the identified files.
Depending on the size and the facts of the case, it may be prudent to take a hybrid approach and run additional searches against a particular custodian's mailbox in concert with identified folders and/or files. And, of course, there may be fact patterns that would make it unwise to rely on a custodian to identify relevant documents. Every situation is different and requires thoughtful consideration in developing an efficient and defensible strategy.
Home in on the right documents
When launching a review, do an initial search of a portion of your data (such as a subset of custodians), then check with counsel to see if what you are finding is really what you need. From here, start refining. Starting with a smaller pool means that when the time comes to search across the larger universe, you'll have a better idea of where you're going.
In a recent matter, for example, we needed to find a way to establish some baseline findings before examining information for a few hundred custodians. We collected the mailboxes for 10 percent of the custodians, analyzed their communications and extracted certain information from them, including who was talking to whom. We then used this knowledge to decide how to approach the rest of the mailbox population, the remaining 90 percent, and to apply analytical tools in a precise, targeted way. There was no need to collect everything. By conducting this initial examination, we were able to devise a scheme by which to analyze the remaining custodians. We were able to selectively collect data from the remaining mailboxes, using a combination of search terms and entity domains.
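As a rough illustration only, the sample-then-target workflow described above could be sketched along these lines in Python. Everything here is an assumption made for the example rather than the tooling used in the matter: the message fields, the function names, the `min_count` cutoff, the addresses, and the choice to select a message when it matches either a search term or a domain.

```python
from collections import Counter


def domain_of(address):
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def frequent_domains(sample_messages, internal_domain, min_count=2):
    """Count external domains seen in the sampled mailboxes and keep
    those that appear at least min_count times (hypothetical cutoff)."""
    counts = Counter()
    for msg in sample_messages:
        for addr in [msg["sender"], *msg["recipients"]]:
            d = domain_of(addr)
            if d != internal_domain:
                counts[d] += 1
    return {d for d, n in counts.items() if n >= min_count}


def select_for_collection(messages, domains, search_terms):
    """Keep messages that involve a domain of interest OR contain a
    search term. The OR rule is an assumption; a real protocol might
    combine the two criteria differently."""
    selected = []
    for msg in messages:
        parties = [msg["sender"], *msg["recipients"]]
        hits_domain = any(domain_of(a) in domains for a in parties)
        text = (msg["subject"] + " " + msg["body"]).lower()
        hits_term = any(t.lower() in text for t in search_terms)
        if hits_domain or hits_term:
            selected.append(msg)
    return selected


# Made-up messages standing in for the sampled custodians' mailboxes.
sample = [
    {"sender": "a@corp.example", "recipients": ["x@vendor.example"],
     "subject": "Pricing", "body": "see attached"},
    {"sender": "x@vendor.example", "recipients": ["b@corp.example"],
     "subject": "Re: Pricing", "body": "ok"},
    {"sender": "a@corp.example", "recipients": ["friend@mail.example"],
     "subject": "Lunch", "body": "noon?"},
]

# Made-up messages standing in for the remaining custodians' mailboxes.
remaining = [
    {"sender": "c@corp.example", "recipients": ["y@vendor.example"],
     "subject": "Hello", "body": "hi"},
    {"sender": "c@corp.example", "recipients": ["d@corp.example"],
     "subject": "Rebate schedule", "body": "draft"},
    {"sender": "c@corp.example", "recipients": ["d@corp.example"],
     "subject": "Parking", "body": "lot closed"},
]

domains = frequent_domains(sample, internal_domain="corp.example")
to_collect = select_for_collection(remaining, domains, ["rebate"])
```

In practice the terms and domains would be reviewed with counsel before being applied, and the criteria would be documented so the process stays defensible.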
Exercise your judgment
Ultimately, you have to find a risk threshold that you're comfortable with. When do the costs of collection begin to outweigh the benefits? How do you deal with all the data that you have preserved? Where is that stopping point, that point at which the main benefits begin to decline?
Smart, cost-appropriate data collection starts with understanding that organizations have not given up their autonomy in responding to discovery requests. Ultimately, an organization need only craft a defensible, transparent collection and review process based on information gained from its employees and institutional knowledge of its IT structure. Opposing parties may challenge your process for collection, but as long as all the data has been preserved so that spoliation is not an issue, such challenges are simply part of the normal course in which adverse parties negotiate or even litigate discovery issues.
We should not be afraid to defend smart, targeted collection from an overbroad preservation universe. We should embrace it as an efficient approach designed to get to the most important documents quickly.