Network shared drives: how to clean up files for better information management
This article offers recommendations about what an organization can do at the enterprise, workgroup, and personal level, both through process changes and technology, to prep and clean up its shared drives for better information management and for potential migration to an electronic content and records management (ECRM) system.
I recently went through the process of painting some rooms in my house. One of the things I had to do before painting was tape up all the doors and frames. This is one of the less satisfying household tasks I can think of since it takes time and, by itself, doesn't actually beautify the room.
It does, however, provide huge benefits: it gives a good perspective on the job at hand and any problems that might arise, and it results in straight edges and a lot less clean up. A similar lesson can be applied to the process of transforming an organization's shared drives; some prep work can add a lot of benefits.
Getting a Handle on Electronic Records
A number of large organizations are transforming shared drives--or cleaning up, organizing, and managing information located there--as part of their overall program of getting a handle on electronic records. This is often, but not always, done in anticipation of implementing an ECRM system.
Just as taping before painting pays off, content cleanup--leveraging classification tools, if possible--as preparation for the shared drive transformation can greatly simplify the work and make the outcome more accurate.
Shared drives frequently are thought of as the place where office computer files are stored for collaboration. However, based on this author's organization's experience with Fortune 500 clients, this type of usage represents only about 35-45% of the files found on shared drives for many organizations. The rest, which should not or cannot be migrated to an ECRM, comes from:
* Creating, downloading, and distributing software
* Putting convenience copies in multiple places
* Creating, sharing, and running databases
* Moving e-mails and .pst e-mail archives so they don't fill up the inbox
* Backing up a desktop when a new computer is installed or when an individual leaves the organization
* Developing and publishing web content to the organization (Technically, publishing web content should be done in an ECRM, but migrating this content from shared drives can be tricky and will likely break links.)
* Needing a temporary place to store such things as iTunes or company party photos
A proper information management strategy around shared drives is focused on segregating the different types of content so each can be managed more effectively. This can be done by:
* Putting databases, install files, and applications on dedicated servers separate from electronic records
* Putting the good content into structures that allow individuals to manage, classify, or purge it
* Getting rid of useless, non-business content so managing everything else becomes easier
All this can be done using the right tools and a solid approach for developing policies.
Using the Right Tools
Index and classification management (ICM) tools represent a fairly new category of tools that is available to help with the cleanup and categorization of unstructured content, which is any type of digital content that is not inside a database, from office documents and images to applications. These tools will inventory a network share and gather information from the file (e.g., dates and size), the file attributes (e.g., author), and file content (e.g., words, phrases, and numbers). All these pieces of data can be combined to categorize information in a way that can be acted upon.
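As a rough illustration of the inventory step described above, the following Python sketch walks a folder tree, records each file's size, extension, and last-modified date, and then totals counts and bytes per extension. It is not how any particular ICM product works. The function names `inventory` and `summarize` are hypothetical, and the sketch reads only file-system metadata, not attributes such as author or the file content.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def inventory(root):
    """Walk a share and yield one metadata record per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file: skip rather than abort
            yield {
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            }


def summarize(records):
    """Return (file count, total bytes) per extension as two Counters."""
    counts, sizes = Counter(), Counter()
    for rec in records:
        counts[rec["extension"]] += 1
        sizes[rec["extension"]] += rec["size_bytes"]
    return counts, sizes
```

Calling `summarize(inventory(path_to_share))` on a share gives a first view of how much content of each type exists, which is the kind of figure needed to size a repository.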
ICM tools can support multiple taxonomy facets (e.g., retention category, content type, security, and status), which can be applied to content across the enterprise simultaneously. The actions can range from deleting and moving, to outputting migration scripts, to moving files and metadata into content repositories.
The quickest and most beneficial thing these tools do is provide insight into what content actually looks like. It is hard to size a content repository and plan migration unless an organization knows how much content it has.
Transforming shared drives requires much more than just installing an ICM tool, though, just as buying the paint brush is only one step in painting a room.
Preparing for Transformation--Enterprise Level
The first step in cleaning a shared drive is to make sure the right people are involved, as it requires input from a number of constituencies. All of these groups will benefit from an organized shared drive, so they should all be willing to participate in defining requirements and policies for the project.
Those constituencies and their areas of focus include:
Information technology (IT)--IT policies and requirements often come from trying to meet the storage and information security needs of the organization. IT will have policies and requirements regarding whether employees can download applications. If they can, decide whether any of those applications can be deleted.
Human resources (HR)--HR can be very helpful in connecting content with business function by identifying who works where. This department is also involved with terminated employees and their content. Both of these areas are critical in developing a classification structure.
Litigation support--Legal is concerned that nothing gets deleted that the organization is required to preserve. If an organization's process for preservation is "in place" (leaving responsive files where they currently reside on shares, rather than copying them elsewhere), the legal department's participation is critical in identifying what needs to be preserved.
Records management--If the goal is to reduce the volume of storage on shares, there are some good opportunities using ICM tools. Quick-wins are the high-volume, short-retention categories. For example, general administration retention items, such as pencil requests, travel confirmations, and employee recognition files, can be identified and may be voluminous.
The organization--Employees across the enterprise should be primarily involved with figuring out how to set up the foldering structure based on what is important to them. At the enterprise level, it may be possible to organize the files on shared drives in a more useful manner.
In addition to depending on a retention policy and schedule, transforming shared drives may also require dependence on some less-obvious policies that resolve questions, such as these:
* What will happen to files that someone has specifically identified as garbage, as a duplicate, or as outdated?
* Which duplicate (of several) is going to be kept?
* Is a rendition (a PDF, for example) a duplicate? What about a .zip file?
* When does a draft become obsolete?
* When is a computer program a record?
Preparing for Transformation--Work Group Level
Departmental or workgroup shares evolve over time, usually with little guidance or structure. Often, at the root level, file structures are organized around:
* Organizational units identified with acronyms, which change over time, making indexing and classification less accurate
* Individuals' names or initials (e.g., "Bob's folder") (These are the worst; after Bob leaves the organization, nobody will ever look in that folder.)
* Old versions
* Other miscellaneous topics
The overall accuracy of the classification that comes from these types of shares can be improved with some reorganization.
File Structure Policy
Basing root level, first-tier, and second-tier folders on a functional model will help efforts to organize content. A function is a combination of a verb and a noun (e.g., "receive compliance reports," "plan conferences," "initiate projects," and "develop training").
It is not strictly necessary to adhere to the verb-noun combination (e.g., "conference planning" will also work). However, the functional approach describes content better, makes subfolders more consistent, and drastically improves overall classification accuracy. Based on this author's organization's experience, a functional model yields up to 95% accuracy, compared with 80% for an organizational model and 65% for an ad hoc, three-letter initial model.
For example, in one organization's legal department, folders at the root level were labeled with three-letter initials of the responsible attorneys. Because of this, each attorney and his or her team set up their own subfolders. One chose "Contracts," one chose "Contracts and Agreements," one chose "AGMTS," and so on, so there was less consistency in folder names.
When setting up new folder structures for content, keep the following guidelines in mind:
* Think about how someone who has no history with the organization will want to find content five years from now. This level of simplicity and clarity will help the ICM tools, as well.
* Don't set up a structure that easily provides for a single document to reside in multiple places. Folder structures work better if there is a mutually exclusive place to put things unless an organization can ensure the indexing allows for reorganization. If a folder for e-mails and a folder for specifications exist, an e-mail about specifications will be difficult to locate consistently. Don't set up a folder called "e-mail"; e-mail is a delivery mechanism, not a document type.
* Use folder names that help identify common index fields for the documents placed in that area. Significant labor will be saved if a document automatically inherits the folder name as an index value.
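The last guideline, letting a document inherit folder names as index values, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names in `LEVEL_FIELDS` and the function `index_values` are hypothetical, and paths are assumed to be relative to the share root and written with forward slashes.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from folder depth (root first) to index field name.
LEVEL_FIELDS = ["function", "activity", "topic"]


def index_values(relative_path):
    """Map each folder above the file to an index field, top-down."""
    folders = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    # zip stops at the shorter sequence, so deeper folders are ignored
    # and shallow files simply get fewer index values.
    return dict(zip(LEVEL_FIELDS, folders))
```

With a functional folder model, `index_values("Plan Conferences/Venues/contract.pdf")` returns `{"function": "Plan Conferences", "activity": "Venues"}`. The same call on a folder named after a person would yield a value that says little about the document, which is one reason functional names are preferred.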
Other Clean-Up Activities
In addition to foldering issues, a number of clean-up activities are best handled at the workgroup level. Chances are, many of the following items will be the responsibility of a limited number of individuals within the workgroup, providing maximum clean-up benefit with minimum user impact. ICM tools can report on and often fix such issues as:
* Odd characters--ECRM systems have a hard time dealing with odd characters in file names (e.g., !#$%^&()).
* Long file paths--Tools to migrate content to ECRM systems may also have a problem with files or folder paths that are too long. An ICM report can help identify these.
* Duplicates--Duplicates are made for a variety of reasons. Based on this author's organization's experience, around 20-30% of the files on a typical workgroup share are duplicated and usually with about three copies per duplicate set. As it turns out, 15% of the duplicates cannot be deleted without causing some larger collection of files to lose integrity because they are part of an application, database, or web collection. There is not a one-size-fits-all solution for dealing with duplicates.
* Large files--Large files by themselves are not bad. But, if the file is unnecessary, the benefit from getting rid of it is higher than for getting rid of other types of files.
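The checks listed above can be approximated with a short Python sketch that reports file names containing odd characters, overly long paths, and sets of files with identical content. It is a minimal illustration, not a description of any ICM tool. The name `cleanup_report` and the 255-character limit are assumptions, so the limit should be replaced with the real one for the target system. The sketch only reports and deletes nothing, because some duplicates belong to applications, databases, or web collections.

```python
import hashlib
import os
from collections import defaultdict

ODD_CHARS = set("!#$%^&()")  # the example characters from the list above
MAX_PATH_LENGTH = 255         # assumed limit; check the target system's real one


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cleanup_report(root):
    """Return (odd_names, long_paths, duplicate_sets) for files under root."""
    odd_names, long_paths = [], []
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if ODD_CHARS & set(name):
                odd_names.append(path)
            if len(path) > MAX_PATH_LENGTH:
                long_paths.append(path)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable file: leave it out of duplicate checks

    # Only files of equal size can be identical, so hash just those groups.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_digest[file_digest(path)].append(path)
            except OSError:
                continue
    duplicate_sets = [sorted(p) for p in by_digest.values() if len(p) > 1]
    return odd_names, long_paths, duplicate_sets
```

Grouping by size before hashing avoids reading every file on the share. Adding a fourth list of the largest files from the same walk would cover the remaining item.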
Preparing for Transformation--User Level
Using standard naming conventions is not just polite, it is helpful for individuals using the file shares. Standard naming convention policies also help establish criteria that ICM tools can leverage.
File and Folder Naming Policy--'Dos'
When naming files and folders, follow these rules:
* Establish common spellings for common entities. In other words, spell out Department of Justice rather than use the acronym DOJ.
* Establish a common method for abbreviations (e.g., removing vowels only). As long as there is consistency, the ICM tools can act upon them accordingly.
* Structure the use of dates to be consistent across the organization. The majority of date-specific content is quarterly based, which can be identified by the month. If dates are used on files, begin with the month (i.e., MM-DD-YYYY). If the year is the most important element, consider using only the YYYY format. Do not spell out months. Remember that the network system tracks dates, but that "last access" dates and "create" dates are frequently inaccurate because they have been changed by a computer process, such as backing up the files. An individual may need to rely on file and folder dates using an ICM tool.
* Formalize the use of "draft," "ver. #," "superseded," and "final." This is a very important policy, as one of the greatest challenges with shared drives is being able to identify if a document has been declared as final. Also, a number of records categories use "superseded" as the trigger for retention on a document. Therefore, if a document falls under such a category, and a new version is created, the old version should be renamed with the word "superseded." A brief review of the retention schedule may be required when identifying these content types.
* Consider using postfix tags on folders indicating a lifecycle state and business function. This is useful for specific sets of documents that require greater control or that may be migrated to electronic content repositories in the future. The function may be one of several general functions (e.g., "ADM" for administration, "RM" for records management, "IT" for information technology, and "mixed" for more than one function), and the lifecycle may be represented, for example, by "drafts," "final," "reference," and "mixed." This may result in a name, such as "Correspondence [ADM, Final]."
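To show why consistent conventions help automated tools, here is a minimal Python sketch that parses a postfix tag of the form "Name [FUNCTION, Lifecycle]" and finds an MM-DD-YYYY date in a file name. The vocabularies are taken from the examples above. The function names and the exact bracket syntax are assumptions for illustration, not a prescribed standard.

```python
import re

# Hypothetical vocabularies taken from the examples in the text.
FUNCTIONS = {"ADM", "RM", "IT", "MIXED"}
LIFECYCLES = {"DRAFTS", "FINAL", "REFERENCE", "MIXED"}

TAG_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*\[(?P<function>[^,\]]+),\s*(?P<lifecycle>[^\]]+)\]$"
)
# MM-DD-YYYY, not embedded in a longer run of digits.
DATE_PATTERN = re.compile(
    r"(?<!\d)(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-(\d{4})(?!\d)"
)


def parse_folder_tags(folder_name):
    """Split 'Name [FUNCTION, Lifecycle]' into its parts, or return None."""
    match = TAG_PATTERN.match(folder_name)
    if not match:
        return None
    function = match["function"].strip()
    lifecycle = match["lifecycle"].strip()
    # Reject tags that are not in the agreed vocabularies.
    if function.upper() not in FUNCTIONS or lifecycle.upper() not in LIFECYCLES:
        return None
    return {"name": match["name"], "function": function, "lifecycle": lifecycle}


def find_date(file_name):
    """Return the first MM-DD-YYYY date in a file name, or None."""
    match = DATE_PATTERN.search(file_name)
    return match.group(0) if match else None
```

For the example above, `parse_folder_tags("Correspondence [ADM, Final]")` returns the name "Correspondence" with function "ADM" and lifecycle "Final". A name with a spelled-out month yields no date from `find_date`, which illustrates why the policy discourages spelling out months.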
File and Folder Naming Policy--'Don'ts'
When naming files or folders, avoid using:
* Employee or author names. Rename instead to reflect the function being performed.
* Abbreviations that are not pre-defined. Consider creating a wiki of the abbreviations that employees may use.
* Organizational units. These names often change.
* Document formats, such as Word, Excel, spreadsheets, and e-mails. ICM tools can easily identify what the format is based on the file extension or the contents of the file.
* Folder names where the words are concatenated or connected with alternative characters, such as underscores. Often these conventions (e.g., "Sample_files") indicate the file is linked by external web pages and should be managed in a dedicated repository.
* Obsolescence indicators unless they also indicate when the files can be deleted. Examples of these are "historical," "archived," "to be deleted," "old files," and "old versions."
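Some of these "don'ts" can be flagged mechanically. The Python sketch below checks a folder name for obsolescence indicators, document-format words, and underscores. The word lists come from the examples above, and the function name `naming_issues` is hypothetical. Employee names, undefined abbreviations, and organizational units are not covered because they would require lists specific to the organization. A flagged name is a candidate for human review, since the sketch cannot tell whether an obsolescence indicator also states when the files can be deleted.

```python
import re

# Hypothetical word lists drawn from the examples in the "don'ts" above.
OBSOLESCENCE_TERMS = ["historical", "archived", "to be deleted", "old files", "old versions"]
FORMAT_TERMS = ["word", "excel", "spreadsheets", "e-mails"]


def naming_issues(folder_name):
    """Return the naming "don'ts" that a folder name appears to violate."""
    lowered = folder_name.lower()
    issues = []
    if any(term in lowered for term in OBSOLESCENCE_TERMS):
        issues.append("obsolescence indicator")
    # Whole-word match so that "Password" is not mistaken for "Word".
    if any(re.search(rf"\b{re.escape(term)}\b", lowered) for term in FORMAT_TERMS):
        issues.append("document format")
    if "_" in folder_name:
        issues.append("words joined with underscores")
    return issues
```

Running this over the folder names from an inventory gives each workgroup a short list of folders to rename before classification or migration starts.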
File systems evolve over many years through different individuals, so perfect consistency is hard to find. However, these shares can be cleaned and prepared at a variety of organizational levels to make the classification effort better. It is easier to tackle these issues before the cleanup or migration becomes the critical path in an organization's ECRM project.
Brian Tuemmler can be contacted at firstname.lastname@example.org.