
The truth about taxonomies.


At the Core

This article

* defines a taxonomy

* explains how an organization can use and develop taxonomies

* identifies types of taxonomies

Imagine opening up file cabinet drawers, credenzas, or desk drawers and seeing papers and materials piled up and scattered with no rhyme or reason. Imagine information on a computer stored in one or two big dumping grounds according to the name of a person, or under titles that make sense only to the creator, with no breakdown into specified groupings. Chances are, in either case, it will take a long time to locate files. And what happens when there are new items to add? In both examples, valuable space is occupied but not organized to make efficient use of space or time.

In an office situation, a taxonomy or classification scheme to organize the paper and/or electronic documentation is required. Most organizations use some form of structure to manage their paper documentation. This may or may not be a documented procedure. It may or may not be a system that is widely understood by all employees. It may or may not reflect the business needs of the organization. When it comes to the electronic information in most organizations, it is often every computer or shared drive for itself. Often there are no guidelines or procedures for how these repositories of corporate information and knowledge are to be handled. Organizations frequently overlook the management of one of their most important business assets--information. Information is the fuel that keeps an organization running smoothly. Why then do organizations not give more time and attention to the management of this important asset? Unfortunately, no one discusses the need for better management of information until a crisis hits.

WHAT IS A TAXONOMY?

According to www.whatis.com:
   Taxonomy (from Greek "taxis" meaning
   arrangement or division and
   "nomos" meaning law) is the science of
   classification according to a pre-determined
   system, with the resulting catalog
   used to provide a conceptual framework
   for discussion, analysis, or information
   retrieval. In theory, the development of
   a good taxonomy takes into account the
   importance of separating elements of a
   group ("taxon") into subgroups
   ("taxa") that are mutually exclusive,
   unambiguous, and taken together,
   include all possibilities. In practice, a
   good taxonomy should be simple, easy
   to remember, and easy to use.


Another definition, from Jean Graef of the Montague Institute, is:
   "... structures that provide a way of
   classifying things--living organisms,
   products, books--into a series of hierarchical
   groups to make them easier to
   identify, study, or locate. Taxonomies
   consist of two parts--structures and
   applications. Structures consist of the
   categories (or terms) themselves and
   the relationships that link them
   together. Applications are the navigation
   tools available to help users find
   information."


Other terms associated with taxonomy development and implementation are controlled vocabulary, thesaurus, and user warrant. A controlled vocabulary is an indexing language (i.e., a standardized set of terms and phrases authorized for use in an indexing system to describe a subject area or information domain). A thesaurus is a type of controlled vocabulary that shows the hierarchical (parent-child), associative (related), and equivalent (synonymous) relationships among terms. Often, controlled vocabulary, thesaurus, and classification structure (taxonomy) are used interchangeably. User warrant is a justification for the representation of a concept or for the selection of a preferred term because of individual user needs.
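To make the three kinds of thesaurus relationships concrete, here is a minimal Python sketch, assuming a small in-memory thesaurus. The `Term` and `Thesaurus` classes are hypothetical and do not follow any particular standard or product: `broader` holds the hierarchical (parent) link, `related` the associative links, and `synonyms` the equivalent, non-preferred entry words that lead a user to the authorized term. The sample headings are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A preferred (authorized) term and its thesaurus relationships."""
    label: str
    broader: Optional[str] = None                     # hierarchical: parent term
    related: set[str] = field(default_factory=set)    # associative links
    synonyms: set[str] = field(default_factory=set)   # equivalent, non-preferred terms


class Thesaurus:
    """Hypothetical in-memory thesaurus, for illustration only."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self.lookup: dict[str, str] = {}  # lower-cased entry word -> preferred label

    def add(self, label, broader=None, related=(), synonyms=()):
        if broader is not None and broader not in self.terms:
            raise KeyError(f"broader term {broader!r} must be added first")
        self.terms[label] = Term(label, broader, set(related), set(synonyms))
        self.lookup[label.lower()] = label
        for synonym in synonyms:
            # Equivalence: "use Conjunctivitis for Pink Eye".
            self.lookup[synonym.lower()] = label

    def preferred(self, word):
        """Map a user's word to the authorized term, or None if it is not in the vocabulary."""
        return self.lookup.get(word.lower())

    def narrower(self, label):
        """Child terms: the inverse of the 'broader' link."""
        return sorted(t.label for t in self.terms.values() if t.broader == label)


th = Thesaurus()
th.add("Eye Diseases")
th.add("Conjunctival Diseases", broader="Eye Diseases")
th.add("Corneal Diseases", broader="Eye Diseases")
th.add("Conjunctivitis", broader="Conjunctival Diseases", synonyms=["Pink Eye"])
th.add("Keratoconjunctivitis", broader="Conjunctivitis", related=["Corneal Diseases"])
```

Looking up "pink eye" returns the preferred term "Conjunctivitis," which is the job an equivalence relationship does in an indexing system; a word outside the vocabulary returns `None` rather than a guess.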

In essence, a taxonomy is a hierarchical classification of headings constructed using the principles of classification, and a thesaurus supplies the commentary and links to navigate the taxonomy. In today's information-dependent environment, where we are receiving, accessing, and using information in its many forms, it is absolutely imperative that there are well-defined and documented structures in place. This ensures that the person needing the information receives it in the timeframe required.

However, the reality in most organizations is a

* lack of standardized procedures

* "stovepipe" approach to information processing

* lack of an information-sharing culture

* proliferation of legacy systems

Taxonomies can provide:

1. Identification--The taxonomy can help control the glut of information and identify where information should be stored by filtering, categorizing, and labeling information.

2. Discovery--Additional information on a topic can be inferred by seeing where an entry is placed in context within the taxonomy, which can provide serendipitous guidance to the person working on the issue.

3. Delivery--The taxonomy can improve the retrieval process. The taxonomy's controlled vocabulary enhances searching via browsing. Navigation paths, or "breadcrumbs," based on the taxonomy's hierarchy provide context and enhance free-text searching. For example, if a free-text search returns 100 hits for the word "bridge," the navigation path for each hit shows whether the record refers to a structure, a card game, or financing. It is not necessary to open each returned record to see how the word "bridge" is used.
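The "bridge" example can be sketched in a few lines of Python, assuming each taxonomy heading records its parent and each record is filed under one heading. The `PARENT` and `RECORDS` tables are hypothetical sample data, and the search is a naive whole-word match rather than any real search engine's behavior; the point is only that the breadcrumb is derived from the hierarchy, not from the record's text.

```python
# Hypothetical taxonomy: each heading maps to its parent heading (None = top level).
PARENT = {
    "Engineering": None,
    "Structures": "Engineering",
    "Recreation": None,
    "Card Games": "Recreation",
    "Finance": None,
    "Short-Term Financing": "Finance",
}

# Hypothetical records: (title, heading the record is filed under).
RECORDS = [
    ("Inspection report for the river bridge", "Structures"),
    ("Club bridge tournament results", "Card Games"),
    ("Bridge loan terms for the new office", "Short-Term Financing"),
    ("Quarterly budget summary", "Finance"),
]


def breadcrumb(heading):
    """Walk up the hierarchy and return the navigation path, top level first."""
    path = []
    while heading is not None:
        path.append(heading)
        heading = PARENT[heading]
    return " > ".join(reversed(path))


def search(word):
    """Naive free-text search that returns each hit with its breadcrumb for context."""
    word = word.lower()
    return [(title, breadcrumb(heading))
            for title, heading in RECORDS
            if word in title.lower().split()]


for title, path in search("bridge"):
    print(f"{path}: {title}")
```

Each hit comes back as, for example, "Recreation > Card Games: Club bridge tournament results," so the reader can tell the senses of "bridge" apart without opening the records.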

In addition to performing these basic functions, Graef suggests "A taxonomy should also inspire trust. The user should feel confident that the taxonomy will help him find the information he seeks--if it exists ... As more information gets into electronic format and becomes available over global networks, it gets harder to ensure that any one taxonomy is both sufficiently specific and comprehensive."

CLASSIFICATION THEORY

Although many logically different structures address taxonomies or classifications, two of the most widely known are the generic relationship and the whole-part relationship. In her article, "The Role of Classification in Knowledge Representation and Discovery," Barbara H. Kwasnik defines these relationships and provides the pros and cons of each.

1. Generic Relationship (Genus/Species) This is probably the truest form of taxonomy. It adheres to strict structural requirements and has the following properties: genus/species, inclusiveness, inheritance, transitivity, systematic and predictable rules for association and distinction, and mutual exclusivity. The following, taken from the Medical Subject Headings (National Library of Medicine), is an example of such a structure:

* Eye Diseases

  * Conjunctival Diseases

    * Conjunctival Neoplasm

    * Conjunctivitis

      * Keratoconjunctivitis

  * Corneal Diseases

Genus/species: A true hierarchy has only one type of relationship between its super and subclasses, which is known as the "IS-A" relationship. In a generic relationship, keratoconjunctivitis is a kind of conjunctivitis, which in turn is a kind of conjunctival disease, which in turn is a kind of eye disease.

Inclusiveness: The top class is the most inclusive class and describes the domain of the classification. The top class includes all of its sub-classes. Everything below eye diseases is an eye disease.

Inheritance: This ensures that everything that is true for entities in a given class is also true for entities in a subclass. Whatever is true of eye diseases (as a whole) is also true of conjunctival diseases, and so on. Attributes are inherited by a subclass from its super class. This is a downward flow of information.

Transitivity: All subclasses are members of not only their immediate super class but of every super class above that one. If keratoconjunctivitis is a kind of conjunctivitis, and conjunctivitis is a kind of conjunctival disease, then by the rule of transitivity, keratoconjunctivitis is also a kind of conjunctival disease. This is an upward flow of information.

Systematic and Predictable Rules for Association and Distinction: All entities in a given class are like each other in some predictable and predetermined way, and these entities differ from entities in sibling classes in some predictable and predetermined way. Conjunctival diseases and corneal diseases are alike in that they are both kinds of eye diseases. They differ from each other by some predictable and systematic criterion of distinction (in this case, the "part of the eye affected").

Mutual Exclusivity: A given entity can belong to only one class.

A well-known example of this type of taxonomy is found in biology: the Kingdom-Phylum-Class-Order-Family-Genus-Species classification of life as developed by Karl von Linne (also known as Linnaeus) and published in 1758. This marked the beginning of modern classification of plants and animals.

A generic relationship is most useful for representing knowledge in mature domains in which the nature of the entities and the nature of meaningful relationships are already known. It is useful for entities that are well defined and have clear class boundaries, for example, a subject body of knowledge.
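The properties of a generic (IS-A) hierarchy can be checked mechanically. The following is a minimal Python sketch, assuming each class is stored with exactly one superclass; the `GenericTaxonomy` class and the `organ` and `part_affected` attributes are hypothetical illustrations, not part of MeSH or of Kwasnik's article. Storing a single parent per class enforces mutual exclusivity, walking up the parent chain gives transitivity (and every chain ends at the root, which is inclusiveness), and merging attributes from the root downward gives inheritance.

```python
from __future__ import annotations

from typing import Optional


class GenericTaxonomy:
    """Hypothetical genus/species (IS-A) hierarchy: each class has exactly one superclass."""

    def __init__(self, root: str, **attributes):
        self.root = root
        self.parent: dict[str, Optional[str]] = {root: None}
        self.attrs: dict[str, dict] = {root: dict(attributes)}

    def add(self, name, superclass, **attributes):
        if name in self.parent:
            # Mutual exclusivity: a class may sit under only one superclass.
            raise ValueError(f"{name!r} is already classified under {self.parent[name]!r}")
        if superclass not in self.parent:
            raise KeyError(f"unknown superclass {superclass!r}")
        self.parent[name] = superclass
        self.attrs[name] = dict(attributes)

    def ancestors(self, name):
        """Transitivity: every superclass above `name`, nearest first, ending at the root."""
        chain = []
        node = self.parent[name]
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        return chain

    def is_a(self, name, cls):
        return name == cls or cls in self.ancestors(name)

    def inherited(self, name):
        """Inheritance: attributes flow down from the root to `name`; lower levels override."""
        merged = {}
        for cls in reversed([name] + self.ancestors(name)):
            merged.update(self.attrs[cls])
        return merged


eye = GenericTaxonomy("Eye Diseases", organ="eye")
eye.add("Conjunctival Diseases", "Eye Diseases", part_affected="conjunctiva")
eye.add("Corneal Diseases", "Eye Diseases", part_affected="cornea")
eye.add("Conjunctivitis", "Conjunctival Diseases")
eye.add("Keratoconjunctivitis", "Conjunctivitis")
```

Here `eye.is_a("Keratoconjunctivitis", "Eye Diseases")` is true by transitivity, keratoconjunctivitis inherits `part_affected="conjunctiva"` from conjunctival diseases, and an attempt to file conjunctivitis a second time under corneal diseases raises an error. The attribute names mirror the "part of the eye affected" criterion of distinction described above.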

2. Whole-Part Relationship (Tree) This type of relationship also progresses from the more general to the more specific. The following is an example of such a relationship:

* Automobile

  * Body

  * Engine Block

    * Pistons

    * Valves

  * Interior

    * Upholstery

The most marked difference between this classification model and the generic relationship is that the whole-part relationship does not assume the rule of genus/species and, therefore, also does not assume the rule of inheritance. A body is not a type of automobile. Upholstery is not a type of interior.

As opposed to genus/species hierarchies, where the flow of information is both vertical and lateral, in whole-part classifications the flow of information is only vertical. There are systematic and predictable rules for distinction. Pistons and valves are known to be different parts of an engine block. However, this relationship does not assume systematic and predictable rules for association; pistons and valves are not both kinds of engine blocks, nor can it be assumed that they share many attributes because of their sibling position within the hierarchy. They share the attribute that they are part of the engine block, but that is only a partial explanation of what they are.
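By contrast with the IS-A case, a whole-part tree stores only a "part of" link. The minimal Python sketch below, using a hypothetical `PART_OF` table built from the automobile example, deliberately has no attribute inheritance: walking upward tells you which wholes contain a part, and the only fact siblings are known to share is their containing whole.

```python
# Hypothetical whole-part tree from the automobile example: part -> containing whole.
PART_OF = {
    "Body": "Automobile",
    "Engine Block": "Automobile",
    "Pistons": "Engine Block",
    "Valves": "Engine Block",
    "Interior": "Automobile",
    "Upholstery": "Interior",
}


def containment_path(part):
    """Vertical flow only: list `part` followed by each whole that contains it."""
    path = [part]
    while path[-1] in PART_OF:
        path.append(PART_OF[path[-1]])
    return path


def siblings(part):
    """Other parts of the same whole; being siblings implies nothing else about them."""
    whole = PART_OF.get(part)
    return sorted(p for p, w in PART_OF.items() if w == whole and p != part)
```

`containment_path("Pistons")` gives `["Pistons", "Engine Block", "Automobile"]`, read as "is contained in," never as "is a kind of," and `siblings("Pistons")` gives `["Valves"]` without implying any shared attributes beyond membership in the engine block.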

Whole-part classifications or taxonomies are found in the organization of most Web sites and, more generally, in function-based enterprise classifications and geographic-based classifications. They are also used as corporate directories. They are more popular than generic classifications. Some argue, however, that they are not true taxonomies.

An interesting nuance in the relationship between those versed in traditional classification theory and those in the records management profession is the way in which each regards the utility of a taxonomy. The former seeks to define and represent the relationships between entities for the purposes of identification and retrieval and may adhere to a stricter structure for a taxonomy. The latter seeks to classify information not only for identification and retrieval but also, it can be argued, ultimately to apply retention requirements--something that would not be considered by the former. The need to incorporate retention requirements may result in a looser application of classification rules. Determining the reason for constructing a taxonomy and the needs of its users is paramount.

DEVELOPING A TAXONOMY

Whether the taxonomies are rooted in the generic model or the whole-part model, various formats may be employed when building taxonomies for organizations. The primary method of organizing the information may be supplemented by metadata fields, which act as additional entry points to the information. Taxonomies (classification structures) and metadata (cataloguing) work well together to provide a rich description of information.

In some cases, more than one taxonomy is required within an organization. A subject-based taxonomy may already be in use within a document management software application; however, there also may be a need to organize corporate documents within a functional taxonomy. A functional-based taxonomy already may be developed, yet an organization also may wish to represent the more esoteric elements of what each department does. In this case, a companion taxonomy based on organization structure may be useful. There also may be different taxonomies in use for each element in a metadata set. Each organization must assess its needs and requirements to decide on the taxonomy structure that is right for it.

According to Alan Gilchrist, a consultant with TFPL Ltd., it has been shown through research that most organizations

* were aware of the need to develop a better information structure

* were prepared to commit substantial financial resources to the project

* understood the importance of using a standardized terminology

* recognized that user participation and feedback was important

Unlike libraries, which have classification systems such as the Dewey Decimal Classification or the Library of Congress Classification, businesses have no universal standard taxonomies. Each business taxonomy must be developed based on the uniqueness of an organization's individual business requirements, its users, and its industry.

The following are suggested steps to follow in developing a taxonomy.

Plan and Gather Data

This first step is the most important because it provides the direction and foundation for the development of the taxonomy and its accompanying tools (controlled vocabulary, thesaurus). It requires a variety of activities. First, survey the organization to define the stakeholders. Include a cross sampling of the various levels (i.e., frontline workers, management, and executives) to ensure that each has had an opportunity to express his/her interests and experiences.

Then define the goals, scope, and rules for the project--much easier said than done--in order to provide the basis for measuring success. Next, develop a communications plan to keep the project team members and the organization as a whole informed about the importance of the project and its progress.

Once these items are in place, data gathering can begin. A taxonomy is media-independent; it represents the intellectual content of any format of materials. Use questionnaires, individual interviews, walkthroughs, and reviews of documentation (e.g., annual reports, research summaries, existing file plans) to gather enough information that represents the body of the data set to be classified. This material will provide not only the basis for the logical arrangement of the taxonomy but also the names of the individual headings within it. This process also provides an opportunity to develop key contacts with subject experts.

Build a Draft Taxonomy

During this phase, the project team will need to confirm the type of taxonomy that will be used. It is a good idea to contact other organizations within the same industry to review their taxonomy structures or conduct other research to identify existing taxonomies or thesauri elsewhere. Organizations may be able to benefit from the work done by others and avoid "reinventing the wheel" themselves.

If a business plans to use a technology application (e.g., electronic document management systems, categorization software), it will need to research it thoroughly and understand its capabilities and limitations. Many categorization software applications require a pre-built taxonomy.

Whichever taxonomy format is chosen, the most important decision is determining the "first cut," or how the rules of distinction will be applied. Kwasnik says, "this determines the shape and the representational eloquence of the classification." Is it best to build the taxonomy from the top down or from the bottom up? Should an organization build the top-level buckets first (based on pre-defined business requirements), or should it organize the buckets at the most granular level first (based on the content of the information) and decide on the major groupings for the top levels later? Some argue that a top-down approach, with its pre-defined business-requirements view of the world, can unduly influence the development of the taxonomy, so that a different way of organizing the information may be overlooked. In actual practice, building a taxonomy usually involves a marriage of the two approaches.

Now begins the arduous task of grouping similar documents together, deciding on the group names, and developing the controlled vocabulary and/or thesaurus. Users interviewed in step 1 will, for the most part, have supplied the vocabulary for these headings. This part of the development process is a matter of trial and error and requires a good deal of patience. Some people write the concepts or terms on cards; some use Post-it notes. Others use an Excel spreadsheet or employ specialized software packages, such as Visual Mind or Mind Mapper, to construct tree diagrams. A rule of thumb for the number of top-level buckets is seven plus or minus two; this is especially relevant for Web site taxonomies. As for depth, anything beyond four levels may inhibit a user's ability to navigate easily within the structure. Although a document may be classified in more than one place in the taxonomy, the document itself should be stored in only one location. Names of the bucket headings should not repeat unless--and this is critical--a navigation path showing the super classes can be supplied to provide the context of the bucket.
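The rules of thumb above lend themselves to a quick automated check of a draft. This is a minimal Python sketch, assuming the draft is held as nested dictionaries of the form `{heading: {subheading: {...}}}`; the `check_taxonomy` function and the sample draft are hypothetical. It reads "seven plus or minus two" as 5 to 9 top-level buckets, flags headings deeper than four levels, and flags repeated heading names that will need a navigation path.

```python
from collections import Counter


def check_taxonomy(tree, max_depth=4, top_range=(5, 9)):
    """Return rule-of-thumb warnings for a nested-dict draft taxonomy."""
    issues = []
    low, high = top_range
    if not low <= len(tree) <= high:
        issues.append(f"{len(tree)} top-level buckets (rule of thumb: {low}-{high})")

    names = Counter()

    def walk(node, depth, path):
        for name, children in node.items():
            names[name] += 1
            if depth > max_depth:
                issues.append(f"{' > '.join(path + [name])} is at level {depth}")
            walk(children, depth + 1, path + [name])

    walk(tree, 1, [])

    # Repeated names are acceptable only if a navigation path supplies context.
    for name, count in sorted(names.items()):
        if count > 1:
            issues.append(f"heading {name!r} appears {count} times; show its navigation path")
    return issues


draft = {
    "Finance": {"Budgets": {}, "Policies": {}},
    "Human Resources": {
        "Policies": {},
        "Recruitment": {"Interviews": {"Notes": {"Drafts": {}}}},
    },
}

for issue in check_taxonomy(draft):
    print(issue)
```

The check only flags candidates for review; as the article notes, a repeated heading is acceptable when its navigation path supplies the context, so the warnings are prompts for the project team rather than hard rules.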

There may be multiple facets or characteristics of topics that must be represented throughout the taxonomy. For example, policies, standards, research, or a geographic location may apply to numerous topics. A taxonomy may be constructed with those headings as subclasses under a super class. However, this may result in a very large taxonomy. Instead, consider constructing a list of such terms as metadata in a separate field that can be accessed via a controlled vocabulary search.
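A minimal Python sketch of that alternative, assuming a single metadata field ("document type") with its own controlled vocabulary: each record keeps one place in the taxonomy, while facets such as policies or research live in the metadata and can be searched across the whole hierarchy. The `DOC_TYPES` list, the `Record` class, and the sample records are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary for a "document type" metadata field.
DOC_TYPES = {"Policy", "Standard", "Research"}


@dataclass
class Record:
    title: str
    heading: str                                  # the record's single place in the taxonomy
    doc_types: set = field(default_factory=set)   # facet values stored as metadata

    def __post_init__(self):
        # Only authorized terms may be used in the metadata field.
        unknown = self.doc_types - DOC_TYPES
        if unknown:
            raise ValueError(f"not in controlled vocabulary: {sorted(unknown)}")


def facet_search(records, doc_type):
    """Find records by facet, wherever they sit in the taxonomy."""
    if doc_type not in DOC_TYPES:
        raise ValueError(f"{doc_type!r} is not an authorized term")
    return [r.title for r in records if doc_type in r.doc_types]


records = [
    Record("Travel expense policy", "Finance > Expenses", {"Policy"}),
    Record("Hiring policy", "Human Resources > Recruitment", {"Policy"}),
    Record("Customer survey results", "Marketing > Surveys", {"Research"}),
]
```

`facet_search(records, "Policy")` finds both policies even though they sit under different branches, so the taxonomy does not need a "Policies" subclass repeated under every heading.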

Once a draft has been constructed, it must be distributed and reviewed by a cross sampling of the organization in order to promote buy-in and provide feedback to refine the hierarchy. It also may be necessary to consult with subject matter experts at this time. When developing only a small area of the taxonomy for a pilot, keep in mind how other areas will affect the taxonomy's structure once they are added. Try to anticipate what may need to be changed. The draft then should be presented to a validation committee for approval.

Remember that the taxonomy will continue to develop over time. It is never truly finished; instead, it may be thought of as a "living" document. What is approved now may need to be modified later as the organization grows and evolves in order to ensure that the organization's assets are represented in the taxonomy.

Pilot

Now it is time to populate the taxonomy with records or information. At this point, an automatic categorization software application may be used to assist with the categorization of the concepts across the pre-developed taxonomy. Reviewing the pilot project with users and getting their input during this phase is crucial. Time well spent in the development of the taxonomy ensures its success.

Refine and Finalize

As a result of user feedback during the pilot project, the taxonomy will need to be refined; the controlled vocabulary and/or thesaurus may require modification. At this point, continue to launch the taxonomy in "chunks" if the organization is large; a small- to medium-sized organization, or one with limited resources, may decide to launch it in its entirety. It is crucial at this point to market and build on the success of the pilot project. This will encourage other groups to come forward to participate.

User Training

User training must be designed, developed, and implemented based on user needs. Front-line workers will need more information and hands-on experience than the executives. However, all employees will need to be informed. This is a great opportunity to reiterate the value of the taxonomy and how it enables better management of and access to an organization's most important asset--information.

Ensure Continued Development

The taxonomy and its related tools will evolve over time. To ensure their continued value to the organization, policies and procedures must be written to outline such things as who owns the taxonomy and at what level an end user or content manager will be able to add a heading, if allowed at all.

Ensure there is a mechanism in place for updating and revising the taxonomy through the use of a committee, a regular auditing process, or informal reviews. Proper staffing is also necessary to ensure that the development and ongoing maintenance of the taxonomy is sustained. What began as a project with specific goals and objectives evolves into an ongoing process that needs to be managed and maintained over time.

What is the role of records managers in taxonomy development? They have a good understanding of the organization and its business processes, as well as the terminology that may be unique to their industry. In addition, they have a deep appreciation of the needs and benefits of information retrieval. Those who develop a taxonomy, especially a functional one, become invaluable to an organization because they become experts on its overall operations. A further byproduct of the process is the development of a strong network of key contacts--an important marketing asset.

AUTOMATIC CATEGORIZATION AND TAXONOMY

Automatic categorization is the process by which technology is used to create clusters of documents based on criteria specified by the user, usually via a pre-supplied taxonomy, thesaurus, or controlled vocabulary list. Automatic categorization software provides the potential means to automatically file documents into either a predefined taxonomy or self-defined categories. Although it sometimes may be referred to as taxonomy or classification software, it does not currently perform this function well, if at all. Automatic categorization software cannot produce a well-formed hierarchical taxonomy or classification scheme. In reality, the software acts as an indexing agent. Its strengths lie in its ability to process (i.e., categorize or index) large data sets and to identify concepts and relationships that may not be readily apparent from the raw data, which an individual can then use to build or refine a taxonomy. [Editor's Note: See related article, "So You Want to Implement Automatic Categorization?" on page 60.]

There are two main types of categorization software: rules-based (e.g., Entrieva, formerly Semio, and Inxight) and statistical clustering (e.g., Autonomy, Mohomine, Hummingbird). The former employs predefined "if-then" statements to define the clusters. It uses linguistic analysis (the rules of grammar, language detection, proximity analysis, stemming) to extract concepts, not keywords, from the documents and assigns the documents to clusters. This method is not dependent on the information in the collection itself; that is, the rules may be applied against multiple collections. The latter employs mathematical algorithms to cluster like concepts together. Popular methods include term co-occurrence analysis and neural networks. This method categorizes documents based on the information in the collection itself.
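To illustrate the rules-based idea, here is a toy Python sketch of "if-then" categorization with crude stemming and a proximity test. It is an assumption-laden illustration, not how Entrieva, Inxight, or any other product works: the `stem` function is a naive suffix stripper standing in for real linguistic analysis, and the `RULES` table and its headings are hypothetical. Because the rules refer only to words, not to any particular document set, they could be run against multiple collections, as the paragraph above notes.

```python
import re

# Naive suffix stripping, standing in for a real stemmer (hypothetical, for illustration).
SUFFIXES = ("ions", "ion", "ings", "ing", "ed", "s")


def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


# Hypothetical if-then rules: IF stems a and b occur within `window` words THEN heading.
RULES = [
    ("Finance > Loans", ("bridge", "loan"), 3),
    ("Engineering > Structures", ("bridge", "inspect"), 5),
]


def near(toks, a, b, window):
    """Proximity analysis: do stems a and b occur within `window` words of each other?"""
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def categorize(text):
    toks = tokens(text)
    return [heading for heading, (a, b), window in RULES if near(toks, a, b, window)]


print(categorize("Bridge loans are approved by the treasurer"))
print(categorize("Annual inspection of the river bridge"))
```

The first document lands under the loans heading and the second under structures, because "loans" and "inspection" are reduced to the stems the rules expect. A document that matches no rule, such as "Club bridge night," is simply left uncategorized.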

Catalogue-by-example is an additional technique used by either camp (e.g., Inxight, Autonomy) to refine results. This technique compares new documents to a collection of exemplary documents, usually 20-30 documents for each heading, known as the "training set." During the training process, humans evaluate the appropriateness of the software-categorized documents and shift documents from one category to another as required. Through this iterative process, the software learns which documents should be categorized into which clusters, and in this way it refines its understanding of each concept. This technique works best if the taxonomy is stable; otherwise, time must be taken to retrain the system. It also works well when dealing with disparate data, for example, televisions and telephones, not robins and sparrows.
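The training loop described above can be illustrated with a small nearest-centroid categorizer. This is a stand-in technique chosen for clarity, not the method used by any vendor named in this article. Each heading's exemplary documents are combined into one term-count profile, a new document goes to the heading whose profile it most resembles (cosine similarity), and a reviewer's correction adds the document to the right heading's examples and triggers retraining. The class name, headings, and sample texts are hypothetical, and a real training set would hold the 20-30 documents per heading mentioned above rather than two.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ExampleCategorizer:
    """Hypothetical catalogue-by-example sketch: compare new documents to exemplars per heading."""

    def __init__(self, training_set: dict[str, list[str]]):
        # Copy the lists so reviewer corrections do not modify the caller's data.
        self.training_set = {heading: list(docs) for heading, docs in training_set.items()}
        self.retrain()

    def retrain(self) -> None:
        # Sum the term counts of each heading's exemplars. The sum is proportional
        # to the centroid, and cosine similarity ignores that scale factor.
        self.profiles: dict[str, Counter] = {}
        for heading, docs in self.training_set.items():
            total: Counter = Counter()
            for doc in docs:
                total.update(vectorize(doc))
            self.profiles[heading] = total

    def categorize(self, text: str) -> str:
        vec = vectorize(text)
        return max(self.profiles, key=lambda heading: cosine(vec, self.profiles[heading]))

    def correct(self, text: str, right_heading: str) -> None:
        """A reviewer moves a document to the right heading; the system then retrains."""
        for docs in self.training_set.values():
            if text in docs:
                docs.remove(text)
        self.training_set.setdefault(right_heading, []).append(text)
        self.retrain()


cat = ExampleCategorizer({
    "Televisions": ["flat screen television remote", "television screen size"],
    "Telephones": ["cordless telephone handset", "telephone dial tone"],
})
print(cat.categorize("new television remote"))  # Televisions
print(cat.categorize("handset with screen"))    # Televisions (a reviewer disagrees)
cat.correct("handset with screen", "Telephones")
print(cat.categorize("handset with screen"))    # Telephones
```

The sketch also hints at why disparate data works well: "television" and "telephone" exemplars share almost no vocabulary, so their profiles are far apart, whereas documents about robins and sparrows would share many terms and produce profiles that are hard to tell apart.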

The terms classification and taxonomy will continue to be used in their respective professional spheres; however, according to Liz Edols' article, "Taxonomies Are What?" in Free Pint e-zine: "Good taxonomies, based on the use of classification and controlled vocabularies, result in more efficient information retrieval. This ensures better productivity and less user frustration. Where do taxonomies fit into the information architecture paradigm? They are one part of it, though they may not always be referred to as a taxonomy."

Isn't this the ultimate goal? By using the standards and principles set out for the development of a hierarchical structure, whether it is called a taxonomy or a classification scheme, more efficient information retrieval, better productivity, and less user frustration can be achieved.

Types of Taxonomies

The following are some ways of representing information within taxonomies:

Functional: This type of taxonomy organizes itself along the different functions performed by an organization--both administrative and operational.

Pros

* is most in tune with organizational goals and business processes

* reduces silos of information

* reduces duplication

* makes it easier to find the most recent official document

* shows the flow of information

* naming of headings is unaffected by department name changes

* is recommended by ISO Technical Report 15489

Cons

* requires a new way of thinking about information

* needs buy-in from everyone

* requires one person to oversee the major shared "buckets"

* requires a liaison within each department that contributes to the "bucket"

* requires more training of employees

Department: This type of taxonomy is department-based and mirrors an organizational chart.

Pros

* is easy to build

* is easy to understand

* preserves the chain of command, avoiding internal "politics"

* allows an individual to work in only one area of the taxonomy

Cons

* requires taxonomy headings to be changed frequently

* department mergers and splits will force parallel changes

* splits information on a project or topic across the taxonomy if two or more people from different departments contribute information

* encourages a proprietary way of thinking

* does not represent what an organization actually does

* is difficult for new employees to use

* requires the management of departing employees' documents/files
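One practical difference between the functional and department approaches, that functional headings survive department name changes while department headings do not, can be shown with a toy record. This Python sketch is purely illustrative: the `Record` structure, the path format, and the department, function, and document names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    function: str    # what the organization does, e.g. a hypothetical "Finance/Accounts Payable"
    department: str  # which unit owns the record at the time of filing


def department_path(record: Record) -> str:
    """Department-based filing: the heading mirrors the organizational chart."""
    return f"{record.department}/{record.title}"


def functional_path(record: Record) -> str:
    """Functional filing: the heading follows the function performed."""
    return f"{record.function}/{record.title}"


invoice = Record("Vendor invoice 2003-014", "Finance/Accounts Payable", "Corporate Services")
before = (department_path(invoice), functional_path(invoice))

# A reorganization renames the department; the function performed is unchanged.
invoice.department = "Shared Services"
after = (department_path(invoice), functional_path(invoice))

print(before[0], "->", after[0])  # the department-based heading changes
print(before[1], "->", after[1])  # the functional heading stays the same
```

Every record filed under "Corporate Services" would need to be moved or relabeled after the reorganization, which is the parallel-change burden listed among the department taxonomy's cons.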

Subject: This type of taxonomy is based on the subjects of information with which an organization might deal.

Pros

* is appealing if there is a need to classify a discrete body of knowledge

* allows for greater depth if required

* is an excellent application for an EDMS (electronic document management system)

Cons

* is limited to that one body of knowledge

* may be difficult to select the terminology for the subject headings if the users of the taxonomy are both novices and experts in the subject field

Product/Services: This type of taxonomy is based on the products and services that the organization provides.

Pros

* provides good representation of information for product- or service-centered organizations

Cons

* is better suited to a stand-alone taxonomy than to representing an entire organization

Location: This type of taxonomy is based on the organization's geographic locations.

Pros

* is ideal for large multinational organizations

* allows for customization based on location

* allows for the incorporation of customs, culture, and regulations that are specific to the location

Cons

* is challenging to split information between the corporate office and branch locations

* requires a specialist in each country to create the taxonomy because of language and cultural nuances

* makes centralized control difficult

Managing Taxonomies Strategically

The following pearls of wisdom are gleaned from experience:

* Prepare for politics. The development of a taxonomy requires an enormous amount of diplomacy, tact, and negotiation.

* Taxonomy development is a process, not just a project. A taxonomy is never truly "finished."

* Serve the real needs of users rather than produce an "ideal" textbook taxonomy. Remember the concept of user warrant.

* Write policies and procedures for everything.

* Ensure there is a mechanism in place for the taxonomy's continued development and maintenance. This includes budgeting for the appropriate staff support required.

References

Adams, Katherine C. "Word Wranglers: Automatic Classification Tools Transform Enterprise Documents from 'Bags of Words' into Knowledge Resources." Available at www.intelligentkm.com/feature/010101/featl.shtml (accessed 9 January 2003).

BRINT Institute. "Classification." Available at http://portal.brint.com/cgi-bin/getit/links/Reference/Knowledge_Management/Knowledge_Retrieval/Classification (accessed 9 January 2003).

Content Wire. "Taxonomies." Available at www.content-wire.com/Taxonomies/Index.cfm (accessed 9 January 2003).

Edols, Liz. "Taxonomies Are What?" Free Pint e-zine. October 2001.

Graef, Jean. "Managing Taxonomies Strategically." Montague Institute Review. March 2001.

Hagedorn, Kat. "Extracting Value From Automated Classification Tools." Argus Center for Information Architecture, March 2001.

ISO Technical Report 15489-2. "Information and Documentation--Records Management, Part 2: Guidelines." 2001.

Kwasnik, Barbara H. "The Role of Classification in Knowledge Representation and Discovery." Library Trends, Vol. 48, No. 1, Summer 1999.

Lubbes, R. Kirk, 2001. "Automatic Categorization: How Does it Work, Related Issues, and Impact on Records Management." Presented at ARMA International 46th Annual Conference and Expo, 30 September 2001, Montreal, Quebec, Canada.

New South Wales Government. "Designing and Implementing Recordkeeping Systems (DIRKS)" and "Australian Standard 4390--Records Management." Available at www.records.nsw.gov.au (accessed 9 January 2003).

NISO Standard Z39.19-1993. "Guidelines for the Construction, Format, and Management of Monolingual Thesauri." Available at www.niso.org/standards/standard_detail.cfm?std_id=518 (accessed 9 January 2003).

Nua Knowledge Base. "Classification: The Cure for Information Overload." Available at www.nua.ie/nkb/classification/index.shtml (accessed 9 January 2003).

Olsen, Christopher J., 2001. "Buy It, Build It, Steal It--But Your Organization Needs a Taxonomy for Its Information Survival." Presented at ARMA International 46th Annual Conference and Expo, 30 September 2001, Montreal, Quebec, Canada.

Rosenfeld, Louis and Peter Morville. Information Architecture for the World Wide Web: Designing Large-Scale Web Sites, Second Edition. Sebastopol, CA: O'Reilly & Associates, 2002.

"Where to Find Taxonomy Information." Montague Institute Review. Available at www.montague.com/review/taxoninfo.html (accessed 9 January 2003).

Willpower Information. "Publications on Thesaurus Construction and Use." Available at www.willpower.demon.co.uk/thesbibl.htm (accessed 9 January 2003).

Denise Bruno, MLS, is an independent information management consultant. She may be contacted at denisebruno@sympatico.ca.

Heather Richmond, CRM, is Vice President, Marketing and Sales, at CONDAR Consulting Inc. She may be contacted at hrichmond@condar.ca.
COPYRIGHT 2003 Association of Records Managers & Administrators (ARMA)
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.


Article Details
Author:Richmond, Heather
Publication:Information Management Journal
Date:Mar 1, 2003