Introduction

THE THEME OF "ORGANIZING THE INTERNET" brings to mind the late 1950s folk-rock singer Jimmie Rodgers's song titled "The World I Used to Know." A great many developments have transpired in the world of information science since the seminal works of S. C. Bradford, Claude Shannon, Vannevar Bush, and numerous other pioneers. To those of us who have been in the information science field for several decades, the peek-a-boo devices such as Termatrex, Mortimer Taube's Uniterm cards, and discussion of pre- and postcoordinate indexing have given way to the world of browsers, HTML, XML, and numerous other ways of coding text and multimedia. The Internet and the World Wide Web have had a profound impact on how we go about storing and retrieving information. Document integrity has become transient, with little assurance that the location, existence, or even the content of a publication will be the same tomorrow as even a few minutes ago. We are often hard-pressed to determine whether a failure to retrieve a publication lies with the network infrastructure or with the publisher. The dream of universal bibliographic control seems quite remote. By being able to bypass traditional publication channels, anyone can publish virtually at will. The situation becomes more chaotic when we consider the increasing redundancy of knowledge and the rampant proliferation of misinformation and disinformation, to say nothing of social concerns with pornography, copyright violations, and other flagrant obtrusions into personal rights. Nevertheless, it behooves the information worker and the information user to make some sense of order if good information is to remain the basis of learning and decision making, and if documents are to continue as an archive of human knowledge.
As I reflected on writing this introduction, I began to ask myself just how far we have come from the world I used to know. The biggest paradigm change has not been that of technological development. Rather, the Internet has enabled virtually anyone with access to a computer to become intimately involved with the entire information cycle, namely, publishing, acquiring, organizing, and retrieving information, thereby bypassing information intermediaries such as indexers, reference librarians, and publishers. There is no question that the technology is vastly different from the early days of information retrieval. At the same time, the paperless office never materialized, nor are libraries being phased out as a result of the public's ability to access information directly from the desktop. More importantly, we still do not understand what constitutes information or how people make relevance judgments. Information retrieval (IR) to most searchers consists of character string matching between a query and a data source. In some ways, IR has even regressed, since now the trained search intermediary is no longer needed. The Internet consists of a vast unchecked sea and searching is referred to as "surfing." The issue is further complicated by the proliferation of document formats, incompatibility between generations of hardware, and questionable scalability of software. Even in doctoral seminars that I teach, I find the need to explain Boolean logic and patiently teach students how to develop search strategies, formulate queries, and even how to compute the precision of searches. While the Internet has empowered the general public to perform tasks once done by professionals, it has also created a large body of knowledge needing organization. Vocabulary control is extremely limited at best. The average Web searcher has little understanding of the search process, much less a fundamental ability to determine the effectiveness or exhaustivity of a search. People rely on a limited set of search tools, especially general search engines such as Google, not realizing that less than 20 percent of all indexable documents are being accessed. Beyond that, there are many electronic text and multimedia publications that are not indexed at all by Web crawler software. This part of the Internet is called by many names, such as the Invisible Web, the Opaque Web, the Hidden Web, the Dark Web, and so on.
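For readers who have not met the Boolean operators and the precision measure mentioned above, the following is a minimal sketch rather than a description of any particular retrieval system. The document collection, the index terms, and the relevance judgments are all invented for illustration, and the function names `docs_with` and `precision` are my own. Precision is taken here in its usual sense: the share of retrieved documents that are judged relevant.

```python
# Toy collection: document id -> set of index terms (all values are made up).
DOCS = {
    1: {"internet", "indexing", "metadata"},
    2: {"internet", "libraries"},
    3: {"indexing", "libraries", "metadata"},
    4: {"internet", "metadata", "portals"},
}


def docs_with(term):
    """Return the ids of documents whose index terms include `term`."""
    return {doc_id for doc_id, terms in DOCS.items() if term in terms}


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (0.0 if none retrieved)."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Boolean operators map directly onto set operations.
and_query = docs_with("internet") & docs_with("metadata")   # internet AND metadata
or_query = docs_with("internet") | docs_with("libraries")   # internet OR libraries
not_query = docs_with("internet") - docs_with("portals")    # internet NOT portals

# Hypothetical relevance judgments supplied by a searcher.
relevant = {1, 3}

if __name__ == "__main__":
    for label, result in [("AND", and_query), ("OR", or_query), ("NOT", not_query)]:
        print(label, sorted(result), precision(result, relevant))
```

Narrowing a query with AND or NOT shrinks the retrieved set, while OR enlarges it; the precision figure shows how much of what comes back is actually wanted, which is the judgment most Web searchers never make explicitly.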
In all fairness, the Internet, especially the Web, is still in its infancy. Techniques for publishing, organizing, and accessing content are changing rapidly as a result of new technological developments, the competitive information marketplace, and the growing sophistication of searchers. As always, libraries are instrumental in promoting access to online publications, especially to those that belong to the invisible Web. Librarians are also educating users through the cooperative development known as information literacy. Information literacy standards were developed by AECT (the Association for Educational Communications and Technology) and AASL (American Association of School Librarians). ACRL (Association of College and Research Libraries) supports similar standards for higher education. The dynamic nature of the Internet is going to require methods of organization well beyond the relatively static classification schemes that have served libraries for many years. New methods of organization must take into consideration more sophisticated techniques for content description in order to minimize such problems as retrieving pornography or to be able to detect plagiarism and copyright violations. Eventually the exponential growth of the Web will itself subside. The Internet is not free. Market regulations will eventually restrict the free ride enjoyed by Web publishers. Publication patterns will be easier to recognize as publication activity becomes more linear. The end result will be that users will be able to discriminate in terms of specifying what they want or avoiding the retrieval of unwanted items.
In terms of what "organization" means, I took a fairly broad approach. As in many natural systems, information on the Internet is self-organizing. For example, some search engines determine what is important to index, or in what order search results are displayed, based on counts of the links that point to a site. Other knowledge bases define themselves by document type, such as usenets, or come into existence by their uniqueness--blogs (Web logs) come to mind. It seems that for many Web users, ease of use and access dictate knowledge sources. At the same time, there are more organized efforts to identify and make Internet sources accessible. These efforts may simply be a subject sampler of links to relevant sites supporting a subject area, field, or discipline. For example, the invisibleweb.com site provides classified links to Web-based databases that are not indexed by general search engines. Other sources, such as the Internet Public Library (http://www.ipl.org/ or http://www.libraryspot.com/), are portals that offer classified access to information on a much broader basis. The Open Directory project, also referred to as DMOZ, attempts to create a definitive catalog of the Web. The Open Directory is the most widely distributed database of Web content classified by humans. The Open Directory powers the core directory services for the Web's largest and most popular search engines and portals, including Netscape Search, AOL Search, Google, Lycos, HotBot, DirectHit, and hundreds of others.
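The idea of ordering search results by the number of links that point to a site can be sketched in a few lines. This is only an illustration of raw in-link counting under assumptions of my own: the link graph, the `.example` site names, and the functions `inlink_counts` and `rank_by_inlinks` are hypothetical, and production search engines use considerably more elaborate link analysis than a simple count.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "b.example"],
}


def inlink_counts(links):
    """Count how many distinct pages link to each target page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # ignore duplicate links from one page
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts


def rank_by_inlinks(candidates, links):
    """Order candidate pages by descending in-link count, then by name."""
    counts = inlink_counts(links)
    return sorted(candidates, key=lambda page: (-counts[page], page))


if __name__ == "__main__":
    for page in rank_by_inlinks(list(LINKS), LINKS):
        print(page, inlink_counts(LINKS)[page])
```

In this sense the collection organizes itself: no cataloger assigns the ranking, which emerges from the linking behavior of Web publishers.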
Ad hoc classification systems are offered by directory search engines such as Yahoo, and other search engines like Google permit users to search by media type or document format, such as newspapers. Efforts are underway to improve basic document description beyond the limitations of HTML. Extensible Markup Language (XML) and its various permutations are but one example. In the library field, the Dublin Core Metadata Initiative (DCMI) is a notable example. Beyond large-scale efforts to identify and organize Internet content, many local efforts structure learning tools that provide quality information filtering of relevant Web information. They go by names such as WebQuests, scavenger hunts, and Tracer Bullets. Perhaps someday these efforts will fuse into clear-cut methods of organization that lead to the development of information standards by which Web content can be created. At this time, all such projects can be construed as efforts to organize the Internet.
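To make the notion of document description concrete, Dublin Core elements are commonly embedded in the head of an HTML page as `meta` tags whose names carry a `DC.` prefix, with a `link` element declaring the element-set namespace. The sketch below generates such tags; the function name `dublin_core_meta` and the sample record are hypothetical and chosen only for illustration, and the sketch covers simple string values rather than the full range of DCMI encoding options.

```python
from html import escape

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <link>/<meta> lines."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, value in record.items():
        # Escape the value so characters such as & and " stay valid in HTML.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record describing an invented Web guide.
    sample = {
        "title": "A Sample Guide to Web Resources",
        "creator": "Doe, Jane",
        "subject": "Internet; Indexing",
        "type": "Text",
        "language": "en",
    }
    print(dublin_core_meta(sample))
```

A crawler or portal that reads these tags can index a page by creator, subject, or type instead of relying solely on character strings in the body text.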
The purpose of this issue of Library Trends is to describe some of these efforts. Leading educators, librarians, and researchers have contributed articles that represent an integrated set of ideas but also serve to reflect the diversity embodied in the theme of "Organizing the Internet." The articles consist of general surveys designed to inform as well as in-depth investigations of specific issues and services.
It is appropriate to have the first article by John Carlo Bertot address the contributions and activities of libraries in a networked environment. Ever since ancient times, libraries have acted as organizers and caretakers of recorded knowledge. In addition to creating and maintaining major classification schemes such as Dewey, Library of Congress, and UDC (Universal Decimal Classification), libraries also pioneered the first major foray into electronic information retrieval. The Dialog system at the Lockheed facility in Palo Alto laid the groundwork for online searching and related software utilities that provide unique indexing capabilities for electronic files. Libraries have also contributed to knowledge organization through a variety of OPACs (Online Public Access Catalogs) and other public and technical services innovations. As libraries move away from these traditional systems grounded in service quality and outcomes frameworks, Professor Bertot discusses the challenges information professionals face in the networked environment.
To continue on the track developed by Bertot, the contribution from Adrienne Franco focuses on finding quality information on the Internet. She makes the point that librarians have long sought to select, organize, and evaluate information on the Internet. Her discussion includes the initial production of "webliographies" by librarians and then focuses on librarian-produced portals and portals with a high level of librarian participation.
Jerry D. Campbell examines portals from a more theoretical perspective. He discusses the Scholar's Portal project that builds on the need for a research library portal. Essentially, a scholar's portal (SP) describes efforts to create specialized subject portals for researchers, until such time as the Web becomes a digital library with seamless access to scholarly information. He builds on an earlier article by outlining the larger context within which SP falls.
As mentioned earlier, document organization is often by media type or even by domain name. A particularly good example of this is government information. Greg R. Notess provides a history of the government on the Web. He makes the point that the government is not only a major content provider on the Internet but also a source for the organization of the content. Patricia Diamond Fletcher continues the discussion of the government's involvement in organizing the Internet by providing a firsthand analysis of FirstGov.com based on a recent National Science Foundation-funded research project. FirstGov is the portal to U.S. government information and services. Her case study analyzes the reasons leading to the success of the portal.
Quite often the value of portals is to expose users to sources that they might not normally encounter in using general search engines. Even the best search engines index less than 20 percent of what is termed the indexable or "visible" Web. Many persons, even professional researchers, are not familiar with the invisible Web. Any discussion of organizing the Internet needs to address the invisible Web. The invisible Web consists of major databases and document formats that are not indexed by most general search engines. Less familiar, even to experienced searchers, are terms such as the "opaque Web" and the "Private Web." Chris Sherman and Gary Price discuss various permutations of the invisible Web. Their article should be of interest especially to end-users of the Web.
Classification of Web-based information is often determined by popularity; thus, user preferences often prompt new methods of organization and access. Amanda Spink provides an overview of recent research exploring what we know about how people search the Web. Her paper reports selected findings from studies conducted from 1997 to 2002 using large-scale Web user data provided by Excite, AskJeeves, and AlltheWeb. The results of the research will have an impact on subsequent methods of organizing the Web according to use.
Any discussion of publication activity or use cannot avoid the topic of copyright. More than ever before, Web publishers are blatantly ignoring intellectual property rights, especially with respect to multimedia. This leads one to ask if organizers of Web publications are also contributing to copyright violations by inadvertently facilitating access to questionable material. Part of the problem lies in attempting to interpret current legislation regarding ownership of electronic publications. Rebecca P. Butler discusses implications for organizing the Internet from the viewpoints of both the owners/publishers and users. She analyzes several strands within the dilemma of the Internet and copyright. Web-based copyright issues are also addressed by Jane L. Hunter in the context of XML-based vocabularies developed to define usage and access rights associated with digital resources.
The next two contributions focus on specific aspects of organization, including discussion of metadata standards and issues of access based on document structure and content. Jane L. Hunter provides an overview of key metadata research issues and current projects and initiatives for improving our ability to discover, access, retrieve, and assimilate information on the Internet. Of particular interest to the end user is her review of metadata search engine research. Kevin Crowston and Barbara H. Kwasnik continue the discussion of vocabulary control in a somewhat different light. Their paper discusses the possibility of improving information access in large digital collections through the identification and use of document genre as a facet of document and query representation. They begin with a framework of the information retrieval problem with respect to genre and finish by outlining a research protocol that would provide guidance for identifying, using, and representing Web document genres.
Sometimes the larger efforts to make Internet documents available fail to fit the local needs of individuals. For example, a teacher in the classroom may have his/her own idea of appropriate resources to complement a lesson plan. Also, traditional methods of classification fail to reflect the constructivist paradigm popular in some educational environments. The belief is that, in order to engage students for maximum learning, there must be some way to not only identify relevant Web sites but also develop ways to explore them. Thus, educators and librarians like to develop customized resource lists that are then also made accessible to other Web users. Don E. Descy describes a variety of tools and techniques that essentially represent an ad hoc method of organizing Internet resources. He makes the point that teachers can construct Web learning environments containing safe sites for students. These can also act as quality information filters similar to the current awareness services as implemented in special libraries in the early days of automation.
In summary, the authors have addressed several dimensions surrounding efforts to organize the Internet. The contributions are of particular value because the content should be of interest to a wide spectrum of users, including librarians, educators, and academic researchers. Furthermore, many of the topics are treated in a fashion that ensures their relevance for a significantly longer period of time than that associated with most activities in a rapidly changing technological world.
Andrew G. Torok, Professor, Department of Educational Technology, Research, and Assessment, Northern Illinois University, DeKalb, IL 60115
ANDREW G. TOROK is Professor Emeritus in the Department of Educational Technology, Research, and Assessment at Northern Illinois University, DeKalb. Formerly he taught for several years in the department of Library Science, also at Northern Illinois. He teaches classes in computer networking, online education, instructional technology, and several seminars that support a large doctoral program. Dr. Torok has been active in the information industry for four decades, working as a teacher, researcher, indexer, and abstractor. He has published and presented papers nationally and internationally and served as a consultant. His research interests have included ergonomics issues relating to technology, online user studies, and communication studies. His current research interests include technology ROI and electronic learning. He also continues to engage in technical writing.