Reviewing the World Wide Web--Theory Versus Reality.
Three attempts at consensus lists of evaluation criteria for the World Wide Web are compared with reviews in Choice magazine. Not only is there little agreement among sources on the most important or appropriate criteria for judging the value of a Web site, but few of the criteria appear in the sample reviews, suggesting a continued lack of consensus on these criteria. The extreme rapidity of change in the Web is suggested as a primary reason for this continuing state of disagreement.
The World Wide Web has been likened to a bookstore or library in which all the items lack titles, pages, indexes, or even covers, and in which the entire stock is just piled up in the middle of the floor (Gorman, 1995). While this may be an exaggeration, the Web's rapid growth and its ease of access, both for the reader and for the publisher, have meant remarkable growth in a new communications format in a very short time. Such growth would create difficulties in selection even without the lack of a traditional "bibliographic" apparatus.
Yet this growth has been essentially uncontrolled. Within very broad limits, nearly anyone can "publish" anything on the Web without the usual limitations of publishing. The results of this "anarchy" or "democracy" (the preferred term seems to vary with the observer) may be seen in a recent analysis of Web sites (Connell & Tipple, 1999). A sample of one week's worth of ready reference questions asked at a public library was searched on the Web using the respected AltaVista search engine. Answers to each question were first verified in two separate printed sources and then sought through AltaVista. The Web sites on the first two screens of results (up to twenty per question) were then each rated for accuracy in answering the question.
A total of 1,160 different cited Web pages were retrieved in answer to the sixty questions. Of these, 144 citations were dead links--the pages were no longer available when the search was done. In addition, 241 of the citations were duplicates of other pages (thirty-five of the duplicates were also dead links). More to the point of the present article, of the 1,010 sites, 160 (15.8 percent of live sites) provided complete and correct answers to the questions, and an additional 115 (11.4 percent) provided correct but incomplete information (such as a phone number without the area or country code). Eighty-nine (8.8 percent) gave incorrect information. The remainder--646 sites, or 64 percent of the sites found--provided no information to answer the question at all. In brief, this study, the first of its kind to analyze a Web search engine as if it were a ready reference tool, found that the vast majority of sites obtained by an experienced searcher were irrelevant to the question, and that about a fourth of the sites which did contain relevant information provided incorrect information. Given these data, which tend to confirm a common impression of the Web (that there is a high proportion of "noise"), analysis of site quality is a critical need.
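As a check on the figures above, the percentages can be recomputed from the raw counts. This is a minimal Python sketch; it takes the 1,010-site base as given (the text does not show how that total was derived from the 1,160 citations) and treats "sites containing relevant information" as every site except the 646 that offered no answer. The variable and function names are illustrative only.

```python
# Outcome counts reported for the Connell & Tipple (1999) sample.
# ASSUMPTION: the 1,010-site base is taken as given, not re-derived.
outcomes = {
    "complete and correct": 160,
    "correct but incomplete": 115,
    "incorrect": 89,
    "no relevant information": 646,
}

total = sum(outcomes.values())  # 1,010 sites


def share(count: int, base: int) -> float:
    """Return count as a percentage of base, rounded to one decimal place."""
    return round(100 * count / base, 1)


percentages = {label: share(n, total) for label, n in outcomes.items()}

# Sites with any relevant content: correct, partial, or wrong answers.
relevant = total - outcomes["no relevant information"]
wrong_among_relevant = share(outcomes["incorrect"], relevant)

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print(f"incorrect among sites with relevant content: {wrong_among_relevant}%")
```

Running it reproduces the reported 15.8, 11.4, 8.8, and 64 percent figures, and shows that 89 of the 364 sites with relevant content (about 24.5 percent) were wrong, which is the "fourth" cited in the text.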
However, in addition to the quality of the sites themselves, one must also consider the user. In a classic article, Marcia Bates (1984) noted a tendency for the user in a given search to be satisfied with a final set of about thirty items regardless of the size of the actual retrieval or, apparently, of the precision of the search. Thus, if a search strategy retrieves fewer than the "magic" thirty, the searcher attempts to broaden the search; if a search retrieves many more than this number, the searcher attempts to limit the size. The problem here is that retrieval size, rather than usefulness or relevance to need or subject, becomes the most important (if not the only) criterion. Bates provides a number of possible reasons for the phenomenon, but the important point here is that the phenomenon not only still appears to exist (at least in this writer's experience in teaching information retrieval), but has also become institutionalized in electronic search systems. Notably, many library-oriented systems, especially on the Web, tend to have a maximum default of forty to fifty items in a print or download. For example, while the Web-based versions of H. W. Wilson's databases permit a search result of apparently any size, no more than fifty items can be printed or downloaded regardless of the size of the retrieval set (WilsonWeb, 1999).
Going beyond general impressions, the fact that this number (whether thirty or fifty) appears to be developing into an industry standard suggests that most users are comfortable with retrieval sets of this size. And, given that most Web search systems provide relevance ranking (often with the criteria for this ranking quite vague or impossible to obtain), there is a high probability that users will review only about fifty sites before selecting those that they will use. Thus, it is important that the top-ranked sites be of a fairly high level of quality as well as relevance.
Interestingly, however, there seems to have been little concern about quality as such in the online environment until about 1990. A review of the literature in 1989 found that there were no guidelines, lists of criteria, or other tools for evaluation (Juntunen, Mickos, & Jalkanen, 1995, p. 207). However, this appeared to be changing, as some claimed that the 1990s had become the "decade of quality" (Jacso, 1997, p. 236).
The fact that Jacso's literature review barely mentions the end user and does not refer to the World Wide Web at all is an indication of the rapidity of growth and degree of change in the online world. Yet barely three years later, a considerable literature exists specifically on the evaluation of Web sites, with a substantial fraction of this literature concerning the Web itself (Auer, 1999). However, even given this discussion, there seems to be little consensus on how far traditional quality measures of the sort discussed in the Jacso piece apply to the Web, and on what new measures, if any, should supplement or replace them.
Since a number of attempts have been made to derive a comprehensive list of Web quality measures, it should be helpful at this time to examine these to see how close the profession is to reaching a consensus. In this process, a useful reality check may be had by referring to reviews in Choice magazine.
Choice is well known as a reviewing medium for academic libraries. Beginning in 1997, it has published a supplement reviewing Web sites, applying much the same approach to selection and reviewing as it has to more traditional formats (including other online and CD-ROM sources). The test sample used for this article is drawn from the 1998 edition (Choice, 1998). Issued as a separate supplement, this edition includes a total of 482 reviews, most based either on the 1997 supplement or on reviews that appeared in volume 35 of Choice, although there are also ninety-two reviews written for the 1998 supplement. All sites reviewed were verified in June 1998, and titles, URLs, and text were rewritten as necessary (Graf, 1998, p. 3). For use in this discussion, forty-eight of these reviews were drawn as a systematic random sample, with reviews coming from all major subject sections of the source.
Given Choice's strong reputation and considerable experience in reviewing academic materials, the sample should reflect not only high-quality reviews (as reviews), but also a reasonable sample of the criteria actually used by academics in evaluating Web sites. These reviews are compared with three lists of Web site quality criteria, with the understanding that the criteria apply to the sites rather than to the reviews themselves. However, it is not unreasonable to judge that, if a reviewer feels a need to mention a characteristic of a site, such mention implies a quality measure.
The first set of criteria discussed are those developed by the Southern California Online Users' Group (SCOUG) in 1990. The fourth annual retreat of this group was attended by librarians and searchers from all over the United States, as well as representatives of several online services and producers of online databases working on the theme "Measuring the Quality of the Data." The original goal was to provide a consumer-oriented guide to "judging the quality and reliability of databases in terms of their design, content and accessibility" (Basch, 1990, p. 18). The degree of change in the online database field may be indicated by the fact that this group considered only three types of databases--bibliographic, full text, and directory. There was no discussion of image databases, nor, for that matter, of CD-ROM or other laser disk formats and apparently no reference to the Internet/Arpanet at all. However, the guidelines were widely disseminated through conference presentations at NFAIS and publications and seem to have led to work by other groups to develop similar checklists of quality criteria (Basch, 1995, pp. 6-7).
It is telling that, even as late as 1995, the discussion of quality in the electronic environment dealt only with commercial online services and with CD-ROM databases, with very rare, if any, mention of the developing Internet information systems. In fact, it may be relevant, although no research appears to have been done on the topic, that the quality discussion starting in 1990 seems to have diminished by 1999, while the huge growth in the World Wide Web started about 1994/1995.
A number of the discussions of database quality did address the growing number of end users who were searching, but generally this is in passing--the assumed searcher was the professional. Whether search intermediary or subject expert, the searcher was a person who had at least some experience and some training in the principles of information retrieval. And it was assumed that this person would search databases of some kind which were produced by a commercial, academic, or relatively traditional "publisher"--the concept of the author being a common producer of the database was not mentioned at all.
Curiously, this quality-of-database literature often cites the Total Quality Management (TQM) literature, sometimes explicitly stating that TQM, then just being applied to the "manufacture" of information in electronic form, appeared to be the cause of the interest in quality (Jacso, 1997, p. 232). The curiosity, of course, is the lack of reference to the vast literature on information quality in a more traditional form--namely, the book review. Even setting aside the vast number of published reviews themselves, an ongoing analysis of the literature has confirmed the existence of over 3,000 items (primarily books and articles) describing the reviewing process, recommending criteria for reviewers and reviewing, or analyzing the process (Sweetland, in progress).
One of the issues connected with the SCOUG guidelines was the lack of distinction between responsibilities of the database producer and the database vendor/service (Granick, 1991). This issue does not seem to occur in current Web evaluation--in effect the developments of the last decade have verified the SCOUG approach--input distinctions are of little interest to the actual user of online information. Only the results are considered. Of course, this is not at all a new approach--in the vast literature on evaluation of printed source materials, there is very little discussion of the distinctions among the author, the editor (or the publisher), and the technical issues of printing and binding. In a book, for example, access to the content is also affected by such technical issues as kerning, size of text block versus page size, fonts used, clarity of printing (especially for graphics), as well as by the table of contents and the indexing. All of this is part of the clearly identifiable objective existence of the book in the reviewer's hands. Certainly one finds reviewers distinguishing between the author and the publisher in such comments as "the editor should have caught these spelling errors," but there is no question that there was one single entity, the publisher, who should have done these things. In this sense, then, developments on the Internet at the end of the century are actually returning at least this part of perception back to the more familiar print environment--the user cares about the product as seen, and comments on it with no real concern (and no real need for concern) for exactly who did exactly what.
Be that as it may, the SCOUG criteria do reflect the online information environment as of the early 1990s. They do not deal with non-text images (such as photographs) at all, and they do not mention CD-ROMs or the Internet. However, the criteria are still of interest for what they do and do not include.
SCOUG CRITERIA APPLIED TO CHOICE REVIEWS
The SCOUG criteria, and the comparison with Choice reviews, follow in order of their presentation in Basch's (1990) report.
Consistency
There is very little information in the comments on the specific meaning of this term, but overall it appears to mean that each record in a database should follow the same rules and patterns. One could argue that this criterion, which is listed first, is too often violated by Web designers.
Only two of the reviews comment on this feature. In essence, the comments were regarding "bibliographic" sites that provided access to a number of other sites but for which the reviewed site provided some sort of consistent access.
Coverage/Scope
Most of the specifics here relate to selection of material--e.g., are periodicals indexed cover to cover? However, the primary questions are there: How well is the field covered, and how authoritative is the database? These questions also appear in the more recent criteria.
Forty-six of the Choice reviews commented on this aspect of databases at some length, often providing detailed discussions of the contents of the database; two reviews in effect relied on the title for the only information about coverage. However, few of the reviews commented specifically on the authoritativeness of the database or any apparent gaps in coverage.
Timeliness
In addition to what one would expect, one of the specific questions relates to differences in load cycles among database services. That question arose in an environment in which the same content was available from multiple sources, and it remains relevant today, when the Web may not be the only source of a given database; in fact, however, it does not appear in most lists of Web criteria.
Fourteen reviews include comments on timeliness, in most cases referring to how quickly time-sensitive material was updated (e.g., news or social statistics); frequency was not mentioned (other than occasionally by words like "often"). None of the sample reviews mentioned the date(s) of production. Nor, for that matter, did any of them note differences in updating between the Web version and any other platform.
Accuracy/Error Rate
Questions here include reference to typos and a number of quality control questions, as well as sources of data. Current criteria refer to the errors but do not ask about quality control--the difference being presumably in the lack of an identifiable entity which is supposed to engage in quality control and which could be queried about the process. In the Web environment, the last question in this category is particularly interesting--are searchers compensated for unusable information?
Only eleven reviews comment in this category, usually by referring to sources of the information; none referred to errors other than commenting on dead hyperlinks. However, one of the SCOUG questions did ask if the database allowed for user suggestions to correct errors. Although none of the Choice reviews mentioned this use of a contact system, six of the reviews did indicate that the site permitted an e-mail contact.
Accessibility/Ease of Use
For the most part, the specific questions here relate to the databases as they existed in the late 1980s. However, given the recent assumption that the Web, with its search engines, is the "way cool" wave of the future, many of the questions indicate what we have lost in the typical Web search. The following are among the elements that are suggested as part of an ideal database: full and variable proximity searching; word adjacency limits, if any; search of literals and stopwords as part of phrases; automatic pluralization, with ability to turn on/off; equivalencies (such as English versus American spelling); selection of terms directly from an online thesaurus; ability to save a search strategy and reuse it; multilingual thesaurus; online thesaurus; depth of subject indexing; and which data elements are searchable versus only displayable. SCOUG also asks several questions about KWIC (key word in context) display.
None of the Choice reviews went into such detail. However, thirty-seven of them did comment on the ease of access (usually merely to say the site was easy to use)--often, however, with some comments on search capability, such as free text keyword or Boolean search. A sign of how much the Web has changed the online environment may be found in the fact that only one review mentioned the existence or use of a thesaurus--one of the specific issues in the SCOUG list of questions on this topic.
Integration
This category essentially asks how consistently the whole system behaves--e.g., can multiple files be searched, and is the data structure similar in different files? Twenty-three reviews note other sites or print resources that cover similar material or provide general comments such as "this is the most complete such site." Thirty reviews, however, specifically refer to the availability of hypertext links, a question obviously not directly asked in 1990.
Output
Again, a number of questions are still relevant, among them: availability of custom formats designed by the user, ability to print partial pages or partial documents, and ability to download search output. Twenty-one reviews commented on the aesthetics of the site, but the SCOUG output questions themselves are not mentioned at all--e.g., the ability to download or print partial documents. Since this sort of ability now depends on the browser used rather than on a given database, there is, of course, no reason to comment.
Documentation
A number of these questions are also still relevant and remind the reader of what used to be considered common. Among these are availability of a print thesaurus; timely online and print documentation; regular newsletters and search aids; information about the limits of the database and the like provided upon login; and information on selection, coverage, currency, and the like.
Only four reviews in the Choice sample made any reference to documentation of any type; again only one referred to the use of a thesaurus. One Web feature found in some sites is the capability of signing up for an update service--the user then receives e-mail when the database is updated. One Choice review noted the availability of such a feature.
Customer Support and Training
Where and when is support available, and how much (if anything) does it cost? Are there user groups, and are they supported by the producer or service? None of the sample reviews mentioned either support or training.
Value to Cost Ratio
These questions assume there will be some sort of charge, and most do not directly apply to nominally free databases; thus they are less relevant to the Web. However, several are again still relevant, such as: How long does it take for a screen to fill? Can documents be scrolled? Can search results be sorted or relevance ranked?
This category refers more to commercial databases which charge some sort of fee. Only two Choice reviews discussed prices (although five sites do require a fee). None of the sample answered the other sorts of questions suggested by SCOUG.
SCOUG also presented guidelines for three specific kinds of files--bibliographic, full text, and directory, these being the main types of files at the time the guidelines were produced. Those for full-text databases are still relevant to the Web, including: "Fully searchable records, with field searching possible as well; ... On/off toggles for automatic pluralization, equivalencies, synonyms, etc." (Basch, 1990, p. 22). These topics do not appear in the reviews.
UNIVERSITY OF GEORGIA CRITERIA: WILKINSON AND COLLEAGUES
Just as the earlier criteria were based on the "best practices" as determined by a number of expert searchers, many more recent sets of specifically Web criteria are also based on some sort of consensus. One approach is based on an examination of the criteria operationally used by those who have reviewed sites. Since the very nature of the Web seems to require some attempt to organize or at least guide users, there are a number of Web sources now available which do in fact evaluate, and thus implicitly, if not explicitly, have established criteria for evaluation.
The most comprehensive attempt to date to develop Web evaluation criteria uses this approach (Wilkinson, Bennett, & Oliver, 1997; Oliver, Wilkinson, & Bennett, 1997). This project at the Department of Instructional Technology at the University of Georgia, led by Gene L. Wilkinson, began by compiling a very lengthy list of quality indicators from a combination of sources. These included a review of the extant literature (as of 1996-1997) and of authorities on reviewing and library reference materials, but also contact with compilers of respected Web sources and online guides to selected Web resources. The primary source of criteria, as it turned out, was a combination of personal contact with the compilers of Web directories and examination of the stated selection criteria of Web and print sources that provided lists of recommended Web sites.
The list of compilers of highly regarded Web sites was based on one of the Web guides, The Clearinghouse for Subject-Oriented Internet Resource Guides (now the Argus Clearinghouse), based at the University of Michigan. Wilkinson and colleagues examined the 116 Web guides in the database as of March 1996 and selected those that received a rating of four or greater (on a one- to five-point scale) on both of two scales--the overall rating and the quality of the guide's resource evaluation. This yielded a total of fifty-eight highly rated guides (Wilkinson, 1996).
Many of the problems in evaluating Web sources are indicated by this process. First, of course, is the fact that only one "reviewing" source was used to select high-quality sites. This is not a criticism of Argus, but the procedure is in effect like using a single reviewing source for reference works, such as Reference and User Services Quarterly, to compile a list of the "best" reference sources. The reasoning behind the selection, while not stated, obviously includes the assumption that a university-based rating service will bring subject expertise and lack any bias toward, say, particular "publishers." In addition, unlike many of the other sources, Argus includes a detailed description of its rating criteria and, again unlike many Web rating sources, includes the sort of criteria that have over time been applied to other information sources--or, to put it bluntly, Argus does not include "coolness" or "fun" as criteria.
Another problem with the Web indicated by the Wilkinson methodology is its dynamic nature. Of the fifty-eight guides found, only forty-four were still actively maintained when contacted in 1996--or only about 75 percent of the sources were still "active" (Wilkinson, Bennett, & Oliver, 1997, p. 53). Since the Web could hardly be said to exist before the mid 1990s, in effect this means that one-fourth of the highly rated reference "bibliographies" were out of print within less than five years.
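The attrition figure is simple arithmetic on the two counts reported by Wilkinson and colleagues. A short Python check follows; the variable names are illustrative only.

```python
# Counts reported by Wilkinson, Bennett, & Oliver (1997, p. 53).
guides_selected = 58  # highly rated Argus guides, March 1996
guides_active = 44    # still actively maintained when contacted in 1996

active_share = guides_active / guides_selected
lapsed_share = 1 - active_share

print(f"active: {active_share:.1%}, lapsed: {lapsed_share:.1%}")
```

The result, about 76 percent active and 24 percent lapsed, is consistent with the "about 75 percent" and "one-fourth" given above.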
In any event, using the individuals noted above, plus the other online and print sources, Wilkinson and colleagues collected 509 possible rating criteria. Naturally, there was significant overlap among these criteria; after elimination of duplicates and of purely subjective criteria (such as "good items"), the final list of potential criteria included 125 items. These were consolidated into eleven categories, which were arranged into a logical progression--in other words, the order of these categories has importance.
Since 125 criteria were still too many for practical application, the next step of the project was to send these to a panel of "reviewers" who voted on their importance. The panel consisted of thirty (of thirty-six) compilers of sites from which the original list was developed and thirty-four new people, again identified through the Argus list of highly regarded sites. Thus, the 125 criteria were ranked by sixty-four people who had actually created sites that recommend other sites (Wilkinson, 1996).
The ranking was based on a six-point scale (1 = irrelevant, 6 = essential). In addition, the raters were asked, for each criterion, whether it applied primarily to the quality of the information, to the quality of the site, or equally to both (Oliver, Wilkinson, & Bennett, 1997, pp. 2-3). A criterion was then classified as site related or information related if at least 50 percent of the raters so stated--thus some criteria on the final list appear on both lists. Having received the responses, the authors then presented two final lists, apparently including all criteria that received a rating of "important" (Wilkinson, Oliver, & Bennett, 1997).
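The source does not say how a vote of "equally to both" was tallied. One reading that would explain how a single criterion can land on both final lists is that such a vote counts toward each category. The following Python sketch implements the 50 percent rule under that assumption; the function name `classify` and the vote tallies in the example are hypothetical, not data from the study.

```python
from collections import Counter


def classify(votes: list[str], threshold: float = 0.5) -> set[str]:
    """Assign a criterion to 'information', 'site', or both.

    Each vote is 'information', 'site', or 'both'.
    ASSUMPTION (not stated in the source): a 'both' vote counts
    toward each category, so a criterion can pass the threshold twice.
    """
    counts = Counter(votes)
    total = len(votes)
    labels: set[str] = set()
    for category in ("information", "site"):
        support = counts[category] + counts["both"]
        if total and support / total >= threshold:
            labels.add(category)
    return labels


# Hypothetical tallies from a 64-member panel (illustrative, not study data).
mixed = ["information"] * 20 + ["site"] * 14 + ["both"] * 30
mostly_info = ["information"] * 40 + ["site"] * 10 + ["both"] * 14

print(sorted(classify(mixed)))        # ['information', 'site']
print(sorted(classify(mostly_info)))  # ['information']
```

If "both" votes were instead counted toward neither category, no criterion could reach 50 percent on both lists at once, which is why this reading seems the more plausible one.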
The final list consists of thirty-six information quality criteria and thirty site quality criteria; given the overlap, the consolidated list of what, for lack of a better word, could be called the operational list of Web site developers' evaluation criteria consists of fifty-two elements (Oliver, Wilkinson, & Bennett, 1997, pp. 4-5). While this was done in 1997, little work on these criteria has appeared since then, except for a consolidated list of the criteria added to one of the Web pages in 1998--apparently a recommended evaluation form.
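The size of that overlap follows from inclusion-exclusion: the consolidated list is the union of the two lists, so the number of criteria appearing on both equals 36 + 30 - 52. A minimal Python check, with illustrative variable names:

```python
# Counts reported by Oliver, Wilkinson, & Bennett (1997, pp. 4-5).
information_criteria = 36
site_criteria = 30
consolidated = 52  # size of the union of the two lists

# Inclusion-exclusion: |I union S| = |I| + |S| - |I intersect S|
on_both_lists = information_criteria + site_criteria - consolidated
print(on_both_lists)  # 14
```

That is, fourteen criteria were rated as bearing on both information quality and site quality.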
THE GEORGIA CRITERIA AND CHOICE
Site Access and Usability
This category involves many elements which could be construed as bibliographic identification, such as sponsor, price, and URL, but also includes such things as rules for use, security of information entered, and others not needed for any traditional medium. This category includes the only one of the fifty-two criteria found in all forty-eight of the Choice reviews (title of site), as well as the third most commonly used criterion, sponsor, listed by twenty-seven of the reviews. The only other of the six critical questions here, answered by five reviews, is "Is the site commercial?" Of course, one could argue that this need only be mentioned when the site is, in fact, commercial.
Resource Identification and Documentation
Note that this would appear to be more "bibliographic description" but is actually treated separately and includes not only such things as title and URL (of the document versus the site, as in the previous category), but also the apparent audience, mission, or scope of the document, a description of the document and, interestingly, "Is the user informed of improper or controversial materials (e.g., adult language, sexually explicit material, gratuitous violence, and so on) within the document?"
This section also asks for the title, in this case, of the document. In the Choice reviews, site and document are, in effect, interchangeable, thus title can be said to be listed by all forty-eight of the reviews. Another question in this category is that of contents description, actually given in detail by only thirty-eight of the reviews; audience, commented on by twenty-eight (but of course, one could argue that all Choice reviews assume a general liberal arts college audience), and mission/scope of the site, noted by twenty-four reviews.
Author Identification
This category includes the author's name, affiliation, and the like, as well as the author's training or experience, contact information such as e-mail address and phone number, and information regarding the funding of the site. This set of criteria is not well covered by Choice. Fifteen reviews give a personal author name (or names), and thirteen also give the author affiliation.
Authority of Author
This is treated as a separate category from either the bibliographic information or the author identification. It includes questions about training and experience, other publications, and the nature (as opposed merely to the name) of any affiliations. These questions are never answered (other than by affiliation) in any Choice review; Choice tends to refer to authority by corporate body or by the author's general affiliation.
Information Structure and Design
Although the criteria listed are said to be quality related, most of the quality questions involve access issues, such as variety of features, use of icons, language of document, and so on as well as such questions as does the content fit the stated scope, purpose, and audience? This version of the scope question asks whether or not it is clearly stated, a question answered only thirteen times as such. But, as noted earlier, scope is mentioned in thirty-eight reviews--thus one might assume that the scope of the site is clear. The most commonly answered question here, however, deals with the issue of whether the content actually fits the scope and the audience, a question addressed by twenty-four of the reviews. Choice also indicates whether the graphics and design contribute to the content of the site fifteen times and mentions the variety of features (search engines, photography plus other graphics, and so on) nine times.
Relevance and Scope of Content
Note that this is the sixth set of criteria in order of importance and the first that could be said to be similar to traditional review and evaluation criteria. These issues are rarely addressed directly in the sample, with seven reviews indicating currency of the material and five specifically commenting on how well the content relates to the user's apparent needs. Of course, thirty-eight do discuss scope in the context of describing the content.
Validity of Content
This category includes a rather broad set of criteria, including existence of bibliographic documentation, links to other sites, and existence of links to this site from recognized authorities. Interestingly, one of the initial criteria was also "Does the author follow a recognized style manual to cite references and quoted materials?" The final set of criteria asks only one question: Is there any documentation? Only one of the sample reviews comments at all on this issue.
Accuracy and Balance of Content
This category includes questions about presence of stereotypes, apparent bias of either author or sponsor of site, how clearly any biases are identified, and "are there any obvious errors or misleading omissions in the document?" This category does not fare much better--reviews only comment on potential bias twice and indicate a vested or commercial interest in five reviews (and then never in the context of bias as such versus mere identification).
Navigation within the Document
Organization scheme, use of image maps, indexes and the like, existence of search function, and the existence of a help system (including the important question how helpful is the "help" system?). Actually, twenty-nine of the Choice reviews do comment on the presence or absence of an index or table of contents, and twenty-one refer to the overall quality of organization of the site. These questions do relate to the last category, number eleven, which asks generally about the aesthetic aspects of the site.
Quality of the Links
In some ways this set of criteria is similar to the (rarely asked) question of what sort of citations are in a printed source. The thirteen questions in effect ask a number of ease-of-use and content quality questions about the links--questions which are almost never answered in traditional reviews--such as "Are the links evaluated in any way prior to inclusion?" and "What are the link selection criteria, if any?" Only eighteen of the reviews make statements about the quality of the links, usually along the lines of appropriateness, although eleven reviews also comment on whether or not the links are up to date.
Aesthetic and Affective Aspects
Essentially, these are questions similar to the format and style questions asked in a traditional review: originality, creativity, quality and ease of design features, legibility, consistency, and the like. Twelve of the sample reviews mention creativity or how hypertext and other Web features add to or subtract from the contents.
GEORGIA CRITERIA NOT APPEARING IN CHOICE
Given that Wilkinson's panel ended up with fifty-two criteria, it is of interest how many do not appear in Choice. Those which never appear in the sample include: availability of secure transactions; a clear warning of the controversial nature of the site; whether or not the site moved recently and its older address; whether or not the author is an authority on the subject; presence of any obvious errors; availability of menus or a similar ability to narrow retrieval within the site; whether or not the links are annotated; the type of file to which a link connects; and whether or not the interface from one page to the next is consistent within the site.
There are several important criteria mentioned only once or twice in the Choice sample: can the user usually access the site (e.g., is it often down?); is the price of the site clear (if commercial); the date of the last revision of the site; whether the title clearly describes the content; any obvious gaps or omissions in the content; presence of a bibliography or other documentation; clarity of how-to-use instructions on the site; and selection criteria for links. Since five of the Choice sample sites are commercial, it is of some concern that four of the reviews made no comment on whether or not the price was clear (although the five reviews of the fee-based sites did state the price of the site). However, aside from the question of the price sticker, all the other questions do appear relevant to academically oriented sites.
On the other hand, Choice reviews also make comments which do not appear at all in the Wilkinson list. Among these are comments on the quality of the writing and of the sound, the price (Wilkinson merely asks if the price is clear, not what it is), and the ability of the reader to add notes to the site.
GEORGIA CRITERIA REDUX
Although the Georgia project apparently was intended to continue, little has appeared since 1997 except an abbreviated form (Wilkinson, Oliver, & Bennett, 1998). This form (unfortunately provided with little discussion) in many ways approaches the sort of form suggested by Rettig and LaGuardia and other librarian reviewers. In addition to four (versus eleven) categories of quality plus an overall quality rating, the form includes a set of introductory information that is apparently not treated as part of the quality ratings: the document URL and title, the author's name and position, and the sponsor/host name. This information (roughly the bibliographic citation to the site) is followed by the four specific categories, each having four questions to consider, after which the reviewer is supposed to rate the element on a 1-5 scale (1 = poor, 3 = average, and 5 = excellent). Unlike the earlier versions, the first set of criteria is now "quality," followed by "organization," "links," and "graphics," in that order. While this order of possible importance and the nature of the questions asked appear much closer both to traditional librarian criteria and to the actual comments made in the Choice reviews, unfortunately the authors do not provide any commentary on how or why the former list of questions and order of criteria were collapsed.
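To make the structure of the abbreviated form concrete, here is a minimal Python sketch of one completed evaluation. It assumes only what is described above: unrated citation fields, four rated categories ("quality," "organization," "links," "graphics"), and an overall rating, each on a 1-5 scale. The class name, field names, and the averaging helper are hypothetical illustrations, not part of the published form, which does not say whether or how category ratings are combined.

```python
from dataclasses import dataclass

# Rated categories, in the order given on the 1998 abbreviated form.
# The four questions under each category are not reproduced here.
CATEGORIES = ("quality", "organization", "links", "graphics")


@dataclass
class SiteEvaluation:
    """One reviewer's completed form (hypothetical structure)."""

    # Introductory information: roughly the bibliographic citation, not rated.
    url: str
    title: str
    # One rating per category: 1 = poor, 3 = average, 5 = excellent.
    ratings: dict[str, int]
    # Overall rating: how well the site meets the user's information need.
    overall: int
    author: str = ""
    sponsor: str = ""

    def __post_init__(self) -> None:
        # Reject incomplete forms and out-of-range ratings.
        for name in CATEGORIES:
            if name not in self.ratings:
                raise ValueError(f"missing rating for {name!r}")
        for name, score in [*self.ratings.items(), ("overall", self.overall)]:
            if not 1 <= score <= 5:
                raise ValueError(f"{name!r} rating must be 1-5, got {score}")

    def category_mean(self) -> float:
        # Hypothetical summary; the form itself specifies no combination rule.
        return sum(self.ratings[c] for c in CATEGORIES) / len(CATEGORIES)


ev = SiteEvaluation(
    url="http://www.example.org/",
    title="Example site",
    ratings={"quality": 4, "organization": 3, "links": 5, "graphics": 2},
    overall=4,
)
print(ev.category_mean())  # prints 3.5
```

Keeping the overall rating as a separate field, rather than deriving it from the category scores, mirrors the form's own separation of the "does this meet your need?" question from the four category ratings.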
Interestingly, the overall rating (again on a 1-5 scale) asks, "How well does this document/site address your problem or meet your information needs?" In effect, then, this most recent version of the Georgia criteria ultimately boils down to Ranganathan's (1931) First Law of Library Science: "Books are for use."
OTHER CONSENSUS LISTS
There are a number of other lists of criteria similar to the SCOUG and Georgia lists, although possibly not as elaborate. In the interests of completeness, however, they should at least be mentioned.
Alastair Smith's Criteria
This list appeared in 1997 in a Web-based periodical (Smith, 1997). While citing a number of traditional as well as Internet evaluation criteria, the article's major contribution is a handy table summarizing a "toolbox" of evaluation criteria and showing which of them ten Web reviewing sites use. Although Smith lists twenty-six criteria, the only one appearing in all ten sites is "graphic design" (Smith, 1997, p. 7). The next most common criteria in his toolbox are "currency" and "browsability," both found in eight sites, and references to "content," found in seven. Overall, as with the data already seen above, the more common criteria relate to appearance and ease of access rather than to authority or content.
Another project, this time based in Great Britain and intended to guide the selectors (as opposed to the reviewers) of sites, appeared as one of several "deliverables" from the Development of a European Service for Information on Research and Education (DESIRE, 1996). The project conducted a literature review and an examination of Internet reviewing and selection sites, but its primary contribution was the examination and survey of a number of selective subject gateways (academic sites which emphasized the human element in selecting quality sites) (DESIRE, 1996, pp. 6-11). As with other similar studies, the project's main product was a long list of criteria, which was then reduced through further consolidation and user reactions. Beyond this list and a lengthy bibliography based on Auer (1999), the chief attraction of this product is the inclusion of comments on the report, including the criteria, from several peer reviewers. After some further work, the project arrived at a total of 125 criteria, but since then it appears to have become more interested in the "cataloging" aspects of the Web, with the criteria appearing in one or more metadata fields (DESIRE, 1999).
An earlier effort, carried out in 1993-1994 but reported in the literature only in 1998 (Wilson, 1998), was conducted by the EQUIP consortium, which included the European Association of Online User Groups (EUROLUG). Based heavily on the SCOUG criteria, a survey was sent to EUROLUG members in twelve European nations, with separate forms used for CD-ROM and for online databases. The overall response showed coverage, accessibility, and timeliness as the most important criteria, followed by consistency, accuracy, and value (all rated over 2 on a scale of 0-3) (Wilson, 1998, p. 348). However, the most important point of this study is the finding that the ranking of the SCOUG criteria varied among countries (Wilson, 1998, pp. 349-50).
This same study, along with the DESIRE project, also became involved with a variant of the SERVQUAL methodology. This approach, which apparently is becoming popular in library and information circles although rooted in the manufacturing sector (Hernon & Altman, 1998), uses a standard set of questions to obtain user expectations and perceptions of how well these are fulfilled. The results of this part of the project, drawn from a survey of users of CAB Abstracts, show that users consider time lag, indexing, coverage, availability of manuals, error correction facilities, and comprehensiveness the most important criteria, with reliability ranking seventh and validity twenty-ninth (Wilson, 1998, p. 354).
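The core of the SERVQUAL approach is a gap score: for each criterion, the user's perception of the service minus the user's expectation, so that negative gaps mark shortfalls. The following is a minimal sketch under stated assumptions: the criterion names echo those Wilson reports, but the ratings and the 1-7 scale are invented for illustration and are not data from the EQUIP or CAB Abstracts surveys. The function names are likewise hypothetical.

```python
def gap_scores(expectations: dict[str, float], perceptions: dict[str, float]) -> dict[str, float]:
    """SERVQUAL-style gap for each criterion: perception minus expectation.

    A negative gap marks a criterion on which the service falls short of
    what users expected; a positive gap means expectations were exceeded.
    """
    unmatched = expectations.keys() ^ perceptions.keys()
    if unmatched:
        raise ValueError(f"criteria rated on only one side: {sorted(unmatched)}")
    return {c: perceptions[c] - expectations[c] for c in expectations}


def largest_shortfalls(gaps: dict[str, float], n: int = 3) -> list[str]:
    # Sort criteria from the most negative gap (worst shortfall) upward.
    return sorted(gaps, key=gaps.get)[:n]


# Invented mean ratings on an assumed 1-7 scale; NOT data from Wilson (1998).
expected = {"time lag": 6.5, "indexing": 6.2, "coverage": 6.4, "manuals": 5.0}
perceived = {"time lag": 4.9, "indexing": 5.8, "coverage": 5.1, "manuals": 5.2}

gaps = gap_scores(expected, perceived)
print(largest_shortfalls(gaps, n=2))  # prints ['time lag', 'coverage']
```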
RETTIG AND LAGUARDIA
While the tendency of online users is to assume quality or to be more interested in aspects of quality other than content, validity, reliability, and the like, a number of librarians have also taken part in the ongoing development of review and evaluation quality criteria. One of the most useful of these attempts draws on Web sites created and maintained by librarians as the basis of its consensus.
James Rettig is a highly respected reviewer of more traditional reference materials who has a Web site and a pattern of commentary on reviewing as a professional activity. He had already dealt with the issue of Web reviewing in the past (Rettig, 1995, 1996), where he both analyzed the extant reviewing sources and suggested the development of Web-specific criteria for reviewing along the analogy of the criteria for reference books. The publication of "Beyond Beyond Cool" about three years later is, in effect, Rettig's answer (with LaGuardia) to the challenge, based heavily on analysis of criteria in eight other sources (Rettig & LaGuardia, 1999). These are, of course, filtered through Rettig's own substantial skill and experience in reviewing added to his and LaGuardia's experience with Web resources.
Rettig and LaGuardia's Criteria
Here is how the Choice reviews stack up against the librarians' Web-based review criteria.
Provenance (roughly the equivalent of the title page, giving author, producer, and some background such as purpose, age, and scope of the organization). Other than giving the Web address, the URL (Uniform Resource Locator), only forty-two of the reviews explicitly state what person or organization(s) sponsor and develop the site, plus one other that gives a fairly vague indication. In other words, five of the reviews, or about one in ten, do not clearly identify the body responsible for the site.
Authority. Rettig and LaGuardia's criteria seek some indication of the creator's expertise, background, experience, and the like as a source of authority for the content of the site. As they note, and as even most novice users of the Web rapidly find out, since almost anyone can publish almost anything on the Web, the question of authority is important. Beyond, for example, giving the creator's address or affiliation, only twenty-four reviews provide such information, although eleven more do imply authority, for example, through such phrases as "a group of experts." Thus thirteen reviews (about 27 percent) do not provide any definite statement of background.
The issue of authorship and its authority in the anarchy of the Web to date is quite important. Normally, aside from so-called vanity publishing, one can assume that there has been some selection process in the chain from manuscript submission to acceptance to final publication and, usually, some editorial fact checking. Thus, although in a broad sense there are some problems of "establishment" bias in relying on author affiliation, at least some basic credibility can be assumed if one knows where the author lives and works. However, even in printed publications, a person's affiliation may merely reflect the fact that he or she is a student at a given university, not that the person has a broad or deep background as a scholar and teacher of the subject.
Content. All but two of the Choice reviews comment on content, and in fact many devote most of their space to describing the nature of the contents. This is not surprising, since content is the only criterion that Rettig and LaGuardia found in all eight of the librarian-designed sets of criteria they examined. About the only surprise here is that two of the forty-eight reviews do not comment on this feature.
Creation and Currency (date of creation and update, update frequency, and existence of live links). In effect, all of these are variations on time, an element that reviews of more traditional sources usually convey through the bibliographic citation, except for comments on the recency of the bibliographic citations, which are probably the closest print equivalent to Web links. As it happens, very few of the reviews (eleven in all) state the creation or update date or the update frequency. On the other hand, twenty-two reviews comment on the currency of links, mostly by indicating whether or not all links were still active. In all, thirty-five of the reviews made some comment on links; ten of these did not mention whether the links were all current.
Usability (this relates to knowledge of the audience, another criterion in the set). Rettig and LaGuardia recommend that a site clearly indicate the nature of its assumed audience, and that there be clear evidence that the site is designed for immediate use and is understandable to that audience. Choice reviewers either do not often see the audience as worthy of comment, or perhaps they feel that, since the journal's audience is presumably librarians in academic institutions, the users of any site reviewed would be academics. In any case, thirty-nine of the reviews make a specific recommendation about audience, two more are rather vague, and the rest (seven) do not comment at all on the most suitable audience. Usability draws comments in forty-one of the reviews, although these comments are rarely linked to the audience as such.
Design and Use of the Medium. The last two criteria call for more subjective, and to a great degree aesthetic, judgments. These are design, including the availability of internal links, and general good use of the medium. The latter is an attempt to answer whether or not the information conveyed would have been as well presented via another medium.
Since Choice book reviews, for the most part, say little about the binding, paper, or other aesthetic aspects of the item, it is perhaps surprising that a total of thirty-four Web reviews do comment on the design. However, the comments usually relate to usability rather than to aesthetics as such and rarely refer to the presence, absence, or utility of internal links. References to the medium as a whole are much less common, with only twenty-one reviews making even a cursory mention of whether hyperlinks, audio and video, and other multimedia features are present, let alone whether the information could have been better presented in some other medium. When such comments do appear, they usually note that an online source's information can be updated faster than print, but they do not compare the Web with other forms of online databases (such as purely text-based ones).
LINKS AS A QUALITY MEASURE
While a number of the categories in all three of the comprehensive lists of criteria are clearly similar, if not identical, to the criteria for a good book, one is just as clearly new: the links. Unlike the traditional footnote or bibliographic reference, the link, properly considered, provides a direct connection to the item cited. Ideally, as the criteria above suggest, a link should include the actual live connection as well as the title and, preferably, some indication of its nature. However, unlike the more traditional citation, an important part of the link is that it be "live," that is, that it actually connects to the site which is cited.
Unfortunately, the Web is notoriously dynamic: links change their nature, they move, they disappear. Unlike with a more traditional citation, one cannot assume that the existence of the citation means the document is available anywhere, at least not to the general public. This issue was addressed in an earlier article by Sweetland (1992), but the problem is even greater than that discussion suggested. A recent report of an ongoing study by the Online Computer Library Center, using IP addresses as the definition of "site," found that 44 percent of the addresses identifying a site in 1998 no longer did so in 1999 (OCLC, 1999). While librarians and publishers have shown some concern about how quickly books go out of print, the statistics on Web sites disappearing are still shocking. After all, the mere fact that a book goes out of print does not mean that it ceases to exist in libraries but, unless a library has downloaded and cataloged a Web site, a change in a Web address does, in effect, mean that the site has ceased to exist.
Since many of the Web sites followed by OCLC may be personal sites (roughly the equivalent of vanity publishing), the 44 percent figure may overstate the problem. A more conservative estimate may be found by examining a more or less traditional Web site, which has a selective list of the sort of sites not likely to disappear.
One of the oldest and most comprehensive "bibliographies" on the evaluation of resources on the Internet was developed by Nicole Auer (1999), originally for a panel discussion at the University of Wisconsin, and kept up to date by her since then. Since this document is regularly cited by other Internet and print sources which discuss the issue, it presumably meets the consensus criteria for a "good" site, as well as for an important one, and thus provides a bit of a test case for how well such a site meets the criteria discussed above.
First, it is of interest that a number of the references to this site, both in printed material and in Web sites, are incorrect. The original site appeared under the title "Bibliography on Evaluating Internet Resources" as of 1998 at the URL http://refserver.lib.vt.edu/libinst/CRITTHINK.HTM. This URL still appears not only in print sources, of course, but also in a number of other sites, including several of the current search engines, such as AltaVista. Perhaps of more interest is the fact that some of the citations to this site generate the expected "site not found" messages, which are the bane of the Web searcher, while several others actually link to an earlier version of the site, presumably copied to the address indicated at an earlier date. The current version of the site may be found at http://www.lib.vt.edu/research/libinst/evalbiblio.html. This version, as of September 15, 1999, had been updated on June 6, 1999.
The site consists of two sections, "Internet Resources," occupying about four and a half pages, and "Print Resources," taking up about three and a half pages. The first section includes the author(s), title, URL, the sponsor (apparently if available), and the last date visited. Many of these dates are 1996 and 1997 dates; many are, of course, more recent. In checking the links, however, one finds that not all this information is wholly correct:
* Title changed but the URL has remained the same (3)
* URL has changed but the old address provides a direct, automatic link (1)
* URL has changed but the old address gives the new address and a link, which must be clicked in order to access the site (1)
* URL has changed, site exists, but is now difficult to access (2)
* Site is not at the old address and has no reference to a more recent address (5)
* Server has been down for some time (i.e., a month or more) (1)
A total of thirteen out of fifty-six citations are technically incorrect; six of these, in effect, no longer exist. Presumably, all the print sources still exist in libraries, even where the publisher has let the document go out of print. Thus, for an academic site citing both print and online sources, about 11 percent of its URLs (not quite the same as IP addresses, of course) point to sites that no longer exist.
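The arithmetic behind these figures can be checked with a short tally. This Python sketch encodes the six categories listed above with the counts reported in the text; the dictionary keys are paraphrases, and counting the long-down server among the sites that "no longer exist" is an inference from the stated totals of thirteen and six rather than an explicit statement in the text.

```python
# Outcomes of re-checking the 56 Internet citations, with the counts reported
# above. The flag records whether the cited site still exists in some form;
# counting the long-down server as lost is inferred from the 13 and 6 totals.
OUTCOMES = {
    "title changed, same URL": (3, True),
    "new URL, automatic redirect": (1, True),
    "new URL, manual link from old address": (1, True),
    "new URL, site hard to access": (2, True),
    "gone, no forwarding address": (5, False),
    "server down a month or more": (1, False),
}
TOTAL_CITATIONS = 56


def audit_summary(outcomes: dict[str, tuple[int, bool]], total: int) -> dict[str, float]:
    """Count incorrect and lost citations and express both as percentages."""
    incorrect = sum(count for count, _ in outcomes.values())
    lost = sum(count for count, still_exists in outcomes.values() if not still_exists)
    return {
        "incorrect": incorrect,
        "lost": lost,
        "pct_incorrect": 100 * incorrect / total,
        "pct_lost": 100 * lost / total,
    }


s = audit_summary(OUTCOMES, TOTAL_CITATIONS)
print(s["incorrect"], s["lost"], round(s["pct_incorrect"], 1), round(s["pct_lost"], 1))
# prints 13 6 23.2 10.7
```

Roughly 23 percent of the citations are out of date in some way, and about 11 percent (six of fifty-six) point to sites that are effectively gone, matching the figure given in the text.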
For comparison, it is useful to look at the Choice sample, a set of reviews current as of June 1998. Of these forty-eight sites, two have a new URL but the same name, two have a new URL plus a new name, and four cannot be found. The search for these sites was conducted several times between February and August 1999. On each try, several of the forty-eight sites were not available on the day of the search but then appeared in later searches.
Three consolidated lists of criteria for a good database have been compared with an actual set of reviews appearing in a respected source routinely used by academic librarians as a selection tool. Unlike the criteria themselves, which do not directly deal with such practicalities as the length of the review, the Choice reviews must fit practical guidelines, notably for length. As with any short review format (Library Journal and Booklist, to name two other sources), the reviewer is constrained to make assumptions and to make every word count. One can therefore argue that Choice reviews provide a useful practical guide to the criteria for a good Web site that must be mentioned in order to make an informed selection.
Perhaps the most important result of this examination of Web review criteria is the general lack of comments on authority, reliability, and the like, along with rather sparse commentary on the content (beyond a mere listing of the topics covered). As far as Choice is concerned, one could argue that the review editor and the reviewer already describe high-quality sites: they may well have compared them with other similar sites and selected the best. However, in contrast to Choice's book reviews, surprisingly few of these reviews provide any comparative information on alternative sites.
The generally low degree of concern for traditional quality measures may be related to the rapid change in Web sites documented above. Clearly, if the item evaluated changes constantly, not only is the production of a considered review rather difficult, but one could argue that there is little point to it. By the time the reviewer has examined the material and written the review, and the reader has read it and sought out the item reviewed, the item may well have changed, in which case the review no longer corresponds to what the reader finds.
Overall, though, the current impression is that users and developers of Web sites are more concerned with ease and variety of access, and even with aesthetics, than with traditional aspects of quality, i.e., reliability, validity, accuracy, and the like. As the World Wide Web continues to develop, it is not beyond the realm of possibility that such "old-fashioned" criteria will cease to be a major element in selection. However, it is also likely that those with the training and attitudes of a "librarian" (a word with a certain amount of negative connotation at the turn of the millennium) will provide some level of quality control so that users need not worry so much. Of course, this will apply only if Internet sites come to be selected by professionals, much as other forms of material have been selected in the past.
Auer, N. (1999). Bibliography on evaluating Internet resources. Retrieved July 1, 1999 from the World Wide Web: http://www.lib.vt.edu/research/libinst/evalbiblio.html.
Basch, R. (1990). Measuring the quality of the data: Report on the fourth annual SCOUG retreat. Database Searcher, 6(October), 18-23.
Basch, R. (1995). An overview of quality and value in information services. In R. Basch (Ed.), Electronic information delivery: Ensuring quality and value (pp. 1-10). Brookfield, VT: Gower.
Bates, M. J. (1984). The fallacy of the perfect 30-item online search. RQ, 24(1), 43-50.
Choice. (1998). Current [Web] reviews for academic libraries (vol. 35, supplement). Choice, 35, 35.
Connell, T. H., & Tipple, J. E. (1999). Testing the accuracy of information on the World Wide Web using the AltaVista search engine. Reference & User Services Quarterly, 38(4), 360-368.
DESIRE. (1996). Specification for resource description methods part 2: Selection criteria for quality controlled information gateways. Retrieved July 1, 1999 from the World Wide Web: http://www.ukoln.ac.uk/metadata/DESIRE/quality/report.rtf.
DESIRE. (1999). Research deliverables: D3.1. Quality ratings in RDF. Retrieved July 1, 1999 from the World Wide Web: http://www.desire.org/html/research/deliverables/D3.1.
Gorman, M. (1995). The corruption of cataloging. Library Journal, 120(15), 32-33.
Graf, F. (1998). WebII: A Choice Internet reference tool. Choice, 35(suppl.),
Granick, L. (1991). Assuring the quality of information dissemination: Responsibilities of database producers. Information Services & Use, 11, 117-136.
Hernon, P., & Altman, E. (1998). Assessing service quality: Satisfying the expectations of library customers. Chicago: American Library Association.
Jacso, P. (1995). Content evaluation of databases. In M. E. Williams (Ed.), Annual review of information science and technology (vol. 32, pp. 231-267). White Plains, NY: Knowledge Industry Publications.
Juntunen, R., Mickos, E., & Jalkanen, T. (1995). Evaluating the quality of Finnish databases. In R. Basch (Ed.), Electronic information delivery: Ensuring quality and value (pp. 205-219). Brookfield, VT: Aldershot.
OCLC. (1999). Web characterization project. June 1999 Web statistics. Retrieved August 12, 1999 from the World Wide Web: http://www.oclc.org/oclc/research/projects/webstats/statistics.htm.
Oliver, K. M., Wilkinson, G. L., & Bennett, L. T. (1997). Evaluating the quality of Internet information sources. Retrieved August 12, 1999 from the World Wide Web: http://itech1.coe.uga.edu/Faculty/gwilkinson/AACE97.html.
Ranganathan, S. R. (1931). The five laws of library science. Madras & London: Madras Library Association.
Rettig, J. (1995). Putting the squeeze on the information firehose: The need for Neteditors and Netreviewers. Retrieved January 8, 1999 from the World Wide Web: http://www.swem.wm.edu/firehose.html.
Rettig, J. (1996). Beyond "cool": Analog models for reviewing digital resources. Online, 20(5), 52-64.
Rettig, J., & LaGuardia, C. (1999). Beyond "beyond cool": Reviewing Web resources. Online, 23(4), 51-55.
Smith, A. G. (1997). Testing the surf: Criteria for evaluating Internet information resources. Public-Access Computer Systems Review, 8(3). Retrieved January 8, 1999 from the World Wide Web: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html.
Sweetland, J. H. (1992). Humanists, libraries, electronic publishing and the future. Library Trends, 40(4), 781-803.
Sweetland, J. H. (In progress). The literature of book reviewing--a review.
Wilkinson, G. L. (1996). Evaluating the quality of Internet information sources: Panel of experienced Internet users. Retrieved August 12, 1999 from the World Wide Web: http://itech1.coe.uga.edu/Faculty/GWilkinson/panel.html.
Wilkinson, G. L., Bennett, L. T., & Oliver, K. M. (1997). Evaluation criteria and indicators of quality for Net resources. Educational Technology, 37(3), 52-59.
Wilkinson, G. L., Oliver, K. M., & Bennett, L. T. (1997). Quality indicators as ranked by experienced Internet users. Retrieved August 12, 1999 from the World Wide Web: http://itech1.coe.uga.edu/Faculty/GWilkinson/rankings.html.
Wilkinson, G. L., Oliver, K. M., & Bennett, L. T. (1998). Internet information evaluation form. Retrieved August 12, 1999 from the World Wide Web: http://itech1.coe.uga.edu/Faculty/GWilkinson/EvalForm.pdf.
Wilson, T. D. (1998). EQUIP: A European survey of quality criteria for the evaluation of databases. Journal of Information Science, 24(5), 345-357.
WilsonWeb. (1999). WilsonWeb: Print entries. Library Literature. Print Records. Retrieved September 1, 1999 from the World Wide Web: http://vweb.hwwilsonweb.com/cgi-bin/webspirs.cgi.
James H. Sweetland, School of Library and Information Science, University of Wisconsin--Milwaukee, Box 413, Milwaukee, WI 53201
JAMES H. SWEETLAND is Associate Professor in the School of Library and Information Science at the University of Wisconsin--Milwaukee, where he teaches courses in information retrieval, reference and information services, and collection management. Online since 1976, he is the author of numerous articles and papers on the subject of his teaching.
Date: March 22, 2000