Tools for creating your own resource portal: CWIS and the Scout Portal Toolkit.
ABSTRACT
Creating a full-featured resource portal on the Web is no small task, and it can be even more of a challenge without a team of Web designers and programmers. In the fall of 2000 the University of Wisconsin-Madison's Internet Scout Project (Scout) received funding from the Mellon Foundation to build an open-source software package intended to enable collection developers to share their collection's metadata via the Web. In October of 2002 Scout began a new effort, funded by the National Science Foundation (NSF) as part of the National Science Digital Library (NSDL) initiative, to build upon prior work and create a software package that would help STEM (Science/Technology/Engineering/Math) content authors and collection developers share their work online and integrate it into NSDL. The software packages resulting from these two projects, the Scout Portal Toolkit (SPT) and the Collection Workflow Integration System (CWIS), are very inexpensive to maintain and operate and easy for nontechnical staff to download, set up, and populate with metadata. Conforming to international standards for metadata, data harvesting, and Web technology makes SPT and CWIS useful for and usable by a wide variety of projects and organizations, allowing and encouraging collaboration and record sharing among projects.
In today's Internet, with information overload prevalent even within a single discipline, scholars and researchers struggle to find the precise material they need in the tangled web of online information. The major search engines do not offer great precision or any guarantee of authority; the best sites in a given field are spread around the nooks and crannies of the Internet and need to be located and then individually searched for relevant information; and even topical electronic mailing lists can require substantial effort to monitor and sift through to extract useful information. In addition, most scholars and researchers lack the extra time needed to roam the Web trying to stay abreast of all the new resources and tools that, ironically, could save them time by making the task of locating useful information easier.
In some disciplines this problem is being addressed by organizations that take a leadership role by building Web sites called "subject gateways" or "discipline-based resource portals." These Web sites usually focus on a specific topic or scholarly discipline, and they often provide information in a variety of forms and from many sources. For example, a discipline-based portal may feature the following:
* a browsable directory of online resources, described and arranged by subject
* a search facility that includes only resources related to the field and that allows searching by title, author, subject, etc.
* current news stories related to the field
* forums for discussing specific discipline-related issues
* facilities for scholars to comment and share information about specific resources
By bringing together various collections and access points into one integrated Web site, a resource portal can bring coherence to the body of online information available in a given field of study, providing scholars and researchers with a facility that will save them substantial time and increase their awareness of other work in their field.
Given all of the above, a discipline-based resource portal sounds like a fine thing to put online, but building a high-quality portal with even a portion of these facilities can be a daunting undertaking. Although the benefits of setting up a resource portal are clear, many organizations with a strong focus on a particular discipline do not have ready access to extensive technical resources, and even those organizations that do are likely to have those resources already committed to existing projects or working to support the organization's day-to-day operations.
The Scout Portal Toolkit (SPT) and the Collection Workflow Integration System (CWIS), open-source software packages developed by the Internet Scout Project under grants from the Andrew W. Mellon Foundation and the National Science Foundation (NSF), respectively, were created to address this problem. They allow a group or organization (or even an ambitious individual) to share a specific knowledge base via a full-featured portal on the Web, with little or no investment in technical resources or infrastructure. In fact, many groups and organizations already have available the minimal resources needed to put a resource portal online using SPT or CWIS.
The Internet Scout Project (Scout), based in the Computer Sciences Department at the University of Wisconsin-Madison, was started in 1994 to develop better tools and services to find, filter, and present online information and metadata. Scout's flagship publication is the Scout Report, an electronic periodical that is published weekly and identifies and describes the best new online resources of interest to educators, students, and researchers. In 1996 the content from the Scout Report and related Scout publications began to be collected into a searchable and browsable online database labeled the Scout Archives. As the Archives grew over the next six years, two things gradually became apparent: (1) that the Scout Archives' content and user base had outgrown the original infrastructure, and (2) that there were many other groups and organizations who had the subject knowledge and desire to assemble and share collections of online resources but did not have the needed technical expertise. In 2000, Scout received funding from the Mellon Foundation to develop a software package to meet these needs, and the Scout Portal Toolkit was born.
CWIS (pronounced "see-wis") was developed after SPT; it was chartered by the NSF to build on prior work to provide software that would enable STEM (Science/Technology/Engineering/Math) content creators and collection developers to quickly and easily put their work online and integrate it into the National Science Digital Library (NSDL). Because new technology developed for CWIS has also been integrated into SPT, both SPT and CWIS have similar capabilities, differing primarily in that CWIS includes a user interface, default nsdl_dc metadata schema, and a number of other features designed specifically to help with integration into NSDL, while SPT comes with a user interface that is more easily customized and a less complex default metadata schema intended to be less intimidating to the metadata neophyte. For practical purposes we will refer just to CWIS for the remainder of the article, but all features and capabilities discussed as part of CWIS can be assumed to also be available in SPT, unless otherwise noted.
DEFINING AND CATALOGING
Setting up CWIS is intended to be (and usually is) a simple process. (Detailed information about hardware and software requirements, as well as where to obtain the software, can be found at the end of this article.) Once CWIS has been installed, the first task at hand to make the resource portal useful is the entry of resource metadata records.
This section will explain what types of data can be entered into CWIS and what tools are built into CWIS to allow for metadata addition and manipulation. The software is distributed with a set of sample data that is loaded during installation, so that administrators and resource editors can see how the portal works with data in place. When it is no longer needed, administrators may easily delete this sample data and enter new data into the portal.
Defining a Schema
Out of the box, CWIS comes with a default metadata schema that includes the full Dublin Core Element Set (Dublin Core Metadata Initiative, 2003a) with several extensions taken from the qualifier sets defined by the DCMI Education (Dublin Core Metadata Initiative, 2004) and DCMI Administrative Metadata (Dublin Core Metadata Initiative, 2003b). This schema may be more than sufficient for many project and user needs. However, some groups will wish to modify these fields or add additional fields to better fit their specific workflow and needs. By using the Metadata Field Editor, a user with administrative privileges can add new fields. The nine basic data types supporting the creation of these new metadata fields are
Text--A free-form field that may contain any textual data
Paragraph--A free-form field that may contain any textual data. A Paragraph field differs from a Text field in that, to ensure proper display and entry, it is expected to normally hold several lines of text.
Number--A numeric field that may have limits imposed and can be compared to other values when setting up searching criteria
Date--A field containing a date value or date range. Date fields can contain whatever level of precision is appropriate and support several additional attributes, such as a prepended "c" to indicate that the value entered represents a copyright date.
Flag--A Boolean field containing a true or false value. Labels may be assigned to True and False for clarity.
Controlled Name--A field containing an entry from a list of values that is maintained separately to ensure consistent naming. Publisher, Creator, Contributor, and Subject are examples of controlled name fields that are part of the default schema.
Option--A field containing one or more of a number of attributes. Resource Type, Language, Audience, and Format are examples of option fields that are part of the default schema.
Tree--A field containing one or more entries taken from a hierarchy of values that is maintained separately. A Tree field might be appropriate to contain values from a standard classification taxonomy.
Image--A field containing an image uploaded to the portal
In addition to basic field attributes like type and name, the Metadata Field Editor allows an administrator to set default values and a number of type-specific field attributes, such as minimum and maximum value for numeric fields and on and off labels for flag fields. The Metadata Field Editor also allows tailoring the performance of the search engine (discussed below) by indicating which fields to consider for keyword searches and how to weight the fields when ranking search results.
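The field attributes described above can be pictured as a small data structure. The following Python sketch is only an illustration: the class name `FieldDef`, its attribute names, the sample weights, and the "Page Count" field are hypothetical and are not taken from the CWIS code or its database layout.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class FieldType(Enum):
    # The nine basic data types named in the text.
    TEXT = "Text"
    PARAGRAPH = "Paragraph"
    NUMBER = "Number"
    DATE = "Date"
    FLAG = "Flag"
    CONTROLLED_NAME = "Controlled Name"
    OPTION = "Option"
    TREE = "Tree"
    IMAGE = "Image"


@dataclass
class FieldDef:
    # Hypothetical record of one metadata field definition.
    name: str
    type: FieldType
    default: object = None
    min_value: Optional[float] = None              # Number fields only
    max_value: Optional[float] = None              # Number fields only
    flag_labels: Optional[Tuple[str, str]] = None  # (on label, off label)
    in_keyword_search: bool = False                # considered for keyword searches?
    search_weight: int = 1                         # weight used when ranking results

    def validate(self, value) -> bool:
        """Check a value against the type-specific limits, if any."""
        if self.type is FieldType.NUMBER:
            if self.min_value is not None and value < self.min_value:
                return False
            if self.max_value is not None and value > self.max_value:
                return False
        if self.type is FieldType.FLAG:
            return isinstance(value, bool)
        return True


# A few illustrative definitions; the weights are made up for the example.
schema = [
    FieldDef("Title", FieldType.TEXT, in_keyword_search=True, search_weight=3),
    FieldDef("Description", FieldType.PARAGRAPH, in_keyword_search=True),
    FieldDef("Release Flag", FieldType.FLAG, default=False,
             flag_labels=("Okay for Viewing", "Not Okay for Viewing")),
    FieldDef("Page Count", FieldType.NUMBER, min_value=1, max_value=5000),
]
```

Keeping the search weight on the field definition itself mirrors the point made in the text, that search behavior is tuned in the same place the schema is edited.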
Once metadata fields are defined, the primary method of entering resource metadata into CWIS is the Metadata Tool.
The Metadata Tool is designed to speed resource cataloging and support the accurate and consistent assignment and recording of metadata required to build and maintain a useful discipline-based resource portal. It allows resource editors to add, edit, duplicate, and delete records. The Metadata Tool also provides special features that aid in resource management. Drop-down menus are available for option, controlled name, tree, and flag fields, and these menus speed entry of commonly used values and help keep metadata vocabulary consistent. Of course drop-down menus are only viable for fields with a modest number of choices so, for controlled name or tree fields with more than a few hundred entries, the Metadata Tool also provides searching and browsing interfaces for selecting those fields. For example, resources in the (Internet Scout Project's) Scout Archives, which uses CWIS, are cataloged using a subject hierarchy (adapted from Library of Congress Subject Headings) with more than 20,000 entries. As it is impractical to display that many selections on a drop-down menu, CWIS automatically switches to a searching/browsing interface for assigning entries to that field.
The Metadata Tool also provides fields that support workflow management and editorial review. For example, the default CWIS schema includes a Release Flag flag field, which defaults to Not Okay for Viewing. When a user is browsing or searching, this field is checked for each resource and, if it is set to Not Okay, the resource is not displayed, which allows editorial review of material before it becomes available to the general public. If editorial review is not desired, the default value for this field can be set to Okay for Viewing, via the Metadata Field Editor, and new records will be available immediately.
To make use of the Metadata Field Editor, the Metadata Tool, or any of the personalized portal services, users must create an account on the portal with a login name and password. Once logged in, access to various features is controlled by eight permission flags, which may be assigned from any account with System Administrator access. These permission flags are:
Personal Resource Administrator
Master Resource Administrator
Release Flag Administrator
Controlled Name Administrator
[FIGURE 1 OMITTED]
Users may be granted any of these flags independent of the others, allowing, for example, the portal administrator to designate certain individuals as responsible for maintaining the controlled name lists or classification hierarchies, while other individuals handle more administrative matters like monitoring activity in the portal discussion forums or posting news to the front page of the portal. This means that resource editors, and others involved with the portal, could potentially be spread out geographically and still effectively work together to contribute or edit resource records in a coordinated fashion.
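Independent permission flags of this kind are commonly modeled as a bit set. The sketch below is one way to picture it in Python and is not drawn from the CWIS source; it includes only the flag names that appear in the text (the four listed above plus System Administrator), and the helper functions `can` and `grant` are hypothetical.

```python
from enum import Flag, auto


class Permission(Flag):
    # Only the flags named in the text are modeled here; the text says
    # there are eight in total, and the others are deliberately left out.
    PERSONAL_RESOURCE_ADMIN = auto()
    MASTER_RESOURCE_ADMIN = auto()
    RELEASE_FLAG_ADMIN = auto()
    CONTROLLED_NAME_ADMIN = auto()
    SYSTEM_ADMIN = auto()


def can(user_flags: Permission, needed: Permission) -> bool:
    """True if every flag in `needed` is present in `user_flags`."""
    return (user_flags & needed) == needed


def grant(granter: Permission, target: Permission, new: Permission) -> Permission:
    """Return the target's flags plus `new`.

    Assumes, as the text states, that flags are assigned from an account
    with System Administrator access.
    """
    if not can(granter, Permission.SYSTEM_ADMIN):
        raise PermissionError("System Administrator access required")
    return target | new
```

Because each flag is a separate bit, a user can hold any combination of them, which is what lets responsibilities be divided among people who may never be in the same place.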
SEARCHING AND BROWSING
Once the proper metadata fields have been configured and populated with initial resource metadata records, collection developers may be ready to share their efforts with end users. Of course, those users need some way of locating the specific information on the portal that best meets their needs, and CWIS provides several routes toward achieving that end.
The simplest and most familiar method for locating information on a Web site is browsing. CWIS supports hierarchical browsing interfaces, based on classifications (tree fields) assigned to the resources.
Since classification hierarchies may be either wide (a large number of entries at the top level) or deep (a large number of levels) or both, the CWIS browsing interface is dynamically generated, based on the structure of the classification tree. This prevents the browsing interface from becoming unwieldy when a given section of the tree is very broad, while still minimizing the need for users to search through multiple pages to find the classification they seek. Because resources may be classified with more than one taxonomy, CWIS generates a separate interface for each field marked as available for browsing and remembers which of these interfaces each user most recently selected to help users orient themselves more easily each time they log into the portal.
In most collections the resources will not be evenly distributed throughout the taxonomies used to classify and subsequently browse through the resources. To provide users with some idea of the distribution of entries through the classification tree, CWIS displays the number of resources present under any branch of the tree. This can prove particularly useful to collection developers in assessing where their collection's strengths lie and where additional effort may be needed to round out the collection. Branches or leaves containing no resources are not displayed in the browsing interface, which allows collection developers to enter or import a large taxonomy for use when cataloging resources without unnecessarily cluttering the selection presented to end users.
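The per-branch counts and the hiding of empty branches can be sketched in a few lines of Python. This is an illustration under simple assumptions, not the CWIS implementation: classifications are represented as tuples of path segments, and the function names and sample taxonomy are hypothetical.

```python
from collections import Counter


def branch_counts(assignments):
    """Count resources under each branch, including all ancestor branches.

    `assignments` maps a resource id to a list of classification paths,
    each path a tuple such as ("Science", "Biology").
    """
    counts = Counter()
    for paths in assignments.values():
        seen = set()
        for path in paths:
            # A resource counts once for the path and for each ancestor.
            for depth in range(1, len(path) + 1):
                seen.add(path[:depth])
        counts.update(seen)
    return counts


def visible_children(taxonomy, counts, parent=()):
    """List (child path, count) pairs under `parent`, skipping empty branches."""
    depth = len(parent) + 1
    children = sorted({
        p[:depth] for p in taxonomy
        if len(p) >= depth and p[:len(parent)] == parent
    })
    return [(c, counts[c]) for c in children if counts[c] > 0]


# A large taxonomy can be loaded up front; only populated branches show.
taxonomy = [
    ("Science",), ("Science", "Biology"), ("Science", "Physics"),
    ("Arts",), ("Arts", "Music"),
]
assignments = {
    "r1": [("Science", "Biology")],
    "r2": [("Science", "Biology"), ("Science", "Physics")],
}
counts = branch_counts(assignments)
```

Collecting paths into a set per resource keeps a resource classified under two sibling branches from being counted twice at their shared parent.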
Because searching is often faster and more effective than browsing (provided the user has concrete insight on at least some aspects of the desired items), many people prefer to search rather than browse to locate data on the Web. CWIS provides two separate search mechanisms, both based on Scout's OSMASE search engine.
Keyword searching, the method familiar to most Web users, is very similar to the approach presented by Google[TM] and other general Web search engines. Users enter terms that are related to or may appear in the entries that they are looking for, and those terms are used to determine which resources best fit the search. CWIS supports most of the conventions offered by sites like Google, such as phrase searching (enclosing several words in quotation marks to indicate that the user is looking for the words in that specific sequence) and term exclusion (prepending a minus sign to a word to indicate that the user only wants results that do not include that term). Keyword search results in CWIS are ordered by their relevance to the terms entered. How various metadata fields are weighted to determine this relevance (for example, terms appearing in Title are more significant than those appearing in Description), as well as which fields are considered for keyword searching, can be adjusted via the Metadata Field Editor, allowing portal administrators to tweak search performance to best fit their portal content and audience.
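The query conventions and field weighting described above can be illustrated with a short Python sketch. It is a simplification and not a description of Scout's search engine: terms are matched as plain substrings, any included term may contribute to the score, and the weights and function names are hypothetical.

```python
import re


def parse_query(query):
    """Split a query into included terms/phrases and excluded terms."""
    include, exclude = [], []
    # An optional leading minus, then either a quoted phrase or a bare word.
    for minus, phrase, word in re.findall(r'(-?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        (exclude if minus else include).append(term)
    return include, exclude


def score(record, include, exclude, weights):
    """Weighted relevance score; 0 if the record contains an excluded term."""
    text = {field: str(record.get(field, "")).lower() for field in weights}
    if any(term in value for term in exclude for value in text.values()):
        return 0
    return sum(
        weight * text[field].count(term)
        for field, weight in weights.items()
        for term in include
    )


def search(records, query, weights):
    """Return matching records, most relevant first."""
    include, exclude = parse_query(query)
    scored = [(score(r, include, exclude, weights), r) for r in records]
    hits = [(s, r) for s, r in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]
```

With weights such as `{"Title": 3, "Description": 1}`, a term found in a title contributes three times as much as the same term found in a description, which is the kind of adjustment the Metadata Field Editor exposes.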
To better take advantage of the precision offered by the metadata assigned to resources, CWIS also supports fielded searching. With a fielded search, users enter terms in a fashion similar to a keyword search, but along with those terms users can specify the fields in which to look for the terms. For nontextual fields, fielded searching also allows users to specify constraints that can be used to narrow the search results. For example, when searching through a collection of digitized rare books stored online, a user might be able to specify only nonfiction books that were published between 1885 and 1890.
Because of the potential complexity of a fielded search, CWIS provides the ability for each user to save a set of search parameters and recall them at a later date to run the search again.
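A fielded search with saved parameters might look roughly like the following Python sketch. The criteria format, the operator names, and the "Genre" and "Year" fields are assumptions made for illustration; the text does not describe how CWIS stores search parameters.

```python
import json


def matches(record, criteria):
    """Check one record against a list of (field, operator, value) criteria."""
    ops = {
        "contains": lambda have, want: want.lower() in str(have).lower(),
        "equals": lambda have, want: have == want,
        "between": lambda have, want: want[0] <= have <= want[1],
    }
    return all(
        field in record and ops[op](record[field], value)
        for field, op, value in criteria
    )


def fielded_search(records, criteria):
    """Return the records that satisfy every criterion."""
    return [r for r in records if matches(r, criteria)]


def save_search(criteria):
    """Serialize search parameters so they can be stored and rerun later."""
    return json.dumps(criteria)


def load_search(saved):
    """Restore search parameters saved by `save_search`."""
    return [tuple(c) for c in json.loads(saved)]


# The rare-books example from the text: nonfiction published 1885 to 1890.
criteria = [("Genre", "equals", "Nonfiction"), ("Year", "between", [1885, 1890])]
```

Storing the parameters rather than the results means a recalled search always runs against the current state of the collection.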
Combining this ability to save fielded searches with what has become known in Internet jargon as "push technology," CWIS also offers a feature sometimes referred to as "user agents." This capability allows users to set up and save a fielded search that returns items they may find of interest and then have that fielded search automatically performed nightly or weekly by the portal, with any new results found being assembled into a report that is sent to the user via email. These reports can be invaluable to users by keeping them abreast in a timely fashion of new resources that may become available, and they benefit the portal developer by actively maintaining awareness of the portal among the user community.
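The core of such a user agent is remembering what has already been reported so that each run sends only new results. The Python sketch below shows that idea in isolation; the function names and record layout are hypothetical, and scheduling and email delivery are left out because the text does not describe how CWIS handles them.

```python
def run_agent(search_fn, already_reported):
    """Run a saved search and return (new results, updated reported-id set).

    `search_fn` is any callable that returns the current search results as
    a list of dicts with "id" and "title" keys.
    """
    results = search_fn()
    new = [r for r in results if r["id"] not in already_reported]
    return new, already_reported | {r["id"] for r in new}


def format_report(new_results):
    """Assemble the plain-text body of a report listing new results."""
    lines = ["New resources matching your saved search:"]
    lines += [f"- {r['title']}" for r in new_results]
    return "\n".join(lines)
```

A scheduler would call `run_agent` nightly or weekly, send `format_report(new)` only when `new` is non-empty, and store the returned id set for the next run.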
[FIGURE 2 OMITTED]
For resource metadata administrators, CWIS also offers the option to run these searches on an hourly basis, which may facilitate editorial review or other workflow processes set up among a group of collection developers. For example, rather than requiring catalogers to notify editors when new records have been entered, the editors can be notified automatically of new entries awaiting their review by user agents they have set up on the portal.
RATING AND RECOMMENDING
An active user community and its contributions are key components of most successful Web portals. To leverage user community participation, CWIS offers three features: resource ratings, resource recommendations, and resource comments.
Resource rating allows users to indicate their opinion on the usefulness of an individual resource and to generate a cumulative rating for the resource based on these opinions. The cumulative ratings are beneficial both to other users, who can use them when determining which resources are most likely to meet their needs, and to the collection developers, who can use the ratings to monitor what portions of the collection users are finding most beneficial. Cumulative rating values are displayed graphically when browsing, searching, or viewing the full resource record, and users may interactively rate resources from the browsing and search result interfaces.
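The text does not say how the cumulative rating is computed. As a minimal sketch, assuming one rating per user on a 1 to 5 scale and a simple average, it could be modeled in Python as follows; the function names and the scale are assumptions.

```python
def rate(all_ratings, resource_id, user, value, scale=(1, 5)):
    """Record one user's rating; a later rating replaces an earlier one."""
    low, high = scale
    if not low <= value <= high:
        raise ValueError(f"rating must be between {low} and {high}")
    all_ratings.setdefault(resource_id, {})[user] = value


def cumulative_rating(all_ratings, resource_id):
    """Mean of all users' ratings for a resource, or None if unrated."""
    ratings = all_ratings.get(resource_id, {})
    if not ratings:
        return None
    return sum(ratings.values()) / len(ratings)
```

Keying ratings by user keeps a single enthusiastic or hostile user from skewing the cumulative value by rating the same resource repeatedly.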
Resource ratings provide information about the perceived usefulness of a resource to other users and the collection developers, but they can also represent a body of information about the needs and preferences of the users who assigned the ratings. To take advantage of this information, CWIS includes a recommender system.
One recommender system with which many people are familiar is that provided by Amazon.com[TM], where a user rates a number of books and then, based on those ratings, Amazon recommends other books that they may find of interest. The facility provided by CWIS operates in a similar fashion, although it is a content-based recommender system rather than a collaborative recommender system such as Amazon's. A content-based system, which bases recommendations on item attributes, was chosen over a collaborative system (Breese, Heckerman, & Kadie, 1998) because collaborative systems, which base recommendations solely on preferences expressed by groups of users, typically require a very large number of ratings before they begin to offer useful recommendations.
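To make the distinction concrete, the following Python sketch shows one very simple content-based approach: a user's ratings are turned into a profile of liked and disliked attribute terms, and unrated resources are scored against that profile. It illustrates the general technique only and is not the algorithm used by CWIS; the function name, the zero-centered rating convention, and the sample terms are assumptions.

```python
from collections import Counter


def recommend(resources, user_ratings, top_n=3):
    """Rank unrated resources by similarity to what the user rated.

    `resources` maps a resource id to a set of attribute terms (for
    example, subject headings). `user_ratings` maps a resource id to a
    rating centered on zero, so positive means liked and negative means
    disliked.
    """
    # Build the profile from item attributes alone; no other users' data
    # is consulted, which is what makes this content-based.
    profile = Counter()
    for rid, rating in user_ratings.items():
        for term in resources[rid]:
            profile[term] += rating

    scores = {
        rid: sum(profile[term] for term in terms)
        for rid, terms in resources.items()
        if rid not in user_ratings
    }
    ranked = sorted(scores, key=lambda rid: (-scores[rid], rid))
    return [rid for rid in ranked if scores[rid] > 0][:top_n]
```

Because the profile depends only on one user's own ratings and the metadata already in the portal, recommendations can begin after a handful of ratings, which is the advantage the text cites over collaborative systems.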
Notes and Comments
As an adjunct to the resource ratings, SPT also provides users the ability to post comments on resources, which are then displayed along with the individual resource record. Again, this can benefit both other users and the collection developers by providing more detail about why users may have found a particular resource useful.
To help ensure the integrity of these facilities, only registered portal users may rate resources or post comments, and an interface is provided to allow administrators to quickly review (and edit or delete) recent comments, without having to step through every resource record to learn what has been posted.
[FIGURE 3 OMITTED]
DISPLAYING AND SHARING
Entering data into the portal and helping users find the portion of that data that meets their needs have both been discussed, but there still remains the problem of presenting that data to the end user in an effective fashion.
Effective presentation can vary widely depending on the subject matter and intended audience. Fortunately, CWIS provides several mechanisms that can be used to tailor a portal to meet these specific needs. These mechanisms do require some technical expertise, but they do not have to be employed to build a useful discipline-based portal. However, if the technical expertise is available, in some situations these mechanisms can be used to dramatically improve the portal experience for the end user.
Some portals may need to serve disparate user communities, presenting a different face or offering different features depending on the user. For example, a portal focused on educational resources about paleontology may want to serve both grade school children and high school students, but a Web site design that can catch and hold the attention of an eight-year-old will likely be judged by a sixteen-year-old as too childish or condescending, and site designs well-suited for either of those groups will likely not be optimal for use by their teachers.
To address this type of situation, CWIS supports multiple user interfaces, assignable on a per-user basis. In practical terms, this means a portal can have two or more user interfaces that differ significantly from one another in appearance and functionality while still using the exact same underlying CWIS installation, configuration, and metadata.
All CWIS pages for a given interface are built from a common CSS-based page template, to which page-specific code is added. If specific code for a page is not found in a new interface, then page-specific code from the CWIS default interface is used. This means that it is possible to dramatically alter the appearance of a portal site by adding a new interface containing just a new common page template and then making changes to that template. This approach also allows additional changes or additions to be made on a per-page basis to change the appearance or add functionality, without having to provide a new set of pages for the entire portal.
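The fallback rule described above amounts to a two-step lookup. The Python sketch below models interfaces as nested dictionaries purely for illustration; the function name, the interface names, and the page names are hypothetical, and the actual file layout of a CWIS installation is not described in the text.

```python
def resolve_page(interfaces, interface, page, default="default"):
    """Find page-specific code, preferring the user's interface.

    `interfaces` maps an interface name to a dict of page name -> code.
    Returns (interface name that supplied the page, page code).
    """
    for name in (interface, default):
        pages = interfaces.get(name, {})
        if page in pages:
            return name, pages[page]
    raise KeyError(page)


# A new interface overrides only the pages it needs to change.
interfaces = {
    "default": {"Home": "default home", "Search": "default search"},
    "kids": {"Home": "kids home"},
}
```

Here `resolve_page(interfaces, "kids", "Search")` falls through to the default interface, so the "kids" interface needs to supply only the pages that actually differ.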
Some limited user interface customization ability is available in CWIS and SPT without modifying or adding HTML. For SPT the colors and logo graphics in the default interface may be set by the portal administrator via a Web-based customization tool. For CWIS the default interface comes with a half dozen additional "themes" that provide alternate color schemes and header graphics.
[FIGURE 4 OMITTED]
Sometimes when presenting data, the intended recipient is a computer rather than a human being. To address this need, CWIS supports exporting data in three formats: RSS, OAI, and tab-delimited text. The OAI (Open Archives Initiative, 2004a) format is an XML-based protocol for harvesting metadata. Developed to be a low-barrier (that is, easily implemented) method for sharing metadata, the protocol has been adopted very rapidly by the online metadata community. CWIS supports version 2.0 (Open Archives Initiative, 2004b) of the OAI protocol, including qualifiers. RSS (Backend.Userland.com, 2003) is a well-established XML (World Wide Web Consortium, 2004) format for syndicating online content, typically conveying article titles or headlines. The first version of RSS was developed and released by Netscape in 1999 (Libby, 1999), and the format has since been adopted as a de facto standard (Syndic8.com, 2004) among weblogs and other Web sites where syndicated headlines are desired. CWIS supports RSS version 2.0 (which is backward compatible with RSS version 0.92).
Although the Metadata Tool will likely be the primary method of entering new data into the portal, sometimes (particularly during initial setup) an administrator may want to import records into the portal in bulk. To allow this, CWIS supports importing tab-delimited resource records in a flexible format, with the first record in the imported file defining the order and meaning of fields in subsequent records. As with data entered via the Metadata Tool, CWIS can adapt to some degree to field content in imported records; for example, dates or date ranges may appear in almost any common format and will be interpreted and stored correctly for use within the portal.
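As a rough illustration of a tab-delimited file whose first record defines the order and meaning of the fields, the following sketch reads such a file with Python's standard `csv` module. The field names ("Title", "Url", "Date Issued") and the sample rows are invented for the example and are not taken from the CWIS documentation; the date normalization that CWIS performs is not reproduced here.

```python
import csv
import io
from typing import Dict, List


def read_tab_delimited(text: str) -> List[Dict[str, str]]:
    """Parse tab-delimited records whose first line names the fields."""
    # DictReader uses the first row as field names, so the column order
    # in the file is free to vary from one import file to the next.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical sample data; the dates are deliberately in different styles.
sample = (
    "Title\tUrl\tDate Issued\n"
    "Fossil Basics\thttp://example.org/fossils\t2004-07-01\n"
    "Trilobite Gallery\thttp://example.org/trilobites\tJuly 2004\n"
)
records = read_tab_delimited(sample)
```

Because each record is keyed by the names in the header line, a file that lists the same fields in a different order yields the same records.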
The tab-delimited export format matches the data import format described earlier and should be compatible with many common applications. Each format targets a different audience. RSS will most commonly be used to share resource titles and information to be displayed on other Web sites, such as those implemented with uPortal (uPortal, n.d.). OAI will most commonly be used when sharing data with other groups that are working with online metadata, such as NSF's National Science Digital Library (NSDL) (National Science Digital Library, n.d.) initiative. The tab-delimited format should be of use when collection developers want to manipulate data with other, non-Web-based applications.
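To give a sense of what a harvesting group would do with the OAI export, the sketch below builds an OAI-PMH 2.0 `ListRecords` request and pulls Dublin Core titles out of a response. The `verb` and `metadataPrefix` arguments and the XML namespaces come from the OAI-PMH 2.0 specification; the base URL used in the usage note, the helper names, and the sample response (heavily abbreviated and not schema-complete) are assumptions for illustration only. No network request is made.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Namespace of unqualified Dublin Core elements, in ElementTree notation.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for an OAI-PMH repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> List[str]:
    """Collect every dc:title value found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text or "" for el in root.iter(DC_NS + "title")]


# Abbreviated, hypothetical response: real responses also carry
# responseDate, request, and per-record header elements.
sample_response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Fossil Basics</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

For example, `list_records_url("http://portal.example.org/oai")` produces a URL ending in `?verb=ListRecords&metadataPrefix=oai_dc`; the actual OAI base URL of a given CWIS installation would have to be taken from that site.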
GETTING STARTED WITH CWIS AND SPT
CWIS and SPT are designed to be easy to install and configure, in most cases taking less than ten minutes to get up and running when installed in the recommended environment.
CWIS and SPT have been developed to run on a Linux-based Web server that supports PHP 4.1.0 (or later) (PHP Group, 2004) and a database server running MySQL 3.23 (or later) (MySQL, 2004). PHP must have been installed with MySQL support. If graphics manipulation is desired, PHP must include the GD library. Although Linux is the target platform, there are sites currently in operation running CWIS or SPT on Solaris and OS X. Running the software on some variants of Microsoft Windows is possible but not recommended.
As for hardware requirements, CWIS and SPT will run on almost anything that will support PHP and MySQL. If the portal will include a large number of resources (thousands or tens of thousands), collection developers will likely want to run CWIS on faster PC hardware because the search engine and recommender system can both be CPU-intensive.
Where to Download
CWIS and the Scout Portal Toolkit are available for download from the Internet Scout Project site on the following pages:
Two files are available on each page: the software package itself and an installation script. The integrity of the files can be verified by checking their MD5 checksums against the values posted at the bottom of the page.
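Checking a downloaded file against a posted MD5 value can be done with most operating systems' command-line tools; the following sketch shows the same check using Python's standard `hashlib` module. The function names are illustrative, and the file name and checksum passed in would be whatever appears on the download page.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Reading in chunks keeps memory use flat for large packages.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted_checksum(path: Path, posted: str) -> bool:
    """Compare a file's MD5 digest with the value copied from the page."""
    return md5_of_file(path) == posted.strip().lower()
```

A mismatch indicates that the download is incomplete or altered and that the file should be fetched again before installation.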
The dynamic interface support provided by CWIS is intended primarily to allow customization via HTML, but there are times when more extensive changes or additions are warranted. To support this, CWIS offers programming hooks for customization, where additional code may be linked in a way that will affect the operation of existing CWIS functionality. Examples of this might include additional filtering of search results or on-the-fly processing of resource metadata prior to display.
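The general idea of a programming hook can be sketched as a small registry of callbacks that are run at a named point in processing. The sketch is generic and does not reproduce the CWIS hook mechanism, which is implemented in PHP and not detailed here; the event name "search_results", the "Release Flag" field, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Resource = Dict[str, str]
Hook = Callable[[List[Resource]], List[Resource]]

# Hypothetical registry mapping an event name to its registered callbacks.
_hooks: DefaultDict[str, List[Hook]] = defaultdict(list)


def register_hook(event: str, func: Hook) -> None:
    """Attach a callback to a named event."""
    _hooks[event].append(func)


def run_hooks(event: str, resources: List[Resource]) -> List[Resource]:
    """Pass the resources through every callback registered for the event."""
    for func in _hooks[event]:
        resources = func(resources)
    return resources


def hide_unreleased(resources: List[Resource]) -> List[Resource]:
    """Example filter: keep only records flagged as released."""
    return [r for r in resources if r.get("Release Flag") == "1"]


register_hook("search_results", hide_unreleased)
```

With a design of this kind, site-specific filtering lives in separately registered functions, so the core code that produces search results does not need to be edited.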
Of course, new versions of CWIS and SPT are likely to be released with additional functionality or enhanced performance. When an existing installation is upgraded to a new version, interface or programming changes are preserved wherever possible.
SPT and CWIS in Use
Because some sites are very heavily modified and others are not publicly accessible (and, of course, the software is free for download, and registration, while strongly encouraged, is not required), accurately determining the number of active SPT and CWIS installations in the field is not possible. However, the best estimates as of this writing (July 2004) put the total number of active production installations at somewhere between 45 and 60 sites and the number of SPT- or CWIS-based sites under development but not yet available to end users somewhere between 200 and 250. These numbers are expected to dramatically increase within the next year, with the increasing rate of adoption of CWIS by NSDL-related projects and the rapid growth of interest in the OAI-PMH protocol for disseminating collection metadata.
Some of the active CWIS-based sites include the Electronic Environmental Resources Library (http://www.eerl.org), a collection focused on environmental and sustainability resources for community college educators and students; the Journal of Chemical Education's JCE-DLib repository (http://resgenchem15.chem.wisc.edu/spt/), which catalogs chemical education resources; and the Consortium for the Advancement of Undergraduate Statistical Education's CAUSEweb project (http://www.causeweb.org/resources/). Active SPT-based sites include Duke University Libraries' Classical Music Resources collection (http://www.lib.duke.edu/dw3/), the Tibetan & Himalayan Digital Library bibliography database (http://datastore.lib.virginia.edu/tibet/spt/), and the British Columbia History Portal (http://bchistoryportal.tc.ca/). SPT has also been used for several projects where Scout has had a more direct role, including LearningLanguages.Net (http://learninglanguages.net), a site collecting Spanish, French, and Japanese language education resources for K-12 students and teachers, and Access NSDL (http://accessnsd.org), a portal intended to help NSDL collection builders and service projects cope with online accessibility issues. And, of course, one of the largest and most active SPT-based installations is Scout's own Scout Archives (http://scout.wisc.edu/Archives/), which catalogs more than 17,000 online resources culled from the past ten years of Scout publications. All of these sites and more can be found on Scout's SPT/CWIS site list (http://scout.wisc.edu/Projects/SPTCWISSites/), which is periodically updated to list new public installations of the software.
With funding from the Andrew W. Mellon Foundation and the National Science Foundation, respectively, the Scout Portal Toolkit and CWIS were developed by Edward Almasy, Barry Wiegan, Andy Yaco-Mink, Rachael Bower, and David Sleasman. CWIS and SPT are open-source software, licensed under the GNU General Public License (http://www.gnu.org/licenses/gpl.txt) and available at no charge.
Backend.Userland.com. (2003). RSS 0.92. Retrieved November 20, 2004, from http://backend.userland.com/rss092.
Breese, J., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In G. Cooper, S. Moral, & P. P. Shenoy (Eds.), Uncertainty in artificial intelligence: Proceedings of the Fourteenth Conference, July 24-26, 1998, University of Wisconsin, Madison, WI (pp. 43-52). Madison, WI: Morgan Kaufmann. Retrieved November 20, 2004, from http://www.research.microsoft.com/users/breese/cfalgs.html.
Dublin Core Metadata Initiative. (2003a). Dublin Core Metadata element set, version 1.1: Reference description. Retrieved November 20, 2004, from http://dublincore.org/documents/dces/.
Dublin Core Metadata Initiative. (2003b). DCMI administrative metadata working group. Retrieved November 20, 2004, from http://dublincore.org/groups/admin/.
Dublin Core Metadata Initiative. (2004). DCMI education working group. Retrieved November 20, 2004, from http://dublincore.org/groups/education/.
Libby, D. (1999). RSS 0.91 spec, revision 3. Retrieved November 20, 2004, from http://my.netscape.com/publish/formats/rss-spec-0.91.html.
MySQL. (2004). Home page. Retrieved November 20, 2004, from http://www.mysql.com.
National Science Digital Library. (n.d.). Home page. Retrieved November 20, 2004, from http://www.nsdl.org.
Open Archives Initiative. (2004a). Home page. Retrieved November 20, 2004, from http://www.openarchives.org.
Open Archives Initiative. (2004b). The Open Archives Initiative protocol for metadata harvesting. Retrieved November 20, 2004, from http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm.
PHP Group. (2004). Home page. Retrieved November 20, 2004, from http://www.php.net.
Syndic8.com. (2004). Welcome to Syndic8.com. Retrieved November 20, 2004, from http://www.syndic8.com.
uPortal. (n.d.). Home page. Retrieved November 20, 2004, from http://www.uportal.org.
World Wide Web Consortium. (2004). Extensible Markup Language (XML). Retrieved November 20, 2004, from http://www.w3.org/XML/.
Edward Almasy, Computer Sciences, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI 53715