The invisible web: uncovering sources search engines can't see.


ABSTRACT

THE PARADOX OF THE INVISIBLE WEB is that it's easy to understand why it exists, but it's very hard to actually define in concrete, specific terms. In a nutshell, the Invisible Web consists of content that's been excluded from general-purpose search engines and Web directories such as Lycos and LookSmart--and yes, even Google. There's nothing inherently "invisible" about this content. But since this content is not easily located with the information-seeking tools used by most Web users, it's effectively invisible because it's so difficult to find unless you know exactly where to look.

In this paper, we define the Invisible Web and delve into the reasons search engines can't "see" its content. We also discuss the four different "types" of invisibility, ranging from the "opaque" Web, which is relatively accessible to the searcher, to the truly invisible Web, which requires specialized finding aids to access effectively.

**********

The visible Web is easy to define. It's made up of HTML Web pages that the search engines have chosen to include in their indices. It's no more complicated than that. The Invisible Web is much harder to define and classify for several reasons.

First, many Invisible Web sites are made up of straightforward Web pages that search engines could easily crawl and add to their indices but do not, simply because the engines have decided against including them. This is a crucial point--much of the Invisible Web is hidden because search engines have deliberately chosen to exclude some types of Web content. We're not talking about unsavory "adult" sites or blatant spam sites--quite the contrary! Many Invisible Web sites are first-rate content sources. These exceptional resources simply cannot be found using general-purpose search engines because they have been effectively locked out.

There are a number of reasons for these exclusionary policies, many of which we'll discuss. But keep in mind that, should the engines change their policies in the future, sites that today are part of the Invisible Web will suddenly join the mainstream as part of the visible Web. In fact, since the publication of our book The Invisible Web: Uncovering Information Sources Search Engines Can't See (Medford, NJ: CyberAge Books, 2001, 0-910965-51X/softbound), most major search engines are now including content that was previously hidden--we'll discuss these developments below.

Second, it's relatively easy to classify some sites as either visible or invisible based on the technology they employ. Some sites using database technology, for example, are genuinely difficult for current generation search engines to access and index. These are "true" Invisible Web sites. Other sites, however, use a variety of media and file types, some of which are easily indexed and others that are incomprehensible to search engine crawlers. Web sites that use a mixture of these media and file types aren't easily classified as either visible or invisible. Rather, they make up what we call the "opaque" Web.

Finally, search engines could theoretically index some parts of the Invisible Web, but doing so would simply be impractical, either from a cost standpoint, or because data on some sites is ephemeral and not worthy of indexing--for example, current weather information, moment-by-moment stock quotes, airline flight arrival times, and so on. However, it's important to note that, even if all Web engines "crawled" everything, an unintended consequence could be that, with the vast increase in information to process, finding the right "needle" in a larger "haystack" might become more difficult. Invisible Web tools offer limiting features for a specific data set, potentially increasing precision. General engines don't have these options. So the database will increase but precision could suffer.

INVISIBLE WEB DEFINED

The Invisible Web: Text pages, files, or other often high-quality authoritative information available via the World Wide Web that general-purpose search engines cannot, due to technical limitations, or will not, due to deliberate choice, add to their indices of Web pages. Sometimes also referred to as the "deep Web" or "dark matter."

This definition is deliberately very general, because the general-purpose search engines are constantly adding features and improvements to their services. What may be invisible today may suddenly become visible tomorrow, should the engines decide to add the capability to index things that they cannot or will not currently index.

Let's examine the two parts of this definition in more detail. First, we'll look at the technical reasons search engines can't index certain types of material on the Web. Then we'll talk about some of the other nontechnical but very important factors that influence the policies that guide search engine operations.

At their most basic level, search engines are designed to index Web pages. Search engines use programs called crawlers (a.k.a. "spiders" and "robots") to find and retrieve Web pages stored on servers all over the world. From a Web server's standpoint, it doesn't make any difference if a request for a page comes from a person using a Web browser or from an automated search engine crawler. In either case, the server returns the desired Web page to the computer that requested it.

A key difference between a person using a browser and a search engine spider is that the person can manually type a URL into the browser window and retrieve the page the URL points to. Search engine crawlers lack this capability. Instead, they're forced to rely on links they find on Web pages to find other pages. If a Web page has no links pointing to it from any other page on the Web, a search engine crawler can't find it. These "disconnected" pages are the most basic part of the Invisible Web. There's nothing preventing a search engine from crawling and indexing disconnected pages--but without links pointing to the pages, there's simply no way for a crawler to discover and fetch them.
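The link-following behavior described above can be sketched as a breadth-first traversal of a link graph. This is only an illustration: the page names and the LINKS table are hypothetical, and a real crawler fetches pages over HTTP rather than reading a dictionary.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home.html": ["about.html", "products.html"],
    "about.html": ["home.html"],
    "products.html": ["widget.html"],
    "widget.html": [],
    "orphan.html": ["home.html"],  # links out, but nothing links to it
}

def crawl(seeds, links):
    """Discover pages only by following links outward from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

discovered = crawl(["home.html"], LINKS)
disconnected = sorted(set(LINKS) - discovered)
print(disconnected)  # ['orphan.html']
```

Starting from home.html, the traversal never reaches orphan.html, because no discovered page links to it. Passing orphan.html as a seed, which is roughly what an "add URL" submission does, makes it discoverable.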

Disconnected pages can easily leave the realm of the invisible and join the visible Web in one of two ways. First, if a connected Web page links to the disconnected page, a crawler can discover the link and spider the page. Second, the page author can request that the page be crawled by submitting it to "search engine add URL" forms.

Technical problems begin to come into play when a search engine crawler encounters an object or file type that's not a simple text document. Search engines are designed to index text and are highly optimized to perform search and retrieval operations on text. But they don't do very well with nontextual data, at least in the current generation of tools.

Some engines, like AltaVista and Google, can do limited searching for certain kinds of nontext files, including images, audio, or video files. But the way they process requests for this type of material is reminiscent of early Archie searches, typically limited to a filename or the minimal alternative (ALT) text that's sometimes used by page authors in the HTML image tag. Text surrounding an image, sound, or video file can give additional clues about what the file contains. But keyword searching with images and sounds is a far cry from simply telling the search engine to "find me a picture that looks like Picasso's 'Guernica'" or "let me hum a few bars of this song and you tell me what it is." Pages that consist primarily of images, audio, or video, with little or no text, make up another type of Invisible Web content. While the pages may actually be included in a search engine index, they provide few textual clues as to their content, making it highly unlikely they will ever garner high relevance scores.
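To make the "filename or ALT text" limitation concrete, here is a minimal sketch of how little text an indexer can pull from an image tag. The ImageClueParser class and the sample markup are hypothetical; only Python's standard html.parser and urllib.parse modules are assumed.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse

class ImageClueParser(HTMLParser):
    """Collect the only textual clues an <img> tag offers: filename and ALT text."""

    def __init__(self):
        super().__init__()
        self.clues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        filename = basename(urlparse(attributes.get("src") or "").path)
        alt = attributes.get("alt") or ""
        self.clues.append((filename, alt))

# Hypothetical page fragment: one image with ALT text, one without.
SAMPLE = (
    '<p>Gallery</p>'
    '<img src="/art/guernica.jpg" alt="Picasso, Guernica">'
    '<img src="/art/img0042.jpg">'
)

parser = ImageClueParser()
parser.feed(SAMPLE)
print(parser.clues)
```

The second image yields nothing searchable beyond an uninformative filename, which is why pages built mostly from such images rarely score well for any query.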

Researchers are working to overcome these limitations. Google, for example, has experimented with optical character recognition processes for extracting text from photographs and graphic images, in its experimental Google Catalogs project (Google Catalogs, n.d.). While not particularly useful to serious searchers, Google Catalogs illustrates one possibility for enhancing the capability of crawlers to find Invisible Web content.

Another company, Singingfish (owned by Thompson), indexes streaming audio media and makes use of metadata embedded in the files to enhance the search experience (Singingfish, n.d.). ShadowTV performs near real-time indexing of television audio and video, converting spoken audio to text to make it searchable (Shadow TV, n.d.).

While search engines have limited capabilities to index pages that are primarily made up of images, audio, and video, they have serious problems with other types of nontext material. Most of the major general-purpose search engines simply cannot handle certain types of formats. When our book was first written, PDF and Microsoft Office format documents were among those not indexed by search engines. Google pioneered the indexing of PDF and Office documents, and this type of search capability is widely available today.

However, a number of other file formats are still largely ignored by search engines. These formats include:

* Postscript,

* Flash,

* Shockwave,

* Executables (programs), and

* Compressed files (.zip, .tar, etc.).

The problem with indexing these files is that they aren't made up of HTML text. Technically, most of the formats in the list above can be indexed. AlltheWeb.com, for example, recently began indexing the text portions of Flash files, and Google can follow links embedded within Flash files.

The primary reason search engines choose not to index certain file types is a business judgment. For one thing, there's much less user demand for these types of files than for HTML text files. These formats are also "harder" to index, requiring more computing resources. For example, a single PDF file might consist of hundreds or even thousands of pages, so even those engines that do index PDF files typically ignore parts of a document that exceed 100K bytes or so. Indexing non-HTML text file formats tends to be costly. In other words, the major Web engines are not in business to meet every need of information professionals and researchers.
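The effect of a size cutoff like the one just described can be sketched in a few lines. The 100,000-byte limit below simply mirrors the "100K bytes or so" figure in the text; the function name and the sample document are hypothetical, and real engines tokenize far more carefully.

```python
import re

INDEX_LIMIT_BYTES = 100_000  # assumed cutoff, taken from "100K bytes or so"

def indexable_terms(text, limit=INDEX_LIMIT_BYTES):
    """Return the words found in the first `limit` bytes; the rest is ignored."""
    head = text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")
    return set(re.findall(r"[a-z0-9]+", head.lower()))

# Hypothetical long document whose key terms appear only near the end.
document = ("filler " * 20_000) + "appendix tobacco statistics"
terms = indexable_terms(document)
print("filler" in terms, "tobacco" in terms)  # prints: True False
```

A searcher looking for "tobacco" would never be shown this document, even though the engine nominally indexes it.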

Pages consisting largely of these "difficult" file types currently make up a relatively small part of the Invisible Web. However, we're seeing a rapid expansion in the use of many of these file types, particularly for some kinds of high-quality, authoritative information. For example, to comply with federal paperwork reduction legislation, many U.S. government agencies are moving to put all of their official documents on the Web in PDF format. Most scholarly papers are posted to the Web in Postscript or compressed Postscript format. For the searcher, Invisible Web content made up of these file types poses a serious problem. We discuss a partial solution to this problem later in this article.

The biggest technical hurdle search engines face lies in accessing information stored in databases. This is a huge problem, because there are thousands--perhaps millions--of databases containing high-quality information that are accessible via the Web. Web content creators favor databases because they offer flexible, easily maintained development environments. And increasingly, content-rich databases from universities, libraries, associations, businesses, and government agencies are being made available online, using Web interfaces as front-ends to what were once closed, proprietary information systems.

Databases pose a problem for search engines because every database is unique in both the design of its data structures and its search and retrieval tools and capabilities. Unlike simple HTML files, which search engine crawlers can simply fetch and index, content stored in databases is trickier to access, for a number of reasons that we'll describe in detail below.

Search engine crawlers generally have no difficulty finding the interface or gateway pages to databases because these are typically pages made up of input fields and other controls. These pages are formatted with HTML and look like any other Web page that uses interactive forms. Behind the scenes, however, are the knobs, dials, and switches that provide access to the actual contents of the database, which are literally incomprehensible to a search engine crawler.

Although these interfaces provide powerful tools for a human searcher, they act as roadblocks for a search engine spider. Essentially, when an indexing spider comes across a database, it's as if it has run smack into the entrance of a massive library with securely bolted doors. A crawler can locate and index the library's address, but because the crawler cannot penetrate the gateway it can't tell you anything about the books, magazines, or other documents it contains.

These Web-accessible databases make up the lion's share of the Invisible Web. They are accessible via the Web but may or may not actually be on the Web. To search a database you must use the powerful search and retrieval tools offered by the database itself. The advantage to this direct approach is that you can use search tools that were specifically designed to retrieve the best results from the database. The disadvantage is that you need to find the database in the first place, a task the search engines may or may not be able to help you with.

There are several different kinds of databases used for Web content, and it's important to distinguish between them. Just because Web content is stored in a database doesn't automatically make it part of the Invisible Web. Indeed, some Web sites use databases not so much for their sophisticated query tools, but rather because database architecture is more robust and makes it easier to maintain a site than if it were simply a collection of HTML pages.

One type of database is designed to deliver tailored content to individual users. Examples include MyYahoo!, Personal Excite, Quicken.com's personal portfolios, and so on. These sites use databases that generate "on the fly" HTML pages customized for a specific user. Since this content is tailored for each user, there's little need to index it in a general-purpose search engine.

A second type of database is designed to deliver streaming or real-time data--stock quotes, weather information, airline flight arrival information, and so on. This information isn't necessarily customized, but it is stored in a database due to the huge, rapidly changing quantities of information involved. Technically, much of this kind of data is indexable because the information is retrieved from the database and published in a consistent, straight HTML file format. But because it changes so frequently, and has value for such a limited duration (other than to scholars or archivists), there's no point in indexing it. It's also problematic for crawlers to keep up with this kind of information. Even the fastest crawlers revisit most sites monthly or even less frequently (other than news crawlers, which are designed to track rapidly changing news sites). Staying current with real-time information would consume so many resources it is effectively impossible for a crawler.

The third type of Web-accessible database is optimized for the data it contains, with specialized query tools designed to retrieve the information using the fastest or most effective means possible. These are often "relational" databases that allow sophisticated querying to find data that are "related" based on criteria specified by the user. The only way of accessing content in these types of databases is by directly interacting with the database. It is this content that forms the core of the Invisible Web.

Let's take a closer look at these elements of the Invisible Web and demonstrate exactly why search engines can't or won't index them.

WHY SEARCH ENGINES CAN'T SEE THE INVISIBLE WEB

Text--more specifically hypertext--is the fundamental medium of the Web. The primary function of search engines is to help users locate hypertext documents of interest. Search engines are highly tuned and optimized to deal with text pages and, even more specifically, text pages that have been encoded with the HyperText Markup Language (HTML). As the Web evolves and additional media become commonplace, search engines will undoubtedly offer new ways of searching for this information. But for now, the core function of most Web search engines is to help users locate text documents.

HTML documents are simple. Each page has two parts: a "head" and a "body" which are clearly separated in the source code of an HTML page. The head portion contains a title, which is displayed (logically enough) in the title bar at the very top of a browser's window. The head portion may also contain some additional metadata describing the document, which can be used by a search engine to help classify the document. For the most part, other than the title, the head of a document contains information and data that helps the Web browser display the page but is irrelevant to a search engine. The body portion contains the actual document itself. This is the meat that the search engine wants to digest.
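The head/body split can be illustrated with a small parser that pulls out what an indexer cares about: the title, any named metadata, and the words in the body. The HeadBodyParser class and the sample page are hypothetical; this is a sketch using Python's standard html.parser, not a description of any particular engine's indexer.

```python
from html.parser import HTMLParser

class HeadBodyParser(HTMLParser):
    """Separate the title and metadata in the head from the words in the body."""

    def __init__(self):
        super().__init__()
        self.section = None      # "head", "body", or None
        self.in_title = False
        self.title = ""
        self.meta = {}
        self.body_words = []

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "title":
            self.in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.meta[name.lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag in ("head", "body"):
            self.section = None

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif self.section == "body":
            self.body_words.extend(data.lower().split())

# Hypothetical page used only for illustration.
PAGE = """<html>
<head>
<title>Invisible Web Primer</title>
<meta name="description" content="What crawlers miss">
</head>
<body><h1>Deep content</h1><p>Databases hide pages.</p></body>
</html>"""

page = HeadBodyParser()
page.feed(PAGE)
print(page.title, page.meta, page.body_words)
```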

The simplicity of this format makes it easy for search engines to retrieve HTML documents, index every word on every page, and store them in huge databases that can be searched on demand. Problems arise when content doesn't conform to this simple Web page model. To understand why, it's helpful to consider the process of crawling and the factors that influence whether a page either can or will be successfully crawled and indexed.

The first thing a crawler attempts to determine is whether access to pages on a server it is attempting to crawl is restricted. Webmasters can use three methods to prevent a search engine from indexing a page. Two methods use blocking techniques specified in the Robots Exclusion Protocol that most crawlers voluntarily honor, and one creates a technical roadblock that cannot be circumvented (Robots Exclusion Protocol, n.d.).

The Robots Exclusion Protocol is a set of rules that enable a Webmaster to specify which parts of a server are open to search engine crawlers, and which parts are off-limits. The Webmaster simply creates a list of files or directories that should not be crawled or indexed and saves this list on the server in a file named robots.txt. This optional file, stored by convention at the top level of a Web site, is nothing more than a polite request to the crawler to keep out, but most major search engines respect the protocol and will not index files specified in robots.txt.
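As an illustration, a well-behaved crawler might check robots.txt rules along these lines before fetching a page. The rules, the "ExampleBot" user-agent name, and the www.example.com host are hypothetical; the sketch uses Python's standard urllib.robotparser module and parses the rules from a string rather than downloading them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/report.html
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

def allowed(path):
    """Ask whether a crawler calling itself ExampleBot may fetch this path."""
    return rules.can_fetch("ExampleBot", "http://www.example.com" + path)

for path in ("/index.html", "/private/data.html", "/drafts/report.html"):
    print(path, allowed(path))
```

Nothing stops a crawler from skipping this check, which is why the protocol amounts to a polite request rather than a barrier.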

The second means of preventing a page from being indexed works in the same way as the robots.txt file, but it is page-specific. Webmasters can prevent a page from being indexed by including a "noindex" metatag instruction in the "head" portion of the document. Either robots.txt or the noindex metatag can be used to block crawlers. The only difference between the two is that the noindex metatag is page specific, while the robots.txt file can be used to prevent indexing of individual pages, groups of files, or even entire Web sites.
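A page-level check for the noindex instruction might look like the following sketch. The RobotsMetaParser class and may_index function are hypothetical names; the only assumption about the markup is the conventional form, a meta tag named "robots" whose content lists comma-separated directives.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def may_index(html):
    """Return False when the page asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives

blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>x</body></html>'
open_page = '<html><head><title>Welcome</title></head><body>x</body></html>'
print(may_index(blocked), may_index(open_page))  # prints: False True
```

Note that the crawler has to fetch the page to see the tag at all, so this mechanism governs what gets indexed rather than what gets requested.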

Password protecting a page is the third means of preventing it from being crawled and indexed by a search engine. This technique is much stronger than the first two since it uses a technical barrier rather than a voluntary standard.

Why would a Webmaster block crawlers from a page using the Robots Exclusion Protocol rather than simply password protecting the pages? Password protected pages can be accessed only by the select few users that know the password. Pages excluded from engines using the Robots Exclusion Protocol, on the other hand, can be accessed by anyone except a search engine crawler. The most common reason Webmasters block content from indexing is that a page changes far more frequently than the engines can keep up with.

Pages using any of the three methods described above are part of the Invisible Web. In many cases, they contain no technical roadblocks that prevent crawlers from spidering and indexing the page. They are part of the Invisible Web because the Webmaster has opted to keep them out of the search engines.

Once a crawler has determined whether it is permitted access to a page, the next step is to attempt to fetch it and hand it off to the search engine's indexer component. This crucial step determines to a large degree whether a page is visible or invisible. Let's examine some variations crawlers encounter as they discover pages on the Web, using the same logic they do to determine whether a page is indexable or not.

Case 1

The crawler encounters a page that is straightforward HTML text, possibly including basic Web graphics. This is the most common type of Web page. It is visible and can be indexed, assuming the crawler can discover it.

Case 2

The crawler encounters a page made up of HTML, but it's a form, consisting of text fields, check boxes, or other components requiring user input. It might be a sign-in page, requiring a user name and password. It might be a form requiring the selection of one or more options. The form itself, since it's made up of simple HTML, can be fetched and indexed. But the content behind the form (what the user sees after clicking the submit button) may be invisible to a search engine. There are two possibilities here:

* The form is used simply to select user preferences. Other pages on the site consist of straightforward HTML that can be crawled and indexed (presuming there are links from other pages elsewhere on the Web pointing to the pages). In this case, the form and the content behind it are visible and can be included in a search engine index. Quite often, sites like this are specialized search sites for specific types of content. A good example is Hoover's Business Profiles, which provides a form to search for a company, but presents company profiles in straightforward HTML that can be indexed (Hoover's Online, n.d.).

* The form is used to collect user-specified information that will generate dynamic pages when the information is submitted. In this case, although the form is visible, the content "behind" it is invisible. Since the only way to access the content is by using the form, how can a crawler, which is simply designed to request and fetch pages, possibly know what to enter into the form? Since forms can literally have infinite variations, if they function to access dynamic content they are essentially roadblocks for crawlers. A good example of this type of Invisible Web site is the World Bank Group Economics of Tobacco Control Country Data Report Database, which allows you to select any country and choose a wide range of reports for that country (Economics of Tobacco-Country Data Report, n.d.). It's interesting to note here that this database is just one part of a much larger site, the bulk of which is fully visible. So even if the search engines do a comprehensive job of indexing the visible part of the site, this valuable information still remains hidden to all but those searchers who visit the site and discover the database on their own.
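The crawler's predicament with forms can be sketched as follows: it can parse the form and list its fields, but nothing on the page tells it which values to submit. The FormFinder class and the sample form are hypothetical and use only Python's standard html.parser.

```python
from html.parser import HTMLParser

class FormFinder(HTMLParser):
    """Record each form's action URL and the names of its input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {"action": attributes.get("action") or "", "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attributes.get("name")
            if name:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None

# Hypothetical database gateway page.
GATEWAY = """
<form action="/reports" method="get">
  <select name="country"><option>Brazil</option></select>
  <input type="text" name="year">
  <input type="submit" value="Go">
</form>
"""

finder = FormFinder()
finder.feed(GATEWAY)
print(finder.forms)
```

The crawler now knows that /reports expects a country and a year, but it has no way of knowing which combinations produce useful pages, so the content behind the form stays out of reach.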

In the future, forms will pose less of a challenge to search engines. Several projects are underway aimed at creating more intelligent crawlers that can fill out forms and retrieve information. One approach uses preprogrammed "brokers" designed to interact with the forms of specific databases. Other approaches combine brute force with artificial intelligence to "guess" what to enter into forms, allowing the crawler to "punch through" the form and retrieve information. It's not a trivial problem: In a conversation with Google's Chief Technology Officer, Craig Silverstein, he estimated that it may take as long as fifty years before Google has the capability to index all Invisible Web content. And even if general-purpose search engines do acquire the ability to crawl content in databases, it's likely that the native search tools provided by each database will remain the best way to interact with most databases.

Case 3

The crawler encounters a dynamically generated page assembled and displayed on demand. The telltale sign of a dynamically generated page is the "?" symbol appearing in its URL. Technically, these pages are part of the visible Web. Crawlers can fetch any page that can be displayed in a Web browser, regardless of whether it's a static page stored on a server or generated dynamically. A good example of this type of Invisible Web site is Compaq's experimental SpeechBot search engine, which indexes audio and video content using speech recognition and converts the streaming media files to viewable text (SpeechBot, n.d.). Somewhat ironically, one could make a good argument that most search engine result pages are themselves Invisible Web content, since they generate dynamic pages on the fly in response to user search terms.
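The "?" heuristic is easy to express in code. The sketch below treats any URL with a non-empty query string as dynamic; the function names and example URLs are hypothetical, and real pages can of course be generated dynamically without a "?" in the address.

```python
from urllib.parse import parse_qs, urlsplit

def is_dynamic(url):
    """Heuristic from the text: a query string after "?" signals a dynamic page."""
    return urlsplit(url).query != ""

def query_parameters(url):
    """Return the parameters a script would receive, as a dict of lists."""
    return parse_qs(urlsplit(url).query)

static_url = "http://www.example.com/catalog/index.html"
dynamic_url = "http://www.example.com/report?country=BR&year=2001"

print(is_dynamic(static_url), is_dynamic(dynamic_url))  # prints: False True
print(query_parameters(dynamic_url))
```

A crawler that declines dynamic content could apply a test like this to every link it extracts and simply drop the ones that match.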

Dynamically generated pages pose a challenge for crawlers. Dynamic pages are created by a script, a computer program that selects from various options to assemble a customized page. Until the script is actually run, a crawler has no way of knowing what it will actually do. The script should simply assemble a customized Web page. Unfortunately, unethical Webmasters have created scripts to generate literally millions of similar but not quite identical pages in an effort to "spamdex" the search engine with bogus pages. Sloppy programming can also result in a script that puts a spider into an endless loop, repeatedly retrieving the same page.

These "spider traps" can be a real drag on the engines, so most have simply made the decision not to crawl or index URLs that generate dynamic content. They're "apartheid" pages on the Web--separate but equal, making up a big portion of the "opaque" Web that potentially can be indexed but is not. Inktomi's FAQ about its crawler, named "Slurp," offers this explanation:
   Slurp now has the ability to crawl dynamic links or dynamically
   generated documents. It will not, however, crawl them by default.
   There are a number of good reasons for this. A couple of reasons
   are that dynamically generated documents can make up infinite URL
   spaces, and that dynamically generated links and documents can be
   different for every retrieval so there is no use in indexing them.
   (Slurp, n.d.)
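
To make the "infinite URL space" problem concrete, here is a minimal sketch of how a crawler might protect itself. It runs against a simulated site rather than the network. The names crawl, trap_links, and max_dynamic_per_host, and the trap.example host, are hypothetical; the per-host budget is one plausible defense that we assume for illustration, not a description of how Slurp or any other crawler actually works.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(seed, get_links, max_dynamic_per_host=2):
    """Breadth-first crawl that never fetches a URL twice and caps
    the number of dynamic (query-string) URLs fetched per host."""
    seen = {seed}
    queue = deque([seed])
    dynamic_count = {}  # host -> dynamic URLs fetched so far
    fetched = []
    while queue:
        url = queue.popleft()
        parts = urlsplit(url)
        if parts.query:
            used = dynamic_count.get(parts.netloc, 0)
            if used >= max_dynamic_per_host:
                continue  # budget spent: skip further dynamic pages on this host
            dynamic_count[parts.netloc] = used + 1
        fetched.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def trap_links(url):
    """Simulated spider trap: every page links to a 'next' page
    that differs only in its query string, without end."""
    query = urlsplit(url).query
    n = int(query.split("=")[1]) if query else -1
    return [f"http://trap.example/page?n={n + 1}"]


if __name__ == "__main__":
    # Without the budget this loop would never finish.
    print(crawl("http://trap.example/", trap_links))
```

The "seen" set alone stops a crawler from refetching the same URL, but it does nothing against a script that mints a new URL on every request, which is why the sketch also needs the budget.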


As crawler technology improves, it's likely that one type of dynamically generated content will increasingly be crawled and indexed. This is content that essentially consists of static pages that are stored in databases for production efficiency reasons. As search engines learn which sites providing dynamically generated content can be trusted not to subject crawlers to spider traps, content from these sites will begin to appear in search engine indices. It's important to note that even as search engines learn which content is acceptable, they still may not index everything, as evidenced by this statement from Google's Webmaster tips page: "We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index" (Google Information for Webmasters, n.d.).

Another development that has reduced the barriers for dynamic content is the increasing adoption of paid inclusion programs by the major search engines. These programs are designed to allow Webmasters to designate specific pages for crawling and guaranteed indexing, in exchange for an annual fee. The search engines give no preferential treatment to these pages beyond guaranteed indexing, and spam rules still apply. Any pages that violate search engine spam policies, whether crawled or submitted via paid inclusion, are subject to removal from the index. Paid inclusion is a means for search engines to trust dynamic content, on the theory that nobody would willingly pay just to have their content removed anyway.

Case 4

The crawler encounters an HTML page with nothing to index. There are thousands, if not millions, of pages that have a basic HTML framework, but which contain only Flash; images in the .gif, .jpeg, or other Web graphics format; streaming media; or other nontext content in the body of the page. These types of pages are truly parts of the Invisible Web because there's nothing for the search engine to index. Specialized multimedia search engines are able to recognize some of these nontext file types and index minimal information about them, such as file name and size, but these are far from keyword searchable solutions.

Case 5

The crawler encounters a site offering dynamic, real-time data. There is a wide variety of sites providing this kind of information, ranging from real-time stock quotes to airline flight arrival information. These sites are also part of the Invisible Web, because these data streams are, from a practical standpoint, unindexable. While it's technically possible to index many kinds of real-time data streams, the value would only be for historical purposes, and the enormous amount of data captured would quickly strain a search engine's storage capacity, so it's a futile exercise. A good example of this type of Invisible Web site is CheapTickets' FlightTracker, which provides real-time flight arrival information taken directly from the cockpit of in-flight airplanes (FlightTracker, n.d.).

Case 6

The crawler encounters a PDF or Postscript file. PDF and Postscript are text formats that preserve the look of a document and display it identically regardless of the type of computer used to view it. While many search engines index PDF files, most do not index the full text of the documents. Google stops indexing after 120KB; AlltheWeb stops indexing after 110KB.
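
The effect of such a cutoff can be sketched in a few lines. We assume here that the limit applies to extracted text measured in UTF-8 bytes and that 1KB means 1,024 bytes; neither detail is stated by the engines, and the function name indexable_prefix is our own.

```python
def indexable_prefix(text: str, limit_bytes: int) -> str:
    """Return the leading portion of text that fits within limit_bytes of UTF-8.

    Assumption for illustration: the engine measures extracted text in bytes
    and silently ignores everything past the limit.
    """
    encoded = text.encode("utf-8")[:limit_bytes]
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return encoded.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    limit = 120 * 1024  # the 120KB figure quoted above, read as 1,024-byte KB
    document = "filler " * 20_000 + "conclusion"
    indexed = indexable_prefix(document, limit)
    print(len(document), len(indexed), "conclusion" in indexed)
```

In this example the word "conclusion" falls past the cutoff, so a keyword search for it would never match the document even though the file itself was indexed.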

An experimental search engine called ResearchIndex, created by computer scientists at the NEC Research Institute, not only indexes the full text of PDF and Postscript files, it also takes advantage of the unique features that commonly appear in documents using the format to improve search results (CiteSeer, n.d.). For example, academic papers typically cite other documents and include lists of references to related material. In addition to indexing the full text of documents, ResearchIndex also creates a citation index that makes it easy to locate related documents. It also appears that citation searching has little overlap with keyword searching, so combining the two can greatly enhance the relevance of results.
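
The idea behind a citation index can be sketched as a simple inversion: given each paper's reference list, record for every cited work the papers that cite it. This is only a toy illustration with made-up paper labels; it is not ResearchIndex's actual method, which also has to extract and match citations from free text.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {paper: [works it cites]} into {work: [papers that cite it]}."""
    cited_by = defaultdict(list)
    for paper, cited_works in references.items():
        for work in cited_works:
            cited_by[work].append(paper)
    return dict(cited_by)


if __name__ == "__main__":
    # Hypothetical reference lists, keyed by paper label.
    references = {
        "Paper A": ["Paper B", "Paper C"],
        "Paper D": ["Paper B"],
    }
    index = build_citation_index(references)
    print(index["Paper B"])  # every paper that cites Paper B
```

A searcher who starts from one known paper can then move forward in time to later work that cites it, something a keyword index cannot offer.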

Case 7

The crawler encounters a database offering a Web interface. There are tens of thousands of databases containing extremely valuable information available via the Web. But search engines cannot index the material in them. Although we present this as a unique case, Web-accessible databases are essentially a combination of cases 2 and 3. Databases generate Web pages dynamically, responding to commands issued through an HTML form. Though the interface to the database is an HTML form, the database itself may have been created before the development of HTML, with a legacy system that is incompatible with the protocols used by the engines, or it may require registration to access the data. Finally, databases may be proprietary, accessible only to select users, or users who have paid a fee for access.

Ironically, the original HTTP specification developed by Web inventor Tim Berners-Lee included a feature called format negotiation that allowed a client to say what kinds of data it could handle and allow a server to return data in any acceptable format. Berners-Lee's vision encompassed the information in the Invisible Web, but this vision, at least from a search engine standpoint, has largely been unrealized.
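
In today's HTTP this negotiation is carried by the Accept request header, in which the client lists media types with optional "q" preference weights. The sketch below shows the basic idea under simplifying assumptions: it ignores wildcards such as */* and other parameters, and the function names parse_accept and choose_format are ours rather than part of any library.

```python
def parse_accept(header: str):
    """Parse an Accept header into (media_type, q) pairs, most preferred first.

    Simplified for illustration: wildcards and parameters other than q
    are not given any special treatment.
    """
    preferences = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type, q = fields[0], 1.0  # q defaults to 1.0 when absent
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        if media_type:
            preferences.append((media_type, q))
    # sorted() is stable, so types with equal q keep the client's order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_format(header: str, available):
    """Return the most preferred media type the server can supply, or None."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in available:
            return media_type
    return None


if __name__ == "__main__":
    accept = "application/pdf;q=0.5, text/html, text/plain;q=0.8"
    # A server holding only plain-text and PDF versions of a document:
    print(choose_format(accept, ["text/plain", "application/pdf"]))
```

A crawler that announced it could handle only text formats would, in principle, let a cooperating server hand over an indexable version of content that it normally serves in some other format.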

These technical limitations give you an idea of the problems encountered by search engines when they attempt to crawl Web pages and compile indices. There are other, nontechnical reasons why information isn't included in search engines. We look at those next.

FOUR TYPES OF INVISIBLE

Technical reasons aside, there are other reasons that some kinds of material that can be accessed either on or via the Internet are not included in search engines. There are really four "types" of invisible Web content. We make these distinctions not so much to draw hard and fast lines between the types, but rather to help illustrate the amorphous boundary of the Invisible Web that makes defining it in concrete terms so difficult.

The four types of invisible are:

* The "Opaque" Web,

* The Private Web,

* The Proprietary Web, and

* The Truly Invisible Web.

THE "OPAQUE" WEB

The "Opaque" Web consists of files that can be, but are not, included in search engine indices. The Opaque Web is quite large and presents a unique challenge to a searcher. Whereas the deep content in many truly Invisible Web sites is accessible if you know how to find it, material on the Opaque Web is often much harder to find.

The biggest part of the Opaque Web consists of files that the search engines can crawl and index, but simply do not. There are a variety of reasons for this; let's look at them.

Depth of Crawl

Crawling a Web site is a resource-intensive operation. It costs money for a search engine to crawl and index every page on a site. In the past, most engines would merely sample a few pages from a site rather than performing a "deep crawl" that indexed every page, reasoning that a sample provided a "good enough" representation of a site that would satisfy the needs of most searchers. Limiting the depth of crawl also reduced the cost of indexing a particular Web site.

In general, search engines don't reveal how they set the depth of crawl for Web sites. Increasingly, there is a trend to crawl more deeply, to index as many pages as possible. As the cost of crawling and indexing goes down, and the size of search engine indices continues to be a competitive issue, the depth of crawl issue is becoming less of a concern for searchers. Nonetheless, simply because one, fifty, or five thousand pages from a site are crawled and made searchable, there is no guarantee that every page from a site will be crawled and indexed. This problem gets little attention and is one of the top reasons why useful material may be all but invisible to those who only use general-purpose search tools to find Web materials.
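
Since the engines don't disclose their settings, the following is only a sketch of what limiting depth of crawl could look like: a breadth-first traversal that stops following links a fixed number of hops from the home page. The site map, the function name crawl_to_depth, and the max_depth parameter are all hypothetical.

```python
from collections import deque


def crawl_to_depth(home, links, max_depth):
    """Return the set of pages reachable from home within max_depth link hops."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links beyond the depth budget
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return set(depth)


if __name__ == "__main__":
    # Hypothetical site: each key links to the pages in its list.
    site = {
        "/": ["/products", "/about"],
        "/products": ["/products/widgets"],
        "/products/widgets": ["/products/widgets/specs"],
    }
    shallow = crawl_to_depth("/", site, max_depth=1)
    deep = crawl_to_depth("/", site, max_depth=3)
    print(sorted(deep - shallow))  # pages a shallow crawl never sees
```

The pages printed at the end exist, contain ordinary HTML, and pose no technical barrier, yet a shallow crawl leaves them out of the index.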

Frequency of Crawl

The Web is in a constant state of dynamic flux. New pages are added constantly, and existing pages are moved or taken off the Web. Even the most powerful crawlers typically visit only about 10 million pages per day, a fraction of the entire number of pages on the Web. This means that each search engine must decide how best to deploy its crawlers, creating a schedule that determines how frequently a particular page or site is visited.
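
A back-of-the-envelope calculation shows why scheduling matters. The 10 million pages per day figure comes from the text above; the one-billion-page index size is purely a hypothetical round number chosen for the arithmetic, and the sketch assumes a single crawler working at a constant rate.

```python
def full_revisit_days(index_pages: int, pages_per_day: int) -> float:
    """Days needed to visit every page in the index once at a constant rate."""
    return index_pages / pages_per_day


if __name__ == "__main__":
    crawl_rate = 10_000_000          # pages per day, from the text
    index_size = 1_000_000_000       # hypothetical index size
    print(full_revisit_days(index_size, crawl_rate), "days per full pass")
```

Under these assumptions a complete pass takes more than three months, so an engine that wants fresh news pages must revisit them far more often than the average page and revisit others less often.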

Web Search researchers Steve Lawrence and Lee Giles, writing in the July 8, 1999, issue of Nature, state that "indexing of new or modified pages by just one of the major search engines can take months" (Lawrence and Giles, 1999). While the situation appears to have improved since their study, most engines only completely "refresh" their indices monthly or even less frequently.

It's not enough for a search engine to simply visit a page once and then assume it's still available thereafter. Crawlers must periodically return to a page to not only verify its existence, but also to download the freshest copy of the page and perhaps fetch new pages that have been added to a site. According to one study, it appears that the half-life of a Web page is somewhat less than two years and the half-life of a Web site is somewhat more than two years. Put differently, this means that if a crawler returned to a site spidered two years ago it would contain the same number of URLs, but only half of the original pages would still exist, having been replaced by new ones ("Graph Structure in the Web," n.d.; "Altavista, Compaq, and IBM," n.d.).
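
The half-life figure can be turned into a rough estimate of how stale an index becomes between visits. The sketch assumes that pages disappear at a steady exponential rate, which is what the term "half-life" implies but which the study may not claim exactly, and it rounds the page half-life to two years for simplicity.

```python
def surviving_fraction(years_elapsed: float, half_life_years: float = 2.0) -> float:
    """Fraction of originally crawled pages expected to still exist.

    Assumes simple exponential decay with the given half-life.
    """
    return 0.5 ** (years_elapsed / half_life_years)


if __name__ == "__main__":
    for years in (0.5, 1, 2, 4):
        print(years, "years:", round(surviving_fraction(years), 3))
```

Under these assumptions, after two years half of the original pages remain, and after four years only a quarter do.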

New sites are the most susceptible to oversight by search engines because relatively few other sites on the Web will have linked to them compared to more established sites. Until search engines index these new sites, they remain part of the Invisible Web.

Maximum Number of Viewable Results

It's quite common for a search engine to report a very large number of results, sometimes into the millions of documents. However, most engines also restrict the total number of results they will display for a query, typically between 200 and 1,000 documents. For queries that return a huge number of results, this means that the majority of pages the search engine has determined might be relevant are inaccessible, since the result list is arbitrarily truncated. Those pages that don't make the cut are effectively invisible.

Good searchers are aware of this problem and will take steps to circumvent it by using a more precise search strategy and the advanced filtering and limiting controls offered by many engines. However, for many inexperienced searchers this limit on the total number of viewable hits can be a problem. What happens if the answer you need is available (with a more carefully crafted search) but cannot be viewed using your current search terms?

Disconnected URLs

For a search engine crawler to access a page, one of two things must take place. Either the Web page author uses the search engine's "Submit URL" feature to request that the crawler visit and index the page, or the crawler discovers the page on its own by finding a link to the page on some other page. Web pages that aren't submitted directly to the search engines, and that don't have links pointing to them from other Web pages, are called "disconnected" URLs and cannot be spidered or indexed simply because the crawler has no way to find them.
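
In graph terms, a disconnected URL is a page that cannot be reached by following links from any page the crawler already knows about. The sketch below computes that set for a toy link graph; the page names and the function disconnected_pages are hypothetical, and "entry points" stands for pages the engine already knows plus any submitted through "Submit URL."

```python
def disconnected_pages(all_pages, links, entry_points):
    """Return pages a crawler cannot reach by following links from its entry points."""
    reachable = set(entry_points)
    stack = list(entry_points)
    while stack:
        page = stack.pop()
        for target in links.get(page, []):
            if target not in reachable:
                reachable.add(target)
                stack.append(target)
    return set(all_pages) - reachable


if __name__ == "__main__":
    pages = {"home", "about", "orphan", "new-report"}
    # "orphan" links out to "home", but nothing links in to it.
    links = {"home": ["about"], "orphan": ["home"]}
    print(sorted(disconnected_pages(pages, links, ["home"])))
    # Submitting "new-report" directly makes it an entry point.
    print(sorted(disconnected_pages(pages, links, ["home", "new-report"])))
```

Note that outbound links do not help: the "orphan" page links to the home page, but because no known page links to it, the crawler never learns that it exists.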

Quite often, these pages present no technical barrier for a search engine. But the authors of disconnected pages are clearly unaware of the requirements for having their pages indexed. A May 2000 study by IBM, AltaVista, and Compaq discovered that the total number of disconnected URLs makes up about 20 percent of the potentially indexable Web, so this isn't an insignificant problem ("Graph Structure in the Web," n.d.; "Altavista, Compaq, and IBM," n.d.).

In summary, the Opaque Web is large, but is not impenetrable. Determined searchers can often find material on the Opaque Web, and search engines are constantly improving their methods for locating and indexing Opaque Web material.

The three other types of invisible are more problematic, as we'll see.

THE PRIVATE WEB

The Private Web consists of technically indexable Web pages that have deliberately been excluded from search engines. There are three ways Webmasters can exclude a page from a search engine:

* Password protect the page. A search engine spider cannot go past the form that requires a username and password;

* Use the robots.txt file to disallow a search spider from accessing the page;

* Use the "noindex" metatag to prevent the spider from reading past the head portion of the page and indexing the body.
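
The last two mechanisms can be illustrated from the crawler's side. The sketch below uses Python's standard urllib.robotparser and html.parser modules to check a robots.txt rule and to look for a "noindex" robots metatag. The robots.txt content, the example.com URLs, the "ExampleBot" agent name, and the helper names may_crawl and has_noindex are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt telling every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, agent: str = "ExampleBot") -> bool:
    """True when the robots.txt rules above allow this agent to fetch the URL."""
    return robots.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """True when the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


if __name__ == "__main__":
    print(may_crawl("http://example.com/index.html"))
    print(may_crawl("http://example.com/private/report.html"))
    page = '<html><head><meta name="robots" content="noindex"></head><body>Hi</body></html>'
    print(has_noindex(page))
```

Both mechanisms are voluntary: they work only because well-behaved crawlers choose to honor them, whereas password protection blocks a crawler whether it cooperates or not.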

For the most part, the Private Web is of little concern to most searchers. Private Web pages simply use the public Web as an efficient delivery and access medium, but in general are not intended for use beyond the people who have permission to access the pages.

There are other types of pages that have restricted access that may be of interest to searchers, yet they typically aren't included in search engine indices. These pages are part of the "Proprietary" Web, which we describe next.

THE PROPRIETARY WEB

Search engines cannot for the most part access pages on the Proprietary Web, because these pages are only accessible to people who have agreed to special terms in exchange for viewing the content. Proprietary pages may simply be content that's only accessible to users willing to register to view them. Registration in many cases is free, but a search crawler clearly cannot satisfy the requirements of even the simplest registration process.

Other types of proprietary content are available only for a fee, whether on a per-page basis or via some sort of subscription mechanism. Examples of proprietary fee-based Web sites include Hoover's and the Wall Street Journal Interactive Edition.

Proprietary Web services are not the same as traditional online information providers, such as Dialog, Lexis-Nexis, and Dow Jones. These services offer Web access to proprietary information but use legacy database systems that existed long before the Web came into being. While the content offered by these services is exceptional, they are not considered to be Web or Internet providers.

THE TRULY INVISIBLE WEB

Some Web sites or pages are truly invisible, meaning that there are technical reasons that search engines can't spider or index the material they have to offer. A definition of what constitutes a truly invisible resource must necessarily be somewhat fluid, since the engines are constantly improving and adapting their methods to embrace new types of content. But at the time of writing truly invisible content consisted of several types of resources.

The simplest, and least likely to remain invisible over time, are Web pages that use file formats that current generation Web crawlers aren't programmed to handle. These file formats include PDF, postscript, Flash, Shockwave, executables (programs), and compressed files. There are two reasons search engines do not currently index these types of files. First, the files have little or no textual context, so it's difficult to categorize them, or compare them for relevance to other text documents. The addition of metadata to the HTML container carrying the file could solve this problem--but it would nonetheless be the metadata description that got indexed rather than the contents of the file itself.

The second reason certain types of files don't appear in search indices is simply because the search engines have chosen to omit them. They can be indexed, but aren't. You can see a great example of this in action with the ResearchIndex engine, which retrieves and indexes PDF, Postscript, and even compressed files in real time, creating a searchable database that's specific to your query. AltaVista's Search Engine product for creating local site search services is capable of indexing more than 250 file formats, but the flagship public search engine includes only a few of these formats. It's typically lack of willingness, not an ability issue with file formats.

More problematic are dynamically generated Web pages. Again, in some cases, it's not a technical problem but rather unwillingness on the part of the engines to index this type of content. This occurs specifically when a noninteractive script is used to generate a page. These are static pages, and generate static HTML that the engine could spider. The problem is that unscrupulous use of scripts can also lead crawlers into "spider traps" where the spider is literally trapped within a huge site of thousands, if not millions, of pages designed solely to spam the search engine. This is a major problem for the engines, so they've simply opted not to index URLs that contain script commands.

Finally, information stored in relational databases, which cannot be extracted without a specific query to the database, is truly invisible. Crawlers aren't programmed to understand either the database structure, or the command language used to extract information.

CONCLUSION

The Invisible Web is a vast portion of cyberspace, and offers invaluable resources that should not be overlooked by serious searchers. Although search engine technology continues to improve, the Invisible Web is largely an intractable problem that will be with us for some time to come. Although it's a vast and useful resource, it's important not to get bogged down in the semantics. An information professional should treat these types of resources like traditional reference tools. Learn what's available and have them ready to go. The best way for searchers to access the Invisible Web is to build and bookmark a personal collection of resources, treating them as a personal "reference library," and using them when needed, rather than relying on search engines that in many cases simply cannot access the content residing on the Invisible Web.

REFERENCES

Altavista, Compaq, and IBM researchers create world's largest, most accurate picture of the Web. (n.d.). [Summary of "Graph Structure in the Web," (n.d.)]. Retrieved August 27, 2003, from http://www.almaden.ibm.com/almaden/webmap_release.html.

CiteSeer: The NEC Research Institute Scientific Literature Digital Library. (n.d.). Retrieved April 17, 2003, from http://www.researchindex.com. Commonly referred to as ResearchIndex.

Economics of tobacco-country data report. (n.d.). Retrieved April 14, 2003, from http://www1.worldbank.org/tobacco/database.asp.

FlightTracker. (n.d.). Retrieved April 16, 2003, from CheapTickets Travel Web site: http://www.cheaptickets.com/trs/cheaptickets/flighttracker/flight_tracker_graphic.xsl.

Google catalogs. (n.d.). Retrieved April 17, 2003, from http://catalogs.google.com.

Google information for Webmasters. (n.d.). Retrieved April 17, 2003, from http://www.google.com/webmasters/2.html.

Graph structure in the Web. (n.d.). Retrieved August 27, 2003, from http://www9.org/w9cdrom/160/160.html.

Hoover's online. (n.d.). Retrieved April 15, 2003, from http://www.hoovers.com/.

Lawrence, S., & Giles, C. L. (1999). Accessibility of information on the web. Nature, 400, 107-109. Additional material can be found at: http://wwwmetrics.com/.

Robots exclusion. (n.d.). Retrieved April 17, 2003, from http://www.robotstxt.org/wc/exclusion.html.

Shadow TV. (n.d.). Retrieved March 17, 2003, from http://www.shadowtv.com.

Singingfish. (n.d.). Retrieved April 9, 2003, from http://www.singingfish.com/.

Slurp. (n.d.). Retrieved April 17, 2003, from http://www.inktomi.com/slurp.html.

Speechbot. (n.d.). Retrieved April 15, 2003, from http://www.speechbot.com.

ADDITIONAL READINGS

Guernsey, L. (2001, January 25). Mining the "deep web" with sharper shovels. New York Times, p. G1.

Price, G. (2002). Specialized search engine FAQs: More questions, answers, and issues. Searcher, 10(9), 42-48.

Price, G., & Sherman, C. (2001). Exploring the invisible Web: Seven essential strategies. Online, 25(4), 32-35.

Sherman, C. (2001). Google unveils more of the invisible Web. Search Day. Retrieved April 17, 2003, from http://www.searchenginewatch.com/searchday/article.php/2158091.

Chris Sherman, President, Searchwise, 898 Rockway Place, Boulder, CO 80303; Gary Price, Librarian, Gary Price Library Research and Internet Consulting, 107 Kinsman View Circle, Silver Spring, MD 20901.

GARY PRICE is a librarian, information research consultant, and writer based in suburban Washington, D.C. Gary is the editor and compiler of the Resource Shelf (http://www.resourceshelf.com), a daily electronic newsletter. A native of the Chicago area, he earned his M.L.I.S. degree from Wayne State University in Detroit, Michigan. He also holds a B.A. degree from the University of Kansas in Lawrence, Kansas. Price is a Web Search University faculty member and an inductee of the Internet Librarian Hall of Fame.

CHRIS SHERMAN is President of Searchwise, a Boulder, Colorado-based Web consulting firm, and Editor of SearchDay, a daily newsletter from SearchEngineWatch.com. He is a regular contributor to Information Today, Online, EContent, and other information industry journals and a regular presenter at information industry conferences and workshops. Sherman holds a master's degree in Interactive Educational Technology from Stanford University and a bachelor's degree in Visual Arts and Communications from the University of California, San Diego. He is a Web Search University faculty member and an inductee of the Internet Librarian Hall of Fame.
COPYRIGHT 2003 University of Illinois at Urbana-Champaign
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Price, Gary
Publication:Library Trends
Date:Sep 22, 2003
Words:7789


