Web search: emerging patterns.

ABSTRACT

This article examines the public searching of the Web and provides an overview of recent research exploring what we know about how people search the Web. The article reports selected findings from studies conducted from 1997 to 2002 using large-scale Web user data provided by commercial Web companies, including Excite, Ask Jeeves, and AlltheWeb.com. We examined what topics people search for on the Web; how people search the Web using keywords in queries during search sessions; and the different types of searches conducted for multimedia, medical, e-commerce, sex, etc., information. Key findings include changes and differences in search topics over time, including a shift from entertainment to e-commerce searching by largely North American users. Findings show that Web searching by many users has changed little: queries and sessions remain short. At the same time, we see more complex searching behaviors by some users, including successive and multitasking searches.

INTRODUCTION

People are spending increasing amounts of time working with electronic information. Web searching services such as Alta Vista
and Google are now everyday tools for information seeking. Research that explores such issues as the organization of the Web or Web searching trends is becoming more important for users and Web search engines alike. There are many overlapping and related levels of a user's context that are relevant to Web research, including the information environment/social level, organizational level, information-seeking level, human-computer interaction level, and query level. In order to better understand how to organize the Web, we also need to understand more about how people interact with and use the Web at these different levels. For many users, Web interactions are often frustrating and constrained. A growing body of large-scale quantitative or qualitative studies is exploring these issues, including the effectiveness and limitations of Web search engines (Lawrence & Giles, 1998) and how users search the Web (Silverstein et al., 1999; Wolfram et al., 2001).
One outgrowth of Web research is better support for human information behaviors and the development of a new generation of Web tools, such as Web meta-search engines, to help users persist in electronic information seeking and help people resolve their information problems. This article reports selected results from a large-scale and ongoing series of studies of searching behavior on commercial Web search engines by a diverse range of users. The research reported in this article is focused at the human-computer interaction and query level of Web user behavior. Selected results are reported from studies of Web query data from Excite, AlltheWeb.com, and Ask Jeeves. The researchers were not able to obtain data from the major Web company Google, but further analysis is being conducted on Web query data from Alta Vista. The goal of these studies is to track trends in the public searching of the Web and explore how the public searches the Web (Spink, Wolfram, Jansen, & Saracevic, 2001).

WEB QUERY DATA SETS

The analysis was conducted on large sets of Web query data provided by several Web companies from 1997 to 2001. All users were anonymous and could not be identified in any way, but we could identify each user's sequence of queries. Each transaction record contained three fields:

Time of Day: measured in hours, minutes, and seconds;
User Identification: an anonymous user code assigned by the Web server;
Query Terms: exactly as entered by the given user.

With these three fields, researchers were able to locate a user's initial query and recreate the chronological series of actions by each user in a session. We focused on three levels of data analysis: sessions, queries, and terms.
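
As an illustration of how a user's chronological sequence of queries can be recreated from the three fields described above, the following is a minimal Python sketch. The `Record` type, the `build_sessions` function, and the `HH:MM:SS` time format are assumptions made for this example; the article does not describe the actual log format or the software used in the studies. The sketch also assumes that all records fall within a single day, since only a time of day is recorded.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Record(NamedTuple):
    """One transaction record (hypothetical layout for illustration)."""
    time_of_day: str  # assumed format "HH:MM:SS"
    user_id: str      # anonymous user code
    query: str        # query terms exactly as entered


def build_sessions(records):
    """Group records by user and order each user's queries by time of day."""
    by_user = defaultdict(list)
    for rec in records:
        stamp = datetime.strptime(rec.time_of_day, "%H:%M:%S").time()
        by_user[rec.user_id].append((stamp, rec.query))
    # sorted() is stable, so queries with equal timestamps keep log order.
    return {
        user: [query for _, query in sorted(entries, key=lambda e: e[0])]
        for user, entries in by_user.items()
    }


if __name__ == "__main__":
    log = [
        Record("10:05:00", "u1", "cheap flights paris"),
        Record("10:01:30", "u1", "flights"),
        Record("10:02:00", "u2", "weather"),
    ]
    for user, queries in build_sessions(log).items():
        print(user, queries)
```

The first query in each returned list corresponds to the user's initial query, and the list length gives the number of queries in that session.
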
This large-scale study provides insights into Web searching with implications for developing better search engines and services.

WEB SEARCH PATTERNS

Selected findings, summarized below, provide insights into current patterns of public Web searching, including how people structure their Web searches, what they search for, and search behavior in special topic areas.

Web Queries

How long are general Web queries? The mean length of Excite queries increased steadily from 1.5 words in 1996 to 2.6 in 1999, and the mean number of terms in unique queries was 2.4. The mean query length for U.S./U.K. users in 1996 was 1.5, and the mean query length for European users in 1997 was 1.5; in 1999 the mean query length was 2.6 for U.S./U.K. users and 1.9 for European users. English language queries increased in length more quickly than European language queries. Jansen, Spink, and Saracevic (2000) report that Web queries were short and most users did not enter many queries per search. The mean number of queries per user was 2.8 in 1997. However, a sizable percentage of users did go on to either modify their original query or view subsequent results. On average, a query contained 2.21 terms in 1997. About one in three queries had one term only, two in three had one or two terms, and four in five had one, two, or three terms. Fewer than 4 percent of the queries contained more than six terms. Spink, Jansen, Wolfram, and Saracevic (2002) reported that the mean terms per query had increased slightly to 2.6 by 2001. Overall, general Web queries are still short.

Use of Boolean Operators

How frequently are Boolean operators used during Web searching?
The use of Boolean operators (AND, OR, NOT, +, -) increased from 22 percent of queries in 1997 to 28 percent of queries in 1999. From the 1996-99 data set, approximately 8 percent of searches included proximity searching.
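
The query-level measures reported above (mean terms per query, the share of one-term queries, and the share of queries using Boolean operators) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, terms are split on whitespace, operators are counted as terms, and only upper-case AND, OR, NOT and a leading + or - are treated as operators. The counting rules actually used in the studies are not given in this article and may differ.

```python
BOOLEAN_WORDS = {"AND", "OR", "NOT"}  # assumed: upper case only


def uses_operator(query):
    """Return True if the query contains AND/OR/NOT or a +term / -term."""
    for term in query.split():
        if term in BOOLEAN_WORDS:
            return True
        if len(term) > 1 and term[0] in "+-":
            return True
    return False


def summarize(queries):
    """Compute simple query-level statistics over non-empty queries."""
    non_empty = [q for q in queries if q.split()]
    if not non_empty:
        raise ValueError("no non-empty queries to summarize")
    counts = [len(q.split()) for q in non_empty]
    n = len(counts)
    return {
        "mean_terms": sum(counts) / n,
        "share_one_term": sum(c == 1 for c in counts) / n,
        "share_with_operator": sum(uses_operator(q) for q in non_empty) / n,
    }


if __name__ == "__main__":
    sample = ["cats", "cats AND dogs", "+jaguar -car", "new york hotels"]
    print(summarize(sample))
```

Treating lower-case "and" or "or" as ordinary words is a design choice for the sketch; a different choice would change the reported share of Boolean queries.
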

Web Query Reformulation

Do Web search engine users reformulate their queries? Spink, Jansen, and Ozmutlu (2000) found that most users searched one query only and did not follow with successive queries. The average session, ignoring identical queries, included 1.6 queries. About two in three users submitted a single query, and six in seven did not go beyond two queries. Spink, Jansen, Wolfram, and Saracevic (2002) reported that in 2001 some 44 percent of users modified their queries, with 25 percent of users entering three or more queries. Overall, most users still enter only one or two queries and conduct little query reformulation.

Question and Request Format Web Queries

Do users enter queries in question or request format? Spink and Ozmutlu (2002) report that only 50 percent of Ask Jeeves users entered queries in question format. Most questions began with the words "Where do I find ...?" Some 25 percent of users phrased their queries as requests, most commonly "Get me information...." Overall, most general Web queries are in query rather than question format.

Search Terms: Distribution

What is the distribution of search terms? Jansen, Spink, and Saracevic (2000) report the distribution of the frequency of use of terms in queries as highly skewed. A few terms were used repeatedly and many terms were used only once.
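
A skewed term distribution of this kind can be quantified by counting how often each term occurs and then comparing the few high-frequency terms with the many terms used only once. The Python sketch below is a hedged illustration: the function name, the lower-casing of terms, the whitespace tokenization, and the caller-supplied `high_freq_threshold` are assumptions for the example, not the procedure used in the cited studies.

```python
from collections import Counter


def term_distribution(queries, high_freq_threshold):
    """Summarize how skewed term usage is across a list of queries."""
    # Assumption: terms are case-folded and split on whitespace.
    counts = Counter(term.lower() for q in queries for term in q.split())
    if not counts:
        raise ValueError("no terms found")
    total_occurrences = sum(counts.values())
    unique_terms = len(counts)
    high = [t for t, c in counts.items() if c >= high_freq_threshold]
    used_once = sum(1 for c in counts.values() if c == 1)
    return {
        "unique_terms": unique_terms,
        "high_freq_share_of_unique": len(high) / unique_terms,
        "high_freq_share_of_occurrences":
            sum(counts[t] for t in high) / total_occurrences,
        "used_once_share_of_unique": used_once / unique_terms,
    }


if __name__ == "__main__":
    sample = ["free music", "free games", "Free maps", "weather"]
    print(term_distribution(sample, high_freq_threshold=3))
```

In a skewed log, the high-frequency terms form a small share of the unique terms but a much larger share of all term occurrences, while terms used once make up a large share of the unique terms.
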
At the top of the list, the sixty-three subject terms that had a frequency of appearance of 100 or more represented only one-third of 1 percent of all terms, but they accounted for about one of every ten terms used in all queries. Terms that appeared only once amounted to half of the unique terms. By 2001, 615 terms were not repeated in the dataset, as reported by Spink, Jansen, Wolfram, and Saracevic (2002). Overall, Web searching involves a small percentage of high-frequency terms and many low-frequency terms.

Use of Relevance Feedback

How frequently are relevance feedback commands used? Analysis of Web searches shows that, when available, relevance feedback is rarely used. About one in twenty queries used the feature "More Like This." Spink, Jansen, and Ozmutlu (2000) found that one-third of Excite users went beyond the single query, with a smaller group using either query modification or relevance feedback or viewing more than the first page of results. They examined the occurrence of each query type (unique, modified, relevance feedback, view a results page, etc.) in a large sample of user sessions. The distribution of query types changes as the length of the user session increases. For user sessions of two and three queries, the relevance feedback query is dominant. As the length of the sessions increases, the occurrence of relevance feedback as a percentage of all query types decreases.
Some 63 percent of relevance feedback sessions could be construed as being successful. If the partially successful user sessions are included, then more than 80 percent of the relevance feedback sessions provided some measure of success.

Viewing Results

How many pages of ten hits do users view? This question interests users and Web industry people alike. From 1996 to 1999, users viewed only the top ten results more than 70 percent of the time. On average, users viewed 2.35 pages of results (where one page equals ten hits). Over half the users did not access results beyond the first page. Jansen, Spink, and Saracevic (2000) found that more than three in four users did not go beyond viewing two pages. By 2001, only roughly one-third of users looked beyond the second page of Web sites retrieved (Spink, Jansen, Wolfram, & Saracevic, 2002).

WEB SEARCH TOPICS

Users search the Web on a very wide variety of topics. This section focuses on what we know about how users search on particular topics such as sex, e-commerce, and medical information. Spink, Jansen, Wolfram, and Saracevic (2002) report a shift in Web search topics from entertainment and sex in 1997 to commerce, travel, employment, economy, people, places, and things in 2001. Search topics have shifted from entertainment to e-commerce as the content of the Web has shifted more toward business.

Sexually Related Searching

Jansen, Spink, and Saracevic (2000) found that searching about sex on Excite represents only a small proportion of all searches. When the top frequency terms are classified as to subject, the top category is "Sexual." As to the frequency of appearance, about one in every four terms in the list of sixty-three highest used terms can be classified as sexual in nature. But while sexual terms are high as a category, they still represent a very small proportion of all terms. Many other subjects are searched and the diversity of subjects searched is very high.
Spink, Ozmutlu, and Lorence (in press) found that sexually related searches were longer than general searches and involved viewing more pages of Web sites. Overall, sexual Web searchers are more persistent and likely to be seeking images.

Medical and Health-related Web Searching

Medical and health-related information is proliferating on the Web. Spink, Yang, Nykanen, Lorence, Ozmutlu, and Ozmutlu (in press) found that a small percentage of Web searching is medical or health-related. The top five categories of medical or health advice sought were general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships. Trends show that medical and health queries have declined as a proportion of Web queries as the use of specialized medical/health Web sites has increased, while e-commerce-related searching has increased substantially.

E-Commerce Searching

E-commerce queries are increasing on the Web (Spink & Guner, 2001). Web queries are a primary means for translating people's business product, service, and information needs for e-commerce. Spink and Guner (2001) found that business queries often include more search terms than other types of queries, are less modified, lead to fewer Web pages viewed, and include fewer advanced search features. Company or product name queries were the most common form of business query. The most common business-related query submitted to Ask Jeeves was "Where can I buy ..."
or the request "I want to buy ..." Spink, Jansen, Wolfram, and Saracevic (2002) found that by 2001 the largest category of Web searches was e-commerce related.

Multimedia Searching

Goodrum and Spink (2001) conducted a specific analysis of image queries within a set of 1.2 million queries. Provisions for image searching by Web search engines are important for users. Users seeking images input relatively few terms to specify their image information needs on the Web. Users seeking images interact iteratively during the course of a single session but input relatively few queries overall. Most image terms are used infrequently, with the top term occurring in less than 9 percent of queries. Jansen, Spink, and Saracevic (2000) found that many terms were unique in the large data sets, with over half of the terms used only once. Terms indicating sexual or adult content materials appear frequently in image queries. They represented a quarter of the most frequently occurring terms but were a small percentage of the total terms. Overall, multimedia searching is shifting as the content of the Web changes (Jansen, Goodrum, & Spink, 2000; Ozmutlu, Spink, & Ozmutlu, 2002).

LONGITUDINAL SEARCH PATTERNS

Despite the generally short nature of user Web queries and search sessions, recent studies are also showing that some users are engaging in more complex Web search interactions.

Successive Searching

How many Web searches do users conduct on a particular topic? Spink, Bateman, and Jansen (1999) conducted an interactive survey of over three hundred Excite users and found that many had conducted two, three, or more related searches using the Excite search engine over time when seeking information on a particular topic.
Successive searches often refined or extended previous searches: users searched new databases and changed their search terms as their understanding and evaluation of results evolved from one search to the next.

Multitasking Search

How many topics are users searching for? Spink, Ozmutlu, and Ozmutlu (2002) found that many Web searches involved users seeking information on two or more topics concurrently. Overall, we see some users moving toward more complex searches that involve multiple related interactions and multiple topics.

DISCUSSION

The research we conducted over the last five years shows some interesting patterns and trends in general Web searching. In summary, most Web queries are short, without much modification, and simple in structure. Few queries incorporate advanced search techniques and, when they are used, many mistakes result. However, advanced search features are slowly growing in use. Many people retrieve a large number of Web sites, but view few results pages and tend not to browse beyond the first or second results pages. Overall, a small number of terms are used with high frequency and many terms are used once. Web queries are very rich in subject diversity, and some are unique. The subject distribution of Web queries does not seem to map to the distribution of Web sites' subject content. Some users are engaging in more longitudinal Web searching practices during their information-seeking processes that are not well supported by Web search technologies.

We can see that Web searching is growing as a huge public challenge, but it is an imprecise and challenging skill. Insights into Web searching trends and patterns have implications for the organization of the Web. A key problem for Web organization is that people in general do not really understand how Web search engines work or the structure of the Web.
The Web is a creature of interaction, yet many Web interactions are subject to limitations due to a lack of information and training by users. In general, Web search engines do not explain the Web to users and do not tell users that their search engines only cover a limited number of Web sites. Web culture is based on a "quick and dirty" approach to searching, rather than an exploratory, interactive approach. Web organizational issues and search issues are related. The success of users' search interactions depends on the intersection of more effective search techniques and user training.

CONCLUSION AND FURTHER RESEARCH

Our ongoing study of Web searching is examining a number of large-scale Web query transaction logs. These studies, using large-scale log data, are showing some interesting trends and patterns in general Web searching and helping to answer some interesting questions about Web searching. Due to the nature of the data, the research cannot address the results of users' queries or assess the performance of different search engines. However, the findings do provide a snapshot for comparison of public Web searching that can help improve Web search engines and services. Further research is currently being conducted, using query data from Alta Vista, to explore Web search including the similarities and/or differences between North American and European users. Ongoing Web user behavior research is further identifying trends and impacting the development of new types of user training, interfaces and software agents, and new organizational schemas to aid users in better Web searching.

REFERENCES

Goodrum, A., & Spink, A. (2001). Image searching on the Excite web search engine. Information Processing
and Management, 37(2), 95-312.
Jansen, B. J., Goodrum, A., & Spink, A. (2000). Searching for multimedia: An analysis of audio, video, and image Web queries. World Wide Web: An International Journal, 3(4), 249-254.
Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users and real needs: A study and analysis of users' queries on the Web. Information Processing and Management, 36(2), 207-227.
Lawrence, S., & Giles, C. L. (1998). Searching the World Wide Web. Science, 280(5360), 98-100.
Ozmutlu, S., Spink, A., & Ozmutlu, H. C. (2002). Trends in multimedia Web searching: 1997-2001. Information Processing and Management, 38(3), 475-496.
Silverstein, C., Henzinger, M., Marais, H., & Moricz, M. (1999). Analysis of a very large Web search engine query log. ACM SIGIR Forum, 33, 3.
Spink, A., Bateman, J., & Jansen, B. J. (1999). Searching the Web: Survey of EXCITE users. Internet Research: Electronic Networking Applications and Policy, 9(2), 117-128.
Spink, A., & Guner, O. (2001, July). E-commerce Web queries: Excite and Ask Jeeves study. First Monday, 6(7).
Spink, A., Jansen, B. J., & Ozmutlu, H. C. (2000). Use of query reformulation and relevance feedback by Web users. Internet Research: Electronic Networking Applications and Policy, 10(4), 317-328.
Spink, A., Jansen, B. J., Wolfram, D., & Saracevic, T. (2002). From e-sex to e-commerce: Web search changes. IEEE Computer, 35(3), 133-135.
Spink, A., & Ozmutlu, H. C. (2002). Characteristics of question format Web queries: An exploratory study. Information Processing and Management, 38(4), 453-471.
Spink, A., Ozmutlu, H. C., & Lorence, D. P. (in press). Web searching for sexual information: An exploratory study. Information Processing and Management.
Spink, A., Ozmutlu, H. C., & Ozmutlu, S. (2002). Multitasking information seeking and searching processes. Journal of the American Society for Information Science and Technology, 53(8), 639-652.
Spink, A., Wolfram, D., Jansen, B. J., & Saracevic, T. (2001). Searching the Web: The public and their queries. Journal of the American Society for Information Science, 53(2), 226-234.
Spink, A., Yang, Y., Nykanen, P., Lorence, D. P., Jansen, B. J., Ozmutlu, S., & Ozmutlu, H. C. (in press). Medical and health Web searching: An exploratory study.
Wolfram, D., Spink, A., Jansen, B. J., & Saracevic, T. (2001). Vox populi: The public searching of the Web.
Journal of the American Society for Information Science and Technology, 52(12), 1073-1074.

Amanda Spink, Associate Professor, School of Information Sciences, University of Pittsburgh, 610 IS Building, 135 N. Bellefield Avenue, Pittsburgh PA 15260.

AMANDA SPINK is Associate Professor at the School of Information Sciences at the University of Pittsburgh. She has a B.A. (Australian National University); Graduate Diploma of Librarianship (University of New South Wales); M.B.A. (Fordham University); and a Ph.D. in Information Science (Rutgers University). Dr. Spink's research focuses on theoretical and applied studies of human information behavior and interactive information retrieval (IR), including Web and digital libraries studies. The National Science Foundation, Andrew W. Mellon Foundation, NEC, IBM, Excite, FAST, and Lockheed Martin have sponsored her research. She has published over 180 journal articles and conference papers, with many in the Journal of the American Society for Information Science and Technology, Information Processing and Management, Interacting with Computers, IEEE Computer, Internet Research, and the ASIST and ISIC Conferences.