Changes in users' Web search performance after ten years.
Searching for information on the Web is a daily activity for hundreds of millions of people around the world. Internet World Stats (2012) estimates that more than two billion people are now connected to the Internet, and a large proportion of them use the Web frequently to look for information. Given the ubiquity of Web searching, it is likely that expertise is largely achieved through trial-and-error experience, augmented by the shared experiences of other Web users. Within the broader scope of Web searching, this study focuses on users' interactions and performance with Web search engines (WSE's). The rapid growth in the size of the Web and the increasing diversity of users have led more online users to turn to WSE's in order to find information. WSE's have been the dominant information retrieval tool for people searching the Web since the late 1990's (Lawrence and Giles, 1998; Jansen and Spink, 2006; Liaw and Huang, 2006; Slone, 2002; Zhang, 2008) and remain the dominant form of information searching on the Web (Flavian-Blanco et al., 2011; Jansen et al., 2008). Their dominance implies that WSE's assist users in finding information efficiently and accurately (Jansen et al., 2008). In this study we examine this implication empirically by comparing the search performance of a cohort of users in 2000 with another cohort in 2010.
1.1 Web Search Engines (WSE's)
There are essentially four main tasks that WSE's perform: (1) they gather collections of webpages (or index portions of the Web); (2) they organise these collections or indexes in some hierarchical fashion; (3) they allow users to interrogate the collections or indexes and sort them into "results" based on queries submitted to the WSE's (usually as text-based search terms); and (4) they provide hyperlinks to the source webpages (Gordon and Pathak, 1999; Liaw and Huang, 2006). Sullivan (2007) identified two major types of WSE: crawler-based and human-powered. Crawler-based WSE's use collection algorithms to gather and index webpages automatically (e.g. Google). Human-powered WSE's rely on human-entered descriptions submitted to a database in order to collect and index entries (e.g. Open Directory). Some WSE's, such as MSN Search, combine crawler-based searching with human-powered databases (Sullivan, 2007). In addition to single-source WSE's, such as Google and Bing, there are also metasearch WSE's, such as Dogpile or Metacrawler, which combine search results from several WSE's (Jansen et al., 2007). Some users access a number of different WSE's in the course of their interactions with the Web, and sometimes different WSE's within the same search task (Thatcher, 2008). This might be based on an awareness that different WSE's possess different qualities, such as diverse collection and retrieval algorithms (Fortunato et al., 2006) and ranking algorithms (Bar-Ilan et al., 2007).
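The four tasks above can be sketched in miniature with a toy inverted index. This is an illustrative sketch only: the pages, URLs, and implicit-AND matching are assumptions made for the example, not any real WSE's implementation.

```python
# A toy illustration of the four WSE tasks: gathering pages, organising
# them into an index, answering a text query, and returning hyperlinks.

pages = {  # task 1: a "gathered" collection of webpage texts (hypothetical URLs)
    "http://example.org/a": "bill clinton biography and family",
    "http://example.org/b": "search engine ranking algorithms",
    "http://example.org/c": "clinton family history",
}

# Task 2: organise the collection into an inverted index (term -> set of URLs).
index = {}
for url, text in pages.items():
    for term in set(text.split()):
        index.setdefault(term, set()).add(url)

def search(query):
    """Task 3: match query terms against the index (implicit AND between terms);
    task 4: return hyperlinks to the source pages."""
    results = None
    for term in query.lower().split():
        hits = index.get(term, set())
        results = hits if results is None else results & hits
    return sorted(results or [])

print(search("clinton family"))  # URLs of both pages containing both terms
```

The implicit AND between terms in `search` corresponds to the default Boolean behaviour of WSE's discussed later in the introduction.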
1.2 How has the Web changed over time?
Much has changed in the Web domain over the last decade. First, the Web is far larger (i.e. exponentially more websites and webpages) and, as a result, more complex, with potentially more irrelevant information to sift through before finding the information one is looking for. Paradoxically, though, more webpages and greater connectivity between them also mean more possible access points for information.
Second, connectivity speed, in most parts of the world, is considerably faster, making it quicker for users to find information without becoming frustrated by slow download speeds.
Third, WSE interfaces have generally changed in the intervening years. In 2000 most WSE interface query webpages contained a text entry box for queries as well as a list of various search categories (e.g. Arts, Entertainment, News, Government, Health, etc.). In 2010 a number of WSE interface query webpages, such as those of Yahoo!, Excite, and Ask.com, remained quite complex. However, some of the dominant WSE's (e.g. Google and Bing) opted for very simple interfaces, with only a text entry box and a search button.
Fourth, in a related point, by 2010 there were also far more (usually portable) devices that could connect to the Web (in 2000 mobile connectivity and wireless networks were still in various stages of infancy). Some of these devices require modified interfaces to present webpages, WSE webpages, and WSE results pages.
Fifth, a significant development in the intervening decade has been the inclusion of the Boolean operator "AND" as a default between search terms, whereas in 2000 most WSE's required users to type "AND" explicitly if they wanted both search terms included in the relevant search results.
Sixth, the webpages presenting search results have also undergone changes over the period 2000 to 2010. Most notably, WSE's in 2010 were more likely to indicate which of the search results were advertisements or paid placements (although this is by no means consistent or transparent). By 2010 WSE search results usually contained several standard elements, including the title of the webpage, a short synopsis (usually with the query terms highlighted), and the URL. In 2000 search results were largely unstandardised and varied considerably between WSE's. Some WSE's experimented with other relevance features, such as "like this" and "find similar", to help fine-tune the user's relevance ranking, but some of this functionality has since been removed.
Seventh, WSE designers have made considerable adjustments to the ways in which databases are collected, the search algorithms applied to those databases, and the sorting and presentation algorithms. While the actual data collection algorithms remain largely hidden from end-users, these design features have arguably made searching easier for users, as evidenced by the increasing number of search results that WSE's return for simple queries.
Eighth, users were more likely to have been exposed to WSE's for longer periods of time in 2010 (i.e. by 2010 many people may have had exposure to a WSE for a decade or longer, whereas in 2000 even early Web adopters would have had at most seven years of exposure to the Web's first WSE's).
Finally, the WSE landscape has changed considerably. In 2000 no single WSE dominated the domain, whereas in 2010 Google was by far the most frequently used WSE.
According to Sullivan (2006), by 2006 Google already had 50% of the market-share of total searches and according to StatOwl (2012) this had increased to more than 80% by 2010. Given these considerable changes in the WSE landscape, this study investigates how these changes have impacted on Web search performance. In particular, this study was interested in examining the patterns of WSE use as well as the accuracy and efficiency of Web search performance.
1.3 Research aim
This study sought to explore whether users' Web search performance with WSE's has significantly improved over a ten-year period (i.e. from 2000 to 2010).
H1: Web search speed will significantly improve from 2000 to 2010.
H2: Web searching will involve significantly fewer steps to find an answer from 2000 to 2010.
H3: Web searching will produce significantly more correct answers from 2000 to 2010.
H4: Since Web searching is predicted to be faster, more efficient, and more accurate, satisfaction with the search process is predicted to significantly improve from 2000 to 2010.
H5: Since people have had greater opportunity for exposure to the Web, it is predicted that Web experience will have a significant interaction effect on performance.
This study involves a comparative analysis of two groups of participants. A purposive sampling technique known as snowballing was used for both groups (Fife-Schaw, 1997). Snowballing involves approaching participants known to the researchers and then (after data collection) asking them to suggest anyone suitable whom they know, thereby building an expanding network of potential participants. Snowballing was used in the 2000 sample in order to achieve representativeness across participants from different age groups and professions. A purposive snowball sampling technique (all participants were volunteers) was used for the 2010 sample in an attempt to match the 2010 and 2000 samples as closely as possible on age, language, gender, and occupation (e.g. student, school children, and type of occupation). The first group of 80 participants was sampled in 2000 and the second, separate group of 80 participants was sampled ten years later in 2010. Since the respondents in 2000 were anonymous it was not possible to trace the same participants, which would have enabled a longitudinal within-subjects design. The 2010 sample must therefore be considered a contrast group. A similar replication design for studying changes in mental models of computers was used by Oleson et al. (2010) and Sims et al. (2011).
2.1.1 Sample 1
In the 2000 sample there were 21 undergraduate students, 25 postgraduate students, 7 school children, 7 University staff members, 7 IT professionals, 3 other professionals, and 10 semi-professionals (e.g. secretaries, administrators, shop owners, etc.). The students and University staff were drawn from two Universities in close geographical proximity. There were 50 male and 30 female participants, with a mean age of 23.28 years (range: 16-35 years); 51 respondents spoke English, 25 spoke an African language from South Africa, and 4 spoke a language from elsewhere in Africa.
2.1.2 Sample 2
In the 2010 sample there were 21 undergraduate students, 22 postgraduate students, 8 school children, 11 University staff members, 5 IT professionals, 10 other professionals, and 5 semi-professionals. The students and University staff were drawn from the same two Universities as in the 2000 sample. There were 48 male and 32 female participants, with a mean age of 25.78 years (range: 14-52 years); 50 respondents spoke English, 24 spoke an African language from South Africa, and 6 spoke a language from elsewhere in Africa.
The two samples did not differ significantly on gender (χ² = 0.11, p > 0.05), language (χ² = 0.202, p > 0.05), occupation (χ² = 6.92, p > 0.05), or age (t = 1.94, p > 0.05).
2.2 Procedure
The identical procedure was followed for both sampling periods. After informed consent was provided, each participant first completed a biographical and Web experience questionnaire (i.e. self-rated Web experience, length of time in months since starting to use the Web, and number of hours per week using the Web). Each participant was then asked to complete a directed search task while their search actions were recorded by a commercially available onscreen capture software programme. Participants were allowed to choose their preferred browser software, as earlier pilot studies had found that this influenced the WSE strategy used. In Sample 1, 31 participants chose Netscape Communicator (version 4.5) and 49 participants chose Microsoft Internet Explorer (IE 5.0). In Sample 2, 77 participants chose Microsoft Internet Explorer (IE 8.0) and 3 participants used Firefox (version 3.6). The directed search task was: "Find Bill Clinton's mother's maiden name". This task was chosen because it has a single correct answer that would not be immediately obvious to participants. Participants were given a maximum of 15 minutes to complete the task, after which the result was recorded as incorrect and the steps and time were recorded up to that point. After each participant had completed the task the researchers cleared all browsing data (i.e. browsing history, download history, the cache, and so forth) and returned the browser to the same starting page, ensuring that the next participant could not follow the same search path.
2.3 Measures and analysis
The performance measures were the number of steps, the time taken, the correctness of the answer, and satisfaction with the directed search task. These measures were compared between 2000 (Sample 1) and 2010 (Sample 2) using t-tests (Huck, 2004). A step was defined as any action (e.g. entering a search term, going to a previous webpage, clicking on a link, or opening a new tab) that participants undertook during the task. Time was defined as the recorded length of time that elapsed from the beginning of the task until participants indicated that they had completed it (or until the 15 minutes had elapsed). A chi-squared test was used to compare the accuracy of the answers given by participants in Sample 1 and Sample 2. In order to assess the impact of Web experience on the performance measures, an ANCOVA was conducted using Web experience as the covariate. Because correctness of the answer was a dichotomous variable, the Web experience variables were converted into dichotomous scales using the median split method, and a 2x2x2 chi-squared analysis was then conducted to examine the impact of Web experience on the correctness of the answer across the two time points.
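The comparisons described above can be sketched as follows. The data here are simulated for illustration (only the correctness counts, 45 of 80 versus 57 of 80, come from Table 2), and scipy is assumed as the statistics library; this is not the study's actual analysis script.

```python
# Illustrative sketch of the analyses: an independent-samples t-test,
# a chi-squared test on answer correctness, and a median split that
# dichotomises a continuous Web experience measure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
time_2000 = rng.normal(500, 270, 80)  # simulated completion times (s), Sample 1
time_2010 = rng.normal(215, 130, 80)  # simulated completion times (s), Sample 2

# t-test comparing mean completion time between the two samples
t_stat, p_time = stats.ttest_ind(time_2000, time_2010)

# chi-squared test on correctness: rows = samples, columns = correct/incorrect
# (counts follow Table 2: 45 of 80 correct in 2000, 57 of 80 in 2010)
chi2, p_correct, dof, expected = stats.chi2_contingency([[45, 35], [57, 23]])

# median split: convert a continuous experience measure into a high/low factor
weekly_hours = rng.normal(12, 8, 160)  # simulated hours per week on the Web
high_experience = weekly_hours > np.median(weekly_hours)

print(f"t = {t_stat:.2f}, chi2 = {chi2:.2f} (df = {dof})")
```

The median-split factor produced this way is what feeds the 2x2x2 chi-squared analysis on correctness described above.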
In order to contextualise the results it is first necessary to outline how WSE's were used by participants. In Sample 1 participants used a wide range of different WSE's (16 different WSE's in total), whereas the majority of participants (N=73) in Sample 2 used Google as their preferred WSE (see Table 1) and only 4 different WSE's were used in total. Participants in Sample 1 frequently used more than one WSE (N=19 used 2 WSE's in the completion of the task; N=1 used 3 WSE's; N=1 used 4 WSE's) in order to crosscheck an answer, to conduct multiple searches simultaneously, or because a WSE did not produce a perceived useful result (see Table 1).
Only 7 participants in Sample 2 used multiple WSE's. In addition, 3 participants in Sample 1 and 2 participants in Sample 2 used other online databases that were not specifically WSE's (N=2 used the online Encyclopedia Britannica and N=1 used the CNN news archive in Sample 1 and N=2 used Wikipedia in Sample 2) but were treated as searchable online databases.
The t-test analyses on the performance measures indicated significant differences (see Table 2), with Sample 2 requiring significantly fewer steps and less time to complete the directed search task, while the difference in satisfaction with the search process was statistically non-significant. Sample 2 also provided significantly more correct answers.
The results indicated that Sample 2 had significantly more Web experience (on measures of self-rated Web experience, time since starting to use the Web, and weekly hours using the Web). As a result, an analysis of covariance (ANCOVA) was conducted with experience added as a covariate for the performance measures of time, number of steps, and satisfaction. After co-varying for Web experience, in the majority of cases only the main effect of differences between the two samples was statistically significant (see Table 3). The exception was the significant interaction effect between Sample and number of hours per week using the Web, with time to completion as the performance measure. In this instance there were main effects for Sample and for number of hours per week, as well as an interaction effect: Sample 2 had a significantly shorter time to completion, which interacted with the number of hours per week spent on the Web. There was also a main effect of the number of hours per week using the Web on satisfaction with the search process; however, this effect was statistically consistent across the two samples.
As described earlier, a 2x2x2 chi-squared analysis was conducted to test for interaction effects between Web experience and the correctness of the answers. The only interaction effect was for self-rated Web experience: those who rated themselves as low on Web experience were more likely to find the correct answer in Sample 2 (χ² = 6.21, p < 0.05).
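One way to read the 2x2x2 analysis is to stratify the Sample x Correctness table on the median-split experience factor and test each stratum separately. The counts below are hypothetical numbers chosen for illustration, not the study's data, and scipy is again assumed:

```python
# Hypothetical sketch: within each Web-experience level (from a median
# split), test whether answer correctness depends on the sample year.
from scipy import stats

# counts[experience_level][sample] = [correct, incorrect]  (made-up numbers)
counts = {
    "low":  {"2000": [18, 22], "2010": [30, 10]},
    "high": {"2000": [27, 13], "2010": [27, 13]},
}

for level, table in counts.items():
    chi2, p, dof, _ = stats.chi2_contingency([table["2000"], table["2010"]])
    print(f"{level} experience: chi2 = {chi2:.2f}, p = {p:.3f}")
```

With these illustrative counts, only the low-experience stratum shows a significant Sample x Correctness association, mirroring the pattern reported above.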
A large proportion of users in more recent times have gravitated towards Google and have apparently enjoyed considerable success with this WSE (StatOwl, 2012). The seven respondents who did not use Google also did not use another WSE, and instead went straight to a website where they expected to find an answer (e.g. www.whitehouse.gov). In an analysis of the data from 2000, Thatcher (2008) found that Web searchers used different WSE's to broaden their access to search results. Users in 2000 were aware that different WSE's produced different sets of results and that a WSE's search results were not necessarily comprehensive. While researchers are aware that the collection, indexing, organisation, and matching algorithms of different WSE's result in WSE's indexing different portions of the Web (Vaughan and Zhang, 2007), it is not clear whether users continue to be aware of these different capabilities. The narrowing in WSE choice is not only due to the perceived effectiveness of Google, but also due to the changing WSE landscape. Some WSE's simply closed down (or were acquired and then closed down, such as Teoma), unable to compete for advertising revenue, whereas others merged with WSE's with larger market share (e.g. Altavista and Overture merged with Yahoo!).
Hypotheses 1 to 3 were supported, demonstrating that users have become faster, more efficient (fewer steps), and more accurate in their search processes. However, contrary to hypothesis 4, participants were not significantly more satisfied with the search process. This result could be partly due to the rating scale (a 1-5 scale), which would have restricted the range of responses and made it more difficult to identify significant differences. It is more likely, however, that this result is due to changing expectations. In 2000, Internet connection speeds were markedly slower and there were far fewer websites from which to find information.
Participants might therefore have expected Web search performance to take longer and their expectations regarding the search process would have been similarly adjusted. Since participants were not told whether their answers were correct or not, the correctness of the answer would likely not have had an impact on their levels of satisfaction.
Although there was no specific hypothesis about changes in Web experience, all three of the experience measures used in this study were significantly higher in Sample 2. This is in line with the rapid growth in access to the Web over this time period (Internet World Stats, 2012). It was therefore interesting that Web experience did not act as a significant covariate. In the majority of instances the main effects of time (i.e. Sample 1 to Sample 2) dominated over the variance in Web experience, suggesting that factors other than Web experience explained the variance in performance measures over time. The two exceptions to these non-significant results were the interaction effect of weekly usage of the Web on time to completion and a main effect on satisfaction. One possible explanation is that the average time spent using the Web per week is a more sensitive measure of actual Web experience than how long one has been using the Web or self-rated experience. It was also noted that participants who rated themselves as 'low' on Web experience in Sample 2 were more likely to locate the correct answer than those in Sample 1. It is possible that those who rated themselves as 'low' in 2010 were relatively more experienced than those who rated themselves as 'low' in 2000; this subjective measure is more likely to have been an unreliable indicator of underlying Web experience.
An alternative explanation is that the decreased time taken to find answers, the reduced number of steps, and the increased correctness of the answers are related to WSE interface modifications, increased connectivity speeds, and the increased amount of information on the Internet. As already noted, WSE interfaces have generally become simpler, they now include a number of Boolean operators as default options, and their collection and database storage algorithms have become more sophisticated and complete. In addition, connectivity speeds are now much higher (explaining the increased speed of task completion but not the decreased number of steps or increased accuracy) and there are more websites on which to locate the relevant information (possibly explaining the increased correctness but not necessarily the decrease in the number of steps).
It must be noted that this study was not a within-subjects longitudinal design. It is possible that any differences (or lack thereof) between 2000 and 2010 might be a result of the different individuals in the samples, despite the attempt to match them on perceived important demographic variables. Ideally, we would have liked to trace the original participants in Sample 1, but this was not possible due to the anonymity of the original data collection. A sample size of 80 participants at each point in time is not representative of the whole Internet population, which numbers approximately 2.4 billion (Internet World Stats, 2012), although it is large given the retrospective verbal protocol data collection method. It is therefore difficult to generalise these results to other Internet and WSE users. It is also possible that a directed search task may have 'forced' participants to express their WSE search performance in a particular manner that differs from their day-to-day use of a WSE. Further, the experimental nature of a single, assigned task would have removed many of the characteristics that users typically encounter when interacting normally with a WSE, including multitasking (Du and Spink, 2011) and levels of motivation (Thatcher, 2008). It is also possible that a directed search task might be too easy for users: by 2010 WSE's had improved their collection algorithms to such an extent that it was possible to find an answer to the directed search task used in this study directly from the search results page, provided the right search terms were used (this was not possible in 2000). Further investigations would be useful to look at Web searching under more naturalistic conditions and with a broader range of tasks, especially a general-purpose browsing task, which is more typical of searching for broader information gathering.
In this study we found that users' Web search performance (time taken, number of steps taken, and proportion of correct answers) significantly improved over the period 2000 to 2010. With Web experience as a covariate, these results suggest that the improvement in performance is more likely to be a result of structural changes to the Web and technical improvements in WSE's than of a concomitant increase in Web experience. Future research should examine the impact that time has had on changing users' mental models of WSE's.
Bar-Ilan, J., Keenoy, K., Yaari, E. and Levene, M. (2007). User rankings of search engine rankings. Journal of the American Society of Information Science and Technology 58(9), 1254-1266.
Du, J.T. and Spink, A. (2011). Toward a Web search model: integrating multitasking, cognitive coordination, and cognitive shifts. Journal of the American Society for Information Science and Technology 62(8), 1446-1472.
Fife-Schaw, C. (1997). Surveys and sampling issues. In G.M. Breakwell, S. Hammond and C. Fife-Schaw (Eds.), Research methods in psychology (pp. 99-115). London: Sage Publications.
Flavian-Blanco, C., Gurrea-Sarasa, R. and Orus-Sanclemente, C. (2011). Analyzing the emotional outcomes of the online search behaviour with search engines. Computers in Human Behaviour 27(1), 540-551.
Fortunato, S., Flammini, A., Menczer, F. and Vespignani, A. (2006). Topical interests and mitigation of search engine bias. Proceedings of the National Academy of Science 103(34), 12684-12689.
Gordon, M. and Pathak, P. (1999). Finding information on the World Wide Web: the retrieval effectiveness of search engines. Information Processing and Management 35(2), 141-180.
Huck, S. W. (2004). Reading statistics and research (4th Ed.). Boston: Pearson.
Internet World Stats (2012). World Internet usage and population statistics. [Online] Retrieved 20 June 2013 on the WWW: http://www.internetworldstats.com/stats.htm.
Jansen, B.J., Booth, D.L. and Spink, A. (2008). Determining the informational, navigational, and transactional intent of Web queries. Information Processing and Management 44(3), 1251-1266.
Jansen, B.J., and Spink, A. (2006). How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Information Processing and Management 42(1), 248-263.
Jansen, B.J., Spink, A. and Koshman, S. (2007). Web searcher interaction with the Dogpile.com metasearch engine. Journal of the American Society for Information Science and Technology 58(5), 744-755.
Lawrence, S. and Giles, C.L. (1998). Searching the World Wide Web. Science 280(5360), 98-100.
Liaw, S. and Huang, H. (2006). Information retrieval from the World Wide Web: a user-focused approach based on individual experience with search engines. Computers in Human Behaviour 22(3), 501-517.
Oleson, K.E., Sims, V.K., Chin, M.G., Lum, H.C. and Sinatra, A. (2010). Developmental human factors: children's mental models of computers. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 54 (pp. 1450-1453), Santa Monica: Human Factors and Ergonomics Society.
Sims, V.K., Chin, M.G., Sinatra, A.M., Lum, H.C., Selkowitz, A., Murphy, C.E. and Oleson, K.E. (2011). From TVs to phones: a comparison of adult's mental models of computers 1999 and 2009. Proceedings of the Human factors and Ergonomics Society Annual Meeting 55 (pp. 1394-1397), Santa Monica: Human Factors and Ergonomics Society.
Slone, D.J. (2002). The influence of mental models and goals on search patterns during web interaction. Journal of the American Society for Information Science and Technology 53(13), 1152-1169.
StatOwl (2012). Search engine market share: search engine usage statistics. [Online] Retrieved 7 April 2012 on the WWW: http://www.statowl.com/search_engine_market_share.php.
Sullivan, D. (2006). Nielsen NetRatings Search Engine Ratings. [Online] Retrieved 4 April 2012 on the WWW: http://searchenginewatch.com/2156451.
Sullivan, D. (2007). How search engines work. [Online] Retrieved 28 March 2012 on the WWW: http://searchenginewatch.com/article/2065173/How-Search-Engines-Work.
Thatcher, A. (2008). Web search strategies: the influence of Web experience and task type. Information Processing and Management 44(3), 1308-1329.
Vaughan, L. and Zhang, Y. (2007). Equal representation by search engines? A comparison of websites across countries and domains. Journal of Computer-Mediated Communication 12(3), article 7. [Online] Retrieved 18 April 2009 from the WWW: http://jcmc.indiana.edu/vol12/issue3/vaughan.html.
Zhang, Y. (2008). Undergraduate students' mental models of the Web as an information retrieval system. Journal of the American Society for Information Science and Technology 59(13), 2087-2098.
A. Thatcher *
Psychology Department, University of the Witwatersrand,
Wits, 2050, South Africa
* Corresponding author
Table 1. WSE's used in Sample 1 and Sample 2

WSE chosen                       N Sample 1   N Sample 2
Google                           4            73
Yahoo                            33           2
Bing                             0            10
AltaVista                        10           0
Looksmart                        12           0
Infoseek                         14           0
Ask (Jeeves)                     12           2
Goto                             5            0
Lycos                            3            0
SNAP                             2            0
Dogpile (metasearch engine)      2            0
Metacrawler (metasearch engine)  2            0
Hotbot                           1            0
Search.com                       1            0
Excite                           1            0
Inforocket                       1            0
Webcrawler                       1            0

Note: The total number of WSE's is greater than 80 because some participants used more than one WSE.

Table 2. Performance differences on the search task comparing Sample 1 to Sample 2

Measure                                     Mean Sample 1 (SD)  Mean Sample 2 (SD)  t-statistic
Time since using Web (months)               36.86 (20.22)       119.43 (45.56)      14.82 **
Weekly usage of Web (hours)                 7.35 (7.86)         16.44 (12.57)       5.49 **
Self-rated Web exp. (1-5 scale)             3.32 (0.89)         3.70 (0.75)         2.86 *
Steps (number)                              14.35 (7.79)        8.32 (4.57)         -5.91 **
Time (seconds)                              501.20 (272.00)     215.00 (131.49)     -8.41 **
Number of correct answers                   45                  57                  4.48 *
Satisfaction with the whole search process  3.47 (1.33)         3.63 (0.98)         0.85

** p < 0.01; * p < 0.05

Table 3. ANCOVA results for performance measures with Web experience covariates

                               Time (seconds)  Steps (number)  Satisfaction
Performance measures           F-statistic     F-statistic     F-statistic
Sample                         7.11 **         34.05 **        0.34
Self-rated Web exp.            3.12            0.78            1.61
Sample * self-rated exp.       0.79            0.64            1.53
Sample                         43.92 **        34.80 **        0.75
Weekly Web usage               11.62 **        0.79            4.42 *
Sample * Weekly Web usage      5.95 *          0.74            2.39
Sample                         8.86 **         34.70 **        0.72
Time since using Web           0.76            0.22            0.15
Sample * Time since using Web  0.02            0.86            0.48

** p < 0.01; * p < 0.05
Authors: Thatcher, A.; Mlilo, S.
Date: Dec 1, 2014