Search for news online: challenging traditional methods.
ICF International, Inc. (ICFI) is a management, technology, and policy consulting firm that partners with government and commercial clients. The company works with clients in a variety of markets including energy, environment, and infrastructure; health, social programs, and consumer/financial; and public safety and defense. Our research activities often support client information requests.
We recently had the opportunity to locate news articles for historical news research and ongoing alert-type searches using various resources. We discuss our experiences and findings here, comparing the results returned, time spent reviewing results, and the trade-off between cost to use and search features. We also discuss the lessons learned, including how articles published solely online affect the ways in which news research is conducted today.
Our client tasked us with locating historical newspaper articles (print and electronic) published between 1980 and 2011 and articles currently being published (when we did this project, that meant in 2012). The topic was limited in scope to a specific area within the United States. Even so, relevant content often appeared in national and international papers, so we found results in these sources as well as in many local publications.
You could well face similar tasks, such as locating information about increasing traffic congestion in Los Angeles and a potential resulting increase in lung cancer, or researching the history of PCB contamination and cleanup efforts in New York's Hudson River over several decades.
For historical articles (defined as those published more than 6 months ago), we relied on traditional news databases. We took a different approach when searching for current news articles on our topic, but for both efforts, we used some of the same resources and therefore could compare our results.
HISTORICAL NEWS SEARCH
To identify historical articles, we used a simple search strategy based on the geographic location and a short list of keywords. We relied on traditional news databases, including Dialog, LexisNexis, and Factiva, through our client's subscriptions to these services. We realize that NewsBank (newsbank.com), with its 4,631 U.S. news sources (newspapers, news wires, and transcripts), could have helped with this project, but neither ICFI nor the client had access. All the resources we used provide full text, which was a client requirement (pricing varied by source). To supplement our searches in news databases, we searched Google News using its advanced search functions. Finally, we were able to visit the state library near our site of interest and search some of its onsite news sources.
On LexisNexis, we searched all English news, which includes major world publications, newspapers, magazines, news wires, broadcast transcripts, blog and web publications, and U.S. newspapers and wires. The search interface is user-friendly and powerful. Duplicates can be removed automatically, and we could use complex search strings and proximity operators (requiring terms to be within the same sentence, paragraph, or user-defined number of words), which allow a more precise definition of the context.
Factiva allows subscribers to search more than 35,000 global news and information sources. Factiva claims to have thousands of sources otherwise unavailable on the free web and to make many items available on or before their date of publication. The search interface is easy to use and offers more advanced search functions, including Boolean searching, truncation, proximity operators, multiple ways to sort records, and field codes. Users can also elect to remove duplicates.
Dialog contains several news databases that can be searched using complex search queries. (We used DialogLink5 for this search.) Several news databases provide access to major U.S. newspapers and news wires. Advanced search functions allowed us to remove duplicates before reviewing results. Archival news searching is covered by various databases that provide coverage for a small number of major publications. According to Dialog, the PAPERS database is being discontinued (as of January 2013, only The New York Times and USA TODAY are still there). The core newspaper collection to be available on ProQuest Dialog is being built around ProQuest Newsstand. At the moment, it's Dialog's Newsroom files that contain most of their newspaper coverage.
Google News allows searches by time period and with rudimentary Boolean strategies. A news aggregator, it scans more than 4,500 English-language news sites. It uses the Google search engine, so news articles are selected using an automatic aggregation algorithm. The news archives contain scanned images of the original print articles dating back 200 years, depending upon the publication.
COMPARING RETRIEVAL RESULTS
Our initial search, Geographic Term AND Topic1, yielded thousands of results in traditional news databases. We reduced the number of results that needed to be screened for relevance by modifying the search to use proximity operators, Boolean logic, and automatic removal of duplicates: (Geographic Term near10 Topic1) AND (subtopic1 OR subtopic2 OR subtopic3 OR subtopic4 OR subtopic5 OR subtopic6). Although Dialog's news databases allowed us to remove duplicates, design complex search queries, and use proximity operators, we found that they lacked in-depth coverage of small local newspapers, which was often where we found some of the most relevant articles. In total, we found approximately 500 relevant results from the news databases.
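Because the refined query has a fixed shape (a proximity pair ANDed with a group of OR'd subtopics), it can help to generate it rather than retype it for each service. The sketch below is illustrative only: `build_query` is a hypothetical helper, the placeholder terms are the article's own, and the `near10` form follows the article's notation. Each service has its own proximity syntax (LexisNexis, for example, uses `w/10`), so the operator is left as a parameter; check each database's help pages before relying on the output.

```python
def build_query(geo: str, topic: str, subtopics: list[str],
                distance: int = 10, proximity_op: str = "near") -> str:
    """Compose a query of the form (geo near10 topic) AND (sub1 OR sub2 ...).

    Hypothetical helper; proximity_op must match the target service's syntax.
    """

    def quote(term: str) -> str:
        # Wrap multiword terms in quotes so they are searched as phrases.
        return f'"{term}"' if " " in term else term

    proximity = f"({quote(geo)} {proximity_op}{distance} {quote(topic)})"
    alternatives = " OR ".join(quote(s) for s in subtopics)
    return f"{proximity} AND ({alternatives})"


print(build_query("Geographic Term", "Topic1",
                  ["subtopic1", "subtopic2", "subtopic3"]))
# ("Geographic Term" near10 Topic1) AND (subtopic1 OR subtopic2 OR subtopic3)
```

Passing `proximity_op="w/"` produces `("Geographic Term" w/10 Topic1) AND (...)`, the LexisNexis-style form.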
We executed our initial search of Google News using a basic search strategy, "Geographic Term" "Topic1"; the AND operator is implied. After executing the basic search, we used the Any Time menu (found under Search Tools) to limit results to "Archives," which retrieved more than 900 results. We then searched using Google's AROUND proximity operator, "Geographic Term" AROUND(10) "Topic1", but Google would not let us combine an AND group of terms with an OR group of terms in the same search, as the traditional news databases do. This required us to run a variety of searches combining Geographic Term and Topic1 with each subtopic, and to limit them to various time periods so that we could review the results more efficiently. In total, we added approximately 40 articles that were not captured in the news database searches.
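Since a single AND/OR query had to be split into one search per subtopic and per time period, the number of searches grows as subtopics times periods. A minimal sketch of that expansion follows, assuming hypothetical decade-style windows (the article does not say which periods were used). The `from_year`/`to_year` fields are only a checklist for setting the date range by hand under Search Tools; they are not query syntax.

```python
from itertools import product


def google_news_queries(geo: str, topic: str, subtopics: list[str],
                        periods: list[tuple[int, int]],
                        distance: int = 10) -> list[dict]:
    """Expand one AND/OR search into separate Google News searches.

    Hypothetical helper: each subtopic gets its own query (terms separated by
    spaces are implicitly ANDed), repeated once for every date window.
    """
    base = f'"{geo}" AROUND({distance}) "{topic}"'
    return [
        {"query": f'{base} "{sub}"', "from_year": start, "to_year": end}
        for sub, (start, end) in product(subtopics, periods)
    ]


# Hypothetical date windows, for illustration only.
periods = [(1980, 1989), (1990, 1999), (2000, 2011)]
for q in google_news_queries("Geographic Term", "Topic1",
                             ["subtopic1", "subtopic2"], periods):
    print(q["from_year"], q["to_year"], q["query"])
```

With six subtopics and three windows, this yields 18 separate searches, which illustrates why the Google News portion was so much more time-consuming than a single database query.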
Although we visited the state library, the visit ultimately proved unnecessary. The subscription database we accessed at the library, Newspaper Archive, did not provide any new results and was clumsy to search. Any time spent beyond an exploratory visit would have been cost-prohibitive, as would any manual searches of publications held on microfiche. We also learned that indexing for news articles in the 1980s was extremely poor, making it much more efficient to use a database such as LexisNexis or Factiva that allows users to search the full text.
As expected, we found significant overlap between the content we reviewed from LexisNexis and Factiva, but each also had some content not found in the other source. It was time-consuming, but manageable, to review the results and identify relevant articles. Searching Google News was more difficult due to the lack of complex search capabilities. Conducting numerous searches and removing duplicates was time-consuming. However, Google News Archives turned up 40 additional articles that were not found by any other sources, including LexisNexis.
Both LexisNexis and Factiva allowed us to remove duplicates, which was a big time-saver. Duplicates in Google News were much harder to identify and remove. Overall, given the volume of articles, duplicates slipped through at every stage. This prompted us to develop a process to remove duplicates in our article database, which took some additional time.
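The article does not describe its deduplication process, so the following is only one possible sketch: records are merged when they share a normalized title, source, and publication date, and the services that returned each record are kept so overlap between sources can be counted later. The `Article` record, the field names, and the matching rule are all assumptions; exact-key matching will not catch syndicated copies with rewritten headlines.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record layout for an article database.
    title: str
    source: str
    date: str  # ISO format, e.g. "2011-05-02"
    found_in: set[str] = field(default_factory=set)


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe(articles: list[Article]) -> list[Article]:
    """Merge records sharing a normalized title, source, and date.

    The first record seen is kept; found_in accumulates every service that
    returned the same item. Input records are not modified.
    """
    merged: dict[tuple[str, str, str], Article] = {}
    for art in articles:
        key = (normalize(art.title), normalize(art.source), art.date)
        if key in merged:
            merged[key].found_in |= art.found_in
        else:
            merged[key] = Article(art.title, art.source, art.date,
                                  set(art.found_in))
    return list(merged.values())


records = [
    Article("Cleanup Plan Approved", "Example Gazette", "2011-05-02", {"LexisNexis"}),
    Article("Cleanup plan approved.", "Example Gazette", "2011-05-02", {"Factiva"}),
    Article("Cleanup Plan Approved", "Example Gazette", "2011-05-09", {"Google News"}),
]
unique = dedupe(records)
print(len(unique))                  # 2
print(sorted(unique[0].found_in))   # ['Factiva', 'LexisNexis']
```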
CURRENT NEWS SEARCH
Locating current articles and providing the client with weekly updates required a different approach. Although the client subscribed to LexisNexis and Factiva, we learned when researching historical articles that using only the traditional news databases did not catch all relevant articles. For this part of our research, we relied heavily on freely available nontraditional sources given that the volume of articles would be much smaller. We used Google News, HighBeam Research (highbeam.com), and PressDisplay.com. The client also subscribed to a news-monitoring service that we used to locate articles for weekly updates.
As librarians, we knew the most efficient way to obtain news as it is published was to create automatic alerts in any of the services we employed. We set up Google News alerts and had daily results sent by email. URLs to the articles were provided, so it was relatively easy to view the actual news article.
PressDisplay.com is a self-described online newspaper kiosk that contains more than 2,000 publications, including international and national newspapers, press releases, and magazines. News articles published during the previous 60 days are available. Articles come from print sources only, and each newspaper is displayed as an exact replica of its entire print edition. There is no cost to search and view results, which include article title, lead sentence, date, source, and page number. PressDisplay.com also offers a free "My Monitor" email alert service.
HighBeam Research is a paid search engine and full-text online archive that provides access to more than 6,500 publications--1,470 of which are national and international newspapers. Searching is available to nonsubscribers, and retrieval can be filtered by publication type and by date. Email alerts are only available to paid subscribers. As a result, we manually searched this source weekly. HighBeam was owned by Gale, a Cengage company, when we searched it, but was acquired by Yippy, Inc. in January 2013.
We arranged for weekly updates from the media-monitoring service. The service applies an algorithm to capture news, and we provided it with keywords to use. [For an overview of several media monitoring services, see "Competition Among Competitive Intelligence Platforms," by Barbie E. Keiser, Online Searcher, January/February 2013.--Ed.]
COMPARING RESULTS AND SAVING TIME
Google News provided almost all of the current news articles and returned more results than HighBeam Research or PressDisplay.com. It was easy to use, and the email alerts arrived daily. HighBeam Research and the media-monitoring service did not locate any additional news articles. However, PressDisplay.com picked up a full-page ad of interest to our client; it was caught only because PressDisplay.com was the one resource that indexed the print newspaper cover-to-cover.
A drawback to all of the free sources was the inability to conduct complex Boolean searches and remove duplicates. We found that we had to use broad search strategies with few terms to capture all newly published articles on our topic, which also returned a considerable number of irrelevant articles.
Email alerts obviously saved time, and the initial setup time was comparable for each service. PressDisplay.com provided so few results that very little time was needed to review them. The news-monitoring service was a great asset, but we did have to request some articles that its algorithm had not retrieved, so although it saved time, some duplicative work was still necessary. Google News proved comprehensive: the only article of interest it did not capture was the one picked up by PressDisplay.com.
COMPARING NEWS SOURCES
How did relevant articles found in a traditional news database compare to those found by an online news aggregator? To find out, we took a random sample of 10 articles identified as relevant from each of several time periods and searched for them in Google News and LexisNexis. We found that the percentage of articles found in Google News increased over time. Results from the traditional news database were variable: in 2012, only five of the articles were found in LexisNexis, compared to eight in Google News. These results confirm that no single source was sufficient. Ultimately, we found it necessary to rely on several sources to capture relevant articles, particularly for the period between 1980 and 2000.
Over time, we found that the percentage of articles picked up by at least one of these sources increased. For the 2012 sample, all 10 articles were found in either LexisNexis or Google News, compared to six in the period between 1980 and 2000.
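The reported counts also reveal how much the two sources overlapped, by inclusion-exclusion: articles found in both = found in LexisNexis + found in Google News - found in either. For the 2012 sample, that is 5 + 8 - 10 = 3 articles in both, 2 only in LexisNexis, and 5 only in Google News. The small helper below (a hypothetical name, not part of the authors' method) does this arithmetic and rejects counts that cannot be consistent. It derives nothing beyond the figures given above; the 1980 to 2000 sample cannot be split this way because only the combined count of six is reported.

```python
def overlap_counts(found_a: int, found_b: int, found_either: int,
                   sample_size: int) -> dict[str, int]:
    """Split a sample by source via inclusion-exclusion.

    |A and B| = |A| + |B| - |A or B|
    """
    both = found_a + found_b - found_either
    # "both" can be neither negative nor larger than either source's count,
    # and the combined count cannot exceed the sample.
    if not 0 <= both <= min(found_a, found_b) or found_either > sample_size:
        raise ValueError("counts are inconsistent")
    return {
        "both": both,
        "only_a": found_a - both,
        "only_b": found_b - both,
        "neither": sample_size - found_either,
    }


# 2012 sample: LexisNexis (a) found 5, Google News (b) found 8, together 10.
print(overlap_counts(found_a=5, found_b=8, found_either=10, sample_size=10))
# {'both': 3, 'only_a': 2, 'only_b': 5, 'neither': 0}
```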
We did not find any single source or approach to be sufficient. Time was not a limiting factor for our project, so we were able to search multiple sources and spend the necessary time screening duplicates. If time is an issue, a traditional news database that allows you to remove duplicates might be a wise choice, particularly if you are searching for historical records. When tracking current news on an ongoing basis, Google News and other free news aggregators might be efficient in terms of cost and time.
Newspapers owned by a parent company can pose a problem. The McClatchy Co. is the third-largest newspaper company in the United States, owning digital as well as print newspapers. As a result, the same article was often found in many different publications, but the headline could differ. Because we were tasked with identifying all news articles, we needed to obtain every one, but it was often unclear whether items with different headlines were duplicates.
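When the same story runs under different headlines across a chain's papers, comparing headlines fails, but comparing body text can still flag likely duplicates. One common technique for this (not one the article reports using) is word shingling with Jaccard similarity: split each body into overlapping k-word sequences and measure the share they have in common. The window size `k=5` and the `0.8` threshold below are hypothetical defaults, and flagged pairs would still need a human check, since the project had to account for every published copy.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all shingles (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_same_story(body_a: str, body_b: str, k: int = 5,
                      threshold: float = 0.8) -> bool:
    """Compare bodies only, since chain papers often rewrite headlines."""
    return jaccard(shingles(body_a, k), shingles(body_b, k)) >= threshold


# The same hypothetical wire copy, printed under two different headlines.
wire = ("State officials said the river cleanup will resume next spring "
        "after a review of dredging results.")
print(likely_same_story(wire, wire))  # True
```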
Online newspapers are changing the landscape of news and creating issues that will need to be addressed in the future. Titles may change over time, and articles can disappear. With Google News, we would view a title, link to the actual article, and sometimes find that the title had changed. The articles were usually similar, but we found it difficult to maintain consistent bibliographic information. Online news articles could also vanish: they might be available when a search is conducted, but months later they can no longer be found. We contacted the publisher of one online paper, who told us that its articles were not archived and were no longer available.
The overabundance of digital news today creates information overload. As information professionals, we must use our knowledge of, and expertise in, the best research methods to extract what is needed from the vast amount of news being created. We must continue to keep abreast of new methods for locating digital as well as print news and excel at using all resources, not only traditional databases.
Our knowledge of advanced Boolean search strategies has always been a way to prove our value, but we must also keep developing our skill with the complex search features available through aggregators such as Google. We must recognize that free resources, such as Google News, may eliminate the need for expensive traditional databases. If we do not evolve along with the technology and enhance our skills, we run the risk of being seen as outdated and having our skills devalued.
Linda Sabelhaus (firstname.lastname@example.org) is senior associate/librarian, ICF International. Michelle Cawley (email@example.com) is also senior associate/librarian at ICF International. Comments? Email the editor-in-chief (firstname.lastname@example.org).
Author: Sabelhaus, Linda; Cawley, Michelle
Article Type: Cover story
Date: Mar 1, 2013