
Topsy's different way to the web.

I was reluctant to undertake this review because it deals with Twitter, which some say is inhabited by twits. Why should I subject my readers to drivel from celebs and other nonfamous narcissists? But I hasten to add, it's not all about Twitter. I actually came to the topic indirectly through a curiosity about Topsy.

Topsy is a social search engine, a type sharply distinct from the prevailing search engine type represented by Google and its imitators. The Google model is so dominant for web search that it's easy to overlook other, quite different models. The question, then, is, "Can these other search engine types provide useful and interesting information that the Google model does not?" From this perspective, Topsy is not just a way to get at solipsistic ramblings; instead, it is a different window into the web.

Social Searching for Tweets

Topsy is one of a group of social search engines that appeared about 3 years ago. Although the talk was about "social search" generically, in practice, the focus was on Twitter. By that time, Twitter was enormous in several respects: in its volume of tweets, in its high public profile, and finally in the presumptive value of the collected tweets as a searchable, analyzable database.

The freshman class of social search engines has had a high dropout rate, succumbing to underfunding and flawed business models. Once-promising names such as Collecta, OneRiot, and Scoopler are gone.

Google even abandoned its own social search engine last summer. That leaves Topsy as the last social search engine standing and as the biggest and best window into a billion tweets every week.

You might ask, "Doesn't Twitter have its own search engine?" Yes, it does ... sort of. Twitter's own search engine has a simple and effective basic level and a multifeatured advanced level. But it doesn't do retrospective searching, which is crucial: it searches only the past week or so of tweets. That may work for the latest pop culture mania or celebrity dysfunction, but it simply won't suffice for the longitudinal analysis needed to mine Twitter's rich lodes of data.

Citation Analysis Revisited

Topsy, which launched in 2009, was developed by a team of web entrepreneurs. The public search engine is actually a demonstration site for Topsy's search technology, which has several enterprise applications. The company is investor-backed, and one pleasant result is that the site has no ads.

Topsy's basic principle is quite different from that of Google and its followers. For them, the basic unit for search is the webpage; for Topsy, the basic unit for search is the web conversation. This produces different results, both in the tweets themselves and in the links that often accompany them. In addition to simply retrieving tweets based on term matches, Topsy applies algorithms to rank tweets and links based on their importance. One such signal is how often a tweet is retweeted; another is the influence of the tweeter, measured by how often that person's tweets are retweeted (twits do a lot of retweeting).
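Topsy does not publish its formulas, so the following is only a minimal sketch of the two signals described above, under loudly stated assumptions: a tweeter's "influence" is taken to be the total number of retweets that person's tweets have received, and a tweet's score is its own retweet count plus a weighted share of its author's influence. The names `Tweet`, `author_influence`, `rank`, and `influence_weight` are hypothetical and are not Topsy's.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: str
    author: str
    text: str
    retweets: int  # how many times this tweet was retweeted


def author_influence(tweets):
    """Hypothetical influence measure: total retweets each author's tweets received."""
    influence = defaultdict(int)
    for t in tweets:
        influence[t.author] += t.retweets
    return influence


def rank(tweets, query, influence_weight=0.5):
    """Match every query term (case-insensitive substring), then sort by importance."""
    influence = author_influence(tweets)
    terms = query.lower().split()
    matches = [t for t in tweets if all(term in t.text.lower() for term in terms)]

    def score(t):
        # Own retweet volume plus a weighted share of the author's influence.
        return t.retweets + influence_weight * influence[t.author]

    return sorted(matches, key=score, reverse=True)


if __name__ == "__main__":
    sample = [
        Tweet("1", "alice", "Library search tips", 10),
        Tweet("2", "bob", "Search engines compared", 3),
        Tweet("3", "bob", "More search news", 30),
        Tweet("4", "carol", "Cats", 100),
    ]
    print([t.tweet_id for t in rank(sample, "search")])  # prints ['3', '2', '1']
```

In this toy data, tweet 2 outranks tweet 1 despite having fewer retweets, because its author is more influential. That is the effect the second signal is meant to capture.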

In essence, Topsy's method is similar to Google's PageRank, which measures incoming links to a page; both measure relevance, or importance, based upon how much attention a tweet or page gets from others. Both are actually types of citation analysis, which has been a principal metric for research literature for decades.
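To make the citation-analysis analogy concrete, here is a minimal sketch of the textbook PageRank power iteration. It is not Google's production system or Topsy's algorithm. It treats any "citation" (a link to a page, or a retweet of someone's tweet) as an edge in a graph, and each node passes its score evenly to the nodes it cites. The graph, the function name `pagerank`, and the damping value of 0.85 are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    links maps each node to the list of nodes it cites (links to or retweets).
    Returns a dict of scores that sums to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share (the "random jump").
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Pass this node's score evenly to everything it cites.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (cites nothing): spread its score over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy graph: a and b both cite c; c cites a. Nobody cites b.
    scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for node, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(node, round(score, 3))
```

The node cited by two others (c) ends up with the highest score, and the node nobody cites (b) ends up with the lowest. That is the same intuition as counting citations in research literature, except that a citation from a highly cited node counts for more.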


More Tweets Than Anybody Else

Topsy does not search the entire content set of tweets. Its coverage dates from May 2008, while Twitter started in 2006. Nor does Topsy retrieve all tweets: some are omitted because they fall below Topsy's minimum threshold for importance. If you run side-by-side comparison searches in Twitter and Topsy, you'll find that sometimes a few tweets are omitted and other times larger batches are. The algorithm generally works well at filtering out inconsequential items, but on occasion, it seems arbitrary.

In keeping with its identity as a real-time search engine, Topsy is updated almost as quickly as Twitter itself, with tweets appearing in Topsy soon after their original posting.


Parsing the Social Web

Topsy analyzes the different components of a tweet (the message, the tweeter, links, and retweets) to provide several search and display options. There is a simple "fill in the panel" basic search and an advanced search that offers Boolean operators, phrase searching, and searching by Twitter hashtag, site, domain, tweeter, date, or format (tweets, links, photos, and videos). Results can be displayed in relevance or date order and limited by language, time, and record content, including limiting to the tweet itself or to accompanying links.

Topsy's ability to highlight tweeted links (as opposed to the short tweet message) is highly important because many tweets have links. To use the iceberg model, the smaller, visible portion of Twitter shows vacuous, barely literate ramblings by celebs and other twits, while the much larger but less apparent portion contains a large body of useful and interesting links. Many of these are undiscoverable by Google-type search engines; even if they are indexed, they may appear so far down in the ranking that they are, in effect, invisible. Thus, even if you have no interest in the message portion of a tweet, Topsy is a powerful alternative way to discover valuable webpages.

Twitter has a good advanced search option. It shares common advanced search features with Topsy, including Boolean operators, phrase searching, and date ranging. It also has a few options that Topsy lacks, including a form of "opinion" searching, based upon positive or negative emoticons in the tweet (twits are big on emoticons). Results can be displayed by date or limited to the top tweets, which is a form of relevance ranking.

Of course, Twitter's great weakness is that it shows only a tiny retrospective window. Topsy's vastly larger tweet database is of strong interest to its enterprise clients, who need much longer trend analyses. Topsy's ability to focus on links is another advantage over Twitter's own search engine. Finally, Topsy's relevance-ranking algorithms generally produce better results than Twitter's.

Topsy Versus Google

Of course, the key matchup isn't Topsy versus Twitter; it's Topsy versus Google. Google is so successful and genuinely useful that it's easy to forget how much web content, including items of undoubted value, it doesn't reveal. Topsy's conversation search model produces different results from Google's page search model, including information that's arguably of comparable interest and usefulness.

Last fall, Topsy extended its search to include posts from Google+ (this is still in beta). Like Twitter, Google+ is a conversation medium that is not covered by page-based search engines. Also like Twitter, Google+ conversations are enriched by web content that's not easily discovered by page-based search engines. Topsy search and retrieval from Google+ is yet another window into good stuff on the web. If you want a genuine alternative to Google, forget Bing or; try Topsy.



Topsy is a social search engine that searches and analyzes content in Twitter. Unlike Twitter's own search, which reaches back only several days, Topsy searches tweets dating from 2008, with real-time updating. Topsy has several search features that allow productive analysis of the tweet database, including access to the useful links that often accompany the tweets. Because its search methods and content base are so different from those used by Google and its imitators, Topsy is a fascinating and compelling alternative to them.

Mick O'Leary is the director of the library at Frederick Community College in Frederick, Md. His email address is Send your comments about this column to
COPYRIGHT 2012 Information Today, Inc.

Article Details
Title Annotation:DATABASE REVIEW
Author:O'Leary, Mick
Publication:Information Today
Geographic Code:1USA
Date:Jul 1, 2012