When good links go bad.
--Gareth Branwyn, Jargon Watch, Wired, July 1, 1996
The amusing and edifying resource Word Spy identifies the above quotation as the earliest citation of the term "link rot" (wordspy.com/words/linkrot.asp). So this is an issue that has plagued the World Wide Web since its earliest days.
Now, almost two decades in, link rot has ballooned into a major problem, far beyond the simple annoyance of being confronted with a 404-error page (wordspy.com/words/404.asp). It is written about regularly in scholarly journals, even well outside the purview of library and information science. You'll find a lot of articles focused on link rot in legal journals, sometimes with amusing titles ("Something Rotten in the State of Legal Citation: The Life Span of a United States Supreme Court Citation Containing an Internet Link (1996-2010)" from the Yale Journal of Law & Technology, digitalcommons.law.yale.edu/yjolt/vol15/iss2/2).
The actual problem is not at all amusing. According to "Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations," a recent Harvard Law School study (non-paywall working paper version at papers.ssrn.com/sol3/papers.cfm?abstract_id=2329161), "[M]ore than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs found within United States Supreme Court opinions, do not link to the originally cited information." Imagine the frustration for legal researchers and the ramifications for everyone involved.
Link rot also plagues the corpus of scientific literature, which is worrisome because the advancement of science is so dependent upon earlier research. In "A Cross Disciplinary Study of Link Decay and the Effectiveness of Mitigation Techniques," which appeared in the journal BMC Bioinformatics in 2013 (biomedcentral.com/1471-2105/14/S14/S5), the authors wrote:
We accessed 14,489 unique web pages found in the abstracts within Thomson Reuters' Web of Science citation index that were published between 1996 and 2010 and found that the median lifespan of these web pages was 9.3 years with 62% of them being archived.
Articles in other scholarly journals have also addressed the problem:
* "Towards Robust Hyperlinks for Web-Based Scholarly Communication" (Intelligent Computer Mathematics; DOI: 10.1007/978-3-319-08434-3_2)
* "Uniform Resource Locator Decay in Dermatology Journals: Author Attitudes and Preservation Practices" (Archives of Dermatology; ncbi.nlm.nih.gov/pubmed/16983002)
* "Accessibility of Internet References in Annals of Emergency Medicine: Is It Time to Require Archiving?" (Annals of Emergency Medicine; ncbi.nlm.nih.gov/pubmed/17276549)
* "'Link Rot' Limits the Usefulness of Web-Based Educational Materials in Biochemistry and Molecular Biology" (Biochemistry and Molecular Biology Education; DOI: 10.1002/bmb.2003.494031010165)
The scholarly community has attempted to address the transient nature of URLs via the digital object identifier (DOI) system (doi.org) and Persistent Uniform Resource Locators (PURL; purl.oclc.org/docs/index.html). Keep in mind, though, that these help only if the item in question has moved but is still online somewhere.
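To make the DOI idea concrete: a DOI is a stable name, and the doi.org resolver redirects it to wherever the item currently lives. The small Python sketch below turns a DOI (bare, or written with a "DOI:" prefix as in the citations above) into a resolver link. The function name `doi_to_url` is hypothetical; the only real convention it relies on is that `https://doi.org/` followed by the DOI resolves it. Percent-encoding characters such as `#` or `?` is a cautious assumption, not something the DOI system requires you to do by hand.

```python
from urllib.parse import quote

# The doi.org resolver redirects a DOI to the item's current location.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI such as "10.1007/..." (optionally written "DOI: 10.1007/...") into a link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):].strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")  # every DOI prefix begins with "10."
    # Keep "/" as-is but percent-encode characters such as "#", "?" or spaces,
    # which a browser would otherwise treat as URL syntax.
    return DOI_RESOLVER + quote(doi, safe="/")


# DOIs cited in this column.
for cited in ["10.1007/978-3-319-08434-3_2", "DOI: 10.1002/bmb.2003.494031010165"]:
    print(doi_to_url(cited))
```

As the paragraph above warns, the resulting link only helps while the publisher keeps the item online and keeps its DOI record up to date.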
What causes link rot? You know this. Content gets renamed/relocated/removed. Websites get redesigned or disappear entirely. If you search the internet for a living, you know what to try when you encounter a bad link:
* Check that you've typed the URL correctly. (Duh.)
* If you've clicked a link that turns out to be bad, mouse over the link and scrutinize the URL to see if something looks funky.
* Put the title of the document you're looking for--surrounded by quotation marks--into your general web search engine of choice. Maybe try more than one.
* Try searching for the item on the homepage of the website. Most have site search engines.
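The "scrutinize the URL" step can be partly automated. The sketch below flags a few common signs of a mangled address: stray whitespace (typical when a URL is broken across lines in print or PDF, as several in this very column were), trailing punctuation swallowed from the surrounding sentence, a missing scheme, a host name with no dot, and doubled slashes. The function name `url_problems` and the particular checks are illustrative assumptions, not a standard, and an empty result does not mean the link actually works.

```python
from typing import List
from urllib.parse import urlsplit

# Characters that often get glued onto a URL copied out of running prose.
TRAILING_PUNCTUATION = ".,;:)]'\""


def url_problems(url: str) -> List[str]:
    """Return reasons a typed or pasted URL looks suspicious (empty list if none found)."""
    url = url.strip()
    problems = []
    if any(ch.isspace() for ch in url):
        problems.append("whitespace inside the URL")
    if url and url[-1] in TRAILING_PUNCTUATION:
        problems.append("trailing punctuation")
    has_scheme = "://" in url
    if not has_scheme:
        problems.append("missing scheme")
    # urlsplit only recognizes the host after "//", so supply a scheme if needed.
    try:
        parts = urlsplit(url if has_scheme else "http://" + url)
    except ValueError:  # e.g. an unbalanced "[" or "]" in the host part
        problems.append("unparseable host")
        return problems
    if "." not in parts.netloc:
        problems.append("host name looks incomplete")
    if "//" in parts.path:
        problems.append("doubled slash in path")
    return problems


# A URL from this column that picked up a stray space in print.
print(url_problems("wordspy.com/words/404 .asp"))
# prints ['whitespace inside the URL', 'missing scheme']
```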
That being said, most website search engines are terrible, and newspaper sites (I'm sorry to say it, because I come from the world of journalism) tend to be the worst. I usually resort to Google; the advanced search form (which is more or less hidden, so bookmark google.com/advanced_search) allows you to restrict your search to a single domain. You'll also find other options, including the ability to restrict your search to a single document format, PDF for example, which is useful if you're looking for a paper or report. Actually, there are a whole bunch of operators/parameters that you can use directly in the main Google search box (sans.org/security-resources/GoogleCheatSheet.pdf), but I can never remember them.
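For anyone who, like the author, can't keep Google's operators in their head, here is a minimal sketch that assembles the two restrictions mentioned above: `site:` limits results to one domain and `filetype:` limits them to one document format, with the title in quotation marks as an exact phrase. The helper names are hypothetical, and the sketch assumes Google still honors these operators and its `q` query parameter; Google can change either without notice.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query(title: str, site: Optional[str] = None, filetype: Optional[str] = None) -> str:
    """Combine an exact-phrase title with Google's site: and filetype: operators."""
    terms = [f'"{title}"']  # quotation marks request the exact phrase
    if site:
        terms.append(f"site:{site}")  # limit results to one domain
    if filetype:
        terms.append(f"filetype:{filetype}")  # limit results to one document format
    return " ".join(terms)


def google_search_url(query: str) -> str:
    """Build a results-page URL; q is the parameter Google uses for the query."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(
    "Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations",
    site="ssrn.com",
    filetype="pdf",
)
print(query)
print(google_search_url(query))
```

Typing the printed query straight into the search box does the same job as the advanced search form.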
If you're a serious internet researcher, you know all about the Internet Archive's Wayback Machine (archive.org/web), which provides access to archived earlier versions of webpages, including pages that no longer exist. This thing is truly amazing; the database currently comprises more than 430 billion webpages. That sounds like a lot until you consider that 571 new websites are created every 60 seconds, at least according to a couple-of-years-old infographic that is still floating around in cyberspace (mashable.com/2012/06/22/data-created-every-minute).
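The Wayback Machine can also be queried from a script. The sketch below assumes the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`), which takes a `url` and an optional `timestamp` (assumed to be YYYYMMDDhhmmss or a leading part of it, such as a year) and returns JSON whose `archived_snapshots.closest` entry, when present, points to the nearest capture. The helper names are hypothetical. Parsing is kept separate from the network call so it can be checked against a saved response.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Availability endpoint: answers with JSON describing the archived snapshot
# closest to the requested timestamp (or to now, if none is given).
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL for a page and an optional target date."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Pull the archived copy's address out of a parsed availability response."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Ask the Wayback Machine for the closest archived copy (needs network access)."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("wordspy.com/words/linkrot.asp")` should return a `web.archive.org` address if a capture exists, or `None` if nothing was archived.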
One of its small but highly useful features is "the ability to archive a page instantly and get back a permanent URL for that page in the Wayback Machine," which allows anybody to create a stable URL for future citation. Just enter the URL of the page you want archived into the Save Page text box in the lower right corner of the homepage and click Save Page.
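The Save Page feature can also be triggered from a script. This sketch assumes the Save Page Now endpoint accepts a page address appended to `https://web.archive.org/save/`. The service's exact response (a redirect, a status page, rate limiting, or a login requirement) has varied over time, so the function simply returns the address it ended up on. Confirm the snapshot yourself before citing it. The helper names and the `User-Agent` string are placeholders.

```python
from urllib.request import Request, urlopen

# Appending a page's address to this endpoint asks the Wayback Machine to capture it.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request_url(page_url: str) -> str:
    """Build the capture request for a page, adding a scheme if it was left off."""
    page_url = page_url.strip()
    if not page_url.startswith(("http://", "https://")):
        page_url = "http://" + page_url
    return SAVE_ENDPOINT + page_url


def save_page(page_url: str, timeout: float = 120.0) -> str:
    """Request a capture and return the address the service finally answered from.

    Needs network access. Depending on the service's current behavior this may be
    the new snapshot or an intermediate status page, so check it before citing it.
    """
    req = Request(
        save_request_url(page_url),
        headers={"User-Agent": "linkrot-example/0.1"},  # placeholder identifier
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.geturl()
```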
Finally, if you create content for the web, be part of the solution to link rot rather than exacerbating the problem. The Journalist's Resource, a project of Harvard Kennedy School's Shorenstein Center on Media, Politics and Public Policy and the Carnegie-Knight Initiative on the Future of Journalism Education, offered some best practices in a recent article about "the growing problem of Internet 'link rot'" (journalistsresource.org/studies/society/internet/website-linking-best-practices-media-online-publishers). In brief:
* Include only essential links. Overlinking means more chances for URLs to break.
* Choose linking text carefully; links of two to five words are ideal.
* If possible, link to landing pages rather than PDFs.
* Use the most compact/direct URL available, and generally avoid link shorteners.
* Verify links after publication, and check them regularly.
* Other sites may link to your content, so maintain stable URLs.
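Verifying links after publication and rechecking them regularly is easy to automate. Below is a minimal sketch that uses only Python's standard library. It requests each page, trying a lightweight HEAD request first and falling back to GET when a server refuses HEAD. Unreachable hosts and 4xx/5xx responses count as broken. It also flags link text outside the two-to-five-word range suggested above. The function names, timeout, and `User-Agent` string are assumptions. A 200 response only shows that something answers at that address, not that it is still the content you originally linked to.

```python
from typing import List, Optional, Tuple
from urllib.error import HTTPError
from urllib.request import Request, urlopen

USER_AGENT = "link-check-example/0.1"  # placeholder; identify your own tool here


def fetch_status(url: str, timeout: float = 15.0) -> Optional[int]:
    """Return the final HTTP status for url (after redirects), or None if unreachable."""
    for method in ("HEAD", "GET"):
        try:
            req = Request(url, method=method, headers={"User-Agent": USER_AGENT})
            with urlopen(req, timeout=timeout) as resp:
                return resp.status
        except HTTPError as err:
            # Some servers refuse HEAD outright; retry those with a normal GET.
            if method == "HEAD" and err.code in (403, 405, 501):
                continue
            return err.code
        except (OSError, ValueError):
            return None  # DNS failure, refused connection, timeout, or malformed URL
    return None


def is_broken(status: Optional[int]) -> bool:
    """Count unreachable hosts and 4xx/5xx responses as broken links."""
    return status is None or status >= 400


def link_text_ok(text: str) -> bool:
    """Check the two-to-five-word guideline for link text."""
    return 2 <= len(text.split()) <= 5


def audit(links: List[Tuple[str, str]]) -> List[str]:
    """Check (link text, URL) pairs and return one line per problem found."""
    report = []
    for text, url in links:
        if not link_text_ok(text):
            report.append(f"{url}: link text has {len(text.split())} word(s)")
        status = fetch_status(url)
        if is_broken(status):
            shown = status if status is not None else "unreachable"
            report.append(f"{url}: broken ({shown})")
    return report
```

Running `audit` on a schedule (for example, from cron) over the links in your published pages covers the "check them regularly" advice. Any reported address is a candidate for the Wayback Machine lookup or a search by title.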
Shirley Duglin Kennedy is the research editor for the Center for Deployment Psychology in North Bethesda, Md., and is editor of the blog FullTextReports. Send your comments about this column to email@example.com.
Title Annotation: INTERNET WAVES; impacts of errors in hyperlink navigation
Author: Kennedy, Shirley Duglin
Date: Nov 1, 2014