The first study, with Kendra Albert and Larry Lessig, focused on documents meant to endure indefinitely: links within scholarly papers, as found in the Harvard Law Review, and judicial opinions of the Supreme Court. We found that 50 percent of the links embedded in Court opinions since 1996, when the first hyperlink was used, no longer worked. And 75 percent of the links in the Harvard Law Review no longer worked.
People tend to overlook the decay of the modern web, when in fact these numbers are extraordinary—they represent a comprehensive breakdown in the chain of custody for facts. Libraries exist, and they still have books in them, but they aren’t stewarding a huge percentage of the information that people are linking to, including within formal, legal documents. No one is. The flexibility of the web—the very feature that makes it work, that had it eclipse CompuServe and other centrally organized networks—diffuses responsibility for this core societal function.
The problem isn’t just for academic articles and judicial opinions. With John Bowers and Clare Stanton, and the kind cooperation of The New York Times, I was able to analyze approximately 2 million externally facing links found in articles at nytimes.com since its inception in 1996. We found that 25 percent of deep links have rotted. (Deep links are links to specific content—think theatlantic.com/article, as opposed to just theatlantic.com.) The older the article, the less likely it is that the links work. If you go back to 1998, 72 percent of the links are dead. Overall, more than half of all articles in The New York Times that contain deep links have at least one rotted link.
Our studies are in line with others. As far back as 2001, a team at Princeton University studied the persistence of web references in scientific articles, finding that the raw number of URLs contained in academic articles was increasing but that many of the links were broken, including 53 percent of those in the articles they had collected from 1994. Thirteen years later, six researchers created a data set of more than 3.5 million scholarly articles about science, technology, and medicine, and determined that one in five no longer points to its originally intended source. In 2016, an analysis with the same data set found that 75 percent of all references had drifted.
Of course, there’s a keenly related problem of permanency for much of what’s online. People communicate in ways that feel ephemeral and let their guard down commensurately, only to find that a Facebook comment can stick around forever. The upshot is the worst of both worlds: Some information sticks around when it shouldn’t, while other information vanishes when it should remain.