When news sites shut down, their owners often don’t prioritize preserving the content.
MTV pulled down MTV News in June. After Deadspin was sold, many of its archives temporarily disappeared. This week, Flaming Hydra reported that The Awl’s archives are gone. And those examples are just from the past couple of months; in 2021, the authors of a Reynolds Journalism Institute report found that just 7 out of 24 newsrooms they interviewed were fully preserving their news content.
“It’s really kind of a web of responsibility in terms of creating an accurate record,” Talya Cooper, a research curation librarian at NYU and The Intercept’s former archivist, told me. “When you hear about something being shut down, it’s not just ‘Wow, all of this content is being lost.’ It’s also all of the content that is derived from this content — a key bedrock of evidence that could be used to verify a claim, or bolster someone’s career, or any number of things.”
AI further complicates matters: what happens when sites are used to feed ChatGPT, then go offline? “What happens when that information is baked into large language models and the source of that information is not live on the web anymore?” Cooper wondered. “It’s kind of mind-boggling to think about, but it is reality for a lot of websites that have been crawled and had their content put into the blender of large language models. How will it be possible, in the future, to trace back some of the claims that will be made by ChatGPT if the content is no longer alive?”
When news sites’ archives disappear, readers aren’t the only ones who lose out; journalists face all kinds of personal and professional challenges, too. They’re left to archive their work on their own so that they have clips to show prospective employers. Web pages, photographs, and text stories are easier to save than audio files, interactives, and other types of digital journalism; to preserve those, journalists often have to get creative. Paid personal archiving services are available, but “it’s not necessarily appealing when you’re just trying to look for a way to save something that was previously online for free,” one journalist told me.
I spoke with three journalists about how they’re going beyond the Wayback Machine to preserve their own work.