As early as July 2016, the Internet Archive began preparing for the anticipated loss of US Government information that would surely result from the presidential transition on January 20th. Using tools developed by the excellent digital repositories team at the University of North Texas, the Internet Archive crawled all websites in the .gov domain and harvested their content for permanent preservation on the Wayback Machine. It also invited librarians, researchers, and academics from around the country to nominate sites outside the .gov domain (social media feeds, .com and .edu sites, etc.) for inclusion in the harvest. In all, the plan was to collect webpages from 6,000 government domains and over 200,000 hosts, along with feeds from around 10,000 official federal social media accounts.
It seems that work is already proving its value.
Just hours after assuming office on Friday, the Trump Administration removed a number of documents from agency websites without offering prior warning or alternative means of access. The New York Times reported on a handful, including the Department of Labor’s report on Advancing LGBT Workplace Rights (formerly at https://www.dol.gov/asp/policy-development/lgbt-report.pdf) and a mass of Whitehouse.gov material relating to climate change and foreign policy. Thankfully, that information was recovered. The LGBT report is now at https://archive.org/details/lgbt-report, and the Obama Administration’s climate change policy is preserved in any one of the 24,000 snapshots the Wayback Machine made of Whitehouse.gov in the month preceding the inauguration. To serve specific information needs pertaining to presidential transitions, the “End of Term” Project Team has set up a special web portal for searching and browsing the government information that has been collected.
I want to make clear that the disappearance of government information during executive transitions is by no means limited to this most recent transfer of power. In the transition between the 2008 and the 2012 Obama Administrations, 83% of all .gov PDFs were switched out, updated, or removed, with no comprehensive effort made to preserve them. While the examples of document disappearance we witnessed over the weekend were clearly politically motivated, a far larger and in many ways more important loss of information occurs on government websites every day, caused by nothing more than simple administrative carelessness. We owe a tremendous debt to civically driven institutions like the Internet Archive and University of North Texas Libraries, which have accepted the responsibility and expense of preserving the US Government’s web-based information, even as that same government fails to fulfill this service for itself.