Data Discovery, Preservation, Tracking

Where is that data (now)?

Whether intentionally orchestrated by political actors or caused by a shortsighted technical mishap, the loss of important research data is unfortunately not a new phenomenon. The recent expunging and “alteration” of U.S. federal agency data to accommodate a political viewpoint demonstrate the vulnerability of our most trusted information resources. Such actions erode scholarly provenance, interrupt current research, impede sharing efforts, and stifle future innovation.

Our goals are to raise awareness of the need for public data access and preservation, to involve subject experts in identifying at-risk data, to encourage personal protection of research data, and to support research reproducibility.

While it is unrealistic to expect that all data will be preserved in some form for ongoing research, several efforts are underway to capture important data and online information. We encourage you to review, and potentially participate in, these community efforts to preserve crucial data.

If you have any questions or encounter lost data sources, please contact the Publishing and Data Services team.

Community Efforts

Looking for data? Want to contribute data to save? Below are relevant community efforts to help locate previously public data.

  • Data Rescue Project – The Data Rescue Project is a coordinated effort among a group of data organizations that gathers, curates, cleans, and catalogs data, and provides sustained access to and distribution of data assets.
    • Data Rescue Tracker – The tracker provides an overview of who is downloading which dataset from which government website. If you are looking for a specific dataset, check the Downloads column to see whether it has been captured.
    • Data Rescue Tracker Download Submission Form – Use this submission form to nominate data to save.
  • Data Liberation Project – The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest.
  • DataLumos – DataLumos is an ICPSR archive for valuable government data resources.
  • CAFE Dataverse Collection – This Climate and Health Research Coordinating Center (CAFÉ) Dataverse sub-collection stores critical climate and health datasets.
  • Find Lost Data – Find Lost Data provides a search tool across several data archive and rescue sites, including CDC, Harvard Dataverse, Data Rescue Project, and the Harvard LIL Data.gov mirror.

Documenting Data Preservation

Remember: once you have found data worth tracking, simply saving it is not enough. Use the resources below to ensure the data is stored in accessible formats and well described with metadata to facilitate long-term access and reuse.

  • Curating for Data Rescue – Advice on preserving data from the Data Curation Network, a community of curators working to make data more ethical, reusable, and understandable.
  • Checklist for USA Federal Data Backups – This checklist from MIT outlines steps you can take to ensure the government data you use in your research remains accessible to you and others.

Additional Guidance for Data Discovery

Use the following resources to locate data collections maintained by Harvard or other entities. Have questions? We are here to help!