NDSA:Digital Preservation X-Challenges

From DLF Wiki
Revision as of 15:17, 11 February 2016 by Dlfadm (talk | contribs) (10 revisions imported: Migrate NDSA content from Library of Congress)

This is where this action team will make its plans.

Action Team Members

  • Micah Beck, University of Tennessee
  • Jane Mandelbaum, Library of Congress
  • Dean Farrel, State Library of North Carolina
  • James S. Plank, University of Tennessee
  • John Spencer, BMS/Chace
  • Micah Altman, Harvard University
  • Mike Smorul, University of Maryland

Project Overview

This group will plan and launch a set of challenges and/or prizes to spur innovation in digital preservation. Group members will focus on defining the challenges, promoting them, and exploring ways to identify funding to support them if funding is determined to be appropriate. The action team will communicate over email and report its work on this wiki page. Email Jane Mandelbaum (jman@loc.gov) if you would like to participate.

Examples of different prize/challenge models

Potential Grand Challenges

BitStab

Develop and promote a specific technical competition, analogous to the Top 500 supercomputer ranking (http://www.top500.org) but more relevant to digital preservation. Learning from the Top 500 experience, such a competition requires a good metric (in the case of Top 500, the Linpack benchmark) that is widely understood and accepted, and that is neither too difficult or expensive to implement nor too easy to game. It then requires buy-in from a community that is widely acknowledged to include "the best contenders." One proposal for such a metric is bulk bit stability (stable bit-years): for some definition of "without change," we simply ask for evidence of the product of how much data is stored without change (stable bits) and the length of time it was stored.
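The proposed metric is a simple product, which can be sketched in a few lines. This is a hypothetical illustration only; the function name and the example figures are invented, and the hard part (defining and verifying "without change") is deliberately left outside the calculation.

```python
# Hypothetical sketch of the proposed "stable bit-years" metric:
# the number of bits held without change, multiplied by the number
# of years they were held. All names and figures here are invented.

def stable_bit_years(stable_bytes: int, years: float) -> float:
    """Return stable bit-years: bits held unchanged x duration in years."""
    return stable_bytes * 8 * years

# A 10 PB archive whose contents were verified unchanged for 5 years:
PETABYTE = 10**15
score = stable_bit_years(10 * PETABYTE, 5.0)
print(f"{score:.3e} stable bit-years")  # 4.000e+17 stable bit-years
```

Note that the metric rewards both scale and longevity equally: a small archive held stable for decades can score as highly as a massive archive held stable for a year.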

Our ability to preserve and interpret digital objects rests on underlying storage facilities that can accurately maintain the bit patterns that represent them. The goal of the Bit Stability (BitStab) Challenge is to define a metric for the size and duration of stored data that meets specified characteristics, including accuracy and accessibility; to facilitate the measurement of this metric by data archives; and to publicize the results. The intent is two-fold: 1) to bring glory and publicity to the most successful data archives, and 2) to measure and understand the realistic characteristics of the most massive data archives. The pursuit of the second goal in particular will require that the metrics be developed in cooperation with the operators of data archives and the developers and vendors of storage technologies.

The first step, which will be discussed at the Digital Preservation Challenges workshop at the upcoming NDIIPP/NDSA Partners meeting, will be to organize efforts to define one or more metrics that could be used as the basis for ranking the most massive data archives. Working group co-chair and action team leader Micah Beck has written up some notes on his proposed approach to the problem. He proposes the development of service level agreements (SLAs) for access to archived data, and then simply measuring bit-years maintained under a specific SLA. In such a framework, the key problem is to define an appropriate SLA to match the intended characteristics of a particular class of data archives. Beck's notes on the subject can be found here.
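Under this framework, a Top-500-style ranking would compare archives only within the same SLA class. A minimal sketch of that idea, with all archive names, SLA labels, and numbers invented for illustration:

```python
# Hypothetical sketch of an SLA-scoped ranking: each archive reports
# stable bit-years maintained under a named SLA, and archives are only
# ranked against others reporting under the same SLA. All names, SLA
# labels, and figures are invented.

from dataclasses import dataclass

@dataclass
class ArchiveReport:
    name: str
    sla: str           # e.g. "verified-annual-read" (invented label)
    bit_years: float   # stable bit-years claimed under that SLA

def rank(reports: list[ArchiveReport], sla: str) -> list[ArchiveReport]:
    """Rank archives reporting under the same SLA, largest score first."""
    eligible = [r for r in reports if r.sla == sla]
    return sorted(eligible, key=lambda r: r.bit_years, reverse=True)

reports = [
    ArchiveReport("Archive A", "verified-annual-read", 4.0e17),
    ArchiveReport("Archive B", "verified-annual-read", 9.5e16),
    ArchiveReport("Archive C", "best-effort", 8.0e17),  # different SLA class
]
for place, r in enumerate(rank(reports, "verified-annual-read"), start=1):
    print(place, r.name, r.bit_years)
```

The design choice to ignore reports under other SLAs mirrors Beck's point: a larger bit-year figure under a weaker SLA (Archive C above) is not comparable to a smaller figure under a stronger one.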

Real-World Reliability of Long-Term Digital Storage

How can we assess and quantify the real-world reliability of long-term digital storage, when over the long term the dominant threats are likely to be economic and organizational rather than technical?

Format migration

Format migration remains a central technical strategy for digital preservation, but it creates a risk of information loss during conversion. A grand challenge might be to identify a process for verifying that the semantic content of digital objects in different formats is (approximately) the same. Note that industry has made much more progress in the practical application of semantic fingerprints (primarily for DRM) and similar technologies -- but the aims differ somewhat.
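The core of the verification problem is that raw bytes always differ across formats, so any check must compare extracted content rather than files. A crude sketch of the idea, using the standard library's `difflib` as a stand-in for a real semantic fingerprint; the extracted-text samples are invented:

```python
# Hypothetical sketch of a post-migration equivalence check: compare
# content extracted from the original and migrated objects, not the raw
# bytes (which always differ across formats). SequenceMatcher is a
# crude stand-in for a real semantic fingerprint; the text is invented.

from difflib import SequenceMatcher

def content_similarity(original_text: str, migrated_text: str) -> float:
    """Ratio in [0, 1]; 1.0 means the extracted content is identical."""
    return SequenceMatcher(None, original_text, migrated_text).ratio()

# Text extracted from, say, a legacy word-processor file and its PDF/A
# migration (invented example):
before = "Annual report 2015: holdings grew by 12 percent."
after_ = "Annual report 2015: holdings grew by 12 percent."
print(content_similarity(before, after_))  # 1.0
```

A real challenge would need to define what counts as semantic content for each object class (text, layout, embedded media) and what similarity threshold constitutes "approximately the same" -- which is exactly where the research problem lies.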

Other Potential Challenge Issues

Other grand challenge problems seem less technical to me. Rereading the Blue Ribbon Task Force report suggests that the central challenges are: selection criteria appropriate for the data deluge; realistic cost models for long-term preservation activities; business models for funding preservation of, and access to, public goods; and legal strategies for enabling long-term access in the face of short-term copyright, IP, and confidentiality restrictions.