NDSA:Storage ping


Below are the notes from the Storage Ping/Bit Stability discussion:


The Challenge Workshop Technical Discussion Table Notes, 7/20/2011

Storage Ping Characteristics:

  • Test for average, not maximum, latency
  • Test for bit integrity
  • Test for uptime of files
  • A more sophisticated version of a link checker, it would act as an extra audit, properly handling redirects to URLs.
  • Runs against repositories, not specifically hardware
  • Similar to scrubbing, but occurs externally and uses statistical analysis, sampling the collection to determine availability of files.
  • Limited requests per day per collection to keep from negatively impacting the repository
  • Use fixity provided with the submission of a URL. If no fixity is provided, generate it upon the first read of the file and store it for later checks (see the sketch after this list).
  • A transparent way to view the current status of stored bits
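
To make the characteristics above concrete, here is a minimal sketch of a single ping run. It samples a few registered object URLs, times each request, and checks fixity against a stored checksum, recording a baseline on first read if none was supplied. The registry contents, the SHA-256 checksum format, and the daily sample size are all assumptions for illustration; the notes do not specify an implementation.

  import hashlib
  import random
  import time
  import urllib.request

  # Hypothetical registry: object URL -> known fixity (hex SHA-256), or None if not yet recorded.
  registry = {
      "https://repository.example.org/objects/1234": None,
      "https://repository.example.org/objects/5678": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
  }

  DAILY_SAMPLE_SIZE = 2  # keep request volume low so the repository is not negatively impacted

  def ping(url, expected_fixity):
      """Fetch one object (following redirects) and report its latency and fixity status."""
      start = time.monotonic()
      with urllib.request.urlopen(url, timeout=30) as response:  # urlopen follows redirects
          digest = hashlib.sha256()
          for chunk in iter(lambda: response.read(64 * 1024), b""):
              digest.update(chunk)
      latency = time.monotonic() - start
      observed = digest.hexdigest()
      if expected_fixity is None:
          registry[url] = observed  # first read: record fixity for later checks of the file
          status = "baseline recorded"
      else:
          status = "ok" if observed == expected_fixity else "FIXITY MISMATCH"
      return latency, status

  for url in random.sample(list(registry), k=DAILY_SAMPLE_SIZE):
      try:
          latency, status = ping(url, registry[url])
          print(f"{url}: {status}, latency {latency:.2f}s")
      except OSError as err:
          print(f"{url}: unavailable ({err})")

Repeated runs of something like this, aggregated over time, would yield the average latency, uptime, and bit-integrity figures described above; a real tool would persist the registry and enforce the per-collection request limit.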

Next Steps:

  • Stephen Abrams to send document on related software he's working on to NDSA-Innovation listserv.
    • Here's a (belated) writeup of the idea I introduced during the July NDSA meeting. This is an exploration of what we're calling "repository neighborhood watch," an idea first batted around by my colleagues Trisha Cruse and John Kunze. The point is that what you (as a repository operator) say about your trustworthiness is less important than what your "neighbors" (i.e., repository customers) say about you. (Attached PDF: File:Neighborhood-watch-for-repository-QA.pdf) --Stephen.abrams 21:10, 8 September 2011 (UTC)
  • Mike Smourl to send material/document on related software to NDSA-Innovation listserv.
  • Group to refine the idea, characteristics, and model in advance of the August 2011 Curate Camp. Pass the plan to a team member at Indiana University Libraries or CDL who will be attending, so that team member can get feedback about the model.
  • Suggest what benchmarks would be reported on for participating repositories
  • Innovation Working Group to refine model based on comments at Curate Camp.
  • Present refined model to LC Storage Workshop in Washington, DC in September 2011.
  • Further refinement and development. Develop the tool as well as a benchmark participation agreement.
  • Potentially hold a workshop on the Storage Ping tool at DLF or CNI in fall/winter 2011.

Benchmark Participation Agreement:

  • Guidelines for participating repositories
  • Register collection with URLs for participation in the Storage Ping (see the example record after this list)
  • Provide URLs to fixity of files if available
  • Look at stability of software over time as well as stability of files
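
The agreement itself is still to be drafted. As a sketch of what a registration under it might contain, the record below uses invented field names to capture the collection URLs, the optional fixity URLs, and the per-collection request limit mentioned earlier:

  # Illustrative registration record; the notes do not prescribe a schema.
  participation_record = {
      "repository": "Example Institutional Repository",
      "collection": "Example Collection",
      "object_urls": [
          "https://repository.example.org/objects/1234",
          "https://repository.example.org/objects/5678",
      ],
      # Optional: URLs pointing to fixity values for the objects above, if available.
      "fixity_urls": [
          "https://repository.example.org/fixity/1234.sha256",
          "https://repository.example.org/fixity/5678.sha256",
      ],
      "max_requests_per_day": 10,  # limit so the ping does not burden the repository
  }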

Discussion Notes:

The conversation began with a discussion about how a Bit Stability Challenge might function. While there was interest at the table in exploring the idea further, it was deemed better to start small with a less controversial metric. The table determined that getting the repository population used to reporting was the appropriate initial objective. There was interest in engaging with repositories as well as storage service providers; however, the table decided that starting with repositories was the most productive plan.

Objectives of the challenge and reporting include:

  • Identifying benchmarks that will act as indicators of trust
  • Developing benchmarks that can be reproduced by the consumer
  • Encouraging transparency in metrics
  • Reporting on size of collection and length of time the collection has been successfully preserved.

Possible metrics that were discussed included:

  • Size of storage: number of bits preserved
  • Total bit years, where bit years = bytes × length of time preserved (a worked example follows this list)
  • Effort over time
  • Relating the score of a repository not only to the stability of the storage but also to the overall cost of the storage approach.
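
Taking the definition above literally (bit years = bytes × length of time preserved, in years), a reported value for a hypothetical collection could be computed as follows; the figures are invented for illustration:

  # Hypothetical collection figures, purely for illustration.
  collection_size_bytes = 5 * 10**12   # 5 TB preserved
  years_preserved = 3.5                # length of time successfully preserved

  # "Bit years" as defined in the notes: bytes x length of time preserved.
  bit_years = collection_size_bytes * years_preserved
  print(f"Reported bit years: {bit_years:.2e}")  # 1.75e+13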