NDSA:Tuesday, November 22, 2011

Revision as of 15:21, 23 November 2011

Infrastructure WG Call – November 22, 2011, 4pm-5pm

The next call will be scheduled soon and will occur before the late-December holidays.

In attendance:

Micah Altman
Jefferson Bailey
Priscilla Caplan
Karen Cariani
Dean Farrell
Bill Kehoe
Cal Lee
Trevor Owens
Mike Smorul
Cory Snavely
John Spencer

Infrastructure Survey Results

(Priscilla will be presenting on this survey at CNI.)

Feedback:

The larger an institution is, the less likely it is to use the same system for preservation and access. Do we want to ask why that is? The many different access requirements that come with large collections drive the need for many different types of systems.
It is notable that 91% want indefinite preservation, but only 56% plan on TRAC compliance. There is a contrast (or irony) there.
We should emphasize what more than half of these organizations are doing, and which numbers show practices emerging as common.
To what extent are the results driven by assumptions of indefinite responsibilities (or responsibilities into an indefinite future)?
A vast majority recognize that copies in multiple locations & checking fixity are things they need to do.
A large number claim a strong preference to maintain local control, but some have only one copy of their data (which contrasts with the idea of indefinite responsibility).
Keeping one copy is widespread in the commercial sector.
Another contrast: 82% keep SOME stuff in multiple locations, but only 7% keep ALL their stuff in multiple locations. What is making this more difficult than it needs to be?
Similarly, a majority (80%) are doing fixity checks, but only some are doing them regularly (others probably check only at ingest or on use). Would that have been the case 3 years ago? Most do fixity locally “because they do it better – managing technical preservation.”
The high percentage of fixity checking is a success, even if the details are unclear: which copies (master, access, backup) are checked, and at which points in the lifecycle.
Is a lack of “systematic” or programmatic replication and ongoing/automated fixity driving the numbers behind a preference for local control? There is a theme of “mistrust” or uncertainty running through the numbers regarding local control and interest in third-party options.
Could some of the institutions reporting no fixity checks be using LOCKSS, and therefore think they are not doing them? We can check the observational data to see if that’s true.
Overall, the results could use some validity checking through comparison and data editing.
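
For reference, the kind of regular fixity check discussed above can be sketched as follows. This is a minimal illustration, not a description of what any surveyed institution runs: it assumes a manifest of expected SHA-256 digests recorded at ingest, and the function names `sha256_of` and `check_fixity` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Compare each file against its expected digest.

    `manifest` is an assumed structure: a dict mapping a file path to the
    digest recorded at ingest. Returns a dict mapping each path to
    "ok", "changed", or "missing".
    """
    report = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            report[path] = "missing"
        elif sha256_of(path) == expected:
            report[path] = "ok"
        else:
            report[path] = "changed"
    return report
```

Running such a check on a schedule against every copy (master, access, backup), rather than only at ingest or on use, is what would distinguish "regular" fixity checking in the sense raised above.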

Actions:

Analyze trends by institution type to clarify which types of organizations are doing what, using cross-tabulation and data-editing consistency checks to look at patterns. Drill into the data for 3 or 4 key points. This can be done over email. Micah offered; Jefferson can help.
Get a list of non-respondents and press for their input. John Spencer and Micah both offered. Trevor and Jefferson can help.
Create a smaller list of trimmed-down questions for potential use in the broader community in the future.
Clear up the role of fixity checks in the LOCKSS environment and how that impacts the numbers.
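
As a rough illustration of the cross-tabulation and consistency checks proposed in the actions above, a sketch along these lines could be run over the response data. The field names (`institution_type`, `regular_fixity`, `some_in_multiple_locations`, `all_in_multiple_locations`) and the sample records are invented placeholders, not actual survey fields or results.

```python
from collections import Counter


def cross_tabulate(records, row_field, col_field):
    """Count records for every (row value, column value) combination."""
    counts = Counter((rec[row_field], rec[col_field]) for rec in records)
    row_values = sorted({row for row, _ in counts})
    col_values = sorted({col for _, col in counts})
    return {
        row: {col: counts[(row, col)] for col in col_values}
        for row in row_values
    }


def inconsistent_responses(records):
    """Flag responses claiming ALL content is replicated but not SOME."""
    return [
        rec for rec in records
        if rec["all_in_multiple_locations"]
        and not rec["some_in_multiple_locations"]
    ]


# Placeholder records for illustration only; not survey data.
sample = [
    {"institution_type": "university", "regular_fixity": "yes",
     "some_in_multiple_locations": True, "all_in_multiple_locations": False},
    {"institution_type": "university", "regular_fixity": "no",
     "some_in_multiple_locations": True, "all_in_multiple_locations": False},
    {"institution_type": "archive", "regular_fixity": "yes",
     "some_in_multiple_locations": True, "all_in_multiple_locations": True},
    {"institution_type": "archive", "regular_fixity": "yes",
     "some_in_multiple_locations": False, "all_in_multiple_locations": True},
]

table = cross_tabulate(sample, "institution_type", "regular_fixity")
flagged = inconsistent_responses(sample)
```

The same `cross_tabulate` call could be repeated for each of the 3 or 4 key points, and any flagged records would be candidates for follow-up with the respondent.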

Open-Source Software (OSS) Use Cases

Feedback:

The idea is to develop guidance for making decisions, for example 25 questions to ask before starting an OSS project, a breakdown of the decision process, or a decision tree.
It will be helpful to differentiate between the many different types of activities or projects that can involve OSS; this makes it clear that “open-source” is not one thing and that the next step depends on the use case.
This could take the form of a set of recommendations or checklist questions when making a tool part of your enterprise structure.
The goal is to have a set of recommendations or a checklist of questions about when you should think about incorporating or embarking on open-source software for digital preservation.
We need to clearly articulate how OSS use (or how what we produce) ties into overall digital preservation activities and management.
A possible way to maintain the relevance to digital preservation is to approach OSS not as its own “tree,” but as a branch of the larger decision tree involved in digital preservation infrastructure – so that OSS is just one of many options (some proprietary) in the broader digital preservation administrative ecosystem.
Concurrently, some of the questions in Andrea's Use Case document (and potentially in what the group produces) have weight even beyond the OSS aspect.
What we produce should reinforce the connection between openness of software and the ability to more easily offer long-term access. We will need to identify the connections between open standards and preservation that we can tease out.
We will need to stay cognizant of the “mix and match” aspect to storage infrastructure – OSS may just be a piece in a larger puzzle (or a branch of a larger tree, as noted above).
If NDSA is the face of digital preservation and incorporates both commercial organizations and nonprofits, then the decision tree should be about making decisions regardless of where funding comes from. What are the economic factors involved in choosing between commercial and non-commercial options? Could this be part of something larger?

Actions:

Review the Use Case document from Andrea and the Code4Lib guide on this topic and discuss over email.
Circulate over the list the kinds of things we might end up with, such as examples of decision trees, guides, or other ideas for how the end results can be organized, presented, and used.