NDSA:Discussions on Preservation Storage Topics

From DLF Wiki

Revision as of 11:58, 24 February 2012

Statement of Purpose

The Infrastructure Working Group, in February 2012, initiated a series of open conversations on detailed aspects of preservation storage. These conversations are conducted over the listserv and each topic is discussed over the course of 2-3 weeks. A list of future, potential discussion topics is maintained at bottom and can be augmented by group members. This page serves to capture the content of those conversations for further elaboration by group members.

Topic 1: Encryption

Do you have any opinions on it? What are your reasons for your opinions (gut feelings are OK)?

  • The majority of the preservation data we deliver to our clients is stored on LTO data tapes, without encryption. We do use WORM capability if the client is OK with it. Our reasons are based mainly on the assumption that we do not have any control over who can access the tape, now or in the future, and that staffing changes might hamper the client's ability to recover the preservation files ("now where did the last person put the list of encryption keys?").
  • Pros:
    • Strong encryption eliminates worries associated with unauthorized access to preservation copies of materials (such as copyrighted data).
    • Encryption doubles as an authenticity check, and in fact, some encryption methods involve the creation of a digital signature that can be used for provenance or bit rot detection.
  • Cons:
    • Encryption causes file size bloat to the tune of 20-30%.
    • For light archives, encryption imparts a performance penalty for systems that need to extract the content from the preservation archive for access purposes.
  • Duracloud's approach to encryption is in response to what consumers of cloud storage are requesting. The number one concern is over unauthorized access.
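
The authenticity-check point in the Pros list can be illustrated with a minimal sketch. It uses a keyed hash (HMAC-SHA256) from the Python standard library rather than a full digital signature, so it is a stand-in for the idea and not any respondent's actual method. The function names, the key, and the sample content are all hypothetical.

```python
import hashlib
import hmac


def fixity_tag(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the given bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_fixity(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it to the stored one in constant time."""
    return hmac.compare_digest(fixity_tag(data, key), expected_tag)


# Hypothetical values, for illustration only.
key = b"example-key-held-by-the-repository"
original = b"preservation master, copy 1"
tag = fixity_tag(original, key)  # stored alongside the preservation metadata

damaged = bytearray(original)
damaged[0] ^= 0x01  # flip a single bit to simulate bit rot

print(verify_fixity(original, key, tag))        # True
print(verify_fixity(bytes(damaged), key, tag))  # False
```

Because the tag depends on a secret key, a match suggests both that the bits are unchanged and that the tag was produced by someone holding the key. The trade-off is the same one raised elsewhere in this discussion: the check is only usable as long as the key itself is preserved.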

What kinds of problems do you think it might create in the future?

  • See above. We're most concerned that staffing issues combined with object-based vault management infrastructures in place could lead to problems. Certainly not saying that is the best rationale, but it is based on current reality.
  • As mentioned, preservation of the encryption keys is typically raised as a long-term concern (see below).
  • Format obsolescence and the need for migration are as much a concern for encryption formats as they are for the data storage formats themselves.
  • Similar to the Tivoli Storage Manager example, many cloud storage consumers want to separate the responsibilities of data security from that of storage by uploading already encrypted content. However, the burden of client-side encryption poses a barrier to some.

Do you have any current requirements to do this (laws, policies)? What are the conditions under which you need to encrypt? Do you know of any upcoming requirements for you to do this?

  • No (multiple respondents)

If you do it what technique(s)/strategies do you use? Do you isolate encrypted content from non-encrypted content?

  • Duracloud, as a provider, is developing tools to accommodate a number of scenarios:
    • client does all encryption and key management themselves
    • client manages keys, but provides them to upload tooling to encrypt content prior to transit
    • client wants persisted content to be encrypted, but would rather not deal with key management or the encryption process
    • layered on top of these scenarios is the consideration of the content's usability within the storage system, such as indexing or metadata extraction.
  • We use Tivoli Storage Manager (TSM) client-side encryption for our tape backups. We do not operate the tape backup system we use, so we want to isolate our data security from the tape backup system environment.
  • Encryption is done for security purposes. Tracking is done with barcodes entered into the preservation metadata database, and vault system databases do not usually "refresh" with each other, so the two are not in sync. Again, not saying by any stretch that this should or could be considered "best practices"; it's just what typically happens. Refresh cycles (while another topic) are also a problem in this environment, as there is typically less interest in subsequent updates of data tapes that have not "recouped" their initial cost of creation. I guess I'm trying to convey that there is more interest in keeping the 10-20% of backups that have been profitable than in a consistent policy that treats all digital preservation files equally.

On decryption keys

  • Long-term secure preservation of the decryption keys themselves is typically raised as a concern, although personally I feel that solutions to this problem are straightforward, albeit complex. I view this as a compound problem that requires a combination of preservation storage principles and security principles to solve:
    • Preservation storage - There have to be multiple copies of the keys (the existing framework of geographic distribution should facilitate this).
    • Security storage - The keys themselves obviously have to be secured in some way. This can be done with additional encryption, with physical security (i.e., a locking safe), or with both. The key point is that this chain ultimately ends in human knowledge, i.e., people have to know secrets. The trick is ensuring that enough people know enough secrets to eventually lead to the encryption keys. For example, office staff at multiple sites could hold the combinations to safes containing the encrypted encryption keys, while a more privileged group of repository administrators knows the secret needed to decrypt them. This adds multiple layers to the scheme.
    • Geographic redundancy mitigates the risk of losing the decryption keys in a disaster.
  • Security risk can never be zero, but it can be brought into an acceptable range with a well-thought-out scheme built on existing digital preservation technology and policy frameworks.
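
The idea that "enough people know enough secrets" can be sketched in code. The example below is a simplified illustration under stated assumptions, not the scheme any respondent described: it splits a key into shares with XOR, so that every share is required to rebuild the key and any single share on its own reveals nothing. The function names `split_key` and `combine_shares` are hypothetical. A threshold scheme such as Shamir's secret sharing would be needed if only some of the custodians should be required.

```python
import secrets
from typing import List


def _xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n_shares: int) -> List[bytes]:
    """Split key into n_shares pieces; all pieces are needed to rebuild it."""
    if n_shares < 2:
        raise ValueError("need at least two shares")
    # The first n-1 shares are random; the last one is the key XORed with them.
    shares = [secrets.token_bytes(len(key)) for _ in range(n_shares - 1)]
    last = key
    for share in shares:
        last = _xor(last, share)
    shares.append(last)
    return shares


def combine_shares(shares: List[bytes]) -> bytes:
    """XOR all shares together to recover the original key."""
    result = bytes(len(shares[0]))  # all-zero starting value
    for share in shares:
        result = _xor(result, share)
    return result


# Hypothetical use: one share per custodian group.
key = secrets.token_bytes(32)
shares = split_key(key, 3)
print(combine_shares(shares) == key)  # True
```

Each share could then be stored as multiple copies at geographically separate sites, which combines the preservation storage principle (redundancy) with the security principle (no single custodian holds the whole key). The cost of this all-or-nothing design is that losing every copy of any one share loses the key.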

Do you know of any relevant studies/papers, etc. about this topic?

  • No (multiple respondents)

Potential Future Discussion Topics

Preservation Policies

  • number of copies
  • bit integrity check frequency
  • storage hierarchy

Emerging Storage Technology

  • data reduction/de-duplication
  • device encryption
  • cloud providers
  • WORM devices
  • federated clusters

Decision Factors

  • collection size
  • budget
  • development resources