NDSA:Discussions on Preservation Storage Topics
Revision as of 10:05, 28 March 2012
Statement of Purpose
In February 2012 the Infrastructure Working Group initiated a series of open conversations on detailed aspects of preservation storage. These conversations are conducted over the listserv, and each topic is discussed over the course of 2-3 weeks. A list of potential future discussion topics is maintained at the bottom of this page and can be augmented by group members. This page serves to capture the content of those conversations for further elaboration by group members.
Topic 2: Compression
Questions
The discussion on compression can be found at NDSA:Preservation Storage Topic 2: Compression
Topic 1: Encryption
Questions
Do you have any opinions on it? What are your reasons for your opinions (gut feelings are OK)?
- The majority of the preservation data we deliver to our clients is stored on LTO data tapes, without encryption. We do use WORM capability if the client is OK with it. Our reasons are mainly based on the assumption that we do not have any control over who can access the tape, now or in the future, and that staffing changes might stifle the client's ability to recover the preservation files ("now where did the last person put the list of encryption keys?").
- Pros:
- Strong encryption eliminates worries associated with unauthorized access to preservation copies of materials (such as copyrighted data).
- Encryption doubles as an authenticity check, and in fact, some encryption methods involve the creation of a digital signature that can be used for provenance or bit rot detection.
- Cons:
- Encryption causes file size bloat to the tune of 20-30%.
- For light archives, encryption imparts a performance penalty for systems that need to extract the content from the preservation archive for access purposes.
- Duracloud's approach to encryption is in response to what consumers of cloud storage are requesting. The number one concern is over unauthorized access.
- There is a surfeit of advice promoting a natural wariness towards encryption, though no known studies have addressed it specifically. This conventional wisdom of avoidance is most likely driven by the security risk of losing keys (as mentioned above) and the challenge encryption poses to access, especially for those who may have a legitimate reason (or authorization) to access the data.
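The "Pros" point about encryption doubling as an authenticity check can be made concrete. In practice it is the signature or message authentication code attached to the encrypted content that does the checking, not the scrambling itself. The following is a minimal sketch using only Python's standard library, not a description of any respondent's system; the function names, the sample object, and the key are hypothetical.

```python
import hashlib
import hmac
import secrets


def fixity_digest(data: bytes) -> str:
    """Plain SHA-256 digest: detects accidental change such as bit rot."""
    return hashlib.sha256(data).hexdigest()


def authenticity_tag(key: bytes, data: bytes) -> str:
    """HMAC-SHA256 tag: detects change and can only be produced with the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it with the stored one in constant time."""
    return hmac.compare_digest(authenticity_tag(key, data), expected_tag)


# Hypothetical preservation object and key, for illustration only.
original = b"preservation master, object 0001"
key = secrets.token_bytes(32)
stored_tag = authenticity_tag(key, original)

# Simulate bit rot by flipping one bit in the first byte.
damaged = bytes([original[0] ^ 0x01]) + original[1:]

intact_ok = verify(key, original, stored_tag)
damaged_ok = verify(key, damaged, stored_tag)
```

Note that the keyed tag inherits the key-loss concern raised in this discussion: without the key the tag cannot be re-verified, whereas the plain digest can always be recomputed.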
What kinds of problems do you think it might create in the future?
- See above. We're most concerned that staffing issues combined with object-based vault management infrastructures in place could lead to problems. Certainly not saying that is the best rationale, but it is based on current reality.
- As mentioned, preservation of the encryption keys is typically raised as a long-term concern (see below)
- Format obsolescence and the need for migration is of equal concern with encryption formats as it is with data storage formats themselves.
- Similar to the Tivoli Storage Manager example, many cloud storage consumers want to separate the responsibilities of data security from that of storage by uploading already encrypted content. However, the burden of client-side encryption poses a barrier to some.
- There is the potential for encryption requirements to force a revision of the architectural designs of preservation repositories. If ingesting and preserving content with potential HRCI (high risk confidential information), PII (personally identifiable information), or other sensitive/private information, institutional policies or legal requirements may dictate security policies. This can determine encryption requirements which, in turn, can necessitate the use of specific storage media and architecture (see note below on encrypting disk vs. encrypting tape).
- The aforementioned wariness is also likely caused by the uncertainty regarding its complication of format migrations, data mining, and other automated preservation functions. That added layer of complexity is itself an additional preservation risk.
- A key concern is that encryption will overcomplicate legitimate access to content.
Do you have any current requirements to do this (laws, policies)? What are the conditions under which you need to encrypt? Do you know of any upcoming requirements for you to do this?
- What started us having to consider encryption in our preservation repository was our email archiving project. We'll be preserving email with permanent scholarly value. Email is the first type of content we are preserving that could potentially have HRCI (high risk confidential information) or even just sensitive/private information. Because of its potential privacy issues and sheer quantity, we are treating all email that comes into the repository as potentially sensitive. Its range of potential sensitive/private information means that we are subject to the university security policy regarding HRCI/personal/private information as well as all relevant state and federal laws (HIPAA, FERPA, MA state encryption law, etc.) for this content.
- While the above requirements caused us to revise our policies & architectural design, it means that we will be able to accept sensitive content of any type (beyond email) when we are done.
- We received mixed advice regarding software vs. hardware encryption. We were told software encryption solutions were immature (performance problems and worse) and that hardware encryption was the way to go. Some of our system administrators looked at the encryption offerings and found some big drawbacks not even considering effect on preservation (expense mainly but also having to manage a couple of encryption key management devices).
- We have since come to the conclusion that we are not required to encrypt this content on storage disks, because we are taking other measures (private network address space, local firewalls, periodic penetration tests, encryption on transport, etc.). But if we use tape as part of the storage solution, we will have to encrypt the tapes. We are replacing the DRS storage system this year so, in part because of this encryption requirement, we are considering an all-disk solution (up to now we have always included 2 tape copies along with disk storage).
- No (other respondents)
If you do it, what technique(s)/strategies do you use? Do you isolate encrypted content from non-encrypted content?
- Duracloud, as a provider, is developing tools to accommodate a number of scenarios:
- client does all encryption and key management themselves
- client manages keys, but provides them to upload tooling to encrypt content prior to transit
- client wants persisted content to be encrypted, but would rather not deal with key management or the encryption process
- layered on top of these scenarios is the consideration of the contents' usability within the storage system; such as indexing or metadata extraction.
- We use Tivoli Storage Manager (TSM) client-side encryption for our tape backups. We do not operate the tape backup system we use, so we want to isolate our data security from the tape backup system environment.
- Encryption is done for security purposes - tracking is done w/ barcodes entered into the preservation metadata database, and vault system databases do not usually "refresh" with each other, so the 2 are not in sync. Again, not saying by any stretch this should/ could be considered "best practices" - it's just what typically happens. Refresh cycles (while another topic) are also a problem in this environment, as there is typically less interest in subsequent updates of the data tapes that have not "recouped" their initial cost of creation. I guess I'm trying to convey there is more interest in keeping the 10-20% of backups that have been profitable than a consistent policy that treats all digital preservation files equally.
On decryption keys
- Long-term secure preservation of the decryption keys themselves is typically raised as a concern, although personally I feel that solutions to this problem are straightforward, albeit complex. I view this as a compound problem that requires a combination of preservation storage principles and security principles to solve:
- Preservation storage - There have to be multiple copies of the keys (existing framework of geographic distribution should facilitate this)
- Security storage - The keys themselves obviously have to be secured in some way. This can be done with either additional encryption or physical security, i.e., a locking safe, or both. The key point is that this chain ultimately ends in human knowledge, i.e., people have to know secrets. The trick is ensuring that enough people know enough secrets to eventually lead to the encryption keys. One example of adding multiple layers to the scheme: office staff at multiple sites hold the combinations to safes containing the encrypted encryption keys, while a more privileged group of repository administrators knows the secret that decrypts them.
- Geographic redundancy of the key copies mitigates the risk of losing the decryption keys in a disaster.
- Security risk can never be zero, but the risk can be brought into an acceptable range with a scheme that is well thought out within existing digital preservation technology and policy frameworks.
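One way to make "enough people know enough secrets" concrete is threshold secret sharing, where a key is divided so that any k of n holders can rebuild it and fewer than k learn nothing. The discussion does not name this technique; what follows is a hedged sketch of Shamir's scheme as one possible instance, with hypothetical function names and an assumed field size.

```python
import secrets

# A Mersenne prime larger than any 256-bit key; the field size is an assumption of this sketch.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical 256-bit repository key, split 2-of-3 across three sites.
key = secrets.randbits(256)
site_shares = split_secret(key, threshold=2, shares=3)
recovered = recover_secret(site_shares[:2])
recovered_other = recover_secret([site_shares[0], site_shares[2]])
```

A 2-of-3 split like this tolerates the loss of one site's share, which lines up with the geographic distribution point above. It is a sketch only; a production scheme would need review by security staff.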
Do you know of any relevant studies/papers, etc. about this topic?
- No (all respondents)... I think we've identified a real gap in the literature.
Attempt at Decision Analysis
1) Theoretically, maintenance of keys is straightforward although complex: e.g.
- use a distributed PKI -- either (i) hierarchical or (ii) PGP-style
- maintain physically protected copies off-line
- escrow keys (possibly divided) with multiple (partially) trusted third parties
2) In practice, few organizations have good enterprise key management, and it's been unexpectedly difficult to maintain even over the course of normal business timescales. Some commonly encountered challenges include:
- managing key revocation and possible re-encryption of content using a revoked key
- enterprise scaling of key management (esp. the PGP-style PKI, off-line copy, and escrow options listed under 1)
- enterprise scaling of performance over encrypted content (e.g. barriers to deduplication of virtualized storage; overhead for computing on encrypted content; performance issues with encrypted filesystems; barriers to standard storage maintenance/recovery/integrity auditing)
- enabling collaboration with encrypted content (managing appropriate group access to content, while maintaining desired security properties)
- potential catastrophic single point security failures for PKI (esp. certificate issuing, checking, revocation architecture)
- proprietary encryption algorithms (esp. hardware embedded)
3) A key issue for preservation is managing risks to long-term access; use of encryption creates additional risks for catastrophic/correlated/single-point long-term access failure, such as:
- undetected corruption of content due to defects in encryption hardware/software
- undetected corruption of content due to increased barriers to auditing
- increased risk of content corruption due to barriers to filesystem maintenance/recovery of encrypted content
- loss of access to content because of loss of proprietary encryption technology/knowledge (try reading a hardware encrypted tape from 10 years ago :-( )
- loss of access to content because of unintentional loss of key
- loss of access to content because of unintentional/incorrect revocation/key destruction (e.g. "self destruct" mechanisms on encrypted hardware, such as IronKeys)
- financial risks to access because of increased costs of maintaining encryption
None of these are necessarily show stoppers -- in a particular environment one could think of possible ways to mitigate these risks, project costs, compare them to the benefits of encryption, and make a decision either way... Unless security experts are involved, risks from misimplementation or defects in the security software/hardware/protocols (etc.) are usually not on the radar; and additional risks for long-term access are generally not on the radar even where a trained security expert is engaged. So it's important to make sure that we, as preservation experts, communicate these additional long-term access risks and costs ...
--Micah altman 19:35, 28 February 2012 (UTC)
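The step of projecting costs, comparing them to the benefits of encryption, and deciding either way can be sketched as a simple expected-loss comparison. This is only an illustration of the arithmetic: the `Risk` class is hypothetical, and every probability and cost figure below is an invented placeholder, not an estimate from this discussion.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float  # chance the event happens in a given year
    loss: float                # cost if it happens, in any consistent unit


def expected_annual_loss(risks: list[Risk]) -> float:
    """Sum of probability times loss over all listed risks."""
    return sum(r.annual_probability * r.loss for r in risks)


# All figures below are invented placeholders for illustration.
without_encryption = [
    Risk("unauthorized access to sensitive content", 0.02, 500_000),
]
with_encryption = [
    Risk("unauthorized access to sensitive content", 0.002, 500_000),
    Risk("unintentional loss of key", 0.01, 300_000),
    Risk("undetected corruption behind encryption", 0.005, 300_000),
]
annual_encryption_cost = 4_000  # key management, hardware, staff time (placeholder)

cost_without = expected_annual_loss(without_encryption)
cost_with = expected_annual_loss(with_encryption) + annual_encryption_cost
```

With these placeholder numbers the two options come out fairly close, so the decision turns on the local estimates an institution plugs in, including the long-term access risks listed above.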
Addendum: In addition to loss of access to content because of loss of proprietary encryption technology/knowledge, would it also not be a possible threat that the entire PKI infrastructure could change over a long enough period of time? That, for example, current mechanisms for obtaining keys and revocation lists etc. could become unused and unavailable?
A selected bibliography
(From Micah Altman)
While PKI isn't my area of research, here are some relevant articles from a shallow dive:
Strategies for Ensuring Data Accessibility When Cryptographic Keys Are Lost http://www.giac.org/paper/gsec/783/strategies-ensuring-data-accessibility-cryptographic-keys-lost/101698
The Risks of Key Recovery, Key Escrow, and Trusted Third-Party Encryption http://www.schneier.com/paper-key-escrow.html
Crypto Backup and Key Escrow http://dl.acm.org/citation.cfm?id=227241
A Taxonomy for Key Escrow Encryption Systems http://faculty.nps.edu/dedennin/publications/Taxonomy-CACM.pdf
(And 150+ citing articles: http://scholar.google.com/scholar?hl=en&lr=&cites=49628270739110236&um=1&ie=UTF-8&ei=zjRNT_K8BYuD0QGlzon3Ag&sa=X&oi=science_links&ct=sl-citedby&resnum=3&ved=0CD4QzgIwAg)
Although weighted towards "key escrow" solutions, they seem to have substantial relevance to long-term access.
Potential Future Discussion Topics
Preservation Policies
- number of copies
- bit integrity check frequency
- storage hierarchy
Emerging Storage Technology
- data reduction/de-duplication
- device encryption
- cloud providers
- WORM devices
- federated clusters
Decision Factors
- collection size
- budget
- development resources