DLF Wiki - User contributions [en]

NDSA:Levels of preservation

2012-08-28T18:27:50Z

Awoods: /* Steps and Targets: Defining Tiered Levels of Digital Preservation */

==Steps and Targets: Defining Tiered Levels of Digital Preservation==

'''One Sentence Description:''' Infrastructure, Innovation, Content and Standards Working Group members will define a brief set of guidelines on tiered levels of digital preservation.

Working Draft Document:
Current Working Draft:
[http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/6/68/Levels_of_pres_revised_aug_28th.doc 3.0],
[http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/7/7d/Levels_of_Digital_Preservation_draft_handout_v2_2.pdf 2.2]

[http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/b/b4/Levels_of_Digital_Preservation_-_Slide_Deck_v2_3.pdf Slides Presented at Digital Preservation 2012 Meeting]

Previous Draft: [http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/0/0c/Digital_preservation_levels_four_levels_six_factors_v2.doc Levels and Factors in Technical Functionality for Digital Preservation]

'''Statement of the Problem and Goal to Address Problem:''' Guidance exists at two extremes: very basic digital preservation information, like NDIIPP’s personal archiving materials, and extensive, substantial requirements for being recognized as a trusted digital repository. However, there is little solid guidance on how an organization should prioritize its resource allocation between these two ends of the spectrum. The goal of this project is to develop a tiered set of recommendations for prioritizing enhancements to digital preservation systems (defined broadly to include organizational and technical infrastructure). This group will define targets for at least three distinct levels of criteria for digital preservation systems. The bottom level provides guidance to “get the boxes off the floor”, and each subsequent level offers prioritized suggestions for how organizations can get the most additional preservation assurance out of their resources.

'''Strategic Value:'''
* Focused on a clear gap identified by the working group chairs and coordinating committee.
* Focused on pragmatic best usage of resources as opposed to ideal situations.
* Resulting resource is of value to members at each end of the spectrum.

'''Required Resources:''' Time of a small number of internal members. Potentially involves external or specific targeted internal review.

'''Roadmap:'''
#'''Completed''' Hold conference call to discuss levels documents created by content team on an infrastructure call. See [http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/b/b8/Outline_for_levels-draft_oct5.doc Digital Preservation Levels] and [http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/1/17/Digitization_chart.xls Digital Preservation Levels Chart]; see also [http://www.mnhs.org/preserve/records/legislativerecords/carol/docs_pdfs/NDIIPPpreservation_grid-Feburary2012Final_000.pdf MNHS Digital File Preservation Options Good, Better, Best doc]
#'''Completed''' Iteratively revise draft document and invite member feedback
#'''Completed''' Hold workshop at NDSA conference to present and critique the document
#Identify key reference documents to link to in a brief annotated bibliography
#Consider including a short glossary
#Invite particular targeted individuals to review it
#Decide on dissemination plan and disseminate it
#Identify a future date at which an NDSA action team should revisit this project

'''Dissemination of Knowledge:''' Once finished we can publish this as a short report on digitalpreservation.gov, put up a blog post announcing it on the Library of Congress digital preservation blog, and group members can send out an announcement about it to various listservs.

'''Signifiers of Success:''' Completed document. Ideally, an indication of broader success would be seeing this document referred to in a range of plans and guidance.

NDSA:Levels of preservation

2012-08-28T18:24:57Z

Awoods: /* Steps and Targets: Defining Tiered Levels of Digital Preservation */

==Steps and Targets: Defining Tiered Levels of Digital Preservation==

'''One Sentence Description:''' Infrastructure, Innovation, Content and Standards Working Group members will define a brief set of guidelines on tiered levels of digital preservation.

Working Draft Document:
Current Working Draft:
[http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/6/68/Levels_of_pres_revised_aug_28th.doc Levels of Preservation Draft 3.0]
[http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/7/7d/Levels_of_Digital_Preservation_draft_handout_v2_2.pdf Levels of Preservation Draft 2.2]

[http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/b/b4/Levels_of_Digital_Preservation_-_Slide_Deck_v2_3.pdf Slides Presented at Digital Preservation 2012 Meeting]

Previous Draft: [http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/0/0c/Digital_preservation_levels_four_levels_six_factors_v2.doc Levels and Factors in Technical Functionality for Digital Preservation]

'''Statement of the Problem and Goal to Address Problem:''' Guidance exists at two extremes: very basic digital preservation information, like NDIIPP’s personal archiving materials, and extensive, substantial requirements for being recognized as a trusted digital repository. However, there is little solid guidance on how an organization should prioritize its resource allocation between these two ends of the spectrum. The goal of this project is to develop a tiered set of recommendations for prioritizing enhancements to digital preservation systems (defined broadly to include organizational and technical infrastructure). This group will define targets for at least three distinct levels of criteria for digital preservation systems. The bottom level provides guidance to “get the boxes off the floor”, and each subsequent level offers prioritized suggestions for how organizations can get the most additional preservation assurance out of their resources.

'''Strategic Value:'''
* Focused on a clear gap identified by the working group chairs and coordinating committee.
* Focused on pragmatic best usage of resources as opposed to ideal situations.
* Resulting resource is of value to members at each end of the spectrum.

'''Required Resources:''' Time of a small number of internal members. Potentially involves external or specific targeted internal review.

'''Roadmap:'''
#'''Completed''' Hold conference call to discuss levels documents created by content team on an infrastructure call. See [http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/b/b8/Outline_for_levels-draft_oct5.doc Digital Preservation Levels] and [http://www.loc.gov/extranet/wiki/osi/ndiip/ndsa/images/1/17/Digitization_chart.xls Digital Preservation Levels Chart]; see also [http://www.mnhs.org/preserve/records/legislativerecords/carol/docs_pdfs/NDIIPPpreservation_grid-Feburary2012Final_000.pdf MNHS Digital File Preservation Options Good, Better, Best doc]
#'''Completed''' Iteratively revise draft document and invite member feedback
#'''Completed''' Hold workshop at NDSA conference to present and critique the document
#Identify key reference documents to link to in a brief annotated bibliography
#Consider including a short glossary
#Invite particular targeted individuals to review it
#Decide on dissemination plan and disseminate it
#Identify a future date at which an NDSA action team should revisit this project

'''Dissemination of Knowledge:''' Once finished we can publish this as a short report on digitalpreservation.gov, put up a blog post announcing it on the Library of Congress digital preservation blog, and group members can send out an announcement about it to various listservs.

'''Signifiers of Success:''' Completed document. Ideally, an indication of broader success would be seeing this document referred to in a range of plans and guidance.

NDSA:Discussions on Preservation Storage Topics

2012-02-28T16:12:47Z

Awoods: /* Topic 1: Encryption */

==Statement of Purpose==
The Infrastructure Working Group, in February 2012, initiated a series of open conversations on detailed aspects of preservation storage. These conversations are conducted over the listserv, and each topic is discussed over the course of 2-3 weeks. A list of potential future discussion topics is maintained at the bottom of this page and can be augmented by group members. This page serves to capture the content of those conversations for further elaboration by group members.

==Topic 1: Encryption==
'''Do you have any opinions on it? What are your reasons for your opinions (gut feelings are OK)?'''

*The majority of the preservation data we deliver to our clients is stored on LTO data tapes - without encryption. We do use WORM capability if the client is OK with it. Our reasons are mainly based on the assumption that we do not have any control over who can access the tape, now or in the future, and staffing changes might stifle the client's ability to recover the preservation files ("now where did the last person put the list of encryption keys?")
*Pros:
**Strong encryption eliminates worries associated with unauthorized access to preservation copies of materials (such as copyrighted data).
**Encryption doubles as an authenticity check, and in fact, some encryption methods involve the creation of a digital signature that can be used for provenance or bit rot detection.
*Cons:
**Encryption causes file size bloat to the tune of 20-30%.
**For light archives, encryption imparts a performance penalty for systems that need to extract the content from the preservation archive for access purposes.
*Duracloud's approach to encryption is in response to what consumers of cloud storage are requesting. The number one concern is over unauthorized access.
*There is a surfeit of advice promoting a natural wariness towards encryption, though no known studies have addressed it specifically. This conventional wisdom of avoidance is most likely driven by the security risk of losing keys (as mentioned above) and the challenge it poses to access, especially to those that may have a legitimate reason (or authorization) to access the data.
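
The point above about encryption doubling as an authenticity or bit rot check can be illustrated with a keyed hash. The following is a minimal sketch, assuming Python's standard <code>hmac</code> and <code>hashlib</code> modules; it uses an HMAC tag as a stand-in for the digital signature mentioned above and is not the method of any respondent. The function names, the key, and the sample bytes are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, data: bytes) -> str:
    """Return a keyed SHA-256 tag (HMAC) for the given bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, data: bytes, expected_tag: str) -> bool:
    """Check data against a stored tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(key, data), expected_tag)


if __name__ == "__main__":
    key = secrets.token_bytes(32)            # hypothetical archive key
    original = b"preservation master file contents"
    tag = make_tag(key, original)            # would be stored with the object

    print(verify_tag(key, original, tag))    # intact copy
    damaged = b"preservation master file c0ntents"
    print(verify_tag(key, damaged, tag))     # copy with one changed byte
```

Because the tag depends on the key, it inherits the key-loss concern raised above: without the key, the tag can no longer be verified.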

'''What kinds of problems do you think it might create in the future?'''

*See above. We're most concerned that staffing issues combined with object-based vault management infrastructures in place could lead to problems. Certainly not saying that is the best rationale, but it is based on current reality.
*As mentioned, preservation of the encryption keys is typically raised as a long-term concern (see below)
*Format obsolescence and the need for migration is of equal concern with encryption formats as it is with data storage formats themselves.
*Similar to the Tivoli Storage Manager example, many cloud storage consumers want to separate the responsibilities of data security from that of storage by uploading already encrypted content. However, the burden of client-side encryption poses a barrier to some.
*There is the potential for encryption requirements to force a revision of the architectural designs of preservation repositories. If ingesting and preserving content with potential HRCI (high risk confidential information), PII (personally identifiable information), or other sensitive/private information, institutional policies or legal requirements may dictate security policies. This can determine encryption requirements which, in turn, can necessitate the use of specific storage media and architecture (see note below on encrypting disk vs. encrypting tape).
*The aforementioned wariness is also likely caused by the uncertainty regarding its complication of format migrations, data mining, and other automated preservation functions. That added layer of complexity is itself an additional preservation risk.
*A key concern is that encryption will overcomplicate legitimate access to content.

'''Do you have any current requirements to do this (laws, policies)? What are the conditions under which you need to encrypt? Do you know of any upcoming requirements for you to do this?'''

*What started us having to consider encryption in our preservation repository was our email archiving project. We'll be preserving email with permanent scholarly value. Email is the first type of content we are preserving that could potentially have HRCI (high risk confidential information) or even just sensitive/private information. Because of its potential privacy issues and sheer quantity, we are treating all email that comes into the repository as potentially sensitive. Its range of potential sensitive/private information means that we are subject to the university security policy regarding HRCI/personal/private information as well as all relevant state and federal laws (HIPAA, FERPA, MA state encryption law, etc.) for this content.
*While the above requirements caused us to revise our policies & architectural design, it means that we will be able to accept sensitive content of any type (beyond email) when we are done.
*We received mixed advice regarding software vs. hardware encryption. We were told software encryption solutions were immature (performance problems and worse) and that hardware encryption was the way to go. Some of our system administrators looked at the encryption offerings and found some big drawbacks not even considering effect on preservation (expense mainly but also having to manage a couple of encryption key management devices).
*We have since come to the conclusion that we are not required to encrypt this content on storage disks, because we are taking other measures (private network address space, local firewalls, periodic penetration tests, encryption on transport, etc.). But if we use tape as part of the storage solution, we will have to encrypt the tapes. We are replacing the DRS storage system this year so, in part because of this encryption requirement, we are considering an all-disk solution (up to now we have always included 2 tape copies along with disk storage).
*No (other respondents)

'''If you do it what technique(s)/strategies do you use? Do you isolate encrypted content from non-encrypted content?'''

*Duracloud, as a provider, is developing tools to accommodate a number of scenarios:
**client does all encryption and key management themselves
**client manages keys, but provides them to upload tooling to encrypt content prior to transit
**client wants persisted content to be encrypted, but would rather not deal with key management or the encryption process
**layered on top of these scenarios is the consideration of the contents' usability within the storage system; such as indexing or metadata extraction.
*We use Tivoli Storage Manager (TSM) client-side encryption for our tape backups. We do not operate the tape backup system we use, so we want to isolate our data security from the tape backup system environment.
*Encryption is done for security purposes - tracking is done w/ barcodes entered into the preservation metadata database, and vault system databases do not usually "refresh" with each other, so the 2 are not in sync. Again, not saying by any stretch this should/ could be considered "best practices" - it's just what typically happens. Refresh cycles (while another topic) are also a problem in this environment, as there is typically less interest in subsequent updates of the data tapes that have not "recouped" their initial cost of creation. I guess I'm trying to convey there is more interest in keeping the 10-20% of backups that have been profitable than a consistent policy that treats all digital preservation files equally.

'''On decryption keys'''

*Long-term secure preservation of the decryption keys themselves is typically raised as a concern, although personally I feel that solutions to this problem are straightforward, albeit complex. I view this as a compound problem that requires a combination of preservation storage principles and security principles to solve:
**Preservation storage - There have to be multiple copies of the keys (the existing framework of geographic distribution should facilitate this).
**Security storage - The keys themselves obviously have to be secured in some way. This can be done with additional encryption, physical security (i.e., a locking safe), or both. The key point is that this chain ultimately ends in human knowledge, i.e., people have to know secrets. The trick is ensuring that enough people know enough secrets to eventually lead to the encryption keys. One example of adding multiple layers to the scheme: office staff at multiple sites hold the combinations to safes containing the encrypted encryption keys, and a more privileged group of repository administrators knows the secret that decrypts them.
**Geographic redundancy also supports disaster planning for the decryption keys.
*Security risk can never be zero, but the risk can be brought into an acceptable range with a well-thought-out scheme built on existing digital preservation technology and policy frameworks.
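
The idea that recovery should depend on several people each knowing a secret can be illustrated with a simple XOR split of a key. This is a swapped-in technique for illustration only, not the safe-and-passphrase scheme described above: the sketch assumes an all-or-nothing split in which every share is needed to rebuild the key, and all names are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, holders: int) -> list[bytes]:
    """Split key into one share per holder; all shares are needed to recover it."""
    if holders < 2:
        raise ValueError("need at least two holders")
    # All but the last share are random; the last one makes the XOR come out to the key.
    shares = [secrets.token_bytes(len(key)) for _ in range(holders - 1)]
    last = reduce(xor_bytes, shares, key)
    return shares + [last]


def recover_key(shares: list[bytes]) -> bytes:
    """Recombine every share to rebuild the original key."""
    return reduce(xor_bytes, shares)


if __name__ == "__main__":
    key = secrets.token_bytes(32)           # hypothetical decryption key
    shares = split_key(key, holders=3)      # e.g. one share per site
    print(recover_key(shares) == key)
```

Since losing any single share loses the key, each share would itself need multiple geographically distributed copies, which is the preservation storage principle noted above.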

'''Do you know of any relevant studies/papers, etc. about this topic?'''
*No (all respondents)... I think we've identified a real gap in the literature.

==Potential Future Discussion Topics==
Preservation Policies
* number of copies
* bit integrity check frequency
* storage hierarchy

Emerging Storage Technology
* data reduction/de-duplication
* device encryption
* cloud providers
* WORM devices
* federated clusters

Decision Factors
* collection size
* budget
* development resources

==Other Ideas==

NDSA:IRODS

2011-04-05T20:05:09Z

Awoods:

# What sort of use cases is your system designed to support? What doesn't this support?
#* Share Data
#* Build Digital Libraries
#* Build a Preservation Environment
#* Any group that needs to manage distributed data or to migrate data should consider iRODS.
# Is this a system or a prototype?
#* It is definitely in production, although there is a separate prototype for NARA.
# Who is using it for a preservation use case?
#* CDR and the Taiwan National Archive
# What preservation strategies would your system support?
#* The principal strategy is the instantiation of a standard set of enforceable policies in a preservation archive.
#* 120 policies have been identified to date. In identifying and reviewing the policies at an SAA workshop, there was a subset of 20 that at least 50% wanted. There is a long tail of policies that at least 1 organization wanted.
# What issues are there of mismatched semantics across systems?
#* An example is NCAR, with mass storage from the 1960s that understood tape get/put. A disk cache had to be put in place on top of tape to interact with. It's the same with Cloud services, which also deal in get/put and need a cache on top.
# What is the base level of functionality to be a part of iRODS?
#* This varies. There are specific functions for each local environment. What data processing needs are there? Where must they be run? etc.
# When managing large data collections, is distributed data integrity checking built into the system?
#* Yes, at whichever locations the data is stored. You can create procedures for independent checking.
# What infrastructure do you rely on? AND What resources are required to support a solution implemented in your environment?
#* Any operating system
#* Up to 1 million files can run in a standalone instance
#* Over 1 million files, a distributed system is needed.
#* The number of files is the primary gating factor for the database. There is use with Postgres, MySQL, and Oracle, but most use Postgres.
# What is the largest current installation?
#* NASA, with 700 TB, 65-85 million files. A Particle data project in France has 2-3 PB.
# Why is the catalog in one central location?
#* Efficiency. There are three models in use in various installations. NIH required a central catalog, which can have slaves. NAO required multiple chained into a catalog. A project in the UK is using multiple Grids chained together.
# Are there any moving image projects?
#* Yes, Cinegrid. Also an ocean observation project with video observation files.
# How can the cloud environment impact digital preservation activities?
#* There are no assertions about integrity or any properties. The system has to independently record and assert.
#* Once a project reaches a certain amount of data, it should really work locally, not in the cloud. It can be more cost efficient/predictable if you expect to be running within local capacity.
# If we put data in your system today what systems and processes are in place so that we can get it back 50 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
#* In 50 years, NONE of our current infrastructure components will still be in place.
#* We need infrastructure independence - we have to be able to migrate based on policies, not a specific infrastructure, to a new infrastructure.
#* To do that, we must know all previous versions of policies, and which are applied to which objects. That is potentially easiest with a policy-based system like iRODS.
# What cloud services are supported so far?
#* S3 and EC2. It can also be run in a virtualized environment, such as the VCL project at NCSU.
# What about distributed checksums?
#* Can check in each and/or compare across multiple copies. In a local environment, it checks against the central catalog.
# Are there any privacy use cases to be aware of?
#* They have worked with a group on IRB issues, and can implement policies against a local IRB catalog. That data is NOT stored in the central catalog.
# Anything else?
#* The next release is February 2011.
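
The answers above on integrity checking describe two comparisons: each copy against a checksum held in a central catalog, and copies against one another. The following is a minimal sketch of both in plain Python, assuming SHA-256 and in-memory bytes; it does not use or represent the iRODS API, and the site names and function names are hypothetical.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(catalog_checksum: str, copies: dict[str, bytes]) -> dict[str, bool]:
    """Map each site name to True if its copy matches the catalog checksum."""
    return {site: sha256_hex(data) == catalog_checksum for site, data in copies.items()}


def majority_checksum(copies: dict[str, bytes]) -> str:
    """Return the checksum shared by the most copies (no catalog needed)."""
    counts = Counter(sha256_hex(data) for data in copies.values())
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    good = b"object bytes"
    copies = {"site_a": good, "site_b": good, "site_c": b"object bytez"}
    catalog = sha256_hex(good)                  # value recorded at ingest
    print(audit_copies(catalog, copies))        # per-site check against the catalog
    print(majority_checksum(copies) == catalog) # cross-copy comparison
```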

NDSA:IRODS

2011-04-05T20:04:25Z

Awoods:

# Question: What sort of use cases is your system designed to support? What doesn't this support?
#* Share Data
#* Build Digital Libraries
#* Build a Preservation Environment
#* Any group that needs to manage distributed data or to migrate data should consider iRODS.
# Question: Is this a system or a prototype?
#* It is definitely in production, although there is a separate prototype for NARA.
# Question: Who is using it for a preservation use case?
#* CDR and the Taiwan National Archive
# What preservation strategies would your system support?
#* The principal strategy is the instantiation of a standard set of enforceable policies in a preservation archive.
#* 120 policies have been identified to date. In identifying and reviewing the policies at an SAA workshop, there was a subset of 20 that at least 50% wanted. There is a long tail of policies that at least 1 organization wanted.
# What issues are there of mismatched semantics across systems?
#* An example is NCAR, with mass storage from the 1960s that understood tape get/put. A disk cache had to be put in place on top of tape to interact with. It's the same with Cloud services, which also deal in get/put and need a cache on top.
# Question: What is the base level of functionality to be a part of iRODS?
#* This varies. There are specific functions for each local environment. What data processing needs are there? Where must they be run? etc.
# Question: When managing large data collections, is distributed data integrity checking built into the system?
#* Yes, at whichever locations the data is stored. You can create procedures for independent checking.
# What infrastructure do you rely on? AND What resources are required to support a solution implemented in your environment?
#* Any operating system
#* Up to 1 million files can run in a standalone instance
#* Over 1 million files, a distributed system is needed.
#* The number of files is the primary gating factor for the database. There is use with Postgres, MySQL, and Oracle, but most use Postgres.
# What is the largest current installation?
#* NASA, with 700 TB, 65-85 million files. A Particle data project in France has 2-3 PB.
# Why is the catalog in one central location?
#* Efficiency. There are three models in use in various installations. NIH required a central catalog, which can have slaves. NAO required multiple chained into a catalog. A project in the UK is using multiple Grids chained together.
# Are there any moving image projects?
#* Yes, Cinegrid. Also an ocean observation project with video observation files.
# How can the cloud environment impact digital preservation activities?
#* There are no assertions about integrity or any properties. The system has to independently record and assert.
#* Once a project reaches a certain amount of data, it should really work locally, not in the cloud. It can be more cost efficient/predictable if you expect to be running within local capacity.
# If we put data in your system today what systems and processes are in place so that we can get it back 50 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
#* In 50 years, NONE of our current infrastructure components will still be in place.
#* We need infrastructure independence - we have to be able to migrate based on policies, not a specific infrastructure, to a new infrastructure.
#* To do that, we must know all previous versions of policies, and which are applied to which objects. That is potentially easiest with a policy-based system like iRODS.
# What cloud services are supported so far?
#* S3 and EC2. It can also be run in a virtualized environment, such as the VCL project at NCSU.
# What about distributed checksums?
#* Can check in each and/or compare across multiple copies. In a local environment, it checks against the central catalog.
# Are there any privacy use cases to be aware of?
#* They have worked with a group on IRB issues, and can implement policies against a local IRB catalog. That data is NOT stored in the central catalog.
# Anything else?
#* The next release is February 2011.

NDSA:Cloud Presentations

2011-04-04T16:43:03Z

Awoods: /* Solution Models and Environments */

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

==Presentation Schedule and Slides==
# Feb 1, Tues, 1:00 EST call with iRods Reagan Moore ([[NDSA:Media:NIAID.ppt|presentation]])
# Feb 14, Monday, 11:00 EST call with Duracloud ([[NDSA:Media:DuracloudNDSA.ppt|presentation]])
# Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP Katherine Skinner, Matt Schultz and Martin Halbert MetaArchive NDSA ([[NDSA:Media:MetaArchive NDSA Infrastructure.ppt|presentation]])

==People/Projects to Contact==
*Chronopolis (Mike Smorul will contact)
*Open questions from the Educopia Guide to Distributed Digital Preservation
*Commercial providers? (Who specifically would we want here? Please add them.)
**Azure (Leslie to contact)
**Amazon (Who will contact?)

==General Questions for Cloud Service Presenters==
Here we are working on a set of general questions for presenters to develop talks around.

# What sort of use cases is your system designed to support? What doesn't this support?
# What preservation standards would your system support?
# What resources are required to support a solution implemented in your environment?
# What infrastructure do you rely on?
# How can your system impact digital preservation activities?
# If we put data in your system today what systems and processes are in place so that we can get it back 10 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
# What types of materials does your system handle (documents, audio files, video files, stills, data sets, etc.)? Please give examples of those types in practice.

==Questions for Member Institution Implementations of Large Scale Storage Architectures==
#What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.)
#What large scale storage or cloud technologies are you using to meet that challenge? Further, which service providers or tools did you consider and how did you make your choice?
#Specifically, what kind of materials are you preserving (text, data sets, images, moving images, web pages, etc.)
#How big is your collection? (In terms of number of objects and storage space required)
#What are your performance requirements?
#What storage media have you elected to use? (Disk, Tape, etc)
#What do you think are the key advantages of the system you use?
#What do you think are the key problems or disadvantages your system presents?
#What important principles informed your decision about the particular tool or service you chose to use?
#How frequently do you migrate from one system to another?
# What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation? (High levels of redundancy/resiliency, internal checksumming capabilities, automated tape refresh, etc)
# What functionality or processes have you developed to augment your storage systems in order to meet preservation goals? (Periodic checksum validation, limited human access or novel use of permissions schemes)
# Are there tough requirements for digital preservation, e.g. TRAC certification, that you wish were more readily handled by your storage system?

===Responses to questions===
====iRODS====
# ...

Other general notes:

* [Snavely] The need for each storage target to support a specific set of operations, and consistently with other storage targets, seems like a risk that comes along with the elegant abstraction that iRODS provides. Clear specifications help mitigate this risk.

====[[NDSA:DuraCloud]] direct responses====
Other general notes:

* [Snavely] Treatment of cloud provider is generally as a black box, without a strong sense of actual reliability of underlying storage systems. Cloud providers tend to promise checksum validation of contents, but recourse if validation fails was unknown (right?). Additional checksum validation has been augmented on top of cloud storage service by Duracloud.

====[[NDSA:MetaArchive/GDDP]] direct responses====
Other general notes:

* [Snavely] Built on LOCKSS, so data integrity assurances are provided by robust networked software model augmented to commodity hardware and storage. Federated nature provides integrity assurance but also a lack of central control in that the accidental loss of multiple caches is unlikely but e.g. scheduled maintenance or upgrades could coincidentally collide.

====Chronopolis====
# ...
====Microsoft Azure====
# ...
====Amazon S3/EC2====
# ...

==General Concerns==
# confidential data
# encrypted data
# auditing
# preservation risks
# legal compliance
# ...

==Solution Models and Environments==
{| border="1"
!Name
!Offered as Service
!Deployed Locally
!Opensource
!Authentication Scheme
!Ingest Mechanism
!Export Mechanism
!Integrity/Validation Mechanism
!Replication Mechanism
!Administration Model (Federated, etc.)
!Tiering Support
|-
|iRODS
|
|
|
|
|
|
|
|
|
|
|-
|DuraCloud
|yes
|yes
|yes (Apache2)
|Basic Auth
|1:web-ui, 2:client-side utility, 3:REST-API
|1:web-ui, 2:client-side utility, 3:REST-API
|Checksum verified on ingest. On-demand checksum verification service.
|Built-in support for cross-cloud replication.
|
|
|-
|MetaArchive/GDDP
|
|
|
|
|
|
|
|
|
|
|-
|Chronopolis
|
|
|
|
|
|
|
|
|
|
|-
|Microsoft Azure
|
|
|
|
|
|
|
|
|
|
|-
|Amazon S3/EC2
|
|
|
|
|
|
|
|
|
|
|-
|}

NDSA:Cloud Presentations

2011-04-04T16:38:05Z

Awoods: /* MetaArchive/GDDP */

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

==Presentation Schedule and Slides==
# Feb 1, Tues, 1:00 EST call with iRods Reagan Moore ([[NDSA:Media:NIAID.ppt|presentation]])
# Feb 14, Monday, 11:00 EST call with Duracloud ([[NDSA:Media:DuracloudNDSA.ppt|presentation]])
# Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP Katherine Skinner, Matt Schultz and Martin Halbert MetaArchive NDSA ([[NDSA:Media:MetaArchive NDSA Infrastructure.ppt|presentation]])

==People/Projects to Contact==
*Chronopolis (Mike Smorul will contact)
*Open questions from the Educopia Guide to Distributed Digital Preservation
*Commercial providers? (Who specifically would we want here? Please add them.)
**Azure (Leslie to contact)
**Amazon (Who will contact?)

==General Questions for Cloud Service Presenters==
Here we are working on a set of general questions for presenters to develop talks around.

# What sort of use cases is your system designed to support? What doesn't this support?
# What preservation standards would your system support?
# What resources are required to support a solution implemented in your environment?
# What infrastructure do you rely on?
# How can your system impact digital preservation activities?
# If we put data in your system today what systems and processes are in place so that we can get it back 10 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
# What types of materials does your system handle (documents, audio files, video files, stills, data sets, etc.)? Please give examples of those types in practice.

==Questions for Member Institution Implementations of Large Scale Storage Architectures==
#What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.)
#What large scale storage or cloud technologies are you using to meet that challenge? Further, which service providers or tools did you consider and how did you make your choice?
#Specifically, what kind of materials are you preserving? (text, data sets, images, moving images, web pages, etc.)
#How big is your collection? (In terms of number of objects and storage space required)
#What are your performance requirements?
#What storage media have you elected to use? (Disk, tape, etc.)
#What do you think are the key advantages of the system you use?
#What do you think are the key problems or disadvantages your system presents?
#What important principles informed your decision about the particular tool or service you chose to use?
#How frequently do you migrate from one system to another?
# What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation? (High levels of redundancy/resiliency, internal checksumming capabilities, automated tape refresh, etc.)
# What functionality or processes have you developed to augment your storage systems in order to meet preservation goals? (Periodic checksum validation, limited human access or novel use of permissions schemes)
# Are there tough requirements for digital preservation, e.g. TRAC certification, that you wish were more readily handled by your storage system?

===Responses to questions===
====iRODS====
# ...

Other general notes:

* [Snavely] The need for each storage target to support a specific set of operations, and to do so consistently with other storage targets, seems like a risk that comes along with the elegant abstraction that iRODS provides. Clear specifications help mitigate this risk.

====[[NDSA:DuraCloud]] direct responses====
Other general notes:

* [Snavely] The cloud provider is generally treated as a black box, without a strong sense of the actual reliability of the underlying storage systems. Cloud providers tend to promise checksum validation of contents, but the recourse if validation fails was unknown (right?). DuraCloud adds its own checksum validation on top of the cloud storage service.

====[[NDSA:MetaArchive/GDDP]] direct responses====
Other general notes:

* [Snavely] Built on LOCKSS, so data integrity assurances come from a robust networked software model layered on commodity hardware and storage. The federated nature provides integrity assurance but also means a lack of central control: the accidental loss of multiple caches is unlikely, but scheduled maintenance or upgrades, for example, could coincidentally collide.

====Chronopolis====
# ...
====Microsoft Azure====
# ...
====Amazon S3/EC2====
# ...

==General Concerns==
# confidential data
# encrypted data
# auditing
# preservation risks
# legal compliance
# ...

==Solution Models and Environments==
{| border="1"
!Name
!Offered as Service
!Deployed Locally
!Open Source
!Authentication Scheme
!Ingest Mechanism
!Export Mechanism
!Integrity/Validation Mechanism
!Replication Mechanism
!Administration Model (Federated, etc.)
!Tiering Support
|-
|iRODS
|
|
|
|
|
|
|
|
|
|
|-
|DuraCloud
|
|
|
|
|
|
|
|
|
|
|-
|MetaArchive/GDDP
|
|
|
|
|
|
|
|
|
|
|-
|Chronopolis
|
|
|
|
|
|
|
|
|
|
|-
|Microsoft Azure
|
|
|
|
|
|
|
|
|
|
|-
|Amazon S3/EC2
|
|
|
|
|
|
|
|
|
|
|-
|}

NDSA:Cloud Presentations

2011-04-04T16:37:39Z

Awoods: /* DuraCloud */

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

==Presentation Schedule and Slides==
# Feb 1, Tues, 1:00 EST call with iRODS, Reagan Moore ([[NDSA:Media:NIAID.ppt|presentation]])
# Feb 14, Monday, 11:00 EST call with DuraCloud ([[NDSA:Media:DuracloudNDSA.ppt|presentation]])
# Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP: Katherine Skinner, Matt Schultz and Martin Halbert ([[NDSA:Media:MetaArchive NDSA Infrastructure.ppt|presentation]])

==People/Projects to Contact==
*Chronopolis (Mike Smorul will contact)
*Open questions from the Educopia Guide to Distributed Digital Preservation
*Commercial providers? (Who specifically would we want here? Please add them.)
**Azure (Leslie to contact)
**Amazon (Who will contact?)

==General Questions for Cloud Service Presenters==
Here we are working on a set of general questions for presenters to develop talks around.

# What sort of use cases is your system designed to support? What doesn't this support?
# What preservation standards would your system support?
# What resources are required to support a solution implemented in your environment?
# What infrastructure do you rely on?
# How can your system impact digital preservation activities?
# If we put data in your system today what systems and processes are in place so that we can get it back 10 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
# What types of materials does your system handle (documents, audio files, video files, stills, data sets, etc.)? Please give examples of those types in practice.

==Questions for Member Institution Implementations of Large Scale Storage Architectures==
#What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.)
#What large scale storage or cloud technologies are you using to meet that challenge? Further, which service providers or tools did you consider and how did you make your choice?
#Specifically, what kind of materials are you preserving? (text, data sets, images, moving images, web pages, etc.)
#How big is your collection? (In terms of number of objects and storage space required)
#What are your performance requirements?
#What storage media have you elected to use? (Disk, tape, etc.)
#What do you think are the key advantages of the system you use?
#What do you think are the key problems or disadvantages your system presents?
#What important principles informed your decision about the particular tool or service you chose to use?
#How frequently do you migrate from one system to another?
# What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation? (High levels of redundancy/resiliency, internal checksumming capabilities, automated tape refresh, etc.)
# What functionality or processes have you developed to augment your storage systems in order to meet preservation goals? (Periodic checksum validation, limited human access or novel use of permissions schemes)
# Are there tough requirements for digital preservation, e.g. TRAC certification, that you wish were more readily handled by your storage system?

===Responses to questions===
====iRODS====
# ...

Other general notes:

* [Snavely] The need for each storage target to support a specific set of operations, and to do so consistently with other storage targets, seems like a risk that comes along with the elegant abstraction that iRODS provides. Clear specifications help mitigate this risk.

====[[NDSA:DuraCloud]] direct responses====
Other general notes:

* [Snavely] The cloud provider is generally treated as a black box, without a strong sense of the actual reliability of the underlying storage systems. Cloud providers tend to promise checksum validation of contents, but the recourse if validation fails was unknown (right?). DuraCloud adds its own checksum validation on top of the cloud storage service.

====[[NDSA:MetaArchive/GDDP]]====
# ...

Other general notes:

* [Snavely] Built on LOCKSS, so data integrity assurances come from a robust networked software model layered on commodity hardware and storage. The federated nature provides integrity assurance but also means a lack of central control: the accidental loss of multiple caches is unlikely, but scheduled maintenance or upgrades, for example, could coincidentally collide.

====Chronopolis====
# ...
====Microsoft Azure====
# ...
====Amazon S3/EC2====
# ...

==General Concerns==
# confidential data
# encrypted data
# auditing
# preservation risks
# legal compliance
# ...

==Solution Models and Environments==
{| border="1"
!Name
!Offered as Service
!Deployed Locally
!Open Source
!Authentication Scheme
!Ingest Mechanism
!Export Mechanism
!Integrity/Validation Mechanism
!Replication Mechanism
!Administration Model (Federated, etc.)
!Tiering Support
|-
|iRODS
|
|
|
|
|
|
|
|
|
|
|-
|DuraCloud
|
|
|
|
|
|
|
|
|
|
|-
|MetaArchive/GDDP
|
|
|
|
|
|
|
|
|
|
|-
|Chronopolis
|
|
|
|
|
|
|
|
|
|
|-
|Microsoft Azure
|
|
|
|
|
|
|
|
|
|
|-
|Amazon S3/EC2
|
|
|
|
|
|
|
|
|
|
|-
|}

NDSA:DuraCloud

2011-03-25T20:54:21Z

Awoods:

== Questions to address ==
# What sort of use cases is your system designed to support? What doesn't this support?
## repository back up
## back up from file directory
## disaster recovery backup of content
## single file recovery
## Preservation activities that require additional compute resources, and space
## Staging area for content that is not yet ready for preservation
## Provide predictable URLs, and we maintain the content ID provided
## Activities not currently supported: file format migration, explicit versioning, repository functions (it is not a repository system, so there are no collection or hierarchy mechanisms), policy management implementations, automatic repair of the local file copy
# What preservation strategies would your system support?
## multiple copies in multiple locations under multiple administrations
## auto synchronization with primary copy
## all copies web accessible and can view/download
## can run bit integrity checking to compare primary and secondary copies with manifest
## format identification (in-progress)
## provenance auditing (on roadmap)
## repair of secondary copies (roadmap)
# What preservation standards would your system support?
## Any that involve specifications for a "bundle" of bits, such as BagIt
## Compatible with storing any type of package (e.g., an AIP)
# What resources are required to support a solution implemented in your environment?
## almost none
##* you need one administrator to manage the DuraCloud account
##* you might require some technical help to get your content out of your local system and push a copy to DuraCloud
# What infrastructure do you rely on?
## public cloud storage
## public cloud compute
## private cloud storage
# How can the cloud environment impact digital preservation activities?
## hopefully make it easier to do support activities which are difficult to provision and manage internally
## relieves pressure of managing/upgrading internal hardware, and forecasting server & storage requirements
# If we put data in your system today what systems and processes are in place so that we can get it back 50 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
## You own and manage your own account and data; you are not handing it over to us, so you can do what you want with it at any time
## The software is all open source, so if you ever decide to run the whole stack/application on your own, you can
## The system is tied to multiple cloud providers, lowering the risk if one goes out of business.
## Your original copy is your local copy, and most likely the copy of record.  DuraCloud is just a backup.
## If one provider goes out of business we will assist you to move your content out and to another provider.
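
The bit integrity checking mentioned above (comparing primary and secondary copies with a manifest) can be illustrated with a minimal sketch. This is not DuraCloud's implementation or API. It assumes a hypothetical manifest that maps each content ID to an MD5 hex digest, and two local directories standing in for the primary and secondary copies. The names <code>md5_of</code> and <code>check_copies</code> are illustrative only.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(manifest: dict[str, str], primary: Path, secondary: Path) -> dict[str, str]:
    """Check both copies of every manifest entry against the recorded checksum.

    The manifest layout (content ID -> MD5 hex digest) is an assumption made
    for this sketch. Returns a mapping of content ID to a status string.
    """
    report = {}
    for content_id, expected in manifest.items():
        problems = []
        for label, root in (("primary", primary), ("secondary", secondary)):
            candidate = root / content_id
            if not candidate.is_file():
                problems.append(f"{label}:missing")
            elif md5_of(candidate) != expected:
                problems.append(f"{label}:mismatch")
        report[content_id] = "ok" if not problems else ",".join(problems)
    return report
```

Any entry whose status is not <code>ok</code> identifies which copy is missing or damaged, which is the information a repair step would need.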

== Concerns to address ==
# confidential data
## DuraCloud is one low-level component of an overall preservation strategy.  It does not address fine-grained policy and access control considerations.  It can be used to house entire collections of confidential data, and/or support a system which provides granular controls, but it does not do so itself. It does support basic authentication, and you can make spaces within DuraCloud dark or light.
# encrypted data
## DuraCloud can store any "bundle of bits".  It does not provide its own primitives for encryption.  Due to the remote nature of many DuraCloud use cases, maintaining encryption on an end-to-end basis is out of scope.
# auditing
## auditing of content
## system audit potential
# preservation risks
## Cloud is an emerging market
## ability to fund preservation solutions, particularly when online
# legal compliance
## Content access and copyright is controlled and managed by the user/account holder

NDSA:Cloud Presentations

2011-03-24T13:34:55Z

Awoods: /* Questions for Implementers of Large Scale Storage and Cloud Services */

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

==Presentation Schedule and Slides==
# Feb 1, Tues, 1:00 EST call with iRODS, Reagan Moore ([[NDSA:Media:NIAID.ppt|presentation]])
# Feb 14, Monday, 11:00 EST call with DuraCloud ([[NDSA:Media:DuracloudNDSA.ppt|presentation]])
# Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP: Katherine Skinner, Matt Schultz and Martin Halbert ([[NDSA:Media:MetaArchive NDSA Infrastructure.ppt|presentation]])

==People/Projects to Contact==
*Chronopolis (Mike Smorul will contact)
*Open questions from the Educopia Guide to Distributed Digital Preservation
*Commercial providers? (Who specifically would we want here? Please add them.)
**Azure (Leslie to contact)
**Amazon (Who will contact?)

==General Questions for Cloud Service Presenters==
Here we are working on a set of general questions for presenters to develop talks around.

# What sort of use cases is your system designed to support? What doesn't this support?
# What preservation standards would your system support?
# What resources are required to support a solution implemented in your environment?
# What infrastructure do you rely on?
# How can your system impact digital preservation activities?
# If we put data in your system today what systems and processes are in place so that we can get it back 10 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
# What types of materials does your system handle (documents, audio files, video files, stills, data sets, etc.)? Please give examples of those types in practice.

==Questions for Member Institution Implementations of Large Scale Storage Architectures==
#What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.)
#What large scale storage or cloud technologies are you using to meet that challenge? Further, which service providers or tools did you consider and how did you make your choice?
#Specifically, what kind of materials are you preserving? (text, data sets, images, moving images, web pages, etc.)
#How big is your collection? (In terms of number of objects and storage space required)
#What are your performance requirements?
#What storage media have you elected to use? (Disk, tape, etc.)
#What do you think are the key advantages of the system you use?
#What do you think are the key problems or disadvantages your system presents?
#What important principles informed your decision about the particular tool or service you chose to use?
#How frequently do you migrate from one system to another?
# What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation? (High levels of redundancy/resiliency, internal checksumming capabilities, automated tape refresh, etc.)
# What functionality or processes have you developed to augment your storage systems in order to meet preservation goals? (Periodic checksum validation, limited human access or novel use of permissions schemes)
# Are there tough requirements for digital preservation, e.g. TRAC certification, that you wish were more readily handled by your storage system?

===Responses to questions===
====iRODS====
# ...

Other general notes:

* [Snavely] The need for each storage target to support a specific set of operations, and to do so consistently with other storage targets, seems like a risk that comes along with the elegant abstraction that iRODS provides. Clear specifications help mitigate this risk.

====DuraCloud====
# ...

Other general notes:

* [Snavely] The cloud provider is generally treated as a black box, without a strong sense of the actual reliability of the underlying storage systems. Cloud providers tend to promise checksum validation of contents, but the recourse if validation fails was unknown (right?). DuraCloud adds its own checksum validation on top of the cloud storage service.

====MetaArchive/GDDP====
# ...

Other general notes:

* [Snavely] Built on LOCKSS, so data integrity assurances come from a robust networked software model layered on commodity hardware and storage. The federated nature provides integrity assurance but also means a lack of central control: the accidental loss of multiple caches is unlikely, but scheduled maintenance or upgrades, for example, could coincidentally collide.

====Chronopolis====
# ...
====Microsoft Azure====
# ...
====Amazon S3/EC2====
# ...

==General Concerns==
# confidential data
# encrypted data
# auditing
# preservation risks
# legal compliance
# ...

==Solution Models and Environments==
{| border="1"
!Name
!Offered as Service
!Deployed Locally
!Open Source
!Authentication Scheme
!Ingest Mechanism
!Export Mechanism
!Integrity/Validation Mechanism
!Replication Mechanism
!Administration Model (Federated, etc.)
!Tiering Support
|-
|iRODS
|
|
|
|
|
|
|
|
|
|
|-
|DuraCloud
|
|
|
|
|
|
|
|
|
|
|-
|MetaArchive/GDDP
|
|
|
|
|
|
|
|
|
|
|-
|Chronopolis
|
|
|
|
|
|
|
|
|
|
|-
|Microsoft Azure
|
|
|
|
|
|
|
|
|
|
|-
|Amazon S3/EC2
|
|
|
|
|
|
|
|
|
|
|-
|}

NDSA:Cloud Presentations

2011-02-04T23:17:43Z

Awoods: /* General Guiding Questions for Presenters */

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

==Presentation Schedule==
Once we start scheduling presenters we will keep a list of the talks here.
# Feb 1, Tues, 1:00 EST call with iRODS, Reagan Moore ([[NDSA:Media:NIAID.ppt|presentation]])
# Feb 14, Monday, 11:00 EST call with DuraCloud
# Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP: Katherine Skinner, Matt Schultz and Martin Halbert

==People/Projects to Contact==
*DuraCloud/Duraspace (Leslie to contact)
*Chronopolis (Mike Smorul will contact)
*Open questions from the Educopia Guide to Distributed Digital Preservation http://www.metaarchive.org/GDDP (Martin will contact)
*iRODS: Reagan Moore, 2/1/2011; see slides: NIAID.ppt
*Commercial providers? (Who specifically would we want here? Please add them.)
**Azure (Leslie to contact)
**Amazon (Who will contact?)

==General Guiding Questions for Presenters==
Here we are working on a set of general questions for presenters to develop talks around.

# What sort of use cases is your system designed to support? What doesn't this support?
# What preservation strategies would your system support?
# What preservation standards would your system support?
# What resources are required to support a solution implemented in your environment?
# What infrastructure do you rely on?
# How can the cloud environment impact digital preservation activities?
# If we put data in your system today what systems and processes are in place so that we can get it back 50 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)

===Responses to questions===
====iRODS====
# ...
====DuraCloud====
# ...
====MetaArchive/GDDP====
# ...
====Chronopolis====
# ...
====Microsoft Azure====
# ...
====Amazon S3/EC2====
# ...

==General Concerns==
# confidential data
# encrypted data
# auditing
# preservation risks
# legal compliance
# ...

==Solution Models and Environments==
{| border="1"
!Name
!Offered as Service
!Deployed Locally
!Open Source
!Authentication Scheme
!Ingest Mechanism
!Export Mechanism
|-
|iRODS
|
|
|
|
|
|
|-
|DuraCloud
|
|
|
|
|
|
|-
|MetaArchive/GDDP
|
|
|
|
|
|
|-
|Chronopolis
|
|
|
|
|
|
|-
|Microsoft Azure
|
|
|
|
|
|
|-
|Amazon S3/EC2
|
|
|
|
|
|
|-
|}

NDSA:Cloud Presentations

2011-02-04T23:04:39Z

Awoods: /* General Guiding Questions for Presenters */

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

==Presentation Schedule==
Once we start scheduling presenters we will keep a list of the talks here.
# Feb 1, Tues, 1:00 EST call with iRODS, Reagan Moore ([[NDSA:Media:NIAID.ppt|presentation]])
# Feb 14, Monday, 11:00 EST call with DuraCloud
# Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP: Katherine Skinner, Matt Schultz and Martin Halbert

==People/Projects to Contact==
*DuraCloud/Duraspace (Leslie to contact)
*Chronopolis (Mike Smorul will contact)
*Open questions from the Educopia Guide to Distributed Digital Preservation http://www.metaarchive.org/GDDP (Martin will contact)
*iRODS: Reagan Moore, 2/1/2011; see slides: NIAID.ppt
*Commercial providers? (Who specifically would we want here? Please add them.)
**Azure (Leslie to contact)
**Amazon (Who will contact?)

==General Guiding Questions for Presenters==
Here we are working on a set of general questions for presenters to develop talks around.

# What sort of use cases is your system designed to support? What doesn't this support?
# What preservation strategies would your system support?
# What preservation standards would your system support?
# What resources are required to support a solution implemented in your environment?
# What infrastructure do you rely on?
# How can the cloud environment impact digital preservation activities?
# If we put data in your system today what systems and processes are in place so that we can get it back 50 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)

===Responses to questions===
====iRODS====
# ...
====DuraCloud====
# ...
====MetaArchive/GDDP====
# ...
====Chronopolis====
# ...
====Microsoft Azure====
# ...
====Amazon S3/EC2====
# ...

NDSA:Cloud Presentations

2011-02-04T22:32:59Z

Awoods: /* Presentation Schedule */

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

==Presentation Schedule==
Once we start scheduling presenters we will keep a list of the talks here.
# Feb 1, Tues, 1:00 EST call with iRODS, Reagan Moore ([[NDSA:Media:NIAID.ppt|presentation]])
# Feb 14, Monday, 11:00 EST call with DuraCloud
# Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP: Katherine Skinner, Matt Schultz and Martin Halbert

==People/Projects to Contact==
*DuraCloud/Duraspace (Leslie to contact)
*Chronopolis (Mike Smorul will contact)
*Open questions from the Educopia Guide to Distributed Digital Preservation http://www.metaarchive.org/GDDP (Martin will contact)
*iRODS: Reagan Moore, 2/1/2011; see slides: NIAID.ppt
*Commercial providers? (Who specifically would we want here? Please add them.)
**Azure (Leslie to contact)
**Amazon (Who will contact?)

==General Guiding Questions for Presenters==
Here we are working on a set of general questions for presenters to develop talks around.

*What sort of use cases is your system designed to support? What doesn't this support?
*What preservation strategies or standards would your system support?
*What resources are required to support a solution implemented in your environment?
*How can the cloud environment impact digital preservation activities?
*If we put data in your system today what systems and processes are in place so that we can get it back 50 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
*What infrastructure do you rely on?

NDSA:NIAID.ppt

2011-02-04T22:27:55Z

Awoods: uploaded a new version of "File:NIAID.ppt": This is the iRODS presentation given by Reagan Moore, Feb 1, 2011

Reagan Moore's iRODS presentation

NDSA:Cloud Presentations

2011-02-04T22:06:33Z

Awoods: /* Presentation Schedule */

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

==Presentation Schedule==
Once we start scheduling presenters we will keep a list of the talks here.
# Feb 1, Tues, 1:00 EST call with iRODS, Reagan Moore (presentation)
# Feb 14, Monday, 11:00 EST call with DuraCloud
# Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP: Katherine Skinner, Matt Schultz and Martin Halbert

==People/Projects to Contact==
*DuraCloud/Duraspace (Leslie to contact)
*Chronopolis (Mike Smorul will contact)
*Open questions from the Educopia Guide to Distributed Digital Preservation http://www.metaarchive.org/GDDP (Martin will contact)
*iRODS: Reagan Moore, 2/1/2011; see slides: NIAID.ppt
*Commercial providers? (Who specifically would we want here? Please add them.)
**Azure (Leslie to contact)
**Amazon (Who will contact?)

==General Guiding Questions for Presenters==
Here we are working on a set of general questions for presenters to develop talks around.

*What sort of use cases is your system designed to support? What doesn't this support?
*What preservation strategies or standards would your system support?
*What resources are required to support a solution implemented in your environment?
*How can the cloud environment impact digital preservation activities?
*If we put data in your system today what systems and processes are in place so that we can get it back 50 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
*What infrastructure do you rely on?