NDSA:DuraCloud

From DLF Wiki

Questions to address

  1. What sort of use cases is your system designed to support? What doesn't this support?
    1. repository backup
    2. backup from a file directory
    3. disaster recovery backup of content
    4. single file recovery
    5. Preservation activities that require additional compute resources, and space
    6. Staging area for content that is not yet preservation-ready
    7. Provides predictable URLs and maintains the content ID you supply
    8. Activities not currently supported: file format migration, explicit versioning, repository functions (no collection or hierarchy mechanisms), policy management implementations, automatic repair of the local file copy
  2. What preservation strategies would your system support?
    1. multiple copies in multiple locations under multiple administrations
    2. auto synchronization with primary copy
    3. all copies web accessible and can view/download
    4. bit integrity checking that compares primary and secondary copies against a manifest
    5. format identification (in-progress)
    6. provenance auditing (on roadmap)
    7. repair of secondary copies (roadmap)
  3. What preservation standards would your system support?
    1. Any that involve specifications for a "bundle" of bits, such as BagIt
    2. Compatible with storing any type of package (e.g., an AIP)
  4. What resources are required to support a solution implemented in your environment?
    1. almost none
      • you need one administrator to manage the DuraCloud account
      • you might require some technical help to get your content out of your local system and push a copy to DuraCloud
  5. What infrastructure do you rely on?
    1. public cloud storage
    2. public cloud compute
    3. private cloud storage
  6. How can the cloud environment impact digital preservation activities?
    1. ideally, makes it easier to carry out supporting activities that are difficult to provision and manage internally
    2. relieves pressure of managing/upgrading internal hardware, and forecasting server & storage requirements
  7. If we put data in your system today what systems and processes are in place so that we can get it back 50 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
    1. You own and manage your own account and data; you are not handing it over to us, so you can do what you want with it at any time
    2. The software is all open source, so if you ever decide to run the whole stack/application on your own, you can
    3. The system is tied to multiple cloud providers, which lowers the risk if one goes out of business.
    4. Your original copy is your local copy, and most likely the copy of record.  DuraCloud is just a backup.
    5. If one provider goes out of business, we will help you move your content to another provider.
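
The bit integrity check listed under preservation strategies (comparing copies against a manifest) can be illustrated with a minimal sketch. It assumes a hypothetical manifest with one tab-separated line per item (content ID, then MD5 checksum) and a local directory whose file paths match the content IDs. The manifest layout and the function names are assumptions for illustration only, not DuraCloud's actual manifest format or API.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse hypothetical manifest lines: content ID, a tab, then an MD5 checksum."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        content_id, checksum = line.split("\t")
        entries[content_id] = checksum.strip().lower()
    return entries


def compare_with_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Sort content IDs into 'match', 'mismatch', and 'missing' (no local file)."""
    report = {"match": [], "mismatch": [], "missing": []}
    for content_id, expected in sorted(manifest.items()):
        local = root / content_id
        if not local.is_file():
            report["missing"].append(content_id)
        elif file_md5(local) == expected:
            report["match"].append(content_id)
        else:
            report["mismatch"].append(content_id)
    return report
```

A mismatch only shows that the two copies differ. Deciding which copy is correct, and repairing it, remains a separate step.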

Concerns to address

  1. confidential data
    1. DuraCloud is one low-level component of an overall preservation strategy.  It does not address fine-grained policy and access control considerations.  It can be used to house entire collections of confidential data, and/or support a system which provides granular controls, but it does not do so itself.  It does support basic authentication, and you can make spaces within DuraCloud dark or light.
  2. encrypted data
    1. DuraCloud can store any "bundle of bits".  It does not provide its own primitives for encryption.  Due to the remote nature of many DuraCloud use cases, maintaining encryption on an end-to-end basis is out of scope.
  3. auditing
    1. auditing of content
    2. system audit potential
  4. preservation risks
    1. Cloud is an emerging market
    2. Ability to fund preservation solutions, particularly when online
  5. legal compliance
    1. Content access and copyright is controlled and managed by the user/account holder