NDSA:Cloud Presentations: Difference between revisions

Latest revision as of 17:59, 29 November 2016

In each case we want to identify who will present, who will contact them, and when they will present.

From there we can include specific questions we would like them to respond to.

Presentation Schedule and Slides

Feb 1, Tues, 1:00 EST call with iRODS, Reagan Moore (presentation)
Feb 14, Monday, 11:00 EST call with DuraCloud (presentation)
Feb 17, Thurs, 11:00 EST call with MetaArchive/GDDP Katherine Skinner, Matt Schultz and Martin Halbert MetaArchive NDSA (presentation)

People/Projects to Contact

Chronopolis (Mike Smorul will contact)
Open questions from the Educopia Guide to Distributed Digital Preservation
Commercial providers? (Who specifically would we want here? Please add them.)
- Azure (Leslie to contact)
- Amazon (Who will contact?)

General Questions for Cloud Service Presenters

Here we are working on a set of general questions for presenters to develop talks around.

What sort of use cases is your system designed to support? What doesn't this support?
What preservation standards would your system support?
What resources are required to support a solution implemented in your environment?
What infrastructure do you rely on?
How can your system impact digital preservation activities?
If we put data in your system today what systems and processes are in place so that we can get it back 10 years from now? (Take for granted a sophisticated audience that knows about multiple copies etc.)
What types of materials does your system handle (documents, audio files, video files, stills, data sets, etc.)? Please give examples of those types in practice.

Responses to questions

NDSA:iRODS direct responses

Other general notes:

[Snavely] Each storage target must support a specific set of operations, and must do so consistently with the other storage targets. That requirement seems like a risk that comes along with the elegant abstraction that iRODS provides. Clear specifications help mitigate this risk.

NDSA:DuraCloud direct responses

Other general notes:

[Snavely] The cloud provider is generally treated as a black box, without a strong sense of the actual reliability of the underlying storage systems. Cloud providers tend to promise checksum validation of contents, but the recourse if validation fails was unknown (right?). DuraCloud adds its own checksum validation on top of the cloud storage service.
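
The kind of independent checksum validation described above can be sketched as follows. This is a minimal illustration, not DuraCloud's implementation: the function names `sha256_of` and `audit`, the choice of SHA-256, and the idea of a manifest mapping relative paths to expected digests are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: dict[str, str], root: Path) -> list[str]:
    """Return relative paths that are missing or whose digest differs.

    `manifest` (hypothetical) maps a relative path to the digest that was
    recorded when the content was ingested.
    """
    failures = []
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

Keeping the manifest outside the storage service being audited means the check does not depend on the provider's own validation claims.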

NDSA:MetaArchive/GDDP direct responses

Other general notes:

[Snavely] Built on LOCKSS, so data integrity assurances come from a robust networked software model running on commodity hardware and storage. The federated nature provides integrity assurance but also means a lack of central control: the accidental loss of multiple caches is unlikely, but scheduled maintenance or upgrades at several sites could, for example, coincidentally collide.
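
One way a federated network might guard against the coincidental collisions mentioned above is to compare planned maintenance windows across caches. The sketch below is purely illustrative and is not part of LOCKSS or MetaArchive: the function `overlapping_windows` and the cache names are hypothetical.

```python
from datetime import datetime
from itertools import combinations


def overlapping_windows(windows):
    """Return pairs of cache names whose maintenance windows overlap.

    `windows` (hypothetical) maps a cache name to a (start, end) pair of
    datetimes describing that cache's planned downtime.
    """
    clashes = []
    for (name_a, (start_a, end_a)), (name_b, (start_b, end_b)) in combinations(
        sorted(windows.items()), 2
    ):
        # Two intervals overlap when each one starts before the other ends.
        if start_a < end_b and start_b < end_a:
            clashes.append((name_a, name_b))
    return clashes


planned = {
    "cache-a": (datetime(2011, 3, 1, 9), datetime(2011, 3, 1, 12)),
    "cache-b": (datetime(2011, 3, 1, 11), datetime(2011, 3, 1, 13)),
    "cache-c": (datetime(2011, 3, 2, 9), datetime(2011, 3, 2, 10)),
}
print(overlapping_windows(planned))
```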

Solution Models and Environments

Name	Offered as Service	Deployed Locally	Opensource	Authentication Scheme	Ingest Mechanism	Export Mechanism	Integrity/Validation Mechanism	Replication Mechanism	Administration Model (Federated, etc.)	Tiering Support
iRODS	Offered as Service	Deployed Locally	Opensource	Authentication Scheme	Ingest Mechanism	Export Mechanism	Integrity/Validation Mechanism	Replication Mechanism	Content Administration Model (Federated, etc.)	Tiering Support	Certifications
DuraCloud	yes	yes	yes (Apache2)	Basic Auth	1:web-ui, 2:client-side utility, 3:REST-API	1:web-ui, 2:client-side utility, 3:REST-API	Checksum verified on ingest. On-demand checksum verification service.	Built-in support for cross-cloud replication.	Local	No
MetaArchive/GDDP	Mixed - PLN service layer on top of local LOCKSS nodes	Mixed - PLN service layer on top of local LOCKSS nodes	No	IP-based	LOCKSS harvesting plugins	LOCKSS web proxy	LOCKSS distributed integrity checking	LOCKSS P2P	Single superuser across all nodes	No
Chronopolis	Yes	No	No	SRB/iRODS based	SRB/iRODS based	SRB/iRODS based	Local checksums	SRB/iRODS	Single superuser	No
Microsoft Azure	Yes	No	No	Multiple	.NET/WIF	Multiple APIs, .NET	Not known/proprietary	Not known/proprietary	Single superuser	Not known/proprietary
Amazon S3/EC2	Yes	No	Opensource	Multiple, including certs; proprietary / limited delegation model	RESTful APIs	RESTful APIs	Proprietary	Proprietary	Single superuser	Yes
DVN/Safearchive	Yes	Yes	Opensource	Basic Auth/IP	Proprietary UI/Batch UI/LOCKSS harvesting plugins	OAI/LOCKSS harvesting/proprietary	LOCKSS distributed integrity checks with additional TRAC auditing layer	LOCKSS with additional TRAC-based provisioning layer	Federated & distributed	No

@@ Line 1: / Line 1: @@
+[[File:NDSA Logo.png|thumb]]
 In each case we would want to identify who would present, who will contact them. Then when they will present.
@@ Line 43: / Line 44: @@
 * [Snavely] Built on LOCKSS, so data integrity assurances are provided by robust networked software model augmented to commodity hardware and storage. Federated nature provides integrity assurance but also a lack of central control in that the accidental loss of multiple caches is unlikely but e.g. scheduled maintenance or upgrades could coincidentally collide.
-====Chronopolis====
-# ...
-====MicroSoft Azure====
-# ...
-====Amazon S3/EC2====
-# ...
-==Questions for Member Institution Implementations of Large Scale Storage Architectures==
-#What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.)
-#What large scale storage or cloud technologies are you using to meet that challenge? Further, why did you choose these particular technologies?
-#Specifically, what kind of materials are you preserving (text, data sets, images, moving images, web pages, etc.)
-#How big is your collection? (In terms of number of objects and storage space required)
-#What are your performance requirements? Further, why are these your particular requirements?
-#What storage media have you elected to use? (Disk, Tape, etc) Further, why did you choose these particular media?
-#What do you think the key advantages of the system you use?
-#What do you think are the key problems or disadvantages your system present?
-#What important principles informed your decision about the particular tool or service you chose to use?
-#How frequently do you migrate from one system to another? Further, what is it that prompts you to make these migrations?
-# What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation? (High levels of redundancy/resiliency, internal checksumming capabilities, automated tape refresh, etc)
-# What functionality or processes have you developed to augment your storage systems in order to meet preservation goals? (Periodic checksum validation, limited human access or novel use of permissions schemes)
-# Are there tough requirements for digital preservation, e.g. TRAC certification, that you wish were more readily handled by your storage system?
-===Responses to questions===
-====[[NDSA:Florida Center for Library Automation]]====
-====[[NDSA:HathiTrust]]====
-====[[NDSA:National Library of Medicine Responses]]====
-====[[NDSA:Penn State]]====
-====[[NDSA:WGBH Responses]]====
-1. What is the particular preservation goal or challenge you need to accomplish?
-(for example, re-use, public access, internal access, legal mandate, etc.)
-NYU Libraries processes, enables access to, and preserves digital materials
-that come from both the NYU community and from collaborating partner
-organizations.
-2. What large scale storage or cloud technologies are you using to meet that
-challenge? Further, why did you choose these particular technologies?
-Our current repository asset store runs on SunFire X4500 and X4540 storage
-servers. The data servers are mirrored and backed up to tape.  We are
-building a new repository system using Isilon storage arrays.  The Isilon
-arrays are mirrored, geographically distributed, and backed up to tape.
-We are not pursuing cloud storage at this time.
-3. Specifically, what kind of materials are you preserving (text, data sets,
-images, moving images, web pages, etc.)
-Our preservation repository contains:
-- texts
-- images
-- video
-- audio
-4. How big is your collection?
-(In terms of number of objects and storage space required)
-Combined existing and new repository systems:
-,594 objects
-TB  (63 TB of video)
-5. What are your performance requirements? Further, why are these your
-particular requirements?
-The storage solution must be fast enough to support ongoing fixity,
-ingest, and access operations.
-6. What storage media have you elected to use? (Disk, Tape, etc)
-Further, why did you choose these particular media?
-We use both disk and tape (for backup).
-The first and second copies are stored on disk.
-The third copy is stored on tape.
-We need content on disk because we serve some content directly from
-repository storage.  We also transcode to create access copies served
-through streaming media servers.
-7. What do you think the key advantages of the system you use?
-The new system is under construction, but will be able to support various
-curation, publication, and preservation workflows. The underlying storage
-solution will allow us to easily add capacity to the system as needed.
-8. What do you think are the key problems or disadvantages your system present?
-Ingest in our current system can be rather slow due to the ingest
-mechanisms in our application.
-9. What important principles informed your decision about the particular
-tool or service you chose to use?
-We requested that the storage system be scalable, and ideally present a
-single filesystem to the applications using the storage.  Our systems group
-then researched multiple storage solutions.
-10. How frequently do you migrate from one system to another?
-Further, what is it that prompts you to make these migrations?
-We are coming up on our first major migration in approximately four years.
-In addition to content in our preservation repository, we have a legacy
-content that is stored across multiple systems.  The new repository should
-allow us to aggregate and manage all of our content in a single system.
-11. What characteristics of the storage system(s) you use do you feel are
-particularly well-suited to long-term digital preservation? (High levels
-of redundancy/resiliency, internal checksumming capabilities, automated
-tape refresh, etc)
-The Isilon storage system is designed to scale and includes configurable
-data integrity and data recovery features.
-12. What functionality or processes have you developed to augment your storage
-systems in order to meet preservation goals? (Periodic checksum validation,
-limited human access or novel use of permissions schemes)
-Ongoing fixity checks and "completeness" checks.
-13. Are there tough requirements for digital preservation, e.g. TRAC
-certification, that you wish were more readily handled by your storage
-system?
-Not at this time.
-====[[NDSA:Your Institution Here]]====
-==General Concerns==
-# confidential data
-# encrypted data
-# auditing
-# preservation risks
-# legal compliance
-# ...
 ==Solution Models and Environments==
@@ Line 210: / Line 60: @@
 |-
 |iRODS
-|
+|Offered as Service
-|
+|Deployed Locally
-|
+|Opensource
-|
+|Authentication Scheme
-|
+|Ingest Mechanism
-|
+|Export Mechanism
-|
+|Integrity/Validation Mechanism
-|
+|Replication Mechanism
-|
+|Content Administration Model (Federated, etc.)
-|
+|Tiering Support
+|Certifications
 |-
 |DuraCloud
@@ Line 230: / Line 81: @@
 |Checksum verified on ingest. On-demand checksum verification service.
 |Built-in support for cross-cloud replication.
-|
+|Local
-|
+|No
 |-
 |MetaArchive/GDDP
-|
+|Mixed - PLN service layer on top of local LOCKSS nodes
-|
+|Mixed - PLN service layer on top of local LOCKSS nodes
-|
+|No
-|
+|IP-based
-|
+|LOCKSS harvesting plugins
-|
+|LOCKSS web proxy
-|
+|LOCKSS distributed integrity checking
-|
+|LOCKSS P2P
-|
+|Single superuser across all nodes
-|
+|No
 |-
 |Chronopolis
-|
+|Yes
-|
+|No
-|
+|No
-|
+|SRB/Irods based
-|
+|SRB/Irods based
-|
+|SRB/Irods based
-|
+|Local checksums
-|
+|SRB/Irods
-|
+|Single superuser
-|
+|No
 |-
 |Microsoft Azure
-|
+|Yes
-|
+|No
-|
+|No
-|
+|Multiple
-|
+| .Net/WIF
-|
+| Multiple APIs, .Net
-|
+|Not known/propietary
-|
+|Not known/propietary
-|
+|Single super user
-|
+|Not known/propietary
 |-
 |Amazon S3/EC2
-|
+|Yes
-|
+|No
-|
+|Opensource
-|
+|Multiple, including certs; proprietary / limited delegation model
-|
+|Restful API's
-|
+|Restful API's
-|
+|Proprietary
-|
+|Proprietary
-|
+|Single superuser
-|
+|Yes
+|-
+|DVN/Safearchive
+|Yes
+|Yes
+|Opensource
+|Basic Auth/IP
+|Proprietary UI/Batch UI/LOCKSS harvesting plugins
+|OAI/Lockss harvesting/proprietary
+|LOCKS distributed integrity checks with additional TRAC auditing layer
+|LOCKS with additional TRAC-based provisioning layer
+|Federated & distributed
+|No
 |-
 |}
