NDSA:Tuesday, April 26, 2016

From DLF Wiki
Revision as of 14:51, 20 May 2016 by Sschaefer

NDSA Infrastructure Working Group Call Tuesday, April 26, 2pm-3pm ET

In Attendance

Agenda Items

  1. Welcome new members
  2. Overview of new Oracle cloud storage offering (Art Pasquinelli)
  3. Comments on the DuraSpace and Lyrasis intent to merge
  4. Fixity conversation with Standard Working Group
  5. OAIS revision - NDSA working group on this?
  6. Other Updates or outstanding items from previous meeting?

Call-in: To join go to: https://www.uberconference.com/clir-dlf or Call 202-750-4186. No PIN needed.

Meeting minutes

There were 2 new members welcomed to the Infrastructure Interest Group:

1) Dave with DPN, which has 61 institutional members, most of which are universities interested in long-term preservation. DPN is a provider of long-term storage.

2) Erin O'Meara at the Gates Archives in the Seattle area, which is funded by the Gates family and foundation. There are a handful of archivists. It is a born-digital focused archive. The organization has been around for 5 years. Now that they have their organization situated, it is the right time to join NDSA.

Agenda item number 3: Comments on the Duraspace and Lyrasis intent to merge. Sibyl: Nick and Carol listed some really good questions concerning the statements that Duraspace and Lyrasis made. She is not sure of a good way to share the comments that we have.

Carol took the comments and organized them around the main points. Nick: sending out the link to the comments document to the mailing list.

Sibyl: Set a deadline for the return of the comments from the Infrastructure group members. One week is the deadline for any additional comments regarding the intent to merge document.

Another question: Is this the format in which we want to submit the comments to Duraspace/Lyrasis?

One way to reconfigure the comment document: instead of including the entire statement from Duraspace and Lyrasis, paraphrase their original statement, and list the questions below the paraphrased text.

Nick: We can edit out some of the places where there is nothing to respond to and reduce some of the document.

Carol: will do the editing/paraphrasing of the document.

Sibyl: We can send out the finalized document to the email address listed on Lyrasis and Duraspace's original document.

Sibyl: Does anyone have a problem with Nick/Carol submitting it on behalf of the Infrastructure Interest Group?

No comment from any of the Infrastructure Interest Group members on the conference call.

Sibyl: Thanks to Carol for putting the document together and agreeing to do the editing on it.

Nick: Thanks to Carol as well, it is a great document.

Sibyl: Agenda item number 4: Fixity conversation with Standard Working Group. We are thinking of setting up a joint meeting of all interested parties with the Standard Working Group. We can send out a Doodle poll to see how many folks can attend. Everyone who is interested can fill in their availability. Sibyl will get in touch with Aaron to set up that call.

Sibyl: Agenda item number 5: OAIS revision. We might want to do this in conjunction with the standards group. If we are interested in that, is that something that we should start initiating?

Euan: Thinks that it might be more of a matter for the Standard Working Group.

Sibyl: We might want to comment on some parts of it. Sibyl will talk to Erin regarding collaboration.

Agenda item number 6: Anyone want to bring up anything additional?

Recap of action items: Respond to the Duraspace and Lyrasis intent to merge. Carol will format it.

Sibyl will set up a Doodle poll about the fixity conversation and about feedback and assistance concerning the OAIS revision.

Presentation from Art Pasquinelli from Oracle:

Oracle has put a lot of investment into what is going on with archiving.

The service that they are offering is similar to Amazon S3.

The amount of content that institutions are receiving is going up, from audio-visual (AV) content to information from sensors and instrumentation. He had a chart that shows the exponential growth of that information from 2013-2017. Audio-visual is top of mind for everyone, with big files that no one knows how to handle. The content comes into a library unexpectedly.

As a cloud provider, Oracle is currently storing about 700 petabytes of data, mainly on the SaaS side. They have 4 archive place points; they are offering backup and archive services and storage as well:

(From the slide) 1) backup and archive 2) disaster recovery 3) content and big data storage 4) (didn’t catch the 4th item on the slide)

They offer a software appliance so that you can put data into the cloud.

They have data centers in multiple geographic locations interconnected as one public cloud.

The current pricing is $33 a month, non-metered. The backup and recovery services are done through RMAN. The capabilities are baked into the database.

At $12,000 per petabyte per year, that opens up a different price point for disaster recovery, backup, and archiving. It gives you a different tier. It includes geographic dispersion as well.

He mentioned that many institutions are moving from CAPEX to OPEX (capital expenditures to operating expenditures).

Oracle is currently talking with Islandora, Preservica, DPN, AP Trust.

Oracle is very open with any permutations of how you may want to use the service.

The backend of this Oracle service is RMAN, which is a familiar interface to many.

Pricing: $.001 per GB per month, which works out to $12,000 per petabyte per year. The system uses OpenStack Swift.

There is a possible wait time of up to 4 hours before restoration is completed.

How it works: the customer would encrypt the data before sending it to Oracle. Oracle does an MD5 check on the encrypted file they receive.
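The presentation did not describe how the MD5 check is exposed to customers, so the following is only a minimal sketch of the general idea: compute an MD5 digest of the already-encrypted file before sending it, then compare it with whatever digest the receiving side reports. The function names and the idea of a "reported" digest are assumptions for illustration, not part of Oracle's service.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large (e.g. audiovisual) files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(local_encrypted_path, reported_md5):
    """Hypothetical helper: compare the digest of the local encrypted file
    with the digest reported by the receiving side (case-insensitive)."""
    return md5_of_file(local_encrypted_path) == reported_md5.strip().lower()
```

Because the digest is taken over the encrypted bytes, a match only shows that the file arrived unchanged; it says nothing about the unencrypted content, so an institution would likely still want to keep its own fixity values for the original files.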

Oracle has marketed the product initially to research and higher education customers. They do have free trials available. For higher education institutions, they have engaged their product managers to run a beta site program with those institutions. This product was announced last July.

There are data centers in Chicago and Ashburn in the US.

The intention is for this service to function more like a dark archive. If the stored data is accessed often, the price of the service goes up once the level of access reaches around 10%: instead of $12,000 per petabyte, the cost rises to $20,000-24,000. You do have the option of looking at the items in storage and then moving them into the archive.

You can replicate the data to both datacenters when ingesting, but the cost is double for that level of service.
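As a rough way to reason about these figures, here is a small sketch that estimates an annual cost range from the numbers in the notes. It assumes the quoted prices are per petabyte per year, treats "around 10%" as a hard threshold, and assumes the doubling for replication also applies to the higher rate; none of these details were confirmed in the presentation, and the function name is made up for illustration.

```python
BASE_RATE = 12_000                       # USD per PB per year (from the notes)
HIGH_ACCESS_RATE_RANGE = (20_000, 24_000)  # USD per PB per year (from the notes)
ACCESS_THRESHOLD = 0.10                  # "around 10%"; exact rule is an assumption


def annual_cost_range(petabytes, access_fraction, replicated=False):
    """Return a (low, high) estimate in USD per year.

    access_fraction is the share of stored data accessed (0.0 to 1.0).
    replicated=True models storing a copy in both data centers,
    which the notes say doubles the cost.
    """
    if access_fraction >= ACCESS_THRESHOLD:
        low_rate, high_rate = HIGH_ACCESS_RATE_RANGE
    else:
        low_rate = high_rate = BASE_RATE
    copies = 2 if replicated else 1
    return (petabytes * low_rate * copies, petabytes * high_rate * copies)


# One petabyte, rarely accessed, single data center.
print(annual_cost_range(1, 0.01))
# Two petabytes, frequently accessed, replicated to both data centers.
print(annual_cost_range(2, 0.5, replicated=True))
```

Returning a range rather than a single number reflects the fact that the higher-access price was quoted only as $20,000-24,000.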

In the next month, they will add the ability to move the ingested data from one data center to the other for a cost. Right now, you can ingest into any one data center but are not yet able to move the data back and forth between data centers.

Pricing Comparison (per GB per month):

EMC2: $.053
IBM: $.04
Amazon: $.011
Oracle: $.001
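To put the per-gigabyte prices on the same footing as the per-petabyte figure quoted elsewhere in these notes, the rates can be converted to dollars per petabyte per year. This is a small sketch that assumes a decimal petabyte (1,000,000 GB) and uses the rates exactly as they appeared on the slide; the variable and function names are illustrative only.

```python
from decimal import Decimal

GB_PER_PB = 1_000_000   # decimal petabyte; an assumption about the slide's units
MONTHS_PER_YEAR = 12

# USD per GB per month, as listed in the pricing comparison.
RATES = {
    "EMC2": Decimal("0.053"),
    "IBM": Decimal("0.04"),
    "Amazon": Decimal("0.011"),
    "Oracle": Decimal("0.001"),
}


def per_pb_per_year(rate_per_gb_month):
    """Convert a USD per GB per month rate to USD per PB per year."""
    return rate_per_gb_month * GB_PER_PB * MONTHS_PER_YEAR


for vendor, rate in RATES.items():
    print(f"{vendor}: ${per_pb_per_year(rate):,.0f} per PB per year")
```

Under these assumptions the Oracle rate comes out to $12,000 per petabyte per year, which matches the figure given in the presentation.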

Oracle bought Front Porch Digital, so everything is standards based for audiovisual content. They do have Front Porch Digital capability now.

You can set up two tiers for AV content, one for access and one for a dark archive.

The archive cloud is coming soon.

Cloud storage partners:

Commvault is done. CloudBerry is being worked on. They are talking with Hydra, Islandora, etc. They need some guidance from the community on this.

Possible discussion points: If we have disk to disk or tape storage, they can work with us. If we are involved in areas where we want to put content overseas, they have other data centers in other areas of the world.

In terms of digital preservation, some companies do go out of business. If you have 50TB, you may want to put it in a combination of places such as Amazon, Oracle, and Google to hedge your bets. In terms of a data management plan, the cloud is a perfect place to keep an extra copy to meet your data management plan. The data is safe. You can keep a copy in the library, but the second copy is being migrated automatically by Oracle, which provides some safety.

IT departments are currently sending petabytes of data to Iron Mountain. But with the Oracle service, data is retrievable within a maximum of 4 hours.

Oracle doesn’t have Internet2 currently; they are working on it.

They do have a port for iRODS. They expect the archive cloud to be up very soon.

Anyone using Oracle's WebCenter can connect to the Archive Cloud from there.


Oracle is looking for people who are attracted by the price point and who aren't afraid of working with a new service.

This is the first public presentation given.

Question from Dave: Commodity internet is not a good transfer method for institutions. Where are they [Oracle] at for connectivity?

Art: They have fast connect options. They don't have Internet2 yet. They are working on it. Oracle does have the capability to send a ZFS box. You would load up to 400TB onto it and send it back to them, and then Oracle would load the data. Amazon offers the same service as well. No tape loading ability yet though. Anything that can be put in or taken out of an Equinix data center can be done.


Action Items

Next Working Group Meeting

Tuesday May 31, 2-3pm ET

Discussion