NDSA:Tuesday, Mar 25, 2014: Difference between revisions

From DLF Wiki
Latest revision as of 15:20, 11 February 2016


Roster

  • Trevor Owens, Library of Congress
  • Karen Cariani, WGBH
  • Barrie Howard, Library of Congress
  • Dave MacCarn, WGBH
  • Jim Harper, Property Records Industry Association (PRIA)
  • Joe Pawletko, New York University
  • Martin Jacobson, U.S. National Archives and Records Administration
  • Shawn Nicholson, Michigan State University
  • Kevin McCarthy, U.S. National Archives and Records Administration
  • Martin Kong, Chicago State University
  • Chelcie Rowell, Wake Forest University
  • Kat Bell, Dance Heritage Coalition
  • Leah Prescott, Georgetown University Law Center
  • Ernest Bryant, U.S. National Archives and Records Administration

Agenda

  1. Update on 2015 National Agenda for Digital Stewardship
  2. Fixity check factsheet
  3. Update on NDSA Storage Survey report
  4. Ideas for potential speakers - the ArchivesSpace webinar was well attended, and interview responses are pending
  5. Digital Preservation 2014 meeting
  6. Open source software in digital preservation projects (interview series)
  7. Future directions

Action Items

  • Draft a lightning talk proposal on the fixity check factsheet - Trevor
  • Pull out thematic topics from blog posts, and share the ideas with the group - Trevor

Discussion

Update on 2015 National Agenda for Digital Stewardship
Please read the 2014 National Agenda, if you haven't. This group had a call, put together some ideas, and passed them along to the Coordinating Committee. If you have anything to add, email your ideas back to the list, or pass them along to Trevor or Karen.

Fixity check factsheet
There hasn't been a lot of feedback from the blog post about the fixity check document. This document was developed on a previous call where the group worked up a draft factsheet. The response from people has been largely positive, so we're in a position to call it done and release a first version. A general announcement should go out to NDSA-All providing another week or two for responses, and then we'll call it a day.

Update on NDSA Storage Survey report
It's taking a little time to get the data into shape and the analysis done. Once it comes together, someone needs to lead writing the actual report. The previous report provides a foundation, so the writing will involve updating from last time. It will still be a while before the charting is finished, so maybe in about 2 - 3 months. Leah Prescott will help lead the writing. The next storage survey will be done in 2015.

Ideas for potential speakers
The ArchivesSpace webinar was well attended, and Trevor is awaiting interview responses from Brad. What topics are of interest to infrastructure people, and to the NDSA as a whole? There has been a presentation on DPN, but maybe someone who is an implementer can speak. Mark Leggott spoke recently about Islandora. One potential topic is the Olive Library Project from Carnegie Mellon, which is providing emulation and virtualization as a service. Another is a service for moving LTO tape from LTO-4 to LTO-6. Open source Swift nodes could also be an interesting topic. Amazon is built on top of object stores, and there are a number of projects, e.g., SwiftStack and Ceph (http://ceph.com/), that follow that model of interfacing with your storage. If anyone has any ideas, please share them over the list.

Digital Preservation 2014 meeting
The call for proposals yielded over 80 proposals. The meeting takes place in the DC Metro Area from July 22-24, so mark your calendars. If there is interest in a face-to-face meeting, that can be arranged. The group can ask for meeting space, or just meet up for happy hour. A lot of special interest groups meet for breakfast before the program starts. There was consensus that it would be good for those who will be attending to meet face to face.

Open source software in digital preservation projects (interview series)
Trevor will pull out some thematic topics from the blog posts, and share the ideas with the group.

Future directions
An open discussion followed in which the callers discussed their local setups:

  • PRIA works with local governments in backing up their records, and is interested in learning about best practices in the preservation of electronic records. Industry and technology are changing, and new talent wants to do things differently, so they need to keep up, figure out how to monitor these changes, and work out how to be sustainable. PRIA does a variety of things, but their core business is to help preserve the documentation of the exchange of property.
  • Georgetown Law is beginning to implement a new server to store bagged files and METS records for metadata, and to document the workflow process for digital content. They basically have virtual storage in a server farm. It's not something the Law Library has done before. They are also working on developing procedures for born-digital content with the Washington Research Library Consortium (WRLC), and with the Georgetown main campus to acquire a DAMS.
  • NYU is doing a lot of in-house digitization, and has a lot of images, audio, and video. They are using BagIt, and Git at upload, plus Amazon storage, and a microservices approach to fixity checks. They are currently re-engineering their message architecture to include an event logger to log things as they happen. Joe could talk about this in a couple of months.
  • NARA is re-architecting its Electronic Records Archives (ERA), but can't disclose any details at the moment. They will be looking at their digital processing environment, and at pre-processing materials that come in before they are put into a repository. The ERA may include a cloud-based staging area with some tools. Kevin will find out what he can share, and get back to the group.
  • Dance Heritage Coalition does a lot of digitization for their partners, and is looking into implementing LTO-6 tape drives, but can't yet share details publicly. Digitizing happens through hubs that are in DC, NY, and SF. There are a lot of unique moving image materials, and they are creating preservation copies and access nodes. Dave Rice, BAVC, is the main technical consultant, and the best person to talk to. He has developed some quality control tools through an NEH grant, and they're getting into the final stage of the project. A development bootcamp is being held in San Francisco on March 26. BAVC received the grant from NEH.
  • WGBH received an NEH digital preservation grant to build a Hydra stack on Fedora, and is wrapping it up. They wanted to see if they could build something to handle large files, and have it replicated. It was modeled on work at Penn State, but needed to accommodate WGBH's specific needs. They have run into some challenges managing large files, and moving things around. Prior to this project they had used a proprietary tape robot for years, but then tested the Hydra implementation. Managing the expectations of users has been a big lesson learned, because you're not going to get a 100GB file back immediately. They are re-working their workflow. Instead of relying on the robot system, they are moving back to using a vault. People will have to go pull LTO tapes from the archive so that a particular file can be pulled back to the user's computer. They have found it's very difficult to put a lot of money into infrastructure. The Hydra system works, and they have total control over their code. Fedora 4 has come out, and they're thinking about how to migrate from 3 to 4, and what 4 offers. They can give a talk on this down the road.
  • The Library of Congress has been working on best editions statements regarding the deposit of software, and may be putting out some format guidance in the near future. It'll be broadly distributed.
  • Michigan State University just sent 40 VHS tapes to Media Preserve for reformatting, and will receive preservation masters and mezzanine formats. They have a Fedora repository, and store on a SAN. They tried Archivematica with Fedora underneath and Drupal on top for access. They are now pretty much an Islandora shop. They hold 12 TB of data with mezzanine files. They lost a key staff member, which derailed things for a while.
  • Chicago State University has been working on the Digital POWRR project, funded by IMLS. They, and other small- to medium-sized institutions, have issues with access and preservation, especially funding. They looked at tools that are available either as open source, or for purchase. Five different institutions, of different sizes and constituencies, participated in the project. They have tested DuraCloud, MetaArchive, Archivematica, and Preservica. They are building capacity and knowledge, and will use what they learned going forward. They are not sure if any next steps will be consortially addressed. A State of Illinois legal mandate to provide open access to research articles, the Open Access to Research Articles Act (Public Act 098-0295), is one driver for the work of the Digital POWRR project.
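
Several callers above mentioned fixity checks over bagged content, with events logged as they happen. As a rough illustration only, the sketch below shows one way such a check could look in Python. It is not NYU's implementation or the NDSA factsheet's recommendation; the function names and the event record fields are hypothetical, and the only things borrowed from BagIt are the manifest file name and its "checksum filepath" line layout.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(manifest_path):
    """Parse 'checksum filepath' lines, the layout BagIt payload manifests use."""
    expected = {}
    text = Path(manifest_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        line = line.strip()
        if line:
            checksum, relative_path = line.split(maxsplit=1)
            expected[relative_path] = checksum.lower()
    return expected


def check_fixity(bag_dir, manifest_name="manifest-sha256.txt"):
    """Return one event record (hypothetical schema) per file in the manifest."""
    bag_dir = Path(bag_dir)
    manifest = read_manifest(bag_dir / manifest_name)
    events = []
    for relative_path, expected in sorted(manifest.items()):
        target = bag_dir / relative_path
        if not target.is_file():
            outcome = "missing"
        elif sha256_of(target) == expected:
            outcome = "ok"
        else:
            outcome = "mismatch"
        events.append({
            "event": "fixity_check",
            "file": relative_path,
            "outcome": outcome,
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })
    return events


if __name__ == "__main__":
    # Build a throwaway bag, then alter one file to simulate silent corruption.
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        (bag / "data" / "a.txt").write_bytes(b"hello")
        (bag / "data" / "b.txt").write_bytes(b"world")
        manifest_lines = [
            hashlib.sha256(b"hello").hexdigest() + "  data/a.txt",
            hashlib.sha256(b"world").hexdigest() + "  data/b.txt",
        ]
        (bag / "manifest-sha256.txt").write_text(
            "\n".join(manifest_lines) + "\n", encoding="utf-8"
        )
        (bag / "data" / "b.txt").write_bytes(b"w0rld")
        for event in check_fixity(bag):
            print(json.dumps(event))  # one JSON line per event, easy to ship to a log
```

Returning plain event records, rather than raising on the first failure, lets a caller decide how often to run the check and where to send the results, which fits the microservices and event logger approach described above.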

Documents

File:NDSA Fixity Check Project Concept Draft v6 5.pdf