NDSA:February 24, 2014 Standards and Practices Working Group Notes

From DLF Wiki


Participants on the Call

  • Barrie Howard
  • Amy Kirchhoff
  • Andrea Goethals
  • Butch Lazorchak
  • Carol Kussmann
  • Carolyn Campbell
  • David Lake
  • Deborah Kempe
  • Dina Sokolova
  • Felicity Dykas
  • Kate Murray
  • Mariella Soprano
  • Mary Vardigan
  • Michelle Paolillo
  • Midge Coates
  • Rosie Storey
  • Winston Atkins

New Members

Rosie Storey (Library of Congress), a software developer, joined at the encouragement of colleague Kate Zwaard.

Project Updates

Video deep dive

The digital video conversation from a couple of weeks ago went well. There are some notes here. Kate will be putting together plans for next steps and has sent out a Doodle poll to schedule the next call to look at suggestions. The NDSA:Infrastructure Working Group is also interested in opening up this topic to its participants. The WebEx invitation will be sent to the whole group, whether or not you answer the poll.

Wikipedia Signal post

The Wikipedia Signal post is up! Go read it here and make comments. Andrea briefly discusses the project, its challenges and achievements, and the folks who contributed a lot of effort, especially our colleagues at Columbia. The post explicitly asks whether anyone wants to take over leading the project. If you're interested, please contact Steven Paul Davis.

PDF/A-3 Document (almost done, minor fixes)

The PDF/A-3 document was published last week; Butch Lazorchak led the effort. Read his Signal blog post here. Sheila Morrissey and Caroline Arms were also on the byline. There has been lots of good feedback on Twitter. It's highly recommended that we all read it and send comments to Butch (wlaz at loc dot gov) or post them on the blog. Kudos to all the authors!

Fixity Document

A Signal blog post about the fixity factsheet was published on 2/7 and is available here. It's still an informal document at this point; the next steps are to encourage folks to share it, collect comments, refine it, and post it as a more formal document on the NDSA website. As a follow-up to the factsheet, there will be a blog post about the different roles and placement of fixity data in media files. This work is a little different from producing "documents". If you have any comments, Kate would love to hear them.

2015 National Agenda

The Alliance has begun the process of updating the National Agenda. This year there's an opportunity to tap into the wealth of experience that resides in our working groups, and the Coordinating Committee has launched a concerted effort to get feedback from the working groups and members. Read through the 2014 document and provide feedback on its sections by mid-March. Not all the sections map to named working groups, and standards-related issues are somewhat buried within other sections (e.g., infrastructure). Contact Andrea, Barrie, Kate, or the Coordinating Committee with comments. Input on overarching issues would also be useful, including exemplary initiatives and projects.

Self-assessment and Audit Project

Archivematica is hosting a Drupal-based app that institutions can download here. It helps institutions work through a TRAC self-assessment. The project's next focus will be returning to guidance and examples of how to go through the process. The team wants to have something to share by the annual meeting in July.

Group discussion of content and metadata packages (Packaging forms/SIP components/metadata, e.g. MXF AS-07, METS)

A couple of calls ago, the group discussed attendees' interests and looked for overlap. The first topic was video, which led to those discussions. Next up was metadata packaging.

Rosie from LC: LC uses BagIt and, where possible, tries to get content providers to use BagIt; providers are usually happy to do so. The BagIt spec includes a bag-info.txt file where you can put MD, and a manifest with fixity info. When a bag is received at LC, it is recorded in an inventory DB, and the bag-info.txt file feeds into that DB. Users can browse the DB via a web application to find and get to the files. LC keeps the bag-info.txt file on the system in case the DB gets lost; keeping the two in sync is one of the problems. LC is also working on a related project to address BagIt spec version 2 ("baguette"), with an enhanced ability to keep MD at the root level of the bag and across the entire bag. The team is also trying to come up with a less intrusive way to keep content on disk with fixities and MD embedded. This may happen over the next 6 months to a year, and updates will be posted to the Google Digital Curation Group. Amy K. asked whether others could get involved in the BagIt redesign during the design phase.
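The BagIt mechanics Rosie describes (label/value MD in bag-info.txt feeding an inventory DB, plus a manifest of checksums) can be sketched in Python. This is a minimal sketch assuming the plain-text formats from the BagIt spec: `Label: Value` lines in bag-info.txt (indented lines continue the previous value) and `checksum path` lines in manifest-<algorithm>.txt. The function names `parse_bag_info` and `verify_manifest` are hypothetical; this is not LC's inventory code, and it does not cover the "baguette" work.

```python
import hashlib
from pathlib import Path


def parse_bag_info(text: str) -> dict[str, list[str]]:
    """Parse bag-info.txt 'Label: Value' lines into a dict of value lists.

    Labels may repeat, so each label maps to a list of values.
    Lines beginning with whitespace continue the previous value.
    """
    info: dict[str, list[str]] = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last_label is not None:
            # Continuation line: join it onto the most recent value.
            info[last_label][-1] += " " + line.strip()
            continue
        # Split on the first colon only, so values may contain colons (URLs).
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed bag-info line: {line!r}")
        last_label = label.strip()
        info.setdefault(last_label, []).append(value.strip())
    return info


def verify_manifest(bag_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Return payload paths whose checksum does not match the manifest."""
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # maxsplit=1 keeps file paths that contain spaces intact.
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        digest = hashlib.new(algorithm)
        with open(bag_dir / rel_path, "rb") as fh:
            # Read in 1 MiB chunks so large media files don't fill memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest().lower() != expected.lower():
            failures.append(rel_path)
    return failures
```

Because the parsed values come straight from the file, an inventory DB populated this way could in principle be rebuilt by re-parsing the bags, which is one reason to keep bag-info.txt on the system.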

Andrea from Harvard: In the past, Harvard has had a mixture of some content modeled using METS (mainly page-turned objects) and lots of other content for which they don't use METS. They are putting a second-generation DB in place and thinking about how to treat MD, making it a first-class citizen of the repository. They are migrating all MD this year and serializing MD to METS files. There were no standards in place when they first built their DB. They are now changing all schemas and have changed the data model to be more consistent with the PREMIS data model (objects that have files, rather than just files). MD comes in from lots of sources: old files, catalog records, and the FITS tool, which they are running against all files to provide technical and format MD. The challenge is to figure out where to draw the line between what to do now and what to put off to the future; they can't do everything right now.
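As a rough illustration of "objects that have files" serialized to METS, the sketch below builds a minimal METS document with Python's standard xml.etree.ElementTree: a fileSec listing each file and a structMap div pointing to them. It assumes nothing about Harvard's actual schemas; the function name `object_to_mets`, the USE value, the IDs, and the file URLs are hypothetical, and a real record would also carry dmdSec/amdSec sections (e.g., for PREMIS or FITS output).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def object_to_mets(object_id: str, files: list[tuple[str, str]]) -> str:
    """Serialize one object and its files as a minimal METS document.

    files: (file_id, href) pairs; file IDs must be valid XML IDs.
    """
    root = ET.Element(m("mets"), {"OBJID": object_id})
    # fileSec is created before structMap, matching the METS element order.
    file_sec = ET.SubElement(root, m("fileSec"))
    file_grp = ET.SubElement(file_sec, m("fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, m("structMap"))
    div = ET.SubElement(struct_map, m("div"), {"LABEL": object_id})
    for file_id, href in files:
        # One <file> per file, located by URL via xlink:href.
        f = ET.SubElement(file_grp, m("file"), {"ID": file_id})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # The object's structural div points back to each file by ID.
        ET.SubElement(div, m("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")
```

Keeping the object as the structMap div and the files as fptr targets mirrors the PREMIS-style shift the notes mention: the object is the unit of description, and files hang off it.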

David Lake from NARA: much of what Andrea discussed resonates with him. NARA is currently developing a new SIP specification for the ERA project. The original SIP dates from 2005/6. It was quite limited in functionality, but it did many of the basics (capturing fixity, creating a manifest, etc.). It was homegrown and limited in its ability to provide MD at various levels, so this is an opportunity to re-examine what they are doing in this area. NARA is doing a lot of work to take processing capabilities out of the DRA system and put them in a more flexible environment, since they need a multitude of tools to process different datasets as they come in, and is doing some refactoring. The repository has an XML file of MD, based on PREMIS, for every object in the repository; there are limitations with that. They are thinking about the SIP spec right now and are using METS heavily in constructing SIPs. In the SIP, they assume they'll be making changes on the back end to the repository schema. The biggest challenge is accommodating the massive collections that are expected to come in; volume will be a challenge. Another is taking in MD across the different types of formats they receive, especially records in hierarchical formats, and modeling those complex records in the SIP in a way that the repository can parse out and manage the MD and the relationships between files once they get into the repository. They plan to produce a draft for a pilot project later this year and may take it to the wider community for constructive review.
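The basics David lists for the original SIP, capturing fixity and creating a manifest, can be sketched as below. NARA's homegrown manifest format isn't described in these notes, so this sketch assumes a BagIt-style layout (one `checksum  relative/path` line per file, SHA-256 by default); `file_digest` and `build_manifest` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Compute a file's checksum, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(payload_dir: Path, algorithm: str = "sha256") -> str:
    """Capture fixity for every file under payload_dir as manifest lines.

    Each line is '<checksum>  <relative/path>' with forward slashes,
    sorted by path so the output is deterministic.
    """
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(payload_dir).as_posix()
        lines.append(f"{file_digest(path, algorithm)}  {rel}")
    return "".join(line + "\n" for line in lines)
```

Sorting by path keeps the manifest deterministic, so two runs over the same payload produce identical output and can be compared directly.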

Carolyn Campbell, Georgetown University Law Library: they use METS now and want to use METS in a DSpace repository. This is a current project; they are trying to get METS integrated into the DSpace repository and are not sure how to do it.

Amy: Portico takes in content packaged in many different ways and normalizes everything to its content model, which was informed by PREMIS, DIDL, and METS. The Portico MD closely follows the six-part content model: content type, content set, archival unit, content unit, functional unit, and storage unit. Portico prefers to export content in BagIt, using rsync where possible (to leverage its built-in fixity check functionality). Portico imposes a specific directory structure on the payload of the bags. Amy will send round some pictures of the Portico content model.

A question was asked whether, in Portico's experience, it was possible to put all the DMD into the bag-info.txt file. Amy explained that Portico does not try to. Portico expects its content recipients to read the NLM/JATS XML files and/or the PMD file for DMD and article structural information. Portico does impose a structure on the payload and provides an XML file with each journal directory that includes some business information, such as journal title, that does not always come in the NLM/JATS files.

Andrea asked: do we want to write a blog post about this, keep it as an internal discussion, or discuss it further? Folks who spoke will write up a paragraph.

LISTSERV and WebEx Migration

The Library of Congress provides some communications infrastructure in its role as NDSA Secretariat. There are some changes coming, and the Library will be moving the various LISTSERV email lists from the domain list.digitalpreservation.gov to listserv.loc.gov. Reminders will be sent out in advance of the migration, in case people have set up rules in their email account. The web conferencing service will change, too. Currently we use Cisco WebEx, and are not sure what the new service will be yet. Library staff are evaluating choices, and again will notify everyone of the migration in advance of the switch.

Action Items

  • Amy from ITHAKA will send round a picture of the Portico content model
  • Working group members who spoke about their experiences with MD and packaging will write up a paragraph