NDSA:March 24, 2014 Standards and Practices Working Group Notes

From DLF Wiki
Revision as of 13:01, 4 April 2014 by Winston Atkins


Participants on the Call

  • Amy Kirchhoff
  • Andrea Goethals
  • Carolyn Campbell
  • Hannah Frost
  • John Spencer
  • Joshua Sternfeld
  • Kate Murray
  • Kevin DeVorsey
  • Melitte Buchman
  • Midge Coates
  • Paula De Stefano
  • Robert Spangler
  • Winston Atkins

New Members

  • Don Chalfant (NARA)
  • Lynda Schmitz Fuhrig (Smithsonian Institution Archives)


News and Project Updates

  • Listserv change - Barrie Howard
    • Reminder that the listserv will be hosted at The Library of Congress enterprise domain, LISTSERV.LOC.GOV, rather than on the NDIIPP digital preservation domain, LIST.DIGITALPRESERVATION.GOV.
    • The address for this list will change to NDSA-STANDARDS@LISTSERV.LOC.GOV
  • Call for participation for Digital Preservation 2014: Contact Andrea if you are attending and want to meet.
  • National Agenda: Contact Andrea, Kate, or Barrie if you have issues to add.
  • Digital video exploration - Kate Murray
    • Kate has publicized this subgroup's work to NDSA and is pleased that members from other working groups are interested in participating.
    • The group developed a preliminary list of the biggest issues we face in dealing with digital video and, at its 31 March meeting, will use that list to develop a one-question survey.
    • The list is available at the NDSA:Digital video exploration page.
  • PDF/A-3 Document (published!) - Kate Murray
    • The document has generated many questions, so Don Chalfant and Kevin DeVorsey have joined the call for the discussion.
    • The report has also generated a lot of discussion within the PDF/A community.
  • Fixity document and blog posts - Kate Murray
    • Blog post to be shaped into a more formal document
    • Kate's blog post on fixity in video generated good discussion.
  • Metadata packaging discussion - Andrea Goethals
    • Last meeting included a discussion on metadata packaging.
    • Some interest in using the discussion to create a Signal blog post, with a paragraph from each of the institutions who spoke up. This needs a coordinator to develop a guest post, though.
    • No volunteers came forward, so the blog post was tabled.
  • Self-assessment and audit project - Andrea Goethals
    • Archivematica is now hosting the Drupal-based self-assessment tool.
    • Additional activity to begin in April.


Discussion: Email Formats and Preservation

  • Background
    • Kate will write a blog post on preserving email for the 4 April issue of Signal. It will include high-level information from today’s discussion.
    • It will make the point that email messages are not typical formats; they are more like web-based or WARC content.
    • Digitalpreservation.gov’s Format Description Categories include two email formats in the Texts category, with five more descriptions in the wings. The complexity of email formats will soon lead to the creation of a separate category for email, and the extant descriptions will be moved there from Text.

NARA's New Guidance (Kevin and Don)

  • NARA has developed a new approach to email (Capstone) that transfers entire bodies of content rather than requiring selection of individual messages. Consequently, NARA faces processing large numbers of messages at once.
  • They have identified the formats they feel are best suited for aggregation and for individual messages:
    • PST and MBOX for aggregation
    • MSG, EML, XML, MBOX for individual messages.
  • In addition, NARA must address accessioning messages from Lotus Notes, which is used heavily in the classified sections of the government.
    • Notes is particularly troublesome because it is proprietary and has a limited number of export options.
  • NARA's approach to deciding which formats to ingest made them address several complex issues, including:
    • What is normal?
      • There are many formats in use;
      • Whether to let odd formats disappear if they were not used for permanent records; and
      • How to approach the long time horizon, during which a format may be used, be superseded, and only afterwards be transferred to NARA.
  • NARA’s new guidance to agencies will help NARA maintain header information.
    • Previous guidance on submitting email was not sufficient, so the new guidance is more prescriptive.
    • This is possible because NARA has a new technical team that can build on existing technical guidance. NARA intends the new guidance to be more detailed and to base procedures on the technical infrastructures agencies are most likely to employ, which should lead to more predictable deposits.
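
To illustrate the distinction the NARA discussion draws between aggregate formats (MBOX) and individual-message formats (EML), here is a minimal sketch using Python's standard `mailbox` module. It is not NARA's procedure; the function name `split_mbox` and the output file naming are assumptions made for illustration. It writes each message in an MBOX file to its own .eml file and returns the header names found, so a reviewer can confirm that header information survived the split.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write each message in an MBOX file to its own .eml file.

    Returns a list of (eml_path, header_names) tuples so the caller can
    check which header fields are present in each message.
    The file naming scheme is a hypothetical choice for this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, (key, message) in enumerate(box.iteritems(), start=1):
            eml_path = out / f"message_{index:06d}.eml"
            # get_bytes() returns the stored message without the
            # mbox "From " separator line that precedes it.
            eml_path.write_bytes(box.get_bytes(key))
            results.append((eml_path, message.keys()))
    finally:
        box.close()
    return results
```

Note that this sketch copies message bytes as stored; it does not undo any mbox quoting of body lines that begin with "From ", and it says nothing about PST, MSG, or Lotus Notes sources, which need other tools.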

Smithsonian Institution Archives email accessioning (Lynda)

  • Currently: All of the Smithsonian Institution uses the same system (Outlook), which has been a benefit for older accessions
    • The scale of accessions has grown sharply, though, and it is not unusual to receive large (2GB) file accessions.
    • Furthermore, content in Outlook's PST format can become corrupted easily.
    • The accessioning process runs email through a parser.
    • Tool: MessageSave converts PST files to MBOX: http://www.techhit.com/messagesave/.
    • Tool: The SIA is also working with Stanford on ePADD (Email Process Appraise Discover Deliver), a project funded by the NHPRC that will result in MBOX files going into a system that enables selection by archivists.
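
As a rough illustration of what running converted email through a parser can check, the sketch below opens an MBOX file with Python's standard `mailbox` module, counts the messages, and flags any that lack basic headers. It is not the SIA's actual parser; the function name `audit_mbox` and the `REQUIRED_HEADERS` checklist are assumptions chosen for the example.

```python
import mailbox

# Hypothetical checklist; an archive would choose its own required fields.
REQUIRED_HEADERS = ("From", "Date", "Message-ID")


def audit_mbox(mbox_path, required=REQUIRED_HEADERS):
    """Count messages and list those missing any of the required headers.

    Returns (total, problems), where problems is a list of
    (message_key, missing_header_names) tuples.
    """
    box = mailbox.mbox(mbox_path, create=False)
    total = 0
    problems = []
    try:
        for key, message in box.iteritems():
            total += 1
            # Header lookup is case-insensitive and returns None if absent.
            missing = [name for name in required if message[name] is None]
            if missing:
                problems.append((key, missing))
    finally:
        box.close()
    return total, problems
```

A message count that differs from what the source system reported, or a long problem list, would be a prompt to look at the original PST again before accepting the conversion.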

Harvard's Email Accessioning and Ingest (Andrea)

  • Harvard is using a tool similar to ePADD, which normalizes email from a variety of formats (e.g., Eudora, Mac mail, Thunderbird, and others).
  • Harvard's email ingest process has uncovered additional challenges, including:
    • Eudora dissociates attachments from the email message, so Harvard must develop a means to rebuild information about attachments;
    • Email content is different:
      • For the first time, Harvard has included personally sensitive data in the repository;
      • This led to the re-architecture of the repository to accept and manage HRCI (high risk confidential information).
    • Currently, they are also developing the means to record pre-repository normalization events in PREMIS, which doesn't handle them well.
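
For readers unfamiliar with PREMIS events, the sketch below builds a single event record for a normalization step using Python's standard `xml.etree.ElementTree`. It is not Harvard's implementation and does not resolve the difficulty noted above; it only shows the general shape of such a record. The namespace is assumed to be the PREMIS 2.x one, and the function name, the identifier types ("UUID", "local"), and the example values are hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: PREMIS 2.x namespace; adjust for the schema version in use.
PREMIS_NS = "info:lc/xmlschema/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def _child(parent, name, text=None):
    """Append a namespaced PREMIS child element, optionally with text."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{name}")
    if text is not None:
        element.text = text
    return element


def normalization_event(object_id, detail, when=None, outcome="success"):
    """Build a PREMIS event element describing a normalization step.

    `when` is the time the normalization happened, which for
    pre-repository work is earlier than the ingest time.
    """
    when = when or datetime.now(timezone.utc)
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    identifier = _child(event, "eventIdentifier")
    _child(identifier, "eventIdentifierType", "UUID")
    _child(identifier, "eventIdentifierValue", str(uuid.uuid4()))
    _child(event, "eventType", "normalization")
    _child(event, "eventDateTime", when.isoformat())
    _child(event, "eventDetail", detail)
    outcome_info = _child(event, "eventOutcomeInformation")
    _child(outcome_info, "eventOutcome", outcome)
    linked = _child(event, "linkingObjectIdentifier")
    _child(linked, "linkingObjectIdentifierType", "local")
    _child(linked, "linkingObjectIdentifierValue", object_id)
    return event
```

Calling `ET.tostring(normalization_event("example-object-id", "Eudora mailbox converted to MBOX before ingest"), encoding="unicode")` yields the serialized XML. Because the record links to an object identifier, one open question for pre-repository events is which identifier to use for content that has not yet been assigned one by the repository.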

Future Meetings

  • Think about other topics you want to explore, or guest speakers you would like to invite.
  • Our sessions on video, packaging, and email have led to deeper and more interesting discussions.
  • Please send email to the list to suggest topics.


  • Next call: April 21 1:00 EDT