NDSA:PDF Exploration

Title of Activity or Project

NDSA PDF/A-3 Scoping Project

One Sentence Description:

NDSA PDF/A-3 Scoping Project working group members will research the pros and cons of using the PDF/A-3 standard as an all-purpose wrapper in different preservation scenarios, including use as an extension to PDF/A-1 and PDF/A-2 in circumstances for which those formats have been adopted or recommended and use as a wrapper for various digital asset/media types, such as textual, audio, video, photo, and GIS data.

Report Drafts

Final Subgroup Draft Version 12 was completed on Dec. 20, 2013 and sent to the Standards WG co-chairs for comment.

Review should be completed by early January, at which point the draft will be sent to the NDSA Coordinating Committee for final review. The release target is February 2014.

Version 8 was finished on 11/12/13 and has been sent out for comment to the following:

  • NDSA Standards Working Group members
  • Johan van der Knijff, Johan.vanderKnijff@KB.NL
  • Andy Jackson, Andrew.Jackson@BL.UK
  • William Kilbride, william@dpconline.org
  • Mike Neubert (LC)
  • Ardie Bausenbach (LC)
  • Leslie Johnston (LC)
  • Theron (Ted) Westervelt (LC)
  • Duff Johnson, duff@duff-johnson.com
  • Leonard Rosenthol, lrosenth@adobe.com
  • Steve Levenson, Stephen_Levenson@ao.uscourts.gov
  • Reynold Schweickhardt, Reynold@mail.house.gov

Earlier Draft Version 6

Statement of the Problem and Goals for Addressing the Problem:

The single extension to PDF/A-2 in PDF/A-3 is the ability to embed files of any type within a PDF/A document. PDF/A-3 was designed to accommodate supplementary media files for text documents. Issues raised by this extension include:

  • Is PDF/A-3 appropriate as a de facto normalization wrapper format for some or all media types or in particular circumstances?
  • For circumstances where PDF/A-2 has already been deemed an appropriate preservation format (primarily for textual documents), what are the risks and opportunities offered by the ability to embed content in non-PDF formats?

The goal is to develop guidelines for the appropriate use of PDF/A-3 with respect to different scenarios that include both detailed technical information and a practical quick reference guide for end-users.

Strategic Value of Activity:

  • Improve understanding of best practices for using PDF/A-3 in digital preservation activities
  • Enhance consistency and improve long-term viability of digitally preserved content
  • Provide guidance to those considering PDF/A-3 as a long-term archiving format

Required Resources:

  • Time of working group members
  • Publishing venue(s)
  • Communication channels

Roadmap:

  1. Hold regular working group conference calls (monthly, between NDSA Standards WG calls)
  2. Draft document and review
  3. Invite broader NDSA member feedback
  4. Publish document (digitalpreservation.gov, others?)

Dissemination of Knowledge:

  • Publish report on digitalpreservation.gov
  • Write a blog post
  • Announce on NDSA member organization communication channels
  • Present at conferences that members (and non-members?) are attending

Signifiers of Success and Outcomes:

  • Completed guidelines document published on digitalpreservation.gov
  • Guidelines document referenced on related Wikipedia pages
  • Guidelines referenced in FDD (format description document) for PDF/A-3 [1]
  • Guidelines in use or recommended by NDSA participating organizations or others
  • Publication at other conferences/other journals

Questions to Ask and Answer

  • Talk about background (what is PDF/A-3 and how does it differ from earlier versions of PDF/A?) (Butch, plus Caroline's 2-pager)
  • Enumerate categories of materials, use cases, and concrete examples where it makes sense to use PDF/A-3, and other categories where it doesn't. Example: if you're sending a video file, don't put it in a PDF! If a journal article had a static version of a spreadsheet in the document but a malleable version embedded, perhaps that argues for it. (Don, Kevin, Kate)
  • Risks of the format (scenarios in which this might be bad, and why) (Sheila)
  • Possibilities of the format (scenarios in which this might be good, and why) (Chris)
  • Have a list of defined terms in our document. How do these relate to the terms in the ISO spec? Leverage the NDSA Levels of Preservation glossary. Link to glossary.

PDF/A-3 Use Case Scenarios

A Template might include:

  • Actors
  • Actions
  • Example: Federal agency with a document management system puts an MPEG video file (and nothing else) into a PDF/A-3 file to store and then, later, to submit as an SIP (Submission Information Package) to NARA for long-term management.
  • Example: Publisher has a text-only article and puts it into a PDF/A-3 file, even though, in the past, the publisher used PDF/A-2. The article is then sent to a library, where it will be preserved for the long term.
  • Example: Publisher has an article that includes a complicated table, "frozen" in place, and puts it into a PDF/A-3 file, along with the Excel file from which the table was generated, in order to make it easier for a future researcher to have a malleable version of the table for use when writing another article on the same subject.
  • Example: Data creator has a digital map, a report, a database, digital photos, and detailed metadata that comprise a whole and wants to archive these together for the long term.
  • Example from Luratech Webinar used to show primary intent of PDF/A-3: PDF/A document with diagram based on data, with embedded spreadsheet associated with diagram, metadata associated with subsection of document, source word-processing file, and audio rendering of the document (perhaps for accessibility).
  • Use case #1 from Luratech Webinar: Scanned documents, with the scanned image as the main PDF/A content and native metadata embedded as XML.
  • Use case #2 from Luratech Webinar: "Hybrid archiving", used when a document is still in its active life cycle and further versions might be created. Create a PDF/A-3 file as the archive-ready rendition and embed the document in its native (e.g., word-processor) format. Built into a standard workflow, this would leave documents "archive ready" at all times.
  • Use case #3 from Luratech Webinar: Human-readable invoice with embedded data marked up in CEN Core Invoice Standard (XML).
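
One way to keep scenarios like these comparable is to record each one in the same structure. The following is a minimal Python sketch of the Actors/Actions template, not a format the working group has adopted. The class names, field names, and the `is_pure_wrapper` check are all hypothetical, introduced here only for illustration; the check flags scenarios where the PDF/A-3 file has no page content of its own and serves only as a wrapper, without judging whether that use is appropriate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmbeddedFile:
    """One file carried inside the PDF/A-3 file (hypothetical record)."""
    description: str
    media_type: str  # e.g. an IANA media type string


@dataclass
class UseCase:
    """A scenario recorded with the Actors / Actions template."""
    title: str
    actors: List[str]
    actions: List[str]
    main_content: str  # what the PDF pages themselves show; "" if nothing
    embedded: List[EmbeddedFile] = field(default_factory=list)

    def is_pure_wrapper(self) -> bool:
        """True when the pages carry no content of their own and the
        file exists only to wrap the embedded material."""
        return self.main_content == "" and len(self.embedded) > 0


# Scenario: agency stores an MPEG video (and nothing else) in PDF/A-3.
video_only = UseCase(
    title="Video submitted as SIP",
    actors=["Federal agency", "NARA"],
    actions=["store", "submit as SIP"],
    main_content="",
    embedded=[EmbeddedFile("MPEG video", "video/mpeg")],
)

# Scenario: article with a frozen table plus the source Excel file.
article_with_table = UseCase(
    title="Article with source spreadsheet",
    actors=["Publisher", "Library", "Future researcher"],
    actions=["publish", "preserve", "reuse table data"],
    main_content="article text with static table",
    embedded=[EmbeddedFile("Excel source of table", "application/vnd.ms-excel")],
)

for case in (video_only, article_with_table):
    print(case.title, "pure wrapper:", case.is_pure_wrapper())
```

Recording the main content separately from the embedded files keeps the two questions raised in the problem statement distinct: whether PDF/A-3 is being used as a general wrapper, or as an extension of a document that would already suit PDF/A-2.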

Members

  • Caroline Arms, Library of Congress (caar@loc.gov)
  • Don Chalfant, NARA (Donald.Chalfant@nara.gov)
  • Kevin DeVorsey, NARA (Kevin.DeVorsey@nara.gov)
  • Chris Dietrich, National Park Service (chris_dietrich@nps.gov)
  • Carl Fleischhauer, Library of Congress (cfle@loc.gov)
  • Butch Lazorchak, Library of Congress (wlaz@loc.gov)
  • Sheila Morrissey, Ithaka (Sheila.Morrissey@ithaka.org)
  • Kate Murray, Library of Congress (kmur@loc.gov)

Calls and Notes

Call information:

  • Call-in toll-free number (US/Canada): 866-469-3239
  • Participant access code: 21408589

Next call: November 11, 2013 at 1:00pm ET

Notes:

NDSA:November 1, 2013 Call

NDSA:August 8, 2013 Call

NDSA:May 31, 2013 Call

NDSA:March 29, 2013 Call

NDSA:March 25, 2013 Call

NDSA:February 19, 2013 Call

NDSA:January 22, 2013 Call

Background Materials

Possible future actions

  • Set up future call with Duff Johnson of the PDF Association
  • Track PDF Validator activity
  • Once charter is reviewed by main NDSA Standards Group, extend participation call to
  • Set up calls with Steve Levenson (U.S. Courts) and Leonard Rosenthol (Adobe)
  • Extend invitation to join beyond active NDSA participants, e.g. to LC staff involved in Best Edition choices.