NDSA:PDF Exploration
Title of Activity or Project
NDSA PDF/A-3 Scoping Project
One Sentence Description:
NDSA PDF/A-3 Scoping Project working group members will research the pros and cons of using the PDF/A-3 standard as an all-purpose wrapper in different preservation scenarios, including use as an extension to PDF/A-1 and PDF/A-2 in circumstances for which those formats have been adopted or recommended and use as a wrapper for various digital asset/media types, such as textual, audio, video, photo, and GIS data.
Statement of the Problem and Goals for Addressing the Problem:
The single extension to PDF/A-2 in PDF/A-3 is the ability to embed files of any type within a PDF/A document. PDF/A-3 was designed to accommodate supplementary media files for text documents. Issues raised by this extension include:
- Is PDF/A-3 appropriate as a de facto normalization wrapper format for some or all media types or in particular circumstances?
- For circumstances where PDF/A-2 has already been deemed an appropriate preservation format (primarily for textual documents), what are the risks and opportunities offered by the ability to embed content in non-PDF formats?
The goal is to develop guidelines for the appropriate use of PDF/A-3 in different scenarios. The guidelines should comprise both detailed technical information and a practical quick-reference guide for end users.
Strategic Value of Activity:
- Improve understanding of best practices for using PDF/A-3 in digital preservation activities
- Enhance consistency and improve long-term viability of digitally preserved content
- Provide guidance to those considering PDF/A-3 as a long-term archiving format
Required Resources:
- Time of working group members
- Publishing venue(s)
- Communication channels
Roadmap:
- Hold regular working group conference calls (monthly, between NDSA Standards WG calls)
- Draft document and review
- Invite broader NDSA member feedback
- Publish document (digitalpreservation.gov, others?)
Dissemination of Knowledge:
- Publish report on digitalpreservation.gov
- Write a blog post
- Announce on NDSA member organization communication channels
- Present at conferences that members (and non-members?) are attending
Signifiers of Success and Outcomes:
- Completed guidelines document published on digitalpreservation.gov
- Guidelines document referenced on related Wikipedia pages
- Guidelines referenced in FDD (format description document) for PDF/A-3 [1]
- Guidelines in use or recommended by NDSA participating organizations or others
- Presentation at other conferences or publication in journals
Questions to Ask and Answer
A Google doc has been set up to provide an environment for shared work. All group members should be able to edit the document; if you have trouble, drop a note to Butch. The next draft should be finalized in mid-September 2013.
- Talk about background: what is PDF/A-3 and how does it differ from earlier versions of PDF/A? (Butch, plus Caroline's 2-pager)
- Enumerate categories of materials, use cases, and concrete examples where it makes sense to use PDF/A-3, and other categories where it doesn't. Example: if you're sending a video file, don't put it in a PDF! By contrast, a journal article that shows a static version of a spreadsheet in the document, with a malleable version embedded, may argue for using it. (Don, Kevin, Kate)
- Risks of the format: scenarios in which using it might be bad, and why (Sheila)
- Possibilities of the format: scenarios in which using it might be good, and why (Chris)
- Have a list of defined terms in our document. How do these relate to the terms in the ISO spec? Leverage the NDSA Levels of Preservation glossary and link to it.
PDF/A-3 Use Case Scenarios
A Template might include:
- Actors
- Actions
- Example: Federal agency with a document management system puts an MPEG video file (and nothing else) into a PDF/A-3 file to store and then, later, to submit as an SIP (Submission Information Package) to NARA for long-term management.
- Example: Publisher has a text-only article and puts it into a PDF/A-3 file, even though, in the past, the publisher used PDF/A-2. The article is then sent to a library, where it will be preserved for the long term.
- Example: Publisher has an article that includes a complicated table, "frozen" in place, and puts it into a PDF/A-3 file, along with the Excel file from which the table was generated, in order to make it easier for a future researcher to have a malleable version of the table for use when writing another article on the same subject.
- Example: Data creator has a digital map, a report, a database, digital photos, and detailed metadata that together comprise a whole, and wants to archive them together for the long term.
- Example from Luratech Webinar used to show primary intent of PDF/A-3: PDF/A document with diagram based on data, with embedded spreadsheet associated with diagram, metadata associated with subsection of document, source word-processing file, and audio rendering of the document (perhaps for accessibility).
- Use case #1 from Luratech Webinar: Scanned documents, with the scanned image as the main PDF/A content and native metadata embedded as XML.
- Use case #2 from Luratech Webinar: "Hybrid archiving", used when a document is still in its active life cycle and further versions might be created. Create a PDF/A-3 file as the archive-ready rendition and embed the document in its native (e.g., word-processor) format. Built into a standard workflow, this would leave documents "archive ready" at all times.
- Use case #3 from Luratech Webinar: Human-readable invoice with embedded data marked up in CEN Core Invoice Standard (XML).
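One way to make the template concrete is to record each scenario as a small structured record. The following is only a minimal sketch under stated assumptions: the class names `UseCase` and `EmbeddedFile`, their fields, and the `is_wrapper_only` helper are hypothetical illustrations, not something the group has adopted. It records the actors and actions from the template, plus what the PDF itself renders and what is embedded, so that "wrapper only" scenarios (such as the MPEG video example) can be told apart from scenarios where embedded files supplement a document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmbeddedFile:
    """One file embedded in the PDF/A-3 container (hypothetical record)."""
    description: str
    media_type: str  # illustrative value such as "application/xml"


@dataclass
class UseCase:
    """A use case scenario; all field names here are assumptions."""
    title: str
    actors: List[str]
    actions: List[str]
    main_content: str  # what the PDF pages themselves render; "" if nothing
    embedded: List[EmbeddedFile] = field(default_factory=list)

    def is_wrapper_only(self) -> bool:
        """True when the PDF has embedded files but no page content of its own."""
        return not self.main_content and bool(self.embedded)


# Luratech use case #3: readable invoice plus embedded XML data.
invoice = UseCase(
    title="Invoice with embedded structured data",
    actors=["invoice issuer", "invoice recipient"],
    actions=["render human-readable invoice", "embed XML invoice data"],
    main_content="human-readable invoice",
    embedded=[EmbeddedFile("invoice data in XML", "application/xml")],
)

# The federal agency example: a video file and nothing else.
video_only = UseCase(
    title="Video wrapped in PDF/A-3",
    actors=["federal agency", "NARA"],
    actions=["wrap video in PDF/A-3", "submit as SIP"],
    main_content="",
    embedded=[EmbeddedFile("MPEG video file", "video/mpeg")],
)

for case in (invoice, video_only):
    print(case.title, "-> wrapper only:", case.is_wrapper_only())
```

Recording scenarios this way would let the group sort examples into the "makes sense" and "doesn't make sense" categories by consistent criteria, though the criteria themselves remain to be agreed.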
Members
- Caroline Arms, Library of Congress (caar@loc.gov)
- Don Chalfant, NARA (Donald.Chalfant@nara.gov)
- Kevin DeVorsey, NARA (Kevin.DeVorsey@nara.gov)
- Chris Dietrich, National Park Service (chris_dietrich@nps.gov)
- Carl Fleischhauer, Library of Congress (cfle@loc.gov)
- Butch Lazorchak, Library of Congress (wlaz@loc.gov)
- Sheila Morrissey, Ithaka (Sheila.Morrissey@ithaka.org)
- Kate Murray, Library of Congress (kmur@loc.gov)
Calls and Notes
Call information:
- Call-in toll-free number (US/Canada): 866-469-3239
- Participant access code: 21408589
Next call: November 11, 2013 at 1:00pm ET
Notes:
Background Materials
- Library of Congress Sustainability of Digital Formats draft PDF/A-3 format description document (FDD). Comments please to caar@loc.gov and cfle@loc.gov
- Blog Post on PDF/A-3 on the Signal
- Sheila M. Morrissey, The Network is the Format: PDF and the Long-term Use of Digital Content, Archiving 2012, pg. 200-203 (2012)
- Ithaka comments on ISO 19005-3 draft
- Caroline's thoughts on PDF/A-3 circulated in late November, 2012
- Video of Webinar by Luratech on PDF/A-3, Nov 8, 2012. Includes use cases and demos.
- Slides used for Luratech Webinar, Nov 8, 2012. Includes use cases and demos. Do not distribute.
- Digital Preservation Coalition (DPC) 2013-03-13 Workshop on PDF/A-3. Includes links to presenters' slides and to William Kilbride's comments
- Unofficial XMP notes from 2011 explorations by Caroline Arms -- Please do not distribute
Possible future actions
- Set up future call with Duff Johnson of the PDF Association
- Track PDF Validator activity
- Once charter is reviewed by main NDSA Standards Group, extend participation call to
- Set up calls with Steve Levinson (U.S. Courts) and Leonard Rosenthal (Adobe)
- Extend invitation to join beyond active NDSA participants, e.g. to LC staff involved in Best Edition choices.