NDSA:PDF Exploration: Difference between revisions

From DLF Wiki
(→‎Signifiers of Success and Outcomes:: Added reference from FDD as outcome)
m (95 revisions imported: Migrate NDSA content from Library of Congress)
 
(47 intermediate revisions by 4 users not shown)
==Background Materials==
*[http://www.digitalpreservation.gov:8081/formats/fdd/fdd000360.shtml Library of Congress Sustainability of Digital Formats DRAFT PDF/A-3 format description document (FDD)]  COMMENTS PLEASE to caar@loc.gov and cfle@loc.gov
*[http://blogs.loc.gov/digitalpreservation/2012/11/all-in-embedded-files-in-pdfa/ Blog Post on PDF/A-3 on the Signal]
*[http://www.portico.org/digital-preservation/wp-content/uploads/2012/12/Archiving2012TheNetworkIsTheFormat.pdf Sheila M. Morrissey, The Network is the Format: PDF and the Long-term Use of Digital Content, Archiving 2012, pg. 200-203 (2012)]
*[[NDSA:Media: CommentsOnISO19005-3_smorrissey.pdf | Ithaka comments on ISO 19005-3 draft]]
*[[NDSA:Media: PDFA3-crathoughts_20121126.doc | Caroline's thoughts on PDF/A-3 circulated in late November, 2012]]
*[http://www.youtube.com/watch?v=g-tJRSsZHyc Video of Webinar by Luratech on PDF/A-3]  Nov 8, 2012.  Includes use cases and demos.
*[[NDSA:Media: Luratech-PDFA3-Webinar-ENG.pdf | Slides used for Luratech Webinar]]  Nov 8, 2012.  Includes use cases and demos.  Do not distribute.
*[http://www.dpconline.org/events/details/55-DPC_PDFA3_briefing?xref=58 Digital Preservation Coalition (DPC) 2013-03-13 Workshop on PDF/A-3] Includes links to presenters' slides and to William Kilbride's comments
*Unofficial XMP notes from 2011 explorations by Caroline Arms -- Please do not distribute
**[[NDSA:Media: XMPbackground_20111130_cra.pdf | Notes on XMP and tools available to LC]]
**[[NDSA:Media: XMPexplore_20111209_cra.pdf | Summary of exploration of XMP use external to LC]]

Latest revision as of 15:19, 11 February 2016

Title of Activity or Project

NDSA PDF/A-3 Scoping Project

One Sentence Description:

NDSA PDF/A-3 Scoping Project working group members will research the pros and cons of using the PDF/A-3 standard as an all-purpose wrapper in different preservation scenarios, including use as an extension to PDF/A-1 and PDF/A-2 in circumstances for which those formats have been adopted or recommended and use as a wrapper for various digital asset/media types, such as textual, audio, video, photo, and GIS data.

Report Drafts

Final Subgroup Draft Version 12 was completed on Dec. 20, 2013 and sent to the Standards WG co-chairs for comment.

Review should be completed by early January, at which point the draft will be sent to the NDSA Coordinating Committee for final review. The release target is February 2014.

Version 8 was finished on 11/12/13 and has been sent out for comment to the following:

  • NDSA Standards Working Group members
  • Johan van der Knijff, Johan.vanderKnijff@KB.NL
  • Andy Jackson, Andrew.Jackson@BL.UK
  • William Kilbride, william@dpconline.org
  • Mike Neubert (LC)
  • Ardie Bausenbach (LC)
  • Leslie Johnston (LC)
  • Theron (Ted) Westervelt (LC)
  • Duff Johnson, duff@duff-johnson.com
  • Leonard Rosenthol, lrosenth@adobe.com
  • Steve Levenson, Stephen_Levenson@ao.uscourts.gov
  • Reynold Schweickhardt, Reynold@mail.house.gov

Earlier Draft Version 6

Statement of the Problem and Goals for Addressing the Problem:

The single extension to PDF/A-2 in PDF/A-3 is the ability to embed files of any type within a PDF/A document. PDF/A-3 was designed to accommodate supplementary media files for text documents. Issues raised by this extension include:

  • Is PDF/A-3 appropriate as a de facto normalization wrapper format for some or all media types or in particular circumstances?
  • For circumstances where PDF/A-2 has already been deemed an appropriate preservation format (primarily for textual documents), what are the risks and opportunities offered by the ability to embed content in non-PDF formats?

The goal is to develop guidelines for the appropriate use of PDF/A-3 with respect to different scenarios that include both detailed technical information and a practical quick reference guide for end-users.

Strategic Value of Activity:

  • Improve understanding of best practices for using PDF/A-3 in digital preservation activities
  • Enhance consistency and improve long-term viability of digitally preserved content
  • Provide guidance to those considering PDF/A-3 as a long-term archiving format

Required Resources:

  • Time of working group members
  • Publishing venue(s)
  • Communication channels

Roadmap:

  1. Hold regular working group conference calls (monthly, between NDSA Standards WG calls)
  2. Draft document and review
  3. Invite broader NDSA member feedback
  4. Publish document (digitalpreservation.gov, others?)

Dissemination of Knowledge:

  • Publish report on digitalpreservation.gov
  • Write a blog post
  • Announce on NDSA member organization communication channels
  • Present at conferences that members (and non-members?) are attending

Signifiers of Success and Outcomes:

  • Completed guidelines document published on digitalpreservation.gov
  • Guidelines document referenced on related Wikipedia pages
  • Guidelines referenced in FDD (format description document) for PDF/A-3 [1]
  • Guidelines in use or recommended by NDSA participating organizations or others
  • Publication at other conferences/other journals

Questions to Ask and Answer

  • Describe the background: what PDF/A-3 is and how it differs from earlier versions of PDF/A (Butch, plus Caroline's 2-pager)
  • Enumerate categories of materials, use cases, and concrete examples where it makes sense to use PDF/A-3, and other categories where it doesn't. Example: if you're sending a video file, don't put it in a PDF! If a journal article has a static version of a spreadsheet in the document but a malleable version embedded, perhaps that argues for it. (Don, Kevin, Kate)
  • Risks of the format (scenarios in which using it might be bad, and why) (Sheila)
  • Possibilities of the format (scenarios in which using it might be good, and why) (Chris)
  • Have a list of defined terms in our document. How do these relate to the terms in the ISO spec? Leverage the NDSA Levels of Preservation glossary. Link to glossary.

PDF/A-3 Use Case Scenarios

A Template might include:

  • Actors
  • Actions
  • Example: Federal agency with a document management system puts an MPEG video file (and nothing else) into a PDF/A-3 file to store and then, later, to submit as an SIP (Submission Information Package) to NARA for long-term management.
  • Example: Publisher has a text-only article and puts it into a PDF/A-3 file, even though, in the past, the publisher used PDF/A-2. The article is then sent to a library where it will be preserved for the long term.
  • Example: Publisher has an article that includes a complicated table, "frozen" in place, and puts it into a PDF/A-3 file, along with the Excel file from which the table was generated, in order to make it easier for a future researcher to have a malleable version of the table for use when writing another article on the same subject.
  • Example: Data creator has a digital map, a report, a database, digital photos, and detailed metadata that comprise a whole and wants to archive these together for the long term.
  • Example from Luratech Webinar used to show primary intent of PDF/A-3: PDF/A document with diagram based on data, with embedded spreadsheet associated with diagram, metadata associated with subsection of document, source word-processing file, and audio rendering of the document (perhaps for accessibility).
  • Use case #1 from Luratech Webinar: Scanned documents, with the scanned image as the main PDF/A content and native metadata embedded as XML.
  • Use case #2 from Luratech Webinar: "Hybrid archiving", used when a document is still in its active life cycle and further versions might be created. Create a PDF/A-3 file as the archive-ready rendition and embed the document in its native (e.g., word-processor) format. Built into a standard workflow, this would leave documents "archive ready" at all times.
  • Use case #3 from Luratech Webinar: Human-readable invoice with embedded data marked up in CEN Core Invoice Standard (XML).

Members

  • Caroline Arms, Library of Congress (caar@loc.gov)
  • Don Chalfant, NARA (Donald.Chalfant@nara.gov)
  • Kevin DeVorsey, NARA (Kevin.DeVorsey@nara.gov)
  • Chris Dietrich, National Park Service (chris_dietrich@nps.gov)
  • Carl Fleischhauer, Library of Congress (cfle@loc.gov)
  • Butch Lazorchak, Library of Congress (wlaz@loc.gov)
  • Sheila Morrissey, Ithaka (Sheila.Morrissey@ithaka.org)
  • Kate Murray, Library of Congress (kmur@loc.gov)

Calls and Notes

Call information:

  • Call-in toll-free number (US/Canada): 866-469-3239
  • Participant access code: 21408589

Next call: November 11, 2013 at 1:00pm ET

Notes:

NDSA:November 1, 2013 Call

NDSA:August 8, 2013 Call

NDSA:May 31, 2013 Call

NDSA:March 29, 2013 Call

NDSA:March 25, 2013 Call

NDSA:February 19, 2013 Call

NDSA:January 22, 2013 Call

Background Materials

Possible future actions

  • Set up future call with Duff Johnson of the PDF Association
  • Track PDF Validator activity
  • Once charter is reviewed by main NDSA Standards Group, extend participation call to
  • Set up calls with Steve Levenson (U.S. Courts) and Leonard Rosenthol (Adobe)
  • Extend invitation to join beyond active NDSA participants, e.g. to LC staff involved in Best Edition choices.