NDSA:PDF Exploration
Latest revision as of 14:19, 11 February 2016
Title of Activity or Project
NDSA PDF/A-3 Scoping Project
One Sentence Description:
NDSA PDF/A-3 Scoping Project working group members will research the pros and cons of using the PDF/A-3 standard as an all-purpose wrapper in different preservation scenarios, including use as an extension to PDF/A-1 and PDF/A-2 in circumstances for which those formats have been adopted or recommended and use as a wrapper for various digital asset/media types, such as textual, audio, video, photo, and GIS data.
Report Drafts
Final Subgroup Draft Version 12 was completed on Dec. 20, 2013 and sent to the Standards WG co-chairs for comment.
Review should be completed by early January, at which point it will be sent to the NDSA Coordinating Committee for final review. Release target is February 2014.
Version 8 was finished on 11/12/13 and has been sent out for comment to the following:
- NDSA Standards Working Group members
- Johan van der Knijff, Johan.vanderKnijff@KB.NL
- Andy Jackson, Andrew.Jackson@BL.UK
- William Kilbride, william@dpconline.org
- Mike Neubert (LC)
- Ardie Bausenbach (LC)
- Leslie Johnston (LC)
- Theron (Ted) Westervelt (LC)
- Duff Johnson, duff@duff-johnson.com
- Leonard Rosenthol, lrosenth@adobe.com
- Steve Levenson, Stephen_Levenson@ao.uscourts.gov
- Reynold Schweickhardt, Reynold@mail.house.gov
Earlier Draft Version 6
Statement of the Problem and Goals for Addressing the Problem:
The single extension to PDF/A-2 in PDF/A-3 is the ability to embed files of any type within a PDF/A document. PDF/A-3 was designed to accommodate supplementary media files for text documents. Issues raised by this extension include:
- Is PDF/A-3 appropriate as a de facto normalization wrapper format for some or all media types or in particular circumstances?
- For circumstances where PDF/A-2 has already been deemed an appropriate preservation format (primarily for textual documents), what are the risks and opportunities offered by the ability to embed content in non-PDF formats?
The goal is to develop guidelines for the appropriate use of PDF/A-3 with respect to different scenarios that include both detailed technical information and a practical quick reference guide for end-users.
Strategic Value of Activity:
- Improve understanding of best practices for using PDF/A-3 in digital preservation activities
- Enhance consistency and improve long-term viability of digitally preserved content
- Provide guidance to those considering PDF/A-3 as a long-term archiving format
Required Resources:
- Time of working group members
- Publishing venue(s)
- Communication channels
Roadmap:
- Hold regular working group conference calls (monthly, between NDSA Standards WG calls)
- Draft document and review
- Invite broader NDSA member feedback
- Publish document (digitalpreservation.gov, others?)
Dissemination of Knowledge:
- Publish report on digitalpreservation.gov
- Write a blog post
- Announce on NDSA member organization communication channels
- Present at conferences that members (and non-members?) are attending
Signifiers of Success and Outcomes:
- Completed guidelines document published on digitalpreservation.gov
- Guidelines document referenced on related Wikipedia pages
- Guidelines referenced in FDD (format description document) for PDF/A-3 (http://www.digitalpreservation.gov/formats/fdd/fdd000360.shtml)
- Guidelines in use or recommended by NDSA participating organizations or others
- Publication at other conferences/other journals
Questions to Ask and Answer
- Talk about background (what is PDF/A-3 and how is it different from earlier versions of PDF/A?) (Butch, plus Caroline's 2-pager)
- Enumerate categories of materials, use cases, and concrete examples where it makes sense to use A-3, and other categories where it doesn't. Example: if you're sending a video file, don't put it in a PDF! If you had a certain kind of journal article with a static version of a spreadsheet in the document but a malleable version embedded, perhaps that argues for it. (Don, Kevin, Kate)
- Risks of the format (scenarios in which this might be bad, and why) (Sheila)
- Possibilities of the format (scenarios in which this might be good, and why) (Chris)
- Have a list of defined terms in our document. How do these relate to the terms in the ISO spec? Leverage the NDSA Levels of Preservation glossary. Link to glossary.
PDF/A-3 Use Case Scenarios
A Template might include:
- Actors
- Actions
- Example: Federal agency with a document management system puts an MPEG video file (and nothing else) into a PDF/A-3 file to store and then, later, to submit as an SIP (Submission Information Package) to NARA for long-term management.
- Example: Publisher has a text-only article and puts it into a PDF/A-3 file, even though, in the past, the publisher used PDF/A-2. The article is then sent to a library where it will be preserved for the long term.
- Example: Publisher has an article that includes a complicated table, "frozen" in place, and puts it into a PDF/A-3 file, along with the Excel file from which the table was generated, in order to make it easier for a future researcher to have a malleable version of the table for use when writing another article on the same subject.
- Example: Data creator has a digital map, a report, a database, digital photos, and detailed metadata that comprise a whole and wants to archive these together for the long term.
- Example from Luratech Webinar used to show primary intent of PDF/A-3: PDF/A document with diagram based on data, with embedded spreadsheet associated with diagram, metadata associated with subsection of document, source word-processing file, and audio rendering of the document (perhaps for accessibility).
- Use case #1 from Luratech Webinar: Scanned documents, with the scanned image as the main PDF/A content and native metadata embedded as XML.
- Use case #2 from Luratech Webinar: "Hybrid archiving", used when a document is still in its active life cycle and further versions might be created. Create a PDF/A-3 as the archive-ready rendition and embed the document in its native (e.g., word-processor) format. Built into a standard workflow, this would leave documents "archive ready" at all times.
- Use case #3 from Luratech Webinar: Human-readable invoice with embedded data marked up in CEN Core Invoice Standard (XML).
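The Actors/Actions template above could be captured as a small data structure. The following is a minimal Python sketch, not part of the working group's materials: the class names `UseCase` and `EmbeddedFile`, the `main_content` field, and the `is_wrapper_only` check are all hypothetical, and the listed actions paraphrase the examples rather than quote any agreed wording. It only illustrates how two of the scenarios might be recorded and compared.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EmbeddedFile:
    """A file carried inside the PDF/A-3 container (hypothetical model)."""
    description: str
    media_type: str  # e.g. "video/mpeg"


@dataclass
class UseCase:
    """One scenario recorded with the Actors/Actions template."""
    title: str
    actors: List[str]
    actions: List[str]
    # What the PDF pages themselves present; None if the PDF is only a container.
    main_content: Optional[str] = None
    embedded: List[EmbeddedFile] = field(default_factory=list)

    def is_wrapper_only(self) -> bool:
        """True when the PDF has embedded files but no document content of its own."""
        return self.main_content is None and len(self.embedded) > 0


# The publisher example: a static table in the article, plus the source spreadsheet.
table_article = UseCase(
    title="Article with frozen table and source spreadsheet",
    actors=["Publisher", "Future researcher"],
    actions=["Create PDF/A-3 of the article", "Embed the Excel file behind the table"],
    main_content="Article text with static table",
    embedded=[EmbeddedFile("Excel file behind the table", "application/vnd.ms-excel")],
)

# The federal agency example: an MPEG video and nothing else.
video_only = UseCase(
    title="Video file wrapped for later submission",
    actors=["Federal agency", "NARA"],
    actions=["Embed MPEG video in PDF/A-3", "Store", "Submit as SIP"],
    main_content=None,
    embedded=[EmbeddedFile("MPEG video", "video/mpeg")],
)

for case in (table_article, video_only):
    print(case.title, "| wrapper only:", case.is_wrapper_only())
```

Separating `main_content` from `embedded` makes it easy to tell scenarios where PDF/A-3 supplements a document from scenarios where it serves purely as a wrapper, which is the distinction the questions above turn on.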
Members
- Caroline Arms, Library of Congress (caar@loc.gov)
- Don Chalfant, NARA (Donald.Chalfant@nara.gov)
- Kevin DeVorsey, NARA (Kevin.DeVorsey@nara.gov)
- Chris Dietrich, National Park Service (chris_dietrich@nps.gov)
- Carl Fleischhauer, Library of Congress (cfle@loc.gov)
- Butch Lazorchak, Library of Congress (wlaz@loc.gov)
- Sheila Morrissey, Ithaka (Sheila.Morrissey@ithaka.org)
- Kate Murray, Library of Congress (kmur@loc.gov)
Calls and Notes
Call information:
- Call-in toll-free number (US/Canada): 866-469-3239
- Participant access code: 21408589
Next call: November 11, 2013 at 1:00pm ET
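Notes:
- November 1, 2013 Call
- August 8, 2013 Call
- May 31, 2013 Call
- March 29, 2013 Call
- March 25, 2013 Call
- February 19, 2013 Call
- January 22, 2013 Call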
Background Materials
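- Library of Congress Sustainability of Digital Formats DRAFT PDF/A-3 format description document (FDD) (http://www.digitalpreservation.gov:8081/formats/fdd/fdd000360.shtml). Comments please to caar@loc.gov and cfle@loc.gov
- Blog Post on PDF/A-3 on the Signal (http://blogs.loc.gov/digitalpreservation/2012/11/all-in-embedded-files-in-pdfa/)
- Sheila M. Morrissey, The Network is the Format: PDF and the Long-term Use of Digital Content, Archiving 2012, pg. 200-203 (2012) (http://www.portico.org/digital-preservation/wp-content/uploads/2012/12/Archiving2012TheNetworkIsTheFormat.pdf)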
- Ithaka comments on ISO 19005-3 draft
- Caroline's thoughts on PDF/A-3 circulated in late November, 2012
- Video of Webinar by Luratech on PDF/A-3 (http://www.youtube.com/watch?v=g-tJRSsZHyc), Nov 8, 2012. Includes use cases and demos.
- Slides used for Luratech Webinar, Nov 8, 2012. Includes use cases and demos. Do not distribute.
- Digital Preservation Coalition (DPC) 2013-03-13 Workshop on PDF/A-3 (http://www.dpconline.org/events/details/55-DPC_PDFA3_briefing?xref=58). Includes links to presenters' slides and to William Kilbride's comments
- Unofficial XMP notes from 2011 explorations by Caroline Arms -- Please do not distribute
  - Notes on XMP and tools available to LC
  - Summary of exploration of XMP use external to LC
Possible future actions
- Set up future call with Duff Johnson of the PDF Association
- Track PDF Validator activity
- Once charter is reviewed by main NDSA Standards Group, extend participation call to
- Set up calls with Steve Levenson (U.S. Courts) and Leonard Rosenthol (Adobe)
- Extend invitation to join beyond active NDSA participants, e.g. to LC staff involved in Best Edition choices.