NDSA:January 22, 2013 Call

From DLF Wiki

Agenda

None in advance

Participants

Kevin DeVorsey, Don Chalfant, Kate Murray, Sheila Morrissey, Carl Fleischhauer, Caroline Arms, Butch Lazorchak

Meeting Notes

Discussion on PDF as a "wrapper" and what that means for this effort.

What is the "use case" where someone would include embedded files in a PDF/A-3 document but NOT feel that information was important to preserve?

Example: Epub3 packaging. Define behaviors for handlers of different kinds of content. This is a "less encapsulated" container than PDF/A-3.

Complexity=risk. Hard to say that we'd ever get to the point where we'd differentiate between "record" material and "non-record" material within a single file.

The downstream uses of content include not only recreating the user experience of 2013 but also manipulating an entire corpus of files. This is different from the preservation of the original user experience.

SIP-based formats for bundling things together are more flexible.

Embedding things in a PDF with such a limited description of what they are is troubling. You're forced to provide a MIME type, but that's probably not good enough. That should be documented as part of this analysis.

Are more people using XML/SGML now? Get these from almost every single publisher, but largely get header/abstract as opposed to full text. The page image becomes the full intellectual content of the document for something like 20 million journal articles.

The use of tablets and phones is putting pressure on PDF as a format.

Scholarly communication: just what exactly is a publication? Many more things are out in the universe, not necessarily XML. Adobe's suite of tools now knows how to make ePub.

Are there some narrowly defined uses where PDF/A-3 would be useful? For example, redundant information: the spreadsheet that represents the frozen information in the PDF.

Imagining the use cases is one of the outputs of this group. Not necessarily outright prohibition but to articulate the positive side of it.

In academic circles, making data available as well as the conclusions. Some of these packaged up presentations would make it easier to put together the data with the conclusions. Anything in the profile that would support the effort to share data and make it available?

Supplementary materials: not the main body of an article but ancillary and enriching materials that support but are not essential to the original document.

Embedding in a PDF for an article is not a solution for preserving the data. The data should be preserved in an archive so that it is available for people searching across datasets. Embedding is appropriate for distributing to the immediate generation of readers.

Would need more metadata in the PDF/A to describe the embedded materials.

We just need to articulate the terrain really well, not solve all the problems.

The relationship between the embedded objects and the PDF/A-3 wrapper has not been articulated clearly enough. This is an area we could provide some guidance on describing the materials to a much greater degree.
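
To make the description gap concrete, here is a minimal sketch of what a fuller record for one embedded file might look like. The EmbeddedFileRecord class, its field names, and the description_gaps check are hypothetical and come from no specification or tool. The relationship vocabulary is our understanding of the values PDF/A-3 allows for an associated file (the AFRelationship key) and should be checked against the standard.

```python
from dataclasses import dataclass
from typing import List
import hashlib

# Relationship values as we understand PDF/A-3 to define them for
# associated files (AFRelationship); verify against the standard.
AF_RELATIONSHIPS = {"Source", "Data", "Alternative", "Supplement", "Unspecified"}


@dataclass
class EmbeddedFileRecord:
    """Hypothetical descriptive record for one file embedded in a PDF/A-3."""
    filename: str
    mime_type: str
    relationship: str = "Unspecified"
    description: str = ""
    payload: bytes = b""

    def sha256(self) -> str:
        # Fixity value computed over the embedded bytes.
        return hashlib.sha256(self.payload).hexdigest()

    def description_gaps(self) -> List[str]:
        # List the ways this record under-describes the embedded file.
        gaps = []
        if self.relationship not in AF_RELATIONSHIPS:
            gaps.append("unknown relationship: %r" % self.relationship)
        elif self.relationship == "Unspecified":
            gaps.append("relationship to the host document is unspecified")
        if not self.description.strip():
            gaps.append("no human-readable description")
        if self.mime_type == "application/octet-stream":
            gaps.append("MIME type is generic")
        return gaps


# A file embedded with only the bare minimum of description.
bare = EmbeddedFileRecord("results.bin", "application/octet-stream", payload=b"a,b\n1,2\n")

# The same bytes with the relationship and a description supplied.
described = EmbeddedFileRecord(
    "results.csv",
    "text/csv",
    relationship="Data",
    description="Values plotted in the article's figures",
    payload=b"a,b\n1,2\n",
)

for record in (bare, described):
    print(record.filename, record.description_gaps())
```

The bare record reports three gaps and the described one reports none. Even the described record says nothing about how the data was collected, so a check like this only covers the relationship to the wrapper, not fitness for reuse.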

For data to be useful for reuse or validation, you need a lot of quite specific information about the data and how it was collected. More than just a spreadsheet of values. For this reason the embedded data should not be thought of as preserved just because it's been embedded.

Can we stipulate the requirements that people would put in their archival documents? The history to date of the use of embedded XMP metadata has not been entirely successful.

Is our hesitation to embrace PDF/A-3 based on our lack of tools? Other workflows have excellent tools but perhaps our workflow has poor tools?

PDF/A-3 might be better than the older ways to embed audiovisual content in a PDF. At least with A-3 what seems to be intended is that you have things that can live as independent files and they can be extracted as independent files. Conforming readers are supposed to be able to create copies of those embedded documents external to the PDF document.