NDSA:March 29, 2013 Call

Agenda

Discussion with Stephen Levenson, an IT Specialist for Policy and Planning at the Office of the US Courts and the chair of the PDF/A working group.

Participants

Don Chalfant, Kate Murray, Sheila Morrissey, Kevin DeVorsey, Chris Dietrich, Carl Fleischhauer, Stephen Levenson, Butch Lazorchak

Meeting Notes

Stephen on how standards are implemented in the ISO community.

PDF/A-3 doesn't necessarily replace A-1 or A-2. Should be able to use a PDF/A-1 file 30 years from now. The methodology used should not change for rendering these files in the future.

The PDF/A movement has been highly influenced by manufacturers and now by the PDF/A center, which is dominated by the Germans. They had many use cases where creators wanted to include the original files within a PDF/A document.

Brazilian government wanted to preserve their material as XML but XML wasn't trusted by users because of complexity. Wanted to make a more presentation-ready format but didn't want to throw away the XML.

U.S. Courts, bankruptcy court, claims: when an individual goes into court and the claims are laid out, we print a document for someone so that they can assert the claim. That's what they bring back to court to assert their claim. Ginnie Mae started this and Mastercard is also moving on this. We get the PDF but then have to reenter the data from this document in their case management systems.

We're putting an XML output of what the claim represents inside the PDF document. Ginnie Mae's automated processes work on this XML.

With PDF/A-3 we have downstream functions that leverage the inner materials. Adobe's server product does not currently output A-3 files.

When you look at an A-3 document in Adobe Reader it will render the

Chris D.: How will hidden content be protected from certain readers of the document?

Steve: For A-3 you can still include the information as "private data" that would make it hidden.

Caroline:

Steve: A conforming reader would recognize it as A-3 and set up an additional dialog.

Caroline: We're in the position of having to preserve files that somebody else created. We need tools to characterize files. Some PDF/A-3 files may not have any embedded content so would actually behave like a PDF/A-2. But would the

Steve: We'd have to talk to the developers about that. There is a vendor that is looking to create an independent service to validate the writers to ensure that they are actually complying with the standards. The software would have to get certified that it works. Then we'd actually have validators at ingestion. We have to get somebody interested in creating this software as a business and we think we finally have somebody who will do this.

DOD standard 5015.2 says that if you're going to be a document management system you have to do certain things. The system has to be sent to a testing center at Fort Huachuca in AZ to ensure that it conforms to DOD 5015.2. This validator would do the same kind of thing.

Right now we have no validator that says a Word document is actually a Word document. There are a lot of bad writers out there.
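
As a rough illustration of the kind of characterization tool Caroline asks for, the sketch below reads the PDF/A level that a file claims in its XMP metadata (the `pdfaid:part` and `pdfaid:conformance` properties). It assumes the XMP packet is stored as plain text in the file, and the function name `claimed_pdfa_level` is hypothetical. It reports only what the writer declared, which, given the point about bad writers, is not proof of conformance.

```python
import re

# The PDF/A identification properties may be written as XML elements
# (<pdfaid:part>3</pdfaid:part>) or as attributes (pdfaid:part="3").
_PART = re.compile(
    rb"""<pdfaid:part>\s*(\d+)\s*</pdfaid:part>|pdfaid:part\s*=\s*["'](\d+)["']"""
)
_CONF = re.compile(
    rb"""<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>"""
    rb"""|pdfaid:conformance\s*=\s*["']([A-Za-z])["']"""
)


def claimed_pdfa_level(data: bytes):
    """Return the PDF/A level the file claims (e.g. '3B'), or None if no claim is found.

    Assumption: the XMP packet is uncompressed and unencrypted, so a byte
    search can see it. This is a declaration by the writer, not a validation.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = (part.group(1) or part.group(2)).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += (conf.group(1) or conf.group(2)).decode("ascii").upper()
    return level
```

A caller would pass the raw bytes of a file, for example `claimed_pdfa_level(open(path, "rb").read())`. Whether a claimed A-3 file actually carries embedded content would still need a real PDF parser.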

Kevin: We're working on the policy and guidance side. We ask people to keep temporary, permanent and non-record material separate from each other. Does PDF/A-3 run counter to that? Might it encourage people to mix record and non-record material together in the same file?

Steve: If there's a relationship, don't you need that for provenance information?

Kevin: We need to educate our folks.

Steve: We've been dealing with current technologies on these things and who knows what technology will afford us in the future. We'll be able to hedge our bets.

Sheila: The question to me is what is the relationship between the PDF/A-3 container and the embedded XML. That is, in the Brazilian example, which one has the force of law? And how do you ensure that they say the same thing? In Germany they're planning on using this for commerce and the validation of invoices, but processing things en masse pragmatically means that you're going to look at one or the other. What warrant is there that they're going to stay the same? The Germans said that the "embedded content" has no standing. Only the archival version has standing.

Steve: I can talk about legal. We would assume that the entire document was the legal evidence. In the case of dual content, we would go by the "best evidence" rule. If the company put a courtesy copy inside, then it's the main document that is their record. For example, if you requested an invoice and you received what you might see in a PDF versus XML data, then that's the evidence.

Take folks coming in to a reading room. We may have to set up rules at ingestion: they could either strip it out and put the file on a diet, or store the other content. We, in the committee, didn't want to dictate to preservationists how to do their job.

If NARA said they didn't want a part of A-3 then the agencies shouldn't store in A-3. In our PACER system you can pull down a PDF document but stored inside is an MP3 file that allows you to understand the provenance a little more. The PDF is the metadata around the MP3 file. These are temporary records, so it's not the same issues.

Chris: So the PDF is acting as a manifest for the MP3 file. There are no validators. If someone is processing a bunch of files into PDF and embedding an XML version, something could go wrong and you embed the wrong versions of the XML. There's no way to validate that the right coordinated content was embedded.

Steve: Archivists are going to have to get more involved in advising creators on the types of files they create.
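
One partial mitigation for Chris's concern, not something proposed on the call, would be for the creator to record a digest of the XML it intends to embed and for the receiving institution to compare it at ingest. The sketch below is a minimal illustration under that assumption. The function names are hypothetical, and extracting the embedded bytes from the PDF is left out. A matching digest shows only that the intended XML was embedded, not that the XML and the rendered PDF say the same thing.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as lowercase hex."""
    return hashlib.sha256(payload).hexdigest()


def embedded_matches_manifest(embedded_xml: bytes, recorded_digest: str) -> bool:
    """Check extracted embedded bytes against a digest recorded at creation time.

    Hypothetical workflow: the creator stores sha256_hex(xml) in a manifest;
    at ingest, the embedded file is extracted and compared against it.
    """
    return sha256_hex(embedded_xml) == recorded_digest.strip().lower()
```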