NDSA: August 21, 2013 Meeting Minutes


Attendees

  • Bailey, Jefferson | Metropolitan New York Library Council
  • Grotke, Abbie | Web Archiving Team Lead, Library of Congress, and Co-Chair of the NDSA Content Working Group | abgr@LOC.GOV | 202-707-2833 | @agrotke
  • Hartman, Cathy | Associate Dean of Libraries, University of North Texas/ Co-Chair of the NDSA Content Working Group | cathy.hartman@UNT.EDU
  • McCain, Edward | University of Missouri | mccaing@missouri.edu
  • McAninch, Glen | Kentucky Department for Libraries and Archives | Glen.McAninch@ky.gov
  • McMillan, Gail | Virginia Polytechnic Institute and State University | gailmac@vt.edu
  • Moffatt, Christie | National Library of Medicine | moffattc@mail.nlm.nih.gov
  • Rudersdorf, Amy | Digital Public Library of America | amy@dp.la
  • Stoller, Michael | New York University | Michael.stoller@NYU.EDU
  • Taylor, Nicholas | Stanford University Libraries
  • Wurl, Joel | National Endowment for the Humanities | jwurl@neh.gov

Agenda

Brainstorm of:

  • what questions we'd like to repeat from the 2011 survey
  • topics/issues that were brought up in survey that we might want to delve deeper into
  • new questions we might ask

Discussion Notes

Kristine Hanna (IA) couldn't join us today but submitted these comments to Abbie, which she shared with group:

1) I think it would be extremely helpful to see the progress organizations have made in the last two years. Sort of an "are you better or worse off than in 2011" type of polling.

2) I keep hearing over and over again the need to understand internal workflows, skill sets required, and resources needed to initiate and sustain a web archiving program. And the answers might be different for a one-person shop than for a five-person team. Or they may be the same. Perhaps we could have more in-depth questions around this area.

General Comments

  • Michael found it comforting, in reading the original survey, that the things his organization was struggling with, others were struggling with too.


What to repeat

We didn't get too much into this, but talked about problematic questions that we might drop or restructure for this year. Jefferson reported that in analyzing results, respondents seemed to have trouble with questions 13-15 (those about "what subjects are in your archives"); we discussed possibly reformulating these questions, but no solutions were proposed yet. There was interest in keeping some questions about news, media, and journalism. Maybe we can tie those to policy questions regarding certain types of content?


Possible New Areas To Explore

WORKFLOWS - would be helpful to learn more about what workflows people have in place for acquisition of web content. Multiple choice might be hard; descriptive open-ended comment field? (see additional workflow comment in metadata below)

STAFFING/SKILLSETS - All agreed that questions about staffing would be useful. Jefferson suggested we look at the Infrastructure WG's staffing survey to see how they asked questions about this (full time vs. part time, who is doing what? what skills do they need? who is selecting what is being captured?).

METADATA - In Q24 we ask about whether people do catalog records, but didn't get into more detail. Would be good to inquire about how you structure descriptions, what fields are used, and what formats (MODS, Dublin Core, etc.). What data is auto-generated/extracted from the archive vs. manually created or edited? What percent of auto-generated data needs to be corrected? What is the workflow for metadata creation? Are there difficulties reconciling descriptions with existing standards?
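
To make the "what fields, what formats" question concrete, here is a minimal sketch of a Dublin Core description for an archived website, built with Python's standard library. All field values are invented for illustration and do not reflect any particular respondent's practice.

    # A minimal sketch of a Dublin Core record for an archived website.
    # All values below are hypothetical, for illustration only.
    import xml.etree.ElementTree as ET

    DC = "http://purl.org/dc/elements/1.1/"
    ET.register_namespace("dc", DC)

    record = ET.Element("record")
    for name, value in [
        ("title", "Example Agency Website (archived)"),
        ("creator", "Example Agency"),
        ("date", "2013-08-21"),                  # capture date
        ("format", "application/warc"),          # container format of the crawl
        ("identifier", "https://example.org/"),  # seed URL
        ("description", "Periodic crawl of the agency's public website."),
    ]:
        el = ET.SubElement(record, f"{{{DC}}}{name}")
        el.text = value

    print(ET.tostring(record, encoding="unicode"))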

METRICS - How are people counting their data? How much do they have (volume/size of data)? TB, URLs archived, other metrics in use?
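
As an illustration of what such counting can look like in practice, below is a rough Python sketch that tallies total crawl size from a directory of WARC files and unique URLs from a CDX index. The paths are hypothetical, and the URL-field position assumes the common classic CDX layout, where the original URL is the third whitespace-separated field.

    # Rough sketch: archive size in TB plus unique URLs captured.
    # "warcs/" and "index.cdx" are hypothetical paths.
    from pathlib import Path

    warc_dir = Path("warcs")
    cdx_path = Path("index.cdx")

    total_bytes = sum(p.stat().st_size for p in warc_dir.glob("*.warc.gz"))

    unique_urls = set()
    with cdx_path.open() as cdx:
        for line in cdx:
            if line.startswith(" CDX"):  # header line in classic CDX files
                continue
            fields = line.split()
            if len(fields) >= 3:
                unique_urls.add(fields[2])  # the 'original URL' field

    print(f"Archived data: {total_bytes / 1e12:.2f} TB")
    print(f"Unique URLs captured: {len(unique_urls)}")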


Expand on topics from 2011 Survey

TOOLS (Besides crawling and access): Curator tools, supplemental capture tools, other non-crawling modes of acquiring content.

COLLECTION/SELECTION POLICIES: We asked about policies before in Q8-9-10 (links to #9 responses are only available to NDSA members, not in the final report). How are you making decisions about frequency and depth of collections? Is it budgetary or curatorial? Are your collections related to existing (print?) collections, or are they new and different things?

PRESERVATION METHODS - Glen found the section on downloading copies/transferring data (Q20) interesting; he would like this asked again, but also to go into what preservation methods are used: checksums, validation of files, etc.
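
For reference, the checksum/validation step Glen mentions is often just digest-and-compare. A minimal Python sketch, assuming a sha256sum-style manifest file (a hypothetical "manifest.txt"), might look like:

    # Fixity check sketch: recompute SHA-256 digests and compare against
    # a stored manifest. File names are hypothetical.
    import hashlib
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        """Stream the file through SHA-256 so large WARCs aren't loaded into memory."""
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # manifest.txt: one "<hexdigest>  <filename>" line per file (sha256sum style)
    for line in Path("manifest.txt").read_text().splitlines():
        expected, name = line.split(None, 1)
        name = name.strip()
        status = "OK" if sha256_of(Path(name)) == expected else "FIXITY FAILURE"
        print(f"{status}: {name}")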

ACCESS: expand on questions

ROBOTS POLICIES: Expand Q28 re: robots.txt to get more details. Nicholas said open-ended comments about this hint at some possible checkbox options for not respecting robots: 1) the organization owns the copyright; 2) we seek permissions, so we ignore robots; 3) discretionary (if this, ask for details on why it would be ignored or not).
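
For background, "respecting robots" amounts to consulting the site's robots.txt before each fetch. A small sketch using Python's standard-library parser (the crawler name and URLs are hypothetical):

    # Checking robots.txt before capture, via the standard library.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.org/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt

    # An archive ignoring robots (options 1-3 above) would skip this check.
    if rp.can_fetch("ExampleArchiveBot", "https://example.org/reports/2013.html"):
        print("robots.txt permits capture")
    else:
        print("robots.txt disallows capture")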

PERMISSIONS POLICIES: Get more granularity on permissions policies - has something changed in your policies since the ARL guidance was issued? Are people relying on embargoes (and what time frames)? Are there other external policies that influence your approach?

RESEARCH USE / USAGE STATISTICS - A bit along the lines of researcher use in the original survey, we wonder about usage statistics that people might be gathering, how they are doing this, and what types of "hits" they are getting. We did note that Archive-It doesn't currently track this, but a future release will make that easier. We discussed how Q25 was open-ended and many of the answers were "we're not sure, too soon to tell."