NDSA: August 21, 2013 Meeting Minutes

Attendees

  • Bailey, Jefferson | Metropolitan NY Library Council
  • Grotke, Abbie | Web Archiving Team Lead, Library of Congress, and Co-Chair of the NDSA Content Working Group | abgr@LOC.GOV | 202-707-2833 | @agrotke
  • Hartman, Cathy | Associate Dean of Libraries, University of North Texas/ Co-Chair of the NDSA Content Working Group | cathy.hartman@UNT.EDU
  • McCain, Edward | University of Missouri | mccaing@missouri.edu
  • McAninch, Glen | Kentucky Department for Libraries and Archives | Glen.McAninch@ky.gov
  • McMillan, Gail | Virginia Polytechnic Institute and State University | gailmac@vt.edu
  • Moffatt, Christie | National Library of Medicine | moffattc@mail.nlm.nih.gov
  • Rudersdorf, Amy | Digital Public Library of America | amy@dp.la
  • Stoller, Michael | New York University | Michael.stoller@NYU.EDU
  • Taylor, Nicholas | Stanford University Libraries
  • Wurl, Joel | National Endowment for the Humanities | jwurl@neh.gov

Agenda

Brainstorm of:

  • what questions we'd like to repeat from the 2011 survey
  • what topics/issues brought up in the survey we might want to delve deeper into
  • new questions we might ask

Discussion Notes

Kristine Hanna (Internet Archive) couldn't join us today but submitted these comments to Abbie, which she shared with the group:

1) I think it would be extremely helpful to see the progress organizations have made in the last two years. Sort of an "are you better or worse off than in 2011?" type of polling.

2) I keep hearing over and over again the need to understand internal workflows, skill sets required, and resources needed to initiate and sustain a web archiving program. And the answers might be different for a one-person shop than for a five-person team. Or they may be the same. Perhaps we could have more in-depth questions around this area.

General Comments

  • Michael found it comforting, in reading the original survey, that others were struggling with the same things his organization was.


What to repeat

We didn't get too much into this, but we talked about problematic questions that we might drop or restructure this year. Jefferson reported that in analyzing the results, respondents seemed to have trouble with questions 13-15 (those about "what subjects are in your archives"). We discussed possibly reformulating these questions, but no solutions have been proposed yet. There was interest in keeping some questions about news, media, and journalism. Maybe we can tie those to policy questions regarding certain types of content?


Possible New Areas To Explore

WORKFLOWS - It would be helpful to learn more about what workflows people have in place for acquisition of web content. Multiple choice might be hard; a descriptive open-ended comment field? (See the additional workflow comment under METADATA below.)

STAFFING/SKILLSETS - All agreed that questions about staffing would be useful. Jefferson suggested we look at the Infrastructure WG's staffing survey to see how they asked questions about this (full time vs. part time, who is doing what, what skills do they need, who is selecting what is being captured?).

METADATA - In Q24 we asked whether people create catalog records, but didn't get into more detail. It would be good to inquire about how you structure descriptions, what fields are used, and what formats (MODS, Dublin Core, etc.). What data is auto-generated/extracted from the archive vs. manually created or edited? What percent of auto-generated data needs to be corrected? What is the workflow for metadata creation? Are there difficulties reconciling descriptions with existing standards?
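
For illustration, a minimal Python sketch of a Dublin Core style description for one archived site, distinguishing auto-generated from manually created fields; all field choices and values here are hypothetical examples, not established practice:

  # A Dublin Core style record for one archived website (hypothetical values).
  record = {
      "dc:title": "Example Advocacy Group Website",  # manual, or extracted from the page <title>
      "dc:identifier": "http://example.org/",        # seed URL, auto-generated
      "dc:date": "2013-08-21",                       # capture date, auto-generated
      "dc:format": "application/warc",               # container format, auto-generated
      "dc:subject": "public health",                 # manually assigned
      "dc:description": "Captured as part of a thematic collection.",  # manually written
  }
  for field, value in record.items():
      print(f"{field}: {value}")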

METRICS - How are people counting their data? How much do they have (volume/size of data)? TB, URLs archived, other metrics in use?
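
For illustration, a minimal Python sketch of deriving two such metrics (unique URLs and total bytes) from a CDX index file; the path "index.cdx" is hypothetical, and CDX field positions vary by deployment, so the field indexes below are assumptions to adapt locally:

  # Tally unique URLs and archived volume from a CDX index (classic 11-field layout assumed).
  total_bytes = 0
  urls = set()
  with open("index.cdx") as cdx:
      for line in cdx:
          if line.startswith(" CDX"):  # header line naming the fields
              continue
          fields = line.split()
          if len(fields) < 9:
              continue
          urls.add(fields[2])             # assumed: original URL in the third field
          if fields[8].isdigit():         # assumed: compressed record size in the ninth field
              total_bytes += int(fields[8])
  print(f"{len(urls)} unique URLs, {total_bytes / 1e12:.2f} TB archived")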


Expand on topics from 2011 Survey

TOOLS (Besides crawling and access): Curator tools, supplemental capture tools, other non-crawling modes of acquiring content.

COLLECTION/SELECTION POLICIES: We asked about policies before in Q8-10 (links to the Q9 responses are available only to NDSA members, not in the final report). How are you making decisions about frequency and depth of collections? Is it budgetary or curatorial? Are your collections related to existing (print?) collections, or are they new and different things?

PRESERVATION METHODS - Glen found the section on downloading copies/transferring data (Q20) interesting; he would like this asked again, but also to go into what preservation methods are used: checksums, validation of files, etc.
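
For illustration, a minimal Python sketch of the checksum step discussed, writing a fixity manifest for a directory of WARC files; the directory and manifest names are hypothetical:

  # Compute SHA-256 fixity values for archived files and record them in a manifest.
  import hashlib
  from pathlib import Path

  def sha256_of(path, chunk_size=1 << 20):
      """Hash a file in chunks so large WARC files need not fit in memory."""
      h = hashlib.sha256()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(chunk_size), b""):
              h.update(chunk)
      return h.hexdigest()

  archive_dir = Path("archive_dir")  # hypothetical location of the WARC files
  with open("manifest.txt", "w") as out:
      for warc in sorted(archive_dir.glob("*.warc.gz")):
          out.write(f"{sha256_of(warc)}  {warc.name}\n")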

ACCESS: expand on the access questions from the original survey.

ROBOTS POLICIES: Expand Q28 re: robots.txt to get more details. Nicholas said the open-ended comments about this hint at some possible checkbox options for not respecting robots: 1) the organization owns the copyright; 2) we seek permissions, so we ignore robots; 3) discretionary (if this, ask for details: why would it be ignored or not?).
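
For illustration, a minimal Python sketch of a robots.txt check using the standard library; the user agent string and URLs are hypothetical examples, and an organization's policy (respect, ignore, seek permission) would decide what to do with the answer:

  # Check whether robots.txt permits capturing a given page.
  from urllib.robotparser import RobotFileParser

  rp = RobotFileParser()
  rp.set_url("https://example.org/robots.txt")
  rp.read()  # fetch and parse the site's robots.txt

  if rp.can_fetch("ExampleArchiveBot", "https://example.org/some/page.html"):
      print("robots.txt permits capture")
  else:
      print("robots.txt disallows capture; apply local policy")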

PERMISSIONS POLICIES: Get more granularity on permissions policies - has something changed in your policies since the ARL guidance was issued? Are people relying on embargoes (and what is the time frame for those)? Are there other external policies that influence your approach?

RESEARCH USE/USAGE STATISTICS - A bit along the lines of researcher use in the original survey, we wonder about usage statistics that people might be gathering, how they are doing this, and what types of "hits" they are getting. We did note that Archive-It doesn't currently track this, but a future release will make that easier. We discussed how Q25 was open-ended and many of the answers were "we're not sure, too soon to tell."
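
For illustration, a minimal Python sketch of one way such "hits" could be tallied from a replay server's access log in Common Log Format; the log path and the "/wayback/" path prefix are hypothetical and would differ by replay system:

  # Count hits to archived pages from an access log (Common Log Format assumed).
  from collections import Counter

  hits = Counter()
  with open("access.log") as log:
      for line in log:
          parts = line.split('"')
          if len(parts) < 2:
              continue
          request = parts[1].split()  # e.g. ["GET", "/wayback/2013.../http://...", "HTTP/1.1"]
          if len(request) >= 2 and request[1].startswith("/wayback/"):
              hits[request[1]] += 1

  for path, count in hits.most_common(10):
      print(count, path)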