NDSA:Web Archiving Survey

Revision as of 11:16, 12 September 2011

The goals of the survey are to find out the scope of web content collecting in the United States: what organizations' collection development policies state (if they have one), what they are actually collecting, and which services they use to archive, among other things.

If anyone is interested in helping develop this survey, contact Abbie Grotke (abgr@loc.gov).

Draft Survey Questions

Organization information (name, URL, contact, etc.)
- Type of organization:
  - Historical Society
  - College or University
  - Museum
  - Public Library
  - Consortium
  - K-12 School
  - Federal Government
  - State Government (including Archives, state records centers, or Libraries)
  - City Government
  - County Government
  - Other (please describe)
Are you a member of:
- NDSA
- IIPC [are there others?]
What year did you begin archiving?
Are you using an external service or company to archive, or crawling in-house?
- If using an external service, which one? (Archive-It, IA's crawling services, Hanzo, Iterasi, other) [what is IA's big crawling service called officially?]
- If you are crawling in-house, what crawling tools are you using? (Heritrix, HTTrack, other)
Does your organization have collection policies that cover web archiving?
- Are these publicly accessible (provide URL)?
- If not, but you are willing to share them with NDSA members, please email them to ndsa@loc.gov with the subject line: Web Archiving Survey Selection Policy
Do you use web archiving primarily to (a) archive your own website as a type of institutional record, (b) archive content from other organizations for future research use, or (c) both? [with an option for comments/description]
Scope of collecting, various Qs about what they archive (allow for comments or description):

What are the general categories of content that you collect?

- Arts & Humanities (Dance, Music, Art, Literature, Film, Television, etc.)
- Blogs and Social Media
  - Blogs
  - Facebook
  - Twitter
  - YouTube and other video
  - All of above as part of regular collecting of websites
  - Other
- Computers and Technology (software, gaming sites, etc.)
- Government
  - Federal Government
  - State Government
  - County Government
  - City Government
- Spontaneous Events, for example: natural disasters, tragedy, environmental events, spontaneous political demonstrations
- Politics and Elections
  - Local elections
  - State elections
  - Federal elections
- Science
- Health
- Society and Culture
- Sports/Sporting Events
- Universities or Colleges
- News
  - Newspapers
  - Citizen Journalism/Community News
  - Broadcast/Television
- Non-U.S./International content
Please describe the scope of your collections.
Permissions/Copyright
- Do you ask permission to crawl? (always, never, sometimes (depends on the content))
- Do you ask for display (access) permission? (always, never, sometimes (depends on the content))
- Do you respect robots.txt when crawling? (always, never, sometimes)
- Describe [comments box to explain or further describe any of these]
Access
- What access tool do you use for viewing Web archives?
- Do you do full text indexing? [yes, for testing only; yes, researchers can utilize; no]
- Public access URL:
Researchers (do we want any questions on research use?) (Yes. Could ask in an open-ended way how researchers are using the content.)
Have you ever participated in a collaborative web archive? (yes/no; give examples)
- If so, describe your role and the project
Are you interested in collaborating on future projects?

Distribution

NDSA/NDIIPP listservs/blog/twitter, etc.
IIPC Curators list
Archive-It list
ALA groups (need to find people we don't normally talk to)
