NDSA:November 2 Blog Preservation Meeting Minutes

From DLF Wiki
Revision as of 15:18, 11 February 2016 by Dlfadm (talk | contribs) (21 revisions imported: Migrate NDSA content from Library of Congress)

November 2, 2011, 11am ET

Attendees

  • Anderson, Janice Snyder | Georgetown University Law Library | anderjan@law.georgetown.edu
  • Anderson, Martha | Director, NDIIPP, Library of Congress | mande@loc.gov
  • Baker, Timothy D. | Maryland State Archives | timb@MDSA.NET
  • Beers, Elizabeth | University of Michigan Library | embeers@umich.edu
  • Carpenter, Kris | Internet Archive | kcarpenter@archive.org
  • Chudnov, Dan | George Washington University | dchud@gwu.edu
  • Fallon, Tessa | Columbia University | taf2111@columbia.edu
  • Fino-Radin, Ben | Rhizome | ben.finoradin@rhizome.org
  • Grotke, Abbie | Library of Congress, Co-Chair of the NDSA Content Working Group | abgr@LOC.GOV
  • Hanna, Kristine | Internet Archive | kristine@ARCHIVE.ORG
  • Hartman, Cathy | University of North Texas, Co-Chair of the NDSA Content Working Group | cathy.hartman@UNT.EDU
  • Jones, Gina | Library of Congress | gjon@loc.gov
  • Johnston, Leslie | Library of Congress | lesliej@loc.gov
  • Moffatt, Christie | National Library of Medicine | moffattc@mail.nlm.nih.gov
  • Nacin, Andrew | WordPress | andrewnacin@gmail.com
  • Owens, Trevor | Library of Congress | trow@loc.gov
  • Potter, Abbey | Library of Congress | abpo@LOC.GOV
  • Reib, Linda | Arizona State Library, Archives, and Public Records | lreib@LIB.AZ.US
  • Schmitz Fuhrig, Lynda | Smithsonian Institution | SchmitzfuhrigL@si.edu
  • Smith, Stephanie | Maryland State Archives
  • Taylor, Nicholas | Library of Congress | ntay@loc.gov
  • Wurl, Joel | National Endowment for the Humanities | jwurl@neh.gov

(Attendees from NDSA member organizations in bold)

Agenda

Welcome/Introductions

Attendees introduced themselves and talked about their specific interests in this project.

Brief report on background of this idea

Abbie provided a quick report on how we got to this meeting (referring to the distributed blog proposal). She summed up the three ideas listed in that proposal, but explained that this meeting was to focus particularly on the "Flag for opt-in to preservation and harvesting" idea.

Discussion

Some of the questions received before the meeting included:

*If a blog owner opts-in, would that guarantee preservation?

The idea is that, most likely, yes: IA will crawl everything that site owners flag for preservation. Other NDSA members will be able to select from the list what they would like to include in their own archives. Multiple organizations may collect the same URLs (duplication is not a bad thing).

*Will blog owners expect backup services if they opt-in?

We talked quite a bit about making sure the purpose of the pilot is clear to site owners, and that this particular plugin is not meant to provide backup services. Ideas #1 and #2, about downloading a copy for personal backup, would most likely cover this sort of request.

*How do we get notified that blogs have opted-in? Are notices sent somewhere? Some other ideas:

    • A Google spreadsheet that gets auto-updated; preservationists could refer to and pick and choose what to preserve.
    • A machine-readable tag that could be used to auto-detect sites that have opted in (à la Creative Commons)
    • A feed of some sort?
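
As one possible illustration of the machine-readable tag idea, the sketch below checks a page for an opt-in marker. The tag name `preservation` and the value `opt-in` are assumptions invented for this example; the group did not settle on any tag format.

```python
from html.parser import HTMLParser

# Hypothetical marker; no tag name or value was agreed at the meeting.
META_NAME = "preservation"
OPT_IN_VALUE = "opt-in"


class _OptInMetaParser(HTMLParser):
    """Looks for <meta name="preservation" content="opt-in"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.opted_in = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        name_matches = attr_map.get("name", "").lower() == META_NAME
        value_matches = attr_map.get("content", "").strip().lower() == OPT_IN_VALUE
        if name_matches and value_matches:
            self.opted_in = True


def has_opted_in(html: str) -> bool:
    """Return True if the page carries the hypothetical opt-in meta tag."""
    parser = _OptInMetaParser()
    parser.feed(html)
    parser.close()
    return parser.opted_in
```

A crawler could call `has_opted_in` on a fetched home page before adding the site to a candidate list. A tag of this kind would need no central database, but it would only be discovered when a crawler happens to visit the site.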

Andrew demonstrated a mockup of what a simple plugin might look like, which is essentially just a "submit for preservation" button. The group discussed a number of options for what would happen upon submitting. The easiest approach is to have that data sent to an established URL to populate a database. More details on what we will do initially are below, in the proposal for moving ahead / next steps.

*How often/frequently would notifications or updates to whatever process we put in place occur?

We discussed frequency of archiving a bit but didn't go into great detail about the frequency of notifications. Basically this will occur in real time - as the site owner clicks to opt in, data will be sent to our NDSA database. If the site owner changes his/her mind and decides to STOP participating, that notice will also be sent. We need to make it very clear that if they OPT OUT after opting in, we will not DELETE their content already preserved. We will just stop archiving moving forward.
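
The opt-in and opt-out behavior described above could be sketched as follows. This is a minimal in-memory illustration, not a design the group agreed on: the `OptInRegistry` and `SiteStatus` names, the `opt_in`/`opt_out` action strings, and the capture identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class SiteStatus:
    opted_in: bool = False
    # Identifiers of captures already made (hypothetical stand-in for archived content).
    captures: List[str] = field(default_factory=list)
    # Every notice received, in arrival order.
    history: List[Tuple[str, datetime]] = field(default_factory=list)


class OptInRegistry:
    """In-memory stand-in for the database that receives plugin notices."""

    def __init__(self) -> None:
        self._sites: Dict[str, SiteStatus] = {}

    def notify(self, url: str, action: str) -> SiteStatus:
        """Record an opt-in or opt-out notice as soon as it arrives."""
        if action not in ("opt_in", "opt_out"):
            raise ValueError(f"unknown action: {action}")
        status = self._sites.setdefault(url, SiteStatus())
        status.opted_in = action == "opt_in"
        status.history.append((action, datetime.now(timezone.utc)))
        return status

    def record_capture(self, url: str, capture_id: str) -> bool:
        """Add a capture only while the site is opted in."""
        status = self._sites.get(url)
        if status is None or not status.opted_in:
            return False
        status.captures.append(capture_id)
        return True

    def get(self, url: str) -> SiteStatus:
        return self._sites[url]
```

The point of the sketch is that an opt-out only flips the `opted_in` flag: later captures are refused, while the existing `captures` list is left untouched.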

*Is a license/agreement needed?

We didn't really go into details about this specifically, but there were concerns about copyright and whether or not preservation could include comments and posts by people other than the site owner who installs the plugin. We discussed whether parts of the sites could be identified for archiving; this seems too difficult to manage.

If a site owner opts in, does that give us explicit right to archive even if the blog/site contains content produced by others? We discussed this topic. LC, when asking permission, lets the site owner tell us if they can't grant permission for all of the content; if they say they can't then we don't archive. For this project, we're assuming that if the site owner opts in, it's okay to preserve. We discussed providing some language that could be posted to their sites telling commenters/contributors that they are potentially being archived.

We will have text describing what they are opting in to within the plugin (or available via a link to somewhere describing it all).

*What sorts of information would we want from the blog owner besides permission? (category/subject? Frequency of change information? Other data?)

We discussed and came to agreement on this core set for now:

    • URL (will be sent automatically)
    • Title (will be scraped from site automatically)
    • Category (NDSA to come up with a list for site owner to select from)
    • Description (site owner to fill out)
    • Name/Contact information
    • Option to select a Creative Commons license to go along with it (based on how IA does it with donations to the archive)
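
As a rough illustration only, the core set above could be modelled as a single record that the plugin sends when a site owner opts in. The `OptInRecord` class, its field names, and the sample category list are assumptions made for this sketch; no schema or category list had been defined at the time of the meeting.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical placeholder categories; NDSA had not yet drawn up the real list.
EXAMPLE_CATEGORIES = ("arts", "government", "science", "other")


@dataclass
class OptInRecord:
    url: str                          # sent automatically by the plugin
    title: str                        # scraped from the site automatically
    category: str                     # chosen by the site owner from an NDSA list
    description: str                  # filled out by the site owner
    contact: str                      # name/contact information
    cc_license: Optional[str] = None  # optional Creative Commons license

    def validate(self) -> None:
        """Raise ValueError if the record is obviously malformed."""
        if not self.url.startswith(("http://", "https://")):
            raise ValueError("url must be an http(s) URL")
        if self.category not in EXAMPLE_CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")

    def to_json(self) -> str:
        """Serialize the record for sending to the collecting database."""
        return json.dumps(asdict(self), sort_keys=True)
```

Leaving the license field optional mirrors the minutes, where the license is offered as an option and not a requirement.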

*Do we want data shared about who is preserving what? If organizations are picking and choosing what to preserve from the available blogs, might be good to have that information available to others (publicly or among NDSA members?)

The list of sites/blogs who opt-in, and ideally who has agreed to archive what, will be initially published to NDSA members, but wider distribution was discussed. For the pilot, we will focus on just NDSA members but we will explore this issue further. There was talk of opening it up at least to IIPC members and allowing them access.

We talked about this plugin being a two-way communication - this is not only alerting NDSA of sites to archive, but we could also send back information to the site owner about who has selected their site for archiving. We talked about providing links to the archived sites - though since some organizations make content available after an embargo period (which is not always consistent between organizations), this could be tricky, but something to consider.

Andrew provided some statistics about WordPress.org sites. Two-thirds of them are from the international community (so IIPC should have high interest). We hope to discuss the pilot at the IIPC GA in May.

*Other topics that came up during the discussion:

    • Archiving will capture the look and feel/functionality, images, etc. not just the text. We discussed capture of feeds but at this time we're not exploring that.
    • This would not include FTP access to content
    • If NDSA members are interested in encouraging blog owners to install the plugin so that archiving is easier, they could send a link to the plugin page with instructions for installing. The pilot project will not include a "permissions letter" per se since the plugin opt-in grants the permissions required.
    • Mention of Creative Commons licenses - we hope that we can encourage blog owners to put these on their sites.
    • Andrew suggested he could also code the plugin to do a look-up in archive.org to show what's already been archived. Bring awareness to archiving in general.
    • Should the embargo be consistent among all NDSA members participating?
    • NDSA members participating should agree to some set of basic requirements for archiving that is consistent.
    • What if a site owner doesn't want to be preserved by a particular NDSA member? We talked about this but decided they won't have an option to pick and choose if they opt in. If it becomes an issue we will discuss further during the pilot.
    • If a site owner changes their mind, they can opt out of future archiving, but none of their previously archived content will be deleted.
    • Could a statement of opt-in also be inserted into the robots.txt file? We may explore this during the pilot.
    • Other stats from Andrew: 14% of the web is WordPress sites, and 22 of every 100 new domains registered become WordPress sites.
    • This project is a great opportunity for outreach about the importance of preservation and re: NDSA.

Next Steps/Action Items

A smaller group (Abbie, Andrew, Martha, Leslie, Nicholas, Trevor, and Kris) met after the call ended to discuss next steps based on the discussions with the larger group. The objective for the controlled pilot will be a proof of concept; eventually we hope to engage the wider web archiving community (IIPC).

The outcome was:

*CODE THE PLUGIN / TEST

In next couple of weeks:

Abbie will draft initial text to go with the opt-in buttons, along with the fields of data to collect, and send these on to Andrew. This won't be the "perfect" list of categories but something to start with, so we can get something up quickly for the wider group to respond to and improve upon.

Kris will have Vanay work with Andrew -- IA will be hosting (at least for the pilot) the "database" of URLs and data. The plugin will be set up to ping IA's database when sites are updated (so we can get an idea of frequency of change).

Andrew will code the plugin (under 4 hours, he said) and send it out for testing. Interested NDSA members will be asked to install and test it before we open up the pilot more widely.

*LC CLEARANCES FOR HOW TO DESCRIBE PARTNERSHIPS/Etc.

While the above is happening, LC needs to talk to public affairs and lawyers to make sure we describe this appropriately. WordPress or Andrew will be the author of the plugin, with sponsorship by LC and IA. The idea is "Joint project of LC, IA, WordPress" with a mention of NDSA. The wording of the description within the plugin (how we talk about this with the public, etc.) will need to be run past folks at LC.

*ROLL OUT

Time frame of pilot start date/end date still TBD. When we are ready, we plan to open it up for anyone to install. See how it goes.

*SORTING OUT OF WHO ARCHIVES WHAT

Once we start seeing the results of who is opting in, interested NDSA members will need to figure out how to pick and choose from the list, how to get that recorded, etc. Future meetings of the NDSA Content Working Group will be set up to discuss further as we go.