Digitizing Special Formats
This list of resources is curated by the Digital Library Federation for the benefit of cultural heritage professionals planning projects involving the digitization of rare and unique materials. Applicants to the Digitizing Hidden Special Collections and Archives [1] program of the Council on Library and Information Resources [2] may find these helpful in planning project proposals.
Rather than providing comprehensive coverage, this list includes introductory and reference materials that are good places to begin an exploration of issues of broad import to digitizing cultural heritage materials.
Content for this wiki page is currently being curated by the following team: Nicholas Graham, Project Coordinator, North Carolina Digital Heritage Center [3], Lisa Gregory, Digital Projects Librarian, North Carolina Digital Heritage Center, and Tamsyn Rose-Steel, CLIR/DLF Postdoctoral Fellow for Data Curation in Medieval Studies at Johns Hopkins University.
If you would like to suggest a resource for inclusion on this page, send your suggestion to DigiWiki@clir.org. The DLF is currently seeking professionals from DLF member institutions who would like to participate in a curatorial group to develop and maintain the content of this page. Prospective volunteers for this group should also send expressions of interest to DigiWiki@clir.org.
PROJECT PLANNING & MANAGEMENT
This section curated by Nicholas Graham and Lisa Gregory of the North Carolina Digital Heritage Center.
General Resources
The Federal Agencies Digitization Guidelines Initiative (or FADGI) [4] is a collaboratively maintained clearinghouse of information related to digitization, from project planning [5], to digital file formats [6], to technical specifications [7]. FADGI was launched in 2007 under the auspices of the National Digital Information Infrastructure and Preservation Program (NDIIPP). Two groups of professionals from federal agencies develop FADGI outcomes: (1) the Still Image Working Group [8] produces guidelines for creating digital images of cultural heritage materials; and (2) the Audio-Visual Working Group [9] covers the digitization of analog audio and audiovisual recordings as well as the digital reformatting of born-digital audio or audiovisual content. http://www.digitizationguidelines.gov
The Society of American Archivists (SAA) [10] provides a useful list of links to sources of information about digitization standards on its website. http://www2.archivists.org/standards/external/123
The Association of Research Libraries (ARL) [11] developed the Principles to Guide Vendor/Publisher Relations in Large-Scale Digitization Projects of Special Collections Materials [12] in 2010 to help institutions build strong working relationships with commercial partners while creating broad access to their collections. http://www.arl.org/storage/documents/publications/principles_large_scale_digitization.pdf
Working with the Digital Library Federation [13] Assessment Group, Joyce Chapman at the State Library of North Carolina developed the Library Digitization Cost Calculator [14] using data collected by Duke University, the University of Alabama, and the Triangle Research Libraries Network. The tool can help professionals create rough estimates for still image digitization of archival collections. http://statelibrarync.org/plstats/digitization_calculator.php
The UCLA Libraries Special Collections Digital Project Toolkit [15] includes many template documents suitable for planning digitization projects, such as a Digitization Cost Estimate Worksheet [16], a Fair Use Statement [17], a Vendor Decision Matrix [18], Digitization Workflow Guidelines [19], Quality Control Guidelines [20], and more. http://library.ucla.edu/special-collections/programs-projects/digital-projects-special-collections
A number of cultural heritage institutions and professionals have created videos about their digitization work for YouTube [21].
Format-Specific Resources
Text
The EU-based IMPACT Project (IMProving ACcess to Text) [22] provides useful documentation and case studies related to mass digitization of text, optical character recognition (OCR), and estimating digitization costs and storage. http://www.impact-project.eu/taa/strat/pilot-tools
Best Practices for TEI in Libraries [23] provides a recent (2011) overview of possible approaches to incorporating encoded text into large-scale digitization projects. http://www.tei-c.org/SIG/Libraries/teiinlibraries/
Newspapers
The Guidelines and Resources [24] page on the National Digital Newspaper Program (NDNP) website [25] provides information about best practices for digitizing newspapers and making newspaper content broadly accessible and discoverable. The NDNP is a partnership of the National Endowment for the Humanities (NEH) and the Library of Congress. The NEH operates the National Digital Newspaper funding initiative [26], which offers grants specifically for newspaper digitization. http://www.loc.gov/ndnp/guidelines/
The Center for Research Libraries (CRL) [27] maintains the International Coalition on Newspapers (ICON) [28] database, which contains issue and holdings data for nearly 170,000 publications [29] dating from the seventeenth century through the present. The ICON project also includes a directory of digitization efforts around the globe [30]. CRL is actively seeking new contributors to the ICON database; they invite feedback about the project through the database website [31]. http://icon.crl.edu
The International Federation of Library Associations (IFLA) [32] maintains a list of links to best practices for digitizing newspapers and serials. http://www.ifla.org/node/6777
Rare Books & Manuscripts
The International Federation of Library Associations [33] (IFLA) Rare Book and Special Collections Section [34] published its Guidelines for Planning the Digitization of Rare Book and Manuscript Collections [35] in 2014, covering project design, metadata creation, dissemination, and project assessment. http://www.ifla.org/files/assets/rare-books-and-manuscripts/rbms-guidelines/ifla_guidelines_for_planning_the_digitization_of_rare_book_and_manuscripts_collections_september_2014.pdf
Images
The International Press Telecommunications Council (IPTC) Core Standard Specification [36] is a widely used metadata standard for describing photographs and includes details about embedding metadata into digital image files. http://www.iptc.org/cms/site/index.html?channel=CH0099
The International Image Interoperability Framework (IIIF) is a community of research libraries and image repositories that collaboratively develops application programming interfaces (APIs) and supporting software for interoperable image delivery. http://iiif.io/about.html
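As a brief illustration of what an interoperable image interface looks like in practice, the IIIF Image API addresses each image through a URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The sketch below only assembles such URLs; the server address and identifier are hypothetical placeholders, and the default size keyword "max" assumes version 3 of the Image API (version 2 servers use "full" for the same purpose).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble a IIIF Image API request URL.

    `base` is the image service root (hypothetical in the examples below).
    The identifier is percent-encoded so that characters such as "/"
    cannot be mistaken for path separators.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Assemble the URL of the image's info.json description document."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://images.example.org/iiif"
full_image = iiif_image_url(BASE, "ms/0001")
thumbnail = iiif_image_url(BASE, "ms/0001", size="!400,400")
info = iiif_info_url(BASE, "ms/0001")
```

Because every compliant server accepts the same URL pattern, a viewer or aggregator written against one repository can, in principle, request images from another without custom code.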
Audio and Audiovisual Recordings
The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age [37] is a report commissioned by the National Recording Preservation Board [38] of the Library of Congress that gives an overview of the complex legal and technical issues facing the preservation of recorded sound. The Board itself also maintains an Audio Preservation Bibliography [39]. http://www.clir.org/pubs/abstract/reports/pub148
New York consulting company AVPreserve [40] maintains a useful list of publications and presentations on tools and techniques for the preservation of audiovisual media [41]. http://www.avpreserve.com/avpsresources/papers-and-presentations/
A San Francisco nonprofit, the Bay Area Video Coalition (BAVC) [42], has developed a set of Quality Control Tools for Video Preservation [43]. http://www.bavc.org/qctools
Digitizing Video for Long-Term Preservation: An RFP Guide and Template [44], published in 2013 by the Barbara Goldsmith Preservation and Conservation Department of New York University Libraries, is intended to take an institution step by step through the process of drafting a Request for Proposals (RFP) for the transfer of analog video formats to digital carriers for preservation.
Maps
A list of National Geospatial Program Standards and Specifications [45] appears on the United States Geological Survey (USGS) [46] National Map Project website [47]. http://nationalmap.gov/standards/index.html
METADATA AGGREGATION & REGISTRIES
The Digital Public Library of America (DPLA) [48] aggregates the metadata of digital collections held in educational and cultural heritage institutions across the United States. Institutions with large digital collections may contribute data as Content Hubs [49], while smaller organizations may contribute through local or regional Service Hubs [50]. Contributors must abide by the DPLA's data policies [51] in order to participate. See also: An Introduction to the DPLA Metadata Model (pdf) [52]; The DPLA Metadata Application Profile [53]; DPLA Metadata Aggregation Webinar Recording, 1/22/15 [54]. http://dp.la/
Many DPLA Service Hubs and their partners provide useful documentation and links to tools for metadata normalization, quality control, and aggregation through their project websites.
--Digitization Guidelines, North Carolina Digital Heritage Center [55]: http://www.digitalnc.org/about/policies/digitization-guidelines/
--DPLA aggregation tools on GitHub, North Carolina Digital Heritage Center: https://github.com/ncdhc
--Setting Up a Repository for Harvest, Mountain West Digital Library [56]: http://mwdl.org/getinvolved/repository_setup.php
--Portal Partners Page, The Portal to Texas History [57]: http://www.library.unt.edu/digital-projects-unit/our-partners
The Print Archives Preservation Registry (PAPR) [58] collects information about serial titles, print holdings, and archiving terms and conditions. It is a valuable resource for assessing the uniqueness of serial collections and determining the degree of need for digitization of those collections.
http://papr.crl.edu
DIGITAL REPOSITORIES
This section curated by Tamsyn Rose-Steel, CLIR/DLF Postdoctoral Fellow for Data Curation in Medieval Studies.
The Directory of Open Access Repositories (OpenDOAR) [59] is an international directory of academic open access repositories, useful for those seeking options for depositing digital collections or models for developing new digital repositories. http://www.opendoar.org/index.html
The Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) [60] describes the characteristics of secure and sustainable digital repository management. http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/trac
The Association of Research Libraries (ARL) Web Accessibility Toolkit [61] explains the principles of accessibility, universal design, and digital inclusion, and offers tips about best practices and checklists for digital repository creators that can help them ensure digitized content is as broadly accessible as possible. http://accessibility.arl.org
DIGITAL PRESERVATION
This section curated by Tamsyn Rose-Steel, CLIR/DLF Postdoctoral Fellow for Data Curation in Medieval Studies.
Thanks to the WikiProject Digital Preservation [62], the information about digital preservation on Wikipedia [63] is substantial and current. http://en.wikipedia.org/wiki/Digital_preservation
The National Digital Stewardship Alliance (NDSA) [64] has published the 2015 NDSA National Agenda for Digital Stewardship [65], which provides a broad overview of current "challenges, opportunities, gaps, and trends" related to building and maintaining digital collections in the United States. See also: The NDSA Levels of Digital Preservation [66]. http://www.digitalpreservation.gov/ndsa/documents/2015NationalAgenda.pdf
The Sustainability of Digital Formats page [67] provides detailed descriptions and notes on sustainability issues for hundreds of digital file formats [68]. http://www.digitalpreservation.gov/formats/intro/intro.shtml
The Northeast Document Conservation Center (NEDCC) [69] has compiled a Digital Preservation Reading List [70] that provides a thorough introduction to the challenges of digital preservation as they relate to cultural heritage collections. Additional links to resources related to digital preservation are provided on NEDCC's website [71]. https://www.nedcc.org/assets/media/documents/DigiPres_Biblio_Digital_Directions_2014_update.pdf
From Theory to Action: “Good Enough” Digital Preservation Solutions for Under-Resourced Cultural Heritage Institutions (2014) is a white paper compiling the results of a three-year study of affordable, scalable digital preservation solutions suitable for under-resourced organizations. http://commons.lib.niu.edu/handle/10843/13610
Digital Preservation Management: Short-Term Strategies for Long-Term Problems is a tutorial created by Cornell University Library with funding from the National Endowment for the Humanities. It is now hosted by the MIT Libraries: http://www.dpworkshop.org/dpm-eng/eng_index.html
The Guidelines for Digital Newspaper Preservation Readiness [72] address a specific set of preservation challenges faced by libraries, archives, historical societies, and other organizations that curate substantial collections of digital newspaper content. The Guidelines were written by Katherine Skinner and Matt Schultz and published by the Educopia Institute [73] in 2014.
COPYRIGHT & INTELLECTUAL PROPERTY
This section curated by Nicholas Graham and Lisa Gregory of the North Carolina Digital Heritage Center.
Copyright and Cultural Institutions: Guidelines for Digitization for U.S. Libraries, Archives, and Museums [74] by Peter Hirtle, Emily Hudson, and Andrew Kenyon (2009) provides comprehensive coverage of all major copyright issues relevant to digitization in cultural heritage institutions. This work is also available in print from the Society of American Archivists [75]. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1495365
The Berkeley Digital Library Copyright Project produces and disseminates research about copyright issues facing cultural heritage institutions building digital collections. http://www.law.berkeley.edu/librarycopyright.htm
The Association of Research Libraries [76], in particular its initiatives on Transforming Special Collections in the Digital Age [77] and on Copyright and Intellectual Property [78], has published a variety of documents relevant to the digitization of special collections and archives, including a 2012 special issue of Research Library Issues that covers legal concerns related to digitizing rare and unique materials. http://publications.arl.org/rli279/1
In 2009, the Society of American Archivists [79] issued Orphan Works: Statement of Best Practices to guide decision-making in conducting research into the copyright status of unpublished collections. The document includes a variety of useful links and appendices. http://www2.archivists.org/sites/all/files/OrphanWorks-June2009.pdf
In 2010, OCLC Research [80] organized a seminar and led the effort to develop a description of Well-intentioned practice for putting digitized collections of unpublished materials online [81], which provides useful guidance for institutions developing sustainable copyright risk assessment strategies for their digitization programs. http://www.oclc.org/research/activities/rights.html
In 2013, the Center for Media & Social Impact [82] issued the Report on Orphan Works Challenges: for libraries, archives, and other memory institutions, a study of the obstacles cultural memory institutions face in their efforts to address concerns about rights and intellectual property that concludes with a set of recommendations for the development and dissemination of best practices for overcoming these obstacles. http://cmsimpact.org/fair-use/related-materials/documents/report-orphan-works-challenges-libraries-archives-and-other-mem
Copyright and Related Issues Relevant to Digital Preservation and Dissemination of Unpublished Pre-1972 Sound Recordings by Libraries and Archives describes what libraries and archives can legally do to preserve and make accessible holdings of unpublished sound recordings. http://www.clir.org/pubs/abstract/reports/pub144
FUNDING OPPORTUNITIES
The National Endowment for the Humanities (NEH) [83] supports digitization and related activities through the Humanities Collections and Reference Resources program [84], the National Digital Newspaper Program [85], and the Preservation and Access Research and Development [86] grants. http://www.neh.gov/
The National Historical Publications & Records Commission (NHPRC) [87] offers funding for digitization and related activities through the Access to Historical Records program [88] and the Digital Dissemination of Archival Collections program [89]. http://www.archives.gov/nhprc/
EDUCATION & TRAINING OPPORTUNITIES
Lyrasis [90] offers a variety of classes relevant to digitization and digital content management, such as Introduction to Audio Visual Digitization, Introduction to Digital Project Management Planning, and Digitization for Small Institutions. The classes and events schedule contains up-to-date information about offerings: https://www.lyrasis.org/Pages/Events.aspx
The Northeast Document Conservation Center (NEDCC) offers a variety of programs, including several related to digital collections and digital preservation.
INFORMATION FROM DIGITIZATION SERVICE PROVIDERS
NOTE: This section is provided for convenience and information only. The Digital Library Federation and the Council on Library and Information Resources do not recommend or endorse any specific digitization service provider, and the use or non-use of any particular provider has no bearing upon any applicant's consideration in the Digitizing Hidden Special Collections and Archives competition.
BMI Imaging Systems, Inc. [91] enables libraries to convert microfilm records into a digital format that provides easy access and image enhancement capabilities. In addition to the archival TIFF files used for inclusion in state and national repositories, BMI provides an interim access solution that allows patrons to scroll through digital microfilm rolls from a computer and use full-text search to find records, articles, and photos. Adjustable grayscale allows users to view black-and-white images with photographic tonal detail. For more information, please contact Jake Walker at (800) 488-3456 ext 406 or jwalker@bmiimaging.com.
https://bmiimaging.com
DataBank [92] is a national document and information management company with over 23 years of experience in document conversion and automation. Its areas of expertise include the conversion of photos, archival documents, and legacy microfilm or microfiche collections. It offers integration with repositories for storage and retrieval of scanned media. For additional information, contact Kathy Berger, Senior Solutions Consultant, at kberger@databankimx.com or at (603) 463-0154.
http://www.databankimx.com
The Internet Archive [93] (IA) is one of the world’s largest public digital libraries, with an extensive collection documenting human culture. Its mission includes offering researchers, historians, scholars, people with disabilities, and the general public free access to outstanding collections that exist in digital format. The Internet Archive also offers online access and discovery of digital content, including public domain eBooks [94] and a more selective collection of public domain and non-public domain texts [95].
IA offers non-destructive digitization services, which include image capture, digital processing, preservation, and future-proofing of digital data. Items to be digitized can be sent to one of 33 regional digitization centers around the world, or portable equipment can be placed on-site within libraries and archives. For questions, contact Robert@archive.org.
- Overview of IA Workflow:
http://archive.org/details/ProcessDocument
- To reach one of the Internet Archive centers:
https://archive.org/details/texts
- To purchase digitization equipment:
http://www.archive.org/details/tabletopscribesystem
Luna Imaging, Inc. [96] offers digitization services as well as software and hosting services for building and maintaining digital collections. Digitization services include:
- Preservation scanning
- Access capture
- Book capture services
- OCR, PDF, BookReader processing
LYRASIS can support special collections and archival digitization projects by providing:
- Digitization and Project Management Services – working through its Digitization Collaborative, LYRASIS can digitize a wide range of source materials, including print, manuscript, microfilm, photographic, audio, video, and film materials, and can manage the process for you. See more at: http://www.lyrasis.org/digitize/
- Staff Expertise – information on processes and standards for project planning
- Professional Development Opportunities – LYRASIS offers a wide range of classes and can provide specific digitization classes to suit local needs.
http://lyrasisnow.org/clir-hidden-collections-grants/
The Northeast Document Conservation Center [97] (NEDCC) is an independent conservation laboratory specializing in the conservation and preservation of paper-based collections. NEDCC provides professional conservation treatment for books, maps, photographs, documents, parchment, papyrus, manuscripts, architectural plans, and works of art on paper. NEDCC’s Imaging Services department provides digital imaging services and specializes in rare, historic, and oversize materials, as well as X-ray film scanning and reformatting for black-and-white and color negative films and color transparencies.
Northern Micrographics [98] has over 60 years of experience partnering with clients in library, academic, commercial, and industrial markets to provide preservation imaging products and services. The company scans a variety of object types, including bound and disbound volumes, photos, maps, microfilm, and microfiche. Northern Micrographics can also help place digital collections online with its custom software products, ProSeek® and PhotoAtlas™. It also offers a variety of other services, including microfilming, microfilm duplication, metadata development, data conversions, hosting, and book binding. Contact Northern Micrographics at 800-236-0850 or at sales@nmt.com to learn more.
Stanford University Libraries (SUL) Digitization Services [99] is a fully integrated service provider tailored to meet the heterogeneous collection needs of libraries, archives, and museums. SUL Digitization Services supports three families of content format: paper-based materials, audiovisual media, and born-digital files.
SUL Digitization Services offers:
- Digitization of original materials;
- Large format scanning and image stitching;
- Reformatting of audio and moving image content;
- Reformatting and recovery of files from digital media;
- Preservation-quality master file creation;
- Derivative file creation for discovery and access;
- Secure storage and handling of original materials;
- OCR text processing in plain text, ALTO or PDF;
- Project consultation and planning;
- RFP consultation and vendor management;
- Onsite digitization for fragile content;
- Long-term preservation; and
- Content hosting and discovery solutions.
Stanford University Libraries' digitization services are provided by Digital Library Systems and Services. For inquiries regarding digitization services, contact digitization-contact@lists.stanford.edu. The SUL Digitization Services brochure [100] provides full details for potential partners.
http://digitization.stanford.edu/
Two Cats Digital [101] has been providing world-class digital imaging and consulting services since 2003, with a particular emphasis on cultural heritage institutions and materials. The company focuses on designing and managing efficient digitization workflows and on helping its clients bring their valuable collections to light. Its clients include hundreds of institutions, among them museums, libraries, universities, government agencies, architects, photographers, and non-profit organizations. For additional information, contact Two Cats at info@twocatdigital.com.