NDSA:Library of Congress

From DLF Wiki
  1. What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.)
    • The Library’s mission includes sustaining and preserving a universal collection of knowledge and creativity for future generations. Long-term management of digital content supports that mission.
  2. What large scale storage or cloud technologies are you using to meet that challenge? Further, which service providers or tools did you consider and how did you make your choice?
    • We have a technical infrastructure that includes multiple data centers with enterprise-scale disk and tape components. The network, server and storage components are architected and optimized to support the requirements of the business and content owners. Our goals are to provide content delivery and content management services that meet users’ needs, and to provide secure data management without requiring users to be knowledgeable about the specific technologies or components underlying the services. We use technologies from multiple vendors, and we continually evaluate the existing infrastructure components and industry trends to plan and maintain a cost-effective set of infrastructure components.
  3. Specifically, what kind of materials are you preserving (text, data sets, photographs, moving images, web pages, etc.)?
    • We provide long-term storage for all types of digital materials that are being selected and/or managed for the Library’s collections and/or received as part of Copyright processing or LC-sponsored programs and partnerships such as NDIIPP.
    • Examples of types of content:
      1. Archived websites
      2. Digital text content (e.g., digitized newspapers and historical documents)
      3. Still images (e.g., photographs)
      4. Audio
      5. Moving image
    • The Voice of America did a video piece called “Digital Library of Congress” that might be informative. http://www.youtube.com/watch?v=ylFlAQZ0piU
  4. How big is your collection? (In terms of number of objects and storage space required)
    • In 2010, we were managing over 2000 terabytes of disk storage and over 5 petabytes of tape storage in multiple data centers.
  5. What are your performance requirements?
    • We set performance targets that support the workflows and long-term management goals of the content owners or content stewards. For example, we work closely with stewards of high-volume data sets that are expected to require significant amounts of new and continuing long-term storage over multiple years. This lets us plan for their storage requirements and also identify and optimize the technical workflows specific to each data set (e.g., transfer from outside the Library, movement of files within the Library’s infrastructure, processing of the data sets, indexing requirements, inventory requirements and retrieval patterns). We build on existing models and workflows to establish, document and improve “best practices” that can be leveraged to provide consistently implemented current and future services.
  6. What storage media have you elected to use? (Disk, Tape, etc)
    • We use both disk storage and tape storage from multiple vendors as modular components in the overall technical infrastructure to meet the preservation objectives based on cost, reliability and performance.
  7. What do you think are the key advantages of the system you use?
    • See #9 below.
  8. What do you think are the key problems or disadvantages your system presents?
    • The Library needs to continue to meet the challenges of providing content management and delivery for existing content as well as newly evolving and increasingly complex types of content that do not yet exist. Current and future service users and service providers need to continue to work closely together to ensure that the appropriate levels and types of resources are available to provide cost-effective services (within the context of the Library’s mission, strategic planning, budget and Congressional guidance).
  9. What important principles informed your decision about the particular tool or service you chose to use?
    • Our infrastructure is architected for reliability, scalability and availability. It is architected so that users are provided with services, and do not need to be knowledgeable about the technologies in use. It is architected so that individual components can be upgraded, migrated and/or replaced on cycles appropriate to each component.
  10. How frequently do you migrate from one system to another?
    • Our infrastructure is architected so that individual components can be upgraded, migrated and/or replaced on cycles appropriate to each component. We evaluate the lifecycle of each component, related industry and vendor trends, and cost-effectiveness of continued operation. We develop and implement annual and multi-year infrastructure investment plans in conjunction with LC IT governance processes.
  11. What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation?
    • Our technical infrastructure includes the following:
      1. multiple data centers, including an alternate computing facility for disaster recovery;
      2. multiple long-term content management systems (based on automated robotic tape libraries), with tape copies maintained at two data centers in different locations;
      3. continued investment in high-availability network and server components;
      4. file transfer software with built-in transmission integrity checking to prevent silent data corruption;
      5. 24/7 data center management;
      6. proactive tape media management; and
      7. specialized engineering staff with storage expertise.
  12. What functionality or processes have you developed to augment your storage systems in order to meet preservation goals?
    • The content owners and stewards identify requirements in the areas of content management and delivery.
    • We work with them in areas such as:
      1. file inventories for content owners;
      2. file movement workflows;
      3. proactive monitoring of server, storage and network components;
      4. short-term and long-term planning and requests for content delivery and content management storage.
  13. Are there tough requirements for digital preservation, e.g., TRAC (Trustworthy Repositories Audit & Certification) certification, that you wish were more readily handled by your storage system?
    • We work with the business and content owners to identify and plan for how their requirements for current and future services can be met cost-effectively in both the short-term and long-term.
    • We support development of open end-to-end standards and implementation of hardware and software error detection, reporting and correction for electronic data, so that files can remain readable and usable. This is particularly critical as the volume of data increases.
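Item 4 in the answer to question 11 mentions file transfer software with built-in transmission integrity checking. The Library’s actual transfer tools are not named here, so the following is only a minimal sketch of the general technique, assuming SHA-256 as the checksum: compute a digest of the source, copy the file, recompute the digest at the destination, and reject the copy if the two differ. The helper names `sha256_of` and `verified_copy` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

# Assumption: SHA-256 is the fixity algorithm; the source does not name one.
CHUNK_SIZE = 1024 * 1024  # read in 1 MiB chunks so large files need little memory


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source: Path, destination: Path) -> str:
    """Hypothetical helper: copy a file and confirm the copy matches the source.

    Returns the verified digest, or raises IOError on a mismatch.
    """
    expected = sha256_of(source)  # digest computed before the transfer
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)  # digest recomputed after the transfer
    if actual != expected:
        destination.unlink()  # never leave a corrupt copy in place
        raise IOError(f"checksum mismatch for {destination}: {actual} != {expected}")
    return expected
```

One caveat on this design: re-reading the destination immediately after writing may be served from the operating system’s cache rather than the storage medium, so it catches corruption in transit but not necessarily corruption on the media itself. Periodic fixity audits of stored copies address the latter.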
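The answer to question 12 lists file inventories for content owners. One common way to produce such an inventory is a manifest that records each file’s relative path, size and checksum. The sketch below assumes that layout, SHA-256 checksums and CSV output, none of which is specified in the answer above; `build_inventory` and `inventory_to_csv` are hypothetical names.

```python
import csv
import hashlib
import io
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file (assumed fixity algorithm)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> list[dict]:
    """Hypothetical helper: record path, size in bytes and SHA-256 for every file under root."""
    rows = []
    # Sort for a stable, reproducible manifest; skip directories.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return rows


def inventory_to_csv(rows: list[dict]) -> str:
    """Serialize inventory rows as CSV text for delivery to a content owner."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["path", "bytes", "sha256"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because the manifest stores a checksum per file, the same inventory can later serve as the reference for verifying transfers or auditing stored copies.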
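The answers to questions 11 and 13 combine two ideas: tape copies maintained at two data centers in different locations, and error detection, reporting and correction so that files remain readable. A minimal sketch of how those ideas can fit together follows. It assumes each copy is reachable as an ordinary directory (the actual systems are robotic tape libraries) and that a manifest maps relative paths to SHA-256 digests. Every file in the primary copy is audited; a missing or mismatched file is restored from the second copy only if that copy still verifies. `audit_and_repair` and its status strings are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file (assumed fixity algorithm)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_and_repair(manifest: dict[str, str], primary: Path, secondary: Path) -> dict[str, str]:
    """Hypothetical helper: verify primary against the manifest, repairing from secondary.

    Returns a status per path: "ok", "repaired" or "unrecoverable".
    """
    report = {}
    for rel_path, expected in manifest.items():
        target = primary / rel_path
        # Detection: the primary copy is fine if it exists and its digest matches.
        if target.is_file() and sha256_of(target) == expected:
            report[rel_path] = "ok"
            continue
        replica = secondary / rel_path
        # Correction: only copy from the second site if that copy verifies.
        if replica.is_file() and sha256_of(replica) == expected:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(replica, target)
            # Re-verify the repaired file before reporting success.
            report[rel_path] = "repaired" if sha256_of(target) == expected else "unrecoverable"
        else:
            # Reporting: both copies failed, so flag the file for staff attention.
            report[rel_path] = "unrecoverable"
    return report
```

Returning "unrecoverable" instead of silently skipping a file matters: when both copies fail verification, the problem needs human attention rather than an automated fix.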