NDSA:National Library of Medicine Responses
NLM’s responses
1. What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.) • We have a legal mandate to preserve biomedical literature. NLM’s goal is to provide public access to materials.
2. What large scale storage or cloud technologies are you using to meet that challenge? Further, which service providers or tools did you consider and how did you make your choice? • NetApp is our standard storage vendor. We are considering a tiered storage strategy that moves infrequently accessed images to lower-cost storage rather than leaving them on expensive storage, while still keeping them available on spinning disk when requested. We have IaaS (Infrastructure as a Service) cloud services but have not determined whether or how preservation services will use them. SaaS (Software as a Service) cloud services have not been considered.
3. Specifically, what kind of materials are you preserving (text, data sets, photographs, moving images, web pages, etc.) • Note: images not yet implemented, but in Pilot #3. Text, photographs, moving images, audio, monographs, reports
4. How big is your collection? (In terms of number of objects and storage space required) • By end of 2011: ~6300 books, ~50 videos, ~7 TB for one copy (MHL+cholera+video). Additional copies of all content are maintained for site replication, QA process and backup.
5. What are your performance requirements? • TBD. The system has been tested at 10 accesses per second; virtualization and a clustered-server architecture will facilitate scaling of resources to provide the desired performance as load increases. End-to-end performance requirements will be established, and the storage must support them.
6. What storage media have you elected to use? (Disk, Tape, etc) • Disk. Tape is TBD for offline full-content backup.
7. What do you think are the key advantages of the system you use? • Availability, reliability, snapshots, SnapMirror, thin provisioning.
8. What do you think are the key problems or disadvantages your system presents? • It is a very expensive storage solution for a very large repository, which is why we are considering the tiered storage solution.
9. What important principles informed your decision about the particular tool or service you chose to use? • Flexibility, scalability, satisfaction of functional requirements, community support. NAS (network-attached storage) provides reliability and ease of management.
10. How frequently do you migrate from one system to another? • Migration has not yet occurred. New systems would be introduced about every 3 years. With a centralized storage pool the introduction of new storage is really not an issue.
11. What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation? (High levels of redundancy/resiliency, internal checksumming capabilities, automated tape refresh, etc) • Redundancy, availability, SnapMirror, site-to-site replication.
12. What functionality or processes have you developed to augment your storage systems in order to meet preservation goals? (Periodic checksum validation, limited human access or novel use of permissions schemes) • Periodic checksum validation & limited write access to preservation directories
13. Are there tough requirements for digital preservation, e.g. TRAC (Trusted Repository Audit Checklist) certification, that you wish were more readily handled by your storage system? • TRAC certification is a likely NLM goal in the future. Whether functions such as checksumming, self-healing, replication, backup and format characterization should be performed by the storage system or by the application layer is still to be determined.
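
As an illustration of the periodic checksum validation mentioned in answer 12, the following is a minimal application-layer sketch in Python. It is not NLM's actual implementation: the choice of SHA-256, the in-memory manifest that maps relative paths to digests, and the function names `sha256_of`, `build_manifest` and `validate` are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its current digest.

    Hypothetical helper: run once at ingest, then store the result
    somewhere safer than the preservation directory itself.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def validate(root, manifest):
    """Recompute digests and compare them with the stored manifest.

    Returns (missing, changed): paths that no longer exist, and paths
    whose content no longer matches the recorded digest.
    """
    root = Path(root)
    missing, changed = [], []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif sha256_of(target) != expected:
            changed.append(rel_path)
    return missing, changed
```

A scheduled job could call `validate` against each preservation directory and report any non-empty result. Keeping write access to those directories limited, as described in answer 12, would help ensure that a mismatch points to corruption rather than to a routine edit.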