NDSA:Harvard Library
These responses pertain to Harvard's Digital Repository Service (DRS). See [1].
1. What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.)
The DRS provides professionally managed services to ensure the usability of stored digital objects over time. The DRS is both a preservation and an access repository. In other words, its obligations include assurances that stored digital content will remain both viable and accessible into the indefinite future despite a constantly changing technological environment. All objects managed in the DRS will receive the highest level of preservation service consistent with the object's characteristics and the current technical capabilities of the DRS and its staff.
2. What large scale storage or cloud technologies are you using to meet that challenge? Further, why did you choose these particular technologies?
We are currently using Sun SAM/QFS Storage Archive Manager 4.6, dual Sun T2000 Solaris SAM servers (redundant servers at site 1, disaster recovery failover at site 2), EMC CLARiiON disk storage arrays (a CX3-40 at site 1, a CX3-80 at site 2), and a StorageTek SL500 tape library.
3. Specifically, what kind of materials are you preserving (text, data sets, images, moving images, web pages, etc.)
We are preserving audio, web harvests, page-turned content (books, manuscripts, etc.), documents, still images, biomedical images, target images, text, color profiles, and related documentation and rights objects. By next year we will also be preserving email messages and attachments.
4. How big is your collection? (In terms of number of objects and storage space required)
Unique content (not including replications): 123 TB / 23 million files (some of these are compressed files that each contain many files, as is the case for our web harvests).
5. What are your performance requirements? Further, why are these your particular requirements?
(I'll fill this in later)
6. What storage media have you elected to use? (Disk, Tape, etc) Further, why did you choose these particular media?
A combination of disk and tape.
7. What do you think are the key advantages of the system you use?
The system is very robust and the replication is automated using SAM/QFS. We've proven that we can detect corruption and replace the bad copies quickly.
8. What do you think are the key problems or disadvantages your system presents?
It's difficult to check the integrity of the tape copies.
9. What important principles informed your decision about the particular tool or service you chose to use?
(I'll fill this in later)
10. How frequently do you migrate from one system to another? Further, what is it that prompts you to make these migrations?
In general, we migrate our storage system every 5 years. We do this because of maintenance contracts, aging hardware, and the desire to move to higher-capacity systems.
11. What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation? (High levels of redundancy/resiliency, internal checksumming capabilities, automated tape refresh, etc)
Non-proprietary storage formats (e.g. the use of tar), robustness, no encryption.
12. What functionality or processes have you developed to augment your storage systems in order to meet preservation goals? (Periodic checksum validation, limited human access or novel use of permissions schemes)
Continuous integrity checking (comparing MD5 checksums, and comparing the total count in the database with the total count in the file system).
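As an illustration only, the two checks named above (MD5 comparison and a database-versus-file-system count comparison) might be sketched as follows. This is not the DRS implementation: the `audit` and `md5_of` functions, the report keys, and the use of a plain dictionary in place of the database are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root, expected):
    """Compare files under `root` with recorded checksums.

    `expected` is a hypothetical stand-in for the database: a dict
    mapping each relative file path to its recorded MD5 hex digest.
    """
    root = Path(root)
    on_disk = {
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    }
    recorded = set(expected)
    return {
        # Count check: does the file system hold as many files as recorded?
        "count_matches": len(on_disk) == len(recorded),
        # Recorded files that are absent from the file system.
        "missing": sorted(recorded - on_disk),
        # Files on disk that have no record.
        "unexpected": sorted(on_disk - recorded),
        # Files whose recomputed MD5 differs from the recorded one.
        "corrupt": sorted(
            name
            for name in on_disk & recorded
            if md5_of(root / name) != expected[name]
        ),
    }
```

A scheduler could call `audit` repeatedly over portions of the store and flag any entry listed under `missing` or `corrupt` for replacement from another copy.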
13. Are there tough requirements for digital preservation, e.g. TRAC certification, that you wish were more readily handled by your storage system?
Can't think of any.