NDSA:Columbia University

From DLF Wiki
  1. What is the particular preservation goal or challenge you need to accomplish? (for example, re-use, public access, internal access, legal mandate, etc.)
    • Design & implement a coherent & comprehensive preservation program to ensure the survival & continued accessibility of the Libraries’ digital content. Develop & budget for a long-term digital archiving strategy for content created by the Libraries, whether “born-digital” or converted from analog formats.
    • Provide stable, secure storage for large-scale access & long-term preservation
    • Support efficient creation & management of administrative, descriptive, structural, preservation & rights metadata
    • Support object relationships, actions, behaviors, fine-grained access control policies
  2. What large scale storage or cloud technologies are you using to meet that challenge? Further, why did you choose these particular technologies?
    • Fedora version 3
    • Sun SAM-FS platform with four copies: two on disk, two on tape
    • 70TB effective storage with 9.6TB tier I disk cache
    • Offsite disk storage at the NYSERNet Data Center in Syracuse, New York, with a dedicated 1 Gb/s network link to Columbia
    • Risk-averse: use "tried and true" technologies
    • Open to maximize sustainability and flexibility
    • Entrance and exit strategy
  3. Specifically, what kind of materials are you preserving (text, data sets, images, moving images, web pages, etc.)
    • text, images, data sets, audio, limited video
  4. How big is your collection? (In terms of number of objects and storage space required)
    • TBD
  5. What are your performance requirements? Further, why are these your particular requirements?
    • The system is an "accessible repository" with low-latency access to data.
    • The decision to build a consolidated system was based on the current size of the collection and the desire to provide ready access to materials.
  6. What storage media have you elected to use? (Disk, Tape, etc) Further, why did you choose these particular media?
    • Two copies on disk and two copies on tape, with one of the disk copies held remotely in Syracuse.
    • Keeping two copies on disk supports fixity checking.
    • The tape copies support offline and offsite backup.
  7. What do you think are the key advantages of the system you use?
    • SAM automatically replicates data based on defined policies
    • SAM automatically brings data from SATA storage into higher-performance Fibre Channel storage
  8. What do you think are the key problems or disadvantages your system presents?
    • The system was a conservative choice, has commercial support, and has met our needs.
    • Oracle's acquisition of Sun has created some uncertainty regarding hardware.
  9. What important principles informed your decision about the particular tool or service you chose to use?
  10. How frequently do you migrate from one system to another? Further, what is it that prompts you to make these migrations?
    • We will migrate at the end of the equipment lifecycle (4-5 years). We have not decided whether we will migrate off SAM-FS.
  11. What characteristics of the storage system(s) you use do you feel are particularly well-suited to long-term digital preservation? (High levels of redundancy/resiliency, internal checksumming capabilities, automated tape refresh, etc)
  12. What functionality or processes have you developed to augment your storage systems in order to meet preservation goals? (Periodic checksum validation, limited human access or novel use of permissions schemes)
  13. Are there tough requirements for digital preservation, e.g. TRAC certification, that you wish were more readily handled by your storage system?
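Items 6 and 12 mention fixity checking against the two disk copies and periodic checksum validation. The following is a minimal sketch of what such an audit could look like. It assumes that each disk copy is a plain directory tree mirrored file-for-file and that a manifest of SHA-256 checksums was recorded at ingest. The function names (`sha256_of`, `audit_copies`, `repair_source`), the manifest format, and the choice of SHA-256 are illustrative assumptions. They do not describe Columbia's actual SAM-FS or Fedora tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large objects never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(primary: Path, secondary: Path,
                 manifest: dict[str, str]) -> dict[str, list[str]]:
    """Check every manifest entry in both disk copies (hypothetical layout).

    Returns a report mapping each relative path to the copies that failed,
    labelled "primary" and/or "secondary". A missing file counts as a failure.
    Paths whose copies both match the manifest are omitted from the report.
    """
    report: dict[str, list[str]] = {}
    for rel_path, expected in manifest.items():
        failed = []
        for label, root in (("primary", primary), ("secondary", secondary)):
            candidate = root / rel_path
            if not candidate.is_file() or sha256_of(candidate) != expected:
                failed.append(label)
        if failed:
            report[rel_path] = failed
    return report


def repair_source(failed: list[str]) -> str:
    """Pick where a damaged object could be restored from (assumed policy).

    If only one disk copy is bad, the other disk copy is the repair source;
    if both are bad, fall back to one of the tape copies.
    """
    if len(failed) == 2:
        return "tape"
    return "secondary" if failed == ["primary"] else "primary"
```

Keeping two independent disk copies lets a scheduled job like this tell which copy has drifted, so a single corrupted file can be restored from the healthy disk copy without recalling tape.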