Assessment:Costs:Toolkit: Difference between revisions
'''This site is currently under development by the Cost Assessment Working Group. Please check back at the end of 2024.'''
In 2021 the Cost group surveyed a variety of digitization stakeholders from across the Digital Library Federation. The data collected helps inform what is needed for successful digitization projects or starting a digitization program. The group formed subgroups to work on various resources and issues: insourcing vs. outsourcing, a registry for vendors, assessing scans per linear feet, best practices, and cost estimation. The groups analyzed the data produced by the surveys and have developed reports and this toolkit to help digitization professionals in their decision making as it relates to logistics and cost.
Final reports and anonymized data for all survey groups can be found in OSF (https://osf.io/6dnmh/).
==State of the Field==
==In Sourcing vs Outsourcing==
''Audio / Video / Moving Image''
*[https://osf.io/eas2x/ Clemson University - AV Digitization]
*[https://osf.io/4eryh/ Hoover case study - Audio tapes]
*[https://osf.io/y3r8q/ MIT Libraries case study - AVMI materials]
*[https://osf.io/hp8g2/ UMichigan - Super 8 film]
''Still Image''
*[https://osf.io/2tvky/ Hoover case study - Posters]
*[https://osf.io/jwgkm/ UMichigan - Medieval Manuscripts and Scrolls]
*[https://osf.io/vcju7/ UMichigan - Scrapbooks and Photo Albums]
*[https://osf.io/cy3za/ University of North Texas - Yearbook Digitization]
''Text''
*[https://osf.io/e29v5/ Hoover case study - Text-Based Archival Collection]
*[https://osf.io/zj56e/ New York University case study - Access level (books)]
*[https://osf.io/7zu4r/ UCalgary case study - Administrative documents archival collection]
==== Approach/Solution ====
''Insourced''
*[https://osf.io/e29v5/ Hoover case study - Text-Based Archival Collection]
*[https://osf.io/jwgkm/ UMichigan - Medieval Manuscripts and Scrolls]
*[https://osf.io/vcju7/ UMichigan - Scrapbooks and Photo Albums]
''Outsourced''
*[https://osf.io/eas2x/ Clemson University - AV Digitization]
*[https://osf.io/4eryh/ Hoover Case study - Audio tapes]
*[https://osf.io/y3r8q/ MIT Libraries case study - AVMI materials]
*[https://osf.io/cy3za/ University of North Texas - Yearbook Digitization]
''Hybrid/Blended''
*[https://osf.io/2tvky/ Hoover Case study - Posters]
*[https://osf.io/7zu4r/ UCalgary Case study - Administrative documents archival collection]
''Undecided''
*[https://osf.io/zj56e/ New York University case study - Access level] (Books)
*[https://osf.io/hp8g2/ UMichigan - Super 8 film]
==== Institutional Information ====
Categories included: institution size, digitization experience, concerns about insourcing vs outsourcing, risk tolerance for outsourced digitization, risk tolerance for copyright, funding, IT resources, and skills available.
[https://osf.io/yjh68/ Hoover Institution]
[https://osf.io/ujp78/ New York University]
[https://osf.io/n86vb/ UMichigan]
==Scans per Linear Feet==
The Cost Assessment Scans Per Linear Feet Subgroup was created to fact check current estimations published online for calculating images per linear feet. We also evaluated how differing digitization and housing practices as well as content type changed the amount of scans per linear foot. The results from this work can be used when determining the cost of a project using the DLF cost calculator and/or other means of assessing costs like conversations with vendors.
==== Data ====
The above data can be used to calculate and estimate the number of images likely to be produced by a digitization project. You can tailor your calculations based on box type, linear feet, and/or content type. Depending on your institution's processing practices, you may fall to one side of the min/average/max spectrum. Keep that in mind as you choose the number to use in your calculations.
''Using with the DLF Cost Calculator:''
If using the cost calculator on DLF’s website, the calculator asks for a number of images. Use the appropriate chart to estimate the number of images for your project. Just a reminder, the cost calculator is no longer being maintained so use at your own risk.
''Using with the DLF Cost Calculator Worksheet:''
The worksheet uses items and scans per item to calculate your number of estimated images. You can convert our data to fit this equation in multiple ways. Here are two examples:
You have a collection of 10 Paige/Banker/Records (filed letter) boxes of mixed: unbound pages, bound/folded documents. You notice that the boxes aren’t loose but aren’t packed tight. In this scenario, it would be appropriate to select the average number on the scale, which is 2,951. In column B you would input 10, as the boxes will act as items, and in column C, 2,951, which is the number of images per item. Your total estimated image count would be 29,510.
==Cost Calculator==
The Cost Calculator Subgroup was formed to explore easier-to-maintain alternatives to the aging DLF Cost Calculator, to meet the needs of the community. The original calculator was developed in 2014-2016, and its underlying data set now has a bad data point that significantly impacts the metadata cost results. Due to changes over time in the connection to the platform hosting the data set, it is not possible to correct that data point. In addition, any improvements would require work by a Ruby on Rails developer.
The approach of a downloadable spreadsheet-based digitization cost calculator was inspired by [https://www.oclc.org/research/publications/2021/oclcresearch-total-cost-of-stewardship.html OCLC Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections], specifically the idea of a downloadable tools zip file or individual Excel files.
A spreadsheet-based digitization cost calculator is much easier to maintain. Also, significantly, it is easy to see the calculations and the data they are based on, and it’s easy for a user to customize for their needs.
The group developed a spreadsheet-based digitization cost calculator as a Google Sheet and a matching downloadable Excel file. The initial structure and underlying data were provided by the University of Oklahoma. The underlying data are the images per hour and labor cost for each type of work included, for each equipment or material type; typical image size for each equipment or material type; and storage cost.
[ screenshot of basic tab of Google sheet, upper left portion ]
The spreadsheet cost calculator can be used as-is for rough estimates, or customized to better match the capabilities of the user’s organization.
Use is straightforward, with some basic guidance in a readme tab. The spreadsheet has one tab with a stripped-down version of the cost calculator, a second tab with a fuller version that includes more options for equipment or materials, and a readme tab that includes a brief overview of what the calculator does, how to use it, and how to customize it.
[ link to current spreadsheet Google Sheet to copy ]
[ link to current spreadsheet Excel file to download ]
Customization by a user is straightforward, whether it’s changing the underlying data in the cells or adjusting what equipment or materials to cover. It’s also possible, with a little more work, to add or remove types of work included (i.e., columns – currently capture, post-processing, basic metadata assignment, creating preservation files, and preparing for storage).
==Vendor Registry==
The Vendor Registry group was created to get feedback from the library digitization community about their experiences with equipment and service digitization vendors within the cultural heritage sector. In 2023 the Vendor Registry subgroup circulated two surveys. Both surveys provided a list of vendors curated from a previous vendor registry, as well as provided an opportunity for respondents to add vendors to the list. Respondents were also asked to indicate whether they would be available to be contacted directly to answer questions.
The subgroup compiled the survey results into an overview document that listed each vendor, its website, a description of product or services received, and overall service experience. These results were then made publicly available via [https://osf.io/6m8da/ Reformatting Service & Equipment Vendor Registry Survey Results]. The Survey Results document also included an indicator whether there were survey respondents that would be willing to share their contact information with others to provide additional details about their experiences.
==Contact Information==
Members of the [https://wiki.diglib.org/Assessment:Costs Cost Assessment Interest Group] voted to sunset the group in favor of creating a new working group focusing on the logistics of digitization. Questions about this toolkit can be directed either to the leadership of the Assessment Interest Group or to the new group, which is set to launch in 2025. Anyone interested in continuing the assessment work of the Cost group can contact the [https://wiki.diglib.org/Assessment Assessment Interest Group leadership].
Latest revision as of 16:02, 9 December 2024
This site is currently under development by the Cost Assessment Working Group. Please check back at the end of 2024.
In 2021 the Cost group surveyed a variety of digitization stakeholders from across the Digital Library Federation. The data collected helps inform what is needed for successful digitization projects or starting a digitization program. The group formed subgroups to work on various resources and issues: insourcing vs. outsourcing, a registry for vendors, assessing scans per linear feet, best practices, and cost estimation. The groups analyzed the data produced by the surveys and have developed reports and this toolkit to help digitization professionals in their decision making as it relates to logistics and cost. Final reports and anonymized data for all survey groups can be found in OSF (https://osf.io/6dnmh/).
State of the Field
In Sourcing vs Outsourcing
The case studies describe each institution, the collection being digitized, and the factors that affected the choice of digitization method: number of items, fragility, staffing, equipment, funding, etc. They explore the initial choice (insource, outsource, or both), how digitization went, and the lessons learned from that approach. They discuss whether the project was entirely successful, proved too difficult as originally scoped, or had mixed results, with some parts completed well and others proving too difficult to complete. All situations are informative.
Case studies are available for review by original format and by approach/solution.
Original Format
Audio / Video / Moving Image
- Clemson University - AV Digitization
- Hoover case study - Audio tapes
- MIT Libraries case study - AVMI materials
- UMichigan - Super 8 film
Still Image
- Hoover case study - Posters
- UMichigan - Medieval Manuscripts and Scrolls
- UMichigan - Scrapbooks and Photo Albums
- University of North Texas - Yearbook Digitization
Text
- Hoover case study - Text-Based Archival Collection
- New York University case study - Access level (books)
- UCalgary case study - Administrative documents archival collection
Approach/Solution
Insourced
- Hoover case study - Text-Based Archival Collection
- UMichigan - Medieval Manuscripts and Scrolls
- UMichigan - Scrapbooks and Photo Albums
Outsourced
- Clemson University - AV Digitization
- Hoover Case study - Audio tapes
- MIT Libraries case study - AVMI materials
- University of North Texas - Yearbook Digitization
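Hybrid/Blended
- Hoover Case study - Posters
- UCalgary Case study - Administrative documents archival collection
Undecided
- New York University case study - Access level (Books)
- UMichigan - Super 8 film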
Institutional Information
Respondents were given the option to provide information about their institutions in relation to their case studies. The aim was to present this information to readers so they can see if their peers have similar circumstances and how they handled them.
Categories included: institution size, digitization experience, concerns about insourcing vs outsourcing, risk tolerance for outsourced digitization, risk tolerance for copyright, funding, IT resources, and skills available.
- Hoover Institution
- New York University
- UMichigan
Scans per Linear Feet
The Cost Assessment Scans Per Linear Feet Subgroup was created to fact check current estimations published online for calculating images per linear feet. We also evaluated how differing digitization and housing practices as well as content type changed the amount of scans per linear foot. The results from this work can be used when determining the cost of a project using the DLF cost calculator and/or other means of assessing costs like conversations with vendors.
Data
Further information about the survey, gathered data, and calculations can be found here: https://osf.io/ag9z3/
Content Type (Min / Average / Max) Linear Feet
- Photographs: 63 entries (658 / 1,871 / 2,515)
- Unbound pages: 120 entries (393 / 1,914 / 6,849)
- Mixed: Unbound pages, bound/folded paper documents: 15 entries (1,482 / 2,125 / 4,274)
Content Type (Linear Feet) | Number of Entries | Minimum | Average | Maximum |
---|---|---|---|---|
Photographs | 63 | 658 | 1,871 | 2,515 |
Unbound Pages | 120 | 393 | 1,914 | 6,849 |
Mixed: Unbound Pages, Bound/Folded Documents | 15 | 1,482 | 2,125 | 4,274 |
Box Type (Min / Average / Max) Linear Feet
- Full Hollinger: 168 entries (103 / 2,108 / 6,849)
- Paige/Banker/Records (filed legal): 64 entries (658 / 1,994 / 4,141)
- Paige/Banker/Records (filed letter): 17 entries (1,792 / 2,469 / 3,742)
Box Type (Linear Feet) | Number of Entries | Minimum | Average | Maximum |
---|---|---|---|---|
Full Hollinger | 168 | 103 | 2,108 | 6,849 |
Paige/Banker/Records (filed legal) | 64 | 658 | 1,994 | 4,141 |
Paige/Banker/Records (filed letter) | 17 | 1,792 | 2,469 | 3,742 |
Images per Box Type (Not in linear feet)
- Full Hollinger: 168 entries (45 / 923 / 3,000)
- Paige/Banker/Records (filed legal): 64 entries (672 / 2,007 / 3,451)
- Paige/Banker/Records (filed letter): 17 entries (2,009 / 2,951 / 4,677)
Box Type (Not in Linear Feet) | Number of Entries | Minimum | Average | Maximum |
---|---|---|---|---|
Full Hollinger | 168 | 45 | 923 | 3,000 |
Paige/Banker/Records (filed legal) | 64 | 672 | 2,007 | 3,451 |
Paige/Banker/Records (filed letter) | 17 | 2,009 | 2,951 | 4,677 |
How to Use This Data:
General Use: The above data can be used to calculate and estimate the number of images likely to be produced by a digitization project. You can tailor your calculations based on box type, linear feet, and/or content type. Depending on your institution's processing practices, you may fall to one side of the min/average/max spectrum. Keep that in mind as you choose the number to use in your calculations.
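As a rough illustration of the calculation described above, the following Python sketch multiplies a quantity (linear feet or boxes) by the chosen min/average/max figure. The figures are copied from the tables above; the dictionary keys, the function name `estimate_images`, and the rounding are illustrative assumptions and are not part of the toolkit.

```python
# Images per linear foot by content type: (min, average, max), from the tables above.
IMAGES_PER_LINEAR_FOOT = {
    "photographs": (658, 1871, 2515),
    "unbound pages": (393, 1914, 6849),
    "mixed": (1482, 2125, 4274),
}

# Images per box by box type (not in linear feet): (min, average, max).
IMAGES_PER_BOX = {
    "full hollinger": (45, 923, 3000),
    "paige filed legal": (672, 2007, 3451),
    "paige filed letter": (2009, 2951, 4677),
}

# Position of each figure inside the (min, average, max) tuples.
LEVELS = {"min": 0, "average": 1, "max": 2}


def estimate_images(table, key, quantity, level="average"):
    """Estimate image count as quantity times the chosen min/average/max figure.

    `quantity` is linear feet for IMAGES_PER_LINEAR_FOOT, or a box count
    for IMAGES_PER_BOX. The helper itself is hypothetical.
    """
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    return round(quantity * table[key][LEVELS[level]])


# 10 linear feet of tightly packed mixed material, so use the max figure.
print(estimate_images(IMAGES_PER_LINEAR_FOOT, "mixed", 10, "max"))  # 42740
```

Choosing `level` is where your institution's processing practices come in: loosely packed boxes suggest `"min"`, tightly packed boxes suggest `"max"`.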
Using with the DLF Cost Calculator: If using the cost calculator on DLF’s website, the calculator asks for a number of images. Use the appropriate chart to estimate the number of images for your project. Just a reminder, the cost calculator is no longer being maintained so use at your own risk.
Using with the DLF Cost Calculator Worksheet: The worksheet uses items and scans per item to calculate your number of estimated images. You can convert our data to fit this equation in multiple ways. Here are two examples:
Example 1: Column B represents the total number of images you estimate using whichever chart is relevant to the data you have (boxes or linear feet). Since we’re entering the number of images instead of items, you would have a 1 to 1 ratio of images to “item”. So, column C is set to 1.
You have a collection of 10 linear feet of Mixed: Unbound Pages, Bound/Folded Documents. You notice that your boxes are packed tight. In this scenario, it would be appropriate to select the max number on the scale, which is 4,274 images per linear foot. This means you would have 10 × 4,274 = 42,740 images. Enter 42,740 in column B and 1 in column C.
Example 2: Column B represents the number of boxes you have. Column C represents the number of images you have selected from the chart Images per Box Type (Not in linear feet).
[Insert screenshot of spreadsheet]
You have a collection of 10 Paige/Banker/Records (filed letter) boxes of mixed: unbound pages, bound/folded documents. You notice that the boxes aren’t loose but aren’t packed tight. In this scenario, it would be appropriate to select the average number on the scale, which is 2,951. In column B you would input 10, as the boxes will act as items, and in column C, 2,951, which is the number of images per item. Your total estimated image count would be 29,510.
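The two worksheet examples can be checked with a short Python sketch. It assumes the worksheet's estimate is simply column B (items) multiplied by column C (scans per item), as the description above suggests; the `WorksheetRow` class and its field names are hypothetical and do not come from the worksheet itself.

```python
from dataclasses import dataclass


@dataclass
class WorksheetRow:
    """One worksheet row (hypothetical model, not the worksheet's own structure)."""
    items: int           # column B
    scans_per_item: int  # column C

    @property
    def estimated_images(self) -> int:
        # Assumed formula: items times scans per item.
        return self.items * self.scans_per_item


# Example 1: enter the total image estimate as the item count, with 1 scan per "item".
example_1 = WorksheetRow(items=10 * 4274, scans_per_item=1)

# Example 2: enter boxes as items and images per box as scans per item.
example_2 = WorksheetRow(items=10, scans_per_item=2951)

print(example_1.estimated_images)  # 42740
print(example_2.estimated_images)  # 29510
```

Both layouts produce an image estimate; the second keeps the box count visible in the worksheet, which may be easier to audit later.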
Cost Calculator
The Cost Calculator Subgroup was formed to explore easier-to-maintain alternatives to the aging DLF Cost Calculator, to meet the needs of the community. The original calculator was developed in 2014-2016, and its underlying data set now has a bad data point that significantly impacts the metadata cost results. Due to changes over time in the connection to the platform hosting the data set, it is not possible to correct that data point. In addition, any improvements would require work by a Ruby on Rails developer.
The approach of a downloadable spreadsheet-based digitization cost calculator was inspired by OCLC Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections — specifically the idea of a downloadable tools zip file or individual Excel files.
A spreadsheet-based digitization cost calculator is much easier to maintain. Also, significantly, it is easy to see the calculations and the data they are based on, and it’s easy for a user to customize for their needs.
The group developed a spreadsheet-based digitization cost calculator as a Google Sheet and a matching downloadable Excel file. The initial structure and underlying data were provided by the University of Oklahoma. The underlying data are the images per hour and labor cost for each type of work included, for each equipment or material type; typical image size for each equipment or material type; and storage cost.
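To make the role of these inputs concrete, here is a minimal Python sketch of how images per hour, labor cost, image size, and storage cost could combine into an estimate. It is not the spreadsheet's actual formula: the function names, the decimal megabyte-to-gigabyte conversion, and every numeric value are placeholder assumptions chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class WorkType:
    """One type of work, such as capture or post-processing (hypothetical model)."""
    name: str
    images_per_hour: float
    hourly_labor_cost: float


def labor_cost(images, work_types):
    """Sum, over work types, of hours needed times hourly labor cost."""
    return sum(images / w.images_per_hour * w.hourly_labor_cost for w in work_types)


def storage_cost(images, image_size_mb, cost_per_gb):
    """Total size in decimal gigabytes times an assumed per-gigabyte cost."""
    return images * image_size_mb / 1000 * cost_per_gb


# Placeholder values only; replace them with your organization's figures.
work = [
    WorkType("capture", images_per_hour=100, hourly_labor_cost=20.0),
    WorkType("post-processing", images_per_hour=200, hourly_labor_cost=20.0),
]
images = 1000

total = labor_cost(images, work) + storage_cost(images, image_size_mb=50, cost_per_gb=0.10)
print(f"{total:.2f}")  # 305.00
```

Keeping each work type as its own record mirrors the idea of adding or removing columns for types of work when customizing the spreadsheet.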
[ screenshot of basic tab of Google sheet, upper left portion ]
The spreadsheet cost calculator can be used as-is for rough estimates, or customized to better match the capabilities of the user’s organization.
Use is straightforward, with some basic guidance in a readme tab. The spreadsheet has one tab with a stripped-down version of the cost calculator, a second tab with a fuller version that includes more options for equipment or materials, and a readme tab that includes a brief overview of what the calculator does, how to use it, and how to customize it.
[ link to current spreadsheet Google Sheet to copy ]
[ link to current spreadsheet Excel file to download ]
Customization by a user is straightforward, whether it’s changing the underlying data in the cells or adjusting what equipment or materials to cover. It’s also possible, with a little more work, to add or remove types of work included (i.e., columns – currently capture, post-processing, basic metadata assignment, creating preservation files, and preparing for storage).
Vendor Registry
The Vendor Registry group was created to get feedback from the library digitization community about their experiences with equipment and service digitization vendors within the cultural heritage sector. In 2023 the Vendor Registry subgroup circulated two surveys. Both surveys provided a list of vendors curated from a previous vendor registry, as well as provided an opportunity for respondents to add vendors to the list. Respondents were also asked to indicate whether they would be available to be contacted directly to answer questions.
The subgroup compiled the survey results into an overview document that listed each vendor, its website, a description of product or services received, and overall service experience. These results were then made publicly available via Reformatting Service & Equipment Vendor Registry Survey Results. The Survey Results document also included an indicator whether there were survey respondents that would be willing to share their contact information with others to provide additional details about their experiences.
Contact Information
Members of the Cost Assessment Interest Group voted to sunset the group in favor of creating a new working group focusing on the logistics of digitization. Questions about this toolkit can be directed either to the leadership of the Assessment Interest Group or to the new group, which is set to launch in 2025. Anyone interested in continuing the assessment work of the Cost group can contact the Assessment Interest Group leadership.