Assessment:Costs:Toolkit: Difference between revisions

From DLF Wiki
Rsenese (talk | contribs)
Latest revision as of 16:02, 9 December 2024

This site is currently under development by the Cost Assessment Working Group. Please check back at the end of 2024.

In 2021, the Cost group surveyed a variety of digitization stakeholders from across the Digital Library Federation. The data collected helps identify what is needed to run a successful digitization project or start a digitization program. The group formed subgroups to work on various resources and issues: insourcing vs. outsourcing, a registry for vendors, assessing scans per linear foot, best practices, and cost estimation. The subgroups analyzed the survey data and developed reports and this toolkit to help digitization professionals make decisions about logistics and cost. Final reports and anonymized data for all survey groups can be found on OSF (https://osf.io/6dnmh/).

State of the Field

Insourcing vs. Outsourcing

The case studies describe the institution, the collection being digitized, and the factors that affected the choice of digitization method - number of items, fragility, staffing, equipment, money, etc. Each case study covers the initial choice - insource, outsource, or both - how digitization went, and a brief evaluation of the lessons learned from that approach. They discuss whether the project was entirely successful, proved too difficult as originally scoped, or had mixed results, with some parts completed well and others too difficult to complete. All situations are informative.


Case studies are available for review by original format and by approach/solution.

Original Format

Audio / Video / Moving Image

Still Image

Text

Approach/Solution

Insourced

Outsourced

Hybrid/Blended

Undecided

Institutional Information

Respondents were given the option to provide information about their institutions in relation to their case studies. The aim was to present this information to readers so they can see if their peers have similar circumstances and how they handled them.

Categories included: institution size, digitization experience, concerns about insourcing vs outsourcing, risk tolerance for outsourced digitization, risk tolerance for copyright, funding, IT resources, and skills available.

Hoover Institution

New York University

UMichigan

Scans per Linear Feet

The Cost Assessment Scans Per Linear Feet Subgroup was created to fact-check estimates published online for calculating images per linear foot. We also evaluated how differing digitization and housing practices, as well as content type, changed the number of scans per linear foot. The results of this work can be used when determining the cost of a project with the DLF cost calculator and/or other means of assessing costs, such as conversations with vendors.

Data

Further information about the survey, gathered data, and calculations can be found here: https://osf.io/ag9z3/

Images per Linear Foot by Content Type (Min / Average / Max)

  • Photographs: 63 entries (658 / 1,871 / 2,515)
  • Unbound pages: 120 entries (393 / 1,914 / 6,849)
  • Mixed: Unbound pages, bound/folded paper documents: 15 entries (1,482 / 2,125 / 4,274)


Images per Linear Foot by Box Type (Min / Average / Max)

  • Full Hollinger: 168 entries (103 / 2,108 / 6,849)
  • Paige/Banker/Records (filed legal): 64 entries (658 / 1,994 / 4,141)
  • Paige/Banker/Records (filed letter): 17 entries (1,792 / 2,469 / 3,742)


Images per Box Type (Not in linear feet), Min / Average / Max

  • Full Hollinger: 168 entries (45 / 923 / 3,000)
  • Paige/Banker/Records (filed legal): 64 entries (672 / 2,007 / 3,451)
  • Paige/Banker/Records (filed letter): 17 entries (2,009 / 2,951 / 4,677)

How to Use This Data:

General Use: The above data can be used to estimate the number of images a digitization project is likely to produce. You can tailor your calculations based on box type, linear feet, and/or content type. Depending on your institution's processing practices, your collections may fall toward one end of the min/average/max spectrum. Keep that in mind as you choose the number to use in your calculations.
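
The lookup described above can be sketched in a few lines of Python. This is only an illustration: the figures are copied from the content type data in this section, while the dictionary keys and the function name `estimate_image_range` are hypothetical and not part of any DLF tool.

```python
# Images per linear foot by content type: (min, average, max).
# Values copied from the content type data above; key names are illustrative.
IMAGES_PER_LINEAR_FOOT = {
    "photographs": (658, 1871, 2515),
    "unbound pages": (393, 1914, 6849),
    "mixed": (1482, 2125, 4274),
}


def estimate_image_range(content_type, linear_feet):
    """Return (low, typical, high) image estimates for a collection."""
    low, average, high = IMAGES_PER_LINEAR_FOOT[content_type]
    return (low * linear_feet, average * linear_feet, high * linear_feet)


# A 4-linear-foot photograph collection (a made-up example).
print(estimate_image_range("photographs", 4))
```

Returning all three figures at once makes it easier to see how much the estimate moves depending on how tightly the material is housed.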

Using with the DLF Cost Calculator: The cost calculator on DLF’s website asks for a number of images. Use the appropriate chart to estimate the number of images for your project. Note that the cost calculator is no longer being maintained, so use it at your own risk.

Using with the DLF Cost Calculator Worksheet: The worksheet uses items and scans per item to calculate your number of estimated images. You can convert our data to fit this equation in multiple ways. Here are two examples:

Example 1: Column B represents the total number of images you estimate using whichever chart is relevant to the data you have (boxes or linear feet). Because you are entering images rather than items, each image counts as one “item”, giving a 1 to 1 ratio of images to items. So, column C is set to 1.

You have a 10-linear-foot collection of Mixed: Unbound Pages, Bound/Folded Documents. You notice that your boxes are packed tight. In this scenario, it would be appropriate to select the max number on the scale, which is 4,274 images per linear foot. This gives 10 × 4,274 = 42,740 images. Enter 42,740 in column B and 1 in column C.
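
The arithmetic in Example 1 can be checked with a short Python sketch. It assumes the worksheet simply multiplies column B by column C, as described above; the function and variable names are hypothetical.

```python
def worksheet_total(column_b, column_c):
    """Estimated images as the worksheet computes them: items x scans per item."""
    return column_b * column_c


linear_feet = 10
images_per_linear_foot = 4274  # max for Mixed, chosen because the boxes are packed tight

# Column B holds the image estimate itself, so column C is 1.
column_b = linear_feet * images_per_linear_foot
column_c = 1

print(worksheet_total(column_b, column_c))  # 42740
```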


Example 2: Column B represents the number of boxes you have. Column C represents the number of images per box that you have selected from the chart Images per Box Type (Not in linear feet).

[Insert screenshot of spreadsheet]

You have a collection of 10 Paige/Banker/Records boxes (filed letter) containing mixed material: unbound pages and bound/folded documents. You notice that the boxes aren’t loose but aren’t packed tight. In this scenario, it would be appropriate to select the average number on the scale, which is 2,951 images per box. In column B you would enter 10, as the boxes act as items, and in column C you would enter 2,951, the number of images per item. Your total estimated image count would be 29,510.
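
A per-box version of the same calculation might look like the sketch below. The figures come from the chart Images per Box Type (Not in linear feet), where 2,951 is the average listed for Paige/Banker/Records (filed letter) boxes, so that box type is used here. The key names and the function `worksheet_columns` are hypothetical.

```python
# Images per box by box type, copied from the chart above; key names are illustrative.
IMAGES_PER_BOX = {
    "full hollinger": {"min": 45, "average": 923, "max": 3000},
    "paige legal": {"min": 672, "average": 2007, "max": 3451},
    "paige letter": {"min": 2009, "average": 2951, "max": 4677},
}


def worksheet_columns(box_type, boxes, packing="average"):
    """Return (column B, column C): number of boxes and images per box."""
    return boxes, IMAGES_PER_BOX[box_type][packing]


column_b, column_c = worksheet_columns("paige letter", 10)
print(column_b * column_c)  # 29510
```

Switching `packing` to "min" or "max" shows how far the estimate shifts for loosely or tightly packed boxes.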

Cost Calculator

The Cost Calculator Subgroup was formed to explore easier-to-maintain alternatives to the aging DLF Cost Calculator, to meet the needs of the community. The original calculator was developed in 2014-2016, and its underlying data set now has a bad data point that significantly impacts the metadata cost results. Due to changes over time in the connection to the platform hosting the data set, it is not possible to correct that data point. In addition, any improvements would require work by a Ruby on Rails developer.

The approach of a downloadable spreadsheet-based digitization cost calculator was inspired by OCLC Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections (https://www.oclc.org/research/publications/2021/oclcresearch-total-cost-of-stewardship.html), specifically the idea of a downloadable tools zip file or individual Excel files.

A spreadsheet-based digitization cost calculator is much easier to maintain. Just as importantly, users can see the calculations and the data they are based on, and can easily customize the calculator for their needs.

The group developed a spreadsheet-based digitization cost calculator as a Google Sheet and a matching downloadable Excel file. The initial structure and underlying data were provided by the University of Oklahoma. The underlying data are the images per hour and labor cost for each type of work included, for each equipment or material type; typical image size for each equipment or material type; and storage cost.
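
As a rough illustration of how these inputs could combine, the Python sketch below computes labor cost from images per hour and hourly labor cost for each type of work, and storage cost from image size. It is not the spreadsheet's actual formula: every rate, size, and price is a made-up placeholder, the per-gigabyte storage price and the 1,024 MB per GB conversion are assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkType:
    name: str
    images_per_hour: float
    hourly_labor_cost: float  # placeholder currency units


def labor_cost(images, work_types):
    """Sum hours x hourly cost across every type of work."""
    return sum(images / w.images_per_hour * w.hourly_labor_cost for w in work_types)


def storage_cost(images, image_size_mb, cost_per_gb):
    """Storage cost, assuming a flat price per GB and 1,024 MB per GB."""
    return images * image_size_mb / 1024 * cost_per_gb


# Placeholder inputs for one hypothetical equipment type.
work = [
    WorkType("capture", images_per_hour=100, hourly_labor_cost=20),
    WorkType("post-processing", images_per_hour=200, hourly_labor_cost=20),
]

images = 1000
print(labor_cost(images, work))                                  # 300.0
print(storage_cost(images, image_size_mb=64, cost_per_gb=0.5))   # 31.25
```

Keeping each type of work as its own record mirrors the spreadsheet's column-per-work-type layout, so adding or removing a type of work is a one-line change.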

[ screenshot of basic tab of Google sheet, upper left portion ]

The spreadsheet cost calculator can be used as-is for rough estimates, or customized to better match the capabilities of the user’s organization.

Use is straightforward, with some basic guidance in a readme tab. The spreadsheet has one tab with a stripped-down version of the cost calculator, a second tab with a fuller version that includes more options for equipment or materials, and a readme tab that includes a brief overview of what the calculator does, how to use it, and how to customize it.

[ link to current spreadsheet Google Sheet to copy ]

[ link to current spreadsheet Excel file to download ]

Customization by a user is straightforward, whether it’s changing the underlying data in the cells or adjusting what equipment or materials to cover. It’s also possible, with a little more work, to add or remove types of work included (i.e., columns – currently capture, post-processing, basic metadata assignment, creating preservation files, and preparing for storage).

Vendor Registry

The Vendor Registry subgroup was created to gather feedback from the library digitization community about their experiences with digitization equipment and service vendors in the cultural heritage sector. In 2023, the subgroup circulated two surveys. Both surveys provided a list of vendors curated from a previous vendor registry and gave respondents an opportunity to add vendors to the list. Respondents were also asked to indicate whether they would be willing to be contacted directly to answer questions.

The subgroup compiled the survey results into an overview document that listed each vendor, its website, a description of the products or services received, and the overall service experience. These results were then made publicly available via Reformatting Service & Equipment Vendor Registry Survey Results (https://osf.io/6m8da/). The Survey Results document also indicates whether any survey respondents were willing to share their contact information with others to provide additional details about their experiences.

Contact Information

Members of the Cost Assessment Interest Group (https://wiki.diglib.org/Assessment:Costs) voted to sunset the group in favor of creating a new working group focusing on the logistics of digitization. Questions about this toolkit can be directed either to the leadership of the Assessment Interest Group or to the new group, which is set to launch in 2025. Anyone interested in continuing the assessment work of the Cost group should contact the Assessment Interest Group leadership (https://wiki.diglib.org/Assessment).