Digitizing Initiatives

(above: Jean Mannheim (1863-1945),
Sunny Portrait, before 1945, tuttartpitturasculturapoesiamusica.com.
Public domain, via Wikimedia Commons*)
Methods and Costs
From 2005:
There were, as of 2005, three mass-production options for scanning books, according to Dustin Goot in a
Wired article titled "3 Ways to Scan a Library." The first
method is to remove the spines and use "machines [that] cost $25,000
and churn through 90 black-and-white pages per minute, front and back."
Second, libraries can have "workers in India, China, and the Philippines
earn about 40 cents an hour to manually turn pages that are zapped by $15,000
overhead scanners... Carnegie Mellon's Million Book Project [see above]
alone employs more than 100 Indians for this activity." Third, libraries
or publishers can use automated systems such as the one from Kirtas
Technologies, which scans 1,200 pages per hour from
bound books.
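To put the two quoted throughput figures on one scale, the short Python sketch below converts them to pages per hour and estimates scanning time for a book. It assumes the quoted rates are sustained with no setup, spine-removal or page-handling time, and the 300-page book is only an illustrative size; the overhead-scanner option is left out because the article gives no throughput for it.

```python
"""Compare the 2005 throughput figures quoted from Wired on one scale.

Assumptions (not from the article): rates are sustained with no setup or
page-handling time, and a 300-page book is used only as an illustration.
"""

# Quoted rates converted to pages per hour.
PAGES_PER_HOUR = {
    "disbound pages, $25,000 machine": 90 * 60,    # 90 pages per minute
    "Kirtas automated bound-book system": 1_200,   # 1,200 pages per hour
}


def minutes_per_book(pages: int, pages_per_hour: float) -> float:
    """Scanning time in minutes for a book with `pages` pages."""
    return pages / pages_per_hour * 60


if __name__ == "__main__":
    for method, rate in PAGES_PER_HOUR.items():
        print(f"{method}: {rate:,} pages/hour, "
              f"{minutes_per_book(300, rate):.1f} minutes per 300-page book")
```

On these assumptions the disbound method is about 4.5 times faster, but it destroys the binding, which the bound-book options avoid.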
- When the goal of digitization is to publish converted material
on the Internet, art books and journal articles present a special challenge
for converting the analog material to digital files. Because images of art
objects are frequently embedded in pages containing text, .pdf output for
Internet publication is usually not feasible: the images would first need
copyright clearance, which is time-consuming and expensive.
- In addition to the mass scanning methods, organizations
can manually scan text in bound books page by page. Resource Library
recently estimated that manually scanning a 400-word page of text in a
bound book, deleting artwork images, proofreading and converting to HTML
takes an average of 6 minutes per page while maintaining 99.995% accuracy.
The combined direct labor cost is estimated at $2.50 per page, or $25 per
hour for 10 pages. A 5,000-word essay (12.5 pages) would therefore cost
about $31 to process in direct labor. Capital equipment and overhead costs
must be added to direct labor costs to arrive at total cost. (See the Content
presentation guidelines from Resource
Library for further information on its text presentation conventions.)
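The arithmetic behind this estimate can be sketched in Python as follows. The constants come from the paragraph above; the option to bill a partial final page as a whole page is an assumption added for illustration, since the text does not say how partial pages are counted. Capital equipment and overhead are left out, as in the estimate.

```python
import math

WORDS_PER_PAGE = 400      # page size used in the estimate
MINUTES_PER_PAGE = 6      # scan, delete artwork, proofread, convert to HTML
HOURLY_LABOR_RATE = 25.0  # dollars of direct labor per hour


def direct_labor_cost(words: int, bill_whole_pages: bool = False) -> float:
    """Direct labor cost in dollars to convert an essay of `words` words.

    Capital equipment and overhead are excluded, as in the estimate.
    If `bill_whole_pages` is True, a partial last page counts as a full page
    (an assumption; the text does not say how partial pages are handled).
    """
    pages = words / WORDS_PER_PAGE
    if bill_whole_pages:
        pages = math.ceil(pages)
    hours = pages * MINUTES_PER_PAGE / 60
    return hours * HOURLY_LABOR_RATE


if __name__ == "__main__":
    print(f"per page: ${direct_labor_cost(400):.2f}")
    print(f"5,000 words, prorated: ${direct_labor_cost(5_000):.2f}")
    print(f"5,000 words, whole pages: "
          f"${direct_labor_cost(5_000, bill_whole_pages=True):.2f}")
```

Prorating gives $31.25 for a 5,000-word essay; counting the half page as a full page gives $32.50.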
- In 2006, TFAO conducted research to assess the feasibility
of outsourcing its text conversion process to service bureaus. TFAO would
provide final processing into .htm files for online publication. The quotes
were based on the following assumptions:
- a minimum input of 3 documents per week for 36 weeks per year; each
bound document contains 2,000 to 100,000 words, with an average of
10,000 words. All source documents are printed in black ink on white paper.
- output accuracy of 99.995%.
- paper documents in good condition (not fragile), either 1. sent
by the sources directly to the service bureau for processing, or 2. scanned
at the museum and converted there to .pdf files, or to .pdf and .doc files.
For a .pdf image of a sample paper source document, see this page, which
links to a .pdf of a 50-page catalogue: http://www.tfaoi.org/aa/6aa/6aa418.htm
- if documents are scanned at a service bureau, all source
documents are in English and scannable on an 8 1/2 x 14 inch scanner.
- batch processing is acceptable, with quarterly turnaround.
- source paper documents sent to a service bureau by a museum are not
returned to the sources or to TFAO.
- output: a .doc file formatted to Resource Library text presentation
conventions and a .pdf file showing the image of each source document page,
both sent by email to TFAO.
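The volume implied by these assumptions can be worked out with a short Python sketch. Splitting the 36 working weeks evenly into four quarterly batches is an assumption made here for illustration; the assumptions above only say that quarterly turnaround is acceptable.

```python
DOCS_PER_WEEK = 3
WEEKS_PER_YEAR = 36
AVG_WORDS_PER_DOC = 10_000
QUARTERLY_BATCHES = 4  # assumption: the 36 weeks split evenly into 4 batches


def annual_volume() -> tuple:
    """(documents, words) submitted per year at the minimum input rate."""
    docs = DOCS_PER_WEEK * WEEKS_PER_YEAR
    return docs, docs * AVG_WORDS_PER_DOC


def quarterly_batch() -> tuple:
    """(documents, words) in each quarterly batch."""
    docs, words = annual_volume()
    return docs // QUARTERLY_BATCHES, words // QUARTERLY_BATCHES


if __name__ == "__main__":
    print("per year (documents, words):", annual_volume())
    print("per batch (documents, words):", quarterly_batch())
```

At the minimum input rate this comes to 108 documents, or about 1.08 million words, per year, roughly 27 documents per quarterly batch.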
- For a discussion of the per-article reading costs of "open
access publishing" versus subscription-based articles, see "The Cost
per Article Reading of Open Access Articles" by Jonas Holmström,
Research Assistant, Swedish School of Economics and Business Administration.
- For a comparison of the costs of operating a paper library
versus a virtual library, see "Comparing
Library Resource Allocations for the Paper and the Digital Library"
by Lynn Silipigni Connaway, Research Scientist, Office of Research, OCLC
Online Computer Library Center, Inc. and Stephen R. Lawrence, Associate
Professor of Operations Management, Leeds School of Business, University
of Colorado. Also see "The
Return on Investment of Electronic Journals - It Is a Matter of Time"
by Jonas Holmström, Swedish School of Economics and Business Administration,
Helsinki, Finland.
- At an image resolution of 300 to 500 dpi, Kirtas estimated
in 2005 that its automated method costs "as low as $.03" per
page ($36 per hour), while manual scanning, at a rate of 100 to 150 pages
per hour, costs "$.35 to $1.50" per page. (This cost quote is
probably not applicable to Resource Library's text conversion and
text presentation conventions requiring 99.995% accuracy.)
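The Kirtas figures can be cross-checked with a short Python sketch: dividing $36 per hour by 3 cents per page implies about 1,200 pages per hour, which matches the throughput quoted in the Wired article. The 300-page book is an illustrative assumption, and the manual range simply pairs the quoted low and high per-page costs and rates.

```python
from decimal import Decimal

# Figures quoted by Kirtas in 2005.
AUTOMATED_COST_PER_PAGE = Decimal("0.03")
AUTOMATED_COST_PER_HOUR = Decimal("36")
MANUAL_COST_PER_PAGE = (Decimal("0.35"), Decimal("1.50"))  # low, high
MANUAL_PAGES_PER_HOUR = (100, 150)                          # slow, fast


def implied_pages_per_hour(cost_per_hour: Decimal, cost_per_page: Decimal) -> Decimal:
    """Throughput implied by an hourly cost and a per-page cost."""
    return cost_per_hour / cost_per_page


def book_cost(pages: int, cost_per_page: Decimal) -> Decimal:
    """Scanning cost for a whole book at a flat per-page price."""
    return pages * cost_per_page


def book_hours(pages: int, pages_per_hour: int) -> Decimal:
    """Hours needed to scan a book at a given throughput."""
    return Decimal(pages) / pages_per_hour


if __name__ == "__main__":
    pages = 300  # illustrative book size (assumption)
    rate = implied_pages_per_hour(AUTOMATED_COST_PER_HOUR, AUTOMATED_COST_PER_PAGE)
    print(f"automated: {rate:.0f} pages/hour, "
          f"${book_cost(pages, AUTOMATED_COST_PER_PAGE):.2f} per book")
    low, high = (book_cost(pages, c) for c in MANUAL_COST_PER_PAGE)
    slow, fast = (book_hours(pages, r) for r in MANUAL_PAGES_PER_HOUR)
    print(f"manual: ${low:.2f} to ${high:.2f} per book, {fast:.0f} to {slow:.0f} hours")
```

For a 300-page book this gives $9.00 automated versus $105 to $450 and two to three hours manual.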
- A November 9, 2005 Wall Street Journal article
by David Kesmodel and Vauhini Vara discussed costs connected with the book
digitizing program of the Internet Archive,
a San Francisco nonprofit group that is spearheading the Open Content Alliance,
a consortium of business and educational groups. Employees manually scan
out-of-copyright books in five-hour shifts, four times a week, for pay of just
over $10 per hour. The article says that the Archive has digitized around
2,800 books at a cost of about $108,000, or roughly $38.57 per book. It
costs "about 10 cents a page to get a book online, taking into account
equipment, labor and the cost of hosting the pages on the Internet Archive's
Web servers." Each special scanning machine costs $20,000 to $40,000.
Scanning 500 pages takes around one hour, or about 8 1/3 pages per minute.
(This cost quote is probably not applicable to Resource Library's text
conversion and text presentation conventions requiring 99.995% accuracy.)
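The per-book and per-page figures in this article can be reconciled with a short Python sketch. Dividing $108,000 by 2,800 books gives about $38.57 per book; at 10 cents a page, that corresponds to an average of roughly 386 pages per book. The labor-per-page figure assumes one operator per scanning machine paid a flat $10 per hour (the article says "just over"), which is a simplification made here for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

BOOKS_DIGITIZED = 2_800
TOTAL_COST = Decimal("108000")   # dollars
COST_PER_PAGE = Decimal("0.10")  # "about 10 cents a page", all-in
PAGES_PER_HOUR = 500             # one scanning machine
HOURLY_PAY = Decimal("10")       # "just over $10"; rounded down (assumption)
CENT = Decimal("0.01")


def summary() -> dict:
    """Derived figures for the Internet Archive scanning program."""
    cost_per_book = (TOTAL_COST / BOOKS_DIGITIZED).quantize(CENT, rounding=ROUND_HALF_UP)
    return {
        "cost_per_book": cost_per_book,
        # Average book length implied by the per-book and per-page costs.
        "implied_pages_per_book": cost_per_book / COST_PER_PAGE,
        "pages_per_minute": Decimal(PAGES_PER_HOUR) / 60,
        # Operator wages only, one operator per machine (assumption).
        "labor_per_page": HOURLY_PAY / PAGES_PER_HOUR,
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value:.2f}")
```

On these assumptions, operator wages account for only about 2 cents of the quoted 10 cents per page; the rest would be equipment and hosting.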
- A December 12, 2005 article in the Wall Street Journal
by Jeffrey A. Trachtenberg and Kevin J. Delaney said that a major publisher
was recently told that "it costs as much as 10 cents per page to scan,
digitize and tag a book, which means a 300-page novel would cost $30."
(This cost quote is probably not applicable to Resource Library's text
conversion and text presentation conventions requiring 99.995% accuracy.)
- A December 14, 2004 announcement
by Google that the firm would collaborate with institutional libraries to
digitize large quantities of books spawned numerous articles in the media.
Digitizing expenses were quoted at $10 to $20 per book. For instance,
a December 14, 2004 Reuters article by Lisa Baertlein titled "Google
Bets Big on Bringing Libraries to Web" said "Librarians and non
profits already involved in scanning books for other projects say it costs
around $20 to do a 300-page book, but that the cost should soon fall to
around $10 per book." For a 300-page book, $20 is about 7 cents per page and
$10 is about 3 cents. (This cost quote is probably not applicable to Resource
Library's text conversion and text presentation conventions requiring
99.995% accuracy.)
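The 2004 and 2005 quotes above mix per-page and per-book prices. A short Python sketch can convert each one to both units for a 300-page book, the size used in the WSJ and Reuters quotes; the labels are shorthand for the paragraphs above, and per-page results are rounded to a tenth of a cent. None of these figures reflects 99.995% accuracy.

```python
from decimal import Decimal, ROUND_HALF_UP

BOOK_PAGES = 300  # book size used in the WSJ and Reuters quotes
TENTH_OF_CENT = Decimal("0.001")

# (label, amount, unit) as quoted above; unit is "page" or "book".
QUOTES = [
    ("Kirtas automated, 2005", Decimal("0.03"), "page"),
    ("Internet Archive, WSJ Nov. 2005", Decimal("0.10"), "page"),
    ("publisher quote, WSJ Dec. 2005", Decimal("0.10"), "page"),
    ("Reuters 2004, current cost", Decimal("20"), "book"),
    ("Reuters 2004, expected cost", Decimal("10"), "book"),
]


def normalize(amount: Decimal, unit: str, pages: int = BOOK_PAGES) -> tuple:
    """Return (cost per page, cost per book) for a quote in either unit."""
    if unit == "page":
        return amount, amount * pages
    if unit == "book":
        per_page = (amount / pages).quantize(TENTH_OF_CENT, rounding=ROUND_HALF_UP)
        return per_page, amount
    raise ValueError(f"unknown unit: {unit}")


if __name__ == "__main__":
    for label, amount, unit in QUOTES:
        per_page, per_book = normalize(amount, unit)
        print(f"{label}: ${per_page:.3f}/page, ${per_book:.2f} per 300-page book")
```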
From 2006:
- During 2006 TFAO received quotes from firms to provide
text conversion service.
- One firm's subcontractor offered 99.995% accuracy, with the following
pricing for .doc output files:
  - Bound bitone scanning, up to 8.5" x 11": $0.72 each
  - Bound bitone scanning, up to 11" x 17": $1.02 each
  - OCR of bitone images: $0.18 each
  - Proofing and formatting: $1.17 per 1,000 characters (later reduced to 80 cents in a 2007 requote)
  - CD-R masters: $10.00 each (optional)
  - Shipping: at cost
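- Assuming a 10,000-word essay with 5.3 characters per
word, the document contains 53,000 characters. At the 2007 rate of 80 cents
per 1,000 characters, proofreading costs $42.40. At 600 words per page (about
16.7 pages), letter-size scanning costs about $12. Adding a CD-R master brings
the total to $64.40, not counting the separately listed OCR charge (about $3
at $0.18 per image) or shipping.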
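The worked example can be reproduced with a short Python sketch using exact decimal arithmetic. Like the text, it prorates scanning by fractional page (about 16.7 pages) and uses the letter-size scanning price. The `include_ocr` switch, which adds the separately listed $0.18 OCR charge, is an addition for illustration, and shipping is left out because it is billed at cost.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
SCAN_LETTER = Decimal("0.72")   # bound bitone scanning up to 8.5" x 11", per page
OCR = Decimal("0.18")           # OCR of bitone images, per page
PROOF_2006 = Decimal("1.17")    # proofing and formatting, per 1,000 characters
PROOF_2007 = Decimal("0.80")    # 2007 requote
CD_MASTER = Decimal("10.00")    # optional CD-R master


def quote(words: int, chars_per_word: Decimal = Decimal("5.3"),
          words_per_page: int = 600, proof_rate: Decimal = PROOF_2007,
          include_ocr: bool = False, cd_master: bool = True) -> dict:
    """Itemized cost in dollars for one document (shipping excluded)."""
    chars = words * chars_per_word
    pages = Decimal(words) / words_per_page  # prorated, as in the text
    items = {
        "proofing": (chars / 1000 * proof_rate).quantize(CENT, ROUND_HALF_UP),
        "scanning": (pages * SCAN_LETTER).quantize(CENT, ROUND_HALF_UP),
    }
    if include_ocr:
        items["ocr"] = (pages * OCR).quantize(CENT, ROUND_HALF_UP)
    if cd_master:
        items["cd_master"] = CD_MASTER
    # Sum the line items before the total itself is added to the dict.
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    print(quote(10_000))                         # 2007 rate, as in the text
    print(quote(10_000, include_ocr=True))       # with the OCR line item
    print(quote(10_000, proof_rate=PROOF_2006))  # original 2006 rate
```

With the OCR charge included the total rises to $67.40; at the original 2006 proofing rate, proofreading alone would be $62.01.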
- Other firms quoted $42, $156 and $200 for proofreading and
formatting only of an equivalent document.
- Several specialty service bureaus offer proofreading. For example,
as of November 2006 Canyouproofthis.com charged a minimum of $50; it provides
an online rate calculator. Wordsru.com provided an "instant estimate" of $78
for a 5,000-word document.
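Because these proofreading offers are priced per job, while the firm's quote above is priced per 1,000 characters, a short Python sketch can put them on a common per-1,000-words basis. It assumes 5.3 characters per word, the figure used in the worked example above. The comparison is only indicative, since the per-character rate also covers formatting.

```python
from decimal import Decimal

CHARS_PER_WORD = Decimal("5.3")  # assumption taken from the worked example


def per_1000_words_from_flat(price: Decimal, words: int) -> Decimal:
    """Cost per 1,000 words for a flat quote covering `words` words."""
    return price / words * 1000


def per_1000_words_from_char_rate(rate_per_1000_chars: Decimal) -> Decimal:
    """Cost per 1,000 words for a price quoted per 1,000 characters."""
    # 1,000 words x 5.3 characters/word = 5,300 characters = 5.3 units of 1,000.
    return rate_per_1000_chars * CHARS_PER_WORD


if __name__ == "__main__":
    print("Wordsru.com estimate:", per_1000_words_from_flat(Decimal("78"), 5_000))
    print("2006 proofing and formatting rate:",
          per_1000_words_from_char_rate(Decimal("1.17")))
    print("same rate as requoted in 2007:",
          per_1000_words_from_char_rate(Decimal("0.80")))
```

On this basis the Wordsru.com estimate works out to about $15.60 per 1,000 words, against roughly $6.20 and $4.24 for the per-character rates.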
From 2007:
- During 2007 TFAO received a quote for scanning, formatting,
proofreading and emailing the resulting .doc file at 80 cents per 1,000
characters. The contractor has a $100 minimum, so for maximum efficiency TFAO
would send it 125,000 characters per order, equivalent to roughly 23,500 words
of text at 5.3 characters per word.
- A sample AAR article converted in 2007 has 1,840 words
in five pages, or 368 words per page. At that rate, filling the $100 minimum
with AAR articles requires about 64 pages (23,500 words divided by 368 words
per page).
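The minimum-order arithmetic can be checked with a short Python sketch. The constants come from the two paragraphs above, with 5.3 characters per word carried over from the 2006 worked example. The sketch also shows the effect of rounding: 125,000 characters is about 23,585 words, so 64 pages of 368 words (23,552 words) falls just short of the $100 minimum and 65 pages would be needed to reach it. The figure of 64 pages follows from rounding the word count down to 23,500.

```python
import math
from decimal import Decimal

MINIMUM_CHARGE = Decimal("100")        # contractor's minimum, dollars
RATE_PER_1000_CHARS = Decimal("0.80")  # 2007 quote
CHARS_PER_WORD = Decimal("5.3")        # from the 2006 worked example
AAR_WORDS_PER_PAGE = 1_840 // 5        # sample AAR article: 368 words per page


def chars_for_minimum() -> Decimal:
    """Characters whose processing cost equals the minimum charge."""
    return MINIMUM_CHARGE / RATE_PER_1000_CHARS * 1000


def words_for_minimum() -> Decimal:
    """The same volume expressed in words."""
    return chars_for_minimum() / CHARS_PER_WORD


def pages_needed(words: Decimal, words_per_page: int = AAR_WORDS_PER_PAGE) -> int:
    """Whole pages required to supply at least `words` words."""
    return math.ceil(words / words_per_page)


if __name__ == "__main__":
    print(f"characters: {chars_for_minimum():,.0f}")
    print(f"words: {words_for_minimum():,.0f}")
    print("pages using 23,500 words:", pages_needed(Decimal("23500")))
    print("pages using unrounded words:", pages_needed(words_for_minimum()))
```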
- These quotes are based on adherence to TFAO's text presentation conventions.

(above: William Gropper, Automobile
Industry, 1941, oil on canvas, 72 inches x 20 feet, Detroit,
Michigan Post Office. Public domain, via Wikimedia Commons*)
- rev. 6/4/07
*Tag for expired US copyright of object
image:

Links to sources of information outside
of our web site are provided only as referrals for your further consideration.
Please use due diligence in judging the quality of information contained
in these and all other web sites. Information from linked sources may be
inaccurate or out of date. TFAO neither recommends nor endorses these referenced
organizations. Although TFAO includes links to other web sites, it takes
no responsibility for the content or information contained on those other
sites, nor exerts any editorial or other control over them. For more information
on evaluating web pages see TFAO's General
Resources section in Online
Resources for Collectors and Students of Art History.
Copyright 2012 Traditional Fine Arts Organization, Inc., an Arizona nonprofit corporation. All rights
reserved.