Bridging The Digital Divide
The challenges of information accessibility have prompted efforts to digitize backlogs of transportation resources currently in analog formats.
Although transportation practitioners have access to a seemingly unlimited amount of information via Internet search engines and other online portals, a large portion of key research conducted before the digital age remains in print and other analog formats. This body of work remains difficult or, in some cases, impossible to access remotely, which may lead to duplication of research and, secondarily, loss of knowledge crucial to solving other transportation problems.
To provide practitioners with difficult-to-access materials, State departments of transportation (DOTs) and transportation libraries are working to digitize unique and important materials. Digitization is the conversion of analog materials, or documents that can be viewed or heard by people, to digital format, which can be read by machine. After digitization, a computer reads or interprets the file to produce an image for display or printing. Digitization does not necessarily represent an exact reproduction of the analog item; it is a snapshot of all or part of the document, sound record, or video.
And while most digitization projects yield a favorable return on investment, agencies and institutions can benefit from established workflows and examination of costs before undertaking such efforts.
Access: The Central Problem
Access to print and other analog transportation materials is hampered by the localization of resources at individual DOTs or research libraries and by a lack of discoverability, that is, the difficulty of finding these resources through standard Internet search queries. These access challenges may result in duplication of research, lack of knowledge of past research, or loss of historical and institutional knowledge.
Traditionally, transportation-related technical reports and other materials were available only to researchers in close proximity to the materials or to those willing to travel to the print collections. For example, a DOT's technical reports might be kept in a particular office in the DOT and be readily available only to practitioners in that department. Libraries have been used as repositories for print resources, but even under the best cooperative agreements, physical items remain largely available only to local users. Materials at local institutions often are not discoverable via search engines, library catalogs, or other means because they exist only in physical form. Even at institutions with a well-cataloged collection, sharing agreements such as interlibrary loans are costly because of base fees, labor, and shipping. Though costs vary significantly by institution, a study by The University of Kansas and The University of North Carolina estimated borrowing costs of $12.11 per item across 23 North American libraries.
"The materials may exist, but the people who need them have no way to discover them," says Roberto Sarmiento, head of the Northwestern University Transportation Library. "We have a universe of information, but it is not necessarily available to the rest of the world."
The Digitization Solution
Digitization is one solution to the accessibility issue. When materials are digitized, practitioners can use resources regardless of their location. According to Sarmiento, "With digital objects, we may not even know who the user is, or where they're coming from. The user becomes anyone in the world searching for the information."
Digitized resources, depending on the quality of their associated metadata, such as author, title, and subject information, become discoverable via search engines and standard in-house library catalogs.
"We have a situation now where we're making these historical research findings available quickly," says Dr. Darcy Bullock, P.E., professor of civil engineering at Purdue University and director of the Joint Transportation Research Program, a collaboration of Purdue University and the Indiana Department of Transportation (INDOT). "Within hours of them being published online, we see Google Scholar™ pick them up."
However, providing access does not come without difficult decisions, such as which materials to digitize and procedures for digitization, as well as significant costs. A typical digitization project involves numerous steps. To start, the project planner selects materials for digitization. Depending on the condition of the materials, they may require physical repair or other preparation.
The materials then are inventoried and moved to a digitization lab, either in-house or offsite. Digitization requires scanning a master, then producing a digital object in one or more user-friendly formats. Digitized print objects are typically available in PDF format, but are increasingly available in formats for electronic book readers, as well as in a customized book-reading interface on a project Web site.
Next, during quality control, the digitization team checks the digital object carefully against the analog object. Quality control is needed because an automated technology called optical character recognition scans the alphabetical characters or letters from the analog document into digital format. During character recognition, a scanner attempts to recognize each letter or character in a word. The quality control check ensures that no mistakes occurred during the recognition process. For example, the scanner could recognize the letter "h" as a "b." Various font types, faded ink, torn and damaged pages, or simply the age of some objects can affect character recognition. After digitization, the analog objects are either retained or disposed of according to the needs of the owning institution.
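The quality control check described above can be partly automated when a trusted transcription of a sample page is available. The following is a minimal sketch in Python, not a description of any agency's actual workflow: it assumes a hand-keyed reference text exists for the sample, and the function names ocr_mismatches and needs_review are hypothetical. It uses the standard library's difflib module to list the places where the recognized text differs from the reference, such as an "h" read as a "b."

```python
from difflib import SequenceMatcher


def ocr_mismatches(reference: str, ocr_text: str) -> list[tuple[str, str]]:
    """Return (reference_fragment, ocr_fragment) pairs where the two texts differ.

    `reference` is assumed to be a trusted, hand-keyed transcription of the page;
    `ocr_text` is the output of the character recognition step.
    """
    matcher = SequenceMatcher(None, reference, ocr_text, autojunk=False)
    return [
        (reference[i1:i2], ocr_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted spans
    ]


def needs_review(reference: str, ocr_text: str) -> bool:
    """Flag a page for manual review if any difference was found."""
    return bool(ocr_mismatches(reference, ocr_text))


# Illustrative sample: the scanner has read "h" as "b".
print(ocr_mismatches("the road", "tbe road"))  # -> [('h', 'b')]
print(needs_review("the road", "the road"))    # -> False
```

In practice a team would likely check only a sample of pages this way, because producing the reference transcription is itself labor-intensive.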
Costs and Challenges
Digitization projects present numerous challenges. Even under the best of circumstances, they require significant investments of time and money. A full digitization project involves labor-hours, which often extend beyond typical projections, examination of copyright by an individual knowledgeable in that field, the use of specialized equipment, indexing to ensure that materials are searchable, and, ultimately, the long-term cost of storage and appropriate display of the resources on servers and Web sites.
What follows are case studies of digitization projects by the South Carolina Department of Transportation (SCDOT), the Iowa Department of Transportation (Iowa DOT), Indiana's Joint Transportation Research Program, and Northwestern University. The case studies illustrate the planning, success, and lessons learned from a variety of projects.
Case Study: South Carolina
Plans Online is SCDOT's award-winning library of digitized construction plans for highways within the State's transportation system. Prior to the establishment of Plans Online, the agency's road construction plans were available only in print and located at an SCDOT facility 3 miles (4.8 kilometers) from the agency's central office in Columbia. The process to retrieve a particular plan was cumbersome and took up to several days to complete -- first accessing information in a card catalog, then ordering the document from the shop facility, and finally having it delivered to the central facility.
SCDOT began in-house scanning of road plans in 1997 using an existing large-scale printer. "The cumbersome part was labeling every sheet and indexing everything. The actual scanning was the easiest part," says Mark Lorick, administrative coordinator in the Plans Storage Office at SCDOT headquarters. SCDOT used data elements such as county, route, project number, termini locations, and date of creation to index documents in Plans Online. As scanning technology became cheaper, the agency then purchased a newer large-scale printer, which it still uses today.
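To illustrate how the data elements named above support retrieval, the following Python sketch models one index record per scanned sheet and a simple lookup. It is an assumption-laden illustration, not SCDOT's actual system: the class PlanSheet, the function find_sheets, the image_file field, and every sample value are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlanSheet:
    """One index record for a scanned plan sheet (field names are illustrative)."""
    county: str
    route: str
    project_number: str
    termini: str
    created: date
    image_file: str  # assumed pointer to the scanned image


def find_sheets(index: list[PlanSheet], **criteria: object) -> list[PlanSheet]:
    """Return the sheets whose fields equal every given criterion."""
    return [
        sheet for sheet in index
        if all(getattr(sheet, name) == value for name, value in criteria.items())
    ]


# Hypothetical sample records; none of these values come from Plans Online.
index = [
    PlanSheet("County A", "Route 1", "P-0001", "Point X to Point Y",
              date(1955, 6, 1), "p0001_sheet01.tif"),
    PlanSheet("County A", "Route 2", "P-0002", "Point Y to Point Z",
              date(1962, 3, 15), "p0002_sheet01.tif"),
    PlanSheet("County B", "Route 1", "P-0003", "Point Z to Point W",
              date(1971, 9, 30), "p0003_sheet01.tif"),
]

for sheet in find_sheets(index, county="County A", route="Route 1"):
    print(sheet.project_number, sheet.image_file)
```

The sketch shows why labeling and indexing every sheet matters: a sheet can be found only through the elements recorded for it.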
Early in the project, SCDOT faced a critical question of access: Would the library of plans be available to the public at no cost, or password protected with a fee? After much discussion, SCDOT officials decided to charge a nominal fee to users outside of State government offices. The current annual subscription price is $60, which helps to cover the cost of maintenance, operation, and improvement of the database.
SCDOT places no restriction on who may subscribe to Plans Online. The database is available to surveyors, consultants, engineers, and local officials.
Today, Plans Online contains nearly 2 million images, with plans dating from the early 20th century to current plans that were "born digital," that is, originally produced in digital format versus print format. SCDOT completed digitization of the backlog of print-based plans in 2010, and now populates Plans Online with born-digital plans for current highway projects.
The Governor of South Carolina recognized Plans Online for its efficiency improvements, and the project also was a top 10 winner of the American Association of State Highway and Transportation Officials' 2011 America's Transportation Awards in the innovative management category. (For more information, see "Best of the Best: America's Transportation Awards!" in the March/April 2012 issue of PUBLIC ROADS.)
"[Plans Online] is a tremendous help to our profession," says Joe Mitchell, professional land surveyor and former president of Mitchell Surveying. "We can now sit at a computer and access online highway right-of-way plans for the entire State. No longer do we have to travel to the different county SCDOT offices to get right-of-way information and plans."
Case Study: Iowa
The Iowa DOT, established in 1913 as the Iowa State Highway Commission, had materials in its collections dating back to its establishment. The materials were located in various Iowa DOT offices.
"The research is only valuable if people know about it and have access to it," says Hank Zaletel, former director of the Iowa DOT Library.
Beyond print materials, the Iowa DOT collections include photo negatives, maps, and 16-millimeter film. These types of nonprint materials are particularly vulnerable to loss or damage because they are more susceptible to humidity, water, and light than print materials are. In addition, the long-term storage of audio and visual materials is costly and difficult because they must be maintained in a climate-controlled facility.
The importance of nonprint materials may not be evident to organizational leaders, who may overlook their relevance to current operations. However, Iowa DOT recognized the materials' importance, both for their intrinsic historic value across a variety of disciplines and as a way to inform the public of the agency's mission and vision. For example, the DOT has used some of the digitized materials in research publications and other general interest pieces on its Web site.
In 2002 Iowa DOT established an exploration committee to look at possible digitization of the agency's physical collections. The committee included the agency's director of research, librarian, records manager, and office director. In 2004 a followup committee, which included experts in historic mitigation, transportation data, and public affairs, won an initial $50,000 Transportation Enhancement grant from the Federal Highway Administration to hire an archival consultant.
Marcy Flynn, an archival expert in visual materials with Silver Image Management, assisted the Iowa DOT in creating a business and digitization plan, a climate-controlled storage area, and a policy and procedures manual (an internal document) on the use of archival materials. The plan included using existing in-house equipment to digitize the photos and employing the department's existing electronic record management system. Using the two in-house systems meant an immediate cost savings for an otherwise costly process.
Efforts to digitize parts of the collection moved forward in 2007. The DOT then received another $150,000 Transportation Enhancement grant that it used to digitize the Iowa State Highway Commission's collection of 8,500 road photos.
Flynn emphasizes the importance of using quality metadata to index nonprint digitized materials. Metadata, such as information about the geographic location, date, activity, or subject of materials, makes "a big difference," she says. "A picture is worth a thousand words -- but if you don't know what you're looking at, descriptive metadata can really help you access images, and identify and understand them."
Case Study: Indiana
The Joint Transportation Research Program began in March 1937 as the Joint Highway Research Project to facilitate collaboration between higher education and the transportation community to improve Indiana's highway infrastructure. The program, which reflects a long track record of collaboration between INDOT and Purdue University, most recently implemented a successful model of digitization of transportation resources.
What Is a Digital Object Identifier?
A digital object identifier (DOI) is a unique identifier of an object, typically an electronic document, but sometimes an image, an audio or visual object, or a dataset. The International Organization for Standardization developed these identifiers, which now are registered through membership in the International DOI Foundation, home to more than 5,000 members with 65 million registered digital object identifier names.
A digital object identifier consists of a character string divided into two parts: a prefix and suffix. The parts are separated by a slash. The prefix is the unique identifier of the registered organization; the suffix is chosen by the organization and identifies the object linked to the digital object identifier. Name resolution, which operates like a "phonebook" by translating host names to Internet protocol (IP) addresses or vice versa, is provided through the Handle System®, a technology that manages persistent identifiers for Internet resources. The string http://dx.doi.org/ precedes the digital object identifier string and, unlike a static URL, creates a unique and persistent uniform resource identifier, or URI. The digital object identifier syntax is standardized under ANSI/NISO Z39.84-2005 (R2010), Syntax for the Digital Object Identifier.
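The structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation of the standard: the names Doi, parse_doi, and to_uri are hypothetical, the sample identifier is invented, and the suffix is assumed to contain only characters that are safe to place in a URL. The sketch splits the string at the first slash, because the suffix chosen by the organization may itself contain slashes.

```python
from typing import NamedTuple

# Resolver string given in the text; it precedes the identifier to form a URI.
RESOLVER = "http://dx.doi.org/"


class Doi(NamedTuple):
    prefix: str  # identifies the registered organization
    suffix: str  # chosen by the organization; identifies the object


def parse_doi(doi: str) -> Doi:
    """Split a DOI string into prefix and suffix at the first slash."""
    prefix, slash, suffix = doi.strip().partition("/")
    if not slash or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {doi!r}")
    return Doi(prefix, suffix)


def to_uri(doi: str) -> str:
    """Build the persistent URI by prepending the resolver string."""
    parsed = parse_doi(doi)
    return f"{RESOLVER}{parsed.prefix}/{parsed.suffix}"


# Invented identifier, used only for illustration.
sample = "10.1234/example.report.001"
print(parse_doi(sample))
print(to_uri(sample))  # -> http://dx.doi.org/10.1234/example.report.001
```

Because the resolver looks up the identifier each time, the URI can stay the same even if the document later moves to a different server.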
"The Joint Transportation Research Program did some great research over the past 75 years. Unfortunately, the paper reports were not very accessible," says Purdue's Bullock. "We would occasionally get requests . . . for reports from 20 or more years ago. We would have to search for the report, and attempt to make copies of some very fragile documents."
Bullock sought a solution and found it at the Purdue University Press, a unit of Purdue University Libraries, and the organization's digital online portal, Purdue e-Pubs. The Purdue University Press digitized all 1,500 of the program's historical reports in print using its in-house capabilities, which helped to keep costs down. These older reports, as well as current born-digital reports, now are published under a uniform Purdue University title page and entered into Purdue e-Pubs. A standard title page ensures easy identification of publication elements and consistency across Joint Transportation Research Program reports.
Library staff members assign a digital object identifier and add standardized metadata about the authors and subject. A digital object identifier is a unique character string used for the identification of an object. Among the metadata added to digitized reports, the program has found the digital object identifier to be one of the most valuable.
Statistics indicating the number of user hits bear out the success of placing Indiana's reports in the new digitization and publishing framework. Users have downloaded historical technical reports on the Purdue e-Pubs Web site more than 400,000 times.
"The scanned documents are now broadly accessible to anybody with Web access," says Bullock. "As an example, one 1972 report on pavement performance had [more than] 450 downloads within 6 months of being posted online."
Case Study: Northwestern University
Northwestern University Transportation Library has one of the largest collections of transportation-related materials in the world. The library, which owns more than 500,000 volumes, contains materials issued by agencies at the local, State, Federal, and international levels. The library has resources from DOTs across the United States but comprehensively collects materials from the Illinois Department of Transportation (IDOT), the city of Chicago, and numerous regional transportation bodies. Twenty-four percent of the materials are unique to the library, meaning they exist only in the Northwestern collection.
Digitization at Northwestern University Transportation Library occurs via several mechanisms. The library owns its own machine for in-house digitization, typically used for digitizing fragile materials. It also outsources digitization for out-of-copyright materials. The Google Books™ Project, which began in 2010, has digitized millions of resources at Northwestern, including numerous volumes from the library itself. Also, the Consortium of Academic and Research Libraries in Illinois (CARLI) has carried out digitization workflow at Northwestern. Two projects implemented through CARLI focused on materials specific to Illinois, one on local transit, and the other on materials related to Chicago O'Hare International Airport. Both projects uncovered and digitized numerous documents from IDOT.
With a large-scale collection like that of a major research university, Northwestern University Transportation Library's Sarmiento says, "As a manager, it's my job to make sure that our researchers have access to the best available information. We choose to digitize the best of the best. We select the materials that give us the best bang for our buck. We want to digitize the materials with the highest return on investment, not just for our patrons, but also to prove [the digitization's] worth to our management and institution."
Costs Associated with Digitization
Although the cost of digitization is difficult to estimate because it varies greatly among institutions, the following are some expenses typically incurred beyond a basic cost per page of digitization:
Access to digitized materials is a high priority at Northwestern. The university shares metadata across numerous portals that point to the digital objects. For example, items from the CARLI digitization projects are discoverable at the Internet Archive, the standard portal for CARLI materials; NUcat, the Northwestern University Libraries catalog; OCLC WorldCat, a global network of library content and services; and the Transportation Research International Documentation (TRID) database at the Transportation Research Board.
David E. Kosnik, Ph.D., P.E., research engineer at the Northwestern University Infrastructure Technology Institute, points to digitized documents being used during field work involving structural health monitoring of IDOT's steel and concrete bridges. "To get the data we need to understand performance and deterioration of real-world structures, we install sensors on inservice bridges, buildings, and tunnels -- facilities that are typically exposed to the elements," Kosnik says. "We often consult reports and other technical documents and drawings while in the field as we determine the best instrumentation and monitoring plan for a given facility. The portability and searchability of digitized documents makes this process easier and prevents damage to physical documents from field hazards such as rain, wind, debris, grease, and muddy boots."
Digitization projects require copious amounts of planning time that can vary greatly by project. Selection, preparation, delivery, and creation of a metadata schema (a set of predefined elements) are particularly time consuming, especially for an institution undertaking a digitization project for the first time. Assigning as much metadata as possible within the budget is critical. To save time and costs, using in-house equipment and systems is also beneficial. Executing part or all of a project in-house could help achieve significant cost savings versus outsourcing the entire project.
To save additional time and money, team leaders of digitization projects should exercise caution when identifying and selecting materials. A collection should have a coherent theme, appeal to a defined audience, and have a well-defined end use. In addition, taking steps to avoid duplication of work already done by other digitization projects can save significant amounts of time and money.
Quality control is also important. For each resource, the digitization team should examine the copyright carefully before digitization. While Federal guidelines on digitization are available from sites such as www.copyright.gov/laws, laws governing documents created by and for State, local, and municipal agencies differ from Federal guidelines. When materials lack an office from which to obtain copyright clearance, the organization should consult its legal counsel. In addition, digitization teams should run quality control checks on digitized objects before releasing the materials to the public. Missing pages, bad imaging, and corrupt files could compromise a project's quality and validity in the public's eye.
Lastly, an organization should plan for a public relations strategy at the start of a major digitization project. Communicating with the public will introduce the digitized materials to potential users and also help to prove the worth of the project to the funding organization, managers, and other stakeholders.
While print and other analog resources present access challenges stemming from localization, restriction, and lack of portability, various innovations in digitization can help agencies overcome these limitations. Many State DOTs, often in conjunction with universities and institutions well-versed in digitization, have digitized large collections of print materials. These efforts mean improved access to transportation-related information for transportation researchers and practitioners, and the broader research community.