Potential Applications for Transportation
The data needed to safely and efficiently operate the nation's transportation systems are immense and include data on:
- Demand, such as traffic volumes.
- System performance, such as queue lengths or crashes.
- Operating parameters, such as speed limits.
- Operating characteristics, such as travel times.
- Facilities, such as asset inventories.
Several of these types of data could benefit from collective information. Specific to the MIRE listing, potential applications include:
- Intersection inventory. This could include number of legs, intersection control (e.g., signalized, stop-controlled, yield-controlled), presence of turn lanes, left turn phasing, number of traffic control devices, etc.
- Signal inventory. This could include traditional red/yellow/green signals, pedestrian signals, and flashing beacons. The data collector could enter information on the presence of the signals or outages.
- Lighting. This could include the location and types of lighting.
- Pedestrian facilities inventory. This could include the presence and condition of sidewalks, crosswalks, pedestrian signals, push buttons, wayfinding signs, warning signs, etc.
- Bicycle facility inventory. This could include the presence and condition of bicycle paths, bicycle lanes, wayfinding signs, warning signs, etc.
- User reported walkability index. This could provide an opportunity for local citizens to rate facilities using existing walkability index assessments.
- Pavement surface characteristics. This could include the pavement condition and specific locations of deterioration.
- Sign inventory. This could include the presence, condition, and information conveyed for regulatory and warning signs.
- Curve and grade inventory. This could include the location and degree of curvature for horizontal curves or ramps. For vertical curves, it could include presence and grade. For both horizontal and vertical curves it could also include the presence and conditions of warning signs or the presence of features within the curve, such as driveways, that may impact safety.
The concept could also extend beyond the MIRE listing. Agencies could use these applications to collect other data to improve their transportation system including:
- Transit performance. This could include passenger reported arrival and departure times, and origin and destination information for the overall trip, not just the transit portions.
- Transit facilities. This could include the presence and condition of amenities such as benches and shelters at transit stops.
- Operating speed inventory. This could include passively reported operating speeds on roadways in a jurisdiction at peak and non-peak times.
The type of application used and the information collected would vary based on the needs of the agency. An agency may use one application to collect data across several categories. For example, an agency could use one application to collect inventory elements such as intersection, signal, sign, and pedestrian facility elements. Or, an agency could use an application to only collect elements of interest to one user group.
Once one agency develops an application, other agencies may be able to adapt it for their own use. The United States DOT Research and Innovative Technology Administration (RITA) tested San Francisco's CycleTracks application in Austin, Texas, with positive results (8).
Types of Data Collection
The authors envision two types of data collection: mobile collection and remote collection. Mobile collection would occur in the field through the use of mobile devices such as smart phones or tablets. The user of the application would use GPS or maps to locate the data. The location of the person collecting the data would be important for the actual collection. CycleTracks is a good example of mobile collection. The GPS in the mobile devices of the data collectors tracks their location throughout their route.
The second type of data collection is remote collection. This type of collection could be done anywhere. The location of the person collecting the data is not important to the actual collection. The actual data collection is better described as data capturing. The agency in need of data would collect a large amount of data in the field, likely using a video camera or other device (e.g., light detection and ranging [LIDAR]). The remote data collector would review the video and process the captured data. For example, an agency might post a video of a corridor. Data collectors could process that video to develop a street sign condition inventory.
A good example of remote collection is a recent effort undertaken as part of the FHWA MIRE MIS project to develop an intersection inventory for New Hampshire DOT. Data entry clerks populated an intersection inventory using video logs and publicly available photos in Google Maps Street View™ and Microsoft's Bing® Maps Bird's Eye. Remote data collectors, hundreds of miles away from the actual intersections in New Hampshire, reviewed the images and characterized attributes of each intersection in a database. Although in this example the data collectors were compensated, crowdsourcing could be used for similar efforts.
Data Contributor Considerations
For the purpose of this paper, the term contributor refers to a user of the application who would collect the data. These contributors are also likely users of the data in the more global sense in that they are users of the facility or users of the collected data. There are several types of potential contributors: public, qualified, opportunistic, and directed contributors. Each is discussed in more detail in the following section.
Types of Contributors
The authors envision several types of contributors, or collective data inputters. The first type is the public contributor. This type of contributor is used for applications that are open to any interested users. Public contributors are appropriate in the following circumstances:
- A large or representative sample of people is needed to accurately reflect operating conditions or to provide enough coverage for the application to reflect the transportation environment.
- Detailed training, beyond what can be conveyed directly by the application, is not needed.
- No equipment is needed, beyond the capabilities of the mobile device, to collect and transmit the information.
- The data are not needed urgently.
The previously described CycleTracks is an example of an application with public contributors. SFCTA made their application open to anyone who wanted to download it, and required the contributors to register using some basic contact information. SFCTA did not place any restrictions on who could register. This provided an opportunity for a diverse sample of bicyclists around the jurisdiction. Similarly, Boston's Street Bump and Citizen Connect applications both use public contributors.
The second type is the qualified contributor. Members of civic organizations, Parent Teacher Associations (PTAs), or neighborhood watch groups are examples of potential qualified contributors. Agencies would specifically identify these groups for the effort, and the contributors would receive some form of training. Qualified contributors are appropriate in the following circumstances:
- Training, beyond what can be conveyed directly by the application, is needed.
- The needed reliability of the data is greater than what can be collected by public contributors.
- The collected data has some direct value to the qualified group.
Examples of applications that may use qualified contributors include school zone inventories and pedestrian facilities inventories.
The third type is the opportunistic contributor. This could include agency employees whose primary responsibility is not data collection but whose role with the agency provides an opportunity to collect data. Examples of this type of contributor include downtown ambassadors, maintenance personnel, parking enforcement personnel, and personnel responsible for collecting other data, such as traffic counters. Opportunistic contributors are appropriate in the following circumstances:
- The data collection is not time sensitive, since the collection of data by these contributors occurs only when the opportunity is present.
- Some training or equipment may be needed.
- The data elements to be collected are proximate to the general duties of the contributor, more likely in an urban environment.
An example is the Ohio BADCS. BADCS is a web-based application to view video of Ohio DOT's roadways to collect large-scale asset inventories. It is available to anyone within the DOT, although primarily used by Districts 1 and 2 (9).
Examples of applications that may use opportunistic contributors include signal inventories and assessments of Americans with Disabilities Act (ADA) compliance in a central business district.
The fourth type is the directed contributor. These are contributors who are specifically directed to collect data. This includes contractors retained to collect data, summer interns, or maintenance staff directed to collect data while performing their normal duties. In most cases, these contributors would be compensated for collecting the data. Directed contributors are appropriate in the following circumstances:
- The data are needed urgently, or must be collected over a short period of time.
- Some training, equipment, or specialized expertise is needed.
- The reliability of the data is very important.
ADAMobile, developed by Johnson, Mirmiran & Thompson (JMT), Inc., is an example of an application used by directed contributors (10). The Delaware DOT retained JMT to collect pedestrian facilities inventory data, including ADA compliance data, in the field and provide real-time updates to the office. JMT collected the data over several months with a high level of reliability.
The authors also envision other contributor groups forming beyond these four categories as the need arises. An agency could construct an application for use by any of the four categories. The application could also recognize the type of contributor. Agencies could use this information to qualify the reliability of the data. For example, an agency could develop an intersection inventory as a combined effort of public contributors, qualified contributors from a local civic group, and directed contributors as part of a summer traffic counting program. The agency would characterize the reliability of the data collected for each intersection by the contributor group that collected it. Data from the public contributors would receive the lowest quality rating. In the case of redundant data (e.g., data for the same intersection collected by the local civic group and the summer traffic counting program), the agency would use the data collected by the higher-rated contributor group.
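The rule for redundant data can be illustrated with a minimal Python sketch. The ranking values, the Observation fields, and the function name are hypothetical; in particular, the paper only states that public contributors rank lowest, so the relative order assumed here for the other groups is an illustration, not the authors' specification.

```python
from dataclasses import dataclass

# Assumed reliability ranking (higher = more trusted). Only "public is lowest"
# comes from the text; the order of the other groups is an assumption.
TIER_RANK = {"public": 1, "qualified": 2, "opportunistic": 3, "directed": 4}


@dataclass(frozen=True)
class Observation:
    intersection_id: str
    contributor_type: str  # one of the keys in TIER_RANK
    num_legs: int          # example inventory attribute


def resolve_by_tier(observations):
    """Keep, for each intersection, the observation from the highest-ranked group."""
    best = {}
    for obs in observations:
        current = best.get(obs.intersection_id)
        if (current is None
                or TIER_RANK[obs.contributor_type] > TIER_RANK[current.contributor_type]):
            best[obs.intersection_id] = obs
    return best


observations = [
    Observation("INT-001", "public", 3),
    Observation("INT-001", "directed", 4),   # redundant with the public record
    Observation("INT-002", "qualified", 4),
]
resolved = resolve_by_tier(observations)
```

Here the directed contributor's record for INT-001 replaces the public one, while INT-002 keeps its only record. Storing the contributor type with each record also lets the agency report reliability per intersection.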
Motivation for Collection
What is the incentive for the general public to collect data? Jonathan Zittrain, a social theorist and professor of internet law at Harvard Law School, has provided some input on the subject (11). Mr. Zittrain theorizes that the motivation in related crowdsourcing examples includes a general desire to help others but could also include recognition, reward, and entertainment. One notable example is the problems page on the popular website Wikipedia, which has more visitors looking to solve problems than visitors there to report them (12). Other interesting (non-transportation) examples of people contributing for the greater good include XPrize, Facewatch ID, and Vertices, among several others. Appendix B provides more detail on these non-transportation examples.
The motivation for data collection could differ by the type of contributor and the type of data being collected. Public contributors and qualified contributors may be motivated by altruistic motives of contributing to the greater good; for example, collecting data that will be used to improve school zones. This is particularly true for qualified contributors as their existing affiliation with a community group, such as the PTA, signals their willingness to contribute to the community. These same contributors may also be self-motivated or vested in the results. In the CycleTracks example, contributors were cyclists motivated to participate because the data were ultimately used to improve bicycle facilities. Contributors could also be motivated by the application itself. Using the CycleTracks example again, the application included some added functionality, beyond what was needed for the data collection, as a benefit to the contributors of the app. In addition to recording and sending GPS data to SFCTA servers, contributors could also see their recorded routes mapped with their distance and speed by trip. This added functionality was a motivation to use the application.
An application could also motivate contributors to use the app by being fun to use, or by introducing some elements of gaming. An application could include a scoring system so that contributors could be ranked against one another. This concept could be used in a sign inventory. Contributors would receive points both for the number of signs collected and the various categories of signs or diversity of collected signs. Each new type of sign would be like obtaining a tool or a randomized treasure in common adventure or role playing computer games. This concept could encourage contributors to collect data in a more distributed fashion. This may appeal to passengers on road trips. It may also appeal to players of the popular geocaching games or similar.
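As a rough illustration of the scoring idea, the sketch below awards a point for every sign collected plus a bonus the first time a contributor records a new sign type. The point values and function name are hypothetical; the sign codes follow MUTCD designations (R1-1 is the STOP sign, R1-2 the YIELD sign).

```python
def sign_inventory_score(collected_signs, points_per_sign=1, bonus_per_new_type=5):
    """Score a contributor's list of collected sign type codes.

    Every sign earns points_per_sign; the first sign of each type also earns
    bonus_per_new_type, which rewards diversity over sheer volume.
    """
    seen_types = set()
    score = 0
    for sign_type in collected_signs:
        score += points_per_sign
        if sign_type not in seen_types:
            seen_types.add(sign_type)
            score += bonus_per_new_type
    return score


# Three signs of two distinct types: 3 * 1 + 2 * 5 = 13 points.
example_score = sign_inventory_score(["R1-1", "R1-1", "R1-2"])
```

Setting the bonus well above the per-sign value is one way to encourage the distributed collection described above, since repeatedly logging the same common sign earns little.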
For directed contributors, and to some degree for opportunistic contributors, the motivation is compensation or negative consequences for not completing their assigned duties. Agencies could also use compensation to motivate public or qualified contributors. Unlike with directed contributors, not every contributor would be compensated. Instead, the agency could introduce some randomization of compensation. For example, contributors could be randomly selected to receive gift cards. The data collected could serve as a contributor's entry into the random drawing. That is, the more data a contributor collects, the more entries they have in the drawing. Similarly, compensation could also be incorporated into the previously described gaming approach.
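One way to implement a drawing in which each collected record counts as one entry is a weighted random selection. The sketch below is illustrative only; the function name and input format are assumptions, and it uses the Python standard library's random.Random.choices.

```python
import random


def draw_winner(entries_by_contributor, rng=None):
    """Pick one contributor with probability proportional to their entries.

    entries_by_contributor maps a contributor ID to the number of records
    (entries) they collected. Contributors with zero entries cannot win.
    """
    rng = rng or random.Random()
    eligible = {c: n for c, n in entries_by_contributor.items() if n > 0}
    if not eligible:
        raise ValueError("no contributor has any entries")
    names = list(eligible)
    weights = [eligible[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# A seeded generator makes the drawing repeatable for auditing.
rng = random.Random(0)
entries = {"contributor_a": 10, "contributor_b": 0, "contributor_c": 1}
winners = [draw_winner(entries, rng) for _ in range(200)]
```

Over many drawings, contributor_a should win roughly ten times as often as contributor_c, and contributor_b never wins.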
Regardless of the type of contributor, the overwhelming take-away is that the contributors must understand the benefit of the effort to which they are contributing. Therefore, any successful application must identify the expected contribution to improved safety and communicate that to the contributors as part of the application.
Training
Most applications will require some form of training for contributors. This training could be very basic, such as information on how to use the application, or more detailed training such as classifying highway signs or assessing the condition of a crosswalk. The amount of training required would be dependent on the application.
Agencies should consider the amount of training required when selecting the targeted contributor. Agencies could deliver training through traditional methods for qualified, opportunistic, and directed contributors. However, for public contributors, web or application based training is more appropriate. Several web-based collective data applications exist that could provide useful examples of training for classification data. The website Zooniverse uses public contributors to identify and classify galaxies. The contributors have to work through a self-paced training guide in which they essentially learn to classify galaxies by working example problems with some sample data (13). An agency could use this concept in the envisioned collective data applications. An agency could train contributors by working through the collection of some sample data on the device.
Risk and Safety of the Individuals
Any application used for collective data needs to prevent or discourage contributors from entering data while driving or from collecting data at a potentially hazardous location such as in the roadway. Agencies should consider this in the data input needs for specific types of locations. For example, an application should not require or allow the collection of video data on an Interstate.
There is always some risk that someone will use a mobile device in a situation that introduces a hazard to themselves or others such as driving while talking on the mobile device, or walking along a hazardous roadway. The application should launch with a disclaimer that warns the user of the hazards of using the application and instructions on appropriate use of the application. Specifically, any active data collection should be limited to passengers in a vehicle or pedestrians safely out of the roadway. (A driver's device could collect passive data such as pavement condition.) As part of the registration process for the application, the user could agree to the conditions of appropriate use.
Agency Considerations
As previously described, the primary function for collective data applications is for use by public agencies to collect transportation data in support of their highway safety efforts. Cost-effectiveness is one of the primary motivations for agencies to seek this type of application. However, there are other agency considerations including the following:
- Need for Access to a Larger or More Representative Data Set. Collective data applications may appeal to a broader base of contributors than traditional survey or data collection methods as smart phones are almost ubiquitous. This is important if the collected data needs to reflect the diversity of the contributors or be conducted in diverse locations throughout a jurisdiction. SFCTA compared the demographics of their contributors to the National Household Travel Survey (NHTS) respondents and found their CycleTracks contributors to be a more diverse and representative sample of their jurisdiction. According to a 2012 study by the Pew Internet & American Life Project, nearly half of Americans (45 percent) use a smart phone or other mobile device, and the number is growing (14).
- Agency Size. The size of the agency and jurisdiction it manages would impact several of the considerations. Collective data efforts that would rely on a "community" of people would be most effective for local jurisdictions rather than statewide initiatives.
- Ability to Administer the Application. Agencies need to have the ability to oversee the collective data application. A contractor could develop or manage the application; however, the responsible agency has to be able to clearly identify their data needs, requirements, and intended use for the data.
- Ability to Manage the Data. Agencies would need the technology and knowledge on how to store and manage the additional data. This may require additional IT investments and training for the data managers.
- Vision. Although the process of collecting the data will be more cost-effective than the traditional methods, the initial development of the data collection application will be a cost or time investment. This cost may be larger than the initial start-up costs for some traditional methods. Therefore, the agency needs to have a clear vision of how they will use the data so they can develop the application appropriately. An agency should also develop the application with some flexibility as their needs may evolve.
- Existing Applications. Once an agency has established a clear vision, they should do some research to determine if another agency has already developed a similar application. This additional research could save valuable time and resources in the long run by building on existing applications rather than developing one from scratch.
- Pilot the Effort. As with any large data collection effort, the agency should consider a pilot effort before launching a large-scale effort. This is related to the previous point about flexibility in the application. It allows the agency to identify and resolve problems or additional needs before wide-scale use.
- Secondary Benefits of Data. An agency may find that there are some secondary benefits of the application of the collected data. For example, inventory data collected for safety analysis may also help with asset management.
- Secondary Benefits of Process. Agencies that use collective data applications will gain experience, knowledge, and comfort/familiarity with collective data and mobile technologies. Agencies may view data collection and the actual data in a more dynamic, connected context. This process will fundamentally change how they view both the data and the process of data collection by introducing the agency to other possibilities and potentially generating ideas for future applications. This could be future crowdsourcing applications or future uses for mobile technologies in general.
Data Considerations
Data Accuracy
Accuracy concerns will certainly limit the potential applications, but in many cases, there can be tremendous value in collecting data through collective data applications, realizing that the accuracy of the data may be in question. Agencies might decide to use these applications as an interim measure, with a willingness not to allow the perfect to stand in the way of the good.
For the types of applications envisioned, data accuracy could include location accuracy (e.g., is the horizontal curve correctly placed), accuracy in classification of elements (e.g., is the sign a standard Manual on Uniform Traffic Control Devices [MUTCD] R1-1 STOP sign (15)), accuracy in characterization of elements (e.g., does the intersection have a protected left turn on the northbound approach or a protected-permitted left turn), and accuracy in measurement (e.g., does the ramp actually have a 10 percent grade). Before using collective data for their efforts, the agency should realize that the accuracy of the data has some limitations when collected through this method.
Location accuracy for most data elements would be limited to the location accuracy of the mobile device or the contributor's ability to locate the position of the element or asset on a map. The application could increase the location accuracy by providing two measures of location. That is, the contributor's position as understood by the mobile device could be identified on a map and the contributor would be asked to verify that the position reflects their understanding of their location. Classification accuracy and characterization accuracy would be dependent on the contributor's ability to understand the training or direction provided and apply that to their characterization or classification of elements. Therefore, the skill of the contributor is critical.
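The two-measure location check could be sketched as follows, assuming the application records both the device's GPS fix and the map position the contributor confirms. The function names and the 15-meter tolerance are hypothetical; the distance uses the standard haversine formula with a mean Earth radius of 6,371 km.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_confirmed(device_fix, map_pin, tolerance_m=15.0):
    """True when the device fix and the contributor's map pin agree within tolerance.

    device_fix and map_pin are (latitude, longitude) tuples in decimal degrees.
    The default tolerance is an assumed value, not one taken from the paper.
    """
    return haversine_m(*device_fix, *map_pin) <= tolerance_m


same_spot = location_confirmed((43.2081, -71.5376), (43.2081, -71.5376))
far_apart = location_confirmed((43.2081, -71.5376), (43.2091, -71.5376))
```

Records that fail the check could be flagged for review rather than discarded, since either the GPS fix or the contributor's pin may be the one in error.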
Quality Control
The type of contributors affects the expected quality and therefore the quality control required. Directed contributors should provide higher quality data and require less quality control. The application could also be more complex and the data collected could be more in-depth. Public contributors are at the other extreme. There would be an increased priority to validate and normalize the data that are collected by these contributors.
An agency can employ several methods to increase the accuracy or at least understand the accuracy of the data through a quality control process. As previously discussed, there may be value to collecting the information, realizing the accuracy of the information may be in question. However, the agency needs to have an understanding of the accuracy of the data.
The first method is to provide some ground truthing. That is, a sample of data would be collected by, or on behalf of, the agency with more traditional data collection methods. The agency would consider these data the ground truth data. The agency would then compare the ground truth data to data collected through the application to determine the relative accuracy of the application data. This could be done for a sample of data (e.g., one percent of all observations) or in a structured fashion to provide a fixed reference point in the collected data that contributors could use to conduct their own quality control (e.g., one sign in every mile of segment is characterized).
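A simple way to express the comparison is the share of ground-truthed elements for which the application data record the same value. The sketch below assumes both data sets are keyed by a common element ID; the function name and sample values are hypothetical.

```python
def ground_truth_agreement(ground_truth, app_data):
    """Share of ground-truthed elements whose application value matches.

    Both arguments map an element ID to its recorded value. Elements missing
    from the application data are ignored. Returns None when nothing overlaps.
    """
    shared = [element for element in ground_truth if element in app_data]
    if not shared:
        return None
    matches = sum(1 for element in shared if app_data[element] == ground_truth[element])
    return matches / len(shared)


# Hypothetical sign classifications using MUTCD-style codes.
ground_truth = {"sign-1": "R1-1", "sign-2": "R1-2", "sign-3": "R1-1", "sign-4": "R1-2"}
app_data = {"sign-1": "R1-1", "sign-2": "R1-1", "sign-3": "R1-1", "sign-5": "R1-1"}
agreement = ground_truth_agreement(ground_truth, app_data)
```

In this example three ground-truthed signs also appear in the application data and two of them match, so the estimated agreement is two-thirds.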
The second method is through the use of redundancy. That is, multiple contributors collect information on the same element. The values of multiple contributors are compared for similar responses. In this case, the majority prevails. Galaxy Zoo, part of the previously described Zooniverse, uses this process to control the quality of their data, noting, "Having multiple classifications of the same object is important, as it allows us to assess how reliable each one is. For some projects, we may only need a few thousand galaxies but want to be sure they're all spirals. No problem — just use those that 100% of classifiers agree on. For other projects we might want larger numbers of galaxies, so might use those that a majority say are spiral. (16)"
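The majority rule, along with the stricter unanimity option mentioned in the quotation, might be sketched as below. The function name and the min_agreement parameter are assumptions made for illustration and do not describe how Galaxy Zoo implements its checks.

```python
from collections import Counter


def consensus(values, min_agreement=0.5):
    """Return (accepted_value, agreement_share) for redundant observations.

    The most common value is accepted only if it holds a strict majority and
    its share is at least min_agreement; otherwise accepted_value is None.
    Use min_agreement=1.0 to require that every contributor agrees.
    """
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    share = count / len(values)
    accepted = value if share > 0.5 and share >= min_agreement else None
    return accepted, share


votes = ["protected", "protected", "protected", "protected-permitted"]
majority_result = consensus(votes)                        # accepted at 75 percent
unanimous_result = consensus(votes, min_agreement=1.0)    # rejected, not unanimous
```

Returning the agreement share alongside the accepted value lets the agency keep low-consensus elements on a list for follow-up with traditional methods.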
The third method, ratings, incorporates elements of the first two methods. With this ratings method, contributors receive a rating on the quality of their data entry. This rating could be assigned by comparing the contributor's individual collected data to the ground truthing data or to the data collected by other contributors. This rating would be similar to the seller ratings used on the popular online auction site eBay. Similarly, the rating could be applied to individual elements instead of individual contributors. Agencies could tag those elements that show low consistency in contributed data based on the redundancy checks with a rating. For example, 75 percent of contributors agree on the left turn protection at this intersection. Agencies may decide that some elements or specific locations need to be collected with traditional methods to supplement the collected data.
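A contributor rating of this kind could be computed as the fraction of a contributor's entries that match a reference value, where the reference comes from ground truth data or from the consensus of other contributors. The sketch below is one possible formulation; the names and data layout are hypothetical.

```python
def contributor_ratings(submissions, reference):
    """Rate each contributor by the share of their entries matching the reference.

    submissions is a list of (contributor_id, element_id, value) tuples.
    reference maps element_id to the trusted value (ground truth or consensus).
    Entries for elements without a reference value are not scored.
    """
    totals, correct = {}, {}
    for contributor, element, value in submissions:
        if element not in reference:
            continue
        totals[contributor] = totals.get(contributor, 0) + 1
        if value == reference[element]:
            correct[contributor] = correct.get(contributor, 0) + 1
    return {c: correct.get(c, 0) / n for c, n in totals.items()}


reference = {"int-1": "protected", "int-2": "permitted"}
submissions = [
    ("user-1", "int-1", "protected"),
    ("user-1", "int-2", "permitted"),
    ("user-2", "int-1", "permitted"),
    ("user-2", "int-3", "protected"),  # no reference value, so not scored
]
ratings = contributor_ratings(submissions, reference)
```

An agency could then weight or filter contributed data by these ratings, or use them to decide which contributors need additional training.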
Analysis and Reporting
Analysis and reporting are important data considerations for any data collection effort. Specific to collective data, the application should provide feedback to the contributors. This is important for at least two reasons. The first is related to quality control and providing feedback to the contributors. The second is related to the motivation of the contributors. As previously described, the desire to contribute to the greater cause will be one of the primary motivations for the contributors using the application. Information as simple as summary statistics on the data collection effort can help to provide the necessary feedback to the contributors to keep them engaged in the effort.
Application Considerations
Just over five years ago, the only feasible solutions for customized field data collection were hand-held data collectors, common with surveyors, or some type of ruggedized or semi-ruggedized laptop. If photographs or geo-location were required, the data collection team needed to bring a camera and/or a GPS device into the field. Connecting all these devices reliably in the field was sometimes a challenge. There were options of integrated hardware, but the cost of these solutions limited the use to professional services firms that could afford them, in turn limiting who could collect these data.
Today, there is a vast array of platforms available to agencies to collect data. The majority of these devices are consumer-level devices. Connectivity and processing power on mobile devices continue to improve. Built-in features on devices such as cameras, video recorders, GPS, and accelerometers enhance the ability to collect data.
There are many options for developing a data collection application for a laptop or a data collector, but for a smart phone, there are presently only two viable approaches for data collection: web-based applications or native apps such as those found on the iTunes App Store or the Google Apps Marketplace. There are pros and cons for both options.
The potential benefit of developing an application on a web platform is that you can develop it once and it is usable on any device. There are many browsers and mobile development tool sets that make this mobile web development more streamlined, but the challenge is that the variety of browsers and hardware platforms will typically require a lot of compatibility testing. Native apps can leverage the built-in functionality of the device, but may require specialized development resources. Additionally, an agency will need to either select a platform to support, such as the iPhone, Android, or Windows, or invest in developing the application on multiple platforms to get as much penetration as possible.