U.S. Department of Transportation
OFFICE OF RESEARCH, DEVELOPMENT, AND TECHNOLOGY AT THE TURNER-FAIRBANK HIGHWAY RESEARCH CENTER

Safety Data Resources

Task B3-3: Identify CMF Research Needs—Safety Data Resources

This document identifies safety databases that could be used, through other Federal Highway Administration (FHWA) or partner efforts, to help accomplish the following tasks related to crash modification factor (CMF) development and advancement.

  A. Identify and prioritize current CMF research needs (i.e., those already proposed; a near-term goal).
  B. Identify, prioritize, and coordinate future CMF research that will yield more reliable CMFs and may be more cost effective than current practices (a mid- to long-term goal). The relevant questions for future research needs include:
    1. What resources are available, and how can they be used?
    2. What parties can be involved?
    3. What tools are available, do better ones exist, and can existing tools be improved?
    4. What are the methodological needs, and what efforts are needed or underway to meet those needs?
  C. Support and advance innovation in safety countermeasures to further reduce crash fatalities and severe injuries associated with prioritized safety needs.
  D. Identify current FHWA efforts and emerging statistical methodologies (e.g., those discussed at the recent DCMF Task B2 Technical Experts Meeting) that may support current needs, identify appropriate stakeholders who could be involved in promoting this effort, and determine priority research needs that have not yet been identified.

The following databases are relevant to supporting the four tasks listed above:

  • Fatality Analysis Reporting System (FARS).
  • General Estimates System (GES).
  • Crash Report Sampling System (CRSS).
  • Crashworthiness Data System (CDS).
  • Crash Investigation Sampling System (CISS).
  • National Motor Vehicle Crash Causation Survey (NMVCCS).
  • Crash Injury Research and Engineering Network (CIREN).
  • Motor Carrier Management Information System (MCMIS).
  • Federal Transit Administration (FTA) National Transit Database (NTD).
  • National EMS Information System (NEMSIS).
  • Second Strategic Highway Research Program (SHRP2) Naturalistic Driving and Roadway Databases.
  • National Park Service Service-wide Traffic Accident Reporting System (STARS).
  • Highway Safety Information System (HSIS).

Tables 1 – 4 summarize these databases, including the aspects of each database that are critical to Task B3. Specifically, the tables provide summary information such as the sponsoring agency, data coverage, data years, data availability, and database content. The last row of each table identifies the database's applicability to Tasks A – D above. The results of this task will serve as a springboard for additional efforts in the future.

Table 1 Summary of National Crash Databases

FARS

GES

CRSS

CDS

CISS

NMVCCS

Who houses and maintains the data?

National Center for Statistics and Analysis (NCSA), which is a component of Policy and Operations in the National Highway Traffic Safety Administration (NHTSA); FARS is a census of fatal crashes and is not part of the National Automotive Sampling System (NASS).

NASS; directed by NCSA, a component of Policy and Operations in NHTSA.

NASS; directed by NCSA, a component of Policy and Operations in NHTSA.

NASS; directed by NCSA, a component of Policy and Operations in NHTSA.

NASS; directed by NCSA, a component of Policy and Operations in NHTSA.

NASS; directed by NCSA, a component of Policy and Operations in NHTSA.

What is the spatial coverage of the data?

All qualifying fatal crashes within the 50 States, the District of Columbia, and Puerto Rico.

Obtained from 60 geographic sites that reflect the geography, roadway mileage, population, and traffic density of the United States; approximately 400 police jurisdictions included in the sampling.

Obtained from 60 selected areas that reflect the geography, population, miles driven, and crashes in the United States.

Obtained from 24 geographic sites that reflect the geography, roadway mileage, population, and traffic density of the United States.

Random selection of thousands of police crash reports at law enforcement agencies in selected areas that reflect the geography, population, miles driven, and crashes in the United States.

Sample of crashes in 24 primary sampling units (PSUs) centered on large cities, counties, and metropolitan areas; includes cities and counties in AL, AZ, CA, CO, FL, IL, IN, MD, MI, NE, NJ, NY, NC, PA, TN, TX, WA.

What years of data are in the database?

1975 to 2021

1988 to 2015

2016 to 2021

2004 to 2015

2016 to 2021

January 2005 to December 2007

What is the general availability of the data?

Publicly available

Publicly available

Publicly available

Publicly available

Publicly available

Publicly available

How are the data collected? How are the data coded?

Cooperative agreement with agency in each State to provide information in standard format on fatal crashes in the State; data collected, coded and submitted into database. The data are coded for:

  • Crash variables.
  • Vehicle variables.
  • Person variables.

Data collectors make weekly, biweekly, or monthly visits to selected police agencies and randomly sample about 50,000 police accident reports (PARs) each year; approximately 90 data elements are coded; for privacy reasons, neither personal information nor specific crash location is coded.

Data collectors visit the selected police jurisdictions weekly, sample and copy police crash reports (PCRs) and send them to a central contractor for coding; trained CRSS coders interpret and code data directly from PCRs into an electronic data file; approximately 120 data elements are captured.

Twenty-four research teams at PSUs study between 3,000 and 5,000 crashes a year involving passenger cars, light trucks, vans, and utility vehicles; investigators obtain data from selected police agencies and crash sites, study all available evidence, interview crash victims, and review medical records; more than 600 data elements are coded; for privacy reasons, no personal information or specific crash location is coded.

Technicians obtain data by documenting scene evidence at crash sites (e.g., skid marks, fluid spills, struck objects), vehicle crash damage, and interior components that occupants contacted, and by interviewing crash victims and reviewing medical records for the injured; no personal information is included.

Researchers investigated crash locations while first responders were still on site; reconstructed each crash by collecting all available data and interviewing witnesses; identified the critical precrash event, the critical reason for that event, and other associated factors; over 500 data elements coded.

Does the database include all crashes for the coverage area (i.e., the population) or just a portion of the crashes (i.e., a sample)?

Includes the population of crashes with a fatal outcome; a fatality is defined as a death that occurs within 30 days of the crash and results from injuries sustained in it.

Includes only a portion of crashes, sampled randomly from 60 geographic sites and some 400 police agencies across the United States.

Includes only a portion of crashes; it is a nationally representative probability sample selected from the estimated 5 to 6 million police-reported crashes that occur annually.

Includes only a portion of crashes, sampled randomly from 24 geographic sites across the United States.

Includes only a portion of crashes, selected using a stratified, multistage, multiphase sampling system.

Sample of crashes from each PSU.

How are crash severity levels defined?

KABCO

KABCO

KABCO

KABCO and sometimes Abbreviated Injury Scale (AIS)

AIS 2015

KABCO, plus:

  • Died prior to crash.
  • Unknown if injured.

What is the vehicle type coverage?

All vehicle types.

All vehicle types.

All vehicle types.

Crashes involving at least one light vehicle <10,000 lbs.

All vehicle types.

Crashes involving at least one light vehicle <10,000 lbs.

If data are just a sample, how was the sampling done?

NA

(1) Selection of primary sampling units.
(2) Selection of police jurisdictions.
(3) Selection of crashes.

(1) Selection of primary sampling units.
(2) Selection of police jurisdictions.
(3) Selection of crashes.

(1) Selection of primary sampling units.
(2) Selection of police jurisdictions.
(3) Selection of crashes.

(1) Selection of primary sampling units.
(2) Selection of police jurisdictions.
(3) Selection of crashes.

A six-hour sampling period (between 6 a.m. and midnight) was selected each week and then divided into sampling days, with the aim of maximizing the probability of observing a crash during the selected sampling periods.

If just a sample, what (if any) guidance is given to incorporate the sampling procedure into data analysis?

NA

A national weight has been added to the file for each PAR and is called "WEIGHT." This weight is the product of the inverse of the probabilities of selection at each of the three stages in the sampling process.

A national weight has been added to the CRSS analysis file and is called "WEIGHT." This weight incorporates selection probabilities, non-response bias, coverage bias, duplicate crashes, and benchmarking Census resident population counts and FARS crash counts.

Data are weighted to represent all police-reported motor vehicle crashes occurring in the United States during the year that involve passenger cars, light trucks, or vans towed due to damage.

A weight has been added to the data file. This weight incorporates selection probabilities, non-response adjustments, coverage bias, benchmarking Census resident population information, and truncation of large case weights.

A comprehensive weighting procedure, which makes the NMVCCS sample nationally representative, consists mainly of two phases: computing the design weight and then adjusting it appropriately.
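The GES weighting rule in Table 1 (each PAR's WEIGHT is the product of the inverse selection probabilities at the three sampling stages) can be illustrated with a minimal Python sketch. The class, its field names, and the example probabilities are hypothetical and do not correspond to GES file variables; in practice, analysts use the WEIGHT value supplied on the file. The sketch covers only this base design weight: the non-response, coverage, and benchmarking adjustments that the CRSS and CISS weights incorporate are not modeled, and it produces point estimates only (variance estimation also requires the design's stratum and PSU identifiers).

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SampledReport:
    """One sampled police accident report (PAR). Field names are hypothetical."""
    p_psu: float           # stage 1: probability that the report's PSU was selected
    p_jurisdiction: float  # stage 2: probability that the police jurisdiction was selected, given the PSU
    p_crash: float         # stage 3: probability that the crash was selected, given the jurisdiction
    injury_crash: bool     # example attribute whose national total we want to estimate

    def __post_init__(self) -> None:
        for p in (self.p_psu, self.p_jurisdiction, self.p_crash):
            if not 0.0 < p <= 1.0:
                raise ValueError("selection probabilities must be in (0, 1]")

    @property
    def weight(self) -> float:
        # Base design weight: product of the inverse selection probabilities at each stage.
        return 1.0 / math.prod((self.p_psu, self.p_jurisdiction, self.p_crash))


def weighted_total(
    reports: Iterable[SampledReport],
    predicate: Callable[[SampledReport], bool] = lambda r: True,
) -> float:
    """Estimate a national crash count by summing the weights of reports that satisfy predicate."""
    return sum(r.weight for r in reports if predicate(r))


# Usage with made-up probabilities: each sampled report stands for `weight` crashes nationally.
reports = [
    SampledReport(0.1, 0.5, 0.02, injury_crash=True),   # weight of about 1,000
    SampledReport(0.2, 0.25, 0.1, injury_crash=False),  # weight of about 200
]
estimated_injury_crashes = weighted_total(reports, lambda r: r.injury_crash)
```

Multiplying the inverse probabilities reflects that a report survives all three selection stages in sequence, so it represents the reciprocal of its overall inclusion probability.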

Table 2 Summary of National Crash Databases (Continued)

CIREN

MCMIS

STARS

Who houses and maintains the data?

NHTSA

Federal Motor Carrier Safety Administration (FMCSA)

National Park Service (NPS)

What is the spatial coverage of the data?

Sample of crashes collected by Crash Injury Research and Engineering Network (CIREN) teams, which consist of three medical centers and three engineering centers in Washington, Wisconsin, Virginia, Maryland, and Alabama.

All qualifying crashes involving motor carriers with USDOT numbers within the 50 States, the District of Columbia, and Puerto Rico.

All motor vehicle collisions that occur within National Park Service jurisdiction.

What years of data are in the database?

2007 to 2022

1989 to present

1990 to 2005

What is the general availability of the data?

Publicly available

Available to the general public, for a fee, through the MCMIS Data Dissemination Program; a formal request is needed.

No direct online access; a formal request is needed.

How are the data collected? How are the data coded?

Each center collects detailed crash and medical data on about 50 crashes per year. Personal and location identifiers and highly sensitive medical information have been removed from the public files to protect patient confidentiality; 650 National Automotive Sampling System (NASS) Crashworthiness Data System (CDS) data elements and 250 medical and injury data elements are coded.

Updated quarterly with data from field offices through SAFETYNET, CAPRI, and other sources. The data are coded for crash, census, and inspection variables. Inspections are conducted at the roadside by State personnel under the Motor Carrier Safety Assistance Program (MCSAP).

Obtained from Motor Vehicle Accident Reports. The data are coded for crash variables.

Does the database include all crashes for the coverage area (i.e., the population) or just a portion of the crashes (i.e., a sample)?

Includes only crashes with serious injury.

Includes only reported crashes involving commercial motor carriers (trucks and buses) and hazardous material shippers.

All reported crashes.

How are crash severity levels defined?

Injury Severity Score (ISS) and Maximum Abbreviated Injury Scale (MAIS)

National Governors Association (NGA) crash thresholds:
Injury crashes: a person injured in the crash is immediately taken to a medical facility.
Tow-away crashes: at least one vehicle is towed from the scene as a result of disabling damage suffered in the crash.

Fatal, injury, and property damage only (PDO)

What is the vehicle type coverage?

All vehicle types.

Trucks, buses, passenger cars, and light trucks with U.S. Department of Transportation (USDOT) numbers or hazardous materials (HAZMAT) placards.

All vehicle types.

If data are just a sample, how was the sampling done?

Admission to a participating CIREN center: the occupant was severely injured and transported to a Level 1 trauma center. Injury criteria: (1) at least one AIS 3+ injury, (2) AIS 2 injuries in two different AIS body regions, or (3) a significant AIS 2 injury to a lower extremity. Vehicle: model no older than 6 years. Restraint: (1) frontal crash: air bag and/or belt required; (2) side impact: unbelted occupants are acceptable; (3) rollover: ejected occupants are excluded.

NA

NA

If just a sample, what (if any) guidance is given to incorporate the sampling procedure into data analysis?

None.

NA

NA

To which Tasks (A – D) is the database applicable?

 

General: Conduct research related to vehicles, occupants, and nonmotorized road users involved in a crash (e.g., identify motor vehicle design features that offer maximum occupant protection).

C: Support and advance innovation in safety countermeasures to further reduce crash fatalities and severe injuries associated with prioritized safety needs.
D: Determine priority research needs that have not been identified.

General: Support and evaluate motor carrier safety programs and regulations.

C: Support and advance innovation in motor carrier-related safety countermeasures to further reduce crash fatalities and severe injuries associated with prioritized safety needs.
D: Determine priority research needs related to motor carriers that have not been identified.

General: Support and evaluate NPS safety programs and regulations.

C: Support and advance innovation in safety countermeasures to further reduce crash fatalities and severe injuries associated with prioritized safety needs.
D: Determine priority research needs that have not been identified.

Note: The NPS STARS database may have limited value for the DCMF project and for future efforts to advance CMF development.
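The MCMIS severity definitions in Table 2 amount to a simple decision rule. The minimal Python sketch below classifies a crash against the National Governors Association thresholds. The record type and its field names are hypothetical, not MCMIS variables. Beyond what the table lists, it assumes that crashes with a fatality also meet the threshold (as in the standard NGA criteria), and it does not check whether the vehicles involved are motor carrier vehicles eligible for reporting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrashSummary:
    """Hypothetical per-crash counts; these are not MCMIS field names."""
    fatalities: int
    injured_transported: int      # persons immediately taken to a medical facility
    towed_disabled_vehicles: int  # vehicles towed from the scene due to disabling damage


def nga_threshold(crash: CrashSummary) -> Optional[str]:
    """Return the most severe NGA threshold the crash meets, or None if it meets none."""
    if crash.fatalities > 0:
        return "fatal"  # assumption: fatal crashes also meet the threshold
    if crash.injured_transported > 0:
        return "injury"
    if crash.towed_disabled_vehicles > 0:
        return "tow-away"
    return None


# Usage: one transported injury plus one disabled vehicle is classified by its most severe outcome.
print(nga_threshold(CrashSummary(fatalities=0, injured_transported=1, towed_disabled_vehicles=1)))
# prints: injury
```

Checking outcomes from most to least severe means each crash receives exactly one category, which keeps counts by threshold from double-counting.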

Table 3 Summary of Other National Databases

 

NTD

NEMSIS

SHRP2

Who houses and maintains the data?

Federal Transit Administration (FTA)

NHTSA Office of Emergency Medical Services

Virginia Tech Transportation Institute (VTTI)

What is the spatial coverage of the data?

National transit-related reportable incidents.

National repository for EMS data. As of 2022, 54 States and territories contribute to the dataset.

The naturalistic driving study (NDS) data and roadway information database (RID) were based on data gathered in six states (Florida, Indiana, New York, North Carolina, Pennsylvania, and Washington).

What years of data are in the database?

2002 to 2021

2008 to 2022

2010 to 2013

What is the general availability of the data?

Publicly available

Publicly available by submitting a request form

Data are available to qualified researchers with a data use license with VTTI.

How are the data collected? How are the data coded?

The system derives data from transit providers, States, or Metropolitan Planning Organizations (MPOs) that are recipients or beneficiaries of grants. Fifty-five safety and security data fields are collected through six different forms.

The NEMSIS project was developed to help States collect more standardized data elements and eventually submit the data to a national emergency medical services (EMS) database.

The Naturalistic Driving Study (NDS) data were collected by instrumenting vehicles to record vehicle location, forward radar, vehicle control positions, and video of the forward roadway and of the driver’s face and hands. Crash investigations were conducted after certain crashes to gather more detailed data.

The RID contains new roadway data gathered by automated data collection vehicles and existing data provided by agencies (i.e., State DOTs, MPOs, and counties). The roadway data include roadway inventory information, crash histories, traffic, weather, roadway improvements, work zones, safety laws, and enforcement campaigns.

Does the database include all crashes for the coverage area (i.e., the population) or just a portion of the crashes (i.e., a sample)?

The database includes transit-related reportable incidents. Note that not all incidents are considered to be reportable. If an incident is not related to and does not affect revenue operations, then it is considered to be nonreportable.

Events submitted by States do not necessarily represent all EMS events occurring within the State.

The naturalistic driving study (NDS) database includes detailed data on more than 5.8 million trips, 33 million miles traveled, and 1.4 million driving hours from more than 3,100 participants of various ages in the six study states. The database contains continuous data from all trips taken by volunteer participants over one to two years.

The RID contains approximately 12,500 centerline miles of quality-checked data collected for SHRP2 in six states (FL, IN, NC, NY, PA, WA). The existing agency-provided data cover more than 200,000 centerline miles.

How are crash severity levels defined?

Incidents, injuries, fatalities

Possible injury (yes/no)

Unknown

What is the vehicle type coverage?

Transit vehicles, including the following modes: Automated Guideway, Commuter Bus, Cable Car, Demand Response, Demand Response-Taxi, Ferryboat, Inclined Plane, Heavy Rail, Jitney, Light Rail, Motor Bus, Monorail/Guideway, Monorail, Público, Bus Rapid Transit, Streetcar Rail, Trolleybus, Aerial Tramway, Vanpool, and Hybrid Rail

All vehicle types

Passenger vehicles

If data are just a sample, how was the sampling done?

NA

States vary in criteria used to determine the types of EMS events submitted to the NEMSIS dataset.

Six locations were selected in the United States to represent geographic diversity and to provide a range of driver, vehicle, and roadway conditions. However, it is not a nationally representative sample.

If just a sample, what (if any) guidance is given to incorporate the sampling procedure into data analysis?

NA

No

No

To which Tasks (A – D) is the database applicable?

General: The United States’ primary source of transit system information and statistics. Investigate transit-related crashes, including injuries and fatalities by type and mode.

C: Support and advance innovation in transit-related safety countermeasures to further reduce fatalities and severe injuries associated with prioritized safety needs.
D: Determine priority research needs related to transit that have not been identified.

General: Evaluate patient and EMS system outcomes.

General (Note: the following list provides examples of potential uses of SHRP2 data):

  • Understand the contributing and causal factors in crashes.
  • Understand how the driver interacts with and adapts to the vehicle, traffic, roadway characteristics, traffic control devices, and the environment.
  • Identify the relationship between crashes, conflicts, and crash surrogates.
  • Formulate exposure-based risk measures using surrogate measures.
  • Investigate the potential for new countermeasures related to the design of the roadway and vehicles as well as public policy and enforcement.
  • Enhance driver training programs to demonstrate appropriate and inappropriate driver behavior.
  • The RID provides a model for developing linked datasets for asset management purposes.
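One listed use of the SHRP2 data is formulating exposure-based risk measures. As a minimal sketch, the Python below divides the approximate NDS totals quoted in Table 3 to obtain average trip length and average speed, and normalizes an event count by miles traveled. The event count is hypothetical, not a SHRP2 result; the totals are lower bounds (the table says "more than"); and because the NDS sample is not nationally representative, such rates describe the study participants rather than U.S. drivers as a whole.

```python
# Approximate NDS totals quoted in Table 3 (each is a lower bound: "more than ...").
NDS_TRIPS = 5_800_000
NDS_MILES = 33_000_000
NDS_HOURS = 1_400_000


def rate_per_million(events: int, exposure: float) -> float:
    """Events per one million units of exposure (e.g., vehicle miles traveled)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return events / exposure * 1_000_000


avg_trip_miles = NDS_MILES / NDS_TRIPS  # roughly 5.7 miles per trip
avg_speed_mph = NDS_MILES / NDS_HOURS   # roughly 23.6 mph averaged over driving hours

# Hypothetical count of safety-critical events (e.g., crashes plus near-crashes); not a SHRP2 figure.
HYPOTHETICAL_EVENTS = 660
events_per_million_miles = rate_per_million(HYPOTHETICAL_EVENTS, NDS_MILES)  # about 20
```

Normalizing by miles (or by hours or trips) is what turns raw event counts into exposure-based risk measures that can be compared across drivers or roadway types.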

Table 4 Summary of Seven HSIS Databases

 

California

Illinois

Maine

Minnesota

North Carolina

Ohio

Washington

Who houses and maintains the data?

The USDOT Secure Data Commons (SDC) houses the data, which are maintained by VHB under contract with the Federal Highway Administration (FHWA).

What is the spatial coverage of the data?

Statewide

What years of data are in the database?

1991 to 2021

1985 to 2021

1985 to 2021

1985 to 2021

1990 to 2020

1997 to 2021

1993 to 1996, 1999 to 2020

What is the general availability of the data?

Data can be provided in different formats (e.g., shapefiles, CSV, Excel) via a ShareFile link. Data can be requested by completing the online HSIS data request form on the HSIS website.

How are the data collected? How are the data coded?

Crash and roadway data received from California. Data include roadway, intersection, interchange ramp, crash, and unit files.

Crash data received from Illinois. Roadway data downloaded from Illinois DOT website. Data include roadway, crash, unit, and person files.

Crash and roadway data received from Maine. Data include roadway, intersection node, interchange, crash, unit, commercial vehicle, and person files.

Crash and roadway data received from Minnesota. Data include roadway, intersection, horizontal curve, intersection approach, traffic signal, interchange, lighting unit, lighting systems, roadside barrier, roadside barrier terminal, sign support, crash, unit, and person files.

Crash data received from North Carolina. Roadway data downloaded from North Carolina DOT website. Data include roadway, traffic signal, interchange, horizontal curve, freeway exit, crash, unit, and person files.

Crash and roadway data received from Ohio. Data include roadway, horizontal curve, intersection, intersection approach, barrier, lighting, bicycle route, crash, unit, and person files.

Crash data received from Washington. Roadway data downloaded from Washington DOT website. Data include roadway, horizontal curve, grade, crash, unit, and person files.

Does the database include all crashes for the coverage area or just a portion of the crashes (i.e., a sample)?

All reported crashes, primarily on the State-maintained system; coverage varies slightly by State.

How are crash severity levels defined?

KABCO

KABCO

KABCO

KABCO

KABCO

KABCO

KABCO

What is the vehicle type coverage?

All vehicle types, distinguished by vehicle type.

All vehicle types, distinguished by vehicle type.

All vehicle types, distinguished by vehicle type.

All vehicle types, distinguished by vehicle make, model, and year.

All vehicle types, distinguished by vehicle type, make, and model.

All vehicle types, distinguished by vehicle type, make, model, and year.

All vehicle types, distinguished by vehicle type, make, model, and year.

If data are just a sample, how was the sampling done?

NA

NA

NA

NA

NA

NA

NA

If just a sample, what (if any) guidance is given to incorporate the sampling procedure into data analysis?

NA

NA

NA

NA

NA

NA

NA

To which Tasks (A – D) is the database applicable?

General: The HSIS database has numerous general applications, as do many of the databases listed in this document.
A: Prioritize current CMF research needs based on the magnitude and severity of crashes at specific locations (e.g., curves, intersections, segments, etc.).
B: Prioritize future CMF research needs based on the magnitude and severity of crashes at specific locations (e.g., curves, intersections, segments, etc.).
C: Support and advance innovation in safety countermeasures to further reduce crash fatalities and severe injuries associated with prioritized safety needs.
D: Determine priority research needs that have not been identified based on the investigation of crashes and crash severity at specific locations (e.g., curves, intersections, segments, etc.).
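Tasks A and B call for prioritizing CMF research needs by the magnitude and severity of crashes at specific location types. A minimal Python sketch of one such prioritization is shown below. It assumes crash records have already been reduced to (location type, KABCO code) pairs. The ranking rule (fatal plus suspected serious injury, K+A, counts first, then total crashes) is an illustrative assumption rather than an FHWA or HSIS procedure. The location labels and records are hypothetical, and actual HSIS file layouts and variable names differ by State.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# KABCO codes: K = fatal, A = suspected serious injury, B = suspected minor injury,
# C = possible injury, O = no apparent injury.
KABCO = {"K", "A", "B", "C", "O"}
SEVERE = {"K", "A"}


def rank_location_types(crashes: Iterable[Tuple[str, str]]) -> List[str]:
    """Rank location types by K+A crash count, breaking ties by total crash count."""
    totals: Counter = Counter()
    severe: Counter = Counter()
    for location, severity in crashes:
        if severity not in KABCO:
            raise ValueError(f"unknown KABCO code: {severity!r}")
        totals[location] += 1
        if severity in SEVERE:
            severe[location] += 1
    # Counter returns 0 for missing keys, so locations with no K+A crashes sort last.
    return sorted(totals, key=lambda loc: (severe[loc], totals[loc]), reverse=True)


# Usage with made-up records: intersections and curves each have one K+A crash,
# but intersections have more crashes overall, so they rank first.
records = [
    ("curve", "K"), ("curve", "O"),
    ("intersection", "A"), ("intersection", "B"), ("intersection", "C"),
    ("segment", "O"), ("segment", "O"),
]
ranking = rank_location_types(records)  # ["intersection", "curve", "segment"]
```

Ranking on severe crashes first reflects the emphasis in Task C on reducing fatalities and severe injuries. A weighted severity score would be an equally plausible alternative.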