Safety Data Resources
Task B3-3: Identify CMF Research Needs—Safety Data Resources
This document identifies safety databases that could be used to help accomplish the following tasks, through other Federal Highway Administration (FHWA) or partner efforts, related to crash modification factor (CMF) development and advancement.
- Identify and prioritize current CMF research needs (i.e., those already proposed—a near-term goal).
- Identify, prioritize, and coordinate future CMF research that will yield more reliable CMFs and may be more cost effective than current practices (a mid- to long-term goal). The relevant questions for future research needs include:
  - What resources are available and how can they be used?
  - What parties can be involved?
  - What tools are available, do better ones exist, and can existing tools be improved?
  - What are the methodological needs, and what efforts are needed or underway to meet them?
- Support and advance innovation in safety countermeasures to further reduce crash fatalities and severe injuries associated with prioritized safety needs.
- Identify the current FHWA efforts and emerging statistical methodologies (e.g., those discussed at the recent DCMF Task B2 Technical Experts Meeting) that may support current needs, identify appropriate stakeholders that could be involved in promoting this effort, and determine priority research needs that have not been identified.
The following databases are relevant to supporting the four tasks listed above:
- Fatality Analysis Reporting System (FARS).
- General Estimates System (GES).
- Crash Report Sampling System (CRSS).
- Crashworthiness Data System (CDS).
- Crash Investigation Sampling System (CISS).
- National Motor Vehicle Crash Causation Survey (NMVCCS).
- Crash Injury Research and Engineering Network (CIREN).
- Motor Carrier Management Information System (MCMIS).
- Federal Transit Administration (FTA) National Transit Database (NTD).
- National EMS Information System (NEMSIS).
- Second Strategic Highway Research Program (SHRP2) Naturalistic Driving and Roadway Databases.
- National Park Service Service-wide Traffic Accident Reporting System (STARS).
- Highway Safety Information System (HSIS).
Tables 1 – 4 summarize these databases, including the aspects of each database most relevant to Task B3. Specifically, the tables list the sponsoring agency, data coverage, data years, data availability, and database content. The last row of each table identifies the database's applicability to Tasks A – D, that is, the four tasks listed above, in order. The results of this task will serve as a springboard for future efforts.
Table 1 Summary of National Crash Databases
FARS |
GES |
CRSS |
CDS |
CISS |
NMVCCS |
Who houses and maintains the data? |
National Automotive Sampling System (NASS); directed by the National Center for Statistics and Analysis (NCSA), which is a component of Policy and Operations in the National Highway Traffic Safety Administration (NHTSA). |
NASS; directed by NCSA, a component of Policy and |
NASS; directed by NCSA, a component of Policy and |
NASS; directed by NCSA, a component of Policy and |
NASS; directed by NCSA, a component of Policy and |
NASS; directed by NCSA, a component of Policy and |
What is the spatial coverage of the data? |
All qualifying fatal crashes within the 50 States, the District of |
Obtained from 60 geographic sites that reflect the geography, roadway mileage, population, and traffic density of the United States; approximately 400 police jurisdictions included in the sampling. |
Obtained from 60 selected areas that reflect the geography, population, miles driven, and crashes in the United States |
Obtained from 24 geographic sites that reflect the geography, roadway mileage, population, and traffic density of the United States. |
Random selections of thousands of police crash reports at law enforcement agencies in selected areas that reflect the geography, population, miles driven, and crashes in the United States. |
Sample of crashes in 24 primary sampling units (PSUs), centered on large cities, counties, and metro areas; the PSUs include cities and counties in AL, AZ, CA, CO, FL, IL, IN, MD, MI, NE, NJ, NY, NC, PA, TN, TX, WA. |
What years of data are in the database? |
1975 to 2021 |
1988 to 2015 |
2016 to 2021 |
2004 to 2015 |
2016 to 2021 |
January 2005 to December 2007 |
What is the general availability of the data? |
||||||
How are the data collected? How are the data coded? |
Cooperative agreement with agency in each State to provide information in standard format on fatal crashes in the State; data collected, coded and submitted into database. The data are coded for:
|
Data collectors make weekly, biweekly, or monthly visits to selected police agencies and randomly sample about 50,000 police accident reports (PARs) each year; approximately 90 data elements are coded; for privacy reasons, neither personal information nor specific crash location is coded. |
Data collectors visit the selected police jurisdictions weekly, sample and copy police crash reports (PCRs) and send them to a central contractor for coding; trained CRSS coders interpret and code data directly from PCRs into an electronic data file; approximately 120 data elements are captured. |
Twenty-four research teams at PSUs study between 3,000 and 5,000 crashes a year involving passenger cars, light trucks, vans, and utility vehicles; investigators obtain data from selected police agencies, crash sites, and study all available evidence; interview crash victims and review medical records; more than |
Technicians obtain data from crash sites by documenting scene evidence (e.g., skid marks, fluid spills, struck objects), crash damages, interior components that occupants contacted, interviews of crash victims, and medical records for the injured; no personal information is included. |
Researchers investigated crash locations while first responders were still onsite, reconstructed each crash by collecting all available data and interviewing witnesses, and identified the critical precrash event, the critical reason for the crash event, and other associated factors; over 500 elements were coded. |
Does the database include all crashes for the coverage area (i.e., the population) or just a portion of the crashes (i.e., a sample)? |
Includes the population of crashes with a fatal outcome; a fatality is defined as the death of an individual within 30 days of a crash due to injuries sustained in the crash. |
Includes only a portion of crashes, sampled randomly from 60 geographic sites and some 400 police agencies across the United States. |
Includes only a portion of crashes; it is a nationally representative probability sample selected from the estimated 5 to 6 million police-reported crashes that occur annually. |
Includes only a portion of crashes, sampled randomly from 24 geographic sites across the United States. |
Includes only a portion of crashes, selected using a stratified, multistage, and multiphase sampling system. |
Sample of crashes from each PSU. |
How are crash severity levels defined? |
KABCO |
KABCO |
KABCO |
KABCO and sometimes Abbreviated Injury Scale (AIS) |
AIS 2015 |
KABCO, plus:
|
What is the vehicle type coverage? |
All vehicle types. |
All vehicle types. |
All vehicle types. |
Crashes involving at least one light vehicle <10,000 lbs. |
All vehicle types. |
Crashes involving at least one light vehicle <10,000 lbs. |
If data are just a sample, how was the sampling done? |
NA |
(1) Selection of primary sampling units. |
(1) Selection of primary sampling units. |
(1) Selection of primary sampling units. |
(1) Selection of primary sampling units. |
A 6-hour sampling time period (between 6 a.m. and midnight) was selected each week, then divided into sampling days in a way that tends to maximize the probability of observing a crash during the selected sampling periods. |
If just a sample, what (if any) guidance is given to incorporate the sampling procedure into data analysis? |
NA |
A national weight has been added to the file for each PAR and is called "WEIGHT." This weight is the product of the inverse of the probabilities of selection at each of the three stages in the sampling process. |
A national weight has been added to the CRSS analysis file and is called "WEIGHT." This weight incorporates selection probabilities, non-response bias, coverage bias, duplicate crashes, and benchmarking Census resident population counts and FARS crash counts. |
Data are weighted to represent all police-reported motor vehicle crashes occurring in the United States during the year that involved passenger cars, light trucks, and vans towed due to damage. |
A weight has been added to the data file. This weight incorporates selection probabilities, non-response adjustments, coverage bias, benchmarking Census resident population information, and truncation of large case weights. |
A comprehensive weighting procedure makes the NMVCCS sample nationally representative; it consists mainly of two phases: computing the design weight and applying the appropriate adjustments to it. |
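The weighting guidance in Table 1 can be illustrated with a short sketch. For GES, the table states that WEIGHT is the product of the inverse selection probabilities at each of the three sampling stages. The probabilities, the `injury` field, and the function names below are hypothetical and chosen only for illustration; the published files supply WEIGHT directly, and the CRSS and CISS weights include further adjustments that are not modeled here.

```python
from math import prod


def design_weight(stage_probabilities):
    """Return the product of inverse selection probabilities, one per stage."""
    if not stage_probabilities or any(not 0 < p <= 1 for p in stage_probabilities):
        raise ValueError("each selection probability must be in (0, 1]")
    return prod(1.0 / p for p in stage_probabilities)


def weighted_total(records, flag):
    """Estimate a national count by summing WEIGHT over records where flag is true."""
    return sum(r["WEIGHT"] for r in records if r[flag])


# Hypothetical three-stage selection probabilities for three sampled reports.
sample = [
    {"WEIGHT": design_weight([0.5, 0.25, 0.1]), "injury": True},
    {"WEIGHT": design_weight([0.5, 0.5, 0.2]), "injury": False},
    {"WEIGHT": design_weight([0.25, 0.5, 0.1]), "injury": True},
]

# Each sampled report stands in for WEIGHT crashes in the population.
estimated_injury_crashes = weighted_total(sample, "injury")
print(estimated_injury_crashes)
```

Analyses that ignore WEIGHT describe only the sampled reports; summing the weights, as above, is what turns the sample into a national estimate.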
Table 2 Summary of National Crash Database (Cont.)
CIREN |
MCMIS |
STARS |
Who houses and maintains the data? |
NHTSA |
Federal Motor Carrier Safety Administration (FMCSA) |
National Park Service (NPS) |
What is the spatial coverage of the data? |
Sample of crashes collected by Crash Injury Research and Engineering Network (CIREN) teams, which consist of three medical centers and three engineering centers in Washington, Wisconsin, Virginia, Maryland, and Alabama. |
All qualifying crashes involving motor carriers with USDOT numbers within the 50 States, the District of Columbia, and Puerto Rico. |
All motor vehicle collisions that occur within National Park Service jurisdiction. |
What years of data are in the database? |
2007 to 2022 |
1989 to present |
1990 to 2005 |
What is the general availability of the data? |
Available to the general public through the MCMIS Data Dissemination Program with a fee, formal request needed. |
No direct access online, formal request needed. |
|
How are the data collected? How are the data coded? |
Each Center collects detailed crash and medical data on about 50 crashes per year. Personal and location identifiers and highly sensitive medical information have been removed from the public files to protect patient confidentiality; 650 National Automotive Sampling System (NASS) Crashworthiness Data System (CDS) data elements and 250 medical and injury data elements coded. |
Updated quarterly from field offices through SAFETYNET, CAPRI, and other sources. The data are coded for crash variables, census variables, and inspection variables. Inspections are conducted at the roadside by State personnel under the Motor Carrier Safety Assistance Program (MCSAP). |
Obtained from the Motor Vehicle Accident Report. The data are coded for crash variables. |
Does the database include all crashes for the coverage area (i.e., the population) or just a portion of the crashes (i.e., a sample)? |
Includes only crashes with serious injury. |
Includes only reported crashes involving commercial motor carriers (truck and bus) and hazardous material shippers. |
All reported crashes. |
How are crash severity levels defined? |
ISS/MAIS Scale |
National Governors’ Association crash thresholds. |
Fatal, injury, and property damage only (PDO) |
What is the vehicle type coverage? |
All vehicle types. |
Trucks, buses, passenger cars, and light trucks with United States Department of Transportation numbers or HAZMAT placard. |
All vehicle types. |
If data are just a sample, how was the sampling done? |
Admission to a participating CIREN center; occupant severely injured and transported to a Level 1 trauma center. Injury required: (1) at least one AIS3+ injury, (2) AIS2 injuries in two different AIS body regions, or (3) a significant particular injury to a lower extremity (AIS2). Vehicle model no older than 6 years. Restraint: (1) frontal crash: air bag and/or belt required; (2) side impact: unbelted is acceptable; (3) rollover: ejected occupants are excluded. |
NA |
NA |
If just a sample, what (if any) guidance is given to incorporate the sampling procedure into data analysis? |
None. |
NA |
NA |
To which Tasks (A – D) is the database applicable?
|
General: Conduct research related to vehicles, occupants, and nonmotorized road users involved in a crash (e.g., identify motor vehicle design features that offer maximum occupant protection). |
General: Support and evaluate motor carrier safety programs and regulations. |
General: Support and evaluate NPS safety programs and regulations. |
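The CIREN case-selection criteria summarized in Table 2 can be read as a screening rule. The sketch below is one possible reading, not CIREN's actual procedure: it assumes that any one of the three injury criteria suffices, that "no older than 6 years" means a vehicle age of at most 6, and that crash types other than frontal, side, and rollover are not eligible. The class names, field names, and the `significant_lower_extremity` flag are hypothetical, and the admission and trauma-center conditions are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Injury:
    ais: int                                   # AIS severity score
    body_region: str                           # AIS body region label
    significant_lower_extremity: bool = False  # hypothetical flag for criterion (3)


@dataclass
class Case:
    injuries: list
    vehicle_age_years: int
    crash_type: str        # "frontal", "side", or "rollover"
    belted: bool = False
    air_bag: bool = False
    ejected: bool = False


def meets_injury_criterion(injuries):
    """Assumed reading: any one of the three injury conditions qualifies."""
    if any(i.ais >= 3 for i in injuries):
        return True
    ais2_regions = {i.body_region for i in injuries if i.ais == 2}
    if len(ais2_regions) >= 2:
        return True
    return any(i.ais == 2 and i.significant_lower_extremity for i in injuries)


def meets_restraint_criterion(case):
    if case.crash_type == "frontal":
        return case.belted or case.air_bag
    if case.crash_type == "side":
        return True                      # unbelted is acceptable
    if case.crash_type == "rollover":
        return not case.ejected
    return False                         # assumption: other crash types not covered


def is_candidate(case):
    return (
        case.vehicle_age_years <= 6
        and meets_injury_criterion(case.injuries)
        and meets_restraint_criterion(case)
    )


example = Case([Injury(3, "thorax")], vehicle_age_years=4,
               crash_type="frontal", belted=True)
print(is_candidate(example))
```

Because cases enter CIREN through criteria like these rather than through probability sampling, the table reports no weighting guidance for it, and counts from the database should not be scaled up to national totals.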
Table 3 Summary of Other National Databases
NTD |
NEMSIS |
SHRP2 NDS and RID |
Who houses and maintains the data? |
Federal Transit Administration (FTA) |
NHTSA Office of Emergency Medical Services |
Virginia Tech Transportation Institute (VTTI) |
What is the spatial coverage of the data? |
National transit-related reportable incidents. |
National repository for EMS data. As of 2022, there are 54 states and territories that are contributing to the dataset. |
The naturalistic driving study (NDS) data and roadway information database (RID) were based on data gathered in six states (Florida, Indiana, New York, North Carolina, Pennsylvania, and Washington). |
What years of data are in the database? |
2002 to 2021 |
2008 to 2022 |
2010 to 2013 |
What is the general availability of the data? |
Data are available to qualified researchers with a data use license with VTTI. |
||
How are the data collected? How are the data coded? |
The system derives data from transit providers, States, or Metropolitan Planning Organizations (MPOs) that are recipients and beneficiaries of grants. There are 55 data fields that are collected from six different forms for safety and security. |
The NEMSIS project was developed to help states collect more standardized elements and eventually submit the data to a national emergency medical services (EMS) database. |
The Naturalistic Driving Study (NDS) data were collected by instrumenting vehicles to record vehicle location, forward radar, vehicle control positions, and video of the forward roadway and of the driver’s face and hands. Crash investigations were conducted after certain crashes to gather more detailed data. |
Does the database include all crashes for the coverage area (i.e., the population) or just a portion of the crashes (i.e., a sample)? |
The database includes transit-related reportable incidents. Note that not all incidents are considered to be reportable. If an incident is not related to and does not affect revenue operations, then it is considered to be nonreportable. |
Events submitted by States do not necessarily represent all EMS events occurring within the State. |
The naturalistic driving study (NDS) database includes detailed data on more than 5.8 million trips, 33 million travel miles, and 1.4 million driving hours from more than 3,100 participants of various ages across the country. The database represents continuous data from all trips taken by volunteer participants over one to two years. |
How are crash severity levels defined? |
Incidents, injuries, fatalities |
Possible injury (yes/no) |
Unknown |
What is the vehicle type coverage? |
Transit vehicles, including the following modes: Automated Guideway, Commuter Bus, Cable Car, Demand Response, Demand Response-Taxi, Ferryboat, Inclined Plane, Heavy Rail, Jitney, Light Rail, Motor Bus, Monorail/Guideway, Monorail, Público, Bus Rapid Transit, Streetcar Rail, Trolleybus, Aerial Tramway, Vanpool, and Hybrid Rail |
All vehicle types |
Passenger vehicles |
If data are just a sample, how was the sampling done? |
NA |
States vary in criteria used to determine the types of EMS events submitted to the NEMSIS dataset. |
Six locations were selected in the United States to represent geographic diversity and to provide a range of driver, vehicle, and roadway conditions. However, it is not a nationally representative sample. |
If just a sample, what (if any) guidance is given to incorporate the sampling procedure into data analysis? |
NA |
No |
No |
To which Tasks (A – D) is the database applicable? |
General: United States’ primary source of transit system information and statistics. Investigate transit-related crashes, including the injuries and fatalities by type and mode. |
General: Evaluate patient and EMS system outcomes. |
General (Note: the following list provides examples of potential uses of SHRP2 data):
|
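The NDS totals quoted in Table 3 imply some rough per-trip averages, which help convey the scale of the data. The sketch below divides the reported totals; because the table gives each total as "more than" a round figure, the results are approximations only, and the variable names are illustrative.

```python
# Totals as reported in Table 3 (each stated as "more than" the figure shown).
trips = 5.8e6
travel_miles = 33e6
driving_hours = 1.4e6
participants = 3100

miles_per_trip = travel_miles / trips            # about 5.7 miles
minutes_per_trip = driving_hours * 60 / trips    # about 14.5 minutes
mean_speed_mph = travel_miles / driving_hours    # about 23.6 mph
trips_per_participant = trips / participants     # about 1,871 trips

print(f"{miles_per_trip:.1f} miles per trip")
print(f"{minutes_per_trip:.1f} minutes per trip")
print(f"{mean_speed_mph:.1f} mph average speed")
print(f"{trips_per_participant:.0f} trips per participant")
```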
Table 4 Summary of Seven HSIS Databases
California |
Illinois |
Maine |
Minnesota |
North Carolina |
Ohio |
Washington |
|||
Who houses and maintains the data? |
The USDOT Secure Data Commons (SDC) houses the data, which are maintained by VHB under contract with the Federal Highway Administration (FHWA). |
||||||||
What is the spatial coverage of the data? |
Statewide |
||||||||
What years of data are in the database? |
1991 to 2021 |
1985 to 2021 |
1985 to 2021 |
1985 to 2021 |
1990 to 2020 |
1997 to 2021 |
1993 to 1996, 1999 to 2020 |
||
What is the general availability of the data? |
Data can be provided in different formats (e.g., Shapefiles, CSV, Excel) via a ShareFile link. The data can be requested by completing the online HSIS data request form on the HSIS website. |
||||||||
How are the data collected? How are the data coded? |
Crash and roadway data received from California. Data include roadway, intersection, interchange ramp, crash, and unit files. |
Crash data received from Illinois. Roadway data downloaded from Illinois DOT website. Data include roadway, crash, unit, and person files. |
Crash and roadway data received from Maine. Data include roadway, intersection node, interchange, crash, unit, commercial vehicle, and person files. |
Crash and roadway data received from Minnesota. Data include roadway, intersection, horizontal curve, intersection approach, traffic signal, interchange, lighting unit, lighting systems, roadside barrier, roadside barrier terminal, sign support, crash, unit, and person files. |
Crash data received from North Carolina. Roadway data downloaded from North Carolina DOT website. Data include roadway, traffic signal, interchange, horizontal curve, freeway exit, crash, unit, and person files. |
Crash and roadway data received from Ohio. Data include roadway, horizontal curve, intersection, intersection approach, barrier, lighting, bicycle route, crash, unit, and person files. |
Crash data received from Washington. Roadway data downloaded from Washington DOT website. Data include roadway, horizontal curve, grade, crash, unit, and person files. |
||
Does the database include all crashes for the coverage area or just a portion of the crashes (i.e., a sample)? |
All reported crashes, primarily on the State-maintained system. This varies slightly by State. |
||||||||
How are crash severity levels defined? |
KABCO |
KABCO |
KABCO |
KABCO |
KABCO |
KABCO |
KABCO |
||
What is the vehicle type coverage? |
All vehicle types; records distinguish vehicle type. |
All vehicle types; records distinguish vehicle type. |
All vehicle types; records distinguish vehicle type. |
All vehicle types; records distinguish vehicle make, model, and year. |
All vehicle types; records distinguish vehicle type, make, and model. |
All vehicle types; records distinguish vehicle type, make, model, and year. |
All vehicle types; records distinguish vehicle type, make, model, and year. |
||
If data are just a sample, how was the sampling done? |
NA |
NA |
NA |
NA |
NA |
NA |
NA |
||
If just a sample, what (if any) guidance is given to incorporate the sampling procedure into data analysis? |
NA |
NA |
NA |
NA |
NA |
NA |
NA |
||
To which Tasks (A – D) is the database applicable? |
General: The HSIS database has numerous general applications, as do many of the databases listed in this document. |
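A multistate CMF study drawing on HSIS needs years that every participating State covers. The sketch below intersects the year ranges listed in Table 4, assuming the seven "years of data" entries follow the State column order (California through Washington). The dictionary and function names are illustrative and are not part of any HSIS tool.

```python
# Year ranges from Table 4, assumed to follow the State column order.
HSIS_YEARS = {
    "California": [(1991, 2021)],
    "Illinois": [(1985, 2021)],
    "Maine": [(1985, 2021)],
    "Minnesota": [(1985, 2021)],
    "North Carolina": [(1990, 2020)],
    "Ohio": [(1997, 2021)],
    "Washington": [(1993, 1996), (1999, 2020)],
}


def years_covered(ranges):
    """Expand inclusive (start, end) ranges into a set of years."""
    return {year for start, end in ranges for year in range(start, end + 1)}


def common_years(states):
    """Return the sorted years available in every listed State."""
    year_sets = [years_covered(HSIS_YEARS[state]) for state in states]
    return sorted(set.intersection(*year_sets))


all_seven = common_years(HSIS_YEARS)
print(all_seven[0], all_seven[-1], len(all_seven))  # 1999 2020 22
```

Under that assumption, the gap in the Washington data (1997 and 1998) and the later start of the Ohio data limit an all-State analysis to 1999 through 2020; a subset of States can offer a longer period.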