TESTING THE MIS STRUCTURE

The previous section contained a broad overview of the conceptual framework for the MIRE MIS. This section contains an overview of a specific prototype of a MIRE MIS that the project team developed and tested. The project team developed a prototype MIRE MIS based on the proposed conceptual structure to further explore the specifics of what would be involved in developing an MIS. The prototype expands upon the section "Central Data Validation and Entry – Develop MIRE MIS application and relational database" in the previous chapter. This prototype was a relational database built around the structure of the MIRE data dictionary.

The project team developed the prototype using data from the New Hampshire DOT (NHDOT). This section provides an overview of the prototype development and testing effort. Appendix C provides the ERD for the logical model. The project team built the MIRE MIS prototype to test the concept behind the MIRE MIS with the following goals:

Advance the MIRE MIS concept by building a trial system to better understand the challenges that States would face in a full implementation.
Develop a logical model for the MIRE Version 1.0 specification that can be used to build a relational database to store MIRE roadway data.
Populate the database with real data collected by NHDOT to evaluate the process of inputting data into the system.
Extend the database structure to incorporate supplemental data, which is not part of the MIRE definition.

The project team made several decisions on the technical approach that would impact the resulting MIS. The key decisions that were made include:

Using a standard database platform. The project team built the prototype MIS on the Microsoft Structured Query Language (SQL) platform for NHDOT. The system uses standard structures and does not utilize specialized or proprietary tools or functions within the platform. This was done to ensure that the MIS structure can be exported easily to another platform for further development.

Building the database around MIRE Version 1.0. The project team developed the structure of the relational database for the prototype to match the data elements to the structure defined in the MIRE Version 1.0 specification, rather than tailoring the database to existing data that NHDOT had previously collected. This is a fundamental decision that will need to be addressed in any implementation of the MIRE MIS that has practical implications on the functions that the system will provide, how the data will be maintained and updated, and the usefulness of comparisons of data from State to State.

Taking a single snapshot of the imported data. One of the challenges in developing a MIRE MIS is working with roadway and safety data that is collected at different points in time. Building a system to store data and track the time of collection is not difficult, though an agency would need to account for changes to the structure of data. More important is that the referencing system that assigns a unique identification to each roadway element (e.g., segments, intersections, ramps, etc.) can change from year to year. Each data source in the system can also use its own referencing system to locate elements. The project team built the MIRE MIS prototype for NHDOT on a single snapshot of data to narrow the development focus. Using a single snapshot of the data ensures that all data in the prototype uses the same referencing system. When working with data in the prototype system, the results are certain to be internally consistent. This makes it easier to analyze the structure of the system and the capabilities of this structure. This approach limits the ability to perform analysis on data from different years as the prototype does not have a means to rectify data from one referencing system to another.

Extending the database to include supplemental data. The MIRE MIS stores roadway inventory, traffic, and crash data along with other data pertaining to roadway safety. Supplemental data collected by NHDOT are listed in the MIRE Version 1.0 Report. These supplemental data were incorporated into the MIRE MIS prototype; a list of these supplemental data is included in the next section. These data were kept in the structure that NHDOT provided to the project team.

The purpose of this prototype was to develop the database model and attempt to import an existing data set into this structure. The conceptual framework identifies a list of functions that an agency should include in a fully developed MIS. Some of the functions listed in the previous section were not included in the prototype. These include:

Service Levels/Performance Indicators.
User interface for data entry and validation.
Electronic file transmission and automated data import.
Digital Imaging.
Mapping Tools.
Data Retention and Archiving.
Data Analysis Tools.
Spatial Analysis Tools.
User Assistance Modes.

Further exploration of these functions should be considered for future research.

SOURCE DATA

The project team met with the NHDOT to determine what datasets at NHDOT contained MIRE data elements and supplemental data types. The NHDOT identified several data sets and provided them to the project team. These included:

NHDOT Roads – roadway segments inventory data.
NHDOT Intersections – roadway intersection inventory data.
Crash data.
Bridge data.
Roadside fixed objects.
Signs.
Railroad grade crossings.

DATABASE DEVELOPMENT

This section provides the principles, assumptions, and decisions the project team used to transform the conceptual model to the logical model. The logical model then served as the basis for the prototype for NHDOT. The relationship between the conceptual and logical models is described in Simsion and Witt's Data Modeling Essentials. According to Simsion and Witt, "The conceptual data model is a (relatively) technology-independent specification of the data to be held in the database. It is the focus of communication between the data modeler and the business stakeholders, and is usually presented as a diagram with supporting documentation. The logical data model is a translation of the conceptual model into structures that can be implemented using a DBMS. Today, that usually means that this model specifies the tables and columns." (8). The initial MIRE MIS conceptual model fits the above description well. It closely matches MIRE Version 1.0, but adds another layer of organization clearly showing how the various groups of data elements are related. In the logical model, the project team attempted to translate the conceptual model into the structures of a relational database. This process includes normalizing the data by removing repeating groups and removing redundancy. It also includes defining lookup tables and selecting data types, as described in the following paragraphs. To avoided proprietary features implemented by any individual relational database management system (RDBMS) vendor, the project team built a working database from the model in Microsoft SQL Server. This process should be straightforward for any current RDBMS. In performing the translation the project team used established normalization methods, substituted lookup tables for most of the various lists contained in MIRE Version 1.0, and made several assumptions. These steps are documented in the following sections.

Removing Repeating Groups

MIRE Version 1.0 has a few instances of repeating groups. The rules of relational normalization require these repeating groups be removed by putting them in their own tables. An example of a repeating group would be MIRE element 17, "Coinciding Route – Minor Route Information," within the Roadway Segment Descriptors. Specifically, the note "Additional elements may be needed to handle instances of more than one coinciding minor route" indicates that this data element is a repeating group (a group could consist of a single element). Thus, it was necessary to create a table specifically for the coinciding routes. Another example is the possible presence of some unspecified number of roads in an intersection. Element 122 is "Location Identifier for Road 1 Crossing Point." Element 123 is "Location Identifier for Road 2 Crossing Point." Element 124 is "Location Identifier for Additional Road Crossing Points."

Removing Redundancy

Normalization requires removing redundancy. An example of redundancy is MIRE elements 1, "County Name" and 2, "County Code." To remove this kind of redundancy, the project team added the County table, where every county is uniquely identified by its county code. This table contains the name of the county, so that it is not included in each roadway segment. Instead, the Roadway Segment Inventory table contains a foreign key, County Code, which points to the County table and permits retrieving the county name from there as needed. There were not many redundancies, but the project team removed them where they existed. It should be noted that this normalization was part of the logical model, not specific to any single implementation. NHDOT does use its own county code rather than the Federal Information Processing System (FIPS) as recommended in MIRE.

Lookup Tables

The MIRE data dictionary provides suggestions for possible values (i.e., attributes) for each of the MIRE elements. Some of these are general such as "numeric", but there are many elements with a defined list of possible attributes. An example is element 19, "Functional Class" with listed attributes of Interstate, Principal arterial other freeways and expressways, Principal arterial other, minor arterial, major collector, minor collector, and local. In keeping with standard practice for logical data modeling, the project team developed lookup tables for each element that includes a list of attributes. Each lookup table has a PK. The referencing table, such as Roadway Segment Inventory, contains an FK that points to the lookup table. In most cases, the project team used an arbitrary integer for the key. This is known as a surrogate key. Surrogate keys should never be exposed to the user. In each case, the project team allowed null values in the FK columns to accommodate missing, unknown, or non-applicable data. This eliminates the need to use "dummy" values for missing data.

Reports that provide descriptions can be included in the lookup tables. Additionally, manual data entry screens provide users with a display list from which to choose. This approach allows the main tables to be smaller, since they do not have to physically contain long descriptions such as "Principal arterial other freeways and expressways." It also avoids the necessity of hard-coding these lists into data-entry interfaces. The use of lookup tables also allows States to customize the lists to their own requirements. For example, there are some data elements where the attributes used by NHDOT differs from the attributes given in MIRE Version 1.0.

Data Types

Database management systems provide various data types, such as fixed length character strings, variable length character strings, dates, and a variety of numerical data types. The numerical data types include several sizes of integers and types for representing non-integer numbers. Each data element (column in the database) must have a data type assigned to it. The project team selected data types that will be useful to most States. For data elements measured in miles the project team used "decimal", which will handle a precision of 0.001 mile. For the length of the description fields (character strings) in lookup tables, the project team used the length of the longest value in the list, generally rounded up to the next 10 characters. In cases where there was no predominant reason for a particular choice, the project team used a data type that would work with NHDOT's data. As individual agencies begin to implement the MIS, they will need to make changes to data types to accommodate their existing data.

Assumptions

There are quite a few data elements where there could be multiple values (e.g., repeating groups that would have to be removed), but this is not mentioned in MIRE Version 1.0. As a result, the project team incorporated the implicit assumption that there will only ever be one:

Auxiliary Lane per Roadway Segment (elements 35 and 36).
HOV Lane per Roadway Segment (elements 37 and 38).
Bicycle Facility per Roadway Segment (elements 40 and 41).
Median Type per Roadway Segment (elements 54 and 55).
On-Street Parking Type per Roadway Segment (element 99).
Left Turn Lane Type per Intersection Approach (element 145).
Approach Traffic Control per Intersection Approach (element 154).

Short of breaking these out into their own tables, agencies could provide guidance to data collectors and data entry personnel on which value takes precedence if there are two or more.

Additional assumptions include:

An identified route has the same Route Number, Route Type Code, and direction over its entire length, so this does not need to be repeated for each segment and intersection.
Each intersection approach will be associated with no more than one roadway segment.
It is possible to identify one primary route for each segment or intersection, and all others are considered "additional." In other words, it is not necessary to identify a "secondary" coinciding route.

Accommodations for NHDOT Data

In some cases it was necessary for the project team to make decisions to accommodate NHDOT, as the initial database was built for them. The project team used an integer rather than a five-character identifier for element 2, "County Code" because NHDOT uses its own county list rather than the Census Bureau's list. Similarly, the project team used an integer for element 7, "City/Local Jurisdiction Urban Code" for the same reason. The project team also added a lookup table for NHDOT's Statewide Route Inventory.

Begin and End Point Segment Descriptors

MIRE Version 1.0 lists several different ways an agency could identify element 10, "Begin Point Segment Descriptor" and element 11, "End Point Segment Descriptor." For technical reasons, no single column in a database could accommodate all of these data types. To accommodate these elements the project team created two columns for each Segment Start Node and Segment End Node and defined them as character fields, all of which link to NHDOT's GIS data. Segment Start Mile and Segment End Mile are numeric fields and refer to milepoints on the segment's primary route.

DATA IMPORT

NHDOT provided several data sets for inclusion in the MIRE MIS prototype. These include roadway inventory and crash data that NHDOT had previously collected, as well as intersection data the project team collected as a part of the Lead Agency Program. The project team also included supplemental data sets with information about bridges, rail crossings, and lighting. Some of the data supplied by NHDOT includes multi-year data (either multiple years for the same data set or different data sets that were collected in different years). Data collected in different years may correspond to a different referencing system, which makes it impossible to build a dependable correlation between data sets. The project team added a column to denote data collection year to ensure that data is only joined to other data from the same year.

The project team examined each of the data sets to determine which data corresponded to elements in the MIRE definition and the MIRE MIS data model. Data that mapped to the MIRE model were transformed as needed to rectify any differences between the structure of the source data and the structure developed for the MIRE database. Supplemental data which did not correspond to elements in the MIRE model were added to supplemental tables in the MIRE MIS prototype. These supplemental tables used the same structure as the source data provided by NHDOT.

Data Mapping and Transformation

The data from NHDOT came in several formats (e.g., Microsoft Access, DBase files). The project team extracted these data from NHDOT's GIS systems. NHDOT provided a data dictionary for their roadway segment inventory and the intersection data collected under the Lead Agency Program.

The roadway segment data provides the foundation for correlating all of the data in the MIRE MIS prototype. These data provide a spatial reference of the State and local roads included in the system. This reference is a subjective reference in that each segment does not include an absolute spatial reference (e.g., xy coordinates, latitude/longitude). Instead, segment location is described by relating to other data elements (nodes, routes, mileposts). A point along a roadway is described by identifying the roadway segment, then specifying a linear distance along that segment. This referencing system is specific to the year that the data were collected, and this system corresponds to some of the other data collected in that period (NHDOT updates their segment data on a three year schedule). While most of the segments do not change from one collection period to the next, the unique segment identifiers are re-assigned, and it is not possible to relate data from different years using the segment ID. Starting with the roadway segments, the project team examined each dataset and compared it to the structure of the MIRE MIS logical model. The project team mapped elements to a corresponding element in the MIRE structure. The project team used this mapping to import data to the system, and was able to transform the source data into the structure of the prototype system. Any data that the project team could not map to a corresponding element in the MIRE structure were mapped to a supplemental table in the extended MIRE MIS structure. Some of the data collected by NHDOT are mapped to the roadway segments using the unique segment identifiers. Other data referred instead to routes (a construction of one or more connected segments), that needed to be mapped to the corresponding roadway segments. These data needed to be compared to the roadway segment data to confirm that the references were valid. Records that could not be matched to a valid location (through matching a segment ID or spatial analysis) were identified as orphaned records. These data were imported into the MIRE MIS, but location data was stripped out to prevent any false results. Other data that did not use roadway segment ID's to reference location were loaded into a GIS to correlate the data using spatial analysis. The GIS was able to identify when an element in the source data matched a segment in the roadway segment data set, and map the segment ID when the data were imported. Data that did not match a segment ID were flagged as orphaned records and were added to the MIS without a location reference.

FINDINGS AND RECOMENDATIONS

MIRE MIS is a fairly large and complex system and, like other similar systems, has a number of challenges. Some of these challenges include:

The quality of the data is variable and often difficult to assess and/or correct.
It is difficult to foresee all the requirements for the system before building it.
Development cycle could be lengthy enough that requirements will change significantly during development.
Cooperation of disparate organizations is required.

Many more challenges could be listed. This section will not discuss generic factors that influence the feasibility of implementing a large and complex MIS in general. Instead, this section will focus on issues that, if not unique to MIRE MIS, at least have some special significance for this system in particular.

Implementation Scenarios and their Effect on Feasibility

Factors affecting feasibility will come into play in varying degrees depending on how an agency implements the system. In deciding how to implement the MIRE MIS, agencies should consider several issues, including:

Existing systems and data.
Budget.
Schedule.
Perceived value of various portions of the MIS.
Experience and skills of in-house IT teams and/or contractors.

Based on these issues, agencies should select implementation scenarios which could include implementing all, or part, of the MIRE MIS recommendations. An agency might attempt an all-at- once approach or a gradual, incremental approach. Most agencies would be expected to attempt the relatively low-cost, low-risk, and high-return pieces first.

The project team defined the three categories of implementation scenarios, in order of probable complexity and cost.

Scenario 1: MIRE MIS is largely a stand-alone system. Data are imported periodically, maybe once a year for analysis and planning the next year's work. In this scenario, all data could be imported to MIRE MIS so that the MIS is the sole source for exporting to external tools, such as SafetyAnalyst. Alternatively, MIRE MIS data could be combined with data from one or more other systems to create these outputs. MIRE MIS could also be used directly for some analysis.
Scenario 2: MIRE MIS is the primary system for input and analysis of inventory, crash, and supplemental data. There could be some elements that are still imported periodically. There could be other operational systems obtaining this information from MIRE MIS. These other systems would also have additional data elements not included in MIRE. In this scenario, MIRE MIS would still be the source for exporting data to external tools.
Scenario 3: MIRE MIS is part of an integrated set of systems performing data analysis, spatial analysis, and operational functions. In this scenario, MIRE MIS is connected to other systems in real time or near real time.

MIRE MIS as a stand-alone system is the easiest and fastest scenario. At a minimum, it requires only the database, import functions, and export functions. Of these, the import functions will present most of the challenges. If an agency is to use MIRE MIS for analytical purposes, it would be necessary to create user interfaces, reports, etc. Assuming the database and the import routines are sound and thorough, the interfaces and reports should present no specific challenges beyond what one would find in any system of comparable scope and complexity. In this scenario, MIRE MIS could be implemented in small increments, based on what data are most readily available. MIRE MIS as the primary system will have additional requirements and benefits compared to the stand-alone version. Specifically, it will require user interfaces for data input. While the interfaces will almost certainly include traditional data entry screens, there could be additional data entry modes. Note that in this scenario, the initial data load would probably come from existing systems, so import routines would still be required. MIRE MIS as part of an integrated set of systems could be extremely challenging to implement. MIRE MIS would probably include data elements that are not in the other systems, and would need data input and validation for those elements. Two-way data exchange with the other systems could require robust and complex connections on a network. The integration would mean that any changes to any of the member systems would have to be examined for their impact on all. Again, import routines will be needed at least for the initial data load.

Multiple Organizations

Implementing the MIRE MIS involves cooperation across many organizations, particularly in the area of data collection. Some of these organizations' missions may align closely with the objectives of MIRE MIS, some hardly at all. Quality and comprehensiveness of the data will vary among the organizations. The data may be contained in a variety of organization- or task- specific formats not easily imported. Returning bad data to the issuing organization for correction may be difficult. Correlating the data will be challenging; especially, if different organizations use different ways of identifying items (e.g., locations by milepoint, GIS, latitude/longitude, etc.). Error correction and other workflows involve the movement of data, images, etc., across multiple systems owned by different jurisdictions with different priorities, goals, and standards. Political issues related to authority and responsibility will likely arise and have to be dealt with. Agencies should set up conflict resolution protocols to help deal with potential issues.

Data Import

Importing data from existing systems is likely to be a significant challenge under any of the implementation scenarios. While this challenge is not entirely MIRE-specific, the project team expects it to be particularly significant in this context. An agency can expect the following conditions:

Actual data does not match its documentation.
There are conflicts within the data, whether from different sources or from the same source.
The date when the data were applicable is often not known.
Data are known to be incorrect, or cannot be verified.
Data in some proprietary formats may be difficult to transform into an intermediate format for transfer.
There are invalid references (e.g., FK values that do not exist in the referenced table).
Data contain dummy values that require special handling.
Different data sources use the same name for different things, or different names for the same thing, or they understand data differently. An example of the last case would be where one source application uses one list of road surface types, while a different source uses a different list.
There are cross-jurisdictional disagreements over characteristics of the data, such as validity, meaning, value, and ownership.
It is difficult or impossible to resolve data problems as they are discovered during the validation task of the import process.
Undefined or incorrectly defined business rules make it difficult to validate data.

Temporal Aspect of Data

Obviously, inventory changes over time as roads are widened, signals added to intersections, and so on. Depending on an agency's data collection procedures, there might or might not be the possibility of tracking changes to an inventory item over time. If an agency collects roadway segment data in 2007 and again in 2012, it may not be possible to identify the same segment in both datasets. This can make it difficult to determine the effect of a given roadway improvement. It can be difficult to analyze the correlation between roadway conditions and crashes. For example, if an agency knew that a given segment had no rumble strips on March 1, and did have rumble strips on September 1 (but did not have specific information on the installation date), and there is a crash in July, the agency cannot know whether the rumble strips were present at that time. The agency cannot determine whether that crash happened on a roadway segment with or without rumble strips.

Multiple Segmentation Methods

The project team encountered multiple segmentation methods in NHDOT's roadway segment data. The segments for pavement data are shorter than the segments used for other data. In addition, a single pavement segment might not be entirely contained within a single "regular" segment.

In itself, supporting multiple segmentation methods should not be very difficult. Agencies could use several roadway segment inventory tables, one for each method, or an additional column in the primary key indicating segment type. The problem, however, will be correlating them. Given a specific locus, such as a node, grade crossing, or crash, it could be difficult to identify all applicable segments.

Spatial Analysis and GIS Integration

The datasets used to evaluate the feasibility of implementing an MIRE MIS originated in a GIS format from multiple bureaus at NHDOT. Specifically, the attribute data associated with each dataset was imported into the MIRE MIS, but the spatial component of these datasets was not incorporated into the MIS. Within the MIRE MIS database, the GIS attribute information was restructured to match the data dictionary elements defined by MIRE. Currently, there is a disconnect between the management of the spatial data associated with each feature (geometry of the roadway segment), and the corresponding database elements. For example, if a roadway segment record in the MIRE MIS is modified or deleted there is no direct link back to the spatial component for that record.

Other factors affecting the use of the GIS data in the MIRE MIS relate to the design of the GIS input data. For example, the NHDOT road inventory database is designed around a linear referencing system and dynamic segmentation. Linear features (roadway segments) can be split/segmented (in some cases split into many tiny segments) based on attribute events located along a roadway without affecting the underlying geometry of the linear features. However, linear referencing does necessarily lend itself useful in the actual MIS because there can be instances where there is not a one-to-one (PK) relation between elements. For example, the pavement management data is stored in event tables that are linked to the GIS roadway segments using the GIS Unique Identifier. However there can be multiple pavement condition ratings for any given segment, which are identified based on the begin/end mile post of a roadway segment.

Based on the work completed to date, further research is needed to integrate the MIRE MIS data structure to the geographic/spatial roadway and traffic segments. Relationship classes, which define the relationships between spatial objects and non-spatial objects, must be created between the database attributes and the corresponding geometric features in order for spatial analyses to be performed on the information system as a whole. It appears that MIRE MIS could be advanced by integrating additional core principles of GIS into the data management scheme. For instance, in addition to tabular relationships, GIS heavily relies on spatial relationships and topological rules to represent data quality and to promote data analysis.

USA Banner