Crash safety at signalized intersections, Hierarchical tree-based regression, Ordered probit model


Many studies have shown that intersections are among the most dangerous locations in a roadway network, so there is a need to understand the factors that contribute to traffic crashes at such locations. One approach is to model crash occurrence as a function of intersection configuration, geometric characteristics and traffic. Instead of combining all variables and crash types into a single statistical model, this analysis created several models that address the different factors affecting crashes at signalized intersections, by type of collision as well as by injury level. The first objective was to determine whether the important variables in models based on individual crash types or severity levels differ from those in aggregated models. The second objective was to investigate the quality and completeness of the crash data and the effect that incomplete data have on the final results. A detailed and thorough data collection effort was necessary to ensure the quality and completeness of the data. Multiple agencies (state and local jurisdictions) were contacted and their databases were cross-checked. Information on geometry, configuration and traffic characteristics was collected for a total of 832 intersections and over 33,500 crashes from Brevard, Hillsborough and Seminole Counties and the City of Orlando. Because of the abundance of data collected, a portion was set aside as a validation set for the tree-based regression. Hierarchical tree-based regression (HTBR) and ordered probit models were used in the analyses. HTBR was used to create models for the expected number of crashes by collision type as well as by injury level. Ordered probit models were used only to predict crash severity levels, because of the ordinal nature of this dependent variable. Finally, both types of models were used to predict the expected number of crashes.
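The core step of a tree-based regression can be illustrated with a minimal sketch. It assumes a generic CART-style criterion (choose the binary split that minimizes the within-node sum of squared errors), which is not necessarily the exact procedure or software used in this thesis. The function names `sse` and `best_split` and the sample values are hypothetical and are not taken from the study data.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the best single binary split of y on x.

    If no split reduces the error, threshold is None and total_sse is the
    error of the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical illustration data, NOT from the thesis:
# traffic volume (thousand vehicles/day) and crash counts per intersection.
volume = [10, 12, 15, 30, 35, 40]
crashes = [2, 3, 2, 9, 10, 11]
threshold, total = best_split(volume, crashes)
```

A full tree would apply `best_split` recursively over every candidate predictor in each child node; the predicted crash count for an intersection is then the mean of the terminal node it falls into.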
More specifically, tree-based regression was used to examine how the relative importance of each variable differs between the types of collisions. First, regressions were based only on crashes available from state agencies, to make the results more comparable to other studies. The main finding was that the models created for angle and left turn crashes differ the most from the model created from the total number of crashes reported on long forms (the restricted data usually available at state agencies). This result shows that aggregating the different crash types, by estimating models based only on the total number of crashes, will not predict the number of expected crashes as accurately as models based on each type of crash separately. Then, complete datasets (the full dataset based on crash reports collected from multiple sources) were used to calibrate the models. There was consistently a difference between models based on the restricted and complete datasets. The results in this section show that it is important to include minor crashes (usually reported on short forms and ignored) in the dataset when modeling the number of angle or head-on crashes, and less important to include them when modeling rear-end, right turn or sideswipe crashes. This research presents in detail the significant geometric and traffic characteristics that affect each type of collision. Ordered probit models were used to estimate crash injury severity levels for three different types of models: the first based on collision type, the second based on intersection characteristics and the last based on a combination of the significant factors in both models. Both the restricted and complete datasets were used to create the first two model types and the output was compared. It was determined that the models based on the complete dataset were more accurate.
However, when compared to the tree-based regression results, the ordered probit model based on intersection characteristics did not predict as well for the restricted dataset. The final ordered probit model showed that crashes involving a pedestrian or bicyclist have the highest probability of a severe injury. Among motor vehicle crashes, left turn, angle, head-on and rear-end crashes result in higher injury severity levels. A divided minor road (one with a median), as well as a higher speed limit on the minor road, was found to lower the expected injury level. This research has shed light on several important topics in crash modeling. First, it demonstrated that variables found to be significant in aggregated crash models may not be the same as the significant variables found in models based on specific crash types. Second, variables found to be significant in crash type models typically changed when minor crashes were added to complete the dataset. Third, ordered probit models based on significant crash-type and intersection characteristic variables have greater crash severity prediction power, especially when based on the complete dataset. Finally, a comparison of the tree-based regression and ordered probit models showed that the tree-based regression models better predicted the crash severity levels.
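As a minimal sketch of how an ordered probit model turns a linear index into severity-level probabilities, the standard textbook formulation is P(y = j) = Φ(μ_j − xβ) − Φ(μ_{j−1} − xβ), where Φ is the standard normal CDF and the μ_j are increasing cutpoints. The cutpoints, index values and number of severity levels below are hypothetical illustrations, not estimates from this thesis.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, cutpoints):
    """Return the probability of each ordered category.

    xb        -- linear index (sum of coefficient * covariate), hypothetical here
    cutpoints -- increasing thresholds separating the ordered categories
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [norm_cdf(b - xb) for b in bounds]
    return [cdf[j + 1] - cdf[j] for j in range(len(bounds) - 1)]

# Hypothetical example: four ordered severity levels (three cutpoints).
cutpoints = [0.0, 1.0, 2.0]
low_index = ordered_probit_probs(0.2, cutpoints)   # e.g. a less severe crash type
high_index = ordered_probit_probs(1.5, cutpoints)  # e.g. a more severe crash type
```

A larger linear index shifts probability mass toward the higher severity categories, which is how a positive coefficient (for example, on a crash-type indicator) raises the predicted probability of a severe injury.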








Abdel-Aty, Mohamed


Master of Science (M.S.)


College of Engineering and Computer Science


Civil and Environmental Engineering

Degree Program

Civil and Environmental Engineering








Release Date

August 2004



Access Status

Masters Thesis (Open Access)


Dissertations, Academic -- Engineering and Computer Science; Engineering and Computer Science -- Dissertations, Academic