Ranking and selection of motor carrier safety performance by commodity.

We use recent safety performance data to rank US motor carrier commodity segments (e.g., the Tank segment or the Produce segment) in terms of several driver-related, vehicle-related, and crash-related safety measures. Ranking and selection inference techniques are used to determine the best and worst performing commodity segments at the 95% confidence level. The results are mixed; however, the Passenger segment is generally best, while the Produce, Intermodal, and Refrigerated segments tend to be worst.


Introduction
The primary mission of the Federal Motor Carrier Safety Administration (FMCSA) is to prevent commercial motor vehicle fatalities and injuries. In an effort to better understand the diverse nature of the commercial motor carrier industry and explore safety and operational differences among its major commodity segments, the FMCSA undertook the Motor Carrier Industry Profile Study (MCIPS). The study uses the Motor Carrier Management Information System (MCMIS) and the Motor Carrier Safety Status Measurement System (SafeStat) as its sources for safety and operations data on commercial motor carriers. The MCMIS and SafeStat are maintained by the FMCSA and are populated with carrier-level data from roadside inspections, FMCSA and State compliance reviews, crashes, and enforcement cases against motor carriers. The data are used to track carrier-level safety performance, based on several different measures of driver and vehicle performance and compliance.
Using the MCIPS data, this paper explores the following questions: (1) which commodity segments of the commercial motor carrier industry perform best and worst in terms of safety, and (2) which safety performance measures are the most and least reliable for making this assessment? The policy implications are important: if there exist significant safety performance differentials across segments of the industry, then motor carrier safety regulations and enforcement could be tailored specifically to address the poorest performing segments. We develop safety performance estimates across commodity segments, rank them by segment, and then apply the theory of ranking and selection to determine (in a probabilistic sense) which of the segments are the most and least safe. Generally, we find that while there are significant safety performance differences across some commodity segments, most of the perceived differences may be statistical noise. We also find that some safety measures are less reliable than others in terms of their statistical precision.

Data
The Motor Carrier Management Information System (MCMIS) was the primary source of data used in this study. The MCMIS is used by FMCSA to maintain a comprehensive safety record of property and passenger carriers subject to the Federal Motor Carrier Safety Regulations (FMCSR). The MCMIS Census File contains records for over 500,000 entities (e.g., motor carriers, hazardous materials shippers, and registrants), including information on each company's identity (name, address), operations classification (type of business), cargo classification (segment), and numbers of trucks and drivers.
The MCMIS data set also contains an Inspection File, which contains the results of roadside inspections (submitted by States), enforcement actions (taken by Federal personnel against a motor carrier), and compliance reviews (conducted by FMCSA and State safety investigators). Enforcement actions may include civil penalties or out-of-service (OOS) orders placed against a carrier. (The majority of driver OOS violations pertain to noncompliance with the hours-of-service rules.) Compliance reviews are on-site reviews of a motor carrier's operations, conducted by FMCSA and State personnel to determine the level of compliance with the FMCSR. The MCMIS data set also includes a Crash File, which contains data from State police crash reports electronically transmitted to FMCSA. The census, inspection, and crash data are reviewed and updated as new information is collected by FMCSA on a motor carrier, whether through inspections, compliance reviews, enforcement actions, or reportable crashes.
This study also uses data produced by the Motor Carrier Safety Status Measurement System (SafeStat), an analytical tool developed by FMCSA and the Volpe National Transportation Systems Center to identify and monitor high-risk motor carriers within the overall motor carrier population. SafeStat became operational in 1995 and uses MCMIS data as input to evaluate a motor carrier's relative safety fitness. (Our data go up to and include the year 2000.) SafeStat combines current on-road safety performance data with on-site compliance review data collected by FMCSA and State safety investigators to provide a comprehensive evaluation of a motor carrier's safety performance using four types of data: accident, driver, vehicle, and safety management. Safety event data are normalized to account for a carrier's size or amount of exposure using carrier-descriptive data such as the number of power units or the number of roadside inspections. Data are also time-weighted, with the most recent events having greater weight than older ones. Crash data are also severity-weighted, based on the number and type of injuries sustained by victims involved in the crashes.
All the SafeStat data serve to measure a carrier's relative safety fitness and assess its risk of having future crashes. Not all motor carriers contained in the MCMIS data set are analyzed by SafeStat. To be assessed by SafeStat, a motor carrier must have experienced at least three inspections within the past 30 months. As such, the number of carriers with SafeStat data (i.e., those analyzed in this study) is significantly smaller than the total population of carriers contained in MCMIS. More importantly, the results of this study are only relevant for the subpopulation of motor carriers that have had at least three inspections within the past 30 months; they are not valid for the entire population.
The final data used in this study are at the carrier level. Each observation consists of an individual carrier identification number, carrier name and contact information, several SafeStat carrier-level safety measures along different metrics, and the commodity segments in which the carrier operates. A single carrier can operate in several segments, even though the safety measures in the data are for the entire fleet. The data also differentiate carriers into two distinct sectors: the For-Hire sector and the Private sector. Generally speaking, the For-Hire trucking sector consists of carriers whose primary business is to haul property or passengers for another person or company in exchange for direct compensation. The Private trucking sector generally consists of carriers whose primary business is not trucking, but who choose to haul their own property or passengers.
We examine the recent safety performance of eleven For-Hire and ten Private commodity segments of the motor carrier industry. The ten Private segments examined are: Building Materials, Bulk Freight, Refrigerated (non-produce), General Freight Truckload, Household Goods, Intermodal, Large Machinery, Passenger, Produce, and Tank. The eleven For-Hire segments examined are the same ten commodity segments plus the "General Freight Less-Than-Truckload" segment. While most of the commodity segments are self-explanatory, two are less obvious. Bulk Freight is an aggregation of the grain and feed, coal and coke, and dry bulk segments. "General Freight Less-Than-Truckload" carriers collect small shipments from local pickup points, move them between terminals as truckload shipments, and break them up at the destination terminal, from which the shipments are then delivered to individual destination points.
Our analysis uses nine safety measures. Safety performance is evaluated with two driver-related measures: driver safety evaluation area (SEA) and driver out-of-service (OOS) rate; two vehicle-related measures: vehicle SEA and vehicle OOS rate; three crash-related measures: accident SEA, fatal crash rate, and total crash rate; and two safety management performance measures: Safety Management Review Measure (SMRM) and Enforcement Severity Measure (ESM). A complete description of the safety measures is given in the data appendix (Appendix A).

Safety estimation
Let $y_{jk}$ represent an observation of SafeStat safety measure $k$ ($k = 1, \ldots, K$) for motor carrier $j$ ($j = 1, \ldots, J$), without reference to any particular commodity segment. $K$ is the number of safety measures under consideration (we consider nine such measures), and $J$ is the total number of carriers with at least three inspections within the last 30 months ($J$ is approximately 94,000 in the For-Hire sector and approximately 55,000 in the Private sector). Let $i = 1, \ldots, M$ index commodity segments (this study is limited to $M = 10$ segments in the Private sector and $M = 11$ segments in the For-Hire sector). Notice that $y_{jk}$ is not indexed by commodity segment: SafeStat compiles safety measure $k$ at the carrier level but not at the segment level within carriers. Since this study assesses safety performance at the segment level, our ideal safety measures would be compiled by segment and by carrier. Unfortunately, safety measures are available at the carrier level only (without regard to the segments in which carriers operate). Consequently, let $d_{ij}$ be an indicator variable equal to 1 if the $j$th carrier operates in segment $i$ ($i = 1, \ldots, M$) and equal to 0 otherwise. Then, for a given carrier $j$, the number of segments in which it operates is the non-negative integer
$$M_j = \sum_{i=1}^{M} d_{ij}.$$
The indicator variable makes the multiple-segment nature of carriers explicit. Based on this notation, the number of observations of safety measure $k$ in segment $i$ is
$$n_{ik} = \sum_{j} d_{ij},$$
where the sum runs over the carriers $j$ for which measure $k$ is observed. Then a measure of average safety performance of segment $i$ for safety measure $k$ is
$$\bar{y}_{ik} = \frac{1}{n_{ik}} \sum_{j} d_{ij}\, y_{jk}.$$
The average implies the sample variance measure
$$s_{ik}^{2} = \frac{1}{n_{ik} - 1} \sum_{j} d_{ij} \left( y_{jk} - \bar{y}_{ik} \right)^{2}.$$
The segment safety measures $\bar{y}_{ik}$, $i = 1, \ldots, M$, imply rank statistics for each safety measure $k$:
$$\bar{y}_{[1]k} \le \bar{y}_{[2]k} \le \cdots \le \bar{y}_{[M]k}.$$
Unfortunately, the measure $\bar{y}_{ik}$ is potentially biased, because it does not account for the effects of multiple-segment carriers in the data. To see this, one need only recognize that it is based on the carrier-wide measure $y_{jk}$, which ignores segment.
If a carrier performs well in one segment (A) and poorly in another (B), then this difference is masked in the data, and its performance in segment A is understated while its performance in B is overstated. This is less of a problem if a carrier's true safety performance is not segment-specific, but more of a carrier-level phenomenon. The data set did contain quite a number of multiple-segment carriers. The For-Hire sector had about 60,000 carriers (64%) operating in a single commodity segment, while the Private sector had about 42,000 (about 78%). The rest operated in multiple segments. For example, the For-Hire sector had about 16,000 carriers (17%) operating in two segments and about 9,000 (9%) operating in three. These numbers imply that multiple-segment carriers may be a significant percentage of the data set and should be addressed in the assessment of safety across segments. (Note: There were also a number of carriers with no segment designation; these carriers were ignored in the analysis.)
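The segment-level averaging described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the use of `None` for a carrier with no value of a given measure, the n - 1 divisor for the sample variance, and the toy numbers are all assumptions made for illustration. Lower scores are treated as better, consistent with the interpretation of the safety scores in the Results.

```python
from statistics import mean, variance

def segment_stats(y, d):
    """Per-segment (count, mean, sample variance) for one safety measure.

    y[j]    : carrier-wide value of the measure for carrier j (None if missing)
    d[j][i] : 1 if carrier j operates in segment i, else 0
    Each segment needs at least two observations for the variance.
    """
    n_segments = len(d[0])
    stats = []
    for i in range(n_segments):
        # A multiple-segment carrier contributes the same fleet-wide value
        # to every segment in which it operates.
        vals = [y[j] for j in range(len(y)) if d[j][i] == 1 and y[j] is not None]
        stats.append((len(vals), mean(vals), variance(vals)))
    return stats

def rank_segments(stats):
    """Segment indices ordered from lowest (best) to highest (worst) mean."""
    return sorted(range(len(stats)), key=lambda i: stats[i][1])

# Hypothetical toy data: four carriers, two segments; carrier 1 is in both.
y = [2.0, 4.0, 6.0, 8.0]
d = [[1, 0], [1, 1], [0, 1], [0, 1]]
stats = segment_stats(y, d)
print(stats)
print(rank_segments(stats))
```

In the toy data, carrier 1 enters both segment averages with the same fleet-wide score, which is exactly the source of the potential bias discussed above.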

Ranking and selection procedures
Ranking and Selection Theory is attributed to Gupta (1965), and it is a subset of a larger body of statistical inference procedures called "Multiple Comparison Procedures." The technical appendix (Appendix B) provides some background information on what follows; however, the interested reader is referred to this literature for a complete understanding of the theory and proofs.
Let λ be a value on the unit interval (0, 1); then λ can be thought of as an inferential error rate and (1 − λ) × 100% as a confidence level. For this study we select λ = 0.05, so that our inferential confidence level is 95%. Define the set of all segment indices $\{1, \ldots, M\}$ for safety measure k. Ranking and Selection Theory defines two subsets of $\{1, \ldots, M\}$, denoted $B_k$ and $W_k$, each containing some of the segment indices. Call $B_k$ the "subset of the best" and $W_k$ the "subset of the worst." Then, with probability at least 1 − λ, the set $B_k$ contains the best motor carrier segments for safety measure k, and with probability at least 1 − λ, the set $W_k$ contains the worst motor carrier segments for safety measure k.
Therefore, motor carrier segments that fall into the subset of the best can be deemed "best," and those in the subset of the worst can be deemed "worst" at the (1 − λ) × 100% confidence level for each safety measure. These probability statements are powerful: they allow us to better understand the significance of the segment-by-segment rankings for each safety measure. These subsets will generally (but not always) contain more than one segment. As λ decreases, the probability associated with each statement increases. For this study we select the standard λ = 0.05, implying that the probability statements are made at the 95% confidence level. At larger values of λ, the confidence level decreases and the inference procedure can produce a single segment in each of the subsets. Therefore, confidence level experiments could be conducted to single out one best and one worst segment for each safety measure k.
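To make the idea of a "subset of the best" and a "subset of the worst" concrete, the following Python sketch uses a simple Bonferroni-adjusted pairwise comparison as a conservative stand-in. It is not the procedure or the critical values used in this study (see the technical appendix and the cited literature for those); the function name, the normal approximation, and the toy inputs are assumptions. A segment stays in the best subset unless its mean is significantly larger than that of some other segment, with lower scores treated as better.

```python
from math import sqrt
from statistics import NormalDist

def select_subsets(means, variances, counts, lam=0.05):
    """Return (best, worst) index lists for one safety measure.

    Bonferroni stand-in: each of the M - 1 one-sided comparisons for a
    segment is made at error rate lam / (M - 1), so the truly best
    (worst) segment is retained with probability of roughly 1 - lam
    or more under a normal approximation. Requires at least 2 segments.
    """
    m = len(means)
    z = NormalDist().inv_cdf(1 - lam / (m - 1))

    def margin(i, j):
        # Allowance for sampling error in the difference of two means.
        return z * sqrt(variances[i] / counts[i] + variances[j] / counts[j])

    def others(i):
        return (j for j in range(m) if j != i)

    best = [i for i in range(m)
            if all(means[i] - means[j] <= margin(i, j) for j in others(i))]
    worst = [i for i in range(m)
             if all(means[j] - means[i] <= margin(i, j) for j in others(i))]
    return best, worst

# Hypothetical toy inputs (not the study's data): segment 2 has few carriers.
means = [6.0, 8.0, 8.5]
variances = [100.0, 100.0, 400.0]
counts = [5000, 10000, 50]
print(select_subsets(means, variances, counts))
```

In this toy case the imprecisely measured segment 2 cannot be excluded from either subset, so it lands in both, while the precisely measured segment 1 is excluded from the best subset even though its mean is lower than that of segment 2.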

Results
The results for the best and worst subsets for both the For-Hire and Private sectors are in Tables 1 and 2, respectively. In the tables the columns contain the segments (i) and the rows contain the safety measures (k). For each safety measure the cells are populated with "B" if the segment was contained in the subset of the best with 95% confidence, with "W" if the segment was contained in the subset of the worst with 95% confidence, and with "B, W" if it was contained in both. Blank designations imply that the segment was neither "best" nor "worst" for that measure.
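The cell coding just described, and the per-segment counts of "best" and "worst" appearances quoted in the discussion of the tables, can be sketched in Python as follows. The function names and the toy table are hypothetical and are not taken from Tables 1 and 2.

```python
def cell_label(in_best, in_worst):
    """Table cell for one segment and one safety measure."""
    if in_best and in_worst:
        return "B, W"
    if in_best:
        return "B"
    if in_worst:
        return "W"
    return ""

def tally(table):
    """Count, per segment, how many measures placed it in each subset.

    table maps a measure name to its list of cell labels (one per segment).
    A "B, W" cell counts toward both totals.
    """
    n_segments = len(next(iter(table.values())))
    rows = list(table.values())
    best = [sum("B" in row[i] for row in rows) for i in range(n_segments)]
    worst = [sum("W" in row[i] for row in rows) for i in range(n_segments)]
    return best, worst

# Hypothetical two-measure, three-segment table (not the study's results).
toy = {
    "measure_1": [cell_label(True, False), cell_label(False, False), cell_label(False, True)],
    "measure_2": [cell_label(True, True), cell_label(False, True), cell_label(False, False)],
}
print(tally(toy))
```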
In Table 1 (For-Hire), for the driver SEA measure (first row) the single best segment was the Passenger segment and the single worst was the Refrigerated segment. This is a strong inference statement because it identifies single segments as the safety extrema at the 95% confidence level. This is also the case for the driver OOS rate and the vehicle SEA measures. In Table 1, the only occurrence of a "B, W" classification is for the Enforcement Severity Measure (ESM) in the "General Freight Less-Than-Truckload" segment. This occurred because the measure was based on only 19 observations, so the precision of the inference for that measure is suspect; there are not enough data for the ESM measure to be statistically meaningful (in a ranking sense). In Table 1, the results are generally sharp because the sample sizes tended to be large. Overall, the Passenger and the "General Freight Less-Than-Truckload" segments performed the best, being in the best subset six times each. The worst segments were the Intermodal and Produce segments, being in the worst subset four times each. The least reliable measures for ranking the segments were the fatal crash rate and the ESM, which tended to have multiple segments in the best and worst subsets.
Turning to the Private sector results of Table 2, the inference is less sharp, due to generally smaller sample sizes than in the For-Hire analysis. For example, in the driver SEA category, the best segment was the Tank segment, but the worst subset consisted of five segments (Household Goods, Intermodal, Large Machinery, Passenger, and Produce). Again, this may be due to small sample sizes (e.g., Intermodal had only 166 observations). The driver OOS rate is an interesting case here. Notice that the segments with the lowest (best) safety scores were Tank (6.43), Bulk Freight (8.33), and Intermodal (8.75). However, the inference only selected Tank and Intermodal to be best at the 95% level. Why? Tank was selected because it had by far the lowest score. The Bulk Freight and Intermodal segments had similar scores, but only Intermodal was selected because it was measured with less precision than Bulk Freight (the Bulk Freight measure was based on 10,061 observations, while the Intermodal measure was based on 261), implying that the inference could not reject the hypothesis that the Intermodal driver OOS rate was the smallest. This makes clear the dangers of drawing conclusions on rank statistics without statistically valid inference procedures.
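A rough calculation shows how large the precision gap between these two segments can be. Assume, purely for illustration, that carrier-level driver OOS rates have the same standard deviation $\sigma$ in both segments (the segment variances are not reported here). The standard error of a segment mean is then $\sigma/\sqrt{n}$, and the ratio of the two standard errors depends only on the sample sizes:

```latex
\frac{\mathrm{se}_{\text{Intermodal}}}{\mathrm{se}_{\text{Bulk Freight}}}
  = \frac{\sigma / \sqrt{261}}{\sigma / \sqrt{10{,}061}}
  = \sqrt{\frac{10{,}061}{261}}
  \approx \sqrt{38.5}
  \approx 6.2
```

Under this equal-variance assumption, the Intermodal mean is measured roughly six times less precisely than the Bulk Freight mean, which is why a similar point estimate can be excluded from the best subset for one segment but not for the other.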
Again, the fatal crash rate and the ESM tended to be the least reliable safety measures in terms of discerning a best and worst segment. However, the Safety Management Review Measure (SMRM) was also fairly unreliable. All three measures had a multiplicity of segments that were in both the best and the worst subsets, and often suffered from small sample sizes within certain segments. For example, in the ESM category the Passenger, Intermodal, and Household Goods segments had only 7, 12, and 29 observations, respectively. The sharpest overall inference was for the driver OOS rate, vehicle SEA, vehicle OOS rate, and the accident SEA. The only safety measures that produced a single best or worst segment were the driver SEA, which determined that the Tank segment was the single best; the accident SEA, which determined that the Refrigerated segment was the single worst; and the total crash rate, which determined that the Passenger segment was the single best. The Household Goods segment was best six times but worst five times. The Passenger segment was best four times but worst six times. The Intermodal segment was best five times but worst five times.

Conclusions and future research
This inferential study has highlighted the limitations and the strengths of the MCIPS safety measures in various motor carrier segments, and, perhaps more importantly, it has highlighted the strengths and weaknesses of the data themselves. Without rehashing the specific results of Section 5, some strong conclusions have surfaced. First, the subsample of MCIPS data used in the analysis only contains carriers that have had at least three inspections in the last 30 months; therefore, the results are only interpretable for this subpopulation of carriers. Second, rank statistics by themselves need to be interpreted with caution. This is not to say that the segment-by-segment ranks are wrong, but it is to say that inferential procedures are necessary to get a true sense of which segments are best and worst. If identifying the best and the worst performing segments is important, then it is even more important to recognize that there may be more than one best or worst segment along any safety metric, even though the point estimates suggest otherwise.
It may also be useful to reexamine the crash data results (fatal and total crash rates), since crashes are clearly important safety measures for policy and enforcement. These data tend to have smaller sample sizes than other measures. While increasing the sample sizes of crash data is clearly not feasible, perhaps the information provided in the SafeStat data set for crashes should be expanded to improve the accuracy and sharpen the inference for these measures. One obvious improvement would be to provide information on what trucks are carrying when crashes occur; this would allow the data to be disaggregated along segments, mitigate the multiple-segment carrier problem, and potentially improve the accuracy of the results.
Concerns over the effects of multiple-segment carriers must be meaningfully addressed. For instance, the study could be repeated for single-segment carriers only. However, the smaller sample sizes might make the inference less sharp. Alternatively, methods that quantify and control for "multi-segment carrier effects" in the data themselves could be developed. Currently, the data do not admit this quantification, so perhaps the data collection process could be altered to make this feasible.
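The single-segment robustness check suggested above could be set up along the following lines. This is only a sketch: the function names and toy data are hypothetical, and `d[j][i]` is assumed to be a 0/1 indicator of whether carrier j operates in segment i.

```python
def single_segment_only(y, d):
    """Keep only carriers that operate in exactly one segment."""
    keep = [j for j in range(len(d)) if sum(d[j]) == 1]
    return [y[j] for j in keep], [d[j] for j in keep]

def segment_counts(d):
    """Number of carriers operating in each segment."""
    return [sum(column) for column in zip(*d)]

# Hypothetical toy data: four carriers, two segments; carrier 1 is in both.
y = [2.0, 4.0, 6.0, 8.0]
d = [[1, 0], [1, 1], [0, 1], [0, 1]]

y_single, d_single = single_segment_only(y, d)
print(segment_counts(d), segment_counts(d_single))
```

Comparing the two sets of counts shows directly how much each segment's sample shrinks once multiple-segment carriers are dropped, which is the trade-off against sharper inference noted above.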