A Monte Carlo Study of Ranked Efficiency Estimates from Frontier Models

Parametric stochastic frontier models yield firm-level conditional distributions of inefficiency that are truncated normal. Given these distributions, how should one assess and rank firm-level efficiency? This study compares the techniques of estimating (a) the conditional mean of inefficiency and (b) probabilities that firms are most or least efficient. Monte Carlo experiments suggest that the efficiency probabilities are easier to estimate (less noisy) in terms of mean absolute percent error when inefficiency has large variation across firms. Along the way we tackle some interesting problems associated with simulating and assessing estimator performance in the stochastic frontier model.

useful to empiricists interested in assessing relative ranks of technical efficiency. In empirical exercises where the conditional distributions of inefficiency prior to truncation have common variance, the firm rankings based on the conditional mean will be identical to those based on the efficiency probabilities of Horrace (2005). As such, calculating rank correlations with the true inefficiency rankings for each measure reveals nothing about the relative merits of the two approaches.
This paper uses Monte Carlo simulations to compare the precision of the conditional mean estimates and efficiency probability estimates in terms of mean absolute percentage error (MAPE). That is, the simulations assess the ability of a firm's conditional mean estimator to serve as an estimate of its (unknown) conditional mean; they also assess the ability of a firm's efficiency probability estimator to serve as an estimate of its (unknown) efficiency probability. In particular, the simulations are not concerned with assessing the ability of a firm's conditional mean and efficiency probability estimators to serve as estimates of its unknown technical efficiency (a realization of the error component u in a typical stochastic frontier specification). The simulations also present several complications that underscore the difficulties of efficiency estimation, in general, and that provide insights into the inherent differences of the two estimation approaches. These are discussed in the sequel. We find that the efficiency probabilities are more reliable when the variance of technical inefficiency is large; this is the ''usual'' case in the sense that it is the only time when estimation of inefficiency is at all precise and when it may even be warranted. In addition to the MAPE results, we present mean squared error (MSE) and bias calculations to examine the effects of changes in the variance parameters and sample sizes on the performance of each estimator (in isolation). We also demonstrate that relative efficiency probabilities can be computed for any subset of the firms in the sample, where the subset might be selected based on some additional criterion which does not enter into the frontier estimation. (In fact, we use this technique to simplify our Monte Carlo study when the number of firms is large.) The next section reviews the stochastic frontier model and defines the estimates to be studied, including the new subset probabilities. Section 3 contains the Monte Carlo study, and Sect. 4 provides a final discussion of the results and concludes.

Efficiency Estimation
The parametric stochastic frontier model was introduced simultaneously by Aigner et al. (1977) and Meeusen and van den Broeck (1977). Since then, there have been many re-formulations of the basic model. For example, consider the standard linear frontier specification for panel data with time-invariant efficiency:

$$y_{jt} = x_{jt}'\beta + v_{jt} \pm u_j, \qquad j = 1, \dots, n, \quad t = 1, \dots, T, \tag{1}$$

where $y_{jt}$ is productive output or cost for firm $j$ in period $t$, $x_{jt}$ is a vector of production or cost inputs, and $\beta$ is an unknown parameter vector. The $v_{jt}$ are random variables representing shocks to the frontier. Let $v_{jt}$ have an iid zero-mean normal distribution with variance $\sigma_v^2$. The $u_j$ are random variables representing productive or cost inefficiency, added to the cost function representation or subtracted from the production function representation. Let $u_j$ have a distribution that is the absolute value of an iid zero-mean normal random variable with variance $\sigma_u^2$ (a half-normal distribution). Additionally, let the error components be independent across $j$ and across $t$.

There are more flexible parameterizations of the linear model. For example, Kumbhakar (1990), Battese and Coelli (1992), and Cuesta (2000) consider forms of time-varying efficiency. Greene (2005) considers an extremely flexible model that incorporates firm-level heterogeneity in addition to the usual error components. Our selection of the simpler model in Eq. 1 is merely to parallel the model and discussions in Horrace (2005) and should not be construed as a limitation on the applicability of the results that follow. In fact, the inferential procedures detailed herein apply in time-varying efficiency models, in Greene (2005), or in any frontier model where the conditional distribution of efficiency is truncated normal (including the case where the unconditional distribution of efficiency is exponential).

In this model, per Jondrow et al. (1982), the distribution of $u_j$ conditional on $\varepsilon_j = (\varepsilon_{j1}, \dots, \varepsilon_{jT})$, with $\varepsilon_{jt} = v_{jt} \pm u_j$, is a $N(\mu_j^*, \sigma_*^2)$ random variable truncated below zero. Per Battese and Coelli (1988), the $\mu_j^*$ and $\sigma_*^2$ are:

$$\mu_j^* = \pm \frac{\sigma_u^2 \, \bar\varepsilon_j}{\sigma_u^2 + \sigma_v^2 / T}, \tag{2}$$

$$\sigma_*^2 = \frac{\sigma_u^2 \sigma_v^2}{\sigma_v^2 + T \sigma_u^2}, \tag{3}$$

where $\bar\varepsilon_j = T^{-1} \sum_{t=1}^{T} \varepsilon_{jt}$. (The right-hand side of Eq. 2 is ''+'' for the cost frontier or ''-'' for the production frontier.) Parametric estimation usually proceeds by corrected GLS or MLE (see, e.g., Horrace and Schmidt (1996) for details), yielding estimates $\hat\beta$, $\hat\sigma_u^2$ and $\hat\sigma_v^2$. Then, defining $\hat\varepsilon_{jt} = y_{jt} - x_{jt}'\hat\beta$, ''estimation'' of $\mu_j^*$ and $\sigma_*^2$ follows by substituting $\hat\varepsilon_{jt}$, $\hat\sigma_u^2$ and $\hat\sigma_v^2$ for $\varepsilon_{jt}$, $\sigma_u^2$ and $\sigma_v^2$ in Eqs. 2 and 3. Then, for a log-production function, the usual measure of technical efficiency based on a truncated $N(\hat\mu_j^*, \hat\sigma_*^2)$ assumption is the conditional mean:

$$\hat\theta_j = \hat{E}\left[\exp(-u_j) \mid \hat\varepsilon_j\right] = \exp\left(-\hat\mu_j^* + \tfrac{1}{2}\hat\sigma_*^2\right) \frac{1 - \Phi(\hat\sigma_* - \hat\mu_j^*/\hat\sigma_*)}{1 - \Phi(-\hat\mu_j^*/\hat\sigma_*)}, \tag{4}$$

where $\Phi$ is the cumulative distribution function of the standard normal. This is the sample equivalent of $\theta_j = E[\exp(-u_j) \mid \varepsilon_j]$, assuming that substitution of $\hat\varepsilon_j$ for $\varepsilon_j$ does not change the shape of the conditional distribution (or at least not asymptotically). In the next section, we are interested in understanding how precisely $\hat\theta_j$ estimates $\theta_j$ and not how precisely $\hat\theta_j$ estimates $\exp(-u_j)$.

Horrace (2005) argues that the point estimate in Eq. 4 is ''misleading.'' Granted, the shape of the conditional distribution is truncated normal, but it is unrealistic to think that the first moment of an asymmetric, truncated distribution can summarize its entire probabilistic nature. Illustration of this point is the essence of the contributions of Horrace and Schmidt (1996) and Bera and Sharma (1999): the first moment does not adequately summarize efficiency, so one should also quantify the second moment by constructing confidence intervals (Horrace and Schmidt 1996) or calculating the variance of the truncated distributions (Bera and Sharma 1999). Ideally, one might calculate higher moments as well, particularly odd moments, which affect the probability of extreme realizations of inefficiency in clear ways. This suggests that the point estimate, $\hat\theta_j$, does not adequately account for (or inform our understanding of) the varying shape of the conditional distribution of $u_j$ across firms. Horrace (2005) addresses these shortcomings in $\hat\theta_j$ by calculating multivariate probabilities conditional on $\varepsilon_1, \dots, \varepsilon_n$, given that the distribution of $u_j$ is truncated (at zero) normal. These probabilities are:

$$\pi_j^{\max} = \Pr\{u_j < u_i \ \forall\, i \neq j \mid \varepsilon_1, \dots, \varepsilon_n\}, \qquad \pi_j^{\min} = \Pr\{u_j > u_i \ \forall\, i \neq j \mid \varepsilon_1, \dots, \varepsilon_n\}.$$

Notice that there is room for confusion in the notation.
The ''max'' notation in $\pi_j^{\max}$ is intended to represent the fact that $j$ is ''maximally efficient'' (which happens to coincide with $u_j$ being minimal in a probabilistic sense). The ''max'' notation should not be confused with ''maximal $u_j$'', which is synonymous with ''minimal efficiency''. Similarly, the ''min'' notation in $\pi_j^{\min}$ represents the fact that $j$ is ''minimally efficient'' ($u_j$ being maximal in a probabilistic sense). Specifically, the probabilities are given by:

$$\pi_j^{\max} = \int_0^\infty f_j(u) \prod_{i \neq j} \left[1 - F_i(u)\right] du, \tag{5}$$

$$\pi_j^{\min} = \int_0^\infty f_j(u) \prod_{i \neq j} F_i(u) \, du, \tag{6}$$

where $f_j$ and $F_j$ are the probability density function and the cumulative distribution function of a $N(\mu_j^*, \sigma_*^2)$ distribution truncated at zero, respectively. That is,

$$f_j(u) = \frac{\sigma_*^{-1} \phi\left((u - \mu_j^*)/\sigma_*\right)}{1 - \Phi(-\mu_j^*/\sigma_*)}, \qquad F_j(u) = \frac{\Phi\left((u - \mu_j^*)/\sigma_*\right) - \Phi(-\mu_j^*/\sigma_*)}{1 - \Phi(-\mu_j^*/\sigma_*)}, \qquad u \geq 0,$$

where $\phi$ and $\Phi$ are the density and the cumulative distribution function of the standard normal. The probabilities in Eqs. 5 and 6 condense all the information on the relative differences of the distributions of efficiency into a single statement and also account for the multiplicity of the probability statement on maximal (minimal) efficiency, which the conditional mean and conditional variance cannot. In particular, they more adequately capture the effect of the shape of the distribution on the magnitude of a firm's realization of $u_j$ than the point estimates $\hat\theta_j$. Estimates of the probabilities, $\hat\pi_j^{\max}$ and $\hat\pi_j^{\min}$, follow by substituting the estimates $\hat\mu_j^*$ and $\hat\sigma_*^2$ into Eqs. 5 and 6. (In the next section, we are interested in understanding how precisely $\hat\pi_j^{\max}$ and $\hat\pi_j^{\min}$ estimate $\pi_j^{\max}$ and $\pi_j^{\min}$, and not how precisely they capture the unobserved realizations $u_j$.) A useful feature of these probabilities is that they are statements of relative efficiency (efficiency relative to a within-sample standard), whereas the typical efficiency measure, $\hat\theta_j$, is a measure of absolute efficiency (efficiency relative to an unobserved population standard). Relative efficiency is often empirically relevant, as when the research question is about the most or least efficient firms within an industry. In addition, one may be interested in understanding relative performance among a subset of the sample of firms, based on a certain information criterion or decision rule.
For example, one may be interested in estimating a cost function for a sample of 500 firms, but then only calculating probabilities of maximal cost efficiency for a small subset of the firms with an observable characteristic that is empirically relevant. The probabilities will change as the cardinality of and the membership within this subset changes. Let $N = \{1, \dots, n\}$ be the set of all firm indices in the sample, and let the subset of interest be $N_b \subseteq N$, based on some external information or decision rule, $b$. Then the probabilities in Eqs. 5 and 6 become:

$$\pi_{j,b}^{\max} = \int_0^\infty f_j(u) \prod_{i \in N_b,\, i \neq j} \left[1 - F_i(u)\right] du, \tag{7}$$

$$\pi_{j,b}^{\min} = \int_0^\infty f_j(u) \prod_{i \in N_b,\, i \neq j} F_i(u) \, du, \tag{8}$$

for all $j \in N_b$. These will be different, in general, than the probabilities of Horrace (2005). In fact, the probabilities in Eqs. 5 and 6 are a special case of Eqs. 7 and 8 when $N_b = N$. If the subset $N_b$ is empirically relevant, then probabilities like $\pi_{j,b}^{\max}$ may be more useful than $\pi_j^{\max}$. Also, experiments on the effects of different $N_b$ on the probabilities in Eqs. 7 and 8 may be of particular interest to empiricists. These types of experiments flow more naturally from relative efficiency measures like the probabilities in Eqs. 7 and 8 than they do from absolute efficiency measures like $\hat\theta_j$ in Eq. 4.
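As a rough illustration of how the conditional mean and the subset probabilities described above can be evaluated, the following Python sketch uses only the standard library. It is not the authors' code. The function names, the Simpson-rule integration, the integration cutoff of ten standard deviations, and the formulas as written here (the standard truncated-normal expressions) are all assumptions made for this example.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def conditional_params(eps_bar, sigma_u2, sigma_v2, T, production=True):
    """Mean and variance of the pre-truncation normal for one firm.
    The sign is '-' for a production frontier and '+' for a cost frontier."""
    sign = -1.0 if production else 1.0
    mu_star = sign * eps_bar * sigma_u2 / (sigma_u2 + sigma_v2 / T)
    sigma_star2 = sigma_u2 * sigma_v2 / (sigma_v2 + T * sigma_u2)
    return mu_star, sigma_star2

def conditional_mean_te(mu_star, sigma_star):
    """E[exp(-u)] for u ~ N(mu_star, sigma_star^2) truncated below at zero."""
    ratio = mu_star / sigma_star
    return (math.exp(-mu_star + 0.5 * sigma_star ** 2)
            * norm_cdf(ratio - sigma_star) / norm_cdf(ratio))

def trunc_pdf(u, mu_star, sigma_star):
    """Density of N(mu_star, sigma_star^2) truncated below at zero, for u >= 0."""
    return (norm_pdf((u - mu_star) / sigma_star)
            / (sigma_star * norm_cdf(mu_star / sigma_star)))

def trunc_cdf(u, mu_star, sigma_star):
    """CDF of N(mu_star, sigma_star^2) truncated below at zero, for u >= 0."""
    lower = norm_cdf(-mu_star / sigma_star)
    return (norm_cdf((u - mu_star) / sigma_star) - lower) / (1.0 - lower)

def subset_probabilities(mu_stars, sigma_star, steps=4000):
    """Probabilities of being most / least efficient within a comparison group.
    mu_stars holds the pre-truncation means of the firms in the group.
    Uses composite Simpson integration on [0, max(mu) + 10 sigma]
    (steps must be even; the cutoff is an assumption of this sketch)."""
    k = len(mu_stars)
    upper = max(0.0, max(mu_stars)) + 10.0 * sigma_star
    h = upper / steps
    p_max = [0.0] * k
    p_min = [0.0] * k
    for g in range(steps + 1):
        u = g * h
        weight = 1.0 if g in (0, steps) else (4.0 if g % 2 else 2.0)
        f = [trunc_pdf(u, m, sigma_star) for m in mu_stars]
        F = [trunc_cdf(u, m, sigma_star) for m in mu_stars]
        for j in range(k):
            all_others_larger = 1.0   # product of 1 - F_i(u), i != j
            all_others_smaller = 1.0  # product of F_i(u), i != j
            for i in range(k):
                if i != j:
                    all_others_larger *= 1.0 - F[i]
                    all_others_smaller *= F[i]
            p_max[j] += weight * f[j] * all_others_larger
            p_min[j] += weight * f[j] * all_others_smaller
    scale = h / 3.0
    return [p * scale for p in p_max], [p * scale for p in p_min]
```

Calling `subset_probabilities` with the means of only the firms in a chosen comparison group gives the subset version of the probabilities, while passing the means of all firms gives the full-sample version. Within any group each set of probabilities should sum to approximately one, which is a convenient check on the numerical integration.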
The next section examines the small and large sample performance of the estimates of $\theta_j$, $\pi_{j,b}^{\max}$ and $\pi_{j,b}^{\min}$ via Monte Carlo analysis. For each estimate we calculate MSE and bias for various sample sizes, $n$ and $T$, and various selections of the variance ratio $\gamma$. Reliability comparisons across the different measures are made using the unitless MAPE.

Monte Carlo Experiment
The specification used for the experiment is the production function:

$$y_{jt} = x_{jt}'\beta + v_{jt} - u_j. \tag{9}$$

Following Olsen et al. (1980), we fix the variance of the composed error term. Hence, the individual variances of $v_{jt}$ and $u_j$ may be characterized by a single parameter; we use the ratio $\gamma = \sigma_u^2 / \sigma_v^2$. However, unlike the estimates in Olsen et al. (1980), the $\hat\theta_j$, $\hat\pi_{j,b}^{\max}$ and $\hat\pi_{j,b}^{\min}$ are more complicated transformations of the data, so we cannot say immediately what the effect of changes in the composed error variance would be. While we estimate the production function in Eq. 9 for the entire sample, we only estimate the various efficiency measures for a subset of five randomly chosen firms. This is done primarily for ease of computation of $\hat\pi_{j,b}^{\max}$ and $\hat\pi_{j,b}^{\min}$, which involve integration over a product of functions, one for each firm in the comparison group, but it also demonstrates the usefulness of the probabilities in Eqs. 7 and 8. In essence, we calculate $\hat\pi_{j,b}^{\max}$ and $\hat\pi_{j,b}^{\min}$, where $b$ is the rule ''randomly select five firms from $N$.'' Consequently, we only calculate five values of $\hat\theta_j$, $\hat\pi_{j,b}^{\max}$ and $\hat\pi_{j,b}^{\min}$ in each simulation iteration for comparison. This randomization introduces an additional source of variability into the exercise, which may cause some instability in the convergence results, but the instability is the price we pay for computational ease. Fortunately, the additional variability is common to all estimators considered, so any instability will be globally manifest.
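The design described above (a fixed composed error variance, a single variance ratio, and a random five-firm comparison group) might be set up along the following lines. This is a minimal sketch under stated assumptions, not the authors' procedure. It assumes that the fixed quantity is the variance of $v - u$ with half-normal $u$, that this variance is normalized to 1.0, and that the ratio is $\gamma = \sigma_u^2/\sigma_v^2$. The function names are hypothetical.

```python
import math
import random

def variances_from_gamma(gamma, var_eps=1.0):
    """Split a fixed composed-error variance into (sigma_u2, sigma_v2).
    Assumption: var_eps = sigma_v2 + (1 - 2/pi) * sigma_u2, which is the
    variance of v - u when u is the absolute value of a N(0, sigma_u2) draw,
    and gamma = sigma_u2 / sigma_v2. The default of 1.0 is illustrative."""
    half_normal_factor = 1.0 - 2.0 / math.pi  # Var(|Z|) for Z ~ N(0, 1)
    sigma_v2 = var_eps / (1.0 + gamma * half_normal_factor)
    sigma_u2 = gamma * sigma_v2
    return sigma_u2, sigma_v2

def draw_comparison_subset(n, size=5, rng=None):
    """Rule b: randomly select `size` distinct firm indices from 0..n-1."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(n), size))
```

Under a different convention for the composed error variance (for example, fixing $\sigma_u^2 + \sigma_v^2$ instead), only the first function would change. The subset draw is independent of that choice.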

Simulation procedure
The experiment is designed to assess $\hat\pi_{j,b}^{\max}$ and $\hat\pi_{j,b}^{\min}$ (vis-à-vis $\hat\theta_j$) over a range of common panel sizes $(n, T)$ and variance ratios $\gamma$. We use eight panel configurations: T = 5 and n = 25, 100, 500; T = 10 and n = 25, 100, 500; and T = 20 and n = 25, 100 (note 7). In all cases we are concerned with the usual panel setting of large $n$ and fixed T, so asymptotic arguments are along the dimension $n$. For each panel configuration we conduct simulation exercises for five variance ratios $\gamma$, so there are forty simulations in total. For reasons discussed above, we fix the number of firms for calculation of $\hat\pi_{j,b}^{\max}$ and $\hat\pi_{j,b}^{\min}$ to five (randomly selected from $N$) (note 8). Each iteration within a simulation exercise (indexed by $m$) goes through the following sampling and estimation procedure, which is repeated $M = 5{,}000$ times. First, the errors $u_{jm}$ and $v_{jtm}$ are drawn from the appropriate half-normal and normal distributions (with respective variances $\sigma_u^2$ and $\sigma_v^2$), and the regressors $x_{jtm}$ are drawn from an independent uniform [0,1] distribution (note 9). Then $y_{jtm}$ is generated for a single fixed value of $\beta$ (the only parameterization of the conditional mean function considered). Since each $\varepsilon_{jtm}$ is observed, we can calculate the true values of $\mu_{jm}^*$ for each draw, $m$. These map into the true values for $\theta_{jm}$, $\pi_{jm}^{\max}$ and $\pi_{jm}^{\min}$ for each $m$, so the ''parameters'' of interest are not constant across $m$. Estimation of $\beta$, $\sigma_u^2$ and $\sigma_v^2$ proceeds with corrected GLS (the ''random effects'' estimator) (note 10). After estimating and using $\hat\varepsilon_{jtm}$, $\hat\sigma_u^2$ and $\hat\sigma_v^2$ for $\varepsilon_{jtm}$, $\sigma_u^2$ and $\sigma_v^2$ in Eqs. 2 and 3, five firms are randomly selected to produce the subset $N_b$. From these results we calculate estimates $\hat\theta_{jm}$, $\hat\pi_{jm}^{\max}$ and $\hat\pi_{jm}^{\min}$ for the five firms using Eqs. 4, 7, and 8. In what follows it is very important to remember that the $\theta_{jm}$, $\pi_{jm}^{\max}$ and $\pi_{jm}^{\min}$ are not fixed across iterations, $m$. (This should be clear, since all three of these measures are indexed by $m$.) This produces nonstandard formulae for the MSE, bias, and MAPE, although their interpretations are, indeed, standard.
It also underscores the difficulties in estimating efficiency in these models: we are trying to make inferences about the distribution of efficiency for each firm from what amounts to a single draw from the distribution, and that single draw $u_j$ is not even observed; it is merely ''estimated'' from the convolution, $\varepsilon_{jt} = v_{jt} - u_j$. With the results from the 5,000 iterations for each simulation exercise, we calculate the mean square error of $\hat\theta_j$. Our nonstandard formula is (typically):

$$\mathrm{MSE}(\hat\theta) = \frac{1}{5M} \sum_{m=1}^{M} \sum_{j \in N_b} \left(\hat\theta_{jm} - \theta_{jm}\right)^2,$$

with $M = 5{,}000$, and similarly for $\hat\pi^{\max}$ and $\hat\pi^{\min}$. Even though the MSE is nonstandard because it includes sampling variability across the true parameters (even asymptotically), it still seems theoretically sensible. As we shall see, it also produces results that are sensible. Again, this is an unavoidable feature of efficiency estimation from these models (in general).
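The sampling step and the nonstandard MSE can be sketched as follows. This is an illustrative sketch only. The single regressor, the coefficient values `beta0` and `beta1`, and the function names are hypothetical choices for this example and are not taken from the study. The estimation step (corrected GLS) is deliberately left out.

```python
import math
import random

def simulate_iteration(n, T, sigma_u2, sigma_v2, beta0=1.0, beta1=1.0, rng=None):
    """One Monte Carlo draw from a production frontier y = b0 + b1*x + v - u.
    beta0, beta1 and the single regressor are illustrative assumptions."""
    rng = rng or random.Random()
    sd_u, sd_v = math.sqrt(sigma_u2), math.sqrt(sigma_v2)
    # Half-normal inefficiency: absolute value of a zero-mean normal draw.
    u = [abs(rng.gauss(0.0, sd_u)) for _ in range(n)]
    x = [[rng.random() for _ in range(T)] for _ in range(n)]
    v = [[rng.gauss(0.0, sd_v) for _ in range(T)] for _ in range(n)]
    y = [[beta0 + beta1 * x[j][t] + v[j][t] - u[j] for t in range(T)]
         for j in range(n)]
    # In a simulation the composed error is observed, so its firm-level mean
    # (and hence the true conditional-distribution mean) can be computed.
    eps_bar = [sum(v[j]) / T - u[j] for j in range(n)]
    return y, x, u, eps_bar

def pooled_mse(estimates, truths):
    """MSE where the target changes with every iteration.
    estimates[m][j] and truths[m][j] refer to firm j of the comparison
    group in iteration m; errors are averaged over both m and j."""
    total, count = 0.0, 0
    for est_m, true_m in zip(estimates, truths):
        for est, true in zip(est_m, true_m):
            total += (est - true) ** 2
            count += 1
    return total / count
```

The key design point is that `truths` is a full array rather than a single number: each squared error is taken against the true value for that firm in that draw, which is what makes the formula nonstandard.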
For the bias and MAPE, we separately use only the best or worst firms within each five-firm subsample. This is necessary as the probability statements within a comparison group automatically sum to one (e.g., $\sum_{j \in N_b} \pi_{j,b}^{\max} = 1$), so there is no average bias for the whole group for these estimators. This is an artifact of their ''relative nature'' and perhaps a nice feature. More specifically, using the population ranking, [1], ..., [5], among the five randomly selected firms, we calculate the bias and MAPE of $\hat\theta_{[1]}$, $\hat\pi_{[1]}^{\max}$ and of $\hat\theta_{[5]}$, $\hat\pi_{[5]}^{\min}$ for each iteration. Hence, the biases for each extremum measure are (typically):

$$\mathrm{Bias}(\hat\theta_{[1]}) = \frac{1}{M} \sum_{m=1}^{M} \left(\hat\theta_{[1]m} - \theta_{[1]m}\right),$$

over the $M = 5{,}000$ iterations, and similarly for $\hat\theta_{[5]}$, $\hat\pi_{[1]}^{\max}$ and $\hat\pi_{[5]}^{\min}$. We could have selected any firms in the ranking for this purpose (i.e., [2], [3] or [4]), but the best and the worst seemed appropriate for evaluating the performance of ranked estimators. Also, the extreme firms map into efficiency probabilities from the population that tend to be large, precluding a ''divide-by-zero'' problem in the MAPE calculation, as we shall see.

Note 7. We omitted n = 500, T = 20 to save computing time for the entire exercise.
Note 8. This also allowed us to indirectly examine the validity of the subset efficiency probabilities introduced in Eqs. 7 and 8.
Note 9. We could have allowed the $x_{jtm}$ to be correlated within firms but did not.
Note 10. When CGLS fails due to $\hat\sigma_u^2 < 0$, we set $\hat\sigma_u^2 = 0$, per Waldman (1982).
The bias of $\hat\theta_{[1]}$ quantifies the extent to which the estimate of technical efficiency for the most efficient firm in the randomly selected subsample is mismeasured on average. Similarly, the bias of $\hat\pi_{[1]}^{\max}$ quantifies the extent to which the estimate of the probability of being most efficient for the most efficient firm in the randomly selected subsample is mismeasured on average. Finally, since the units of $\hat\theta$, $\hat\pi^{\max}$ and $\hat\pi^{\min}$ are different, the MSE and bias measures are only relevant for making comparisons for a single measure (in isolation).
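To make comparisons across measures we employ the unitless MAPE (typically):

$$\mathrm{MAPE}(\hat\theta_{[1]}) = \frac{1}{M} \sum_{m=1}^{M} \left| \frac{\hat\theta_{[1]m} - \theta_{[1]m}}{\theta_{[1]m}} \right|,$$

and similarly for the other measures. With the MAPE, we wish to avoid division by numbers close to zero, so for the probabilities we calculate it only for $\pi_{[1]}^{\max}$ and $\pi_{[5]}^{\min}$, the efficiency probability of the most efficient firm and the inefficiency probability of the least efficient firm, respectively, in the population. That is, efficiency probabilities like $\pi_{[5]}^{\max}$ may be very close to zero in the denominator of the MAPE formula, so it is only calculated for $\pi_{[1]}^{\max}$ and $\pi_{[5]}^{\min}$, which should both be fairly large in each draw. The results of the simulations and their discussion follow.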
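A small sketch of the bias and MAPE calculations for an extreme firm is given below. It is offered as an illustration under assumptions rather than as the study's implementation. The function name and the argument layout are hypothetical, and MAPE is expressed as a fraction (not multiplied by 100).

```python
def extreme_bias_and_mape(estimates, truths, extreme_index):
    """Bias and MAPE for one extreme firm per iteration.
    estimates[m][j], truths[m][j]: estimated and true value of a measure
        for firm j of the comparison group in iteration m.
    extreme_index[m]: position of the extreme firm (for example, the most
        efficient firm in the population ranking) in iteration m.
    The true value varies with m, so each error is taken against the
    truth of that same iteration."""
    errors = []
    relative_errors = []
    for m, j in enumerate(extreme_index):
        error = estimates[m][j] - truths[m][j]
        errors.append(error)
        relative_errors.append(abs(error) / abs(truths[m][j]))
    bias = sum(errors) / len(errors)
    mape = sum(relative_errors) / len(relative_errors)
    return bias, mape
```

For instance, if a list `u_m` holds the inefficiency draws of the five firms in one iteration, the index of the most efficient firm could be found with `min(range(5), key=lambda j: u_m[j])`. Ranking by the draws is itself an assumption here, since the ranking could equally be defined on the true values of the measure.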

Results
First, the experiment shows that failure of the CGLS procedure is a problem only for extremely ''noisy'' variance ratios (small $\gamma$) and for small $n$ in Tables 1, 2, 3. For the remaining variance ratios failures are either absent or rare: where they occur at all, they amount to less than 1% of draws and arise with the smallest sample, n = 25, T = 5.
As expected, the MSE of all measures decreases with increasing n and fixed T. Of course, Tables 1, 2, 3 do not allow us to make comparisons across measures, since the units are different across measures. Also, it is not surprising that as the signal-to-noise ratio $\gamma$ increases, the MSE of the estimates is usually non-increasing, but not always. In Tables 1, 2, 3 the MSE of $\hat\pi^{\max}$, the average MSE of the probability that $j$ is most efficient, is always non-increasing in $\gamma$. However, this is not true for the MSE of $\hat\theta$, the average MSE of the conditional mean of firm $j$, and the MSE of $\hat\pi^{\min}$, the average MSE of the probability that $j$ is least efficient. For example, in one panel configuration of Table 3, moving $\gamma$ from 1 to 5 to 10 increases one of these two MSEs from 0.0032 to 0.0048 to 0.0055, and the other is likewise increasing across these $\gamma$ in the same simulations. The non-monotonicities are highlighted with asterisks in Tables 1, 2, 3. Why might these non-monotonicities in $\gamma$ arise? It is well-known that the random effects estimator of $\beta$ is a weighted sum of the between estimator and the within (or fixed effects) estimator (e.g., see Hsiao 1986, p. 36). The between estimator ignores the within-firm variation, and when $\gamma$ is large the random effects estimator places more weight on the within variation, so the random effects estimator is close to the fixed effects estimator. It is also well-known that the random effects estimator is asymptotically efficient relative to the fixed effects estimator (e.g., see Baltagi 2005, p. 17), so when $\gamma$ is very large, the random effects estimator may have a larger variance than when $\gamma$ is small. This imprecision feeds into the efficiency estimates, so non-monotonicities in Tables 1, 2, 3 may reflect this lack of precision. Notice that they (highlighted with asterisks) occur primarily for the largest $\gamma$ (and hence for the largest $\sigma_u^2$) (note 12). Another factor that may induce the non-monotonicities is the size of $\sigma_*$, which appears as $\mu_j^*/\sigma_*$ in the formulae for the conditional mean and efficiency probabilities.
For our simulations, the true value of $\sigma_*$ reaches a maximum at an intermediate value of $\gamma$ that depends on the value of $T$. Obviously, smaller values of $\sigma_*$ ceteris paribus inflate any error in the ratio $\mu_j^*/\sigma_*$, so the estimators may be less precise for large $\gamma$ (note 13). Why is the MSE of $\hat\pi^{\max}$ non-increasing in $\gamma$? More accurately, why is the maximal efficiency probability immune to the variability of the random effects estimator when $\gamma$ is large? Consider the means $\mu_j^*$ in Eq. 2. When $\gamma$ (and hence $\sigma_u^2$) is large, the probability of large draws of $u_j$ is large, so that differences in the $\mu_j^*$ tend to be large. The efficiency probabilities are based on differences of these means and their relative variability. When the differences are large, the ability of the probabilities to distinguish the efficiency distributions is improved. It must be the case that this ability to distinguish outweighs the increased variability in the random effects estimator. Of course this phenomenon does not occur for $\hat\pi^{\min}$. Why? It may be related to approximation error in $\hat\pi^{\min}$ caused by very large (in absolute value) arguments in the probability calculation. Since $\hat\pi^{\max}$ follows from relatively small values, it is immune to approximation error. In fact, absent approximation error, we believe that the MSE of $\hat\pi^{\min}$ would exhibit the same monotonicities as the MSE of $\hat\pi^{\max}$. The results for the MSE in Tables 1, 2, 3 are similar (for the most part) to the bias results in Tables 4, 5, 6, which are tabulated for extreme-efficiency firms ([1] and [5]) from the ranked subsample of five. As expected, the biases of all measures are generally non-increasing in $n$ (in absolute value), and they are generally decreasing in $\gamma$ with a few exceptions that are similar in nature to those of Tables 1, 2, 3. While the imprecision of the random effects estimator for large $\gamma$ manifests itself in the variance of the efficiency estimates and, hence, the MSE of each estimator (Tables 1, 2, 3), it may also affect the bias of the estimates in this exercise. To see this, remember that the nonstandard bias formula is not based on a fixed parameter across all 5,000 draws. Our formulation does not ''average out'' deviations around a fixed parameter, so the possibility for large deviations persists. These persistent deviations may appear as bias in our results. Notice also that the probability measures are almost always negatively biased, while the conditional mean measures are almost always positively biased. We suspect that this reversal comes from the fact that the probabilities are based on the distribution of $u_j$ while the conditional means are based on the distribution of $\exp(-u_j)$. Across Tables 4, 5, 6, only a single measure is uniformly improving in both $n$ and $\gamma$ (in the sense that the absolute value of the bias is non-increasing). However, comparisons of the bias across different measures are not possible due to inconsistency of the units of measure.

Note 12. The imprecision may be worsened by the fact that the fixed effects estimator cannot exploit correlations between x and u, as they have not been built into the DGP.
Note 13. Of course there is no way to disentangle this phenomenon from the effect of the random effects estimator approaching the fixed effects estimator, but it is interesting to note.
To make comparisons across different measures, mean absolute percentage errors (MAPE) for the extreme ends of the population order statistic are presented in Tables 7, 8, 9. Across all three tables the results are clear: the MAPE of $\hat\pi_{[1]}^{\max}$ is less than that of $\hat\theta_{[1]}$ for values of $\gamma$ greater than one, and the MAPE of $\hat\pi_{[5]}^{\min}$ is less than that of $\hat\theta_{[5]}$ for values of $\gamma$ greater than one. In other words, the probabilities are out-performing the conditional mean measures when the variance of inefficiency, $\sigma_u^2$, is large. For example, one comparison in Table 7 involves MAPE values of 0.0890, 0.1633, 0.0688 and 0.0347. Our results are complicated by the fact that the MAPE of the probabilities had extremely large values in some simulations with large $\gamma$. These instances are indicated in the tables with double asterisks (**) and were due to a few draws where the true values entering the calculation were so large that they generated approximation errors in the computer calculations of the probabilities. (This is the same approximation error discussed for the MSE, but made worse since we are now selecting the extreme firms.) This is an unfortunate feature of the probabilities, but it is purely computational in nature (i.e., it could be corrected with a more accurate algorithm for calculating the probabilities). As for monotonicities in the MAPE, all measures improve with n as expected. Both probability measures appear to have MAPE non-increasing in $\gamma$ as well, except in one case (and this may be due to approximation error in the probability calculation). The MAPE of the conditional mean usually reaches its minimum at or below an intermediate value of $\gamma$ in all panel configurations.
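The text does not say which computation produced the approximation error, so the following is only one plausible illustration of how such errors can arise and be avoided. Truncated-normal formulas repeatedly use upper-tail probabilities $1 - \Phi(z)$; forming that difference directly loses all precision once $\Phi(z)$ rounds to one, whereas the complementary error function evaluates the tail directly. The function names are hypothetical.

```python
import math

def upper_tail_naive(z):
    """1 - Phi(z) formed by subtraction; cancels to 0.0 for large z."""
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_tail_stable(z):
    """1 - Phi(z) via the complementary error function; keeps small tails."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# For moderate z the two agree; for large z only the stable form is nonzero.
moderate_gap = abs(upper_tail_naive(1.0) - upper_tail_stable(1.0))
naive_far_tail = upper_tail_naive(10.0)
stable_far_tail = upper_tail_stable(10.0)
```

When a ratio such as $\mu_j^*/\sigma_*$ is very large in absolute value, products of many such tail terms can underflow even in the stable form, in which case working with logarithms of the terms is a common further remedy.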

Conclusions
This study provides evidence on the sampling performance of two very different technical efficiency estimators that are used to assess absolute and relative firm-level efficiency, based on parametric stochastic frontier models. We find that both the traditional conditional mean estimates and the efficiency probabilities appear to be monotonically more precise as $n$ increases. However, the effect of the variance ratio is more complicated. The efficiency probabilities out-perform the conditional mean when $\gamma$ is strictly greater than one. This is the empirically (and theoretically) important case for the frontier model. Our precision assessments are based on the unitless mean absolute percentage error, the only measure that could be used for comparison of these different estimators.
We are aware that we have introduced two other sources of variability in our study. One follows from the quantities of interest varying over the iterations $m$, and the other follows from our random sample of five firms for each $m$ to calculate the measures of interest. The first source of variability could not be avoided and underscores the fact that efficiency ''estimates'' are not estimates of traditional population parameters. They are, in fact, proxies for an unobserved realization from inefficiency distributions. This is precisely the challenge that the frontier literature presents, and it is manifest in our study. The second source of variability was included by choice to relieve some computational burden. However, this variability is purely random and affects all efficiency estimators in similar ways. Finally, approximation error in calculating the probabilities may have invalidated (or precluded) some simulation results for the largest values of $\gamma$, but the results for moderate values of $\gamma$ are to be believed.