The conditional mode in parametric frontier models

We survey formulations of the conditional mode estimator for technical inefficiency in parametric stochastic frontier models with Normal errors and introduce new formulations for models with Laplace errors. We prove that the conditional mode estimator in the Normal-Exponential model achieves near-minimax optimality by estimating small inefficiencies as exactly zero. We also consider a rule for selecting a subset of maximally efficient firms based on zero conditional mode estimates and show that the subset has reasonably high probability of containing the most efficient firm, particularly when inefficiency is exponentially distributed. We include an empirical example demonstrating the merits of the conditional mode estimator.


Introduction
This paper studies estimation of technical inefficiency in the canonical cross-sectional stochastic frontier model (SFM) introduced by Aigner et al. (1977) and Meeusen and van den Broeck (1977):

$$y_i = X_i'\beta + \varepsilon_i, \qquad \varepsilon_i = v_i - u_i, \tag{1}$$

where $y_i$ is productive output, $X_i$ is a vector of production inputs and the error components are $-\infty < v_i < \infty$ (statistical noise) and $u_i \ge 0$ (technical inefficiency). The nonnegative error $u_i$ captures the deviation of a firm's production from the efficient frontier.1 An important feature of the parametric SFM is that it yields estimates of firm-level technical inefficiency. Under the assumptions that $u_i$ and $v_i$ are i.i.d. over $i$, and $u_i$, $v_i$ and $X_i$ are mutually independent, consistent estimation of the vector $\beta$ follows by maximum likelihood estimation (MLE) or corrected ordinary least squares estimation (COLS, Olson et al. 1980), producing a consistent estimate of the composed error, $\varepsilon_i$. Then, estimation of inefficiency reduces to the problem of recovering it from $\varepsilon_i$.2

Jondrow et al. (1982) recommend the mean or mode of the conditional distribution $f(u_i \mid \varepsilon_i)$ as an estimator of $u_i$.3 We denote them as the conditional expectation estimator, $\hat{u}_i^e \equiv E(u_i \mid \varepsilon_i)$, and the conditional mode estimator, $\hat{u}_i^m \equiv M(u_i \mid \varepsilon_i)$. They derive closed-form formulae for the conditional expectation and mode under a Normal-Half Normal assumption, i.e., $v_i \sim \text{i.i.d. } N(0, \sigma_v^2)$ and $u_i \sim \text{i.i.d. } |N(0, \sigma_u^2)|$, and a Normal-Exponential assumption, i.e., $v_i \sim \text{i.i.d. } N(0, \sigma_v^2)$ and $u_i \sim \text{i.i.d. } \text{Exp}(1/\sigma_u)$.4 For notational simplicity, we will omit the subscript $i$ unless necessary for exposition.
In the SF literature, the conditional expectation estimator $\hat{u}^e$ has become a common way to estimate firm-level inefficiency, while the conditional mode estimator $\hat{u}^m$ has received less attention. Closed-form expressions of $\hat{u}^e$ have been derived under a wide range of distributional assumptions: the Normal-Truncated Normal case by Kumbhakar and Lovell (2003); the Normal-Gamma case by Greene (1990); and the Normal-Uniform, Laplace-Exponential, Cauchy-Half Cauchy, etc., cases by Nguyen (2010). Its statistical properties are also well known. Waldman (1984) compares $\hat{u}^e$ with two other estimators (e.g., $-\varepsilon$) and shows that $\hat{u}^e$ is superior to the others in terms of mean squared error when the model is correctly specified. Wang and Schmidt (2009) derive the distribution of $\hat{u}^e$ and show that $\hat{u}^e$ converges to the true inefficiency value, $u$, as the variance of the statistical noise approaches zero. The conditional expectation estimator has been employed in a variety of empirical settings as well: higher education production efficiency (Zoghbi et al. 2013), chemical industry energy efficiency (Lin and Long 2015), retail promotion strategy efficiency (Trivedi et al. 2017), corn farm productivity (McFadden et al. 2022), among others. By comparison, formulations of $\hat{u}^m$ have been extended only to the Normal-Gamma and Normal-Truncated Normal models (Kumbhakar and Lovell 2003), and its application is relatively rare.
Only recently has the conditional mode estimator regained attention in the literature. Papadopoulos (2021) considers a Normal-Generalized Exponential model and shows the monotonicity of the conditional mode in the composed error. Zeebari et al. (2023) generalize the monotonicity result to any log-concave inefficiency distribution and extend the analysis of Wang and Schmidt (2009) to the conditional mode, deriving similar convergence and distributional results.
This paper follows this emerging body of research and discusses other important merits of the conditional mode that have not been discussed in the literature. First, we prove that $\hat{u}^m$ in the commonly deployed Normal-Exponential model achieves the near-minimax optimality that LASSO-type estimators possess. Our minimax result implies that risk-saving when many firms are nearly efficient is important to attain the optimality, and the conditional mode estimator minimizes estimation risk in this case by estimating small inefficiencies as exactly zero. Our simulation results suggest that this feature enables the mode estimator to be more robust to parametric misspecification than other inefficiency estimators. Thus, the conditional mode estimator may be the preferred choice for estimating technical inefficiency, particularly when misspecification is a concern.
Also, we consider a selection rule for a subset of maximally efficient firms based on zero conditional mode estimates and show that the subset has reasonably high probability of containing the most efficient firm in the sample. Our simulation results indicate that the cardinality of the subset increases as the uncertainty in the rankings of inefficiencies across firms increases, allowing the probability of the subset containing the best firm to remain (by and large) constant as the sample size grows. This zero conditional mode selection rule may be useful for practitioners to identify a credible subset of efficient firms when multivariate selection rules (e.g., MCB, Horrace and Schmidt 2000) are not available for practical reasons (e.g., computational burden).5

Given that the literature on the conditional mode is still very thin, our results begin to fill this gap in the research of the conditional mode and should help promote its use empirically. The rest of the article is organized as follows: Section 2 summarizes closed-form expressions of $\hat{u}^m$ under various distributional assumptions with some being new to the literature; Section 3 proves its minimax optimality; Section 4 considers the zero conditional mode selection rule to identify a subset of maximally efficient firms and examines its statistical properties; Section 5 analyzes the inefficiency of U.S. electric utility firms; and Section 6 concludes. All proofs are included in the Appendix.

Closed-form expressions of $\hat{u}^m$
Most formulations of $\hat{u}^m$ in the literature are derived under the assumption that the two-sided error $v$ is normally distributed (e.g., Jondrow et al. 1982; Kumbhakar and Lovell 2003; Nguyen 2010, among others). Under this assumption, the conditional distribution $f(u \mid \varepsilon)$ often follows a Normal distribution truncated below zero, and the conditional mode in this case is simply either the pre-truncated mean or zero, whichever is larger. We now summarize formulations of the conditional mode estimator for several parametric specifications of the model, which can be a useful resource for practitioners.
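As a minimal sketch of the "pre-truncated mean or zero, whichever is larger" rule, the block below codes the Normal-Half Normal case using the well-known Jondrow et al. (1982) expressions, with $\varepsilon = v - u$. The function names are hypothetical and chosen only for illustration, and the parameters are assumed known rather than estimated.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard Normal, used for its pdf and cdf


def nhn_posterior_params(eps, sigma_u, sigma_v):
    """Pre-truncation mean and s.d. of f(u | eps) in the Normal-Half Normal model."""
    s2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -eps * sigma_u ** 2 / s2           # pre-truncated mean
    sigma_star = sigma_u * sigma_v / s2 ** 0.5   # pre-truncated s.d.
    return mu_star, sigma_star


def nhn_conditional_mode(eps, sigma_u, sigma_v):
    """Mode of a Normal truncated below zero: max(pre-truncated mean, 0)."""
    mu_star, _ = nhn_posterior_params(eps, sigma_u, sigma_v)
    return max(mu_star, 0.0)


def nhn_conditional_mean(eps, sigma_u, sigma_v):
    """Mean of the same truncated Normal (always strictly positive).

    Note: for extremely negative mu_star / sigma_star the cdf can underflow;
    this sketch does not guard against that.
    """
    mu_star, sigma_star = nhn_posterior_params(eps, sigma_u, sigma_v)
    z = mu_star / sigma_star
    return mu_star + sigma_star * _STD.pdf(z) / _STD.cdf(z)
```

For a positive composed error the mode is exactly zero while the mean stays positive, which is the source of the ties in mode-based rankings.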
The upper panel of Table 1 summarizes closed-form expressions of $\hat{u}^m$ when $v$ is normally distributed.6 First, we observe that $\hat{u}^m$ is weakly monotonic in the composed error $\varepsilon$, while the conditional expectation estimator $\hat{u}^e$ is strictly monotonic,7 which precludes ties. In other words, the efficiency rankings based on $\hat{u}^e$ do not change from the rankings of $\varepsilon$, while those based on $\hat{u}^m$ can be weakly different from the rankings of $\varepsilon$ due to possible ties in inefficiency estimates.

5 Our selection rule is a set approach for identifying efficient firms, which can be an effective alternative to the conventional inference techniques (e.g., point identification) since only one observation is used to estimate each firm's inefficiency and thus inefficiency estimates are in general noisy. This type of set identification of efficient firms was first considered in Horrace and Schmidt (2000), but rarely used in practice.

6 Papadopoulos (2021) provides formulas and steps to calculate the conditional modes in his Normal-Generalized Exponential models, instead of closed-form expressions.

7 Ondrich and Ruggiero (2001) show that the rank correlation between $\varepsilon$ and $\hat{u}^e$ in SFMs is unity when the distribution of $v$ is strictly log-concave. A Normal distribution is strictly log-concave, so their result applies here.
We also observe that, as opposed to $\hat{u}^e$ (which only generates positive estimates), $\hat{u}^m$ may produce zero estimates, which may correspond to maximally efficient firms. This feature may be useful when analyzing highly efficient markets, and the percentage of the firms with zero conditional mode estimates may be a naive estimate of overall market efficiency. Selection of efficient firms in this manner and near-minimax optimality related to this unique feature of estimating zero inefficiencies are discussed in the next sections.
Some papers consider a Laplace distribution for $v$. Recently, Horrace and Parmeter (2018) analyze SFMs under Laplace-Exponential and Laplace-Truncated Laplace distributions, and their simulation results show that the Laplace-Exponential model often outperforms the Normal-Exponential model when the distribution of $v$ is misspecified. The lower panel of Table 1 includes closed-form expressions of $\hat{u}^m$ under Laplace-Uniform, Laplace-Half Normal, Laplace-Exponential and Laplace-Truncated Laplace distributions, which are new to the literature. The formulas show that $\hat{u}^m$ under a Laplace assumption is determined jointly by $\varepsilon$ and the scale parameters of the distribution. This complication comes from the absolute value sign in the Laplace distribution function. Another distinct feature is that in certain cases, $\hat{u}^m$ is not a point estimate, but an interval. For instance, in the case of the Laplace-Truncated Laplace model, when $\sigma_u = \sigma_v$, where $\sigma_u$ and $\sigma_v$ are the scale parameters of the Laplace distributions for $u$ and $v$ (respectively), $\hat{u}^m$ can be any point between 0 and $\mu$ if $\varepsilon$ is non-negative, where $\mu$ is the location parameter of the Laplace distribution for $u$.
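The dependence of the mode on both $\varepsilon$ and the scale parameters, and the possibility of an interval-valued mode, can be illustrated numerically. The sketch below is not one of the Table 1 formulas: it is a brute-force grid search for the Laplace-Exponential case under an assumed parametrization in which the density of $v$ is proportional to $\exp(-|v|/\sigma_v)$ and that of $u \ge 0$ to $\exp(-u/\sigma_u)$, with $\varepsilon = v - u$. The function name, grid bounds and tolerance are hypothetical.

```python
def laplace_exp_mode_interval(eps, sigma_u, sigma_v,
                              u_max=10.0, steps=10_000, tol=1e-9):
    """Grid-search the maximizers of the log kernel of f(u | eps), u >= 0.

    Assumed kernel: -|eps + u| / sigma_v - u / sigma_u.
    Returns (lowest, highest) grid maximizer; the two coincide when the
    mode is a single point and differ when the kernel is flat on an interval.
    """
    grid = [u_max * k / steps for k in range(steps + 1)]
    log_kernel = [-abs(eps + u) / sigma_v - u / sigma_u for u in grid]
    best = max(log_kernel)
    maximizers = [u for u, val in zip(grid, log_kernel) if best - val <= tol]
    return min(maximizers), max(maximizers)
```

Under these assumptions, a negative $\varepsilon$ yields a point mode at $-\varepsilon$ when $\sigma_u > \sigma_v$, a point mode at zero when $\sigma_u < \sigma_v$, and a flat segment when the two scales coincide.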

Minimax optimality of the mode estimator
This section proves that $\hat{u}^m$ in the Normal-Exponential model attains the near-minimax optimality that LASSO-type estimators possess. To show this, we first discuss its connection to the Least Absolute Shrinkage and Selection Operator (LASSO; Tibshirani 1996).8 It is well known that, in a Bayesian context, the LASSO can be viewed as the posterior mode estimator when independent Laplace priors are imposed on the coefficients in regression models (Park and Casella 2008; Tibshirani 1996). Since the Exponential (Half-Laplace) inefficiency distribution may be seen as a prior and the conditional inefficiency distribution may be seen as a posterior distribution, $\hat{u}^m$ may be regarded as a posterior mode estimator under an Exponential prior. Then, the Bayesian interpretation of the LASSO suggests that $\hat{u}^m$ may be regarded as a LASSO-type estimator.9 Its closed-form expression clearly shows the connection:

$$\hat{u}^m = [-\varepsilon - \lambda]_+, \tag{2}$$

where $[z]_+$ returns $z$ if $z > 0$ and 0 otherwise, and $\lambda = \sigma_v^2/\sigma_u$. The parameter $\lambda$ serves as a tuning parameter, which shrinks inefficiency estimates toward zero as the noise $\sigma_v$ grows relative to the inefficiency signal, $\sigma_u$.10 One notable difference between (2) and the usual LASSO-type estimator (i.e., $L_1$ shrinkage) is that the shrinkage effect in the LASSO is symmetric around zero, but it is asymmetric in (2) due to the non-negativity constraint on inefficiency. Donoho and Johnstone (1994) show that $L_1$ shrinkage achieves near-minimax risk when estimating nonparametric regression functions. In the following subsection, we show that the same optimality is attained by (2).

Table 1 Closed-Form Formulas for the Conditional Mode. Note: Truncated Laplace refers to the Laplace distribution truncated below zero with a positive pre-truncated mean. Truncated Laplace with a non-positive pre-truncated mean is an Exponential distribution.
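The thresholding rule described above, with $\lambda = \sigma_v^2/\sigma_u$ and $[z]_+ = \max(z, 0)$, can be sketched in a few lines and contrasted with the symmetric soft threshold of the usual LASSO. The function names are hypothetical, and the parameters are treated as known here, whereas in practice they would be estimates.

```python
import math


def ne_conditional_mode(eps, sigma_u, sigma_v):
    """Normal-Exponential conditional mode: one-sided threshold of -eps."""
    lam = sigma_v ** 2 / sigma_u       # tuning parameter lambda
    return max(-eps - lam, 0.0)        # [z]_+ = max(z, 0)


def soft_threshold(z, lam):
    """Symmetric L1 soft threshold, shown only for comparison."""
    return math.copysign(max(abs(z) - lam, 0.0), z)
```

The mode equals `max(soft_threshold(-eps, lam), 0.0)`: the two rules agree when $-\varepsilon$ is positive, and the non-negativity constraint sets every negative output to zero, which is the asymmetry noted in the text.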

Near-minimax optimality
To show near-minimax optimality in the Normal-Exponential model, we consider the multivariate Normal estimation problem discussed in Donoho and Johnstone (1994) with some modifications: we are given $n$ independent observations $\varepsilon$ [...]. We first derive the following risk bound of the conditional mode estimator (2):

[Equation (3)]

where [...]. The proof is in the Appendix. The risk bound (3) indicates that the mean squared loss of the conditional mode estimator (2) cannot be worse than a factor of $2 \log n$ of [...]. Donoho and Johnstone (1994) show that, as $n \to \infty$,

[Equation (4)]

in the usual multivariate Normal estimation problem, where [...].

The rate (4) is the asymptotic minimax risk rate expressed in terms of $\sigma$ [...],13 which, when combined with (3), implies the conditional mode estimator (2) achieves near-minimax risk. The proof of (4) in Donoho and Johnstone (1994) implies that risk-saving when most true parameter values are near zero (most inefficiency values are near zero in our case) is important to attain the risk bound,14 and the conditional mode estimator achieves this risk-saving by estimating small inefficiencies as exactly zero. This indicates that when there is little inefficiency, estimating a set of equally efficient firms (by estimating their inefficiencies as zero), rather than estimating all the inefficiencies individually, can reduce estimation risk. Kumbhakar et al. (2013) and Horrace et al. (forthcoming) explicitly allow for multiple efficient firms (i.e., zero inefficiencies) in their models. In particular, Horrace et al. (forthcoming) apply the adaptive LASSO to select a set of maximally efficient firms in the panel fixed-effect stochastic frontier model and discuss a similar risk-saving.
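The flavor of this risk comparison can be checked by simulation. The sketch below is only an illustration under stated assumptions: as a benchmark it uses the classical Donoho and Johnstone (1994) oracle inequality for soft thresholding, $(2\log n + 1)\{\sigma_v^2 + \sum_i \min(u_i^2, \sigma_v^2)\}$, which is a stand-in and may differ from the bound (3) derived in this paper. The inefficiency vector, the threshold $\lambda = \sigma_v (2\log n)^{1/2}$, the number of replications and all function names are hypothetical choices.

```python
import math
import random


def one_sided_threshold(eps, lam):
    """Conditional-mode-type estimator: [-eps - lam]_+."""
    return max(-eps - lam, 0.0)


def mc_risk(u, sigma_v, lam, reps=200, seed=0):
    """Monte Carlo estimate of E sum_i (u_hat_i - u_i)^2 for fixed u_i >= 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        for ui in u:
            eps = rng.gauss(0.0, sigma_v) - ui   # eps = v - u
            total += (one_sided_threshold(eps, lam) - ui) ** 2
    return total / reps


def dj_oracle_bound(u, sigma_v):
    """Classical Donoho-Johnstone benchmark (assumed stand-in, see lead-in)."""
    n = len(u)
    ideal = sigma_v ** 2 + sum(min(ui ** 2, sigma_v ** 2) for ui in u)
    return (2.0 * math.log(n) + 1.0) * ideal


# Sparse design: most firms fully efficient, a few with inefficiency 2.
u_true = [0.0] * 900 + [2.0] * 100
lam_n = 1.0 * math.sqrt(2.0 * math.log(len(u_true)))
risk = mc_risk(u_true, sigma_v=1.0, lam=lam_n)
bound = dj_oracle_bound(u_true, sigma_v=1.0)
```

Because the one-sided rule never moves an estimate further from a non-negative $u_i$ than the symmetric soft threshold does, the simulated risk should sit below the benchmark in this design.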
Note that the risk bound is valid for all possible values/distributions of $u_i \ge 0$, so the minimax optimality can be seen as robustness of $\hat{u}^m$ to distributional misspecification of inefficiency. Thus, the conditional mode estimator may be the preferred choice for estimating technical inefficiency, particularly when misspecification is a concern.15 The next section examines this robustness using simulations.
The minimax optimality requires $\lambda = \sigma_v (2 \log n)^{1/2}$. Since $\lambda$ in (2) is calculated based on estimates of $\sigma_v^2$ and $\sigma_u$ in practice, this indicates that (2) achieves optimality when $\sigma_v/\sigma_u \approx (2 \log n)^{1/2}$ (i.e., the signal-to-noise ratio, $\sigma_u/\sigma_v$, approaches zero as $n \to \infty$). This is a reasonable scenario since many firms will attain full efficiency, and departures from efficient performance may become rare or negligible as markets get larger and more competitive. Nonetheless, the simulation results in the next section show that $\hat{u}^m$ exhibits mean squared errors comparable to those of $\hat{u}^e$ even when the signal-to-noise ratio is high, while achieving notable risk savings when the signal-to-noise ratio is low.

10 The conditional expectation estimator can also be viewed as a shrinkage estimator as it shrinks inefficiency toward its mean (Wang and Schmidt 2009).

11 Note that no distributional assumption is imposed on inefficiency in this problem.

12 More precisely, this statement requires $\sum_{i=1}^{n} u_i^2/n < \infty$ as $n \to \infty$ (so that $\kappa_n$ is finite). This is satisfied if a finite second moment is assumed for $u_i$, which is standard in the literature.

13 The minimax argument in Donoho and Johnstone (1994), which we follow in this section, is based on the concept of ideal risk, which is [...] and represents the (infeasible) risk attained when estimating only those $\theta_i$ larger than the noise level. This serves as a benchmark risk-level (i.e., ideal risk performance) to assess the quality of estimators.

14 They derive the minimax risk using a three-point prior distribution, $F_{a,\mu} = (1 - a)\nu_0 + a(\nu_\mu + \nu_{-\mu})/2$, where $\nu_x$ denotes Dirac mass at $x$ and $\mu > 0$. They calculate the Bayes risk under the sequence of the prior as $a \to 0$, which corresponds to the case when most parameter values are near zero, and use it as a lower bound of the minimax risk. For more details, see Appendix 5 of Donoho and Johnstone (1994).

15 Nonetheless, the shrinkage in the conditional mode estimator induces biases in the inefficiency estimates, particularly for large inefficiencies. However, the minimax optimality and the simulation results in the next section indicate that the bias, a cost for overall risk improvement, is relatively small, even when non-zero inefficiencies are prevalent.

Simulation results
To demonstrate minimax optimality of the Normal-Exponential conditional mode estimator, we simulate a data generating process (DGP) that is misspecified in the inefficiency distribution. Let $\varepsilon_i = v_i - u_i$, $i = 1, \dots, n$, with $v_i \sim \text{i.i.d. } N(0, 1)$. We consider a mixture distribution for inefficiency, where $u_i \sim \text{i.i.d. } F_{p,k} = p\delta_0 + (1 - p)\chi_k^2$. Here $\delta_0$ represents a Dirac mass at 0, and $p \in [0, 1]$ regulates sparsity in the continuous portion of the distribution, given by $\chi_k^2$, a chi-square distribution with $k$ degrees of freedom, with $k$ being randomly selected from 1 to 10. The parameter $p$ is a sparsity parameter since a larger value of $p$ increases the number of efficient firms in the sample, creating greater sparsity in the number of inefficient firms. This DGP is chosen because risk-saving when true inefficiencies are at or near zero is important to understand the optimality, as discussed in the previous section. Obviously, the sparsity parameter affects the signal-to-noise ratio of the model as well: a large value of $p$ decreases the signal variance. We set $n = 1000$.16

With all variables generated, we estimate $\sigma_u^2$ and $\sigma_v^2$ using the Normal-Half Normal (NHN) and Normal-Exponential (NE) models of Aigner et al. (1977), from which conditional expectation and conditional mode estimates are ultimately computed. Thus, the distribution of $u$ is misspecified in these simulations. Note that the conditional mode estimates from the NE model are (2) (hereafter the minimax estimator), and its performance is our primary interest in this simulation experiment. We repeat this procedure 1000 times for each case with $p \in \{0, 0.1, 0.2, 0.3, \dots, 0.9\}$.
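One draw from the mixture DGP described above might be generated as follows. This is a sketch under assumptions: the degrees of freedom $k$ are drawn once per sample (the text does not say whether $k$ varies by firm or by replication), the chi-square variate is obtained as a Gamma variate with shape $k/2$ and scale 2, and the function name is hypothetical. The estimation step (MLE of the NHN and NE models) is not shown.

```python
import random


def simulate_sparse_sample(n, p, k, rng):
    """Draw (u, eps) with u ~ p*delta_0 + (1-p)*chi2_k, v ~ N(0,1), eps = v - u."""
    u = [
        0.0 if rng.random() < p            # fully efficient firm (Dirac mass at 0)
        else rng.gammavariate(k / 2.0, 2.0)  # chi-square with k d.f.
        for _ in range(n)
    ]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [vi - ui for vi, ui in zip(v, u)]
    return u, eps


rng = random.Random(0)
k_draw = rng.randint(1, 10)   # assumption: one k per simulated sample
u_sim, eps_sim = simulate_sparse_sample(n=1000, p=0.5, k=k_draw, rng=rng)
share_efficient = sum(ui == 0.0 for ui in u_sim) / len(u_sim)
```

Raising `p` increases `share_efficient` and so lowers the signal variance relative to the unit noise variance.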
We report two types of results in Table 2: the root mean squared error (RMSE) of the inefficiency estimates, and the average rank correlation between $\hat{u}_i^m$ (or $\hat{u}_i^e$) and $u_i$ over the repetitions. For rank correlation, we use Spearman's $\rho$ and Kendall's $\tau_b$.17 When $p$ is small (i.e., many inefficient firms), the RMSEs and rank correlations of the four estimators are comparable. Specifically, when $p = 0$ or $0.1$, the RMSE of the minimax estimator (in column 5) is the worst, but the gaps in RMSE between the four estimators are small. The rank correlations of the four estimators in these cases are almost the same. However, as $p$ increases, the performance of the minimax estimator significantly improves, while the performances of the other estimators deteriorate in terms of both RMSE and rank correlation, leading to a large disparity between the minimax estimator and the others. As discussed above, the defining feature of the minimax estimator is that it estimates small inefficiencies as exactly zero, and the results clearly show that this feature enables it to achieve notable risk savings when most inefficiencies are near zero. Therefore, the conditional mode estimator may be preferred for analyzing highly competitive markets, where inefficiency may be sparse.

Ranking and selection by the conditional mode

Selection of efficient firms based on zero conditional mode
Inference on ranked technical efficiency estimates from SFMs has a recent but rich history. Using the conditional distribution of inefficiency, Horrace and Schmidt (1996) and Wheat et al. (2014) develop univariate prediction intervals for inference on $u$. Simar and Wilson (2009) consider univariate bootstrap inference. Horrace and Schmidt (2000) propose multiple comparisons for the fixed-effect version of the model, and Horrace (2005) and Flores-Lagunes et al. (2007) develop multivariate inference in the parametric SFM.
In particular, Flores-Lagunes et al. (2007) detail selection procedures for identifying a minimal cardinality subset of firms which contains the maximally efficient firm at a pre-specified confidence level. Using the conditional inefficiency density $f(u \mid \varepsilon)$ and its cumulative distribution function, $F(u \mid \varepsilon)$, the method calculates the probability of each firm $j$ being most efficient in the sample as

[Equation (5)]

Note the similarity between equation (5) and the distribution of the minimum order statistic for an i.i.d. sample of size $n$ from a single population. Equation (5) is the distribution of the minimum order statistic for a single draw from each of $n$ potentially different populations, conditional on $\varepsilon$. Note also that the magnitude of this probability is independent of whether we choose to estimate inefficiency using the conditional expectation estimator or the conditional mode estimator. It only uses the information in the conditional distribution of inefficiency. In particular, the ranks of $P_j$ may not correspond to the ranks of either $\hat{u}_j^m$ or $\hat{u}_j^e$. Then, the probability that any subset of firms contains the maximally efficient firm can be computed by summing the probabilities, $P_j$, for only those firms in the subset of interest.18 Therefore, the technique can be used to calculate this probability sum for the subset of firms with zero conditional mode estimates.

The goal of this section is to perform simulations on properly specified Normal-Half Normal and Normal-Exponential models in (1) to examine the probabilities that the zero conditional mode subsets contain the maximally efficient (best) firm in the sample.
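A probability of this kind can be evaluated numerically. The sketch below assumes the minimum-order-statistic form $P_j = \int_0^\infty f(u \mid \varepsilon_j) \prod_{i \ne j} [1 - F(u \mid \varepsilon_i)]\,du$, which matches the verbal description above but is written here as an assumption, not as the authors' exact expression. It further assumes each conditional distribution is a Normal with pre-truncation mean `mus[i]` and common standard deviation `sigma_star`, truncated below zero, and uses a simple trapezoid rule. All names and the grid settings are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()


def prob_most_efficient(mus, sigma_star, grid_points=4000):
    """P_j = Pr(firm j has the smallest u | eps), by trapezoid integration."""
    norm = [_STD.cdf(m / sigma_star) for m in mus]   # truncation constants
    upper = max(max(mus), 0.0) + 8.0 * sigma_star    # effective upper limit
    h = upper / grid_points
    probs = [0.0] * len(mus)
    for g in range(grid_points + 1):
        u = g * h
        weight = 0.5 * h if g in (0, grid_points) else h
        # Truncated-Normal density and survival function of every firm at u.
        dens = [_STD.pdf((u - m) / sigma_star) / (sigma_star * c)
                for m, c in zip(mus, norm)]
        surv = [_STD.cdf((m - u) / sigma_star) / c
                for m, c in zip(mus, norm)]
        for j in range(len(mus)):
            others = 1.0
            for i, s in enumerate(surv):
                if i != j:
                    others *= s
            probs[j] += weight * dens[j] * others
    return probs


def zero_mode_subset_probability(mus, sigma_star):
    """Sum of P_j over firms whose conditional mode max(mu_j, 0) is zero."""
    probs = prob_most_efficient(mus, sigma_star)
    return sum(p for p, m in zip(probs, mus) if m <= 0.0)
```

Under the Normal-Exponential model the inputs would be $\mu_i = -\varepsilon_i - \sigma_v^2/\sigma_u$ and `sigma_star` equal to $\sigma_v$; the probabilities sum to one up to integration error.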
The empirical relevance of the exercise is to understand when practitioners can simply use the zero conditional mode selection rule (without calculating the computationally intensive probabilities in (5)) and still feel confident that this ad hoc selection criterion is credible (i.e., it has reasonably high probability of identifying a subset of firms that contains the best firm). We note that Horrace et al. (2022) use this zero conditional mode selection rule for determining efficient schools in New York City, but their paper does not examine the statistical properties of the zero conditional mode rule under various specifications.

Simulation results
We consider the following panel DGP with time-invariant $u$: [...] the simulations, and $\eta$ denotes the signal-to-noise ratio ($\sigma_u/\sigma_v$). The total number of iterations for each case is 1000. Let $N = \{1, 2, \dots, n\}$ be the set of all firm indices in the sample, and $N^m = \{i : \hat{u}_i^m = 0\} \subseteq N$ be the subset of firms with zero conditional mode estimates. Let the ranked true inefficiencies be $u_{[1]} \le u_{[2]} \le \dots \le u_{[n]}$, so that $[1]$ corresponds to the index of the most efficient firm in the sample. Then, the probability that the zero conditional mode subset contains the best firm is

$$P^m = \Pr\{[1] \in N^m \mid \varepsilon_1, \dots, \varepsilon_n\}.$$

The empirical question is then: how large is $P^m$ on average for our various simulation designs? Let the ranked inefficiency estimates based on the conditional expectation be $\hat{u}^e_{(1)} \le \hat{u}^e_{(2)} \le \dots \le \hat{u}^e_{(n)}$, so that $(1)$ corresponds to the index of the firm with the smallest conditional expectation estimate. We also calculate the probability that the firm with the smallest conditional expectation estimate is the true best firm,

$$P^e = \Pr\{[1] = (1) \mid \varepsilon_1, \dots, \varepsilon_n\}.$$

Generally speaking, $P^e$ will be a decreasing function of the sample size, so we are also interested in understanding how $P^m$ responds to the sample size.
We report the average $P^e$, the average $P^m$ and the average cardinality of $N^m$, denoted by $|N^m|$, over $M = 1000$ iterations. To assess the accuracy of the estimated probabilities $P^e$ and $P^m$, we also calculate their empirical analogs, [...] where $1_k$ is an indicator function for the $k$th simulation sample. We expect average $P^e$ and average $P^m$ to be similar to their respective empirical analogs.
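The empirical analog for the zero conditional mode subset, i.e., the share of simulated samples in which the truly best firm has a zero mode estimate, can be sketched as follows. The assumptions here are a cross-sectional ($T = 1$) Normal-Exponential design and the use of the true $\sigma_u$ and $\sigma_v$ in place of MLE estimates, which simplifies the procedure described in the text; the function name and defaults are hypothetical.

```python
import random


def zero_mode_coverage(n=50, sigma_u=1.0, sigma_v=1.0, reps=1000, seed=0):
    """Return (share of samples whose zero-mode subset contains the best firm,
    average cardinality of the zero-mode subset)."""
    rng = random.Random(seed)
    lam = sigma_v ** 2 / sigma_u
    hits = 0
    total_size = 0
    for _ in range(reps):
        u = [rng.expovariate(1.0 / sigma_u) for _ in range(n)]  # mean sigma_u
        eps = [rng.gauss(0.0, sigma_v) - ui for ui in u]        # eps = v - u
        # Firms whose Normal-Exponential conditional mode [-eps - lam]_+ is zero.
        zero_set = {i for i, e in enumerate(eps) if -e - lam <= 0.0}
        best = min(range(n), key=u.__getitem__)                 # index [1]
        hits += best in zero_set                                # indicator 1_k
        total_size += len(zero_set)
    return hits / reps, total_size / reps


coverage, avg_cardinality = zero_mode_coverage()
```

Varying `n` or the ratio `sigma_u / sigma_v` in this sketch gives a quick way to explore how the coverage and the subset size respond to the sample size and the signal-to-noise ratio.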
The results are reported in Table 3. The first panel reports the results when the number of firms $n$ increases from 5 to 100 while fixing $T = 1$ and $\eta = 1$. We can see that, as $n$ increases, average $P^e$ significantly decreases, while average $P^m$ remains relatively constant in the range of 0.4 to 0.5 under the Normal-Half Normal distribution and 0.80 to 0.85 under the Normal-Exponential distribution. That is, the zero conditional mode subset has reasonably high probability of containing the best firm (0.80 to 0.85) when the Normal-Exponential model is the true model. Note that average cardinality, $|N^m|$, increases proportionally to $n$, which allows the zero conditional mode subset to maintain the high probability level.
The second panel shows the results when $\eta^2$ changes from 0.1 to 10 while fixing $n = 50$ and $T = 1$. As $\eta$ increases, i.e., as the signal increases, average $P^e$ increases, but average $P^m$ remains largely constant under the Normal-Half Normal distribution and decreases under the Normal-Exponential distribution. Even with the decrease, we still observe a significant gap between average $P^e$ and average $P^m$. The results in the last panel, where the number of time periods $T$ increases from 1 to 100 while fixing $n = 50$ and $\eta = 1$, are overall similar to those in the second panel. This is because, under the time-invariant inefficiency setting, an increase in $T$ leads to an increase in the signal-to-noise ratio. In all cases, average $P^e$ and average $P^m$ are reasonably close to their empirical analogs.

Empirical example
We analyze the technical efficiency of 123 U.S. electric utility firms using $\hat{u}^m$ and the zero conditional mode selection rule considered in Section 4. The dataset used in this section was previously analyzed by Greene (1990) and Nguyen (2010). We consider the following production function specification: [...] where $Q$ is output, $L$ is labor, $K$ is capital and $F$ is fuel. The following steps were taken to carry out the empirical analysis:

1. MLE: Under the pre-specified production function and distribution assumptions, implement MLE to estimate $\beta$, $\sigma_v$ and $\sigma_u$.
2. Technical Inefficiency Estimation: Use MLE residuals and estimated coefficients to calculate the conditional expectation and mode estimates for each firm based on formulas in Table 1.
3. Probability Statement Calculation: Calculate the [...]

There are forty firms with a zero conditional mode estimate. This subset of firms has 59.4% probability of containing the most efficient firm, implying that if we naively selected these firms as candidates for the most efficient firm simply by inspecting their conditional mode estimates, we would still have a nearly 60% chance of having included the best firm in the subset without calculating the probabilities in column 5. This reassures us that selection based on the zero conditional mode estimate may be a convenient and reliable way to identify a subset with high probability of containing the most efficient firm.

The distributions of the conditional expectation and conditional mode estimates are plotted in Fig. 1, where we can see a mass of zero estimates in the graphs of the conditional mode estimates. It is also noteworthy that under Normal-Half Normal or Normal-Exponential distributional assumptions, the conditional distribution is a Normal distribution truncated below zero (Jondrow et al. 1982) and thus the value of a conditional mode estimate is never larger than that of its conditional mean counterpart, which is reflected in the results here.

Conclusions
One of the main purposes of the stochastic frontier model is to measure firm-specific technical inefficiency, for which Jondrow et al. (1982) propose two estimators: the conditional expectation and the conditional mode. While there are numerous papers studying and applying the conditional expectation estimator, the conditional mode estimator has been largely overlooked in the literature.
Recently, Papadopoulos (2021) and Zeebari et al. (2023) derive several important statistical properties of the conditional mode estimator. This paper complements these existing studies, suggesting other important merits of the conditional mode estimator. First, zero conditional mode estimates can be used as a simple selection rule that chooses efficient firms with a reasonably high probability of containing the most efficient firm in the sample, particularly when the model is Normal-Exponential. Second, it possesses near-minimax optimality when the Normal-Exponential model is employed. Our theoretical and empirical results indicate that the conditional mode estimator is particularly suited for analyzing competitive markets, where most firms may be near the frontier and identification of multiple efficient firms is desirable.

Appendix: Proof of theorem
We follow the proof of Theorem 3 in Zou (2006), but account for the positivity constraint on inefficiency parameters. We prove the univariate case here; the multivariate case follows by summation.
Let $\varepsilon \mid u \sim N(-u, \sigma_v^2)$ where $u \ge 0$, and $\hat{u}^m(\lambda) = [-\varepsilon - \lambda]_+$. We first expand the mean squared error of $\hat{u}^m(\lambda)$ such that [...] where we have used Stein's lemma (Stein 1981) [...]

Table 2
Minimax optimality of the conditional mode estimator

Table 3
Selection of efficient firms by the conditional mode estimator
Notes: Unless otherwise specified, we fix $n = 50$, $T = 1$ and $\eta^2 = 1$. $P^m$ and its empirical analog are the averages of the calculated and empirical probabilities that the zero conditional mode subset contains the best firm, respectively. $P^e$ and its empirical analog are the averages of the calculated and empirical probabilities that the firm with the smallest conditional expectation estimate is the best firm, respectively. $|N^m|$ is the average cardinality of the zero conditional mode subset.