Date of Award

May 2016

Degree Type


Degree Name

Doctor of Philosophy (PhD)




William C. Horrace


Dirac Delta, Ordinary Least Squares, production, skewness, Stochastic Frontier

Subject Categories

Social and Behavioral Sciences


This dissertation consists of three essays on stochastic frontier models for characterizing inefficiency: for a cross section of firms in essays one and three, and for a set of firms over time in essay two. The first essay examines stationary points of the likelihood for several models used in stochastic frontier analysis. The second essay extends the multivariate probability statements of Horrace (2005) to calculate the probability that a firm holds any particular efficiency rank. These rank probabilities are then used to calculate the expected efficiency rank of each firm. The third and final essay adds spatial correlation to the production function of each firm and generalizes the Horrace (2005) probability statements.

The skew of the Ordinary Least Squares (OLS) residuals of the composed error is expected to be negative for a production function and positive for a cost function. However, because of sampling error in empirical applications, modelers may obtain a positive skew for a production function, which has serious implications for the Maximum Likelihood Estimates (MLEs); this is called the “wrong skew problem”. Waldman (1982) shows that, for the normal-half normal model, if the wrong skew occurs then (1) the MLEs reduce to OLS, (2) this solution is stable, and (3) there is a relationship between the skew of the OLS residuals and the MLE of the pretruncated variance of inefficiency. The literature offers two remedies when the wrong skew occurs: (1) draw a new random sample, which may be too costly, or (2) respecify the distribution of inefficiency.
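The sign check described above can be sketched in Python. This is a minimal illustration, assuming a log Cobb-Douglas production frontier with a single input, normal noise and half-normal inefficiency, all simulated; the helper names `ols_residuals`, `residual_skew` and `has_wrong_skew` are hypothetical and do not come from the dissertation. The "skew" is computed as the third central moment of the OLS residuals.

```python
import numpy as np


def ols_residuals(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS residuals of y on X (X must already contain a column of ones)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def residual_skew(e: np.ndarray) -> float:
    """Third central moment of the residuals."""
    d = e - e.mean()
    return float(np.mean(d ** 3))


def has_wrong_skew(e: np.ndarray, production: bool = True) -> bool:
    """Wrong skew: positive for a production function, negative for a cost function."""
    m3 = residual_skew(e)
    return m3 > 0 if production else m3 < 0


# Hypothetical log Cobb-Douglas frontier: y = 1.0 + 0.6 x + v - u
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
v = rng.normal(scale=0.2, size=n)            # symmetric noise
u = np.abs(rng.normal(scale=0.5, size=n))    # half-normal inefficiency
y = 1.0 + 0.6 * x + v - u
X = np.column_stack([np.ones(n), x])

e = ols_residuals(X, y)
print(residual_skew(e), has_wrong_skew(e))
```

In a real application only the residual check would be run; the simulated data merely give the function something to act on.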

The first essay generalizes part (1) of the Waldman (1982) result using the theory of the Dirac measure (Dirac, 1930). It shows that if the inefficiency distribution converges to a Dirac delta function as the pretruncated variance of inefficiency goes to zero, then the likelihood of the composed error converges to a likelihood based solely on the noise distribution. In particular, if the Dirac delta function is centered at zero, the maximum likelihood estimator equals the ordinary least squares estimator in the limit, and the parameters of the inefficiency distribution are not identified in the limit. Stability of the maximum likelihood estimator and the “wrong skew” results are derived or simulated for common parametric assumptions on the inefficiency distribution. The essay shows that the full suite of Waldman (1982) results holds for the normal-doubly truncated normal model and for the normal-truncated normal model when the pretruncated mean is non-positive. Simulation results show that if the wrong skew occurs, the MLEs for the normal-doubly truncated normal model (when the upper bound B satisfies B > 2μ, where μ is the pretruncated mean), the normal-truncated normal model and the normal-exponential model reduce to OLS. A cost function with the wrong skew of OLS residuals is estimated using Greene’s airline data, and the results show that the normal-truncated normal and the normal-exponential models reduce to OLS. Overall, the results reveal that respecifying the inefficiency distribution using the traditional parametric assumptions is unnecessary when the OLS residuals have the wrong skew.
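A numerical sketch of the limiting argument in the normal-half normal case: as the pretruncated standard deviation σ_u shrinks to zero, the half-normal mass collapses toward a point mass at zero and the composed-error log-likelihood approaches the noise-only normal log-likelihood. It uses the standard normal-half normal density f(ε) = (2/σ)φ(ε/σ)Φ(−ελ/σ), with σ² = σ_v² + σ_u² and λ = σ_u/σ_v; the residuals and parameter values are made up for illustration and the function names are hypothetical.

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc)


def log_std_normal_pdf(z):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * np.asarray(z) ** 2


def log_std_normal_cdf(z):
    # log Phi(z) = log(0.5 * erfc(-z / sqrt(2)))
    return np.log(0.5 * _erfc(-np.asarray(z) / math.sqrt(2.0)))


def loglik_normal_half_normal(eps, sigma_v, sigma_u):
    """Log-likelihood of eps = v - u, v ~ N(0, sigma_v^2), u ~ |N(0, sigma_u^2)|."""
    sigma = math.hypot(sigma_v, sigma_u)
    lam = sigma_u / sigma_v
    terms = (math.log(2.0) - math.log(sigma)
             + log_std_normal_pdf(eps / sigma)
             + log_std_normal_cdf(-eps * lam / sigma))
    return float(np.sum(terms))


def loglik_noise_only(eps, sigma_v):
    """Log-likelihood when the composed error is pure N(0, sigma_v^2) noise."""
    return float(np.sum(-math.log(sigma_v) + log_std_normal_pdf(eps / sigma_v)))


rng = np.random.default_rng(1)
eps = rng.normal(scale=0.3, size=200)  # stand-in residuals
for s_u in (0.3, 0.1, 0.01, 0.001, 0.0):
    gap = loglik_normal_half_normal(eps, 0.3, s_u) - loglik_noise_only(eps, 0.3)
    print(f"sigma_u={s_u:<6} gap={gap:.6f}")
```

The gap shrinks as σ_u falls, and at σ_u = 0 the two log-likelihoods coincide. Since the noise-only normal log-likelihood is maximized over the frontier coefficients by OLS, this is the sense in which the MLE reduces to OLS in the limit.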

Empirical applications of frontier analysis are abundant, ranging from the airline industry to farming; see Battese and Coelli (1992; 1995), Druska and Horrace (2004) and Almanidis, Qian and Sickles (2014). In a typical application, the modeler estimates a Cobb-Douglas production function or a cost function for a set of firms. For a production function, output is proxied by total sales deflated by a price index, and inputs include factors of production such as land, labor and capital. The first step is to estimate the model by OLS, which provides consistent estimates of all parameters except the intercept, or by Corrected Ordinary Least Squares (COLS), which is OLS with the intercept corrected for its bias. The next step is to examine the skew (the third central moment) of the OLS residuals before proceeding to maximum likelihood, which is more efficient than OLS. Because the skew carries information about the inefficiency term, empiricists use it as a guide for how to proceed. If the skew has the correct sign (negative for a production function), they proceed to MLE; if it has the wrong sign (positive for a production function), they respecify the distribution of inefficiency. The first essay shows that when a modeler encounters the wrong skew, respecifying with the normal-truncated normal or the normal-exponential model is futile, since these models yield no new results.
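The text does not say which intercept correction its COLS uses. One common choice for the normal-half normal model, sketched here as an assumption, is a method-of-moments correction: recover σ_u from the third central moment of the OLS residuals, m3 = −σ_u³√(2/π)(4/π − 1), then shift the OLS intercept up by the implied E[u] = σ_u√(2/π). When the skew is wrong (m3 ≥ 0 for a production function), no positive σ_u solves this equation, so the sketch sets σ_u = 0 and COLS collapses to OLS. The function name `cols_half_normal` and the simulated data are hypothetical.

```python
import math

import numpy as np

A = math.sqrt(2.0 / math.pi)  # E[u] / sigma_u for half-normal u


def cols_half_normal(X, y):
    """Method-of-moments COLS for y = X b + v - u, v ~ N(0, s_v^2), u ~ |N(0, s_u^2)|.

    X must contain a column of ones in position 0. Returns (beta, sigma_u, sigma_v)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    d = y - X @ beta
    d = d - d.mean()
    m2, m3 = float(np.mean(d ** 2)), float(np.mean(d ** 3))
    if m3 < 0:
        # correct skew: invert m3 = -s_u^3 * A * (4/pi - 1)
        sigma_u = (-m3 / (A * (4.0 / math.pi - 1.0))) ** (1.0 / 3.0)
    else:
        # wrong skew: no admissible s_u > 0, so the estimate collapses to zero
        sigma_u = 0.0
    # Var(eps) = s_v^2 + (1 - 2/pi) s_u^2; clip at zero if the moments are inconsistent
    sigma_v = math.sqrt(max(m2 - (1.0 - 2.0 / math.pi) * sigma_u ** 2, 0.0))
    beta = beta.copy()
    beta[0] += sigma_u * A  # shift the OLS intercept up by the estimated E[u]
    return beta, sigma_u, sigma_v


rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
v = rng.normal(scale=0.2, size=n)
u = np.abs(rng.normal(scale=0.5, size=n))

beta, s_u, s_v = cols_half_normal(X, 1.0 + 0.6 * x + v - u)    # negative residual skew
print(beta, s_u, s_v)

beta_w, s_u_w, _ = cols_half_normal(X, 1.0 + 0.6 * x + v + u)  # positive (wrong) skew
print(beta_w, s_u_w)  # with positive skew, s_u_w is 0.0 and beta_w is the OLS fit
```

The wrong-skew branch is a simple moment-based analogue of the Waldman (1982) finding that the inefficiency variance estimate collapses to zero and the estimator reduces to OLS.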

The second essay extends the multivariate probability statements of Horrace (2005) to calculate the conditional probability that a firm holds any particular efficiency rank in a sample. Conditional expected efficiency ranks are then constructed for each firm; in particular, it can be determined which firm in the sample is the best, 2nd best, …, 2nd worst and worst in the population of firms. Firm-level conditional expected efficiency ranks convey more about the uncertainty surrounding the ranking than traditional ranked efficiency point estimates do. A Monte Carlo study reveals that under low skew the expected efficiency rank provides inferential insights that the traditional conditional mean function would not uncover.
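Given simulated draws of each firm's inefficiency, the rank probabilities and expected efficiency ranks described above reduce to counting. A minimal Python sketch, assuming the draws (one row per simulation replication, one column per firm) have already been generated from the relevant conditional distributions, and numbering ranks so that rank 1 is the most efficient firm (smallest u). The names `rank_probabilities` and `expected_ranks` are hypothetical.

```python
import numpy as np


def rank_probabilities(draws):
    """draws: (R, n) array of inefficiency draws, rows = replications, columns = firms.

    Returns P with P[i, r] = Pr(firm i holds rank r + 1); rank 1 = smallest u (best)."""
    _, n = draws.shape
    # argsort of argsort gives each firm's 0-based rank within its replication
    ranks = np.argsort(np.argsort(draws, axis=1), axis=1)
    P = np.zeros((n, n))
    for r in range(n):
        P[:, r] = np.mean(ranks == r, axis=0)
    return P


def expected_ranks(P):
    """E[rank_i] = sum over r of r * Pr(rank_i = r), ranks numbered 1..n."""
    return P @ np.arange(1, P.shape[1] + 1)


# Toy example: 3 firms, 2 replications
draws = np.array([[0.1, 0.5, 0.3],
                  [0.2, 0.1, 0.9]])
P = rank_probabilities(draws)
print(P)
print(expected_ranks(P))  # expected ranks 1.5, 2.0 and 2.5
```

The firm with the smallest expected rank is the most efficient and the one with the largest is the least efficient; unlike a ranking of point estimates, the full matrix P also shows how spread out each firm's rank is.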

After estimation, the MLEs of the parameters are substituted into the conditional mean function, under the assumption that there is no estimation error or parameter uncertainty. The conditional mean function is the mean of inefficiency conditioned on the composed error and is used to produce inefficiency estimates for each firm in the sample; see Jondrow, Lovell, Materov and Schmidt (1982). The probability statements use both the first and the second moments, which provides a more accurate description of the distribution of inefficiency. In empirical applications, to determine which firm holds a given efficiency rank, the modeler substitutes the MLEs into the probability statements (inefficiency conditioned on the composed error) and simulates the probabilities. The firm with the largest probability is interpreted as the best firm in the sample and the firm with the smallest probability as the worst. The modeler then uses these conditional probabilities to compute expected efficiency ranks: the firm with the largest expected efficiency rank is deemed the least efficient (ranked worst), and the firm with the smallest value is deemed the most efficient (ranked best).
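For the normal-half normal model, the conditional distribution used in both steps is u_i | ε_i ~ N⁺(μ*_i, σ*²), with μ*_i = −ε_i σ_u²/σ², σ*² = σ_u²σ_v²/σ² and σ² = σ_u² + σ_v²; its mean is the Jondrow, Lovell, Materov and Schmidt (1982) estimator. The sketch below treats the parameter values as known (the no-estimation-error assumption above), computes the conditional means, draws from the conditional distributions by inverse-CDF sampling, and estimates each firm's probability of being the most efficient. The parameter values, residuals and function names are hypothetical, and the simple sampler is only adequate for moderate values of μ*/σ*.

```python
import math
from statistics import NormalDist

import numpy as np

STD = NormalDist()  # standard normal


def jlms_params(eps, sigma_v, sigma_u):
    """Location mu* and scale sig* of u | eps ~ N+(mu*, sig*^2)."""
    s2 = sigma_v ** 2 + sigma_u ** 2
    mu_star = -np.asarray(eps, dtype=float) * sigma_u ** 2 / s2
    sig_star = sigma_u * sigma_v / math.sqrt(s2)
    return mu_star, sig_star


def jlms_mean(eps, sigma_v, sigma_u):
    """Conditional mean E[u | eps] = mu* + sig* * phi(mu*/sig*) / Phi(mu*/sig*)."""
    mu, s = jlms_params(eps, sigma_v, sigma_u)
    return np.array([m + s * STD.pdf(m / s) / STD.cdf(m / s) for m in mu])


def draw_conditional_u(eps, sigma_v, sigma_u, n_draws, rng):
    """Inverse-CDF draws from N+(mu*_i, sig*^2) truncated below at 0; shape (n_draws, n_firms)."""
    mu, s = jlms_params(eps, sigma_v, sigma_u)
    out = np.empty((n_draws, mu.size))
    for i, m in enumerate(mu):
        p0 = STD.cdf(-m / s)  # probability mass below the truncation point u = 0
        for k, w in enumerate(rng.uniform(size=n_draws)):
            out[k, i] = m + s * STD.inv_cdf(p0 + w * (1.0 - p0))
    return out


def prob_most_efficient(draws):
    """Share of replications in which each firm has the smallest u."""
    best = np.argmin(draws, axis=1)
    return np.bincount(best, minlength=draws.shape[1]) / draws.shape[0]


eps = np.array([0.15, 0.05, -0.10, -0.40])  # hypothetical composed-error residuals
sigma_v, sigma_u = 0.2, 0.5                  # treated as known MLEs
rng = np.random.default_rng(3)

u_hat = jlms_mean(eps, sigma_v, sigma_u)
draws = draw_conditional_u(eps, sigma_v, sigma_u, 20_000, rng)
print(u_hat)
print(prob_most_efficient(draws))
```

The same matrix of draws can be ranked row by row to obtain the full set of rank probabilities and expected ranks.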

The third essay generalizes the Horrace (2005) probability statements to account for spatial correlation in the unobservables for a cross section of firms. It relaxes the independence assumption on the noise distribution, the signal distribution, or both. Two assumptions are considered for the inefficiency (signal) distribution: (1) inefficiency is truncated from a normal distribution before spatial correlation is added, and (2) inefficiency is drawn from a normal distribution and then truncated. Adding spatial correlation to the production function makes the likelihood intractable, because the number of integrals increases with the sample size. The essay therefore uses the sequential conditioning of Spanos (1986; 1999) to factor the joint distribution into the product of a marginal distribution and univariate conditional distributions, and uses this factorization to compute the probability that a firm is the least or most efficient. Unlike in Horrace (2005), if inefficiency is assumed to be spatially correlated, the distribution of inefficiency conditioned on the composed error is not needed to compute the probabilities; the MLEs are substituted directly into the probability statements. This is because spatial dependence induces heteroskedasticity, which produces variation across firms. Overall, this essay offers empiricists some insight into inference when the independence assumption on the noise and inefficiency distributions is relaxed.
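The sequential-conditioning idea can be checked numerically for a multivariate normal vector: the joint log density equals the log of the first marginal plus the logs of the univariate conditionals f(x_k | x_1, …, x_{k−1}), each obtained by regressing x_k on the earlier coordinates. This sketch only verifies that factorization for a zero-mean Gaussian density; the covariance comes from a hypothetical spatial error model on five firms along a line, and the sketch does not reproduce the essay's truncated-normal probability calculations.

```python
import numpy as np


def mvn_logpdf(x, cov):
    """Joint log density of N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)


def sequential_logpdf(x, cov):
    """Same density written as f(x_1) * f(x_2 | x_1) * ... * f(x_n | x_1, ..., x_{n-1})."""
    total = 0.0
    for k in range(x.size):
        if k == 0:
            mean, var = 0.0, cov[0, 0]            # marginal of the first coordinate
        else:
            c = cov[k, :k]
            w = np.linalg.solve(cov[:k, :k], c)   # regression weights on earlier coordinates
            mean = w @ x[:k]                       # conditional mean
            var = cov[k, k] - c @ w                # conditional variance
        total += -0.5 * (np.log(2.0 * np.pi * var) + (x[k] - mean) ** 2 / var)
    return total


# Hypothetical spatial error covariance: 5 firms on a line, neighbours weighted equally
n, rho = 5, 0.6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W /= W.sum(axis=1, keepdims=True)  # row-standardize
A = np.linalg.inv(np.eye(n) - rho * W)
cov = A @ A.T

x = np.random.default_rng(4).normal(size=n)
print(mvn_logpdf(x, cov), sequential_logpdf(x, cov))  # equal up to rounding
```

Each factor is a one-dimensional density, which is what makes sequential conditioning attractive when the joint distribution would otherwise require n-dimensional integration.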

The presence of spatial correlation in the errors shifts the production function outward or inward. The composed error is not purely random because firms locate in specific areas because of easier access to specialized workers, which reduces the search cost of matching workers to firms. Firms also locate where demand conditions are more favorable, or where cultural practices, bureaucratic organization, work ethics and economic activity are similar. Better access to inputs affects a firm's productivity, but these activities are not observed by the econometrician. Because they affect efficiency, they need to be accounted for empirically to characterize inefficiency more accurately. Spatial correlation is captured with a prespecified weight matrix. The weights can be determined in several ways: for instance, a modeler could use contiguity weights or an inverse distance function, under which firms farther apart affect each other less. These weights are typically known prior to estimation, and the weight matrix is incorporated into the production function before estimation begins. After estimation, the MLEs or COLS estimates are substituted into the probability statements developed in Horrace (2005), allowing the modeler to compute the probability that firm i is the least or most efficient firm in the sample.
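The inverse-distance weighting described above, and the heteroskedasticity it induces, can be sketched as follows. The spatial error form ε = (I − ρW)⁻¹z with z ~ N(0, σ²I), the firm coordinates and the value of ρ are assumptions for illustration, not the essay's specification; `inverse_distance_weights` and `spatial_error_cov` are hypothetical helpers.

```python
import numpy as np


def inverse_distance_weights(coords):
    """Row-standardized inverse-distance weights with a zero diagonal (no self-influence)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    off_diag = ~np.eye(len(coords), dtype=bool)
    W = np.zeros_like(dist)
    W[off_diag] = 1.0 / dist[off_diag]  # farther firms get smaller weights
    return W / W.sum(axis=1, keepdims=True)


def spatial_error_cov(W, rho, sigma2=1.0):
    """Covariance of eps = (I - rho W)^{-1} z with z ~ N(0, sigma2 * I)."""
    A = np.linalg.inv(np.eye(W.shape[0]) - rho * W)
    return sigma2 * A @ A.T


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])  # hypothetical locations
W = inverse_distance_weights(coords)
S = spatial_error_cov(W, rho=0.5)
print(np.round(W, 3))
print(np.diag(S))  # variances differ across firms
```

With ρ = 0 the covariance reduces to σ²I; with ρ ≠ 0 the diagonal of the covariance varies across firms, which is the induced heteroskedasticity referred to in the text.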

