Results on the Bias and Inconsistency of Ordinary Least Squares for the Linear Probability Model

This note formalizes bias and inconsistency results for ordinary least squares (OLS) on the linear probability model and provides sufficient conditions for unbiasedness and consistency to hold. The conditions suggest that a "trimming" estimator may reduce OLS bias.


Introduction
Limitations of the Linear Probability Model (LPM) are well known. Fitted probabilities from OLS are not bounded to the unit interval, and OLS estimation implies a heteroscedastic error term. Conventional advice points to probit or logit as the standard remedy, since both bound the maximum-likelihood estimated probabilities to the unit interval. However, the fact that consistent estimation of the LPM may be difficult does not imply that either probit or logit is the correct specification of the probability model; it may be reasonable to assume that probabilities are generated from bounded linear decision rules. Theoretical rationalizations for the LPM are in Rosenthal (1989) and Heckman and Snyder (1977).

Despite the attractiveness of logit and probit for estimating binary dependent variable models, OLS on the LPM is still used. Recent applications include Klaassen and Magnus (2001), Bettis and Fairlie (2001), Lukashin (2000), McGarry (2000), Fairlie and Sundstrom (1999), Reiley (2005), and Currie and Gruber (1996). Empirical rationales for the LPM specification are plentiful. McGarry appeals to ease of interpretation of estimated marginal effects, while Reiley cites a perfect-correlation problem associated with the probit model. Fairlie and Sundstrom prefer the LPM because it implies a simple expression for the change in the unemployment rate between two censuses. Bettis and Fairlie choose the LPM because of an extremely large sample size and the other simplifications it implies. Lukashin uses the LPM because it lends itself to a model selection algorithm based on an adaptive gradient criterion. Currie and Gruber state that logit, probit, and OLS results are similar for their data and only report LPM results.
Other rationales for OLS on the LPM involve complications of probit/logit models in certain contexts. Klaassen and Magnus cite panel-data complications in their tennis example and select OLS. OLS is perhaps also justified in simultaneous-equations/instrumental-variables settings. The presence of dummy endogenous regressors is problematic if the DGP is assumed to be probit or logit; these problems were first considered by Heckman (1978). While perhaps less popular than logit and probit, OLS on the LPM still finds its way into the literature for various reasons.
Some well-known LPM theorems are provided in Amemiya (1977). Econometrics textbooks (e.g., Greene, 2000) acknowledge complications leading to biased and inconsistent OLS estimates. Nevertheless, the literature is not clear on the precise conditions under which OLS is problematic. This note rigorously lays out these conditions, derives the finite-sample and asymptotic biases of OLS, and provides additional results that highlight the appropriateness or inappropriateness of OLS estimation of the LPM. Finally, we suggest a trimmed-sample estimator that could reduce OLS bias.

Results
Let $y_i$ be a discrete random variable taking the values 0 or 1. Let $x_i$ be a $1 \times k$ vector of explanatory variables on $\mathbb{R}^k$, $\beta$ be a $k \times 1$ vector of coefficients, and $e_i$ be a random error. Define probabilities over the random variable $x_i\beta \in \mathbb{R}$:
$$q = \Pr(x_i\beta < 0), \qquad c = \Pr(0 \le x_i\beta \le 1), \qquad p = \Pr(x_i\beta > 1), \qquad (1)$$
where $p + c + q = 1$. Consider a random sample of data: $(y_i, x_i)$, $i \in N$, $N = \{1, \ldots, n\}$. Define the data partition
$$J_g = \{\, i \in N : 0 \le x_i\beta \le 1 \,\}, \qquad J_k = \{\, i \in N : x_i\beta > 1 \,\}, \qquad (2)$$
implying that the remaining observations are those with $x_i\beta < 0$, and that $\Pr(i \in J_g) = c$ and $\Pr(i \in J_k) = p$. The LPM DGP is
$$y_i = \begin{cases} 0, & x_i\beta < 0, \\ x_i\beta + e_i, & 0 \le x_i\beta \le 1, \\ 1, & x_i\beta > 1. \end{cases} \qquad (3)$$
The conditional probability of $y_i$ is
$$\Pr(y_i = 1 \mid x_i) = \begin{cases} 0, & x_i\beta < 0, \\ x_i\beta, & 0 \le x_i\beta \le 1, \\ 1, & x_i\beta > 1. \end{cases} \qquad (4)$$
Therefore, $y_i$ traces the familiar ramp function on $x_i\beta$ with error process $e_i = y_i - E(y_i \mid x_i)$ and probabilities
$$\Pr(e_i = 1 - x_i\beta \mid x_i) = x_i\beta, \quad \Pr(e_i = -x_i\beta \mid x_i) = 1 - x_i\beta \ \text{ for } i \in J_g; \qquad \Pr(e_i = 0 \mid x_i) = 1 \ \text{ for } i \notin J_g.$$
OLS proceeds as
$$y_i = x_i\beta + u_i, \qquad (5)$$
where $u_i$ is (by assumption) a zero-mean random variable, independent of the $x_i$. Notice that the OLS error term, $u_i = y_i - x_i\beta$, differs from $e_i$:
$$u_i = \begin{cases} -x_i\beta, & x_i\beta < 0, \\ e_i, & 0 \le x_i\beta \le 1, \\ 1 - x_i\beta, & x_i\beta > 1, \end{cases}$$
with probability function
$$\Pr(u_i = -x_i\beta \mid x_i) = 1 \ \text{ for } x_i\beta < 0; \quad \Pr(u_i = 1 - x_i\beta \mid x_i) = x_i\beta, \ \Pr(u_i = -x_i\beta \mid x_i) = 1 - x_i\beta \ \text{ for } 0 \le x_i\beta \le 1; \quad \Pr(u_i = 1 - x_i\beta \mid x_i) = 1 \ \text{ for } x_i\beta > 1. \qquad (6)$$
The distinction between $u_i$ and $e_i$ induces problems in OLS.
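To make the distinction concrete, the following is a minimal simulation sketch (our illustration, not part of the original note; the design for $x_i$, the coefficient values, and the variable names are assumptions) that draws $y_i$ from the ramp-function DGP in Eqs. (3) and (4) and compares the conditional means of $e_i$ and $u_i$ inside and outside the unit interval.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = np.array([0.2, 0.8])                     # illustrative true coefficients
x = np.column_stack([np.ones(n), rng.uniform(-0.5, 1.5, size=n)])
xb = x @ beta                                   # index x_i*beta, may fall outside [0, 1]

# Ramp-function DGP: Pr(y_i = 1 | x_i) = min(max(x_i*beta, 0), 1)
prob = np.clip(xb, 0.0, 1.0)
y = rng.binomial(1, prob)

e = y - prob        # DGP error e_i: zero conditional mean everywhere
u = y - xb          # OLS error u_i: zero conditional mean only on J_g

inside = (xb >= 0) & (xb <= 1)                  # i in J_g
print("mean e | J_g      :", e[inside].mean())  # ~ 0
print("mean e | not J_g  :", e[~inside].mean()) # ~ 0
print("mean u | J_g      :", u[inside].mean())  # ~ 0
print("mean u | not J_g  :", u[~inside].mean()) # nonzero: u_i depends on x_i there
```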
Theorem 1. If $c < 1$, then Ordinary Least Squares estimation of the Linear Probability Model is generally biased and inconsistent.
Proof. Eq. (6) implies
$$E(u_i \mid x_i) = \begin{cases} -x_i\beta, & x_i\beta < 0, \\ 0, & 0 \le x_i\beta \le 1, \\ 1 - x_i\beta, & x_i\beta > 1. \end{cases}$$
Therefore, the conditional expectation of the OLS error, $u_i$, is a function of $x_i$ with probability $(1 - c)$. Hence, OLS is biased and inconsistent if $c < 1$. □
Hence, only observations $i \in J_g$ possess mean-zero errors, so OLS using observations $i \notin J_g$ is problematic.
Remark 2. If $J_g \neq N$, then OLS estimation is biased and inconsistent. That is, if the sample used to estimate $\beta$ contains any $i \notin J_g$, then $c < 1$, so OLS is problematic. Also:

Remark 3. If $c = 1$, then OLS is unbiased and consistent, because $p = q = 0$, $E(u_i \mid x_i) = 0$ for all $i \in N$, and $u_i = e_i$ for all $i \in N$.

Define random variables $z_i$ and $w_i$:
$$z_i = \begin{cases} 1, & 0 \le x_i\beta \le 1, \\ 0, & \text{otherwise}, \end{cases} \qquad w_i = \begin{cases} 1, & x_i\beta > 1, \\ 0, & \text{otherwise}. \end{cases}$$
Hence, $\Pr(z_i = 1) = c$ and $\Pr(w_i = 1) = p$. An alternative representation of Eq. (3) is
$$y_i = x_i\beta\, z_i + w_i + u_i z_i, \qquad (7)$$
making explicit that $u_i$ is not the correct OLS error. Notice that $u_i z_i = e_i$, so the conditional probability function of $u_i z_i$ is the same as that of $e_i$. Therefore, $E(u_i z_i \mid x_i) = 0$, and Eq.
(7) has a zero-mean error that is mean-independent of $x_i$. Taking the unconditional mean of Eq. (7):
$$E(y_i) = c\,\mu_{xg}\beta + p, \qquad (8)$$
where $\mu_{xg} = E(x_i \mid z_i = 1)$. Eq. (8) will be used in the sequel. The OLS estimator is
$$\hat\beta_n = \Big(\sum_{i \in N} x_i'x_i\Big)^{-1} \sum_{i \in N} x_i' y_i. \qquad (9)$$
Partitioning the data by $J_g$ and $J_k$, and taking into consideration $z_i$ and $w_i$ in each regime:
$$E(\hat\beta_n \mid X) = \Big(\sum_{i \in N} x_i'x_i\Big)^{-1}\Big(\sum_{i \in J_g} x_i'x_i\Big)\beta + \Big(\sum_{i \in N} x_i'x_i\Big)^{-1}\sum_{i \in J_k} x_i', \qquad (10)$$
which is generally biased and asymptotically biased, because $c < 1$. When $c = 1$, $J_g = N$, the first term on the RHS is $\beta$, the second term is 0, and $\hat\beta_n$ is unbiased. The inconsistency of $\hat\beta_n$ follows in a similar fashion. Letting $C(\cdot)$ denote the cardinality operator, define $n_k = C(J_k)$, $n_g = C(J_g)$ and $n_U = n - n_k - n_g$. Let plim denote the probability limit operator as $n \to \infty$.
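As a numerical check on Eq. (10) (again our own sketch, not from the note; the coefficient values, the design for $x_i$, and the replication count are assumptions), the following Monte Carlo compares the average full-sample OLS estimate with the conditional expectation on the right-hand side of Eq. (10).

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5_000, 500
beta = np.array([0.2, 0.8])
x = np.column_stack([np.ones(n), rng.uniform(-0.5, 1.5, size=n)])  # design held fixed across replications
xb = x @ beta
prob = np.clip(xb, 0.0, 1.0)

# Conditional expectation of the OLS estimator, Eq. (10):
# (sum_N x'x)^(-1) [ sum_{J_g} x'x * beta + sum_{J_k} x' ]
Jg = (xb >= 0) & (xb <= 1)
Jk = xb > 1
XtX = x.T @ x
expected_b = np.linalg.solve(XtX, x[Jg].T @ x[Jg] @ beta + x[Jk].sum(axis=0))

draws = np.empty((reps, 2))
for r in range(reps):
    y = rng.binomial(1, prob)
    draws[r] = np.linalg.lstsq(x, y, rcond=None)[0]

print("true beta            :", beta)
print("Eq. (10) expectation :", expected_b)
print("Monte Carlo mean OLS :", draws.mean(axis=0))
```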
Define
$$\operatorname{plim}\Big[n_k^{-1}\sum_{i \in J_k} x_i'\Big] = \mu_{xk}', \qquad \operatorname{plim}\Big[n^{-1}\sum_{i \in N} x_i'\Big] = \mu_x',$$
where $\mu_{xk}'$ and $\mu_x'$ are finite vectors. Assume $\operatorname{plim}[n^{-1} n_k] = p$ and $\operatorname{plim}[n_g n^{-1}] = c$. Then:
$$\operatorname{plim}\hat\beta_n = \Big(\operatorname{plim}\Big[n^{-1}\sum_{i \in N} x_i'x_i\Big]\Big)^{-1}\Big(c \operatorname{plim}\Big[n_g^{-1}\sum_{i \in J_g} x_i'x_i\Big]\beta + p\,\mu_{xk}'\Big). \qquad (11)$$
Even if $c$ and $p$ were known, $\hat\beta_n$ could not be bias corrected, yet Eq. (8) seems to imply that if $c$ and $p$ were known, an OLS regression of $(y_i - p)$ on $(c\,x_i)$ might be unbiased. Define the transformed OLS estimator
$$\hat\beta_n^{*} = \Big(\sum_{i \in N} (c\,x_i)'(c\,x_i)\Big)^{-1}\sum_{i \in N} (c\,x_i)'(y_i - p) = \frac{1}{c}\Big(\sum_{i \in N} x_i'x_i\Big)^{-1}\sum_{i \in N} x_i'(y_i - p). \qquad (12)$$

Theorem 4. $\hat\beta_n^{*}$ is biased and inconsistent for $\beta$. Hence,
$$\operatorname{plim}\hat\beta_n^{*} = \frac{1}{c}\Big(\operatorname{plim}\Big[n^{-1}\sum_{i \in N} x_i'x_i\Big]\Big)^{-1}\Big(c \operatorname{plim}\Big[n_g^{-1}\sum_{i \in J_g} x_i'x_i\Big]\beta + p\,\mu_{xk}' - p\,\mu_x'\Big).$$
Thus, knowledge of $p$ and $c$ does not ensure an unbiased OLS estimator of $\beta$, and the bias will persist asymptotically. Moreover, it does not facilitate consistent estimation. The problem with $\hat\beta_n$ and $\hat\beta_n^{*}$ is not that $c$ and $p$ are unknown but that $J_g$ is unknown. If we knew $J_g$, we could perform OLS only on observations $i \in J_g$. Therefore:

Remark 5. Sufficient information for unbiased and consistent OLS estimation is knowledge of $J_g$.
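The following sketch (our illustration; it treats $c$, $p$, and $J_g$ as known, which they are not in practice, and all parameter values are assumptions) contrasts plain OLS, the transformed estimator $\hat\beta_n^{*}$ of Eq. (12), and OLS restricted to $i \in J_g$ as in Remark 5.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = np.array([0.2, 0.8])
x = np.column_stack([np.ones(n), rng.uniform(-0.5, 1.5, size=n)])
xb = x @ beta
y = rng.binomial(1, np.clip(xb, 0.0, 1.0))

Jg = (xb >= 0) & (xb <= 1)       # known here only because beta is known in the simulation
c = Jg.mean()                    # sample analogue of c = Pr(0 <= x*beta <= 1)
p = (xb > 1).mean()              # sample analogue of p = Pr(x*beta > 1)

b_ols  = np.linalg.lstsq(x, y, rcond=None)[0]            # plain OLS: generally biased (Theorem 1)
b_star = np.linalg.lstsq(c * x, y - p, rcond=None)[0]    # transformed OLS, Eq. (12): still generally biased (Theorem 4)
b_jg   = np.linalg.lstsq(x[Jg], y[Jg], rcond=None)[0]    # OLS on J_g only: consistent (Remark 5)

print("true beta :", beta)
print("OLS       :", b_ols)
print("beta*     :", b_star)
print("OLS on J_g:", b_jg)
```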
Also, if $J_g = N$, then
$$\sum_{i \in J_g} x_i'x_i = \sum_{i \in N} x_i'x_i \quad \text{and} \quad J_k = \varnothing.$$
Therefore, Eq. (10) becomes
$$E(\hat\beta_n \mid X) = \beta,$$
unbiased for $J_g = N$. A similar argument can be made for consistency: if $c = 1$, then $J_g = N$ with probability one, and therefore $\operatorname{plim}\hat\beta_n = \beta$.

Remark 6. Without knowledge of $J_g$ and $J_k$, a sufficient condition for unbiased OLS when $c < 1$ is $J_g = N$. $J_g = N$ is a weaker sufficient condition than $c = 1$, but unlikely in large samples. For any given random sample, $\Pr[J_g = N] = c^n$, so
$$\lim_{n \to \infty} \Pr[J_g \neq N] = \lim_{n \to \infty} (1 - c^n) = 1.$$

Remark 7. Without knowledge of $J_g$ and $J_k$, if $c < 1$ and $J_g = N$ in the current sample, then as $n \to \infty$, $J_g \neq N$ with probability approaching 1, and $\hat\beta_n$ is asymptotically biased and inconsistent.
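For a sense of magnitudes (a numerical illustration we add here; the value $c = 0.99$ is assumed), even when nearly all of the index distribution lies in the unit interval,
$$\Pr[J_g = N] = c^{\,n}: \qquad 0.99^{100} \approx 0.37, \qquad 0.99^{1000} \approx 4.3 \times 10^{-5},$$
so the event $J_g = N$ that rescues unbiasedness becomes vanishingly rare as $n$ grows.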
Therefore, as $n$ grows, once the first observation with $x_i\beta \notin [0, 1]$ appears, $J_g \neq N$ and unbiasedness is lost. Oddly, the estimator $\hat\beta_n$ could be reliable in small samples yet unreliable in large samples.

Conclusions
Although it is theoretically possible for OLS on the LPM to yield unbiased estimation, this generally would require fortuitous circumstances. Furthermore, consistency seems to be an exceedingly rare occurrence, as one would have to accept extraordinary restrictions on the joint distribution of the regressors. Therefore, OLS is frequently a biased estimator and almost always an inconsistent estimator of the LPM. If we had knowledge of the sets $J_g$ and $J_k$, then a consistent estimate of $\beta$ could be based on the sub-sample $i \in J_g$. This is tantamount to removing observations $i \notin J_g$, suggesting that trimming observations violating the rule $\hat y_i = x_i\hat\beta_n \in [0, 1]$ and re-estimating the OLS model (based on the trimmed sample) may reduce finite-sample bias. This seems to hold in simulations, but formal proof of this result is left for future research.
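A minimal sketch of the trimming idea (our illustration; the note does not provide an implementation, so the helper name and simulated design are assumptions): estimate OLS on the full sample, drop observations whose fitted values violate $\hat y_i = x_i\hat\beta_n \in [0, 1]$, and re-estimate on the trimmed sample.

```python
import numpy as np

def trimmed_lpm_ols(x, y):
    """One-pass trimming sketch for the LPM: full-sample OLS, drop observations
    with fitted values outside [0, 1], then re-estimate on the trimmed sample."""
    b_full = np.linalg.lstsq(x, y, rcond=None)[0]     # initial full-sample OLS
    fitted = x @ b_full
    keep = (fitted >= 0.0) & (fitted <= 1.0)          # rule: x_i * b_hat in [0, 1]
    b_trim = np.linalg.lstsq(x[keep], y[keep], rcond=None)[0]
    return b_full, b_trim, keep

# Illustration on a simulated ramp-function DGP (assumed parameter values)
rng = np.random.default_rng(3)
n = 50_000
beta = np.array([0.2, 0.8])
x = np.column_stack([np.ones(n), rng.uniform(-0.5, 1.5, size=n)])
y = rng.binomial(1, np.clip(x @ beta, 0.0, 1.0))

b_full, b_trim, keep = trimmed_lpm_ols(x, y)
print("true beta       :", beta)
print("full-sample OLS :", b_full)
print("trimmed OLS     :", b_trim, f"(share kept: {keep.mean():.3f})")
```

Because the trimming rule uses fitted values from the biased first-stage OLS, it only approximates restricting the sample to $J_g$; this is consistent with the note's position that the bias reduction is suggestive rather than proven.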