Asymptotic Properties of Estimators for the Linear Panel Regression Model with Individual Effects and Serially Correlated Errors: The Case of Stationary and Non-Stationary Regressors and Residuals

This paper studies the asymptotic properties of standard panel data estimators in a simple panel regression model with error component disturbances. Both the regressor and the remainder disturbance term are assumed to be autoregressive and possibly non-stationary. Asymptotic distributions are derived for the standard panel data estimators, including the ordinary least squares (OLS), fixed effects (FE), first-difference (FD), and generalized least squares (GLS) estimators, when both T and n are large. We show that all the estimators have asymptotically normal distributions, with convergence rates that depend on the non-stationarity of the regressors and the remainder disturbances. We show using Monte Carlo experiments that the loss in efficiency of the OLS, FE and FD estimators relative to true GLS can be substantial.


Introduction
Econometricians have long been concerned with conditions under which the ordinary least squares (OLS) estimator is asymptotically efficient. The standard textbook result is that, under a general variance-covariance structure on the disturbances, the OLS estimator is less efficient than generalized least squares (GLS). This is well documented for the case of stationary autoregressive disturbances and stationary regressors. However, Phillips and Park (1988) showed that in a regression with integrated regressors, OLS and GLS are asymptotically equivalent.
Recently, Choi (1999) studied the limiting distributions of the fixed effects (FE), GLS, and within-GLS estimators for a panel data regression model with autoregressive disturbances, while Choi (2002) extended this work to instrumental variables (IV) estimation. Phillips and Moon (1999) presented a fundamental framework for studying sequential and joint limit theories in nonstationary panel data analysis, while Kao (1999) studied the asymptotic properties of the FE estimator of a spurious regression and proposed residual-based tests for panel cointegration. See Baltagi and Kao (2000), Choi (2006) and Breitung and Pesaran (2006) for recent surveys of this rapidly growing subject. In an early finding, Baltagi and Krämer (1997) showed the equivalence of the GLS and FE estimators in a simple panel data regression with a time trend as a regressor. Kao and Emerson (2004a, 2004b) extended Baltagi and Krämer to a model with serially correlated remainder errors. Kao and Emerson showed that the FE estimator is asymptotically equivalent to GLS when the error term is I(0), but that GLS is more efficient than FE when the error term is I(1). It is known that the panel time trend can be seen as a special case of a panel regression with a non-zero drift I(1) regressor.
This paper extends the literature by studying the asymptotic properties of OLS, FE, first-difference (FD) and GLS estimators in the panel regression with an autocorrelated regressor and an autocorrelated remainder error (both of which can be stationary or nonstationary). We show that when the error term is I(0) and the regressor is I(1), the FE estimator is asymptotically equivalent to the GLS estimator and OLS is less efficient than GLS (due to a slower convergence speed). However, when the error term and the regressor are both I(1), GLS is more efficient than the FE estimator since GLS is $\sqrt{nT}$ consistent, while FE is only $\sqrt{n}$ consistent.
This implies that GLS is the preferred estimator in both cases (i.e., whether the regression error is I(0) or I(1)).
All asymptotic results in this paper assume that $T \to \infty$ followed by $n \to \infty$. We use $(n,T) \xrightarrow{\text{seq}} \infty$ to denote this sequential limit. We write the integral $\int_0^1 W(s)\,ds$ as $\int W$ and $\widetilde{W}$ as $W - \int W$ when there is no ambiguity over limits. We use $\Rightarrow$ to denote weak convergence, $\equiv$ to denote equivalence in distribution, $\xrightarrow{p}$ to denote convergence in probability, $[x]$ to denote the largest integer $\leq x$, $I(0)$ and $I(1)$ to signify a time series that is integrated of order zero and one, respectively, and $BM(\Omega)$ to denote Brownian motion with covariance matrix $\Omega$. All proofs are collected in an appendix available upon request from the authors.

The Model and Assumptions
Consider the following panel regression:
$$y_{it} = \alpha + \beta x_{it} + u_{it}, \qquad i = 1,\ldots,n, \quad t = 1,\ldots,T, \qquad (1)$$
where $u_{it} = \mu_i + \nu_{it}$, and $\alpha$ and $\beta$ are scalars. For simplicity, we consider the case of one regressor, but our results can be extended to the multiple-regressor case. We assume that $\mu_i \sim \text{iid}(0, \sigma_\mu^2)$ and that $\{\nu_{it}\}$ is AR(1),
$$\nu_{it} = \rho\,\nu_{i,t-1} + e_{it}, \qquad (2)$$
where $e_{it}$ is a white noise process with variance $\sigma_e^2$. Let $x_{it}$ also be AR(1),
$$x_{it} = \lambda\,x_{i,t-1} + \varepsilon_{it}, \qquad (3)$$
where $\varepsilon_{it}$ is a white noise process with variance $\sigma_\varepsilon^2$. In this paper, we assume that $-1 < \rho \leq 1$ and $-1 < \lambda \leq 1$, so that $\nu_{it}$ and $x_{it}$ can each be either I(0) or I(1). The initialization of this system is $y_{i1} = x_{i1} = O_p(1)$ for all $i$. Our interest is in the estimation of the common slope $\beta$. This paper shows that the asymptotic properties of the OLS, FE, FD, and GLS estimators depend crucially on the serial correlation properties of $x_{it}$ and $\nu_{it}$. When $y_{it}$ and $x_{it}$ are both I(1) but $\nu_{it}$ is I(0), equation (1) is a panel cointegrated model. On the other hand, when $\nu_{it}$ is I(1) and $y_{it}$ and $x_{it}$ are both I(1), equation (1) is a panel spurious model. FE estimators for panel cointegrated and panel spurious models have been discussed in Phillips and Moon (1999) and Kao (1999). The case of a panel time trend model, $x_{it} = t$, has been studied by Baltagi and Krämer (1997) and Kao and Emerson (2004a, 2004b).
Next, we characterize the innovation vector $w_{it} = (e_{it}, \varepsilon_{it})'$. We assume that $w_{it}$ is a linear process that satisfies the following assumption:

Assumption 1 For each $i$, we assume:
1. $w_{it} = \Pi(L)\,\eta_{it} = \sum_{j=0}^{\infty} \Pi_j \eta_{i,t-j}$, with $\sum_{j=0}^{\infty} j^a \|\Pi_j\| < \infty$ and $|\Pi(1)| \neq 0$, for some $a > 1$.
2. For a given $i$, $\eta_{it}$ is i.i.d. with zero mean and variance-covariance matrix $\Sigma_\eta$, and finite fourth-order cumulants.
Assumption 2 We assume that $\eta_{it}$ and $\eta_{jt}$ are independent for $i \neq j$; that is, we assume cross-sectional independence for our model.
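As a concrete illustration of the data generating process in (1)-(3), the following sketch simulates a panel for given values of $\rho$ and $\lambda$. It is only a minimal sketch: the function name, the i.i.d. normal innovations, and the default parameter values are illustrative assumptions and are not part of the model.

```python
import numpy as np

def simulate_panel(n=50, T=100, alpha=1.0, beta=0.5, rho=0.5, lam=0.5,
                   sigma_mu=1.0, sigma_e=1.0, sigma_eps=1.0, burn=1000, seed=0):
    """Simulate y_it = alpha + beta*x_it + mu_i + nu_it with AR(1) nu_it and x_it.

    rho is the AR coefficient of the remainder error nu_it and lam that of the
    regressor x_it; setting rho = 1 or lam = 1 produces the I(1) cases
    (for those cases one may set burn=0 to match the O_p(1) initialization).
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, sigma_mu, size=n)                 # individual effects mu_i
    e = rng.normal(0.0, sigma_e, size=(n, T + burn))       # innovations of nu_it, eq. (2)
    eps = rng.normal(0.0, sigma_eps, size=(n, T + burn))   # innovations of x_it, eq. (3)
    nu = np.zeros((n, T + burn))
    x = np.zeros((n, T + burn))
    for t in range(1, T + burn):                           # AR(1) recursions (2) and (3)
        nu[:, t] = rho * nu[:, t - 1] + e[:, t]
        x[:, t] = lam * x[:, t - 1] + eps[:, t]
    nu, x = nu[:, burn:], x[:, burn:]                      # drop burn-in observations
    y = alpha + beta * x + mu[:, None] + nu                # eq. (1)
    return y, x
```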

Assumption 1 implies that the partial sum process $\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} w_{it}$ satisfies a multivariate invariance principle: as $T \to \infty$, $\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} w_{it} \Rightarrow B_i(r) \equiv BM(\Omega)$ for each $i$, where $B_i = (B_{ei}, B_{\varepsilon i})'$. The long-run $2 \times 2$ covariance matrix of $\{w_{it}\}$ is given by
$$\Omega = \sum_{j=-\infty}^{\infty} E\left(w_{ij} w_{i0}'\right) = \begin{bmatrix} \sigma_e^2 & \sigma_{e\varepsilon} \\ \sigma_{\varepsilon e} & \sigma_\varepsilon^2 \end{bmatrix}.$$
The long-run covariance matrix can be decomposed into $\Omega = \Sigma + \Gamma + \Gamma'$, where $\Sigma = E\left(w_{it} w_{it}'\right)$ is the contemporaneous covariance matrix and $\Gamma = \sum_{j=1}^{\infty} E\left(w_{it} w_{i,t-j}'\right)$ is the one-sided long-run covariance matrix, with elements such as $\gamma_{\varepsilon e}$. Assuming $\sigma_\varepsilon^2$ is non-zero, we define $\sigma_{e.\varepsilon} = \sigma_e^2 - \sigma_{e\varepsilon}^2 / \sigma_\varepsilon^2$. The assumption of constant variances and covariances across $i$, such as in $\Omega$, $\Sigma$, and $\Gamma$, is used to simplify the notation. It can be extended to the case where different variances are allowed for different $i$, at the expense of more complicated notation.

OLS Estimator
The OLS estimator of $\beta$ is given by
$$\hat{\beta}_{OLS} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{it} - \bar{x}\right)\left(y_{it} - \bar{y}\right)}{\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{it} - \bar{x}\right)^{2}},$$
where $\bar{x}$ and $\bar{y}$ denote the overall sample means.

Theorem 1 Under Assumptions 1-2, we obtain the following results:
1. If $|\rho| < 1$ and $|\lambda| < 1$, $\sqrt{nT}\left(\hat{\beta}_{OLS} - \beta\right)$ is asymptotically normal, with a variance that involves $\sigma_\mu^2$, the covariances $E\left(\varepsilon_{it} e_{i(t-r)}\right)$, and $\theta_{00} = E\left(\varepsilon_{it}^2 e_{it}^2\right)$.
2. If $\rho = 1$ and $|\lambda| < 1$, $\sqrt{n}\left(\hat{\beta}_{OLS} - \beta\right)$ is asymptotically normal.
3. If $|\rho| < 1$ and $\lambda = 1$, $\sqrt{nT}\left(\hat{\beta}_{OLS} - \beta\right)$ is asymptotically normal.
4. If $\rho = 1$ and $\lambda = 1$, $\sqrt{n}\left(\hat{\beta}_{OLS} - \beta\right)$ is asymptotically normal.
It is important to note that the ratio $\sigma_{e.\varepsilon}/\sigma_\varepsilon^2$ can be seen as the long-run signal-to-noise ratio. The OLS estimator ignores the individual effects in the disturbance term. Thus, the variance of $\mu_i$, i.e., $\sigma_\mu^2$, may appear in the variance-covariance matrix of $\hat{\beta}_{OLS}$, depending on the case considered. In case 1, both $\mu_i$ and $\nu_{it}$ affect the variance of $\hat{\beta}_{OLS}$. In cases 2 and 4, $\nu_{it}$ dominates $\mu_i$. In case 3, $\mu_i$ dominates $\nu_{it}$ and hence the convergence speed is $\sqrt{nT}$, which differs from the $T$-asymptotics in the panel cointegration literature. The asymptotic normality of the OLS estimator also arises naturally: when summing across $i$, the nonstandard asymptotic distribution due to the unit root in the time dimension, as in cases 2-4, is smoothed out.
Corollary 1 When $E\left(e_{it}\varepsilon_{i(t+k)}\right) = 0$ for all $i$ and $k$, under the assumptions in Theorem 1, the asymptotic variances in Theorem 1 simplify; they simplify further when $\varepsilon_{it}$ and $e_{it}$ are independent, in particular in case 3 ($|\rho| < 1$ and $\lambda = 1$). Corollary 1 follows directly from Theorem 1.
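For concreteness, a minimal sketch of the pooled OLS slope, assuming y and x are $n \times T$ NumPy arrays as produced by the simulate_panel sketch above:

```python
import numpy as np

def beta_ols(y, x):
    """Pooled OLS slope: deviations are taken from the overall sample means,
    so the individual effects mu_i remain in the error term."""
    xd = x - x.mean()
    yd = y - y.mean()
    return (xd * yd).sum() / (xd ** 2).sum()
```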

FE Estimator
The fixed effects (FE) estimator of $\beta$ is given by
$$\hat{\beta}_{FE} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{it} - \bar{x}_i\right)\left(y_{it} - \bar{y}_i\right)}{\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{it} - \bar{x}_i\right)^{2}},$$
where $\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}$ and $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$.

Theorem 2 Under Assumptions 1-2, we have the following results:
1. If $|\rho| < 1$ and $|\lambda| < 1$, $\sqrt{nT}\left(\hat{\beta}_{FE} - \beta\right)$ is asymptotically normal.
2. If $\rho = 1$ and $|\lambda| < 1$, $\sqrt{n}\left(\hat{\beta}_{FE} - \beta\right)$ is asymptotically normal.
3. If $|\rho| < 1$ and $\lambda = 1$, $\sqrt{n}T\left(\hat{\beta}_{FE} - \beta\right)$ is asymptotically normal, with $\sigma_{FE3}^2 = \frac{6\,\sigma_{e.\varepsilon}}{(1-\rho)^2\,\sigma_\varepsilon^2}$.
4. If $\rho = 1$ and $\lambda = 1$, $\sqrt{n}\left(\hat{\beta}_{FE} - \beta\right)$ is asymptotically normal.
Note that $\sigma_{\varepsilon e}$ is due to the endogeneity of the regressor $x_{it}$, while $\gamma_{\varepsilon e}$ is due to serial correlation.

Corollary 2 When $E\left(e_{it}\varepsilon_{i(t+k)}\right) = 0$ for all $i$ and $k$, under the same conditions as for Theorem 2, the asymptotic variances in Theorem 2 simplify; they simplify further when $\varepsilon_{it}$ and $e_{it}$ are independent, in particular in case 3 ($|\rho| < 1$ and $\lambda = 1$). Corollary 2 follows directly from Theorem 2. Note that case 1 is the textbook result under the assumptions of stationarity of the regressor and the disturbance term. Case 2 is new. Case 3 is discussed by Phillips and Moon (1999) and Kao and Chiang (2000). Case 4 is discussed in Kao (1999).
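A matching sketch of the within (FE) slope under the same array layout; the within transformation subtracts individual means, which removes $\mu_i$:

```python
import numpy as np

def beta_fe(y, x):
    """Fixed effects (within) slope: deviations from individual means wipe out mu_i."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd ** 2).sum()
```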

FD Estimator
The first-difference (FD) estimator of $\beta$ is given by
$$\hat{\beta}_{FD} = \frac{\sum_{i=1}^{n}\sum_{t=2}^{T}\Delta x_{it}\,\Delta y_{it}}{\sum_{i=1}^{n}\sum_{t=2}^{T}\left(\Delta x_{it}\right)^{2}}.$$

Theorem 3 Under Assumptions 1-2, we obtain the following results:
1. If $|\rho| < 1$ and $|\lambda| < 1$, $\sqrt{nT}\left(\hat{\beta}_{FD} - \beta\right)$ is asymptotically normal.
2. If $\rho = 1$ and $|\lambda| < 1$, $\sqrt{nT}\left(\hat{\beta}_{FD} - \beta\right)$ is asymptotically normal, with $\sigma_{FD2}^2$ depending on $\theta_{00}$ and $\lambda$.
3. If $|\rho| < 1$ and $\lambda = 1$, $\sqrt{nT}\left(\hat{\beta}_{FD} - \beta\right)$ is asymptotically normal.
4. If $\rho = 1$ and $\lambda = 1$, $\sqrt{nT}\left(\hat{\beta}_{FD} - \beta\right)$ is asymptotically normal.
Similar to the FE estimator, the individual effect $\mu_i$ is eliminated by the FD estimator because $u_{it} - u_{i,t-1} = \nu_{it} - \nu_{i,t-1}$. In cases 2 and 4, $\rho = 1$, and the FD estimator is asymptotically equivalent to the GLS estimator because both methods transform the disturbance from I(1) into I(0). In fact, the FD estimator is mathematically the same as the GLS estimator except for the omission of the first observation for each individual.
Corollary 3 When $E\left(e_{it}\varepsilon_{i(t+k)}\right) = 0$ for all $i$ and $k$, under the same conditions as for Theorem 3, the asymptotic variances in Theorem 3 simplify; they simplify further when $\varepsilon_{it}$ and $e_{it}$ are independent, in particular in case 3 ($|\rho| < 1$ and $\lambda = 1$). Corollary 3 follows directly from Theorem 3.
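The corresponding first-difference sketch; differencing removes $\mu_i$ and, when $\rho = 1$, turns the remainder error into an I(0) process:

```python
import numpy as np

def beta_fd(y, x):
    """First-difference slope computed from Delta x_it and Delta y_it, t = 2,...,T."""
    dx = np.diff(x, axis=1)
    dy = np.diff(y, axis=1)
    return (dx * dy).sum() / (dx ** 2).sum()
```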

GLS Estimator
Let us rewrite equation (1) in vector form:
$$y = \alpha\,\iota_{nT} + x\beta + u,$$
where $y$ is $nT \times 1$, $x$ is the $nT \times 1$ vector of the $x_{it}$, $\iota_{nT}$ is a vector of ones of dimension $nT$, and $u$ is $nT \times 1$. As shown in the Appendix, the GLS estimator of $(\alpha, \beta)'$ is
$$\hat{\delta}_{GLS} = \left(z' V^{-1} z\right)^{-1} z' V^{-1} y, \qquad z = \left[\iota_{nT}, x\right],$$
where $V = E\left(uu'\right)$.
One can decompose the variance-covariance matrix into
$$V = \sigma_\mu^2\left(I_n \otimes \iota_T \iota_T'\right) + I_n \otimes A,$$
where $\iota_T$ is a vector of ones of dimension $T$ and $A$ is the $T \times T$ variance-covariance matrix of $\nu_i = (\nu_{i1}, \ldots, \nu_{iT})'$. When $|\rho| < 1$, this estimation is equivalent to the Prais-Winsten transformation method suggested by Baltagi and Li (1991). One can easily verify that $A^{-1}$ factors, up to the scalar $\sigma_e^{-2}$, as $C'C$, where
$$C = \begin{bmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -\rho & 1 \end{bmatrix}$$
is the Prais-Winsten transformation matrix as in Baltagi and Li (1991).
Thus, we have the following theorem:

Theorem 4 Under Assumptions 1-2, we obtain the following results:
1. If $|\rho| < 1$ and $|\lambda| < 1$, $\sqrt{nT}\left(\hat{\beta}_{GLS} - \beta\right)$ is asymptotically normal.
2. If $\rho = 1$ and $|\lambda| < 1$, $\sqrt{nT}\left(\hat{\beta}_{GLS} - \beta\right)$ is asymptotically normal.
3. If $|\rho| < 1$ and $\lambda = 1$, $\sqrt{n}T\left(\hat{\beta}_{GLS} - \beta\right)$ is asymptotically normal, with a variance involving $\sigma_{\varepsilon e}$ and $\gamma_{\varepsilon e}$.
4. If $\rho = 1$ and $\lambda = 1$, $\sqrt{nT}\left(\hat{\beta}_{GLS} - \beta\right)$ is asymptotically normal.
It is well known that the random effects model imposes the critical assumption that $\mu_i$ be independent of $x_{it}$. It is worth pointing out that this assumption is only needed for the case $\rho < 1$. When $\rho = 1$, the GLS transformation is identical to first-difference estimation except for the first observation of each individual. The Cochrane-Orcutt (CO) version of the GLS transformation ignores the information contained in the first observation. Hence, if one uses the Cochrane-Orcutt (CO) procedure, the GLS estimator will be the same as the FD estimator and $\mu_i$ need not be independent of $x_{it}$ when $\rho = 1$. When $\rho < 1$, $E\left(\mu_i \mid x_{it}\right) = 0$ is required; otherwise $\hat{\beta}_{GLS}$ would be biased and inconsistent. In this case, one may use the within or first-difference transformation to wipe out $\mu_i$ and then run GLS estimation. The within-GLS or FD-GLS estimators can be shown to be asymptotically as efficient as the GLS estimator. However, this is beyond the scope of this paper and is left as a further extension. The following corollary follows directly from Theorem 4.

Corollary 4
When E e it " i(t+k) = 0 for all i and k, under the same conditions as for Theorem 4, then If " it and e it are independent, If " it and e it are independent, 3. If j j < 1 and = 1, . Case 1 is the textbook result. Case 3 is discussed in Choi (1999). Cases 2 and 4 are new.

Feasible GLS Estimator
It is clear that the GLS estimator in Section 6 is not feasible. In this section, we discuss feasible GLS estimation. Assuming that $E\left(e_{it}\varepsilon_{i(t+k)}\right) = 0$ for all $i$ and $k$ and that $\varepsilon_{it}$ and $e_{it}$ are independent, a feasible GLS estimator can be calculated by estimating the autocorrelation coefficient $\rho$ and the variance components $\sigma_\mu^2$ and $\sigma_e^2$. To estimate these parameters, we take the following steps. First, retrieve the residuals $\hat{\nu}_{it}$ from the FE regression in (1). Now $\rho$ can be estimated as the sample autocorrelation coefficient of the $\hat{\nu}_{it}$, where $\bar{\hat{\nu}}$ is the sample average of the $\hat{\nu}_{it}$. Alternatively, as suggested by Baltagi and Li (1991), one can estimate $\rho$ from moments of the FE residuals, and Baltagi and Li (1997) suggest another consistent estimator. We choose the correlation coefficient estimator because it ensures that $\hat{\rho}$ is always between 0 and 1. It can be shown that $\hat{\rho}$ in (14) is a consistent estimator of $\rho$. Next, using the FE residuals $\hat{\nu}_{it}$ and the estimate $\hat{\rho}$ of the autocorrelation coefficient, we can obtain $\hat{e}_{it}$, and therefore $\sigma_e^2$ can be estimated by $\hat{\sigma}_e^2$. Let $\hat{u}_{it}$ denote the OLS residuals from equation (10); $\hat{\sigma}_e^2$ and $\hat{\sigma}_\mu^2$ are consistent estimators of $\sigma_e^2$ and $\sigma_\mu^2$, respectively, i.e., $\hat{\sigma}_e^2 \xrightarrow{p} \sigma_e^2$ and $\hat{\sigma}_\mu^2 \xrightarrow{p} \sigma_\mu^2$ if $|\rho| < 1$. These variance components can be estimated by using the variance decomposition and the Prais-Winsten (PW) transformation suggested by Baltagi and Li (1991). Alternatively, one can also use the Cochrane-Orcutt (CO) procedure, which ignores the information contained in the first observation. As suggested by Maeshiro (1976), Beach and MacKinnon (1978) and Park and Mitchell (1980), estimation using the PW transformation is more efficient than using the CO procedure when the regressors are trended.
When the assumptions of Corollary 4 hold, one can show that feasible GLS has the same asymptotic distribution as true GLS. Define $\theta = \left(\rho, \sigma_\mu^2, \sigma_e^2\right)'$ and let $\hat{\theta}$ be its corresponding estimator, so that $\hat{\beta}_{GLS} = \hat{\beta}_{GLS}(\theta)$. For example, in case 1, a Taylor series expansion as in Fuller and Battese (1973) around $\theta$, evaluated at a point $\bar{\theta}$ lying between $\hat{\theta}$ and $\theta$, shows that the feasible and true GLS estimators are asymptotically equivalent. This follows using arguments similar to those in the proofs of the theorems above. The proofs are similar for the remaining three cases and are omitted to save space.
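A rough sketch of the feasible GLS steps just described, reusing the beta_fe and beta_gls sketches above; the simple moment-based estimators of $\rho$ and of the variance components below are simplified stand-ins for the Baltagi and Li (1991) formulas, not the exact estimators in the paper.

```python
import numpy as np

def beta_fgls(y, x):
    """Feasible GLS: estimate rho and the variance components from FE residuals,
    then plug them into the true-GLS computation (stationary case only)."""
    n, T = y.shape
    b_fe = beta_fe(y, x)                                           # step 1: FE slope
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    resid = yd - b_fe * xd                                         # FE residuals nu_hat
    r0, r1 = resid[:, 1:], resid[:, :-1]
    rho_hat = (r0 * r1).sum() / (r1 ** 2).sum()                    # step 2: AR(1) coefficient
    e_hat = r0 - rho_hat * r1                                      # step 3: implied innovations
    sigma_e2_hat = e_hat.var()
    u_bar = (y - y.mean() - b_fe * (x - x.mean())).mean(axis=1)    # individual mean residuals
    # crude large-T correction for the averaged AR(1) noise contained in u_bar
    sigma_mu2_hat = max(u_bar.var() - sigma_e2_hat / ((1.0 - rho_hat) ** 2 * T), 0.0)
    return beta_gls(y, x, rho_hat, sigma_mu2_hat, sigma_e2_hat)
```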

E¢ ciency Comparisons
This section summarizes the relative efficiency of the OLS, FE, GLS and FD estimators. First, consider the speeds of convergence for the different cases. In case 1, the four estimators have the same convergence speed of $\sqrt{nT}$. The efficiency of the OLS estimator is hard to compare with that of the remaining estimators because OLS does not difference out $\mu_i$, and as a result its variance still contains $\sigma_\mu^2$. That GLS is more efficient than FE and FD is evident from the Gauss-Markov theorem. Since these estimators all converge at the same rate $\sqrt{nT}$, we plot the relative efficiency of the FE and FD estimators with respect to true GLS in Figures 1 and 2, computed as the ratio of each estimator's asymptotic variance to that of true GLS. One can easily verify that both relative efficiencies are greater than or equal to 1. Comparing the GLS estimator with the FE and FD estimators, the relative efficiency depends on the values of $\rho$ and $\lambda$. As shown in Figures 1 and 2, when $\rho$ is small, the FE estimator performs well in terms of relative efficiency with respect to true GLS; when $\rho$ is large, the FD estimator performs well in terms of relative efficiency with respect to true GLS.
In case 2, the disturbance is I(1) but the regressor is I(0). The noise is strong, so it dominates the signal. In the time series case, the OLS estimator is not consistent. After double smoothing using panel data, the asymptotic distribution becomes normal and the convergence speed is $\sqrt{n}$. GLS estimation, however, transforms the disturbance into I(0); therefore its convergence speed is $\sqrt{nT}$. When the disturbance is I(1), first-difference estimation is the same as GLS except for the first observation; hence it is also $\sqrt{nT}$ consistent.
In case 3, the disturbance is I(0) but the regressor is I(1). This is the cointegration case. The cointegration literature shows that the GLS estimator is $T$ consistent in time series models. In the panel data model, both GLS and FE are $\sqrt{n}T$ consistent.
In case 4, both the disturbance and the regressor are I(1). This is the spurious regression case. As shown in Kao (1999), the FE estimator is $\sqrt{n}$ consistent. For the same reason given in case 2, first-differencing transforms the disturbance term from I(1) to I(0); therefore, the convergence speed of both the GLS and the FD estimators is $\sqrt{nT}$. In case 3, the FE estimator is more efficient than the FD estimator when $\nu_{it}$ is stationary, including the special case in which $\nu_{it}$ is serially uncorrelated. In cases 2 and 4, the FD estimator is more efficient because $\nu_{it}$ follows a random walk. These results verify the conclusion in Wooldridge (2002). However, in case 1, when $\rho$ is large, even though $\nu_{it}$ does not follow a random walk, the FD estimator is still more efficient than the FE estimator.
The Monte Carlo experiments use model (1) with $\beta = 10$, $\mu_i \sim \text{iid } N(0, 5)$, and $\nu_{it}$ and $x_{it}$ following the AR(1) processes given in (2) and (3). Random numbers for $e_{it}$ and $\varepsilon_{it}$ were generated by the GAUSS procedure RNDNS. We generated $n(T + 1000)$ random numbers and then split them into $n$ series so that each series had the same mean and variance. The first 1,000 observations were discarded for each series. Tables 1-3 give the relative mean square error (MSE) of each estimator of $\beta$ with respect to true GLS for various values of $\rho$, $\lambda$, $n$, and $T$. We only report three tables to give a flavour of the results; the rest are available upon request from the authors. Several conclusions emerge from these results. First, the true GLS estimator is the most efficient one in terms of mean squared error. Its efficiency gain over the OLS estimator is enormous, particularly when $\rho$ and/or $\lambda$ is large. Second, the FE estimator is less efficient than true GLS, but more efficient than the feasible GLS estimator when $\rho = 0$. However, when $\rho$ increases, the feasible GLS estimator quickly becomes more efficient than the FE estimator. Third, the FD estimator is also less efficient than true GLS. When $\rho$ increases, the FD estimator becomes as efficient as the GLS estimator.
Interestingly, the FD estimator behaves poorly when $\lambda$ is close to 1 but $\rho$ is small. Fourth, the feasible GLS estimator is only slightly less efficient than the true GLS estimator and beats OLS, FE and FD as long as $\rho > 0.2$. In summary, our simulation results show that the feasible GLS estimator performs well and is second only to true GLS when $\rho > 0.2$.
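A condensed version of such an experiment can be reproduced with the sketches above (NumPy in place of GAUSS, far fewer replications, and only the stationary case); the design values below are illustrative and are not those of Tables 1-3.

```python
import numpy as np

def relative_mse(n=20, T=50, rho=0.5, lam=0.9, beta=10.0, reps=200, seed=1):
    """Monte Carlo relative MSE of each estimator with respect to true GLS."""
    draws = {"ols": [], "fe": [], "fd": [], "gls": []}
    for r in range(reps):
        y, x = simulate_panel(n=n, T=T, beta=beta, rho=rho, lam=lam, seed=seed + r)
        draws["ols"].append(beta_ols(y, x))
        draws["fe"].append(beta_fe(y, x))
        draws["fd"].append(beta_fd(y, x))
        draws["gls"].append(beta_gls(y, x, rho, 1.0, 1.0))   # true GLS uses the known parameters
    mse = {k: np.mean((np.array(v) - beta) ** 2) for k, v in draws.items()}
    return {k: mse[k] / mse["gls"] for k in mse}
```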

Conclusion
In this paper, we compared the efficiency of the OLS, FE, FD, and GLS estimators in panel models with I(0) and I(1) regressors and regression errors. When the regression error is I(0) and the regressor is I(1), so that the model is cointegrated, both the FE and GLS estimators are asymptotically efficient. When the regression error is I(1) and the regressor is I(1), so that the model is spurious, the FE and GLS estimators are $\sqrt{n}$ and $\sqrt{nT}$ consistent, respectively. This implies that GLS is the preferred estimator as far as the regression error specification is concerned, since GLS converges at as good or a better rate in both cases (i.e., whether the regression error is I(0) or I(1)).

Appendix

A Proof of Theorem 1

The following lemmas are needed to prove Theorem 1. All limits are taken as $T \to \infty$ followed by $n \to \infty$ sequentially, $(n, T) \xrightarrow{\text{seq}} \infty$.

Proof (Lemma 1). Consider (1). For a fixed $n$, the relevant term is zero for all $i$ and $t$, since there is no non-zero drift in the data generating process. The result therefore holds for all $n$, and hence it holds for large $n$ as well. This proves (1).

Next, consider (2). Similarly, for a fixed $n$, the result follows as $n \to \infty$ by a law of large numbers (LLN). This proves (2).

Lemma 2 If Assumptions 1-3 hold, then the corresponding limit results hold in each of the following cases: 1. $|\rho| < 1$ and $|\lambda| < 1$; 2. $\rho = 1$ and $|\lambda| < 1$; 3. $|\rho| < 1$ and $\lambda = 1$; 4. $\rho = 1$ and $\lambda = 1$.

Proof. Consider (1). For part (a), decompose the relevant sum into terms I-IV. For term I, it is easy to see that, for a fixed $n$, the term vanishes as $n \to \infty$ because $E(\mu_i) = 0$. For term III, the stated limit holds as $(n, T) \xrightarrow{\text{seq}} \infty$, and the remaining terms are handled in the same way. Collecting I-IV, we prove (a).

Consider (2). For part (a), first consider term III, which converges as $T \to \infty$; this step is taken from Lemma 1(a) in Kao and Chiang (2000). The remaining terms follow directly. For part (b), first consider term III as $(n, T) \xrightarrow{\text{seq}} \infty$; the remaining terms are straightforward, and hence the stated limit holds as $(n, T) \xrightarrow{\text{seq}} \infty$.

Consider (3). For part (a), the stated limits are easy to verify for a fixed $n$, and the result follows. Consider (4). For part (a), for a fixed $n$ the limit holds as $T \to \infty$ by a CLT, and letting $n \to \infty$ gives the result as $(n, T) \xrightarrow{\text{seq}} \infty$. Similarly, the sample moment of $x_{it}$ and $u_{it}$ converges in probability to $\gamma_{\varepsilon e} + \sigma_{\varepsilon e}/2$ as $(n, T) \xrightarrow{\text{seq}} \infty$. For part (b), the limit holds as $T \to \infty$ by a CLT; taking $n \to \infty$ then yields the stated result as $(n, T) \xrightarrow{\text{seq}} \infty$.

B Proof of Theorem 1
Proof. The proof is straightforward by using lemmas 1 and 2.

C Proof of Theorem 2
The following lemmas will be used to prove Theorem 2.
Lemma 3 If Assumptions 1-2 hold, then the corresponding limit results hold.

Proof. Consider (1). For a fixed $n$, the stated limit clearly holds for all $n$, and hence it holds for large $n$ as well. This proves (1).

The next lemma is proved along the same lines. Consider (1). For part (a), note that the leading term converges as $(n, T) \xrightarrow{\text{seq}} \infty$, and the remaining terms are easily handled; hence the stated limit follows. This proves (a). For part (b), for a fixed $n$, the limit holds as $T \to \infty$ by a CLT, and hence the result follows. Consider (3). For part (a), note that for a fixed $n$ the limit holds as $T \to \infty$ by a central limit theorem, and the cross-sectional average converges as $n \to \infty$; hence the result holds as $(n, T) \xrightarrow{\text{seq}} \infty$. For part (b), for a fixed $n$, the limit holds as $T \to \infty$ by a central limit theorem, and hence the stated result follows.

D Proof of Theorem 2

Proof. The proof of Theorem 2 is straightforward given the above lemmas.

E Proof of Theorem 3
The following lemmas will be used to prove Theorem 3.
Lemma 5 If Assumptions 1-2 hold, then the corresponding limit results hold.

Proof. Consider (1). In the stationary case, for a fixed $n$, the stated limit clearly holds for all $n$, and hence it holds for large $n$ as well. This proves (1).

F Proof of Theorem 4
Define $z = [\iota_{nT}, x]$; then, with the definition of $V$, the GLS estimator can be written as $\hat{\delta}_{GLS} = \left(z' V^{-1} z\right)^{-1} z' V^{-1} y$. When $|\rho| < 1$, $A^{-1}$ has the closed form implied by the Prais-Winsten matrix $C$ defined above. The following lemmas will be used to prove Theorem 4.