A Generalized Spatial Panel Data Model with Random Effects

This paper proposes a generalized panel data model with random effects and first-order spatially autocorrelated residuals that encompasses two previously suggested specifications. The first one is described in Anselin’s (1988) book and the second one by Kapoor, Kelejian, and Prucha (2007). Our encompassing specification allows us to test for these models as restricted specifications. In particular, we derive three LM and LR tests that restrict our generalized model to obtain the Anselin model, the Kapoor, Kelejian, and Prucha model, and the simple random effects model that ignores the spatial correlation in the residuals. For two of these three tests, we obtain closed form solutions and we derive their large sample distributions. Our Monte Carlo results show that the suggested tests are powerful in testing for these restricted specifications even in small and medium sized samples.


Introduction¹
The recent literature on spatial panels distinguishes between two different spatial autoregressive error processes. One specification assumes that spatial correlation occurs only in the remainder error term, whereas no spatial correlation takes place in the individual effects (see Anselin, 1988, Baltagi, Song, and Koh, 2003, and Anselin, Le Gallo, and Jayet, 2008; henceforth referred to as the Anselin model). Another specification assumes that the same spatial error process applies to both the individual and remainder error components (see Kapoor, Kelejian, and Prucha, 2007; henceforth referred to as the KKP model).² While the two data generating processes look similar, they imply different spatial spillover mechanisms. For example, consider the question of firm productivity using panel data. Besides the deterministic components, firms differ also with respect to their unobserved know-how or their managerial ability to organize production processes efficiently. At least over a short time period, this managerial ability may be time-invariant. Beyond that, there are innovations that vary from period to period, like random firm-specific technology shocks, capacity utilization shocks, etc. Under this scenario, it seems reasonable to assume that firm productivity may be spatially correlated due to spillovers. Such spillovers can occur, e.g., through information flows (transmission of process technologies) embodied in worker flows between firms at local labor markets, or through input-output channels (technology requirements and interdependence of capacity utilization). Whereas the Anselin model assumes that spillovers are inherently time-varying, the KKP process assumes the spillovers to be time-invariant as well as time-variant. For example, firms located in the neighborhood of highly productive firms may get time-invariant permanent spillovers affecting their productivity, in addition to the time-variant spillovers as in the Anselin model. While the Anselin model seems restrictive in that it does not allow permanent spillovers through the individual firm effects, the KKP approach is restrictive in the sense that it does not allow for a differential intensity of spillovers of the permanent and transitory shocks.

¹ We would like to thank Matthias Koch, Ingmar Prucha, two anonymous referees, and the editor Esfandiar Maasoumi for their helpful comments and suggestions. Preliminary versions of this paper were presented at the 13th International Conference on Panel Data held in Cambridge, England, and the 23rd Annual Canadian Econometric Study Group meeting in Niagara Falls, Canada.

² There has been a lot of attention to cross-sectional dependence in panel data models, modeled through factor models. A rapidly growing research topic within this general field has been the reconciliation of factor models and spatial models, with attempts to express weak and strong cross-sectional dependence; see Chudik, Pesaran, and Tosetti (2011), Pesaran and Tosetti (2011), and Sarafidis and Wansbeek (2011), to mention a few.
This paper introduces a generalized spatial panel model which encompasses these two models and allows for spatial correlation in the individual and remainder error components, with possibly different spatial autoregressive parameters.
We consider a (quasi-)maximum likelihood estimator (MLE) for this more general spatial panel model when the individual effects are assumed to be random.
This in turn allows us to test the restrictions on our generalized model that yield (i) the Anselin model, (ii) the Kapoor, Kelejian, and Prucha model, and (iii) a simple random effects model that ignores the spatial correlation in the residuals.
We derive the corresponding LM and LR tests for these three hypotheses and we compare their size and power performance using Monte Carlo experiments.

A Generalized Model
Econometric models for panel data with spatial error processes have been proposed by Anselin (1988), Baltagi, Song, and Koh (2003), Kapoor, Kelejian, and Prucha (2007), Anselin, Le Gallo, and Jayet (2008), and Lee and Yu (2010a, 2010b), to mention a few. A generalized spatial panel data model that encompasses these previous specifications is given as follows:³

y_t = X_t β + u_1 + u_2t,   t = 1, ..., T,

where the (N × 1) vector y_t includes the observations on the dependent variable at time t, with N denoting the number of unique cross-sectional units. The non-stochastic (N × K) matrix X_t gives the observations at time t for a set of K exogenous variables, including the constant, and β is the corresponding (K × 1) parameter vector. The disturbance term follows an error component model which involves the sum of two disturbances. The (N × 1) vector of random variables u_1 captures the time-invariant unit-specific effects and therefore has no time subscript. The (N × 1) vector of the remainder disturbances u_2t varies with time. Both u_1 and u_2t are spatially correlated with the same spatial weights matrix W, but with different spatial autocorrelation parameters ρ1 and ρ2, respectively:

u_1 = ρ1 W u_1 + μ,   u_2t = ρ2 W u_2t + ν_t.

The (N × N) spatial weights matrix W has zero diagonal elements and its entries are typically declining with distance.
We further assume that the row and column sums of W are uniformly bounded in absolute value and that ρ_r is bounded in absolute value and independent of N. In case W is row-normalized, the parameter space for ρ_r is a closed interval contained in (−1, 1). For the case where W is not normalized, we assume that the parameter space for ρ_r is contained in the interval −1/λ_max < ρ_r < 1/λ_max for all N and r = 1, 2, where λ_max is the largest absolute value of the eigenvalues of W. Hence, the spatial weights matrix may be either row-normalized or maximum-row normalized (see Kelejian and Prucha, 2010). Further, let A = I_N − ρ1 W and B = I_N − ρ2 W. The matrices A and B are non-singular for all ρ_r, r = 1, 2, in the parameter space and all N.
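The admissible interval (−1/λ_max, 1/λ_max) can be made concrete with a small numerical sketch. The code below is ours, not the authors'; the weights matrix is an arbitrary illustrative example. For a row-normalized W the spectral radius is 1, so the interval reduces to (−1, 1).

```python
import numpy as np

def spatial_param_bounds(W):
    """Return (-1/lam, 1/lam), with lam the largest absolute eigenvalue of W."""
    lam = np.max(np.abs(np.linalg.eigvals(W)))
    return -1.0 / lam, 1.0 / lam

# Four units on a line (rook neighbours), then row-normalized
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W_row = W / W.sum(axis=1, keepdims=True)

lo, hi = spatial_param_bounds(W_row)
# For any rho in (lo, hi), I_N - rho * W_row is non-singular.
```

Row-normalization makes W row-stochastic, so λ_max = 1 by the Perron-Frobenius theorem, which is why the row-normalized case admits the familiar interval (−1, 1).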
The elements of μ are assumed to be independently and identically distributed as N(0, σ_μ²) across i. The elements of ν_t are assumed to be independently and identically distributed as N(0, σ_ν²) across i and t. Also, the elements of μ and ν_t are assumed to be independent of each other. Appendix B provides a more detailed set of assumptions.

³ To avoid index cluttering, we suppress the subscript indicating that the elements of the spatial weights matrix may depend on N and that the dependent variable and the disturbances form triangular arrays.
Stacking the cross-sections over time yields

y = Xβ + (ι_T ⊗ I_N) u_1 + u_2,

where y = [y_1', ..., y_T']', X = [X_1', ..., X_T']', u_2 = [u_21', ..., u_2T']', and ι_T is a (T × 1) vector of ones. This model encompasses both the KKP model, which assumes that ρ1 = ρ2, and the Anselin model, which assumes that ρ1 = 0. If ρ1 = ρ2 = 0, i.e., there is no spatial correlation, this model reduces to the familiar random effects (RE) panel data model; see Baltagi (2008).
Let A = (I_N − ρ1 W) and B = (I_N − ρ2 W); then, under the present assumptions, we have u_1 = A^(−1) μ ~ N(0, σ_μ² (A'A)^(−1)) and u_2 = (I_T ⊗ B^(−1)) ν ~ N(0, σ_ν² (I_T ⊗ (B'B)^(−1))).
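The variance-covariance structure implied by these two error components, and the fact that its (NT × NT) inverse can be assembled from (N × N) blocks, can be verified numerically. The sketch below is ours: dimensions, parameter values, and the random W are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 3
rho1, rho2, s2_mu, s2_nu = 0.3, 0.5, 2.0, 1.0

# Illustrative row-normalized weights matrix with zero diagonal
W = rng.random((N, N)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

I_T = np.eye(T)
A = np.eye(N) - rho1 * W
B = np.eye(N) - rho2 * W
Jbar_T = np.ones((T, T)) / T          # averaging matrix
E_T = I_T - Jbar_T                    # within transformation

AA_inv = np.linalg.inv(A.T @ A)
BB_inv = np.linalg.inv(B.T @ B)

# Omega_u = E_T (x) s2_nu (B'B)^{-1}
#         + Jbar_T (x) [T s2_mu (A'A)^{-1} + s2_nu (B'B)^{-1}]
Omega = np.kron(E_T, s2_nu * BB_inv) \
      + np.kron(Jbar_T, T * s2_mu * AA_inv + s2_nu * BB_inv)

# Its inverse, built only from (N x N) matrices:
Omega_inv = np.kron(E_T, (B.T @ B) / s2_nu) \
          + np.kron(Jbar_T, np.linalg.inv(T * s2_mu * AA_inv + s2_nu * BB_inv))

err = np.max(np.abs(Omega @ Omega_inv - np.eye(N * T)))
```

The block-wise inversion works because E_T and J̄_T are symmetric, idempotent, and mutually orthogonal, so each Kronecker block can be inverted separately.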
Let E_T = I_T − J̄_T, where J̄_T = J_T/T is the averaging matrix, with J_T being a (T × T) matrix of ones. The variance-covariance matrix of the spatial random effects panel data model is given by

Ω_u = σ_μ² (J_T ⊗ (A'A)^(−1)) + σ_ν² (I_T ⊗ (B'B)^(−1))
    = E_T ⊗ σ_ν² (B'B)^(−1) + J̄_T ⊗ [T σ_μ² (A'A)^(−1) + σ_ν² (B'B)^(−1)].    (3)

This uses the fact that E[u_1 u_2'] = 0, since μ and ν are assumed to be independent. The last equality in (3) replaces J_T by T J̄_T and I_T by E_T + J̄_T. It is easy to show that the inverse of the (NT × NT) matrix Ω_u can be obtained from the inverse of matrices of smaller dimension (N × N) as follows:

Ω_u^(−1) = E_T ⊗ σ_ν^(−2) (B'B) + J̄_T ⊗ [T σ_μ² (A'A)^(−1) + σ_ν² (B'B)^(−1)]^(−1).

We also assume that the inverses A^(−1), B^(−1), and [T σ_μ² (A'A)^(−1) + σ_ν² (B'B)^(−1)]^(−1) have bounded row and column sums, uniformly in N and in the parameter space (see Assumption A2 in the Appendix for further details). Under the present assumptions, the log-likelihood function of the general model is given by

L(β, θ) = −(NT/2) ln 2π − (1/2) ln |Ω_u| − (1/2) (y − Xβ)' Ω_u^(−1) (y − Xβ),

where θ = (σ_μ², σ_ν², ρ1, ρ2). The maximum likelihood estimates are obtained by maximizing the log-likelihood function numerically using a constrained quasi-Newton method.⁴ The hypotheses under consideration in this paper are the following:

(1) H_0^A: ρ1 = ρ2 = 0, and the alternative H_1^A is that at least one component is not zero. The restricted model is the standard random effects (RE) panel data model with no spatial correlation; see Baltagi (2008).
(2) H_0^B: ρ1 = 0, and the alternative is H_1^B: ρ1 ≠ 0. The restricted model is the Anselin (1988) spatial panel model with random effects. In fact, the restricted log-likelihood function reduces to the one considered by Anselin (1988, p. 154).

(3) H_0^C: ρ1 = ρ2, and the alternative is H_1^C: ρ1 ≠ ρ2. The restricted model is the KKP spatial panel model with random effects.

⁴ The numerical maximization procedure can be simplified if one concentrates the likelihood with respect to β and σ_ν². However, our optimizations for the Monte Carlo simulation using MATLAB were quite fast using the constrained quasi-Newton method. Appendix F describes some details on the numerical optimization procedure.
In the next subsections, we derive the corresponding LM tests for these hypotheses and compare their performance with the corresponding LR tests using Monte Carlo experiments.⁵ Appendix A describes some general results used to derive the score and information matrix for these alternative models.

LM and LR Tests for H_0^A
The (quasi-)ML estimates under H_0^A are labeled by a tilde, and the corresponding restricted parameter vector is indexed by A. The joint LM test statistic for the null hypothesis of no spatial correlation, H_0^A: ρ1 = ρ2 = 0, is derived in Appendix C. Here, ũ = y − Xβ̃ denotes the vector of the estimated residuals under H_0^A. The restricted model is the simple random effects (RE) panel data model without any spatial autocorrelation. In fact, σ̃_ν² = ũ'(E_T ⊗ I_N)ũ / [N(T − 1)] and σ̃_1² = ũ'(J̄_T ⊗ I_N)ũ / N. The LM_A statistic is asymptotically distributed as χ²_2, as shown in Appendix C. Note this test does not require the assumption of normally distributed disturbances.
Under normal disturbances, one can also derive the corresponding LR test for H_0^A: ρ1 = ρ2 = 0 as LR_A = 2(L_G − L_A),

⁵ LM tests for spatial models are surveyed in Anselin (1988, 2001) and Anselin and Bera (1998), to mention a few. For a joint test for the absence of spatial correlation and random effects in a panel data model, see Baltagi, Song, and Koh (2003).
using the maximized log-likelihood of the general model, denoted by L_G, and the maximized log-likelihood under H_0^A, denoted by L_A. This test statistic is likewise asymptotically distributed as χ²_2.
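The mechanics of the LR test are simple enough to sketch in a few lines. The code below is an illustration by us, not the authors' implementation; the two log-likelihood values are hypothetical placeholders, not results from the paper.

```python
# 5% critical value of the chi-squared distribution with 2 degrees of freedom
CHI2_2_CRIT_5PCT = 5.991

def lr_test(L_G, L_A, crit=CHI2_2_CRIT_5PCT):
    """Return the LR statistic 2 * (L_G - L_A) and the 5% rejection decision."""
    stat = 2.0 * (L_G - L_A)
    return stat, stat > crit

# Hypothetical maximized log-likelihoods (general model vs. restricted RE model)
stat, reject = lr_test(L_G=-512.3, L_A=-518.9)
# stat = 13.2 exceeds 5.991, so H_0^A would be rejected at the 5% level
```

The same recipe applies to the other two hypotheses, with the degrees of freedom matching the number of restrictions (one for H_0^B and H_0^C).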

LM and LR Tests for H_0^B
Under H_0^B: ρ1 = 0, the restricted model is the spatial panel data model with random effects described in Anselin (1988). The corresponding LM test for H_0^B is a conditional test for zero spatial correlation in the individual effects, allowing for the possibility of spatial correlation in the remainder error term, i.e., ρ2 ≠ 0.
In fact, under H_0^B, the information matrix is block-diagonal, with the lower block being independent of β. Let d_θ be the (4 × 1) score vector referring to the parameter vector θ = (σ_μ², σ_ν², ρ1, ρ2), and denote the (4 × 4) lower block of the information matrix by J_θ. The (quasi-)ML estimates under H_0^B are labeled by a hat, and the corresponding estimated residuals are û = y − Xβ̂. The LM test for H_0^B is calculated as

LM_B = (d̂_ρ1)² Ĵ^33,

where Ĵ^33 is the (3, 3) element of the inverse of the estimated information matrix Ĵ_θ^(−1) under H_0^B. This test statistic has no closed form representation, but using similar assumptions and proofs as in the Appendices, it should be asymptotically distributed as χ²_1. Under non-normal disturbances, a robust LM test can be derived following White (1982); it is derived in Appendix D and involves a (4 × 4) correction matrix built from the estimated third and fourth moments of the disturbances and the selection vector R = [0, 0, 1, 0]. The elements a_k,ii are defined in Appendix D. This robust LM test statistic is asymptotically distributed as χ²(1). With normal disturbances, the corresponding LR test is based upon the maximized log-likelihood under H_0^B. This restricted log-likelihood is the same as that given by Anselin (1988, p. 154).

LM and LR Tests for H_0^C
Under H_0^C: ρ1 = ρ2 = ρ, the true model is the one suggested by Kapoor, Kelejian, and Prucha (2007). In this case, B = A, and the parameter estimates under H_0^C are labeled by a bar. The corresponding estimated residuals are given by ū = y − Xβ̄. The score and the information matrix needed for this test are derived in Appendix E. Under H_0^C, the LM_C statistic is asymptotically distributed as χ²_1, as shown in Appendix E. If the disturbances are not normally distributed, one may use the robust version of this LM test, which is derived in Appendix E together with the corresponding correction factors. Under normal disturbances, the LR test is based on the maximized log-likelihood under H_0^C, denoted by L_C, which is the maximized log-likelihood for the KKP model with normal disturbances. Note that Kapoor, Kelejian, and Prucha (2007) consider a generalized method of moments estimator, rather than (quasi-)MLE, for their spatial random effects panel data model.

Monte Carlo Results
In the Monte Carlo analysis, we use a simple panel data model that includes one explanatory variable and a constant (K = 2):

y_it = β0 + β1 x_it + u_it,   i = 1, ..., N and t = 1, ..., T,

where β0 = 5 and β1 = 0.5. The explanatory variable x_it is generated as N(0, 20(1 − θ)), with 0 < θ < 1. Here θ = σ_μ²/(σ_μ² + σ_ν²) is the proportion of the total variance due to the heterogeneity of the individual-specific effects, and σ_μ² + σ_ν² = 20.
We generate the spatial weights matrix by allocating observations randomly on a grid of 2N squares. Consequently, as the number of observations N increases, the number of squares in the grid grows larger, too. The probability that an observation is located on a particular coordinate is equal for all coordinates on the grid. This results in an irregular lattice, where each observation possesses 3 neighbors on average. The spatial weighting scheme is based on the Queen's design, where each observation (except those in the first and last row and column) has four neighboring cells situated to the north, south, east, and west. The corresponding spatial weights matrix is normalized so that each row sums to one.
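A minimal sketch of this weights-matrix design is given below. It is our reconstruction, not the authors' code: the grid shape and the treatment of isolated units are assumptions, and we connect only horizontally/vertically adjacent cells, matching the north/south/east/west description above.

```python
import numpy as np

def make_weights(N, seed=0):
    """Allocate N observations on a grid of roughly 2N cells; connect
    horizontally/vertically adjacent cells; row-normalize."""
    rng = np.random.default_rng(seed)
    side = int(np.ceil(np.sqrt(2 * N)))            # square grid with >= 2N cells
    cells = rng.choice(side * side, size=N, replace=False)
    coords = np.column_stack(divmod(cells, side))  # (row, col) of each observation
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            # Manhattan distance 1 => north/south/east/west neighbour
            if i != j and np.abs(coords[i] - coords[j]).sum() == 1:
                W[i, j] = 1.0
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalize; observations without neighbours keep a zero row
    np.divide(W, row_sums, out=W, where=row_sums > 0)
    return W

W = make_weights(50)
```

Because the allocation is random, the realized number of neighbors varies across observations, producing the irregular lattice described in the text.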
The parameters ρ1 and ρ2 vary over the set {−0.8, −0.5, −0.2, 0, 0.2, 0.5, 0.8}. For each experiment, we calculate the three LM and LR tests as derived above, using 2000 replications.⁷

===== Tables 1-3 =====

Table 1 reports the frequency of rejections for N = 50, T = 5, and θ = 0.5 in 2000 replications. This means that σ_μ² = σ_ν² = 10. The size of each test is denoted in bold figures and is not statistically different from the 5% nominal size. The only exception, where the LM test might be undersized, is for the KKP model at high absolute values of ρ1 and ρ2, both equal to 0.8 in absolute value. The size-adjusted power⁸ of the LR and LM tests is reasonably high for all three hypotheses considered. The performance of the LM test is almost the same as that of the LR test, except for a few cases. For H_0^A: ρ1 = ρ2 = 0, when ρ1 = −0.5 and ρ2 = 0, the size-adjusted power of the LM test is 61.4% as compared to 64.6% for LR. At ρ1 = 0.5 and ρ2 = 0, the size-adjusted power of the LM test is 70% as compared to 66.4% for LR. Similarly, for H_0^B: ρ1 = 0, when ρ1 = −0.5 and ρ2 = 0, the size-adjusted power of the LM test is 70.2% as compared to 72.9% for LR. At ρ1 = 0.5 and ρ2 = 0, the size-adjusted power of the LM test is 76.7% as compared to 74.6% for LR. For H_0^C: ρ1 = ρ2, when ρ1 = −0.5 and ρ2 = 0, the size-adjusted power of the LM test is 66.1% as compared to 68.5% for LR. At ρ1 = 0.5 and ρ2 = 0, the size-adjusted power of the LM test is 70.6% as compared to 65% for LR.
Things also improve as the number of observations increases. The increase in power is larger when we double N from 50 to 100 than when we double T from 5 to 10.⁹ We conclude that the three LM and LR tests perform reasonably well in testing the restrictions underlying the simple random effects model without spatial correlation, the Anselin model, and the KKP model in small and medium sized samples.
Figures 1-4 plot the size-adjusted power for the various hypotheses considered. In Figure 1, the pure random effects model is true, whereas in Figure 2, the Anselin model is true. In Figures 3 and 4, the KKP-type model is true, with different values for the common ρ.

===== Figures 1-2 =====
Let us start with a comparison of the panels given in Figure 1, which assumes that the random effects model is true (ρ1 = ρ2 = 0). On the left-hand side, we plot the size-adjusted power of the LM test for deviations of ρ1 from 0, maintaining that ρ2 = 0. On the right-hand side, it is the other way around.
Observe that the power of the LM test is higher for deviations of ρ2 from 0 than for deviations of ρ1 from 0. Keep in mind that the estimates of ρ2 are based on NT observations, while those of ρ1 rely on only N observations. The top two panels show that the power increases for deviations in ρ1 as θ increases.
However, for deviations in ρ2, the power of the test is insensitive to θ. The two panels at the center of Figure 1 illustrate that both the size and the power

⁹ We do not include the corresponding tables for (N = 50, T = 10) and (N = 100, T = 5), for θ = 0.25, 0.50, and 0.75, in order to save space. However, these tables are available upon request from the authors.

of the LM test improve as the sample size increases, especially as N becomes larger. A comparison of the two panels at the center with those at the bottom of Figure 1 provides information on the interaction of the sample size (N, T) and the relative importance of θ. It is obvious that for deviations of ρ1 from 0 (on the left), the power improves with N, especially as θ increases. The null hypothesis maintained in Figure 4 is ρ1 = ρ2 = 0.5. The major difference between the two figures is that assuming a null that is different from ρ1 = ρ2 = 0 shifts the size-adjusted power function and renders it skewed to the right. Otherwise, the conclusions regarding the impact of θ, N, and T are qualitatively similar to those of the random effects model. A major difference from the random effects model is that for the KKP model the power is lower in the ρ2 direction, especially for small θ.

Robustness Checks
We also assess the performance of the proposed LM tests with respect to (i) non-normal errors (using the derived robust vs. the non-robust LM test statistics) and (ii) the specification of the spatial weighting matrix. To compare the simulated power functions for normal vs. non-normal errors, we first generate the remainder error term as ν_it ~ t(5) and normalize its variance to 10.
Hence, θ = 0.5 holds in this case, and the results are comparable to the basic Monte Carlo set-up defined above. This implies that the distribution of the remainder error exhibits heavier tails than the normal distribution, but it is still symmetric. Second, we analyze a skewed error distribution, assuming that ν_it follows a log-normal distribution with variance 10.

===== Table 4 =====

In Table 4, we focus on the size of the LM and LR tests under alternative distributional assumptions on the error term for N = 50, T = 5, and θ = 0.5. The first pair of columns gives the true parameters ρ1 and ρ2, the second pair of columns summarizes the size of the tests under the assumption that ν_it ~ t(5), and the third pair of columns assumes that ν_it follows a log-normal distribution with variance 10.
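Generating the two non-normal error distributions described above can be sketched as follows. This is our illustration of one way to match the stated variances; the centering of the log-normal draws is our assumption, since errors are usually taken to have mean zero.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# t(5) has variance 5/3; rescale so the variance equals 10
t_errors = rng.standard_t(df=5, size=n) * np.sqrt(10 / (5 / 3))

# Log-normal with underlying mean 0: Var = (exp(s2) - 1) * exp(s2) = 10.
# With x = exp(s2), solve x^2 - x - 10 = 0 and take the positive root.
s2 = np.log((1 + np.sqrt(41)) / 2)
ln_raw = rng.lognormal(mean=0.0, sigma=np.sqrt(s2), size=n)
ln_errors = ln_raw - np.exp(s2 / 2)   # centre at zero (E[lognormal] = exp(s2/2))
```

The t(5) draws are symmetric with heavy tails, while the centered log-normal draws are right-skewed, matching the two robustness scenarios in the text.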
It turns out that both the (non-robust) LM tests and the LR tests are fairly insensitive to the chosen alternative assumptions about the distribution of the disturbances at intermediate levels of ρ1 and ρ2. However, the LM tests tend to be somewhat more undersized than the LR tests, especially for |ρ1| = |ρ2| = 0.8.
With the caveat of the limited number of experiments we performed, this finding suggests that the (non-robust) LM tests considered are fairly robust to deviations from the assumption of a normally distributed error term.
Interestingly, with small samples like the ones considered and a relatively small signal-to-noise ratio as assumed here, there is no gain from using robust LM test statistics rather than non-robust ones. In many cells of Table 4, the robust test size is further off the nominal size than the non-robust test size. The reason for this result is the following. The correction factors of the LM statistics deflate the non-robust test statistics. Hence, with oversized LM tests, the corresponding correction factors would adjust the test size towards the nominal size (see Yang, 2010, for an example with cross-section data). In our case, there is no systematic over-rejection in the samples considered, so the correction factors lead to even more pronouncedly undersized tests. In broader terms, problems with such correction factors in small samples also accrue to the use of higher moments of the disturbances, which cannot be estimated without bias in small samples (see Teuscher, Herrendörfer, and Guiard, 1994).¹⁰ Furthermore, we repeated the LM and LR tests for the same model configuration as in Table 1 for an alternative model which assumes the vector of explanatory variables, x, to be generated as a spatial moving average of x_old, where x_old is the specification of x as defined above. Our original conclusions are not sensitive to this alternative specification of x. We also investigated the extent to which the specification of the spatial weighting scheme matters for the size and power of the tests considered. We generated an alternative spatial weighting matrix allowing for a more densely populated grid. In particular, we randomly allocated the observations on the grid so that there are 5 rather than 3 neighbors per observation on average. As expected, the power of the tests is somewhat lower in this case, but still large enough to detect relevant deviations from the null.¹¹

Conclusions
The recent literature on first-order spatially autocorrelated residuals (SAR(1)) with panel data distinguishes between two data generating processes for the error term. One process is described in Anselin (1988) and Anselin, Le Gallo, and Jayet (2008).

¹⁰ With robust LM tests, we estimate the kurtosis from the realized (true) disturbances for every draw. In applications, one would have to rely on the estimated kurtosis, which can be biased substantially in small samples.

¹¹ All results on the mentioned sensitivity checks are available from the authors upon request. They are suppressed here for the sake of brevity.

Appendix A: Score and Information Matrix

Below we make use of the following derivatives to obtain the score and the relevant part of the information matrix.¹²

Appendix B: Identification and Consistency
In the sequel, we use subscript 0 to indicate true parameter values where necessary. First, we state the full set of Assumptions.
Assumptions¹³

A1 (random effects model): The model comprises unit-specific random effects denoted by the (N × 1) vector μ. The elements of μ are i.i.d. N(0, σ_μ²). ν is the vector of remainder errors, and its elements are i.i.d. N(0, σ_ν²).

A2 (spatial correlation): (i) Both u_1 and u_2t are spatially correlated with the same (N × N) non-stochastic spatial weighting matrix W, whose elements may depend on N.
¹² Hartley and Rao (1971) and Hemmerle and Hartley (1973) give a general useful formula that helps in obtaining the score for θ = (σ_μ², σ_ν², ρ1, ρ2)':

∂L/∂θ_r = −(1/2) tr(Ω_u^(−1) ∂Ω_u/∂θ_r) + (1/2) u' Ω_u^(−1) (∂Ω_u/∂θ_r) Ω_u^(−1) u,   r = 1, ..., 4.

To derive the relevant part of the information matrix, we use the general differentiation result given in Harville (1977).

¹³ To avoid index cluttering, we suppress the subscript indicating that the elements of the spatial weights matrix may depend on N and that the dependent variable and the disturbances form triangular arrays. For a similar set of assumptions and a discussion of them, see Lee (2004a) and Lee and Yu (2010a, 2010b).

The elements of W are non-negative and w_ii = 0. (ii) The row and column sums of W are uniformly bounded in absolute value.
(iii) The parameter space for ρ_r is a closed interval contained in −1/λ_max < ρ_r < 1/λ_max for all N and r = 1, 2, where λ_max is the largest absolute eigenvalue of W. λ_max is assumed to be bounded away from zero by some fixed positive constant. (v) The inverse [T σ_μ² (A'A)^(−1) + σ_ν² (B'B)^(−1)]^(−1) has bounded row and column sums, uniformly in N and uniformly in the parameter space. We note that Assumptions A1 and A2 imply that the set {(φ, ρ1, ρ2) | (σ_μ², σ_ν², ρ1, ρ2) ∈ Θ}, with φ = σ_μ²/σ_ν², is also compact. In the following, the elements of this set are denoted by the vector ϑ.

A5 (identification of β under H_0^C): The non-random matrix X has full column rank K < N and its elements are uniformly bounded by some finite constant.

Consistency of the (quasi-)ML estimates under the general model.
In proving the consistency of the (quasi-)MLE, we make use of the following result. Proof. To prove consistency, we have to show that (1/NT)(L_c(ϑ) − Q(ϑ)) converges uniformly to 0 in probability. Note that (1/NT)(L_c(ϑ) − Q(ϑ)) = −(1/2)(ln σ̂²(ϑ) − ln σ²(ϑ)). The third line follows since [X' Ω_u(ϑ)^(−1) X]^(−1) and X' Ω_u(ϑ)^(−1) u(β_0) u(β_0)' Ω_u(ϑ)^(−1) X are positive definite matrices (see Abadir and Magnus, 2005, pp. 216 and 329) for all ϑ in the parameter space, and the elements of [(1/NT) X' Ω_u(ϑ)^(−1) X]^(−1) are uniformly bounded by some positive constant, say c_1, uniformly in the parameter space of ϑ by Assumptions A2(v) and A5 (see also Kapoor, Kelejian, and Prucha, 2007, p. 118f.). This follows from Assumptions A2 and A5 and the observations made in Kapoor, Kelejian, and Prucha (2007, p. 118f.). In particular, Ω_u(ϑ)^(−1) Ω_u(ϑ_0) Ω_u(ϑ)^(−1) possesses bounded row and column sums, uniformly in N and uniformly in the parameter space of ϑ, using Assumption A2(v), and the elements of X are uniformly bounded by Assumption A5. Then the elements of (1/NT) X' Ω_u(ϑ)^(−1) Ω_u(ϑ_0) Ω_u(ϑ)^(−1) X are bounded, uniformly in N and uniformly in the parameter space of ϑ, say by some constant c_2. By Chebyshev's inequality, the required probability limit follows. Using the mean value theorem, it follows that ln σ̂²(ϑ) can be expanded around ln σ²(ϑ), with the intermediate value lying in between σ̂²(ϑ) and σ²(ϑ). Since σ̂²(ϑ) − σ²(ϑ) = o_p(1) uniformly in ϑ, σ̂²(ϑ) will be bounded away from zero uniformly in probability if σ²(ϑ) is bounded away from zero. Below we show that lim sup_{N→∞} max_{ϑ ∈ N̄_ε(ϑ_0)} (1/NT)(Q(ϑ) − Q(ϑ_0)) < 0 under the present assumptions, where N̄_ε(ϑ_0) is the complement of an open neighborhood of ϑ_0 of diameter ε. σ²(ϑ_0) > 0 by Assumption A1. By Lemmata 1 and 2, det Ω_u(ϑ)^(−1) > 0 and det Ω_u(ϑ_0)^(−1) > 0, uniformly in N and uniformly in the parameter space of ϑ, and we conclude that σ²(ϑ) is bounded away from zero. Secondly, we have to prove the following uniqueness identification condition (see Lee, 2004a).
For any ε > 0, lim sup_{N→∞} max_{ϑ ∈ N̄_ε(ϑ_0)} (1/2)(1/NT) ln(det Ω_u(ϑ) / det Ω_u(ϑ_0)) < 0 by Assumption A4. Accordingly, we conclude that the maximum likelihood estimator ϑ̂ of ϑ_0 under the general model is unique and consistent, since Q(ϑ) is continuous and the parameter space is compact.
Lastly, the consistency of β̂(ϑ̂) under H_0^A or H_0^C is established by observing that our assumptions imply those made in Theorem 4, part b, of Kapoor, Kelejian, and Prucha (2007).

Appendix C: LM Test for the Random Effects Model

The following Lemma is useful in proving Theorems 6 and 7, which derive the asymptotic distribution of the LM tests for the random effects model and the KKP model.
Lemma 4 Assume that Assumptions A1, A2, and A6 hold and that ρ1 = ρ2 = ρ, where H is a conformable symmetric matrix and 0 ≤ ω ≤ 1 is a real number.

It can easily be verified that the stated moments hold for ω = 1 and for ω = 0. The present assumptions pin down the corresponding variance. Under H_0^A: ρ1 = ρ2 = 0, B = A = I_N. Using the general formulas for the score and the information matrix given above, one can show the following.

Theorem 5 (LM_A) Suppose Assumptions A1-A5 hold and H_0^A: ρ1 = ρ2 = 0 is true. Then LM_A is asymptotically distributed as χ²_2.

Proof. First, use the residuals of the true model, u = y − Xβ_0. Each standardized quadratic form is asymptotically normal and, hence, the vector of quadratic forms converges to a bivariate standard normal by the Cramér-Wold device. The stated asymptotic distribution of LM_A follows.

Appendix D: LM Test for the Anselin Model
Remember that under H_0^B, ρ1 = 0. We factorize Ω_u = SS', so that Ω_u^(−1) = S'^(−1) S^(−1). In the following, the index r stands for restricted estimation, so that H_0: ρ1 = 0 is true. Following Kelejian and Prucha (2010), let ε = S^(−1) u, E[ε_it³] = κ^(3), and E[ε_it⁴] = κ^(4), and let k refer to σ_μ², σ_ν², ρ1, or ρ2. In general, one obtains correction terms in which a²_k,ii is an element of A_k = S' Ω_u,r^(−1) (∂Ω_u/∂θ_k)|_r Ω_u,r^(−1) S. Note that, since s_β(β, θ_r) is linear in u and s_k(θ_r) is a quadratic form in u, Cov[s_k(θ_r), s_β(β, θ_r)] = 0 and the information matrix is block-diagonal. So we need a matrix of correction factors with elements involving κ^(3), which can be calculated numerically. In particular, κ^(4) can be estimated from Ŝ^(−1) û, using Ω_u = SS' = PΛP' or S = PΛ^(1/2), since Ω_u is a real symmetric matrix. It follows that, with the elements of ε being i.i.d.(0, 1) and R = [0, 0, 1, 0], the robust LM test statistic following White (1982) is asymptotically distributed as χ²_1.

Appendix E: LM Test for the KKP Model
To derive the asymptotic distribution of the LM test for H_0^C, it proves useful to re-parameterize the model so that ρ1 = ρ2 + δ and to test H_0: δ = 0 vs. H_1: δ ≠ 0.
Using the general formulas for the score and the information matrix given above, the LM test statistic can be derived; it is asymptotically distributed as χ²_1. Proof. We will make use of the first order conditions evaluated under H_0^C. From the first order condition (11), inserting the ML estimates denoted by a bar into (10) gives the estimated score needed to establish the asymptotic distribution of the LM test.
Observe that this follows from Lemma 4 under δ = 0, with Q defined as in Lemma 4.

Use that the standardized within quadratic form, [(T − 1)σ̄² − tr[D²]-type terms scaled by √(2(T − 1)) plus the correction c_w, converges in distribution to N(0, 1). Since the two quadratic forms are independent, inserting them into the numerator of Q yields G_Cb = u'(J̄_T ⊗ H)u and G_Cw = u'(E_T ⊗ H)u, while the denominator is given by √((T − 1)T + (T − 1)² d_b + d_w). The test can then be based on these quantities, and under normality the test statistic follows. Lastly, it can be shown, with the higher moments being estimated consistently, that LM_C − LM_C,robust = o_p(1).¹⁶

Appendix F: Numerical Optimization
We use the constrained quasi-Newton method with the constraints σ_μ² > 0, σ_ν² > 0, −1 < ρ1 < 1, and −1 < ρ2 < 1 to estimate the parameters of the four models (the unrestricted model and the three restricted ones: random effects, Anselin, and KKP). The quasi-Newton method calculates the gradient of the log-likelihood numerically. We use the optimization routine fmincon available in MATLAB, which uses the sequential quadratic programming method. This method guarantees super-linear convergence by accumulating second order information regarding the Kuhn-Tucker equations using a quasi-Newton updating procedure.
An estimate of the Hessian of the Lagrangian is updated at each iteration using the BFGS formula. All tests are based on the analytically derived formulas for both the gradient and the information matrix, using the estimated parameters.
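The bounded quasi-Newton mechanics described above can be illustrated outside MATLAB. The sketch below is ours, not the authors' code: it uses SciPy's bounded L-BFGS-B routine as an analogue of fmincon, and the objective is a toy Gaussian log-likelihood, not the paper's spatial likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=5_000)

def neg_loglik(theta):
    """Negative Gaussian log-likelihood in (mu, sigma2)."""
    mu, sigma2 = theta
    return 0.5 * len(y) * np.log(2 * np.pi * sigma2) \
         + np.sum((y - mu) ** 2) / (2 * sigma2)

# Box constraint keeps sigma2 strictly positive, mimicking the
# variance constraints of the estimation problem in the text.
res = minimize(neg_loglik, x0=np.array([0.0, 1.0]),
               method="L-BFGS-B",
               bounds=[(None, None), (1e-6, None)])
mu_hat, sigma2_hat = res.x
```

In the paper's setting the parameter vector would be (β, σ_μ², σ_ν², ρ1, ρ2) with the bounds stated above, but the bounded-optimization mechanics are the same.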
For convenience, we reproduce the variance-covariance matrix of the general model given in (3), where A = (I_N − ρ1 W) and B = (I_N − ρ2 W). Denote the vector of parameters of interest by θ = (σ_μ², σ_ν², ρ1, ρ2)'. Below, we can focus on the part of the information matrix corresponding to θ.¹ The part of the information matrix corresponding to β can be ignored in computing the LM test statistics, since the information matrix is block-diagonal between θ and β, and the first derivative with respect to β evaluated at the restricted (quasi-)MLE is zero.
First, we derive the score and the relevant information submatrix of the general model. These results are then used to test the three hypotheses of interest below. Hartley and Rao (1971) and Hemmerle and Hartley (1973) give a generally useful formula that helps in obtaining the score. To derive the information submatrix we use the general differentiation result given in Harville (1977), for $r, s = 1, \dots, 4$. Here, $\partial L/\partial \theta_r$ and $J_{rs}$ are evaluated at the (quasi-)MLE estimates.
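In the present notation, with $\Omega_u$ denoting the disturbance variance-covariance matrix, the standard Gaussian-likelihood results of Hartley and Rao (1971) and Harville (1977) take the following form (a reconstruction from the cited sources, not a verbatim quote of the paper's display):

```latex
\frac{\partial L}{\partial \theta_r}
  = -\frac{1}{2}\,\mathrm{tr}\!\left[\Omega_u^{-1}\,
      \frac{\partial \Omega_u}{\partial \theta_r}\right]
    + \frac{1}{2}\, u'\,\Omega_u^{-1}\,
      \frac{\partial \Omega_u}{\partial \theta_r}\,\Omega_u^{-1}\,u ,
\qquad
J_{rs}
  = E\!\left[-\frac{\partial^2 L}{\partial \theta_r\,\partial \theta_s}\right]
  = \frac{1}{2}\,\mathrm{tr}\!\left[\Omega_u^{-1}\,
      \frac{\partial \Omega_u}{\partial \theta_r}\,
      \Omega_u^{-1}\,\frac{\partial \Omega_u}{\partial \theta_s}\right],
\qquad r,s = 1,\dots,4 .
```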

Appendix B: Identification and Consistency
In the sequel, we use subscript 0 to indicate true parameter values where necessary. First, we state the full set of assumptions. (iii) The parameter space for $\rho_r$ is a closed interval contained in $(-1/\lambda_{\max},\, 1/\lambda_{\max})$ for all $N$ and $r = 1, 2$, where $\lambda_{\max}$ is the largest absolute eigenvalue of $W$. $\lambda_{\max}$ is assumed to be bounded away from zero by some fixed positive constant.
(iv) Let $A = I_N - \rho_1 W$ and $B = I_N - \rho_2 W$. The non-stochastic matrices $A$ and $B$ are non-singular for all $\rho_r$ in the parameter space and have bounded row and column sums, uniformly in $N$. Their inverses also have bounded row and column sums, uniformly in $N$ and uniformly in the parameter space of $\rho_1$ and $\rho_2$. (To avoid index cluttering, we suppress the subscript indicating that the elements of the spatial weights matrix may depend on $N$ and that the dependent variable and the disturbances form triangular arrays. For a similar set of assumptions and a discussion of them see Lee (2004a) and Lee and Yu (2010).)

A5 (identification of $\beta$ under $H_0^C$): The non-random matrix $X$ has full column rank $K < N$ and its elements are uniformly bounded by some finite constant. Further, let $Q_0 = E_T \otimes I_N$ and $Q_1 = J_T \otimes I_N$ and define $X(\theta) = \dots$ Hence, in this case a sufficient condition for Assumption A2 (v) is A2 (iv). Note that Lemma 1 shows that this inverse exists for all $\sigma_\mu^2, \sigma_\nu^2, \rho_1, \rho_2$ in the parameter space.
A6 (positive variance of LM tests): The variance of the relevant quadratic form, scaled by $NT$, exceeds some constant $b_Q > 0$, where $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$; $H$ and $l_{ii}$ are defined in Lemma 4 below.

Consistency of the (quasi-)ML estimates under the general model.
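The admissible interval in Assumption (iii) and the non-singularity in Assumption (iv) can be checked numerically for a given weights matrix. A minimal sketch with a toy row-normalized contiguity matrix (not one used in the paper):

```python
import numpy as np

# Assumption (iii): rho must lie in (-1/lambda_max, 1/lambda_max), where
# lambda_max is the largest absolute eigenvalue of W. Toy 4x4 example.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
W /= W.sum(axis=1, keepdims=True)  # row-normalize

lam_max = np.max(np.abs(np.linalg.eigvals(W)))  # equals 1 for row-stochastic W
lo, hi = -1.0 / lam_max, 1.0 / lam_max

# Assumption (iv): for rho inside (lo, hi), A = I - rho*W is non-singular.
rho = 0.5
A = np.eye(4) - rho * W
det_A = np.linalg.det(A)  # non-zero, so A is invertible
```

For a row-normalized $W$ the spectral radius is one, so the interval reduces to $(-1, 1)$, which matches the box constraints used in Appendix F.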
In proving the consistency of the (quasi-)MLE, we make use of the following lemmas. (ii) The row and column sums of $\Omega_u(\vartheta)$ are uniformly bounded in absolute value by Assumptions A2 and A3. Since the inverse exists by Assumption A2, it remains to be shown that $T(A'A)^{-1} + (B'B)^{-1}$ is invertible. Using the updating formula, the inverse will exist if $\det(\frac{1}{T}A'A + B'B) \neq 0$. Observe that $\frac{1}{T} > 0$, that $A$ and $B$ have full rank by Assumption A2 (iv), and that $A'A$ and $B'B$ are positive definite; hence so is their sum (Abadir and Magnus, 2005, p. 215 and p. 325) and the claim follows.
The proof of consistency of the maximum likelihood estimates is based on the concentrated log-likelihood. In the following, we use a hat to indicate the maximum likelihood estimates of the parameters. Solving the first-order conditions for $\beta$ and $\sigma^2$, the concentrated log-likelihood function reads $L_c(\vartheta) = -\frac{NT}{2}(\ln 2\pi + 1) - \frac{NT}{2}\ln\hat\sigma^2(\vartheta) - \frac{1}{2}\ln\det\Omega_u(\vartheta)$. Its non-stochastic counterpart is denoted $Q(\vartheta)$.

Theorem 3. Let Assumptions A1-A5 hold. Then (i) the maximum likelihood estimates of $\vartheta$ are unique and consistent. (ii) Assume in addition that $H_0^C$ holds.

Proof. To prove consistency, we have to show that $\frac{1}{NT}(L_c(\vartheta) - Q(\vartheta))$ converges uniformly to 0 in probability. Note that $\frac{1}{NT}(L_c(\vartheta) - Q(\vartheta)) = -\frac{1}{2}(\ln\hat\sigma^2(\vartheta) - \ln\sigma^2(\vartheta))$. The third line follows since $(X'\Omega_u(\vartheta)^{-1}X)^{-1}$ and $X'\Omega_u(\vartheta)^{-1}u(\beta_0)u(\beta_0)'\Omega_u(\vartheta)^{-1}X$ are positive definite matrices (see Abadir and Magnus, 2005, p. 216 and 329) for all $\vartheta \in \Theta$, and the elements of $(\frac{1}{NT}X'\Omega_u(\vartheta)^{-1}X)^{-1}$ are uniformly bounded by some positive constant, say $c_1$, uniformly in the parameter space of $\vartheta$, by Assumptions A2 (v) and A5 (see also Kapoor, Kelejian and Prucha, 2007, p. 118f.). In particular, $\Omega_u(\vartheta)^{-1}\Omega_u(\vartheta_0)\Omega_u(\vartheta)^{-1}$ possesses bounded row and column sums, uniformly in $N$ and uniformly in the parameter space of $\vartheta$, by Assumption A2 (v), and the elements of $X$ are uniformly bounded by Assumption A5. Then the elements of $\frac{1}{NT}X'\Omega_u(\vartheta)^{-1}\Omega_u(\vartheta_0)\Omega_u(\vartheta)^{-1}X$ are bounded, uniformly in $N$ and uniformly in the parameter space of $\vartheta$, say by some constant $c_2$.
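For completeness, the first-order conditions being concentrated out take the standard GLS form. The display below is a reconstruction under the assumption that the disturbance covariance is parameterized as $\sigma^2\,\Omega_u(\vartheta)$; it is consistent with, but not verbatim from, the paper:

```latex
\hat\beta(\vartheta)
  = \bigl(X'\,\Omega_u(\vartheta)^{-1}X\bigr)^{-1}
    X'\,\Omega_u(\vartheta)^{-1}y ,
\qquad
\hat\sigma^2(\vartheta)
  = \frac{1}{NT}\,\hat u(\vartheta)'\,\Omega_u(\vartheta)^{-1}\,\hat u(\vartheta),
\qquad
\hat u(\vartheta) = y - X\hat\beta(\vartheta).
```

Substituting these back into the unconcentrated log-likelihood yields the concentrated log-likelihood $L_c(\vartheta)$.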
Next observe that, by Chebyshev's inequality, $\hat\sigma^2(\vartheta) - \sigma^2(\vartheta) \xrightarrow{p} 0$. Using the mean value theorem it follows that $\ln\hat\sigma^2(\vartheta) = \ln\sigma^2(\vartheta) + \frac{\hat\sigma^2(\vartheta) - \sigma^2(\vartheta)}{\bar\sigma^2(\vartheta)}$, with $\bar\sigma^2(\vartheta)$ lying in between $\hat\sigma^2(\vartheta)$ and $\sigma^2(\vartheta)$. Since $\hat\sigma^2(\vartheta) - \sigma^2(\vartheta) = o_p(1)$ uniformly in $\vartheta$, $\hat\sigma^2(\vartheta)$ will be bounded away from zero uniformly in probability if $\sigma^2(\vartheta)$ is bounded away from zero.
Below we show that $\limsup_{N\to\infty}\,\max_{\vartheta \in \bar N_\varepsilon(\vartheta_0)} \frac{1}{NT}(Q(\vartheta) - Q(\vartheta_0)) < 0$ under the present assumptions, so that $\frac{1}{NT}\ln\bigl(\det\Omega_u(\vartheta)/\det\Omega_u(\vartheta_0)\bigr) < 0$ by Assumption A4. Accordingly, we conclude that the maximum likelihood estimator $\hat\vartheta$ of $\vartheta_0$ under the general model is unique and consistent, since $Q(\vartheta)$ is continuous and the parameter space is compact.
Lastly, the consistency of $\hat\beta(\hat\vartheta)$ under $H_0^A$ or $H_0^C$ is established by observing that our assumptions imply those made in Theorem 4, part b, of Kapoor, Kelejian and Prucha (2007); hence the claim holds under $H_0^A$ or $H_0^C$. Observe that $Var[Q]$ is uniformly bounded away from zero by some positive constant under the present assumptions. Also, the assumptions imply that the row and column sums of $A$, $(A'A)^{-1}$ and $H$ are uniformly bounded, and so are those of $L$. Since the elements of $\varepsilon$ are independently distributed by Assumption A1, the assumptions of the central limit theorem for linear quadratic forms, given as Theorem 1 in Kelejian and Prucha (2001, p. 227), are fulfilled and the claim of the lemma follows.
Next, this Appendix derives the LM test for the null hypothesis $H_0^A: \rho_1 = \rho_2 = 0$, i.e., that there is no spatial correlation in the error term of model (1). The LM statistic is given by $LM = \tilde D'\,\tilde J^{-1}\tilde D$, where $\tilde D = (\partial L/\partial\theta)(\tilde\theta)$ is the $4 \times 1$ vector of partial derivatives of the log-likelihood function with respect to the elements of $\theta$, evaluated at the restricted (quasi-)MLE $\tilde\theta$, and $\tilde J = E[-\partial^2 L/\partial\theta\,\partial\theta'](\tilde\theta)$ is the part of the information matrix corresponding to $\theta$, also evaluated at $\tilde\theta$.
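The LM statistic is a quadratic form of the restricted score in the inverse information matrix; a minimal numerical sketch, with purely hypothetical placeholder values for the score vector and information block (not estimates from the paper):

```python
import numpy as np

def lm_statistic(score, info):
    # Quadratic form score' * info^{-1} * score;
    # np.linalg.solve avoids forming the explicit inverse.
    return float(score @ np.linalg.solve(info, score))

D_tilde = np.array([0.8, -0.5])             # hypothetical restricted score
J_tilde = np.array([[2.0, 0.3],
                    [0.3, 1.5]])            # hypothetical information block
lm = lm_statistic(D_tilde, J_tilde)
# Under the null, LM is asymptotically chi-squared with len(D_tilde)
# degrees of freedom; compare lm with the corresponding critical value.
```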
In what follows, let $\theta_k$ refer to $\sigma_\mu^2$, $\sigma_\nu^2$, $\rho_1$ or $\rho_2$. In general, one obtains $\frac{\partial L}{\partial \beta} = s_\beta(\beta, \theta_r) = X'\Omega_u^{-1}u$ and $\frac{\partial L}{\partial \theta_k}\big|_{\theta = \theta_r} = s_k(\theta_r)$, a quadratic form in $u$ plus a trace term, where $a_{k,ii}$ denotes the $i$-th diagonal element of $A_k = S'\,\Omega_{u,r}^{-1}\,\frac{\partial\Omega_u}{\partial\theta_k}\big|_{\theta=\theta_r}\,\Omega_{u,r}^{-1}\,S$. Note that, since $s_\beta(\beta, \theta_r)$ is linear in $u$ and $s_k(\theta_r)$ is a quadratic form in $u$, $Cov[s_k(\theta_r), s_\beta(\beta, \theta_r)] = 0$ and $\Omega_u$ is block diagonal. So we need a matrix of correction factors with elements $\frac{1}{2}\sum_{i=1}^{NT} a_{k,ii}\,a_{l,ii}\,(\mu^{(4)} - 3)$, which can be calculated numerically. In particular, $\mu^{(4)} = E[(S^{-1}u)^4]$ can be estimated from $\hat S^{-1}\hat u$, using $\Omega_u = SS' = P\Lambda P'$ with $S = P\Lambda^{1/2}$, since $\Omega_u$ is a real symmetric matrix. It follows that $\Omega_u^{-1} = S'^{-1}S^{-1}$ and $Var(S^{-1}u) = S^{-1}\Omega_u S'^{-1} = S^{-1}SS'S'^{-1} = I$. Observe that