On the Estimation and Testing of Fixed Effects Panel Data Models with Weak Instruments

This paper studies the asymptotic properties of within-group k-class estimators in a panel data model with weak instruments. Weak instruments are characterized by the coefficients of the instruments in the reduced-form equation shrinking to zero at a rate proportional to $\sqrt{n}\,T^{\delta}$, where $n$ is the dimension of the cross-section and $T$ is the dimension of the time series. Joint limits as $(n, T) \to \infty$ show that the within-group k-class estimator is consistent if $0 \le \delta < 1/2$ and inconsistent if $1/2 \le \delta < \infty$.


Introduction
This paper contributes to the literature on weak instrumental variables (IV) for panel data models with fixed effects. The problem of weak instruments has attracted considerable attention in recent years; see Stock, Wright and Yogo (2002) for an excellent survey. Weak instruments are characterized by the coefficients of the instruments in the reduced-form equation shrinking to zero at a rate proportional to the square root of the sample size. In the case of weak instruments, the usual asymptotic normal approximations of the 2SLS estimator can be quite poor, even if the number of observations is large. Staiger and Stock (1997) use weak-instrument asymptotics to show that the 2SLS estimator is inconsistent (i.e., converges to a random variable) and has a nonstandard limiting distribution. This is a serious problem because inference, including tests of hypotheses and confidence intervals, becomes unreliable and misleading in the presence of weak instruments. Bai and Ng (2010) show that for panel data models in which all regressors are endogenous but share exogenous common factors, valid instruments can be constructed from the endogenous regressors that are themselves invalid instruments in a conventional sense. This requires both dimensions of the panel, $n$ and $T$, to be large. More recently, Cai, Fang and Li (2012) argue that there may be benefits to using panel data when the available instruments are cross-sectionally weak. They consider the within-group 2SLS (W2SLS) estimator in a panel context where the degree of weakness of the instruments depends upon the number of cross-sectional observations $n$ only. For large $n$ and fixed $T$, they show that the bias of W2SLS is of order $1/T$ as $n \to \infty$. They argue that letting the degree of weakness of the instruments depend on $n$ only is an "analytical device" and that, with $T$ fixed, "it is natural to relate the degree of weakness to n only".
However, from Staiger and Stock (1997), the degree of weakness of the instruments depends upon the total number of observations $nT$, and how $n$ and $T$ tend to infinity is crucial for the asymptotics of weak instruments in panel data.[1] This paper extends the results presented in Cai, Fang and Li (2012) to the case where the weak instruments are modeled in the "Pitman drift" local-to-zero sense, and the degree of weakness of the instruments is allowed to depend upon both $n$ and $T$, but with different impact. To be specific, we let the degree of weakness of the instruments depend upon $\sqrt{n}\,T^{\delta}$, where $\delta \ge 0$. When $\delta = 0$, it reduces to the weak instrument case in Cai, Fang and Li (2012). When $\delta = 1/2$, it reduces to the weak instrument case in Staiger and Stock (1997). The basic argument is that with enough time periods observed, panel data may provide enough information to yield consistent estimation. In fact, it is well known that for cross-sectional data, when the concentration parameter stays constant as the sample size grows, the signal of the model is too weak compared to the corresponding noise. Hence the model is weakly identified, and 2SLS converges to a random variable. However, in the panel data set-up, if the time series dimension is large, the weak signal can be strengthened by repeating the regression across the time series dimension. This argument is similar in spirit to the argument establishing consistency for the panel spurious regression; see, for example, Phillips and Moon (1999) and Kao (1999).
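The nesting of the two benchmark cases can be written out explicitly. The display below is a sketch: the symbol $C$ for the fixed coefficient matrix is assumed notation (the simulations later set C = 0.5), and the exact form of the paper's own assumption is not reproduced in this text.

```latex
\Pi_{nT} \;=\; \frac{C}{\sqrt{n}\,T^{\delta}}, \qquad \delta \ge 0,
\qquad\text{so that}\qquad
\Pi_{nT}\Big|_{\delta = 0} = \frac{C}{\sqrt{n}}
\quad\text{and}\quad
\Pi_{nT}\Big|_{\delta = 1/2} = \frac{C}{\sqrt{nT}} .
```

With $\delta = 0$ the weakness depends on $n$ only, as in Cai, Fang and Li (2012); with $\delta = 1/2$ it depends on the square root of the total number of observations $nT$, as in Staiger and Stock (1997).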
Cai, Fang and Li (2012) also considered the case where the degree of weakness of the instruments depends upon $n^{\delta}$, where $\delta \ge 0$. For a fixed $T$, when $0 < \delta < 1/2$, the correlation between the instruments and the endogenous variables converges to zero more slowly than the square root of the sample size, as $n \to \infty$. This corresponds to the nearly weak instruments case of Hahn and Kuersteiner (2002) and Hahn, Hausman and Kuersteiner (2004). For $\delta = 1/2$, this is the weak instruments case, and for $\delta > 1/2$, this is the nearly non-identified case because the correlation converges to zero faster than the square root of the sample size, as $n \to \infty$. For cross-section or time-series models, Hahn and Kuersteiner (2002) showed that 2SLS for the nearly weak instruments case is consistent and its limiting distribution is normal. However, for the weak instruments case as well as the nearly non-identified case, 2SLS is inconsistent and its limiting distribution is not normal. Cai, Fang and Li (2012) similarly showed that for panel data models with fixed $T$, the bias of the W2SLS estimator with weak or nearly non-identified instruments is of order $1/T$ as $n \to \infty$. They argue that as $T \to \infty$, W2SLS is consistent and asymptotically normal. They also consider a mixed case where some instrumental variables are weak and others are nearly weak, and show that as $n \to \infty$, with $T$ fixed, the W2SLS estimator of the weak instruments has a bias of order $1/T$, while the W2SLS estimator of the nearly weak instruments is consistent.
We generalize the Cai, Fang and Li (2012) panel data results by studying the asymptotic properties of the general within-group k-class estimator, which includes W2SLS and within-group LIML as special cases. We allow the degree of weakness of the instruments to depend upon $\sqrt{n}\,T^{\delta}$, where $\delta \ge 0$. We study the asymptotics using joint limits in $n$ and $T$, rather than fixing $T$ and letting $n \to \infty$. We show that for the simple case of one right-hand-side endogenous variable and no included exogenous variables, W2SLS is consistent if $0 \le \delta < 1/2$ and inconsistent if $1/2 \le \delta < \infty$. Next, we generalize these results to the within-group k-class estimator with included exogenous regressors applied to fixed effects panel data. We show using joint limits that this within-group k-class estimator is consistent if $0 \le \delta < 1/2$ and inconsistent if $1/2 \le \delta < \infty$. We characterize these conditions for three special cases of the within-group k-class estimator: W2SLS, within-group LIML, and within-group bias-adjusted 2SLS. We also generalize the test for weak instruments proposed by Cragg and Donald (1993) and Stock and Yogo (2005) to the case of fixed effects panel data, as well as tests of hypotheses that are robust to weak instruments in the fixed effects panel data set-up. We study the asymptotic properties of these tests as both $n$ and $T$ tend to infinity.
The rest of the paper is organized as follows. Section 2 introduces the fixed effects panel data model with weak instruments. Section 3 discusses the within-group k-class estimator. Section 4 generalizes the test for weak instruments proposed by Cragg and Donald (1993) and Stock and Yogo (2005) to the case of fixed effects panel data. Section 5 considers the problem of hypothesis testing whose size is robust to weak instruments in the fixed effects panel data set-up. Section 6 provides Monte Carlo results, while Section 7 concludes. All the proofs are relegated to the appendix. All the limits in the paper are taken as $(n, T) \to \infty$ jointly, except when otherwise noted.
[1] How $n$ and $T$ tend to infinity was emphasized by Phillips and Moon (1999) for panel unit root testing.

Model and Assumptions
Consider the following panel IV regression model with endogenous regressors:
$$y_t = Y_t\beta + X_t\gamma + \mu + u_t \qquad (1)$$
and
$$Y_t = Z_t\Pi + X_t\Gamma + \alpha + V_t \qquad (2)$$
for $t = 1, 2, \ldots, T$, where $y_t$ is an $n \times 1$ vector and $Y_t$ is an $n \times L$ matrix of endogenous variables, $X_t$ is an $n \times K_1$ matrix of $K_1$ exogenous regressors, $Z_t$ is an $n \times K_2$ matrix of $K_2$ instruments, and $\beta$, $\gamma$, $\Pi$, and $\Gamma$ are unknown parameters. $\mu$ and $\alpha$ denote the individual effects, which are of dimensions $n \times 1$ and $n \times L$ respectively.
The remainder disturbances $u_t$ and $V_t$ are of dimensions $n \times 1$ and $n \times L$ respectively. These disturbances are assumed to be i.i.d. $N(0, \Sigma)$ across $t = 1, 2, \ldots, T$, with the elements of $\Sigma$ denoted by $\sigma_{uu}$, [...] $= 0$ for all $i$ and $t$. This i.i.d. assumption for the errors can be relaxed to allow for weak dependence across the time series and cross-section dimensions at the expense of more complicated notation. This will be taken up in a future extension of this paper. Equation (1) is the structural equation and $\beta$ is the parameter of interest. The reduced-form equation (2) relates the endogenous regressors to the instruments. In matrix form, equations (1) and (2) can be rewritten as
$$y = Y\beta + X\gamma + (\iota_T \otimes \mu) + u \qquad (3)$$
and
$$Y = Z\Pi + X\Gamma + (\iota_T \otimes \alpha) + V \qquad (4)$$
where $y = (y_1', y_2', \ldots, y_T')'$ is an $nT \times 1$ vector, $\iota_T$ is a vector of ones of dimension $T$, and $Y$, $X$, $Z$, $u$, and $V$ are similarly defined.
To wipe out the individual effects, we premultiply equations (3) and (4) by [...]
Following Staiger and Stock (1997), we assume: [...]
The following joint limits hold, as $(n, T) \to \infty$:
1. [...]
2. [...]
3. [...]
Notice that Assumption 2 implies that [...]. Following Staiger and Stock (1997), we define [...], where [...] and [...]. The random variable [...] with [...], and $I_L$ is an identity matrix of dimension $L$.
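The within transformation used to wipe out the individual effects can be sketched in a few lines. The sketch assumes the stacking used in the matrix form of the model (all $n$ individuals for period 1, then period 2, and so on), under which the transformation is $Q = (I_T - \iota_T\iota_T'/T) \otimes I_n$; the function name `within_transform` is illustrative and not taken from the paper.

```python
import numpy as np

def within_transform(a, n, T):
    """Subtract each individual's time average from a stacked array.

    `a` has nT rows ordered period by period (first n rows are t = 1,
    the next n rows are t = 2, ...). This is equivalent to premultiplying
    by Q = (I_T - ii'/T) kron I_n, where i is a T-vector of ones.
    """
    a = np.asarray(a, dtype=float)
    blocks = a.reshape(T, n, -1)                       # (period, individual, column)
    demeaned = blocks - blocks.mean(axis=0, keepdims=True)
    return demeaned.reshape(a.shape)

# Individual effects stacked as (iota_T kron mu) are removed exactly.
n, T = 3, 4
mu = np.array([1.0, -2.0, 0.5])
stacked_effects = np.kron(np.ones(T), mu)
print(within_transform(stacked_effects, n, T))
```

Reshaping and demeaning avoids forming the $nT \times nT$ matrix $Q$ explicitly, which matters once $n$ and $T$ are both large.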

Estimation
Most of the theorems in this section are developed for the within-group k-class estimator. However, we start by deriving the asymptotic properties of W2SLS for the simple case of one right-hand-side endogenous regressor and no included exogenous regressors. The W2SLS estimator is defined as [...] and [...]
The results in Theorem 1 imply that $\hat{\beta}_{W2SLS}$ is consistent if $0 \le \delta < 1/2$ and inconsistent if $1/2 \le \delta < \infty$. The strength of the instruments is measured by the concentration matrix $\Lambda_{Tn}$ [...]. From equations (1) and (2), we have [...]. Note that $T^{1-2\delta}$ can be interpreted as the rate at which $\Lambda_{Tn}$ grows as $T$ increases. Clearly, for the consistency of W2SLS, one needs $\Lambda_{Tn} \to \infty$ as $T^{1-2\delta} \to \infty$, which holds if $0 \le \delta < 1/2$. We also note from Theorem 1 that the limiting distribution near the point of non-identification, i.e., $\delta = 1/2$, is discontinuous.[3]
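The role of $T^{1-2\delta}$ can be illustrated numerically. The sketch below uses the textbook 2SLS formula $(\tilde{Y}'P_{\tilde{Z}}\tilde{Y})^{-1}\tilde{Y}'P_{\tilde{Z}}\tilde{y}$ applied to within-transformed data, and assumes the local-to-zero form $\Pi = C/(\sqrt{n}\,T^{\delta})$; the function names are illustrative, and the paper's own display equations are not reproduced here.

```python
import numpy as np

def local_to_zero_pi(C, n, T, delta):
    """First-stage coefficient under the assumed form Pi = C / (sqrt(n) * T**delta)."""
    return C / (np.sqrt(n) * T ** delta)

def w2sls(y_tilde, Y_tilde, Z_tilde):
    """2SLS on within-transformed data: (Y'P_Z Y)^{-1} Y'P_Z y.

    y_tilde: (nT,) vector; Y_tilde: (nT, L) matrix; Z_tilde: (nT, K2) matrix.
    """
    # First-stage fitted values P_Z Y, computed by least squares.
    coef, *_ = np.linalg.lstsq(Z_tilde, Y_tilde, rcond=None)
    Y_hat = Z_tilde @ coef
    # Y_hat'Y = Y'P_Z Y and Y_hat'y = Y'P_Z y because P_Z is symmetric and idempotent.
    return np.linalg.solve(Y_hat.T @ Y_tilde, Y_hat.T @ y_tilde)

# With Z'Z of order nT, Pi'Z'Z Pi is of order nT * Pi**2 = C**2 * T**(1 - 2*delta).
for delta in (0.0, 0.2, 0.5, 0.8):
    for T in (10, 100):
        n = 100
        scale = n * T * local_to_zero_pi(0.5, n, T, delta) ** 2
        print(f"delta={delta}, T={T}: nT * Pi^2 = {scale:.4f}")
```

For $\delta < 1/2$ the printed scale grows with $T$, for $\delta = 1/2$ it is constant in $T$, and for $\delta > 1/2$ it shrinks, which mirrors the consistency boundary stated in the text.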

Within-group k-class Panel Data Estimators
We now generalize the results to the within-group k-class estimator with included exogenous regressors.[4] Let $P_{\tilde{X}} = \tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'$ be the projection matrix on the space spanned by the columns of $\tilde{X}$, and let $M_{\tilde{X}} = I - P_{\tilde{X}}$.
Premultiplying equations (3) and (4) by $M_{\tilde{X}}$, we get [...], where the superscript "$\perp$" denotes the residuals from the projection on $\tilde{X}$, such as $\tilde{y}^{\perp} = M_{\tilde{X}}\tilde{y}$, $\tilde{Z}^{\perp} = M_{\tilde{X}}\tilde{Z}$, and $\tilde{Y}^{\perp} = M_{\tilde{X}}\tilde{Y}$. The within-group k-class estimator of $\beta$ is given by [...] for some choice of $k$. Note that the W2SLS estimator is a special case of the within-group k-class estimator when $k = 1$. Theorem 2 derives the asymptotic properties of this within-group k-class panel data estimator.
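As an illustration, the sketch below implements the standard k-class formula $\hat{\beta}(k) = [\tilde{Y}^{\perp\prime}(I - kM_{\tilde{Z}^{\perp}})\tilde{Y}^{\perp}]^{-1}\tilde{Y}^{\perp\prime}(I - kM_{\tilde{Z}^{\perp}})\tilde{y}^{\perp}$. This is the textbook form and is assumed here rather than copied from the paper's display; the inputs are taken to be already within-transformed, and the helper names are illustrative.

```python
import numpy as np

def residualize(a, X):
    """Residuals from projecting the columns of `a` on X, i.e. M_X a."""
    coef, *_ = np.linalg.lstsq(X, a, rcond=None)
    return a - X @ coef

def k_class(y, Y, Z, X, k):
    """k-class estimate of beta after partialling out the included regressors X.

    y: (N,) vector; Y: (N, L); Z: (N, K2); X: (N, K1). All inputs are
    assumed to be within-transformed (individual means removed).
    """
    y_p, Y_p, Z_p = (residualize(a, X) for a in (y, Y, Z))
    MZ_Y = residualize(Y_p, Z_p)                    # M_Z Y
    MZ_y = residualize(y_p, Z_p)                    # M_Z y
    A = Y_p.T @ Y_p - k * (MZ_Y.T @ MZ_Y)           # Y'(I - k M_Z) Y
    b = Y_p.T @ y_p - k * (MZ_Y.T @ MZ_y)           # Y'(I - k M_Z) y
    return np.linalg.solve(A, b)

# k = 0 gives OLS and k = 1 gives 2SLS on the partialled-out data.
rng = np.random.default_rng(1)
N = 200
X = rng.standard_normal((N, 1))
Z = rng.standard_normal((N, 3))
V = rng.standard_normal((N, 1))
Y = Z @ np.ones((3, 1)) + X + V
y = 0.3 * Y[:, 0] + X[:, 0] + 0.5 * V[:, 0] + rng.standard_normal(N)
print(k_class(y, Y, Z, X, 0.0), k_class(y, Y, Z, X, 1.0))
```

Other members of the family follow by changing `k` only, for example the bias-adjusted choice $k = nT/(nT - K_2 + 2)$ mentioned in the text.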
Theorem 2 Under Assumptions 1 and 2, as $(n, T) \to \infty$, we have [...]
Similar to the results of Theorem 1 for $\hat{\beta}_{W2SLS}$, Theorem 2 shows that $\hat{\beta}(k)$ is consistent if $0 \le \delta < 1/2$ and inconsistent if $1/2 \le \delta < \infty$. Similarly, using Assumptions 1 and 2 and Lemma 2, the strength of the instruments is measured by the following concentration matrix: [...]. Note that $T^{1-2\delta}$ can be interpreted as the rate at which $\Lambda_{Tn}$ grows as $T$ increases. Clearly, for consistency of the within-group k-class estimator, one needs $\Lambda_{Tn} \to \infty$ as $T^{1-2\delta} \to \infty$, which holds if $0 \le \delta < 1/2$.
For the W2SLS estimator with $k = 1$, it follows that $T^{1/2+\delta} n (k - 1) = 0$ and $Tn(k - 1) = 0$. Therefore, the W2SLS estimator satisfies the conditions on $\kappa_{Tn}$ for the three cases considered in Theorem 2.
The within-group k-class estimator also includes the within-group bias-adjusted 2SLS (B2SLS) described in Donald and Newey (2001) for the cross-section or time-series regression case. This is a special case of the k-class estimator with $k = nT/(nT - K_2 + 2)$. Rothenberg (1984) showed that B2SLS is unbiased to the second order for the fixed-instrument, normal error model. For this special case, [...]. Hence, the within-group B2SLS estimator satisfies the conditions on $\kappa_{Tn}$ for the three cases considered in Theorem 2.
For the within-group LIML estimator in panel data, we obtain the following results: [...] (10), we have [...]

Wald Test Under Weak Identification
Next, we consider testing the $q$ linear restrictions $R\beta = r$, where $R$ is $q \times L$. The standard formula for the Wald statistic, based on the within-group k-class estimator, is given by [...]

Theorem 4 Under Assumptions 1 and 2, as $(n, T) \to \infty$, [...] a noncentral chi-squared distribution with $q$ degrees of freedom and noncentrality parameter $\Lambda =$ [...] distribution with $q$ degrees of freedom.

Testing for Weak Instruments
Following Stock and Yogo (2005), we focus in this section on testing the null hypothesis that the set of instruments is weak against the alternative that they are strong. In this case, the instruments are defined to be strong if W2SLS inference is reliable for any linear combination of the coefficients. From the results in Theorems 2 and 4, weak instruments can produce biased IV estimators and tests of hypotheses with large size distortions, e.g., when $1/2 \le \delta < \infty$. The Stock and Yogo (2005) test is based on the partial identification test statistic proposed by Cragg and Donald (1993). For our case, this statistic is $g_{\min}$, the smallest eigenvalue of the matrix analog of the F statistic from the first-stage regression of W2SLS, i.e., [...]. A small $g_{\min}$ indicates that the instruments are weak; see Stock and Yogo (2005). Let $W(K, \Omega, \Upsilon)$ denote the Wishart distribution, with $K$ denoting the degrees of freedom, $\Omega$ the covariance matrix, and $\Upsilon$ the noncentrality matrix. We have the following result:
Theorem 5 Under Assumptions 1 and 2, we have [...]
Therefore, as pointed out in Stock, Wright and Yogo (2002), $\mathrm{tr}(G_{Tn})/L$ can be thought of as a measure of the strength of the instruments. Next we discuss how to use $g_{\min}$ to detect the presence of weak instruments.
[...] which has a noncentral Wishart distribution with noncentrality matrix $\lambda'\lambda$. This noncentrality matrix is the limit of the concentration matrix [...]. On the other hand, when 0 [...]. Let $\delta_{\min}$ be the smallest eigenvalue of $\lambda'\lambda$. Following Stock and Yogo (2005), we propose using the conservative critical value $x$ which satisfies the relationship [...],[5] where $\chi^2(\nu, \delta_{\min})$ denotes the noncentral chi-squared random variable with $\nu$ degrees of freedom and noncentrality parameter $\delta_{\min}$. Stock and Yogo (2005) focus on the worst-behaved linear combination, and it is in this sense that this test is conservative. We refer the reader to their tables for critical values.
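A minimal sketch of the $g_{\min}$ computation follows. It assumes the Stock and Yogo (2005) form $G = \hat{\Sigma}_{VV}^{-1/2\prime}\,\tilde{Y}^{\perp\prime}P_{\tilde{Z}^{\perp}}\tilde{Y}^{\perp}\,\hat{\Sigma}_{VV}^{-1/2}/K_2$ with $\hat{\Sigma}_{VV}$ estimated from first-stage residuals. The degrees-of-freedom divisor `dof` is left to the caller because the paper's exact choice for the fixed effects case is not shown in this text; the function name is illustrative.

```python
import numpy as np

def cragg_donald_gmin(Y_perp, Z_perp, dof):
    """Smallest eigenvalue of the matrix first-stage F statistic.

    Y_perp: (N, L) endogenous regressors, Z_perp: (N, K2) instruments,
    both within-transformed and partialled out with respect to X.
    dof: divisor for the first-stage residual covariance (an assumption
    supplied by the caller).
    """
    K2 = Z_perp.shape[1]
    coef, *_ = np.linalg.lstsq(Z_perp, Y_perp, rcond=None)
    fitted = Z_perp @ coef                       # P_Z Y
    resid = Y_perp - fitted                      # M_Z Y
    Sigma_VV = resid.T @ resid / dof
    A = fitted.T @ fitted / K2                   # Y'P_Z Y / K2
    # S^{-1} A has the same eigenvalues as S^{-1/2} A S^{-1/2}.
    eigvals = np.linalg.eigvals(np.linalg.solve(Sigma_VV, A))
    return float(np.min(eigvals.real))

rng = np.random.default_rng(2)
N, K2 = 300, 3
Z = rng.standard_normal((N, K2))
Y = Z @ np.array([[0.2], [0.1], [0.0]]) + rng.standard_normal((N, 1))
print(cragg_donald_gmin(Y, Z, dof=N - K2))
```

With a single endogenous regressor ($L = 1$) the statistic reduces to the usual first-stage F statistic, which is a convenient check on any implementation.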

Robust Inference with Weak Instruments
The above results indicate that for $\delta \ge 1/2$, the within-group k-class estimator is inconsistent. In this section, we discuss hypothesis testing whose size is robust to weak instruments in the panel data set-up. Following the survey by Stock, Wright and Yogo (2002), we will discuss the AR test of Anderson and Rubin (1949), the Lagrange multiplier (LM) test of Kleibergen (2002) and Moreira (2009), and the conditional likelihood ratio (CLR) test of Moreira (2003), but applied to the fixed effects panel data model. For simplicity, we only consider the case of one right-hand-side endogenous variable, i.e., $L = 1$.[6] For convenience, we assume that $\tilde{X}_t$ and $\tilde{Z}_t$ are non-stochastic and such that $\tilde{Z}_t'\tilde{X}_t = 0$. The reduced-form equations corresponding to the structural equations (5) and (6) are as follows: [...]. The reduced-form errors are assumed to be homoskedastic with covariance matrix [...]. The concentration parameter can be rewritten as [...]

Consider the null hypothesis $H_0: \beta = \beta_0$. Three test statistics that are functions of $\Theta$ are the LM statistic of Kleibergen (2002) [...]; see Andrews and Stock (2006a, b). Define [...]. Note that $\tilde{Y}^{*}$ can be written as [...], where $a = (\beta, 1)'$. That is, $\tilde{Y}^{*}$ is multivariate normal with mean matrix $\tilde{Z}\Pi a' + \tilde{X}\eta$. Then $S$ is $K_2 \times 1$ multivariate normal with mean [...]. Similarly, $T$ is $K_2 \times 1$ multivariate normal with mean [...]. It is also easy to show that $S$ and $T$ are independent using $b_0'a_0 = 0$. Under the null, [...], which does not depend on $\Pi$ since $c_{\beta} = 0$. However, the distribution of $T$ depends on $\Pi$ under the null.
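The statistics built from $S$ and $T$ can be sketched as follows. The definitions used here are the standard ones from the Andrews and Stock construction for $L = 1$ with a known reduced-form covariance matrix $\Omega$: $S = (\tilde{Z}'\tilde{Z})^{-1/2}\tilde{Z}'\tilde{Y}^{*}b_0\,(b_0'\Omega b_0)^{-1/2}$ and $T = (\tilde{Z}'\tilde{Z})^{-1/2}\tilde{Z}'\tilde{Y}^{*}\Omega^{-1}a_0\,(a_0'\Omega^{-1}a_0)^{-1/2}$, with $b_0 = (1, -\beta_0)'$ and $a_0 = (\beta_0, 1)'$. They are assumed to match the paper's lost displays, and the function name is illustrative. The vector called $T$ in the text is named `T_vec` to avoid confusion with the time dimension.

```python
import numpy as np

def robust_statistics(y, Y, Z, beta0, Omega):
    """AR, LM and LR statistics for H0: beta = beta0 with one endogenous regressor.

    y, Y: (N,) within-transformed outcome and endogenous regressor;
    Z: (N, K2) within-transformed instruments; Omega: known 2 x 2
    covariance matrix of the reduced-form errors (an assumption of this sketch).
    """
    Ystar = np.column_stack([y, Y])                 # N x 2 matrix [y, Y]
    b0 = np.array([1.0, -beta0])
    a0 = np.array([beta0, 1.0])                     # b0'a0 = 0 by construction
    # Any square root of (Z'Z)^{-1} gives the same quadratic forms; use Cholesky.
    chol = np.linalg.cholesky(Z.T @ Z)
    ZY = np.linalg.solve(chol, Z.T @ Ystar)         # K2 x 2
    Omega_inv_a0 = np.linalg.solve(Omega, a0)
    S = ZY @ b0 / np.sqrt(b0 @ Omega @ b0)
    T_vec = ZY @ Omega_inv_a0 / np.sqrt(a0 @ Omega_inv_a0)
    QS, QT, QST = S @ S, T_vec @ T_vec, S @ T_vec
    ar = QS / Z.shape[1]                            # Anderson-Rubin
    lm = QST ** 2 / QT                              # Kleibergen's LM
    lr = 0.5 * (QS - QT + np.sqrt((QS - QT) ** 2 + 4.0 * QST ** 2))
    return ar, lm, lr
```

The LR value returned here is only the statistic; the conditional test compares it with a critical value that depends on the observed $T'T$, which this sketch does not compute. When $K_2 = 1$ the three statistics coincide, which is a useful sanity check.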
Assume $\tilde{Z}'\tilde{Z}/(Tn) \to D_Z$. The asymptotic distributions of $S$ and $T$ are given in the following theorem:
Theorem 6 Suppose Assumptions 1 and 2 hold. We have [...] and [...], where $S$ and $T$ are independent with [...]. Hence, under the null, [...] and [...] for all values of $\delta$.

Monte Carlo Results
[...] Note that the true values of $\beta$, $\gamma$, and $\Gamma$ are set to 0, 1, and 1, respectively. We set $C = 0.5$. The individual fixed effects $\mu$ and $\alpha$ are generated from independent standard normal distributions. To summarize, the data generating process (DGP) is given by [...] and [...]. For each replication we estimate the model using the W2SLS and LIML estimators of $\beta$. Table 1 [...] but, unlike these measures, are assured to exist. We can see that LIML has a smaller bias than W2SLS; however, W2SLS has a smaller IQR and RMSE than LIML. Figure 1 shows the density function of the W2SLS estimator for $n = 100$. As the graph shows, when $\delta = 0$ or $0.2$, the distribution tends to center at zero as $T$ increases. When $\delta = 0.5$ or $0.8$, the distribution does not change much as $T$ increases. Table 2 reports the size of the t-test for $\beta = 0$. The results in Table 2 confirm that the t-tests using the W2SLS and LIML estimators are not robust to weak instruments. Table 3 reports the results of the AR, LM and CLR tests. Table 3 indicates that the robust tests are indeed robust to weak instruments in this panel data design.
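A data generating process of this kind could be sketched as below. Only $\beta = 0$, $\gamma = 1$, $\Gamma = 1$, $C = 0.5$ and the standard normal fixed effects come from the text. Everything else is an assumption made for illustration: a single instrument and a single exogenous regressor drawn from standard normals, the first-stage coefficient $\Pi = C/(\sqrt{n}\,T^{\delta})$, and errors with a hypothetical correlation `rho`.

```python
import numpy as np

def simulate_panel(n, T, delta, rng, C=0.5, beta=0.0, gamma=1.0, Gamma=1.0, rho=0.5):
    """Draw one panel; every returned array has shape (T, n).

    rho (error correlation) and the standard normal X and Z are
    illustrative assumptions, not values taken from the paper.
    """
    Pi = C / (np.sqrt(n) * T ** delta)            # local-to-zero first stage
    mu = rng.standard_normal(n)                   # fixed effects, structural equation
    alpha = rng.standard_normal(n)                # fixed effects, reduced form
    X = rng.standard_normal((T, n))
    Z = rng.standard_normal((T, n))
    V = rng.standard_normal((T, n))
    # u is correlated with V, which makes Y endogenous in the structural equation.
    u = rho * V + np.sqrt(1.0 - rho ** 2) * rng.standard_normal((T, n))
    Y = Pi * Z + Gamma * X + alpha + V            # alpha broadcasts over periods
    y = beta * Y + gamma * X + mu + u
    return y, Y, X, Z

y, Y, X, Z = simulate_panel(n=100, T=20, delta=0.2, rng=np.random.default_rng(0))
print(y.shape, Y.shape)
```

Passing the random generator explicitly keeps replications reproducible, which helps when comparing estimators across values of $\delta$ and $T$.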
Proof. First we note that for 0 ≤ δ ≤ 1 [...] This proves the lemma.

A Proof of Theorem 1
Proof. Consider [...] and for $1/2 < \delta < \infty$, [...]. This proves the theorem.
To prove Theorem 2, we need the following lemma. [...]

C Proof of Theorem 3
Proof. Let us denote $J$, partitioned conformably with $\tilde{Y}^{*\perp}$, to be $J_{11} = I_T$, [...]. By Lemma 2.1, we have [...], and $\Sigma$ is defined in Equation (10). By Lemma 2.2 and 2.3, we have [...] therefore converge to those of $\Xi_1 - \kappa\Sigma = 0$, among which the smallest root is zero.
For $1/2 < \delta < \infty$, by Lemma 2.3, [...] which is a Wishart distribution with $K_2$ degrees of freedom.