A Robust Hausman-Taylor Estimator

This paper suggests a robust Hausman and Taylor (1981) estimator, here-after HT, that deals with the possible presence of outliers. This entails two modifications of the classical HT estimator. The first modification uses the Bramati and Croux (2007) robust Within MS estimator instead of the Within estimator in the first stage of the HT estimator. The second modification uses the robust Wagenvoort and Waldmann (2002) two stage generalized MS estimator instead of the 2SLS estimator in the second step of the HT estimator. Monte Carlo simulations show that, in the presence of vertical outliers or bad leverage points, the robust HT estimator yields large gains in MSE as compared to its classical Hausman-Taylor counterpart. We illustrate this robust version of the Hausman-Taylor estimator using an empirical application.


Introduction
It is well known in the statistical literature that the presence of outlying observations can strongly distort the classical least squares estimator and lead to unreliable inference. Three types of outliers that in ‡uence the least squares estimator are vertical outliers, good leverage points and bad leverage points (see Rousseeuw and Leroy (2003)). Vertical outliers are observations that have outlying values for the corresponding error term (the y-dimension) but are not outlying in the design space (the X-dimension). They contaminate the estimation of the intercept but only mildly in ‡uence that of the regression coe¢ cients. Good leverage points are observations that are outlying in the design space but are located close to the regression line. They marginally in ‡uence the estimation of both the intercept and the regression coe¢ cients but they a¤ect inference. In contrast, bad leverage points are observations located far away from the regression line. They contaminate the least squares estimation for both the intercept and the slopes (see Dehon, Gassner and Verardi (2009)).
The focus of this paper is on panel data regression methods based on estimators such as …xed e¤ects or random e¤ects least squares that control for heterogeneity of the individuals, but are sensitive to data contamination and outliers like any least squares procedure (see Ronchetti and Trojani (2001)). This sensitivity can be characterized by measures of robustness such as the breakdown point, which evaluates the smallest contaminated fraction of a sample that can arbitrarily change the estimates (see Huber (1981), Donoho andHuber (1983), andHuber andRonchetti (2009) to mention a few). 1 Since the breakdown point of linear estimators such as least squares is asymptotically zero, the statistical literature has stressed the importance of robust and positive breakdown-point methods (e.g., Wagenvoort and Waldmann (2002), Maronna et al. (2006), and µ Cíµ zek (2008) to mention a few). Panel data also su¤er from data contamination and outliers. Besides paying attention to vertical outliers or bad leverage points, one has to pay attention to block-concentrated outliers (block-concentrated bad leverage points). In the latter case, most of the vertical outliers (bad leverage points) are concentrated on few individuals, but for most of the time period we observe these individuals. There are only few studies dealing with these problems using panel data. Wagenvoort and Waldmann (2002) and Lucas et al. (2007) 1 See appendix 1.
2 studied the bounded-in ‡uence estimation of static and dynamic panel data models, respectively. Wagenvoort and Waldmann (2002) developed two estimation procedures: the two stage generalized M (2SGM) estimator and the robust generalized method of moments (RGMM) estimator. Both estimators are B-robust, i.e. their associated in ‡uence function is bounded, consistent and asymptotic normally distributed. For dynamic panel data models, Lucas et al. (2007) proposed a variant of the GMM estimator which is less sensitive to anomalous observations. Positive breakdown-point methods for static and dynamic panel models were proposed by Bramati and Croux (2007), Dhaene and Zhu (2009) and Aquaro and µ Cíµ zek (2010). The Within MS (WMS) estimator proposed by Bramati and Croux (2007) is the robust counterpart of the least squares dummy variables representation of the Within group estimator 2 . Using Monte Carlo simulations, they observe that, without outliers, the e¢ ciency of the robust estimator is very close to that of the Within group estimator. However, the Within estimator performs badly when there are vertical outliers and even worse in the presence of bad leverage points. In contrast, the WMS estimator performs well and yields stable results over di¤erent sampling schemes. The Bramati and Croux (2007) WMS estimator yields large gains in MSE with respect to the classical Within estimator in the presence of outliers, and leads to very small e¢ ciency loss in the absence of outliers.
This paper proposes a robust version of the Hausman and Taylor (1981) estimator, hereafter HT. Brie ‡y, the HT panel data estimator deals with the common empirical fact that some of our explanatory variables are time varying, while others are time invariant. In addition, some are correlated with the individual e¤ects and some are not. HT proposed a two-step instrumental variable procedure that is (i) more e¢ cient than the within estimator and (ii) recaptures the e¤ects of time invariant variables which are wiped out by the within transformation. The structure of the paper is as follows: in section 2, we present the HT estimator and in section 3, we brie ‡y review the M, MS and GM robust estimators. Section 4 proposes a robust HT estimator that deals with the possible presence of outliers. This entails two modi…cations of the classical HT estimator. The …rst modi…cation uses the Bramati and Croux (2007) robust WMS estimator instead of the Within estimator in the …rst stage of the HT estimator 3 .The second modi…cation uses the robust Wagenvoort and Waldmann (2002) two stage generalized MS-estimate instead of the 2SLS estimate in the second step of the HT estimator. In section 5, we run Monte Carlo simulations to study the e¤ects of vertical outliers, bad leverage points, block-concentrated outliers or block-concentrated bad leverage points on the classical and robust HT estimators. We show that the robust HT yields large gains in MSE as compared to its classical Hausman-Taylor counterpart. In section 6, we apply our robust Hausman-Taylor estimator to the Cornwell and Rupert (1988) estimation of a Mincer wage equation. Finally, section 7 concludes.
2 The Hausman-Taylor estimator Hausman and Taylor (1981), hereafter HT, considered the following model where some of the explanatory variables are time varying (X it ); while others are time invariant (Z i ): i is IID(0, 2 ), it is IID(0, 2 ) independent of each other and among themselves. HT allowed some of the X and Z variables to be correlated with the individual e¤ects i . This is in contrast to the …xed e¤ects estimator where all the regressors are correlated with the individual e¤ects, and the random e¤ects estimator where none of the regressors are correlated with the individual e¤ects. Using the HT notation: X = [X 1 ; X 2 ] and Z = [Z 1 ; Z 2 ] where X 1 is (N T k 1 ) ; X 2 is (N T k 2 ) ; Z 1 is (N T g 1 ) and Z 2 is (N T g 2 ). X 1 and Z 1 are assumed exogenous in that they are not correlated with i and it , while X 2 and Z 2 are endogenous because they are correlated with i , but not it : HT proposed the following two-step consistent estimator 4 of and : 3 Aquaro and µ Cíµ zek (2010) use a …rst di¤erence rather than a Within transformation. Their simulations reveal superior performance over the median di¤erence estimator. However, di¤erencing eliminates the …rst wave, and in micro-panels that is a loss of N observations. Di¤erencing is usually not employed in panel data unless the model is dynamic. In keeping with the spirit of the Hausman-Taylor approach that uses the Within estimator in the …rst stage, and in order not to waist N observations, we use a robust Within approach rather than an approach based on …rst di¤erences or pairwise-di¤erences. 4 See Cornwell andRupert (1988), Egger andPfa¤ermayr (2004) and Serlenga and Shin (2007), to mention a few applications of the HT estimator.
1. Perform the …xed e¤ects (FE) or Within estimator obtained by regressing e y it = (y it y i: ), where y i: = P T t=1 y it =T , on a similar within transformation on the regressors. Note that the Within transformation wipes out the Z i variables since they are time invariant, and we only obtain an estimate of which we denote by e W : Then, HT average the within residuals over time To get an estimate of ; HT suggest running 2SLS of b It is clear that the order condition has to hold (k 1 > g 2 ) for (Z 0 P A Z) to be nonsingular. In fact, if k 1 = g 2 ; then the model is just-identi…ed and one stops here.
2. If k 1 > g 2 ; HT suggest estimating the variance-components as follows: and where 2 1 = T 2 + 2 : P = I N J T and J T = J T =T , with I N being a matrix of dimension N , and J T is a matrix of ones of dimension T: P is a matrix which averages the observation across time for each individual. Q = I N T P: Once the variance-components estimates are obtained, the model is transformed using b 1=2 where see Baltagi (2008). Note that y = b b 1=2 y has a typical element y it = y it b y i: where b = 1 (b =b 1 ) and X it and Z i are de…ned similarly. In fact, the transformed regression becomes: where u it = i + it . The asymptotically e¢ cient HT estimator is obtained by running 2SLS on this transformed model using A HT = [ e X; X 1 ; Z 1 ] as the set of instruments. In this case, e X denotes the within transformed X and X 1 denotes the time average of X 1 . More formally, the HT estimator under over-identi…cation is given by: where P A HT is the projection matrix on A HT = [ e X; X 1 ; Z 1 ]; see also Breusch, Mizon and Schmidt (1989).

A brief review of M, MS and GM robust estimators
To robustify the HT estimator for the possible presence of outliers, we used two MS estimators: the one proposed by Bramati and Croux (2007) for the …rst step of the HT estimator and a two stage generalized MS estimator inspired from Wagenvoort and Waldmann (2002) for the second step of the HT estimator. In this section, we brie ‡y review M and S and MS estimators from the robust statistics literature. Huber (1964) generalized the median regression to a wider class of estimators, called M-estimators, by considering other functions besides the absolute value of the residuals. This increases Gaussian e¢ ciency while keeping robustness with respect to vertical outliers. Consider the panel data …xed e¤ects regression disturbances: To guarantee scale equivariance (i.e. independence with respect to the measurement units of the dependent variable), residuals are standardized by a measure of dispersion . This can be implemented as an iterative weighted least-squares. The M-estimator of based on the function (:) is the vector b M of size (K 1) which is the solution of the following system: (u) = 0 (u) is called the in ‡uence function. If we de…ne a weight function W r (u) = (u) =u, then b M is a weighted least-squares estimator: and the …rst order condition which de…nes the class of M-estimators is given by: The loss function (:) is a symmetric, positive-de…nite function with a unique minimum at zero. There are several constraints that a robust M-estimator should meet: a) the …rst is of course to have a bounded in ‡uence function (r it = ); b) The robust estimator should be unique. This requires that the function (:) is convex in . The literature on robust statistics proposed several speci…cations for the -function. The choice of the loss function (:) is crucial to having good robustness properties and high Gaussian e¢ ciency. The Tukey biweight function is a common choice 5 : The associated in ‡uence function and weight function are de…ned as: In this case, the high breakdown point M-estimator is de…ned as: where y is the (N T 1) vector denoting the dependent variable, and X is the (N T K) matrix of the explanatory variables. W r is an (N T N T ) matrix with diagonal elements given by: For the tuning constant c = 2:937 (or c = 1:547), the corresponding Mestimator resists contamination up to 25% (or up to 50%) of outliers. In other words, it is said to have a breakdown point of 25% (or 50%). Unfortunately, this M-estimator su¤ers from some de…ciencies. If it is able to identify isolated outliers, it is inappropriate in case of the existence of clusters of outliers, i.e., where one outlier can mask the presence of another. Hence, it is not guaranteed to identify all leverage points. Furthermore, the initial values for the iterative reweighted least squares algorithm are monotone M-estimators that are not robust to bad leverage points and may cause the algorithm to converge to a local instead of a global minimum (see Croux and Verardi (2008)). 6 Rousseeuw and Yohai (1987) proposed minimizing a measure of dispersion of the residuals that is less sensitive to extreme values. They call this class of estimators the S-estimators. In order to increase robustness, they suggest …nding the smallest robust scale of the residuals. This robust dispersion, that will be called b S , satis…es where b = E [ (Q)] with Q N (0; 1). The value of that minimizes b S is then called an S-estimator de…ned as: b S M = arg min b S (r 11 ( ) ; :::; r N T ( )) with the corresponding b S being the robust estimator of scale. Rousseeuw and Yohai (1987) computed the asymptotic e¢ ciency of the Sestimator of a Gaussian model for di¤erent values of the breakdown point (see appendix 2). Unfortunately, this S-estimator has a Gaussian e¢ ciency of only 28:7%. If the tuning constant (c) of the Tukey biweight loss function (:) is high, for instance c = 5:182, the Gaussian e¢ ciency climbes to 96:6% but the breakdown point drops to 10%. 7 6 M-estimators are called monotone if the loss function is convex over the entire domain and are called redescending if the in ‡uence function is bounded. Redescending M-estimators have high breakdown points (close to 0.5) and their in ‡uence function can be chosen to redescend smoothly to 0 as for the Tukey biweight function. 7 Monotone M-estimators are robust to outliers in the response variable, but are not resistant to outliers in the explanatory variables (leverage points). In contrast, redescend-To cope with this, Yohai (1987) introduced M-estimators that combine highbreakdown point and high e¢ ciency. These estimators are redescending Mestimators, but where the scale is …xed at b S . The preliminary S-estimator guarantees a high breakdown point and the …nal M-estimate allows a high Gaussian e¢ ciency. Following the proposition of Rousseeuw and Yohai (1987), the tuning constant can be set to c = 1:547 for the S-estimator to guarantee a 50% breakdown point, and it can be set to c = 5:182 for the second step M-estimator to guarantee 96% e¢ ciency of the …nal estimator. Generally, the S and M-estimator use the algorithm of Salibian-Barrera and Yohai (2006) (see also Maronna and Yohai (2006)). The algorithm starts by randomly picking p subsets of K observations where K is the number of regression parameters to estimate. For each of the p-subsets, residuals are computed and a scale estimate b S is obtained. An approximation for the …nal scale estimate b S is then given by the value that leads to the smallest scale over all p-subsets 8 . Maronna and Yohai (2006) introduce the MS-estimator ing M-estimators are resistant to bad leverage points but are di¢ cult to implement from a computational point of view. S-estimation, which …nds an hyperplane that minimizes a robust estimate of the scale of the residuals, is highly resistant to leverage points, and is robust to outliers in the response. However, this method can be ine¢ cient. MM-estimation (not used here) tries to capture both the robustness and resistance of S-estimation, while at the same time gaining the e¢ ciency of M-estimation. The method proceeds in 3 steps: a) with a …rst loss function, we get an initial M-estimator, b) we obtain an M-estimate of the scale of the residuals, c) the estimated scale is then held constant whilst an Mestimate of the parameters is located with a new loss function. MM-estimators are robust and e¢ cient. 8 From equation (18), the algorithm calculates the hyperplane of K observations that …ts all points perfectly if all K points are regular observations and do not contain outliers. For each subset, the residuals are de…ned as the vertical distance separating each observation from the hyperplane. Using these residuals, a scale estimate b S is obtained as in (17) for each p-subset. Salibian-Barrera and Yohai (2006) proposed the following number of generated sub-samples N sub : where is the maximal expected proportion of outliers. P is the desired probability of having at least one p-subset without outliers among the N sub subsamples and dxe is the ceiling operator of x, i.e., the smallest integer not less than x. The number of sub-samples is chosen to guarantee that at least one p-subset without outliers is selected with high probability (see Salibian-Barrera and Yohai (2006), Maronna and Yohai (2006), Croux and Verardi (2008)). In our Monte-Carlo study, we use N sub = 500: As Croux and Verardi that alternates an S-estimator and an M-estimator, until convergence. This estimator has been adapted for the …xed e¤ects panel data case by Bramati and Croux (2007). They call this estimator the WMS (Within MS) estimator. This will be our estimator in place of the classical within estimator in the …rst step of our robust Hausman-Taylor estimator. Hinloopen and Wagenvoort (1997) proposed further protection against observations with a high leverage. They suggest using location weights indirectly proportional to the values of covariates: where 2 K;0:975 is the upper 97:5% quantile of a chi-squared distribution with K degrees of freedom, is a robust version of the Malahanobis distance (or Rao's distance) and b x and V x are the robust estimates of the location and variance matrix 9 of X it . Wagenvoort and Waldmann (2002) proposed the use of this class of generalized M-estimators (GM hereafter) 10 . The …rst order condition which de…nes this class of GM estimators is: In this case, the high breakdown point generalized M-estimator is de…ned as: where W x is the (N T N T ) matrix with diagonal elements given by W x (X it ).

The robust Hausman-Taylor estimator
To robustify the HT estimator for the possible presence of outliers, two MS estimators are successively used: the one proposed by Bramati and Croux (2007) for the …rst step of the HT estimator and a two stage generalized MS estimator inspired from Wagenvoort and Waldmann (2002) for the second step of the HT estimator.

The WMS estimator
The Within MS (WMS) e W M S estimator proposed by Bramati and Croux (2007) is then de…ned as: e W M S = arg min b S (r 11 ( ) ; ::::; r N T ( )) with Given an initial estimate 0 , they use an iterative algorithm to get closer to the minimum of eq.(23). This algorithm is based upon the generation of random subsamples suggested by Maronna and Yohai (2006) to compute the robust scale estimate of the residuals b S . They suggest iterating a …xed number of times (max m = 20), and to choose the e (m) W M S which produces the minimum value of the objective function in (23). Unfortunately, for the HT model, the WMS estimator, like the within estimator, gives us only an estimate of and not . Once again, the Z i variables drop out as they are time invariant. The variance-covariance matrix of the WMS estimate e W M S is given by: where D 1 and D 2 are diagonal matrices with diagonal elements given by:

The two stage generalized MS estimator
Instead of averaging the within residuals over time as HT suggest (see eq. (2)), we take the median of the resulting residuals over time: and instead of the 2SLS procedure suggested by HT, we propose a two stage generalized MS-estimate (2SGMS) following Wagenvoort and Waldmann (2001). More speci…cally: 1. (a) Stage 1 : suppose that there are m 1 instrumental variables A it which are correlated with the explanatory factors Z i but independent of the error term " it (= i + it ) : The explanatory variable Z k (the k th column of Z) is regressed on the instrumental variables A = [X 1 ; Z 1; ]: Z it;k = A it k + it;k . The high breakpoint generalized M-estimate (GM) and the prediction of the k th column of Z is computed according to 11 : where W A (A) and W r (r 1;k ) are the diagonal matrices comprising the weight functions W A (A it ) and W r (r 1it;k ) : r 1;k are the …rst stage GM residuals associated with Z k (r 1;k = Z k Ab k ) and 11 In our speci…c case W r (r 1;k ) di¤ers for every distinct column of Z. Thus (g 1 + g 2 ) separate GM regressions are performed if dim(Z) = g 1 + g 2 . Contrary to Wagenvoort and Waldmann (2001), we suggest using the residuals (r 1;k ) to estimate a new robust scale estimator of the residuals b S , which is then used to re-estimate a new weight function W r (r 1it;k ), and so on. Following the suggestion of Maronna and Yohai (2006), we compute this iterated MS procedure using a maximum of 20 iterations.
(b) Stage 2 : replacing the explanatory variables of the original equation by their robust projection on A: This returns the high breakpoint generalized MS-estimator, called the 2SGMS estimator: where W Z b Z and W r (r 2 ) are diagonal matrices containing the second step GMS weights and r 2 are the second stage GMS residuals r 2 = y b Ze 2SGM S :

The second step: a two stage generalized MS estimator
The variance-components estimates are obtained as follows: and (31) where 2 1 = T 2 + 2 : Once the variance-components estimates are obtained, we compute where e = 1 e =e 1 and X it and Z i are de…ned similarly. The 2SGMS procedure applied to this transformed model can be described as follows: The k th explanatory variable V k is regressed on the instrumental variables: This returns the GM estimate, and the pre- W A HT (A) and W r (r 1;k ) are the diagonal matrices comprising the weight functions W A HT (A HT;it ) and W r (r 1it;k ) : r 1;k are the …rst With these residuals (r 1;k ), we estimate a new robust scale estimator of the residuals b S which is used to re-estimate a new weight function W r (r 1it;k ), and so on. Following the suggestion of Maronna and Yohai (2006), we compute this iterated MS procedure up to a maximum of 20 iterations.
(b) Stage 2 : replacing the explanatory variables of the original equation by their robust projection on A HT and applying the GM technique one more time provides the 2SGMS estimates: V and W r (r 2 ) are diagonal matrices containing the second step GMS weights and r 2 are the second stage GMS residuals Following Wagenvoort and Waldmann (2001), the variance-covariance matrix of the 2SGMS estimate e 2SGM S is given by: Km 2 ), (Km 2 N T ) matrices de…ned as follows: where C ij is a (Km 2 1) vector given by For ease of comparison, the next table gives the steps for the Hausman-Taylor and the corresponding robust Hausman-Taylor estimator. > > > > > > < > > > > > > : Step 2 Robust Hausman -Taylor Step 1 Step 2 We …rst simulate a Hausman-Taylor world (see below) and in the next step, contamination is carried out on the y's only (vertical outliers) and then on both the y and the X variables (leverage) and last on y, X and the Z variables (leverage).
The X j;i;t variables are generated by: . It is clear that X 2 is correlated with i by construction. The cross-sectional time-invariant (0; 1) dummy variable Z 12;i has been generated randomly such that its mean is 0:2.
The Hausman-Taylor world is de…ned with Z 2 correlated with i , X 11 , X 12 and X 2 : where i is uniform on [ 2; 2] : So, the Z 2;i variable is correlated with X 11;i;t (by the term i ) with X 12;i;t (by the term i ) and with X 2;i;t (by the term i ).

Contamination
Once the observations are generated for our model in (44), contamination is carried out as follows: the y's only (vertical outliers).
both y and the time-varying explanatory variables (X) by introducing bad leverage points.
y; X and Z 12 by introducing bad leverage points.
y; X, Z 12 and Z 2 by introducing bad leverage points.
Contamination is generated in two di¤erent ways: either completely randomly over all observations (random contamination), or concentrating the contamination in a number of blocks such that half of the observations in the a¤ected time-series are contaminated (concentrated contamination). In other words, few individuals in the sample have 50% of their data corrupted while the other individuals have clean observations 12 .
Outliers generated by random contamination are either vertical outliers or leverage points, whereas in the case of concentrated contamination, they are either block-concentrated vertical outliers or block-concentrated leverage points. Vertical outliers are obtained by adding a term N 5y; 2 y =40 to the y's originally generated. Bad leverage points are obtained by replacing X-values (and the Z 12 and Z 2 ) corresponding to the observations already contaminated in the y-direction, by points coming from a K-variate normal distribution N (e K ; 0:5I K ), where e K is a K 1 vector of ones and I K is a K K identity matrix. We use the Tukey biweight functions W r (:) for the WMS estimator (eq. 23), for the …rst stage (eq. 28) and the second stage (eq. 29) of step 1 of the 2SGMS, and for the …rst stage (eq. 33) and the second stage (eq. 34) of step 2 of the 2SGMS. 13 For all these functions, we need to de…ne the breakdown points and the associated tuning constants. We used the same breakdown point of 25% with a tuning constant c = 2:937 yielding an asymptotic e¢ ciency of 76%: 14 The percentages of contamination considered are 5% and 10%. We report results for the case of no outliers as well as 8 di¤erent cases of contamination: case 1 vertical outliers (y) case 2 leverage points (y, X 1 , X 2 ) case 3 concentrated vertical outliers (y) case 4 concentrated leverage points (y, X 1 , X 2 ) case 5 leverage points with Z 12 (y, X 1 , X 2 , Z 12 ) case 6 concentrated leverage points with Z 12 (y, X 1 , X 2 , Z 12 ) case 7 leverage points with Z 12 and Z 2 (y, X 1 , X 2 , Z 12 , Z 2 ) case 8 concentrated leverage points with Z 12 and Z 2 (y, X 1 , X 2 , Z 12 , Z 2 ) Table 1 reports the MSE of the coe¢ cients for the Hausman-Taylor (HT) estimator and its robust counterpart (robust HT) based on 1000 replications 15 .

The results
The results in Table 1 pertain to N = 100, T = 5; with no outliers as well as 8 di¤erent cases of contamination, where the level of contamination is 5% or 10%. When no outliers are present, the robust HT shows loss in MSE relative to the standard HT estimator. The absolute magnitudes are small (except for 12 , the coe¢ cient of Z 12 ); but the relative MSE of robust HT with respect to classical HT could be as small as 1 and as big as 2, depending on the coef-…cient. Contrasting that to the various types of 5% and 10% contaminations considered, it is clear that the gain in absolute as well as relative MSE is huge for the robust HT estimator compared to the classical HT estimator. Note also that the largest absolute magnitude of this MSE is for 12 (the coe¢ cient of Z 12 ; which is the exogenous time-invariant dummy variable). This is 0.12 for the HT estimator compared to 0.20 for our robust HT estimator in case of no outliers. However, when we introduce 5% contamination and vertical outliers, the MSE of HT rises to 0.62 compared to 0.20 for the robust HT estimator. In case of bad leverage points, the MSE of HT rises to 0.48 compared to 0.20 for the robust HT estimator. But when you add bad leverage points in Z 12 , the MSE of HT becomes really bad 31.7 compared to 0.39 for the robust HT estimator. This is true for contamination cases 5,6,7 and 8 with bad leverage points and concentrated leverage points. The gains in absolute and relative MSE of robust HT over HT can be huge. For example, in the presence of vertical outliers, the robust HT estimator with 5% contamination, yields large gains in MSE with respect to the classical HT procedure. The HT MSE is 8 to 9 times higher than its robust counterpart for the coe¢ cient estimates of X 1 , X 2 ; 23 times higher for the intercept (Z 11 ) and 3 to 5 times higher for the coe¢ cient estimates of Z 12 and Z 2 . Similarly, for bad leverage points, these MSE are respectively 74 to 107 times higher for the coe¢ cient estimates of X 1 , X 2 ; 18 times higher for the intercept (Z 11 ) and 2 to 24 times higher for the coe¢ cient estimates of Z 12 and Z 2 . When the outliers are block-concentrated vertical outliers, we get similar rewe also refer to our measure as MSE. It is de…ned by: where bias is the di¤erence between the median and the true value and IQ is the interquantile range Q 3 Q 1 where Q 3 is the 0:75 quantile and Q 1 is the 0:25 quantile. If the distribution is normal, the median is the mean and, aside from a slight rounding error, IQ=1:35 is the standard deviation. 20 sults but when the outliers are block-concentrated leverage points, the gain in MSE of the robust HT estimate becomes more pronounced. Whatever the sampling scheme, the MSE of 12 -the parameter of the dummy variable Z 12 -is always more a¤ected than that of the other parameters. Of course, the robust version yields better results than HT no matter what type of contamination.
When we increase the level of contamination from 5% to 10%, the classical HT estimator yields much larger MSE and the gains from relative MSE using the robust HT procedure are much larger than the 5% contamination case no matter what sampling scheme is used. When we increase the size of N and T , we get similar conclusions. Table  2 keeps N …xed at 100, but double T from 5 to 10, while Table 3 keeps T …xed at 5 and doubles N from 100 to 200. These results may be conditional on the fact that we only have 10% contamination. What happens if we increase the percentage of corrupted data? In order to investigate this, we used the largest allowable values of the breakdown points (i.e, 50% and c = 1:547) for each estimator (WMS and 2SGMS). We ran simulations for N = 100, T = 5; for case 2 (leverage points (y, X 1 , X 2 )) and for 5%; 10%, 15%, 20%; 25%; 30%; 35% and 40% contamination. 16 Results in Table 5 show that the robust HT estimator resists quite well the increase in the percentage of contamination up to 35%. When the level of contamination is even higher, the gain in relative MSE decreases quickly even if the robust HT estimator is a little bit better than the classical HT estimator. However, when 40% of the data are corrupted, the MSE for the time invariant variable Z 2 converges and even exceeds that of the MSE of HT. Figures 1 to 4 show average HT and robust HT estimates with their 95% con…dence intervals for the coe¢ cients of X 11 , X 2 ; Z 12 and Z 2 respectively. For the time varying variables (X 11 and X 2 ), Figures 1 and 2 show that the robust HT estimator is stable with narrow con…dence intervals showing a small bias and a good precision of the estimators leading to a relatively small MSE. For time invariant variables (Z 12 and Z 2 ), Figures 3 and 4 show good stability of the robust HT estimator associated with narrow con…dence intervals up to 25% of contamination. When the percentage of data pollution is higher, the con…dence interval for the dummy variable Z 12 widens appreciably. For the time invariant variable Z 2 , both bias and con…dence interval increase remaining always lower than the corresponding magnitudes for the standard HT estimator. In order to evaluate the potential impact of the breakdown point values on the …nal 2SGMS estimator, we run simulations for N = 100, T = 5 for case 2 (leverage points (y, X 1 , X 2 )) for 10% and 25% contamination but with several breakdown point (bdp) values. We de…ne bdp W M S;S (bdp W M S;M ) as the breakdown point for the S-estimator (M-estimator) of the WMS. We also de…ne bdp 2SGM S_1_j;S (bdp 2SGM S_1_j;M ) as the breakdown point for the …rst stage of step j (j = 1; 2) for the S-estimator (M-estimator) of 2SGMS; and bdp 2SGM S_2_j;S (bdp 2SGM S_2_j;M ) as the breakdown point for the second stage of step j (j = 1; 2) for the S-estimator (M-estimator) of 2SGMS. We studied the following 5 cases to check the sensitivity of our results to di¤erent breakdown point values: 17 case A  In case A, we used di¤erent values of the breakdown points only for WMS. As suggested by Rousseeuw and Yohai (1987), we set the tuning constant to c = 1:547 for the S-estimator to guarantee a 50% breakdown point and we set the tuning constant to c = 2:937 for the second step M-estimator to guarantee a higher e¢ ciency of 76% for the …nal estimator. Results in Table  6 for 10% contamination are similar to those of Table 1 for which all the breakdown points are 25%: When only 10% of the data are corrupted, there is no signi…cant di¤erences between cases A to E. But, when we increase the percentage of data pollution up to 25%, the results deteriorate for cases A and C. In these two cases, the breakdown points values of the second stage of step j (j = 1; 2) for the S-estimator of 2SGMS bdp 2SGM S_2_j;S are small (0.25 as compared to 0.5 in the three other cases). Cases B, D and E give 17 We thank a referee for this suggestion.
22 similar results showing that the crucial values are those of bdp 2SGM S_2_j;S and not necessarily those of bdp W M S or bdp 2SGM S_1_j . What about the interesting case where outliers only exist in the time invariant variables (for instance in Z 2 )? 18 To check this potential negative in ‡uence, we run simulations for N = 100, T = 5; and for 20% contamination for leverage points. First, we suppose that only Y 2 and Z 2 are randomly contaminated or block-contaminated and, second, we suppose that only Z 2 is randomly contaminated or block-contaminated. In Table 7, results for leverage points for both Y 2 and Z 2 show that the robust HT estimator yields better results than HT no matter what type of contamination. In contrast, when we simulate contamination only on Z 2 , the impact of this contamination appears to be marginal. Last, we tried an hybrid setup where we have a quasi-robust HT estimator where only one of the two robust estimators is deployed. 19 Two cases are possible: either the robust within estimator is followed by the generic IV regression, or either the classic within estimator is followed by the two stage generalized MS estimator. We only run these two quasi-robust HT estimators for N = 100, T = 5; for case 2 (leverage points (y, X 1 , X 2 )) and for 10% contamination. The results in Table 8 show that the second quasi-robust estimator (labeled quasi-robust HT (Within, 2SGMS)) gives similar results as compared to the robust HT estimator whatever the type of contamination (vertical outliers, leverage points, random or block-contamination). In contrast, the …rst quasi-robust estimator (labelled quasi-robust HT (WMS, HT)) does not seem to clean e¤ectively the negative e¤ects of contamination as there seems to be no signi…cant gain in absolute or relative MSE as compared to the standard HT estimate. This gives further evidence that a robust version of the second step of the Hausman-Taylor estimator is necessary and highly recommended.
6 An empirical example: the Cornwell-Rupert (1988) Mincer wage equation Cornwell and Rupert (1988) Table 9. For the robust HT estimator deleting these two dummy variables, the returns to education is about the same and the FEM coe¢ cient is smaller but statistically signi…cant.

Conclusion
This paper applies the useful robust panel data methods suggested by Bramati and Croux (2007) and Wagenvoort and Waldmann (2001) to the Hausman and Taylor (1981) estimator. We demonstrate using Monte Carlo experiments the substantial gains in e¢ ciency as measured by MSE of this robust HT estimator over its classical counterpart. The magnitude of the gains in MSE depend upon the type and degree of contamination of the observations. We illustrate this robust HT method by applying it to the classical Mincer wage equation using the empirical study of Cornwell and Rupert (1988). For this empirical study, the returns to education seem to be robust to outliers, while the magnitude and signi…cance of the female coe¢ cient is sensitive to robusti…cation of the HT estimator. We performed several sensitivity analysis but there remains a lot of questions for future research. For example, we did not prove that the proposed robust Hausman-Taylor estimator is scale, regression and a¢ ne equivariant. There is also a need to derive formal tests or metrics to use in applied panel data setting to determine the presence of outliers. This analysis can also be extended to dynamic HT type models, where one can check the sensitivity of using the di¤erence transformation, rather than a within transformation that subtracts a mean or a median to get rid of the individual e¤ects, on the performance of the contaminated classical dynamic panel data estimators.

Appendix 1
De…nition of the breakdown point As Bramati and Croux (2007, pp.523) noted, the breakdown point of an estimator is de…ned as the smallest fraction of outlying observations that can cause a 'breakdown' of the estimator. Let our panel data sample be composed of N T observations = fy it ; X it g and let ( ) be our estimator. The breakdown point of the estimator at is the smallest proportion of observations replaced by outliers which can cause the estimator to take on values arbitrarily far from ( ). (see Bramati and Croux (2007) and Croux and Verardi (2008)). In case of block-contaminated data, we suppose that for some individuals half of the time, the data is contaminated. In particular, e = n e y it ; e X it o where for some individuals, e y it ; e X it are contaminated for t = 1; 2; :::; T =2 and not for t = (T =2)+1; :::; T ). 21 For instance, if N = 100, T = 5 and 10% of the observations are corrupted, it means that 10 individuals each have 5 time observations which are corrupted.    with corresponding 95% confidence intervals