Multiple Comparisons with the Best, with Economic Applications

SUMMARY In this paper we discuss a statistical method called multiple comparisons with the best, or MCB. Suppose that we have N populations, and population i has parameter value $\theta_i$. Let $\theta_{(N)} = \max_{j=1,\dots,N} \theta_j$ be the parameter value for the ‘best’ population. Then MCB constructs joint confidence intervals for the differences $(\theta_{(N)} - \theta_1, \theta_{(N)} - \theta_2, \dots, \theta_{(N)} - \theta_N)$. It is not assumed that it is known which population is best, and part of the problem is to say whether any population is so identified, at the given confidence level. This paper is meant to introduce MCB to economists. We discuss possible uses of MCB in economics. The application that we treat in most detail is the construction of confidence intervals for inefficiency measures from stochastic frontier models with panel data. We also consider an application to the analysis of labour market wage gaps.


INTRODUCTION
Empirical research often involves comparisons. For instance, one may wish to compare the effectiveness of several different drugs in the treatment of a disease, or crop yields for a variety of fertilizers, or productivity levels of different firms, or earnings of individuals exposed to a variety of types of training. Typically this would involve the calculation and comparison of some outcome measure or measures for the various populations of interest. Statistically speaking, calculation of an outcome measure is naturally viewed as the estimation of some parameter indexing outcomes, such as mean lifetime or mean earnings, and the comparison of these estimates would often involve statistical hypothesis testing and/or the construction of confidence intervals, so as to reflect the statistical uncertainty about the parameter values.
To be more precise, suppose that there are N populations, with parameter value $\theta_i$ and estimate $\hat\theta_i$ for population $i = 1, \dots, N$. Then, for given populations i and j, under standard assumptions one can generally test the hypothesis $\theta_i = \theta_j$, or construct a confidence interval for the difference $\theta_i - \theta_j$. For example, under the assumption of random samples from independent normal populations with equal variance, such a hypothesis test or confidence interval would involve the Student-t distribution.
In this paper, our interest will be in multiple comparisons rather than in individual ones. The simplest type of multiple comparison procedure is referred to as multiple comparisons with a control (or MCC for short), and constructs a joint confidence interval for the vector of differences $(\theta_1 - \theta_N, \theta_2 - \theta_N, \dots, \theta_{N-1} - \theta_N)$, where population N is the ‘control’ population, such as the current drug treatment. The motivation for such a multiple comparison procedure is what Hochberg and Tamhane (1987) refer to as the multiplicity effect, which is simply the fact that, if enough comparisons are made, some of them are more or less sure to be ‘significant’. That is, with multiple individual comparisons, it is difficult or impossible to assess the ‘significance’ of an individual difference without knowing how many comparisons were made and how the particular individual difference was selected from all those considered. For example, if we read in the newspaper that left-handed individuals who eat broccoli for lunch on Tuesday have rates of colon cancer that are significantly different from those of the overall population, we would not be impressed by the ‘significance’ of this difference at usual significance levels like 5% if we knew that 10,000 similar comparisons were made and only the most ‘significant’ differences were reported. From the perspective of hypothesis testing, recognition of the multiplicity effect argues for the use of a test of the joint hypothesis $\theta_1 = \theta_2 = \dots = \theta_N$ to control the size of the overall testing procedure. From the perspective of the construction of confidence intervals for differences, recognition of the multiplicity effect argues for the construction of a joint confidence interval for the vector of differences, which MCC provides. This paper deals with an extension of MCC called multiple comparisons with the best (or MCB), which constructs a joint confidence interval for the vector of differences from the unknown ‘best’ population parameter.
That is, suppose that the problem is parameterized in such a way that bigger values of $\theta$ are better than smaller ones, in the sense of a more favourable outcome, such as longer expected lifetime or greater expected earnings. Now define $\theta_{(N)} = \max_{j=1,\dots,N} \theta_j$, so that (N) is the index of the best population and $\theta_{(N)}$ is the best value of $\theta$ in the N populations. Then MCB constructs a joint confidence interval for the vector of differences $(\theta_{(N)} - \theta_1, \dots, \theta_{(N)} - \theta_N)$. This is different from, and more difficult than, the MCC problem, because the relevant control population is assumed to be the best population, and the identity of the best population is not assumed to be known. MCB also provides a subset of populations which (at the specified confidence level) contains the best population. Thus MCB tells us, with a specified level of confidence, which populations may be best, and it provides upper and lower bounds on the deviations of all N values of $\theta_i$ from the best value.
The purpose of this paper is to introduce MCB to economists and to illustrate its use in a few applications. One application is to the ‘wage gap’ literature, in which earnings of various groups are compared to the earnings of the highest-earning group. Another application is to the measurement of productive efficiency from stochastic frontier models with panel data. MCB has been applied to the stochastic frontier model in an earlier paper (Horrace and Schmidt, 1996), which was concerned with confidence statements for efficiency measures from a wider variety of stochastic frontier models, but the earlier paper did not attempt to develop or explain MCB methods in the way that the present paper does.
This paper is organized as follows. Section 2 gives a brief historical account of MCB and presents the main results for MCC and MCB under the ‘standard’ assumptions of balanced random sampling from independent, normal populations with equal variance. Section 3 discusses MCB in the more difficult and less well-explored case that the estimates have a general covariance structure. Section 4 discusses issues in applying MCB in the regression model with panel data. Section 5 gives empirical examples of MCB. Finally, Section 6 draws some conclusions and suggests areas for additional research.

Introduction
As in Section 1 above, we suppose that we are interested in a vector of parameters $\theta = (\theta_1, \dots, \theta_N)'$, where the interpretation is that each $\theta_i$ corresponds to some distinct population. Correspondingly we suppose that we have a vector of estimates $\hat\theta = (\hat\theta_1, \dots, \hat\theta_N)'$. Generally the estimate $\hat\theta_i$ will be based on some number of observations, say $T_i$, on population i, so that we effectively have a panel data setting. However, at the present level of generality we will simply make assumptions about the distributional properties of $\hat\theta$, without explicit reference to the nature of the sample or method of estimation.
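In this section we will describe multiple comparisons with a control (MCC) and multiple comparisons with the best (MCB). We will focus on the simplest and most commonly used versions of these techniques, which we will refer to as standard MCC and MCB. We make the following assumptions.
Assumption 1: $\hat\theta$ is distributed as $N(\theta, \sigma^2 C)$, where C is a known positive definite $N \times N$ matrix.
Assumption 2: There is available an estimate $\hat\sigma^2$ of $\sigma^2$, distributed independently of $\hat\theta$, such that $v\hat\sigma^2/\sigma^2$ is distributed as $\chi^2_v$. (The case of known $\sigma^2$ corresponds to $v = \infty$.)
Assumption 3: $C = kI_N$ for some known scalar $k > 0$.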
Assumptions 1 and 2 will be maintained throughout the paper. MCC and MCB are generally viewed as exact (finite-sample) procedures, but we could consider an asymptotically valid version that replaced the normality assumption in Assumption 1 with asymptotic normality. From the panel data perspective, this would correspond to the case of $T_i \to \infty$ with N fixed. Similarly, in thinking about Assumption 2, in virtually all applications we can envisage there will be enough degrees of freedom that the scale parameter $\sigma^2$ can effectively be taken as known.
Assumption 3 is the defining assumption for standard MCC and MCB. It will be maintained throughout this section, but will be relaxed in the next section and in some of our subsequent empirical work.
A simple model to which standard MCC and MCB are applicable, and which is the basis for the usual discussions of standard MCC and MCB, is the ‘balanced one-way model’ (e.g. Hsu, 1996, p. 43). Here we have N independent normal populations with equal variance but different means, and a random sample of size T from each population. Thus we have mutually independent observations $y_{it}$ ($i = 1, \dots, N$; $t = 1, \dots, T$), where $y_{it}$ is distributed as $N(\theta_i, \sigma^2)$. Then, with $\hat\theta_i = \bar y_i = T^{-1}\sum_{t=1}^{T} y_{it}$, Assumptions 1, 2 and 3 hold, with $\hat\sigma^2$ equal to the usual pooled variance estimate, $v = N(T-1)$ and $k = 1/T$. This is (perhaps) a model of broad empirical applicability, but it is not necessarily general enough for applications to the regression model, as we shall see.
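The balanced one-way model is simple enough to compute directly. The following is a minimal sketch, assuming NumPy and data stored as an N × T array with one row per population; the helper name `one_way_estimates` is hypothetical. It returns the quantities that enter Assumptions 1, 2 and 3.

```python
import numpy as np


def one_way_estimates(y):
    """Estimates for the balanced one-way model.

    y is an (N, T) array whose row i holds the T observations on population i.
    Returns (theta_hat, sigma2_hat, v, k) in the notation of Assumptions 1 to 3.
    """
    y = np.asarray(y, dtype=float)
    n_pop, t = y.shape
    theta_hat = y.mean(axis=1)                   # population sample means
    resid = y - theta_hat[:, None]               # within-population deviations
    v = n_pop * (t - 1)                          # degrees of freedom of the pooled estimate
    sigma2_hat = float((resid ** 2).sum() / v)   # pooled variance estimate
    k = 1.0 / t                                  # C = k * I_N with k = 1/T
    return theta_hat, sigma2_hat, v, k
```

For example, `one_way_estimates([[1.0, 3.0], [2.0, 6.0]])` gives means 2 and 4, a pooled variance estimate of 5 with v = 2, and k = 0.5.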

Standard MCC
MCC is a type of multiple comparison procedure. There are several other types, including multiple comparisons with the best, which is the subject of this paper, and all pairwise comparisons, which we will not discuss. The literature on multiple comparisons first evolved during the late 1940s and early 1950s, primarily due to David Duncan, S. N. Roy, Henry Scheffé and John Tukey. Harter (1980) gives a good historical account. Shortly thereafter a related body of literature on ranking and selection surfaced with the work of Bechhofer (1954). Additional ranking procedures followed, due to Gupta (1956, 1965), Fabian (1962) and Desu (1970). MCC procedures as described in this section were primarily due to Dunnett (1955, 1964). A good textbook treatment can be found in Hsu (1996, Chapter 3).
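MCC is concerned with the following vector of differences:
$$\Delta = (\theta_1 - \theta_N, \theta_2 - \theta_N, \dots, \theta_{N-1} - \theta_N)' \tag{1}$$
For any integer m, let $I_m$ be the $m \times m$ identity matrix, and $e_m$ be an $m \times 1$ vector of ones. We define the following matrices:
$$D = [I_{N-1}, -e_{N-1}] \tag{2a}$$
$$B = DCD' \tag{2b}$$
$$R = [\operatorname{diag}(B)]^{-1/2}\, B\, [\operatorname{diag}(B)]^{-1/2} \tag{2c}$$
These matrices arise naturally in considering $\Delta$ and its estimate $\hat\Delta = D\hat\theta$. We have $\Delta = D\theta$. Since $\hat\theta$ is distributed as $N(\theta, \sigma^2 C)$ by Assumption 1, it follows that $\hat\Delta$ is distributed as $N(\Delta, \sigma^2 B)$. That is, B is (up to proportionality) the variance matrix of $\hat\Delta$, and R is the corresponding correlation matrix.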
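The following illustrative sketch forms the matrices in equations (2a) to (2c) directly from C. It assumes NumPy, and the function name is hypothetical.

```python
import numpy as np


def difference_matrices(C):
    """Return D, B = D C D' and the correlation matrix R of equations (2a) to (2c)."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    # D = [I_{N-1}, -e_{N-1}] maps theta to (theta_1 - theta_N, ..., theta_{N-1} - theta_N)'
    D = np.hstack([np.eye(n - 1), -np.ones((n - 1, 1))])
    B = D @ C @ D.T
    sd = np.sqrt(np.diag(B))
    R = B / np.outer(sd, sd)   # rescale B to unit diagonal
    return D, B, R
```

With C = 0.25 I_4, B has diagonal elements 0.5 and off-diagonal elements 0.25, so every correlation in R is 1/2.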
For the case of standard MCC, we also impose Assumption 3, so that $C = kI_N$ (with k known). Then simple algebra reveals that $B = k(I_{N-1} + e_{N-1}e_{N-1}')$. This is a matrix with diagonal elements equal to 2k and off-diagonal elements equal to k. It follows that the corresponding correlation matrix R has an equicorrelated structure with all correlations equal to 1/2. That is, $R = R_N$, where $R_N$ is the $(N-1) \times (N-1)$ matrix defined as follows:
$$R_N = \tfrac{1}{2}(I_{N-1} + e_{N-1}e_{N-1}') \tag{3}$$
Under Assumptions 1, 2 and 3, $(\hat\Delta - \Delta)/(\hat\sigma\sqrt{2k})$ is distributed as multivariate Student-t, with dimension $N-1$, degrees of freedom v, and correlation matrix $R_N$, which we will represent with the notation $T(N-1, v, R_N)$. (If $\sigma^2$ is assumed known, which is also possible under Assumption 2, we have a multivariate normal distribution of dimension $N-1$ and variance matrix $R_N$. This is subsumed in the previous discussion as the case corresponding to $v = \infty$.) Now, for a given confidence level $1 - \alpha$, we define the critical value d as the two-sided critical value for the maximal absolute value (over its components) of the Student-t random variable. That is, if z is an $(N-1)$-dimensional random vector distributed as $T(N-1, v, R_N)$, we define d as the solution to
$$P\left(\max_{i=1,\dots,N-1} |z_i| \le d\right) = 1 - \alpha \tag{4}$$
Then we have the following set of joint confidence intervals:
$$P\left[\theta_i - \theta_N \in \left(\hat\theta_i - \hat\theta_N \pm d\,\hat\sigma\sqrt{2k}\right),\ i = 1, \dots, N-1\right] = 1 - \alpha \tag{5}$$
These are the MCC confidence intervals. The intervals in equation (5) are two-sided. One-sided intervals are also possible but seem less likely to be of interest in econometric applications. They would be constructed in essentially the same way, except that d would have to be a one-sided critical value. The critical values for standard MCC are widely tabulated (see e.g. Dunnett, 1964; Dunn and Massey, 1965; Hahn and Hendrickson, 1971; Bechhofer and Dunnett, 1988; or Hochberg and Tamhane, 1987). The MCC literature has typically considered rather small values of N. Horrace (1998) gives values of d for larger values of N, with $v = \infty$ (which is not restrictive for large values of N). The values of d in Horrace (1998) are calculated by simulation, directly from the definition in equation (4).
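The simulation route can be sketched as follows. This sketch assumes NumPy. The function names `equicorrelated_R` and `simulate_critical_value`, the number of draws and the seed are hypothetical choices made for illustration. Setting `v=np.inf` gives the known-σ² (multivariate normal) case. Otherwise, each normal draw is divided by an independent draw of $\hat\sigma/\sigma$, which yields a multivariate t draw.

```python
import numpy as np


def equicorrelated_R(n_pop):
    """R_N of equation (3): (N-1) x (N-1), unit diagonal, all correlations 1/2."""
    m = n_pop - 1
    return 0.5 * (np.eye(m) + np.ones((m, m)))


def simulate_critical_value(R, alpha=0.05, v=np.inf, n_draws=200_000, seed=0):
    """Monte Carlo solution of P(max_i |z_i| <= d) = 1 - alpha for z ~ T(dim, v, R)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.asarray(R, dtype=float))
    y = rng.standard_normal((n_draws, chol.shape[0])) @ chol.T   # rows ~ N(0, R)
    if np.isfinite(v):
        # divide by s = sigma_hat / sigma, where v * s^2 ~ chi-square(v)
        s = np.sqrt(rng.chisquare(v, size=n_draws) / v)
        y = y / s[:, None]
    max_abs = np.abs(y).max(axis=1)   # max over components of |z_i|
    return float(np.quantile(max_abs, 1.0 - alpha))
```

With α = 0.05 and v = ∞, the simulated values for N = 2 and N = 5 should agree with the tabulated 1.96 and 2.44 up to Monte Carlo error. A finite v produces a larger critical value.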
In the MCC literature, d is typically calculated by evaluating the probability in equation (4) numerically. This is feasible because, given the equicorrelated structure of R, the $(N-1)$-dimensional probability statement in equation (4) can be reduced to a double integral. Specifically, d is the solution in t to the following equation:
$$\int_0^\infty \int_{-\infty}^{\infty} \left[\Phi(z + \sqrt{2}\,ts) - \Phi(z - \sqrt{2}\,ts)\right]^{N-1} d\Phi(z)\, dF_v(s) = 1 - \alpha \tag{6}$$
where $\Phi$ is the standard normal cdf, and $F_v$ is the cdf of the distribution of $\hat\sigma/\sigma$. For economic applications, there is no compelling reason to prefer the numerical solution of equation (6) to a simulation, or vice versa; our simulations reproduced earlier tabulations where they overlapped. The earlier MCB literature emphasized deterministic (as opposed to stochastic) approximations for critical values because MCB was originally developed in a biomedical setting, and the nature of FDA regulation made it desirable for repeated analysis of the same data to reach exactly the same conclusions.
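For the known-variance case v = ∞, $F_v$ puts all its mass at s = 1, and equation (6) collapses to a single integral over z that can be solved for d by bisection. The sketch below works under that assumption only. It uses a trapezoidal rule on a fixed grid over [−10, 10]. The grid, the tolerance and the function names are hypothetical choices, not part of the MCC literature.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(x):
    """Standard normal cdf Phi, evaluated elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def coverage_known_sigma(t, n_pop, grid):
    """Inner integral of equation (6) with s = 1 (the case v = infinity)."""
    phi = np.exp(-0.5 * grid ** 2) / math.sqrt(2.0 * math.pi)   # standard normal density
    shift = math.sqrt(2.0) * t
    inner = std_normal_cdf(grid + shift) - std_normal_cdf(grid - shift)
    f = inner ** (n_pop - 1) * phi
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid)))   # trapezoidal rule


def mcc_critical_value_known_sigma(n_pop, alpha=0.05, tol=1e-6):
    """Solve equation (6) for d by bisection when sigma^2 is known."""
    grid = np.linspace(-10.0, 10.0, 4001)
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage_known_sigma(mid, n_pop, grid) < 1.0 - alpha:
            lo = mid   # coverage too small: the critical value must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Allowing finite v would add an outer integral over s against the density of $\hat\sigma/\sigma$. That step is omitted here to keep the sketch short.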
Naturally, the values of d increase with the number of comparisons $N - 1$. Essentially, it takes a larger deviation to be judged significant if more comparisons are made. For example, for $\alpha = 0.05$ and $v = \infty$, we have the usual value $d = 1.96$ for $N = 2$, but $d = 2.44$ for $N = 5$, 2.81 for $N = 15$, 3.29 for $N = 100$ and 3.64 for $N = 500$.

Standard MCB
The point of MCB is to construct joint confidence intervals for the differences of each $\theta_i$ from the best, and also to construct a set of populations which could be the best. More precisely, we consider the populations in (unknown) rank order:
$$\theta_{(1)} \le \theta_{(2)} \le \dots \le \theta_{(N)} \tag{7}$$
so that, as in Section 1, (N) is the index of the best population and $\theta_{(N)}$ is the standard of comparison for the $\theta_i$. Then we seek lower and upper bounds $L_i$ and $U_i$ ($i = 1, \dots, N$) and a set S of indices of those populations in contention for the best, such that:
$$P\left[(N) \in S \text{ and } L_i \le \theta_{(N)} - \theta_i \le U_i,\ i = 1, \dots, N\right] \ge 1 - \alpha \tag{8}$$
In this section we consider the construction of the set S and the bounds for the case of standard MCB, which corresponds to the imposition of Assumptions 1, 2 and 3 above. As with standard MCC, the primary example is the balanced one-way model discussed in Section 2.1.
MCB evolved in the early 1980s with the work of Jason Hsu. Hsu (1981) constructed parametric and nonparametric simultaneous one-sided confidence intervals for the balanced one-way model. Hsu (1984) constructed simultaneous two-sided MCB confidence intervals. Edwards and Hsu (1983) provided a general technique for adapting MCC intervals to MCB intervals. Hochberg and Tamhane (1987, Section 5.3) summarize the main results of these papers. A more detailed discussion is given in Hsu (1996, Chapter 4).
There is room for possible confusion over the different kinds of confidence intervals found in the MCB literature. The intervals that we will discuss, as given in equation (8), are two-sided intervals for $\theta_{(N)} - \theta_i$. These are the intervals considered in the early MCB literature, and they seem most natural to us; they correspond directly to the parameters of interest in the applications we consider in this paper. An alternative considered in the more recent MCB literature is to consider confidence intervals for $\theta_i - \max_{j \ne i} \theta_j$. Hsu (1996, pp. ) argues that these intervals are easier to interpret and that they simplify some of the derivations. Ultimately this is a matter of taste. However, Hsu also features confidence intervals for $\theta_i - \max_{j \ne i} \theta_j$ that are constrained to contain the point zero. These constrained intervals correspond to one-sided intervals for $\theta_{(N)} - \theta_i$, with an upper bound $U_i$ but with $L_i = 0$. We prefer two-sided intervals because, for example, we want to be able to place upper and lower bounds on the technical efficiency of inefficient firms.
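To give the basic result of standard MCB, we define the following notation, with d defined by equation (4):
$$h = d\,\hat\sigma\sqrt{2k} \tag{9a}$$
$$S = \{i : \hat\theta_i \ge \max_{j \ne i} \hat\theta_j - h\} \tag{9b}$$
$$L_i = \max\left[0, \min_{j \in S} (\hat\theta_j - \hat\theta_i - h)\right] \tag{9c}$$
$$U_i = \max\left[0, \max_{j \ne i} (\hat\theta_j - \hat\theta_i + h)\right] \tag{9d}$$
Edwards and Hsu (1983) showed that, given these definitions, equation (8) holds; that is, $P[(N) \in S \text{ and } L_i \le \theta_{(N)} - \theta_i \le U_i,\ i = 1, \dots, N] \ge 1 - \alpha$. This is the fundamental result for standard MCB. It provides, at a confidence level of at least $1 - \alpha$, a set S of populations that contains the best population, and upper and lower bounds for each difference from the best.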
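We will not give a detailed proof, which can be found in Hsu (1996, pp. ). However, the idea behind the proof is simple. We begin by considering the standard MCC problem in which the control population is the unknown best, indexed by (N). Thus we have $P(E) = 1 - \alpha$, where E is the event:
$$E = \left\{\hat\theta_{(N)} - \hat\theta_i - h \le \theta_{(N)} - \theta_i \le \hat\theta_{(N)} - \hat\theta_i + h,\ i \ne (N)\right\} \tag{10}$$
Now, it is easy to see that the event E implies the event $(N) \in S$. This is because E implies $\hat\theta_{(N)} - \hat\theta_i + h \ge \theta_{(N)} - \theta_i \ge 0$ for all $i \ne (N)$, which in turn implies $\hat\theta_{(N)} \ge \max_{i \ne (N)} \hat\theta_i - h$. Also, the condition $i \ne (N)$ can be converted into simply $i = 1, \dots, N$. For $i = (N)$ the inequality reduces to $-h \le 0 \le h$, which clearly holds so long as $h \ge 0$. Thus E is equivalent to the event:
$$\{(N) \in S\} \cap \left\{\hat\theta_{(N)} - \hat\theta_i - h \le \theta_{(N)} - \theta_i \le \hat\theta_{(N)} - \hat\theta_i + h,\ i = 1, \dots, N\right\} \tag{11}$$
In this event $\hat\theta_{(N)}$ is unknown since (N) is unknown. (In particular, $\hat\theta_{(N)}$ is the estimate of $\theta_{(N)}$, the value of $\theta$ for the best population; it is not necessarily equal to $\max_j \hat\theta_j$.) However, given $(N) \in S$, we know that $\min_{j \in S} \hat\theta_j \le \hat\theta_{(N)} \le \max_{j \in S} \hat\theta_j$. Thus we can replace $\hat\theta_{(N)}$ on the left-hand side of the inequalities in equation (11) by $\min_{j \in S} \hat\theta_j$, and on the right-hand side by $\max_{j \in S} \hat\theta_j$. This yields an event whose probability is at least $1 - \alpha$. All that remains then is a little logic to ensure non-negativity of the bounds, and to avoid setting a needlessly high upper bound for the population that corresponds to the maximal $\hat\theta_j$.
Recall that the MCC confidence interval for $\theta_{(N)} - \theta_i$ is $\hat\theta_{(N)} - \hat\theta_i \pm h$, an interval of width 2h. The MCB intervals may be narrower or wider, depending on whether $L_i = 0$ and/or $U_i = 0$. When both $L_i$ and $U_i$ are positive (the conditions for which will be discussed further in the next section), the MCB intervals are wider, with a width of $U_i - L_i = 2h + (\max_{j \in S} \hat\theta_j - \min_{j \in S} \hat\theta_j)$. That is, the uncertainty about which population is best adds to the width of the confidence intervals. However, if S consists of a single population, then the MCB intervals for the other populations are exactly the same as the MCC intervals with that population as the control. This occurs because in this case, at the specified confidence level, there is no uncertainty about the identity of the best population.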
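The following minimal NumPy sketch implements equations (9b) to (9d). It assumes that the allowance h of equation (9a) has already been computed. The name `standard_mcb` is hypothetical, and population indices are 0-based.

```python
import numpy as np


def standard_mcb(theta_hat, h):
    """Standard MCB set and bounds of equations (9b) to (9d).

    theta_hat: length-N array of estimates; h: allowance d * sigma_hat * sqrt(2k).
    Returns (S, L, U): 0-based indices of the possibly best populations, and
    lower and upper bounds for theta_(N) - theta_i.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    n = theta_hat.size
    # largest estimate among the other populations, for each i
    max_other = np.array([np.delete(theta_hat, i).max() for i in range(n)])
    S = [i for i in range(n) if theta_hat[i] >= max_other[i] - h]   # (9b)
    min_S = theta_hat[S].min()
    L = np.maximum(0.0, min_S - theta_hat - h)                       # (9c)
    U = np.maximum(0.0, max_other - theta_hat + h)                   # (9d)
    return S, L, U
```

For estimates (5, 4.5, 2, 0) and h = 1, the sketch gives S = {0, 1}. Population 2 then has bounds [1.5, 4], whose width 2.5 equals 2h plus the spread 0.5 of the estimates in S. For estimates (5, 3, 0), S = {0}, and the bounds for populations 1 and 2 coincide with the MCC intervals $\hat\theta_0 - \hat\theta_i \pm h$.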
It should be noted that the MCB confidence intervals are conservative. Whereas the MCC event (10) has probability exactly $1 - \alpha$, the MCB statement (8) holds with a probability of at least $1 - \alpha$, and the inequality occurs because of uncertainty about the identity of the best population. This uncertainty can be large when the $\theta_i$ are estimated imprecisely and when $\theta_{(N)}$ is not much larger than one or more of the other $\theta_i$. In such cases the MCB statement would actually hold with much higher probability than $1 - \alpha$, and correspondingly the MCB intervals are (in principle) needlessly wide. There does not seem to be published evidence on the actual coverage probabilities of the MCB intervals. Ongoing work by Yangseon Kim and one of the authors provides Monte Carlo evidence on this question. For what seem to be empirically plausible parameter values, the MCB 90% probability statement often holds with a probability of over 99%. Thus one might hope to be able to improve on the MCB intervals, but how to do so is a matter for future research.
While the focus of the current paper is on multiple comparisons, we note in passing that it is also possible to make marginal (or univariate) comparisons with the best. Kim and Schmidt (1999, unpublished manuscript) show how to construct a set S and upper and lower bounds $L_i$ and $U_i$ such that the probability statement in equation (8) holds, at confidence level $1 - \alpha$, for a single, given value of i rather than jointly for $i = 1, \dots, N$.

Possible Results from MCB
The easiest case to discuss is the one in which the set S of possibly best populations has a single element, say $S = \{i\}$. This occurs if $\hat\theta_i - \hat\theta_j > h$ for all $j \ne i$, with h defined in equation (9a); that is, if $\hat\theta_i$ is the largest value of $\hat\theta$ in the sample and it exceeds the second-largest value by more than h. In this case, at confidence level $1 - \alpha$, population i is clearly the best. As noted in the previous section, this implies that the MCB intervals for populations $j \ne i$ are exactly the same as the MCC intervals with population i as the control.
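When $S = \{i\}$, it is easy to see that $U_i = 0$. In fact, $U_i = 0$ if and only if $S = \{i\}$, because the condition for $U_i = 0$ is $\hat\theta_i - \hat\theta_j \ge h$ for all $j \ne i$, which (apart from ties) is the same as the condition for $S = \{i\}$. Similarly, the condition that $L_j > 0$ for all $j \ne i$ is equivalent to $S = \{i\}$, as is easily seen from the definition of $L_j$ in equation (9c). Thus the following three conditions are equivalent: (a) $S = \{i\}$; (b) $U_i = 0$; (c) $L_j > 0$ for all $j \ne i$.
Things are a little more complicated when S contains multiple populations. In this case $U_i > 0$ for all i, since $U_i = 0$ for any i would imply that i was the only population in S. Furthermore, it is easy to see that, if $i \in S$, $L_i = 0$. Interestingly, however, the converse is not true. We can have $L_i = 0$ even for i that are not in S. The condition for $L_i = 0$ is that $\hat\theta_i \ge \min_{j \in S} \hat\theta_j - h$, which is weaker than the condition for $i \in S$, which is that $\hat\theta_i \ge \max_{j \ne i} \hat\theta_j - h$. That is, population i is in S if $\hat\theta_i$ is within h of the maximal value of $\hat\theta_j$, whereas $L_i = 0$ if $\hat\theta_i$ is within h of the minimal value of $\hat\theta_j$ for populations in S.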
Thus, when S contains multiple populations, our sample is split into possibly three groups: (a) populations in S, which have $L_i = 0$; (b) populations not in S, but close to S, which also have $L_i = 0$; (c) populations not in S and not close to S, which have $L_i > 0$.
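The following small sketch makes the three-way split concrete. It applies to standard MCB only, and the helper `mcb_groups` is hypothetical. It labels each population using the condition for $i \in S$ from (9b) and the condition for $L_i = 0$ implied by (9c).

```python
import numpy as np


def mcb_groups(theta_hat, h):
    """Label each population with one of the three standard-MCB groups."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    n = theta_hat.size
    max_other = np.array([np.delete(theta_hat, i).max() for i in range(n)])
    in_S = theta_hat >= max_other - h                          # (9b)
    lower_is_zero = theta_hat >= theta_hat[in_S].min() - h     # L_i = 0, from (9c)
    labels = []
    for i in range(n):
        if in_S[i]:
            labels.append("in S")
        elif lower_is_zero[i]:
            labels.append("close to S")
        else:
            labels.append("not close to S")
    return labels
```

With estimates (5, 4.5, 3.8, 0) and h = 1, population 2 is not in S, because 3.8 is more than h below the maximum. It still has $L_i = 0$, because 3.8 is within h of 4.5, the smallest estimate in S.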

MCB WITH GENERAL COVARIANCE STRUCTURE
3.1 Mechanics of MCB with General Covariance Structure
In this section we consider MCC and MCB when $\hat\theta$ has a general covariance structure. We will call this general MCB. Thus we maintain Assumption 1, that $\hat\theta$ is distributed as $N(\theta, \sigma^2 C)$ with C known; but we do not maintain Assumption 3, so we do not assert that C is proportional to $I_N$. As before we also maintain Assumption 2, so that a suitable estimate $\hat\sigma^2$ is available.
General MCB seems to be much less well explored than standard MCB. For example, Edwards and Hsu (1983), Hochberg and Tamhane (1987) and Hsu (1996, Chapter 4) discuss the generalization of standard MCB in terms of the unbalanced one-way model, which differs from the balanced one-way model discussed in Section 2.1 only in that the sample sizes in the different populations are allowed to be unequal. Unequal error variances across populations would fit the same pattern. This imposes some structure on the matrix C, whereas in this section we will assume only that C is known.
We consider general MCB in two steps. The first step is to perform N general MCC problems, with each of the N populations taking its turn as the control. The second step is to convert these MCC intervals into MCB intervals. All the difficulty relative to the case of standard MCB lies in the first step.
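We begin with the problem of performing MCC using population j as the control; we will repeat this for $j = 1, \dots, N$. We define $\Delta_{(j)}$ as the $(N-1)$-dimensional vector whose typical element is $\theta_i - \theta_j$, $i \ne j$; explicitly, $\Delta_{(j)} = (\theta_1 - \theta_j, \dots, \theta_{j-1} - \theta_j, \theta_{j+1} - \theta_j, \dots, \theta_N - \theta_j)'$. We can write $\Delta_{(j)} = D_j\theta$, where $D_j$ has $-e_{N-1}$ as its jth column, and the remaining columns are the columns (in order) of $I_{N-1}$. Suppose that $\hat\theta$ is the estimate of $\theta$, so that $\hat\Delta_{(j)} = D_j\hat\theta$ is the estimate of $\Delta_{(j)}$. Then, if the variance matrix of $\hat\theta$ is $\sigma^2 C$, the variance matrix of $\hat\Delta_{(j)}$ is $\sigma^2 B_j$, where $B_j = D_j C D_j'$. Let $R_j$ be the correlation matrix corresponding to $B_j$. In the special case of standard MCC, $R_j$ did not depend on j, and it took the form of the equicorrelation matrix $R_N$. In the general case $R_j$ has no specific form, though it is easily calculated. Finally, writing $c_{ij}$ for the $(i, j)$ element of C, we define $\sigma_{ij} = \sigma(c_{ii} + c_{jj} - 2c_{ij})^{1/2}$, the standard deviation of the element of $\hat\Delta_{(j)}$ that equals $\hat\theta_i - \hat\theta_j$. The MCC confidence intervals with population j as a control will take the form:
$$\theta_i - \theta_j \in \left[\hat\theta_i - \hat\theta_j - d_j\hat\sigma_{ij},\ \hat\theta_i - \hat\theta_j + d_j\hat\sigma_{ij}\right], \quad i \ne j \tag{12}$$
where $\hat\sigma_{ij} = \hat\sigma(c_{ii} + c_{jj} - 2c_{ij})^{1/2}$ and where $d_j$ is the critical value from the appropriate multivariate t distribution. Specifically, if z is an $(N-1)$-dimensional random vector distributed as $T(N-1, v, R_j)$, then $d_j$ is defined as the solution to:
$$P\left(\max_i |z_i| \le d_j\right) = 1 - \alpha \tag{13}$$
The solution generally depends on j (the control population), because the correlation matrix $R_j$ generally depends on j. This will generally need to be calculated via a simulation, since the $(N-1)$-dimensional integral implicit in equation (13) will simplify only in special cases. Tabulation is impossible except in special cases that dictate the form of $R_j$ in terms of a small number of parameters. To calculate the $d_j$ by simulation, one possibility is to do N distinct simulations, one for each value of j. When N is large, this will be time-consuming. The number of simulations is large, and each individual simulation is complicated, since it involves $(N-1)$-dimensional probability statements. A computationally more efficient procedure would calculate all N values of the $d_j$ in one large simulation (that is, based on one set of pseudo-random draws). Specifically, suppose we make a large number of draws x of the N-dimensional random variable from the multivariate t distribution with v degrees of freedom and scale matrix C.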
Now we can calculate the $(N-1)$-dimensional draws $D_j x$, where as before $D_j$ is the differencing matrix with respect to the jth entry. Thus $D_j x$ is a draw from the multivariate t distribution with scale matrix $B_j = D_j C D_j'$. We require draws from $T(N-1, v, R_j)$, where $R_j$ is the correlation matrix corresponding to $B_j$. Since $R_j = W_j^{-1/2} B_j W_j^{-1/2}$, where $W_j$ is the diagonal matrix whose diagonal elements are $c_{ii} + c_{jj} - 2c_{ij}$ ($i \ne j$), the required draws are $W_j^{-1/2} D_j x$. This just corresponds to dividing the element of $D_j x$ that corresponds to $\theta_i - \theta_j$ by $(c_{ii} + c_{jj} - 2c_{ij})^{1/2}$. Then, for given j, sort the resulting draws of $\max_i |z_i|$ as before to find the appropriate quantile. While most of the calculations must be done separately for each j, they are all based on the same set of random draws. This procedure is computationally efficient because most of the computer time is used in drawing the basic random variables, not in manipulating them after they are drawn.
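The one-simulation scheme can be sketched as follows, assuming NumPy. The function name and the default number of draws are hypothetical choices. The common N-dimensional draws come from `Generator.multivariate_normal`. For finite v, each draw is divided by an independent chi-square-based draw of $\hat\sigma/\sigma$.

```python
import numpy as np


def general_mcc_critical_values(C, alpha=0.05, v=np.inf, n_draws=100_000, seed=0):
    """Critical values d_1, ..., d_N of equation (13) from one common set of draws.

    C is the known N x N matrix of Assumption 1 (variance of theta_hat up to sigma^2).
    """
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    # Common draws of the N-dimensional variable; with finite v, divide each row
    # by an independent draw of sigma_hat / sigma to obtain multivariate t draws.
    x = rng.multivariate_normal(np.zeros(n), C, size=n_draws)
    if np.isfinite(v):
        x = x / np.sqrt(rng.chisquare(v, size=n_draws) / v)[:, None]
    c_diag = np.diag(C)
    d = np.empty(n)
    for j in range(n):
        others = np.arange(n) != j
        # differences x_i - x_j (i != j), i.e. D_j x, standardized by
        # (c_ii + c_jj - 2 c_ij)^(1/2) so that they have correlation matrix R_j
        scale = np.sqrt(c_diag[others] + c_diag[j] - 2.0 * C[others, j])
        z = (x[:, others] - x[:, [j]]) / scale
        d[j] = np.quantile(np.abs(z).max(axis=1), 1.0 - alpha)
    return d
```

With C proportional to $I_N$, every $d_j$ is close to the standard value. In the diagonal example of the test, one variance is larger than the others. That control gets the smallest $d_j$, because the differences with respect to it are the most highly correlated.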
Once we have calculated the allowances that are part of the general MCC procedures, it is easy to move on to MCB intervals. The logic is essentially as before, and indeed the attraction of the argument of Edwards and Hsu (1983) is that we can convert MCC intervals into MCB intervals independently of how we constructed the MCC intervals; all that is needed is that they are valid.
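Define the following notation:
$$L_{ij} = \hat\theta_j - \hat\theta_i - d_j\hat\sigma_{ij}, \qquad U_{ij} = \hat\theta_j - \hat\theta_i + d_j\hat\sigma_{ij} \tag{14}$$
With this notation the MCC confidence intervals (12) can be rewritten as $\theta_j - \theta_i \in [L_{ij}, U_{ij}]$, $i \ne j$. That is, $L_{ij}$ and $U_{ij}$ are the lower and upper MCC bounds for $\theta_j - \theta_i$, when MCC is performed with population j as the control. The last phrase, about which population is the control, matters for general MCC, although it did not matter for standard MCC. In the case of standard MCC, where all the $d_j$ are the same, we have $L_{ij} = -U_{ji}$. That is, the lower bound for $\theta_j - \theta_i$ when j is the control is the negative of the upper bound for $\theta_i - \theta_j$ when i is the control. In the general case, however, $d_i \ne d_j$ and so $L_{ij} \ne -U_{ji}$. We now define the further notation (with the convention $L_{ii} = U_{ii} = 0$):
$$S = \{i : U_{ji} \ge 0 \text{ for all } j \ne i\} \tag{15a}$$
$$L_i = \max\left[0, \min_{j \in S} L_{ij}\right] \tag{15b}$$
$$U_i = \max\left[0, \max_{j \ne i} U_{ij}\right] \tag{15c}$$
Then we have the MCB result:
$$P\left[(N) \in S \text{ and } L_i \le \theta_{(N)} - \theta_i \le U_i,\ i = 1, \dots, N\right] \ge 1 - \alpha$$
For the proof, see Edwards and Hsu (1983, p. 966). As in the standard case, the intuition is straightforward. The lower bound is taken by minimizing the MCC lower bounds over all control populations that might be best (are in S), while the upper bound is taken by maximizing the MCC upper bounds over all populations other than i.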
These results are similar in form to the corresponding results for standard MCB, as given in equation (9). They reduce to the results for standard MCB in the special case that the $d_j\hat\sigma_{ij}$ do not depend on i or j.
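The following minimal sketch shows the general MCB step. It assumes that the critical values $d_j$ have already been obtained, for example by simulation. The function name and the `hsu_fix` switch are hypothetical. The switch applies the modification of (15c) that sets $U_i = 0$ whenever $S = \{i\}$, which is discussed in the next subsection.

```python
import numpy as np


def general_mcb(theta_hat, C, sigma_hat, d, hsu_fix=True):
    """General MCB set and bounds from equations (14) and (15a) to (15c).

    d[j] is the MCC critical value with population j as the control (equation (13)).
    If hsu_fix is True, U_i is set to 0 whenever S = {i}.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    C = np.asarray(C, dtype=float)
    d = np.asarray(d, dtype=float)
    n = theta_hat.size
    c_diag = np.diag(C)
    # sd[i, j] = sigma_hat * (c_ii + c_jj - 2 c_ij)^(1/2); zero on the diagonal
    sd = sigma_hat * np.sqrt(np.maximum(c_diag[:, None] + c_diag[None, :] - 2.0 * C, 0.0))
    diff = theta_hat[None, :] - theta_hat[:, None]   # diff[i, j] = theta_hat_j - theta_hat_i
    L = diff - d[None, :] * sd   # (14): MCC bounds for theta_j - theta_i, control j
    U = diff + d[None, :] * sd   # L[i, i] = U[i, i] = 0 by construction
    off = ~np.eye(n, dtype=bool)
    # (15a): i is in S iff U[j, i] >= 0 for every j != i
    in_S = np.array([np.all(U[off[:, i], i] >= 0.0) for i in range(n)])
    # (15b): minimum over j in S, which includes L[i, i] = 0 when i is in S
    lower = np.maximum(0.0, np.array([L[i, in_S].min() for i in range(n)]))
    # (15c): maximum over all j != i
    upper = np.maximum(0.0, np.array([U[i, off[i]].max() for i in range(n)]))
    if hsu_fix and in_S.sum() == 1:
        upper[in_S] = 0.0
    return np.flatnonzero(in_S).tolist(), lower, upper
```

With $C = 0.5 I_4$, $\hat\sigma = 1$ and all $d_j = 1$, every $\hat\sigma_{ij}$ equals 1. The output then reproduces the standard-MCB bounds of (9) with h = 1, as the reduction stated above requires.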

Results from MCB with General Covariance Structure
The condition for population i to be in the set S (so that population i is possibly best) is that $U_{ji} \ge 0$ for all $j \ne i$. That is, in the MCC confidence intervals with population i as the control, the upper bounds for $\theta_i - \theta_j$ are all non-negative; the prospect of $\theta_i \ge \theta_j$ is not ruled out for any j. As in the case of standard MCB, if $i \in S$ then $L_i = 0$. (This is easily seen from equation (15b), since if $i \in S$ the minimum includes $L_{ii} = 0$.) Thus $L_i > 0$ implies that i is not in S. However, as in the case of standard MCB, we may have $L_i = 0$ for populations i that are not in S. Thus, in general, MCB once again splits the sample into potentially three groups: populations in S, with $L_i = 0$; populations not in S, but with $L_i = 0$; and populations not in S, with $L_i > 0$.
The biggest difference between general and standard MCB is in the characterization of the case in which there is a single population in the set S of potentially best populations. As in Section 2.4, consider the following three conditions: (a) $S = \{i\}$; (b) $U_i = 0$; (c) $L_j > 0$ for all $j \ne i$. These three conditions are equivalent for standard MCB but not for general MCB. To further explore the relationships between these three conditions, observe that the condition for $S = \{i\}$ is that i is in S but no other j is in S. Thus for $S = \{i\}$ we have $U_{ji} \ge 0$ for all $j \ne i$, so that i is in S. We also have
$$\text{for each } j \ne i, \quad U_{kj} < 0 \text{ for some } k \ne j \tag{16}$$
so that no $j \ne i$ is in S. Since the set S cannot be empty, in fact equation (16) is necessary and sufficient for the event $S = \{i\}$. The population k that dominates population j may be different for different values of j. A sufficient condition for $S = \{i\}$ is that condition (16) holds with $k = i$; that is, $U_{ij} < 0$ for all $j \ne i$, or equivalently
$$\hat\theta_i - \hat\theta_j > d_j\hat\sigma_{ij} \text{ for all } j \ne i \tag{17}$$
However, from equation (15c), this is a necessary and sufficient condition for $U_i = 0$. Thus $U_i = 0$ implies $S = \{i\}$ but not conversely. This is not a sensible outcome, since if one population is uniquely identified as best (at the given confidence level) we logically should have $U_i = 0$. Thus the definition of $U_i$ in equation (15c) should be modified to require $U_i = 0$ if $S = \{i\}$. This point appears to have been missed in some of the MCB literature, including Edwards and Hsu (1983), but is recognized by Hsu (1996, p. 112).
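Similarly, the event that $L_j > 0$ for all $j \ne i$ implies the event $S = \{i\}$, but not conversely. To see the first implication, we simply observe that S cannot be empty, and $L_j > 0$ implies that j is not in S. To proceed further, we observe that $L_j > 0$ if and only if $L_{jk} > 0$ for all $k \in S$. Since $L_j > 0$ for all $j \ne i$ implies that S contains only the single population i, a necessary and sufficient condition for $L_j > 0$ for all $j \ne i$ is that $S = \{i\}$ and $L_{ji} > 0$ for all $j \ne i$. Equivalently, condition (16) must hold together with
$$\hat\theta_i - \hat\theta_j > d_i\hat\sigma_{ij} \text{ for all } j \ne i \tag{18}$$
It is evident that this is not implied by equation (16), so that $S = \{i\}$ does not imply that $L_j > 0$ for all $j \ne i$. Finally, there is no apparent connection between the events $U_i = 0$ and $L_j > 0$ for all $j \ne i$. To see this, compare equations (17) and (18). They are identical except that $d_j$ appears in (17) where $d_i$ appears in (18). Thus these conditions would be the same if $d_i = d_j$ for all j, as is the case in standard MCB, but not in general.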

The Equicorrelated Case and Other Special Cases
For this section we return to the simplified notation of Section 2.2. Thus $\sigma^2 C$ is the variance matrix of $\hat\theta$, $\sigma^2 B$ is the variance matrix of $\hat\Delta = D\hat\theta$, and R is the correlation matrix corresponding to B. The essential requirement for the applicability of standard MCB is that $R = R_N$, as defined in equation (3). That is, the differences $\hat\theta_i - \hat\theta_N$ must be equicorrelated, with correlation equal to 1/2. Our Assumption 3, that $C = kI_N$, is sufficient but not necessary for $R = R_N$ and thus for the applicability of standard MCB. A weaker condition than $C = kI_N$ that still implies $R = R_N$ is that the $\hat\theta_i$ have a common variance and be equicorrelated, with arbitrary correlation. More formally, we have the following result.
Result 1: The following conditions are equivalent, and imply $R = R_N$.
(a) $c_{ij} = a_i + a_j + b\,\delta_{ij}$ for all i and j, for some scalars $a_1, \dots, a_N$ and b, where $\delta_{ij} = 1$ if $i = j$ and 0 otherwise.
(b) $c_{ii} + c_{jj} - 2c_{ij} = w$ for all $i \ne j$, for some scalar w.
(c) $B = \tfrac{w}{2}(I_{N-1} + e_{N-1}e_{N-1}')$ for some scalar w.
This result is well known in the MCB literature and a proof is given by Hsu (1996, p. 187). The scalars w in (b) and (c) are the same, and equal 2b, where b is the scalar in (a). Hsu (1996, p. 187) refers to this as the variance-balanced case.
The conditions in Result 1 are weaker than Assumption 3, but they are still sufficient but not necessary for $R = R_N$. It is well known, and easy to verify, that $R = R_N$ is equivalent to the following structure for B:
$$B = \Psi R_N \Psi, \quad \Psi \text{ diagonal with positive diagonal elements} \tag{19}$$
that is, $b_{ij} = \tfrac{1}{2}(b_{ii}b_{jj})^{1/2}$ for all $i \ne j$. A condition on C that is equivalent to equation (19), and which is therefore necessary and sufficient for $R = R_N$, does not seem to be given in the MCB literature. To give such a condition, recall the definition of D in equation (2a) above, and define $D^+ = D'(DD')^{-1}$, the Moore-Penrose inverse of D (which exists in this form because $DD'$ is non-singular). Now consider the following condition:
$$C = D^+ B D^{+\prime} + e_N a' + a e_N' \tag{20}$$
for some $N \times 1$ vector a and some B of the form given in equation (19). Then we have the following result:
Result 2: The following are equivalent. (a) $R = R_N$. (b) B is of the form given in equation (19). (c) C is of the form given in equation (20).
To prove this result, we note first that $DD^+ = I_{N-1}$ and $De_N = 0$. Then (c) implies (b) because, if C is of the form in equation (20), $DCD' = DD^+BD^{+\prime}D' = B$ is of the form of equation (19). To prove that (b) implies (c), define $E = C - D^+BD^{+\prime}$, so that $DED' = DCD' - B = 0$. Since the rows of D span the orthogonal complement of $e_N$, this implies that $x'Ey = 0$ whenever x and y are orthogonal to $e_N$. Because E is symmetric, it follows that $E = e_N a' + a e_N'$ for some $N \times 1$ vector a. Results 1 and 2 are of theoretical interest, and may be of use in practical applications in deciding whether standard MCB is applicable, or approximately applicable. However, they are of somewhat limited practical use because, given the matrix C (the variance matrix of $\hat\theta$, up to the factor $\sigma^2$), the matrix R is easily calculated. To see whether standard MCB is applicable or approximately applicable, it is an easy matter to calculate R and see whether $R = R_N$, either exactly or approximately.
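In practice, checking applicability amounts to computing R from C and comparing it with $R_N$. The following sketch assumes that C is supplied as a NumPy array, and the helper names are hypothetical. The first test matrix has the form $C = bI_N + a e_N' + e_N a'$. For such a matrix, $DCD' = b(I_{N-1} + e_{N-1}e_{N-1}')$, because $De_N = 0$. A diagonal C with unequal elements fails the check.

```python
import numpy as np


def differences_correlation(C):
    """Correlation matrix R of D theta_hat, with D = [I_{N-1}, -e_{N-1}]."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    D = np.hstack([np.eye(n - 1), -np.ones((n - 1, 1))])
    B = D @ C @ D.T
    sd = np.sqrt(np.diag(B))
    return B / np.outer(sd, sd)


def standard_mcb_applicable(C, tol=1e-8):
    """True if R equals R_N (all off-diagonal correlations 1/2) up to tol."""
    R = differences_correlation(C)
    m = R.shape[0]
    R_N = 0.5 * (np.eye(m) + np.ones((m, m)))
    return bool(np.max(np.abs(R - R_N)) <= tol)
```

A looser `tol` turns the exact check into the approximate one mentioned above. The choice of tolerance is a judgement call that this sketch does not settle.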
Two additional special cases that are featured prominently in the MCB literature are one-way structure and product structure (see e.g. Hsu, 1996, Section 7.1). The model is said to have product structure if the matrix R (the correlation matrix of $\hat\Delta$, the vector of differences of the estimated $\theta_i$) has the form $r_{ij} = \lambda_i\lambda_j$ for $i \ne j$, for some set of scalars $\lambda_1, \dots, \lambda_{N-1}$. The model is said to have one-way structure if $b_{ii} = c_i + c_N$ and $b_{ij} = c_N$ for $i \ne j$, for some set of positive scalars $c_1, \dots, c_N$. One-way structure is sufficient but not necessary for product structure. Specifically, one-way structure implies product structure with $\lambda_i = [c_N/(c_i + c_N)]^{1/2}$. The motivation for the terminology is that one-way structure arises when C, the variance matrix of $\hat\theta$ (up to the factor $\sigma^2$), is diagonal with ith diagonal element $c_i$. This occurs in the unbalanced one-way model, in which we are estimating the means of a set of independent normal populations, but the populations have different variances or different numbers of observations (or both). It may also occur in the regression model with panel data, as we will discuss in the next section. In the presence of product structure, some significant simplifications are possible relative to the purely general case. Specifically, the integral defining the MCC critical values can be expressed as a double integral instead of an $(N-1)$-dimensional integral (see e.g. Hsu, 1996, p. 63, equation (3.15) for the form of this integral).
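The one-way case can be checked numerically. The sketch below assumes $C = \operatorname{diag}(c_1, \dots, c_N)$ with all $c_i > 0$, and the helper names are hypothetical. It builds R from $b_{ii} = c_i + c_N$ and $b_{ij} = c_N$. It then confirms that the off-diagonal correlations have the product form $\lambda_i\lambda_j$ with $\lambda_i = [c_N/(c_i + c_N)]^{1/2}$.

```python
import numpy as np


def one_way_lambdas(c):
    """lambda_i = (c_N / (c_i + c_N))^(1/2), i = 1, ..., N-1, for C = diag(c)."""
    c = np.asarray(c, dtype=float)
    return np.sqrt(c[-1] / (c[:-1] + c[-1]))


def correlation_of_differences(c):
    """R for C = diag(c): b_ii = c_i + c_N and b_ij = c_N (i != j)."""
    c = np.asarray(c, dtype=float)
    B = np.diag(c[:-1]) + c[-1]   # adds c_N to every element
    sd = np.sqrt(np.diag(B))
    return B / np.outer(sd, sd)
```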

Approximations
If we cannot appeal to any special structure of the variance matrix C, the general MCC and MCB procedures described in Sections 3.1 and 3.2 apply, but they are computationally intensive. An alternative is to replace the exact but complicated general MCC procedure with a conservative approximation. Such approximations provide an easily calculated upper bound on the significance point of the relevant distribution. That is, they provide a value $\tilde d$ for which the joint coverage probability of the MCC intervals is at least $1 - \alpha$, whereas the exact MCC procedure finds the value d for which this probability equals $1 - \alpha$ exactly. Hsu (1996, Section 7.2.1 and Appendix A) discusses a number of conservative approximations, based on the Bonferroni inequality, Scheffé's inequality, Slepian's inequality, an inequality due to Šidák (1967), an inequality due to Hunter (1976) and Worsley (1982), and a so-called `factor analytic method' due to Hsu (1992), whose one-factor case corresponds to the product structure discussed in Section 3.3; in this case exact results are possible. Matejcik (1992, unpublished manuscript) discusses some of these inequalities as well as a method suggested by Tamhane (1977) based on Banerjee's inequality, a procedure using a moment-based approximation to the Behrens-Fisher problem, also suggested by Tamhane (1977), and a new technique based on a heteroscedastic selection procedure. McCann and Edwards (1996) provide a brief survey and a new procedure based on a path length inequality due to Naiman (1986). The above-mentioned approximations are all numerical (deterministic) and, as stated earlier, there is no compelling reason to prefer numerical approximations over simulated (stochastic) ones in an economic setting. Some techniques that employ Monte Carlo methods to approximate critical points are provided in Foutz (1981), Edwards and Berry (1987) and Naiman and Wynn (1992).
The salient feature of these techniques is that when N is large they produce approximate critical values quickly, compared to the potentially time-consuming process of generating critical values directly from simulation of equation (13).
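To make the trade-off between bounds and simulation concrete, here is a hedged Python sketch for the known-variance case. It simulates a one-sided critical value as the $(1-\alpha)$ quantile of $\max_i Z_i$ with $Z \sim N(0, R)$, in the spirit of simulating equation (13) (whose exact form is not reproduced here), and compares it with two conservative bounds: Bonferroni, $\Phi^{-1}(1 - \alpha/k)$, and the product bound implied by Slepian's inequality, $\Phi^{-1}((1-\alpha)^{1/k})$, which is valid when all correlations are non-negative. The function names are hypothetical, and the optional `df` argument, which divides by an independent $\sqrt{\chi^2_\nu/\nu}$ to mimic an estimated variance, is an assumption of this sketch; the two bounds as written apply only to the known-variance case.

```python
import numpy as np
from statistics import NormalDist

def simulated_critical_value(R, alpha=0.10, df=None, n_sim=200_000, seed=0):
    """Monte Carlo one-sided critical value: (1 - alpha) quantile of max_i Z_i / s.

    Z ~ N(0, R); if df is given, s = sqrt(chi2_df / df) mimics an estimated
    standard deviation, otherwise s = 1 (known variance).
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.asarray(R, dtype=float))
    Z = rng.standard_normal((n_sim, L.shape[0])) @ L.T
    stat = Z.max(axis=1)
    if df is not None:
        stat = stat / np.sqrt(rng.chisquare(df, n_sim) / df)
    return float(np.quantile(stat, 1.0 - alpha))

def bonferroni_bound(k, alpha=0.10):
    return NormalDist().inv_cdf(1.0 - alpha / k)

def slepian_bound(k, alpha=0.10):
    """Conservative when all correlations are non-negative."""
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / k))

k = 5
R = 0.5 * (np.eye(k) + np.ones((k, k)))   # balanced one-way case: all correlations 1/2
print(simulated_critical_value(R), slepian_bound(k), bonferroni_bound(k))
```

With positive correlations the simulated value lies below both bounds, which is exactly the conservativeness the text describes; the bounds cost almost nothing to compute, while the simulation cost grows with N and the number of draws.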
Many of these techniques are available using standard commercial software. For example, see Tobias (1996) for a discussion of MCB software in SAS.

MCB IN THE PANEL DATA REGRESSION MODEL
In this section we consider the panel data regression model
$y_{it} = \alpha_i + x_{it}'\beta + v_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T. \qquad (21)$
We suppose that the parameters of interest for the MCB analysis are the intercepts, which we express as the vector $\alpha = (\alpha_1, \ldots, \alpha_N)'$. (In Section 5 we will give some empirical examples in which these are indeed the parameters of interest.) These can be ordered in the usual way as $\alpha_{(1)} \le \alpha_{(2)} \le \cdots \le \alpha_{(N)}$. We then consider the differences between each intercept and the best (largest) one, for which MCB will provide joint confidence intervals. Throughout this section we assume that the errors $v_{it}$ are i.i.d. $N(0, \sigma^2)$, and we treat the $x_{it}$ as fixed (independent of v). Also, unless stated otherwise, we assume a balanced panel (T observations for each value of i). This model might be estimated in a number of different ways depending on what one is willing to assume about the $\alpha_i$. We will consider the simplest and most generally relevant case, which is to treat the $\alpha_i$ as fixed parameters, in which case the model would typically be estimated by the within (fixed-effects) estimator.
We begin with a simple point. If β were known, we could write $y_{it} - x_{it}'\beta = \alpha_i + v_{it}$. This is the balanced one-way model, to which standard MCB applies.
Since standard MCB would be applicable if β were known, it is reasonable to presume that standard MCB is a good approximation if β is estimated sufficiently precisely. More precisely, suppose that $\hat\beta$ is the within (fixed-effects) estimator, and we ignore its variability and apply standard MCB. As is well known, we can obtain the fixed-effects estimates of β and the $\alpha_i$ by regressing $y_{it}$ on $x_{it}$ and a set of N dummy variables representing populations (values of i). Equivalently, we can obtain the within estimate $\hat\beta$ from the regression in deviations from individual means (i.e. by regressing $y_{it} - \bar y_i$ on $x_{it} - \bar x_i$) and then set $\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta$. This leads to the following expression for $\hat\alpha_i$:
$\hat\alpha_i = \alpha_i + \bar v_i - \bar x_i'(\hat\beta - \beta). \qquad (22)$
The variance of $\bar v_i$ is of order $1/T$, while the variance of $\bar x_i'(\hat\beta - \beta)$ is of order $1/(NT)$. As a result, we would expect standard MCB to be approximately valid when N is large, since the variability in $\hat\beta$ will be small relative to the variability in $\bar v_i$. (Similarly, in the case of an unbalanced panel, we would expect the one-way structure, and therefore the product structure, to hold approximately when N is large.) This is an important observation, because general MCB will be numerically very complicated when N is large.
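A minimal NumPy sketch of the within (fixed-effects) computations just described, on simulated data (all names and parameter values are hypothetical): demean y and x by individual, regress to get $\hat\beta$, then recover $\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta$. The residual variance uses the usual $N(T-1) - K$ degrees of freedom, where K, introduced here, denotes the number of regressors.

```python
import numpy as np

def within_estimator(y, X):
    """y: (N, T) array; X: (N, T, K) array. Returns (beta_hat, alpha_hat, sigma2_hat)."""
    N, T, K = X.shape
    ybar = y.mean(axis=1)                          # individual means of y, shape (N,)
    Xbar = X.mean(axis=1)                          # individual means of x, shape (N, K)
    y_dm = (y - ybar[:, None]).reshape(N * T)      # deviations from individual means
    X_dm = (X - Xbar[:, None, :]).reshape(N * T, K)
    beta_hat, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    alpha_hat = ybar - Xbar @ beta_hat             # intercept estimates, as in equation (22)
    resid = y_dm - X_dm @ beta_hat
    sigma2_hat = resid @ resid / (N * (T - 1) - K)
    return beta_hat, alpha_hat, sigma2_hat

rng = np.random.default_rng(1)
N, T, K = 50, 6, 2
alpha = rng.normal(size=N)
beta = np.array([0.5, -0.3])
# Regressors deliberately correlated with the intercepts; the within transformation removes them.
X = rng.normal(size=(N, T, K)) + 0.5 * alpha[:, None, None]
y = alpha[:, None] + X @ beta + rng.normal(scale=0.3, size=(N, T))
beta_hat, alpha_hat, sigma2_hat = within_estimator(y, X)
print(beta_hat, sigma2_hat)
```

Regressing y on x and a full set of N dummies returns the same $\hat\beta$ and $\hat\alpha_i$; the demeaned form simply avoids building the N dummy columns.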
MCB is not designed as an asymptotic procedure. Indeed, the problem of comparing N populations is hard to conceptualize unless N is fixed. However, since econometricians often think in terms of asymptotics, the following two comments may be helpful. First, as just noted, standard MCB may be approximately valid when N is large. Second, MCB assumes that the $\hat\alpha_i$ are normally distributed. This should be so if the errors are normal or if T is large.
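Whether or not N is large, general MCB can be applied. Let $\tilde X$ be the matrix of x's expressed in deviations from individual means, so that its typical row is of the form $(x_{it} - \bar x_i)'$. Let $\bar X$ be the matrix whose ith row is $\bar x_i'$. Then the variance matrix of $\hat\alpha$ is $C = \sigma^2[T^{-1}I_N + \bar X(\tilde X'\tilde X)^{-1}\bar X']$, and correspondingly the variance matrix of the differences is $B = DCD'$, where D is the differencing matrix. Given the matrix C, we can calculate the matrices B and R, and we can see whether or not R approximately equals the equicorrelated matrix with all correlations equal to 1/2. If it does, standard MCB is approximately applicable. If not, general MCB or an approximation will be required.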
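Interestingly, in our empirical work we often found that C approximately had an equicorrelated structure, so that R was approximately equal to the equicorrelated matrix with all correlations equal to 1/2. To understand why this is not a surprising outcome, consider the following argument. For the variance matrix of $\hat\alpha$ we have $C = \sigma^2[T^{-1}I_N + \bar X(\tilde X'\tilde X)^{-1}\bar X']$. Given independent populations, we would expect the off-diagonal elements $C_{ij}$, $i \neq j$, which depend on terms like $\bar x_i'(\tilde X'\tilde X)^{-1}\bar x_j$, to be small, since they essentially measure the covariance of the xs across populations i and j. However, the diagonal elements $C_{ii}$ should not necessarily be small, since they essentially measure a variance. Taking population N as the control, each off-diagonal element $B_{ij} = C_{ij} - C_{iN} - C_{Nj} + C_{NN}$ ($i \neq j$) contains only one `variance' term, namely $C_{NN}$. However, each diagonal element $B_{ii} = C_{ii} - 2C_{iN} + C_{NN}$ contains two `variance' terms, namely $C_{ii}$ and $C_{NN}$. Thus, heuristically, if the variances $C_{ii}$ are of similar size, the diagonal elements of B should be roughly twice as large as the off-diagonal elements, and the correlations in R should all be roughly equal to 1/2. We will discuss this point further in our empirical section.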
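The variance formula and the heuristic can be checked numerically. This sketch (hypothetical function names; simulated regressors drawn independently across populations) computes $C = \sigma^2[T^{-1}I_N + \bar X(\tilde X'\tilde X)^{-1}\bar X']$ and then forms B elementwise as $B_{ij} = C_{ij} - C_{ic} - C_{cj} + C_{cc}$ for a control population c, which makes the "one variance term versus two" argument explicit. With N = 100 and T = 6 the average off-diagonal correlation should come out close to 1/2; with small N or strongly heterogeneous regressors it need not, as the empirical section illustrates.

```python
import numpy as np

def alpha_variance(X, sigma2):
    """C = sigma2 * (I/T + Xbar (Xtilde' Xtilde)^{-1} Xbar') for a balanced panel X of shape (N, T, K)."""
    N, T, K = X.shape
    Xbar = X.mean(axis=1)                                # N x K, ith row is xbar_i'
    Xtilde = (X - Xbar[:, None, :]).reshape(N * T, K)    # deviations from individual means
    return sigma2 * (np.eye(N) / T + Xbar @ np.linalg.inv(Xtilde.T @ Xtilde) @ Xbar.T)

def difference_correlation_elementwise(C, control):
    """R for differences from `control`, via B_ij = C_ij - C_ic - C_cj + C_cc."""
    c = control
    idx = np.array([i for i in range(C.shape[0]) if i != c])
    B = C[np.ix_(idx, idx)] - C[idx, c][:, None] - C[c, idx][None, :] + C[c, c]
    s = np.sqrt(np.diag(B))
    return B / np.outer(s, s)

rng = np.random.default_rng(2)
N, T, K = 100, 6, 2
X = 3.0 + rng.normal(size=(N, T, K))      # regressors independent across populations
C = alpha_variance(X, sigma2=0.1)
R = difference_correlation_elementwise(C, control=0)
off = R[~np.eye(N - 1, dtype=bool)]
print(off.mean(), off.std())              # mean should be close to 1/2 for large N
```

The common level of the regressors (here 3.0) inflates every element of C but cancels in B, because only differences between elements enter.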

EMPIRICAL APPLICATIONS
5.1 Stochastic Frontier Models
In this section we demonstrate the use of MCB to construct confidence intervals for measures of technical efficiency in stochastic frontier models with panel data. We use three previously analysed panel data sets. For these data sets, Horrace and Schmidt (1996) constructed confidence intervals for efficiency measures using a number of different procedures (i.e. under a number of different assumptions), including standard MCB. Here the emphasis will be on whether standard MCB was appropriate, and whether general MCB would make a difference. We find that general MCB is called for in one of the three cases, and that it then makes a moderate difference.
Our three data sets possess rather different characteristics, most notably in the relative numbers of firms and time periods. In the first data set, the number of time periods is much smaller than the number of firms. In the second, the number of time periods is larger than the number of firms. In the third, the number of time periods is approximately equal to the number of firms, and it differs across firms. These characteristics affect the results in predictable ways. For example, only when the number of time periods is large and the number of firms is small are we able to identify a unique best firm.
We begin with a brief discussion of the stochastic frontier model with panel data. Stochastic frontier models were originally due to Aigner et al. (1977) and Meeusen and van den Broeck (1977). These models were based on cross-sectional data and strong distributional assumptions. Models with alternative distributional assumptions have been presented by Stevenson (1980), Greene (1990) and many others. The use of panel data in frontier models was first suggested by Pitt and Lee (1981) and Schmidt and Sickles (1984). More recent treatments include Cornwell and Schmidt (1995) and Greene (1997). This discussion will follow Schmidt and Sickles (1984) and Horrace and Schmidt (1996).
Consider the following logarithmic specification (e.g. Cobb-Douglas or translog) of a production function:
$y_{it} = \alpha + x_{it}'\beta + v_{it} - u_i. \qquad (25)$
Here i indexes firms or productive units and t indexes time periods.
$y_{it}$ is the scalar dependent variable representing the logarithm of output for the ith firm in period t, α is a scalar intercept, $x_{it}$ is a vector of functions of inputs (e.g. in logarithms for the Cobb-Douglas specification), β is a vector of coefficients and $v_{it}$ is an i.i.d. error term with zero mean and finite variance. The $u_i$ satisfy $u_i \ge 0$, and $u_i > 0$ is an indication of technical inefficiency. Note that $u_i$ is time-invariant. For a logarithmic specification such as this, the technical efficiency of the ith firm is given by $TE_i = \exp(-u_i)$, so technical inefficiency is $1 - TE_i$. Different treatments of this model follow from different assumptions. For example, Pitt and Lee (1981) discussed maximum likelihood estimation under the assumptions that the $x_{it}$ are fixed, the $v_{it}$ are i.i.d. normal, the $u_i$ are i.i.d. according to a specified distribution (e.g. half-normal), and the $u_i$ and $v_{it}$ are independent. Further discussion can be found in Schmidt and Sickles (1984) or Horrace and Schmidt (1996). In this section, we treat the $u_i$ as fixed. If we let $\alpha_i = \alpha - u_i$, equation (25) becomes the standard fixed-effects regression model given in equation (21) above. Then $\hat\beta$ and $\hat\alpha_i$ are obtained as described in Section 4 above. Given the estimates $\hat\alpha_i$, one can then define $\hat\alpha = \max_j \hat\alpha_j$ and $\hat u_i = \hat\alpha - \hat\alpha_i$. Loosely speaking, these estimates are consistent as N and T both approach infinity. We require $T \to \infty$ so that $\hat\alpha_i \to \alpha_i$: whereas in equation (22) the term $\bar x_i'(\hat\beta - \beta) \to 0$ as either $N \to \infty$ or $T \to \infty$, the term $\bar v_i$ is the average of T values of $v_{it}$, and $T \to \infty$ is required for it to converge in probability to zero. Furthermore, we require $N \to \infty$ (and some restrictions on the distribution of the $u_i$) so that $\max_j \alpha_j \to \alpha$. Since $\max_j \alpha_j = \alpha - \min_j u_j$, we basically need the density of u to be non-zero in a neighbourhood of zero, so that the minimum of the $u_i$ converges to zero as the number of $u_i$ increases. These arguments were given by Schmidt and Sickles (1984).
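The point estimates just defined are simple to compute. The following sketch (hypothetical function name) returns $\hat\alpha = \max_j \hat\alpha_j$, $\hat u_i = \hat\alpha - \hat\alpha_i$ and the implied efficiencies $\exp(-\hat u_i)$. As a purely arithmetic illustration it uses the two intercept estimates quoted later for rice farms 164 and 118 (5.556 and 5.486); it does not reproduce any MCB interval.

```python
import numpy as np

def efficiency_point_estimates(alpha_hat):
    """Point estimates relative to the best firm in the sample."""
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    alpha_top = alpha_hat.max()           # estimated frontier intercept, max_j alpha_hat_j
    u_hat = alpha_top - alpha_hat         # estimated inefficiency, zero for the sample-best firm
    te_hat = np.exp(-u_hat)               # estimated technical efficiency, in (0, 1]
    return alpha_top, u_hat, te_hat

alpha_top, u_hat, te_hat = efficiency_point_estimates([5.556, 5.486])
print(np.round(u_hat, 3), np.round(te_hat, 4))
```

These are point estimates only; the MCB intervals discussed in the text are what convey their sampling uncertainty.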
A more rigorous treatment of the asymptotics for this model is given by Park and Simar (1994), who show that, in addition to $N \to \infty$ and $T \to \infty$, a further rate condition is needed to ensure the consistency of $\hat\alpha$. This latter requirement limits the rate at which N can grow relative to T, in order to ensure that the upward bias induced by the `max' operation disappears asymptotically. If N increased too rapidly relative to T, the increased bias due to maximizing over more $\hat\alpha_i$ could dominate the convergence of each individual $\hat\alpha_i$ to the corresponding $\alpha_i$.
The above discussion regards zero as the absolute minimum value of the $u_i$, and hence α as the absolute maximal value of the $\alpha_i$, over any possible sample (essentially, as $N \to \infty$). This can be distinguished from the minimal value of the $u_i$ and the maximal value of the $\alpha_i$ in a given sample of size N, and this distinction is relevant when N is small, and therefore treated as fixed for purposes of asymptotics. For fixed N, let the ordering of the $\alpha_i$ be as before, $\alpha_{(1)} \le \alpha_{(2)} \le \cdots \le \alpha_{(N)}$, so that $\alpha_{(N)} = \max_j \alpha_j$. Similarly, let the $u_i$ be ordered in reverse order, $u_{(1)} \ge u_{(2)} \ge \cdots \ge u_{(N)}$, so that $u_{(N)} = \min_j u_j$ and $\alpha_{(N)} = \alpha - u_{(N)}$. Then clearly $\alpha_{(N)} \le \alpha$, or equivalently $u_{(N)} \ge 0$. This corresponds to comparing the $\alpha_i$ to the absolute standard α. The alternative is to compare the $\alpha_i$ to the within-sample standard $\alpha_{(N)}$. This leads to the definition
$u_i^* = \alpha_{(N)} - \alpha_i = u_i - u_{(N)}$,
so that $u_i^* \ge 0$, with $u_i^* = 0$ for the best firm in the sample. (Note that $u_i^*$ corresponds to the difference between the best intercept and $\alpha_i$ in Section 4.) The relevance of this distinction to the present discussion is straightforward. MCB naturally takes the number of populations (N) as fixed, and generates joint confidence intervals for the vector of differences relative to the best population. That is, MCB will construct joint confidence intervals for the $u_i^*$, not for the $u_i$. Correspondingly, the joint confidence intervals for the $u_i^*$ can easily be converted into joint confidence intervals for the values of $TE_i^* = \exp(-u_i^*)$, which we distinguish from $TE_i = \exp(-u_i)$. The intervals for $TE_i^*$ are a monotonic transformation of the intervals for $u_i^*$. Because this transformation is decreasing, it turns lower bounds into upper bounds, and vice versa.
Koop et al. (1997) have provided a Bayesian analysis of the stochastic frontier model that has some strong parallels to MCB. They consider four different models, depending on different assumptions about the effects. Their Standard Individual Effects (SIE) model uses an uninformative prior for the individual effects and β, and is therefore very similar in spirit to the usual classical fixed-effects model. They describe how to calculate the marginal posterior distribution of each firm's efficiency relative to the most efficient firm (the analogue of $TE_i^*$, denoted differently in their notation), so that they make marginal rather than multiple comparisons with the best, but presumably multiple comparisons could also be made. An interesting question that we have not yet pursued is how different such a Bayesian analysis based on an uninformative prior would be from MCB in typical applications. Incidentally, both Koop et al. (1997) and Horrace and Schmidt (1996) note that construction of confidence intervals for $TE_i$ rather than $TE_i^*$ fundamentally requires an assumed distribution for the $u_i$. The reader is referred to those papers for more discussion of inference on $TE_i$ given such a distributional assumption.

Empirical Examples of MCB in Stochastic Frontier Models
Indonesian rice farms
We analyse data previously analysed by Erwidodo (1990, unpublished manuscript), Lee (1991, unpublished manuscript), Lee and Schmidt (1993) and Horrace and Schmidt (1996). For a complete discussion of the data see Erwidodo (1990, unpublished manuscript). One hundred and seventy-one rice farms in Indonesia were observed for six growing seasons, so N = 171 and T = 6 in our previous notation. The data were collected by the Agro Economic Survey, as part of the Rural Dynamic Study in the rice production area of the Chimanuk River Basin, West Java, and were obtained from the Center for Agro Economic Research, Ministry of Agriculture, Indonesia. The 171 farms were located in six different villages, and the six growing seasons consisted of three wet and three dry seasons.
The model is a Cobb-Douglas (log-linear) production function, with some additional dummy variables. Output is measured as kilograms of rice produced. Inputs included in the specification are seed (kg), urea (kg), triple superphosphate (TSP) (kg), labour (labour-hours) and land (hectares). The specification also included the following dummy variables. DP equals 1 if pesticides were used and 0 otherwise. DV1 equals 1 if high-yield varieties of rice were planted. DV2 equals 1 if mixed varieties were planted; the omitted category represents that traditional varieties were planted. DSS equals 1 for the wet season. Several other time-invariant dummy variables had to be excluded from the analysis, because time-invariant regressors are perfectly collinear with the firm-specific intercepts of the fixed-effects model.
The estimated coefficients are not presented here, because they are not relevant to this discussion; they can be found in Horrace and Schmidt (1996, p. 268). The estimate of α, while relevant, consists of 171 elements and is too large to be presented in its entirety. Some salient facts concerning $\hat\alpha$ are:
Some of the results from standard MCB are presented in Table I. They are based on the 90% confidence level (α = 0.10) and a simulated critical value of 3.18. This implies an MCB `allowance' of 0.6022. The set S of possibly efficient firms contains 92 of the 171 farms. The smallest value of $\hat\alpha_i$ among the farms in S was 4.955. Because all the remaining 79 farms had values of $\hat\alpha_i$ within 0.6022 of 4.955, all the farms had upper bounds for $TE_i^*$ equal to one (i.e. lower bounds for $u_i^*$ equal to zero). The lower bounds for $TE_i^*$ ranged from 0.5874 for farm 145 to 0.2001 for farm 45. It would take up too much space to present results for all 171 farms, so Table I presents the results for only eight farms: the three with the largest $\hat\alpha_i$, the farm with the median value of $\hat\alpha_i$, the farm with the smallest value of $\hat\alpha_i$ among those in S, and the three farms with the smallest $\hat\alpha_i$. For each of these eight farms, we give the value of $\hat\alpha_i$, the standard error of $\hat\alpha_i$, and the MCB lower and upper bounds of the confidence interval for $TE_i^*$.
These confidence intervals are very wide and indicate that the point estimates of efficiency contain too much statistical noise to be taken very seriously. Horrace and Schmidt (1996) performed some sensitivity analyses to try to understand the reason for such wide confidence intervals. The intervals are shortened, but not by a large amount, if we do MCC with farm 164 as a control (i.e. treat the identity of the most efficient firm as known), if we reduce the number of farms for which we construct confidence intervals, or if we change the confidence level from 0.90 to 0.75. The main reason why the confidence intervals are wide is that the $\alpha_i$ are not estimated very precisely, and this is due to a small value of T and a relatively large value of $\sigma^2$.¹ Our conclusion is that analysis of technical efficiency for these data is difficult, and would require stronger assumptions than the fixed-effects model makes.
We now turn to the question of whether standard MCB is appropriate for this analysis. As discussed in Section 4, we might hope that standard MCB will be appropriate, because N is fairly large. A direct check on the applicability of standard MCB is to calculate the correlation matrix R of the vector of differences and see how close it is to the equicorrelated matrix with all correlations equal to 1/2. When we calculated R with farm 164 as the control, we obtained an average correlation of 0.4967, with a standard deviation of 0.0139. The correlations between the elements of the vector of differences were close enough to 1/2 to convince us that standard MCB was approximately valid. This is fortunate, because with N = 171, general MCB would have been extremely time-consuming.

Texas utilities
In this section we reanalyse data previously analysed by Kumbhakar (1996) and Horrace and Schmidt (1996). Kumbhakar estimated a cost function, whereas we estimate a production function. The data set consists of observations on 10 major privately owned Texas electric utilities observed annually over the 18 years from 1966 to 1983, and includes information on annual labour, capital and fuel (inputs) for electrical power generation (output). With 18 periods of observation per firm we have T larger than N, the opposite of the case with the Erwidodo rice farm data.
The model is a Cobb-Douglas production function. The within estimates of the regression coefficients of this model are given in Horrace and Schmidt (1996, p. 272). We note that the estimate of $\sigma^2$ is 0.0029, which is much smaller than in the case of the Erwidodo data. Table II gives the intercepts $\hat\alpha_i$, their standard errors, and the results of standard MCB at the 90% confidence level. The critical value of 2.42 implies an MCB allowance of 0.0434. Because $\hat\alpha_5$ exceeds all the other $\hat\alpha_i$ by at least 0.0434, S = {5}; that is, firm 5 is identified as the best firm. Because there is a unique best firm, the upper and lower bounds for the efficiency levels of the other firms are not equal to one. The bounds for the efficiency levels of the firms are much tighter than they were for the Erwidodo data. For example, firm 8 has a confidence interval for $TE_8^*$ of [0.7847, 0.8562], which is arguably tight enough to be informative. The bounds are tighter in this case than in the previous analysis primarily because differences of the $\alpha_i$ are estimated more precisely, and this is so because T is larger and $\sigma^2$ is smaller.² Of course, while a larger value of T improves precision of estimation, it also makes the assumption of time-invariant efficiency less plausible.
¹ It may be noted that the standard errors of the $\hat\alpha_i$ give a good idea of the precision of estimation of the $\alpha_i$, but not of the precision of estimation of differences of the $\alpha_i$, which is what is relevant for MCB. Because the $\hat\alpha_i$ are strongly positively correlated, the standard error of the difference is actually typically smaller than the standard error of $\hat\alpha_i$ or $\hat\alpha_j$. For example, for i = 164 and j = 118, Table I reports standard errors of 0.260 and 0.253, while the difference has a standard error of 0.192. Nevertheless, the standard error of the difference is large relative to the difference itself (5.556 - 5.486 = 0.07), so that there is considerable uncertainty about the level of inefficiency even if it were known that firm 164 was most efficient. See Horrace and Schmidt (1996) for more detail.
Various models that allow time-varying efficiency exist (e.g. Kumbhakar, 1990; Battese and Coelli, 1992; Lee and Schmidt, 1993), and MCB would in principle be applicable to them: for each time period, we simply need to be able to calculate the joint distribution of the N estimated intercepts. As a rough test of the need for such a model, we performed a Chow test of the hypothesis that the slopes and intercepts are all constant (against the alternative that one or more differs between the first nine observations and the last nine). The test statistic was significant at any reasonable level, which indicates the need to consider seriously that something has changed over time. Further pursuit of these considerations is empirically relevant but beyond the scope of this paper.
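The allowance reported for the Texas data can be reproduced with simple arithmetic under an assumption that this section does not state explicitly: that the standard-MCB allowance for a balanced panel is $h = d\,\hat\sigma\sqrt{2/T}$, i.e. the critical value times the standard error of a difference of two independent T-period means, ignoring the sampling error in $\hat\beta$. With d = 2.42, $\hat\sigma^2$ = 0.0029 and T = 18 this gives 0.0434 to four decimals, which is consistent with (but does not prove) that assumption. As a further rough check, the log-width of the reported standard-MCB interval for firm 8 is close to 2h.

```python
import math

d = 2.42             # simulated critical value reported for the Texas utilities
sigma2_hat = 0.0029  # reported residual variance estimate
T = 18               # time periods per firm

# ASSUMPTION: standard-MCB allowance h = d * sigma_hat * sqrt(2 / T), i.e. d times the
# standard error of a difference of two independent T-period means (beta_hat treated as known).
allowance = d * math.sqrt(sigma2_hat) * math.sqrt(2.0 / T)
print(round(allowance, 4))  # 0.0434

# Rough check: log-width of the reported standard-MCB interval for firm 8 versus 2h.
log_width = math.log(0.8562) - math.log(0.7847)
print(round(log_width, 4), round(2 * allowance, 4))
```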
We now ask whether standard MCB is in fact appropriate for this data set. It is not, because the correlation matrix R is not close to the equicorrelated matrix with all correlations equal to 1/2. For the vector of differences defined with firm 5 as the control, the correlations between its elements averaged 0.8599, which is not at all close to 1/2; they ranged from 0.7374 to 0.9520, with a standard deviation of 0.0429. Similar results hold for differences defined with other firms as the control. As a result, standard MCB should not have been expected to be a good approximation to general MCB.
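Why only the differences between the elements of C, and not their common level, can affect R follows from a one-line calculation. Let e denote an N × 1 vector of ones and a any N × 1 vector (notation introduced here for illustration). Because every row of the differencing matrix D contains one +1 and one -1, De = 0, and therefore

$$
D\,(C + a e' + e a')\,D' \;=\; DCD' + (Da)(De)' + (De)(Da)' \;=\; DCD'.
$$

In particular, adding the same constant k to every element of C (the case a = (k/2)e, which gives C + k ee') leaves B = DCD', and hence R, unchanged.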
While it is not the purpose of this paper to be a corrigendum for Horrace and Schmidt (1996), it may be worth discussing the argument they used (erroneously) to justify standard MCB, so that others do not follow their example. They presented the following facts about Based on these facts, they argued that the equicorrelated structure for approximately held, and that standard MCB was applicable. The approximate equality of the elements of is indeed relevant, but the relevant comparison is of the differences of these elements to The mean of the elements of is irrelevant, because the variance matrix of the differences is $B = DCD'$, and the differencing matrix D removes the mean from and leaves only the differences. For standard MCB to apply, these differences need to be small relative to . Although the variation in the elements of was not large relative to the mean value of , it was quite large relative to , whose diagonal elements equal only 0.00032. As a result, the correlation matrix R is not very nearly equal to the equicorrelated matrix with all correlations equal to 1/2. As the preceding discussion makes clear, the moral of the story is simply to compute R and compare it to that equicorrelated matrix before proceeding with standard MCB. There is no obviously preferred metric for this comparison.
Table III reports the results of general MCB applied to the Kumbhakar data. The major difference from the standard MCB results is that the set S of possibly efficient firms now contains two firms (S = {5, 3}) rather than just one (S = {5}). For the firms not in S, this results in only minor changes in the lower bounds for technical efficiency, but it leads to significant increases in the upper bounds. For example, the confidence interval for $TE_8^*$ was [0.7847, 0.8562] from standard MCB, but is now [0.7595, 0.9731] from general MCB. The increase in width is fundamentally due to the enlargement of the set of possibly efficient firms, and is not a general consequence of the difference between standard and general MCB: general MCB could yield a larger or smaller set S, or narrower or wider intervals, than standard MCB. What is clear is that, for this data set, the choice between standard and general MCB matters. Given the lack of similarity of R to the equicorrelated matrix, we trust the general MCB results, not those from standard MCB. Given that standard MCB was not applicable, we could have considered the applicability of other procedures that are simpler (less computer-time-consuming) than general MCB. For example, a referee pointed out correctly that we could have checked how close the correlation matrix R came to having the product structure discussed in Section 3.3 above. This comparison can be made using standard software, such as PROC FACTOR in SAS. Product structure would result in considerable simplifications relative to general MCB.
² We might note that the $\hat\alpha_i$ themselves are not estimated more precisely for this data set than for the previous one. The standard errors of the $\hat\alpha_i$ are very similar in Tables I and II. However, the standard error of the difference is only 0.0653, compared to 0.192 for the comparable difference for the rice farm data, and this comparison does not depend much on the choice of firms. In the present data set more of the variability of $\hat\alpha_i$ is due to error in the estimation of β, and this largely differences away.

Egyptian tileries
In this section we analyse data previously analysed by Seale (1990) and Horrace and Schmidt (1996). The data were collected by the Non-Farm Employment Project in 1982–3. The firms were located in Fayoum and Kalyubiya, Egypt.
Twenty-five Egyptian small-scale floor tile manufacturers were observed over 3-week periods for 66 weeks, for a total of 22 separate observation periods. The data contain some missing observations, so the number of observation periods varies across firms, making this an unbalanced panel. The total number of observations is 484, so $T_i$ is on average about 19. For a complete discussion of the data see Seale (1990).
Inputs to the production of cement floor tiles are labour (labour-hours) and machines (machine-hours). Output is in square metres of tile. The model is a Cobb-Douglas production function. The within estimates of the regression coefficients are given in Horrace and Schmidt (1996, p. 276). The estimate of $\sigma^2$ equals 0.1147, which is comparable to the value for the Erwidodo data. Horrace and Schmidt (1996) performed MCB under the assumption of the product structure discussed in Section 3.3. However, their results are incorrect due to an error in the calculation of the appropriate critical values. The corrected results for MCB assuming one-way structure, at the 90% confidence level, are given in Table IV. All but five of the firms are in the set S, so we are not very sure about the identity of the best firm. The confidence intervals for the individual technical efficiencies are also fairly wide; they are much wider than for the Kumbhakar data, though not as wide as for the Erwidodo data.
The values of range from 0.0587 to 0.1106, with a mean of 0.0857 and a standard deviation of 0.00941. This degree of variation does not appear small compared to the average value of , which is approximately 0.012 (based on an average value of $T_i$ of approximately 19), and so we might suspect that general MCB would be called for. As explained in the previous section, we could have checked how close the correlation matrix R was to possessing a product structure. This would have involved finding a matrix, say $\tilde R$, which possessed a product structure or a one-way structure and minimized some Euclidean norm criterion such as $\|R - \tilde R\|$. Given $\tilde R$, we could have compared it to R to determine whether general MCB was indeed called for. Table V gives the results for general MCB (i.e. not assuming the one-way structure). They are not very different from the results in Table IV, so in this case general MCB does not make much difference, despite the fact that the one-way structure did not appear to hold very well.
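The comparison with a product structure suggested above can be sketched as a least-squares problem: choose scalars $\lambda_i$ to minimize $\sum_{i \neq j}(R_{ij} - \lambda_i\lambda_j)^2$ and set $\tilde R_{ij} = \lambda_i\lambda_j$ off the diagonal. The NumPy sketch below (hypothetical function name) uses coordinate descent, updating each $\lambda_i$ to its exact least-squares value given the others. This is one simple choice among several (fitting a one-factor model with software such as PROC FACTOR is another), and the root-mean-square misfit it reports is just one possible metric.

```python
import numpy as np

def fit_product_structure(R, n_sweeps=5000, tol=1e-13):
    """Least-squares fit of R_ij ~ lam_i * lam_j over i != j by coordinate descent."""
    R = np.asarray(R, dtype=float)
    k = R.shape[0]
    off = ~np.eye(k, dtype=bool)
    lam = np.full(k, np.sqrt(max(R[off].mean(), 1e-6)))   # start from the average correlation
    for _ in range(n_sweeps):
        max_change = 0.0
        for i in range(k):
            m = np.arange(k) != i
            # Exact minimiser of the criterion in lam_i, holding the other lam_j fixed.
            new = R[i, m] @ lam[m] / (lam[m] @ lam[m])
            max_change = max(max_change, abs(new - lam[i]))
            lam[i] = new
        if max_change < tol:
            break
    R_fit = np.outer(lam, lam)
    np.fill_diagonal(R_fit, 1.0)
    rms = np.sqrt(np.mean((R - R_fit)[off] ** 2))          # root-mean-square off-diagonal misfit
    return lam, R_fit, rms

# A correlation matrix that has exact product structure (hypothetical lambda values).
lam_true = np.array([0.6, 0.7, 0.8, 0.5, 0.9])
R = np.outer(lam_true, lam_true)
np.fill_diagonal(R, 1.0)
lam, R_fit, rms = fit_product_structure(R)
print(np.round(lam, 4), rms)
```

A small misfit suggests that the product-structure simplifications are a reasonable approximation; a large one points towards general MCB.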

Estimation of Labour Market Wage Gaps
To illustrate another potential application of MCB in economics, consider the following example, given in more detail in Horrace (1997, unpublished manuscript). The labour market wage discrimination literature is concerned with explaining differences in wages across gender, race or ethnicity. The idea is to decompose an individual's wage into a human capital component and a discriminatory component using some form of regression analysis. The discriminatory component is called the `wage gap' or `wage differential'. Empirical studies are often concerned with estimating these wage gaps for different employment classifications and ranking them to determine which employment classifications are the most or least discriminatory. For example, Fields and Wolff (1995) estimate wage gaps across 284 industry classifications such as public utilities, retail trade and agriculture. They then rank these industry wage gaps to show which industries have the largest wage gaps and are, hence, the most discriminatory. However, they do so without performing any inference on the ranking. Horrace (1997, unpublished manuscript) performs an analysis similar to the Fields and Wolff study, but includes MCB inference on the wage gaps. More specifically, let $\gamma_j$ represent the male-female wage gap in industry j. Order these in the usual way, $\gamma_{(1)} \le \gamma_{(2)} \le \cdots$, so that $\gamma_{(1)}$ represents the smallest male-female wage gap. Then MCB confidence intervals can be constructed for all $\gamma_{(1)} - \gamma_j$, a measure of relative non-discrimination. The analysis provides not only the MCB confidence intervals themselves, but also the set S of industries that are possibly the least discriminatory at a prespecified confidence level. The latter result has obvious policy implications for monitoring discriminatory industries for equal opportunity compliance, if the wage gaps are indeed interpretable as reflecting discrimination.
The basis of the empirical exercise is a sample from the March 1988 Current Population Survey. There are 14 industries selected, and a total of 53,669 individuals. The number of individuals differs across industries, so the design is essentially analogous to an unbalanced panel. A regression equation is estimated in which the dependent variable is log wage, while the explanatory variables are a large number of demographic variables (including variables representing educational and employment backgrounds), plus 13 dummy variables for industries and 14 interactions between the industry dummies and a dummy variable for gender. We view this basically as a data-descriptive model; Fields and Wolff used a similar specification. The wage gap $\gamma_j$ is defined as the coefficient of the interaction between the dummy for gender and the dummy for industry j. Because of the logarithmic specification, the measure of relative non-discrimination is actually $\exp(\gamma_{(1)} - \gamma_j)$. Table VI gives the MCB results for the 95% confidence level. These are the results from general MCB, since the covariance structure of the $\hat\gamma_j$ did not support standard MCB. The set S of potentially best industries contained six members: Personal services; Transportation, communication and public utilities; Professional and related services; Business and repair service; Mining; and Agriculture, forestry and fisheries. While the point estimates of relative non-discrimination ranged from 0.838 to one, the MCB intervals for all industries had an upper bound of one. That is, at the 95% confidence level, we cannot rule out that the relative non-discrimination parameter equals one for every industry. Even with an extremely large sample size there is much uncertainty in the ranking of the wage gaps, and differences in the measured discrimination levels across different industries may be nothing more than statistical noise.
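Because the "best" industry here is the one with the smallest gap, the largest-is-best machinery applies after negating the gaps: with $\theta_j = -\gamma_j$, $\max_k \theta_k - \theta_j = \gamma_j - \gamma_{(1)}$. The sketch below (hypothetical function name) computes the point estimates $\exp(\gamma_{(1)} - \gamma_j)$ this way; the three gap values are hypothetical and are not taken from Table VI.

```python
import numpy as np

def relative_nondiscrimination(gaps):
    """Point estimates exp(gamma_(1) - gamma_j); equals 1 for the smallest estimated gap."""
    theta = -np.asarray(gaps, dtype=float)   # smaller gap is "better", so negate
    u_star = theta.max() - theta             # equals gamma_j - min_k gamma_k >= 0
    return np.exp(-u_star)

gaps = np.array([0.10, 0.15, 0.30])          # hypothetical log-wage gaps for three industries
print(relative_nondiscrimination(gaps))
```

The same negation turns any MCB routine written for "largest is best" into one for "smallest is best", so the interval construction carries over unchanged.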

CONCLUSIONS
This paper has provided an introduction to multiple comparisons with the best (MCB) for an audience of economists and econometricians, and has given some examples of its use. Our examples deal with the frontier production function (or efficiency measurement) problem, where MCB is very naturally applicable, and with the estimation of labour market wage gaps. However, we believe that MCB has other uses in economics. One potential application that comes to mind is the cross-country comparison of growth rates. Arie Kapteyn reports (personal communication) using MCB to construct confidence intervals for differences in intercepts in welfare expenditure functions, with municipalities in the Netherlands as the cross-sectional unit of observation. We expect that numerous similar applications may exist.
It is interesting to speculate about the possible connections between the construction of confidence intervals and the point estimation problem. In the frontier production function literature, the point estimate of $u_i^*$ is $\hat u_i = \max_j \hat\alpha_j - \hat\alpha_i$. It is clear that this estimate is biased upward, since $\max_j \hat\alpha_j$ is biased upward as an estimate of $\max_j \alpha_j$. How to correct this bias is a challenging question. An intriguing fact is that the MCB confidence interval for $u_i^*$ is not centred on the above point estimate, and in that sense implicitly recognizes the bias inherent in the max operation. Edwards and Hsu (1983) suggest the midpoint of the MCB confidence interval as a point estimate, and argue, but do not prove, that it should be less biased than the usual point estimate. This idea bears further investigation.
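The upward bias of $\max_j \hat\alpha_j$ is easy to see by simulation. In this hedged sketch all N true intercepts are equal (so $\max_j \alpha_j = 0$ and every $u_i^* = 0$), and the estimates are drawn independently with unit standard error; both simplifications are assumptions made for illustration, since in the panel model the $\hat\alpha_i$ are correlated. The average of $\max_j \hat\alpha_j$ comes out well above zero (roughly 1.54 for N = 10, the expected maximum of ten independent standard normals), and so does the average of the point estimate of $u_1^*$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, se, n_rep = 10, 1.0, 100_000

# Hypothetical set-up: all true intercepts equal, so max_j alpha_j = 0 and every u_i* = 0.
alpha = np.zeros(N)
# Independent estimates with a common standard error (a simplification for illustration).
alpha_hat = alpha + se * rng.standard_normal((n_rep, N))

max_hat = alpha_hat.max(axis=1)          # estimate of max_j alpha_j in each replication
u1_hat = max_hat - alpha_hat[:, 0]       # point estimate of u_1*, whose true value is 0

print(max_hat.mean(), u1_hat.mean())     # both roughly 1.54 rather than 0
```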
The MCB literature also includes so-called non-parametric intervals, based on Wilcoxon statistics. These avoid the normality assumption, and may be useful when we do not wish to assume normality of the errors and T is small enough that asymptotic normality of the $\hat\alpha_i$ is not relevant. Some empirical experience with these procedures would also be useful.