What Makes a Classmate a Peer? Examining Which Peers Matter in NYC Elementary Schools

We identify and estimate the effects of student-level social spillovers on standardized test performance in New York City (NYC) elementary schools. We leverage student demographic data to construct within-classroom social networks based on shared student characteristics, such as gender or ethnicity. Rather than aggregate shared characteristics into a single network matrix, we specify additively separable network matrices for each shared characteristic and estimate city-wide peer effects for each one. Conditional on sharing a classroom, we find that the most important peer effects operate through shared ethnicity, gender, and primary language spoken at home. Identification of the model is discussed.


Introduction
Birds of a feather flock together. This concept is called homophily, and we see its effects in schools, work, and social circles, both personally and in popular culture. The effects of homophily can be quite stark. For example, compared to their male counterparts, female investors are three times as likely to invest in companies with a female CEO. 1 Homophily may arise if external similarities correlate with shared cultural experiences, making potential relationships less costly through improved information flows and efficient communication. 2 On the other hand, homophily could simply be the result of taste, which may correlate with prejudice or fear of the other. 3 This paper asks two questions related to homophily in elementary school classrooms in New York City.
First, which demographic characteristics (if any) are important for social spillovers in standardized test performance, and how do they compare in sign and magnitude? For example, is shared gender more influential than shared neighborhood of residence? Second, what are the effects of such homophily-induced spillovers on the distribution of student academic performance within and between demographic groups? Specifically, how do these spillovers affect existing achievement gaps within the classroom?
Understanding how homophily works, and which networks are important in which contexts, is important for education policymakers wishing to harness peer effects (or, at least, to better understand them). The promise of peer effects is twofold: a near costless improvement to educational outcomes through optimal assignment of students to classrooms and a "social multiplier effect" for exogenous policy interventions and investments. 4 A major challenge to policy intervention is that students sort into groups within the classroom, and this sorting behavior may bring unintended consequences when combined with an intervention. 5 This paper sheds light on this complication by identifying and empirically testing for characteristics which are important for within-classroom spillovers and then by exploring how these effects change under differing classroom compositions.
1 Abramson et al. (2019) discuss gender diversity in venture capital and its implications.
2 See Hegde and Tumlinson (2014) for a model of this in venture capital firms. They discuss how ethnic homophily plays into both the selection of which companies to invest in and influence after investment.
3 See Leszczensky and Pink (2019) for a discussion of how different taste preferences for homophily (high and low identifiers) interact to form groups of varying levels of homogeneity.
4 Bennett and Bergman (2018) provide a nice example of social multipliers in action using an attendance intervention. A good survey of the related theory can be found in Epple and Romano (2011).
5 See Carrell et al. (2013) for an example. The authors conducted an experiment in which they manipulated peer groups in a way that was expected to help the lowest performing students. The target group of students ended up being hurt by the experiment because the way peer groups formed changed when the set of potential peers changed.
We develop a spatial autoregressive econometric model to estimate and determine which shared characteristics are most influential for academic (test score) spillovers and apply it to NYC elementary school classrooms. To fix ideas, the "student" is the level of observation, and the universe of students is partitioned into "classrooms." Within each classroom, students may be partitioned in different ways based on K = 7 "demographic partitions:" by gender, ethnicity, neighborhood, bus stop, bus route, language spoken at home, and country of birth. Within each "demographic partition," students are divided into "demographic groups" corresponding to the possible categories within each partition, and these groups form the basis of seven homophily networks within each classroom. For example, in any classroom there may be two gender groups (boys and girls) and four ethnicity groups (Black, White, Hispanic, and Asian/Other). We leverage variability in the size and composition of demographic groups in each classroom to identify peer effects through each demographic partition. 6 That is, we simultaneously identify and estimate a gender peer effect, an ethnicity peer effect, a neighborhood peer effect, etc. for elementary school students in NYC.
We later partition the data into the five boroughs (The Bronx, Brooklyn, Manhattan, Queens and Staten Island) and re-estimate these seven peer effects at the borough level. Our group interaction model is a multi-network version of the group interaction model of Lee (2007), and identification and asymptotic arguments follow similarly. Empirically, our model finds the strongest networks are shared ethnicity, gender, and primary language spoken at home. Ethnicity groups are most important for mathematics test scores, and gender groups are most important for reading test scores. To our knowledge, this is the first paper to explore bus spillovers in the classroom, but we do not find them to be an important factor in classroom performance. In general, peer effects based on homophily appear to be stronger in mathematics than reading test scores. We explore different classroom compositions to understand how the network influence of demographic partitions changes when peer group composition varies. We provide evidence that low ethnic diversity leads to lower relevance for the ethnicity network.
In our model, group interaction along multiple demographic partitions produces very general and interesting reduced-form dynamics compared to existing models. First, positive within-group interaction may decrease existing within-classroom performance gaps between students belonging to the same groups (e.g., boy or girl peer groups) by lifting low performing students more than high performing students. This is a well-known effect of social interactions, and is a common feature of these models. A second and unique feature of our group interaction model is that peer effects may increase gaps between students in different demographic groups by limiting direct classroom spillovers across group boundaries. That is, when boys interact more with boys, and girls with girls, spillovers are stronger within demographic groups than between them, potentially exacerbating any existing gender performance gap. This is an important implication of our model: classroom exposure to other demographic groups is important, but facilitating cross-group interaction may be equally important to close cross-group performance gaps.
To further motivate our model, the next section is a brief literature review of social networks in education. Section 3 details the data used in this paper and how we construct our sample. Section 4 introduces the model and discusses identification and estimation. Section 5 presents the main empirical results, and Section 6 explores how these effects change as the classroom composition varies. In particular, we explore how the ethnicity network's impact changes as classroom diversity varies. Section 7 concludes. In appendices, we conduct simulations to examine the finite sample performance of our proposed model using datasets simulated from the empirical moments of the NYC school data, and perform some robustness checks including alternative specifications.

Social Networks in Education
Despite their potential importance, econometric estimation of peer effects in education remains difficult for a number of reasons. The primary challenge in any peer effect analysis is that the 'true' network is almost never observed, and can only be approximated. 7 Empirical specification of networks involves determining the shape of the network (who is connected to whom?) and the strength of the network (the magnitude of the individual connections within the network). Neither of these empirical choices is trivial, which may explain the varied results in the literature. 8 Previous work uses a variety of sources to construct peer networks: college roommates (Sacerdote, 2001), classroom seating assignments (Hong and Lee, 2017), squadrons in the US Air Force Academy (Carrell et al., 2009; Carrell et al., 2013), number of shared college classrooms (De Giorgi et al., 2010), and student lunch lines (Presler, 2020), among others. However, these approaches use variation unique to specific situations that is not readily generalized.
A reasonable alternative to the aforementioned approaches is using shared student characteristics (homophily) as a proxy for the network. Race and gender are the most commonly used shared characteristics (Arcidiacono and Nicholson, 2005; Renna et al., 2008; Lavy et al., 2012; Hsieh and Lin, 2017; Ananat et al., 2018; Billings et al., 2019), but other characteristics such as immigration status (Damm, 2014) have also been considered. Our paper follows this line of research but considers a broader set of homophilous factors simultaneously in explaining peer effects among elementary school students. Specifically, we examine the relative importance of within-classroom networks based on shared neighborhood of residence, bus route, bus stop, native language, country of birth, gender, and ethnicity. 9 Peer networks constructed from demographic data are often strictly exogenous and likely to be constant over time, two features that may simplify peer effect estimation. Moreover, school administrators are usually in possession of demographic data, avoiding the need to conduct costly student friendship surveys like the "Add Health" data (Patacchini et al., 2017), or to mine social media data like Facebook to understand how students sort into friendships (Mayer and Puller, 2008).
7 An exception may be production networks where worker interactions may be observed. See Horrace et al. (2016) and Horrace et al. (2020) for examples.
8 The survey of the empirical effects in Sacerdote (2001) highlights this variety. More recently, the survey of Paloyo (2020) confirms that this is still true, although we have progressed in our understanding of these effects, particularly in the difficulty entailed in implementing policy based on peer group manipulation.
9 An implicit assumption in our model is that peer effects occur only in the classroom. This assumption is not uncommon in the literature (e.g. Fruehwirth, 2013), and in our context of elementary education, it may be reasonable to assume that interactions beyond the classroom do not create academic spillovers in appreciable ways. Burke and Sass (2013) show that classroom peers are much more influential than cohort-level peers. This is intuitive, as we should expect classmates who have opportunities to interact during the entire school day to be much more influential on one another than schoolmates with minimal opportunities for interaction.

Data and Sample
This paper uses proprietary student-level longitudinal administrative data on student academic performance, socio-demographic characteristics, school and classroom codes, and program participation, which we link to student bus route and bus stop assignment data.
Student-level demographic data come from the New York City Department of Education (NYCDOE). These data include socio-demographic characteristics such as gender, race, age, grade, residential neighborhood, country of birth, primary language spoken at home, an indicator of eligibility for free or reduced-price lunch, and an indicator of full-time special education. Data also include class assignment as well as both current year and lagged mathematics and reading test scores. We compile these data for academic years 2013-2015.
Transportation assignment data are provided by the NYCDOE Office of Pupil Transportation (OPT). Data include whether a student is assigned a bus, bus route number, bus stop location (latitude and longitude), and bus pickup time, all at the student level.
Our sample consists of students in grades 4-5 in general education classrooms where at least two students are assigned a bus. Table 1 shows summary statistics for our sample. Notice that the sample is whiter, less poor, and more skewed towards Staten Island than the overall population of students in NYC public schools. This is primarily due to our interest in bus riders and the higher bus ridership in Staten Island. What follows is a discussion about our sample and its characteristics.
We exclude students in K-3 because students start taking standardized tests in grade three, and we include a lagged test score in our model. We focus on elementary students for three reasons.
First, in middle school and beyond, students tend to switch classes as they move between subjects, but elementary school students typically remain with the same group of students. Second, elementary school students generally do not choose their classes, and so while the assignment method by which students are placed in a class together is unknown (decided by the principal), class assignment is not a result of students choosing their peers. Finally, bus eligibility in NYC is uncommon for students beyond grade six (only in limited schools in Staten Island). 10 Our sample is restricted to classrooms with at least two bus riders (they do not have to share a route). Table 1 shows that bus assignment is approaching 25%, which is much higher than the nearly 10% we see over all NYC schools. There may be structural differences in schools that serve more than our target grades (such as K-12 or K-8 schools), so we omit schools serving students above grade five. 11 We limit our analysis to the general education population. 12 Because we are concerned that some classrooms which appear small in our sample may be Integrated Co-Teaching (ICT) classrooms, we remove students in classrooms of fewer than 20 students (5.8%).
Note to Table 1: Mean class size is calculated from the pool of classrooms (2,181) such that a classroom in 2014 is distinct from a classroom in 2015 (even if the physical location is identical). All other means are calculated using the pool of 55,767 students.
Demographic information includes the language spoken at home and the country of birth. Most students report English as the primary language spoken at home (58.9%) and were born in the US (89%). We construct a list of the most common languages spoken at home and most common countries of birth and utilize these subsets in constructing our networks. Table 2 shows which languages and countries are most common and the percentage of the sample each group represents. 13
10 Students between grades three and six are eligible for subsidized (or free) transportation if they live more than a mile from school. Whether the bus is offered as a transportation choice is determined by the school principal. Weinstein et al. (2021) predict whether a school offers the bus in NYC. They find that schools are most likely to offer the bus when they are larger, whiter, and in Staten Island or Queens. This is consistent with how our sample compares to the NYC population of fourth and fifth graders. Schools that do not offer the bus instead offer metro cards.
11 For example, there may be older students who ride the bus with the students in our sample. While our approach ignores these out-of-class peer effects, having students on the bus whom we do not observe dilutes the chance for students who ride the same bus to be in the same class. This can be systematically different for portions of the sample, and if the group of students choosing to attend K-8 schools is different from those attending K-5 schools, we introduce bias into our peer effect estimates by making the weighting matrix a function of school type (and the differences in student characteristics that come with that). We avoid this issue by ignoring schools serving higher than grade five.
12 Specifically, we remove full-time special education students, students who attend a fully special education school, and students who ride a special education bus.
13 As we shall see in Section 4, when constructing peer networks based on homophily, we exclude 'other' categories. That is, students in these categories are not connected to one another and are isolated from the network. For example, a student who speaks French at home does not share a home language connection with a student who speaks Indonesian.
14 Most studies find that peer effects are stronger within gender or racial groups than across them. See Soetevent and Kooreman (2007), Fruehwirth (2013), Nakajima (2007), Mayer and Puller (2008), Hsieh and Lin (2017), Xu and Fan (2018), among others.
In our model, student $i$ is the unit of observation and the classroom is the environment in which students interact and perform academically. Therefore, social interactions with students in other classrooms are considered negligible. As such, our model is well-suited for estimating peer effects in elementary schools, where students rarely leave the "homeroom." Let the number of students in each of $c = 1, \dots, C$ classrooms be $n_c$, so $\sum_{c=1}^{C} n_c = n$ is the total number of students in the sample or universe. Suppose there are $k = 1, \dots, K$ ways to demographically partition students in a classroom (e.g., gender, ethnicity, etc.). If student assignment to classrooms is random, it ensures exogeneity of the partitions to academic outcomes and also that the peer interactions due to pre-existing peer networks (i.e., the social network from the previous grade and year) are negligible.
In practice, the method of student assignment to classrooms is unknown and may not be random.
To address the issue, we include classroom-by-year fixed effects in our empirical model (later) to control for any correlated effects among students in a classroom. 15 To construct our demographic peer networks, we endow each student $i$ in classroom $c$ with $K$ scalar, unordered categorical variables, $h_{k,ic} \in \{1, 2, \dots, r, \dots, R_k\}$ for $k = 1, \dots, K$, each of which assigns the student to exactly one of $R_k$ distinct (non-intersecting) homophily groups. The implication is that each student belongs to $K$ groups, and the total number of groups is $\sum_{k=1}^{K} R_k$. For example, if the $k$th partition is gender, then assuming only two possible genders (boy and girl), $R_k = 2$ and $h_{k,ic} = 1$ may indicate student $i$ is a boy, and $h_{k,jc} = 2$ may indicate that student $j$ is a girl. In our empirical analysis, the ethnicity partition has $R_k = 4$ homophily groups (Black, White, Hispanic and Asian/other), so in this case $h_{k,ic} \in \{1, 2, 3, 4\}$, and we may ascribe Black students a 1, White students a 2, Hispanic students a 3, and Asian students a 4. Variability in the number of groups, $R_k$, and in the size of groups are important sources of variability for parameter identification in our model.
Then, our peer network matrix (before row-normalization) for classroom $c$ and demographic partition $k$, $W^*_{k,c}$, is specified as
$$W^*_{k,c} = \left[ w^*_{k,ijc} \right]_{ij}, \qquad w^*_{k,ijc} = 1(h_{k,ic} = h_{k,jc}) \ \text{for } i \neq j \text{ and zero otherwise,}$$ 16
where $1(\cdot)$ is an indicator function that equals one if the argument is true and zero otherwise. 17 Let $W_{k,c}$ be the row-normalized version of $W^*_{k,c}$. That is, each row of $W_{k,c}$ sums to 1, so $W_{k,c} y_c$ is a vector of mean peer outcomes where $y_c$ is an $n_c \times 1$ outcome vector and peer groups are defined based on the $k$th partition. Given these $K$ homophilous weighting matrices, one may sum them to a single, aggregate network such that $W^*_c = W^*_{1,c} + \dots + W^*_{K,c}$, row-normalize it, and estimate a single peer effect on the aggregate network. This is the approach of De Giorgi et al. (2010) where $k$ indexes different sections of nine courses to which college students are randomly assigned. This approach implicitly assumes equal weighting for all partitions (i.e., each course is equally important in creating social connections and enhancing student productivity), and it appears to be an assumption of convenience.
15 We also examine the randomness of classroom assignment for our sample using a series of multinomial logit analyses, which we report and discuss in Appendix B.1 and which show no systematic patterns in classroom assignment associated with demographic variables. This is not direct evidence for random assignment, but indicates that classroom assignment is not a function of the associated demographic partitions.
16 Note that we are using 'exclusive averaging' where individual students are not their own peers. So, the diagonal entries of our network matrices are all zero (i.e. zero-diagonal matrices).
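The construction just described can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function name `homophily_network`, the `isolated` argument, the toy outcome vector, and the convention of leaving an all-zero row for students with no same-group peer are assumptions made here for the example.

```python
import numpy as np

def homophily_network(labels, isolated=()):
    """Build the raw (W_star) and row-normalized (W) same-group network.

    labels   : one categorical label per student in the classroom
    isolated : labels whose members get no links (e.g. an "other" category)
    """
    h = np.asarray(labels)
    # w*_ij = 1 if students i and j share a label, 0 otherwise
    W_star = (h[:, None] == h[None, :]).astype(float)
    np.fill_diagonal(W_star, 0.0)  # exclusive averaging: i is not her own peer
    iso = np.array([lab in isolated for lab in labels], dtype=bool)
    W_star[iso, :] = 0.0
    W_star[:, iso] = 0.0
    row_sums = W_star.sum(axis=1, keepdims=True)
    # Row-normalize; students without peers keep an all-zero row (assumption)
    W = np.divide(W_star, row_sums, out=np.zeros_like(W_star), where=row_sums > 0)
    return W_star, W

# Hypothetical classroom with students A, B, C, D
gender = ["girl", "boy", "boy", "girl"]
language = ["English", "English", "Spanish", "other"]
y = np.array([80.0, 60.0, 70.0, 90.0])  # made-up test scores

W_star_gender, W_gender = homophily_network(gender)
W_star_language, W_language = homophily_network(language, isolated={"other"})

peer_means = W_gender @ y  # mean outcome of same-gender classmates
```

Keeping one matrix per partition, rather than summing them before normalizing, is what allows a separate peer effect to be attached to each network.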
However, our goal is to determine the relative importance of these partition networks in explaining peer interactions. Therefore, we do not aggregate the networks and consider a more complex group interaction model such that
$$y_c = \sum_{k=1}^{K} \lambda_k W_{k,c}\, y_c + X_c \beta + \sum_{k=1}^{K} W_{k,c} Z_c\, \theta_k + \delta_c\, \iota_{n_c} + u_c, \qquad (1)$$
where $X_c$ is an $n_c \times p$ matrix of exogenous variables consisting of a matrix of variables $Z_c$ that vary within demographic groups and a matrix of dummy variables $D_c$ generated from the partition variables, $h_{k,ic}$ for all $i$ and $k$. That is, $D_c$ consists of $\sum_{k=1}^{K} (R_k - 1)$ columns associated with the $\sum_{k=1}^{K} R_k$ demographic groups at our disposal. Thus, $X_c = [Z_c, D_c]$ and $\beta = [\beta_1', \beta_2']'$ where $\beta_1$ and $\beta_2$ are coefficient vectors on $Z_c$ and $D_c$, respectively. $\iota_{n_c}$ is an $n_c \times 1$ vector of ones, and $u_c$ is an error vector satisfying $E(u_c \mid X_c, \delta_c, \{W_{k,c}\}_{k=1,\dots,K}) = 0$. 18 Parameters $\lambda_k$ and $\theta_k$ capture endogenous and exogenous (contextual) peer effects, respectively, associated with homophily network $k$, and $\delta_c$ captures correlated effects among students in classroom $c$ (Manski, 1993). Endogenous peer effects measure the extent to which one's outcome is influenced by her peers' outcomes; exogenous peer effects measure the extent to which one's outcome is influenced by her peers' characteristics; and correlated effects, in our context, may arise from self-sorting or common environmental factors faced by students in the same classroom (e.g. teacher quality). All three effects cause correlations in the outcome between students in a classroom, but only the first two are the result of peer/social interactions.
17 As an example, consider a classroom $c$ consisting of four students {A, B, C, D} where the gender partition contains two groups: a girl group, {A, D} (say), and a boy group, {B, C} (say). Then, with students ordered (A, B, C, D), $W^*_{k,c} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$.
18 There is perfect multicollinearity in the RHS variables if we include the partition dummies, $D_c$, in the contextual effects term, $Z_c$. A scalar form of equation (1) is given by [equation missing in the source text].
Endogenous peer effects are of particular interest since these effects produce a social multiplier. This can be seen from equation (1), where any change in student $i$'s outcome affects her peers' outcomes through the endogenous peer effect term, and the change in her peers' outcomes in turn affects her peers' peers' outcomes and also $i$'s outcome again. These feedback and chain effects imply that the effect of any policy targeting a subpopulation can be socially shared and multiplied if such an indirect effect exists. However, separately identifying the three effects in the model is challenging since the endogenous peer effect is some function of the other effects, which can be seen from the reduced form of equation (1) (see equation (3) in the next section). Therefore, there may be strong correlations between the terms, which raises an identification issue, termed the reflection problem in Manski (1993). Various identification conditions/restrictions have been studied in the literature to separately identify the effects. In the next section we build on existing identification results to understand how our group interaction model is identified.

Identification: Variation in Group Sizes and Partition Structures
Our model is a generalization of the group interaction model of Lee (2007), who considers a linear-in-means model with group fixed effects where individuals interact in groups with equal intensities. Indeed, Lee's model is a special case of our model where $K = 1$ and $R_k = 1$. That is, each class is completely homogeneous in terms of the homophily factor (a single demographic group). He shows that the model is identified if there is sufficient variation in classroom sizes.
The same identification strategy applies to our generalized group interaction model, but the model partitions classmates further into multiple demographic groups, which creates much richer variation in group sizes that helps identification even though the model is more complex, as we shall see. 19 This is practically useful given the fact that schools tend to equally distribute students into classrooms. Lee (2007) also shows the rate of convergence of the peer effect estimator is slower when average group size is large relative to the number of groups. 20 Our demographic groups within a classroom are generally small, so our estimates may be more precise than a single classroom-level estimate, which is another advantage of our approach. In the sequel, following Lee (2007), asymptotic arguments are for large $C$, fixed $K$ and $n_c$ being bounded away from some constant (consequently, the sizes of demographic groups within a classroom are bounded).
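To make the point about group-size variation concrete, the short sketch below counts the size of each demographic group in two equally sized classrooms. The classrooms, labels, and the helper name `group_sizes` are hypothetical and chosen only for illustration.

```python
from collections import Counter

def group_sizes(classroom):
    """Map each partition name to {group label: number of students} for one classroom."""
    return {k: dict(Counter(labels)) for k, labels in classroom.items()}

# Two hypothetical classrooms with the same number of students
class_a = {
    "gender":   ["girl", "girl", "girl", "boy", "boy", "boy"],
    "language": ["English", "English", "English", "English", "Spanish", "Spanish"],
}
class_b = {
    "gender":   ["girl", "girl", "boy", "boy", "boy", "boy"],
    "language": ["English", "English", "Spanish", "Spanish", "Spanish", "Chinese"],
}

n_a = len(class_a["gender"])
n_b = len(class_b["gender"])
sizes_a = group_sizes(class_a)
sizes_b = group_sizes(class_b)
```

Both classrooms have six students, so classroom size alone offers no variation here, yet the demographic group sizes differ across the two classrooms in every partition.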
To understand how our model is identified, we first apply the within transformation to equation (1) to remove the correlated effects. That is, we premultiply it by $Q_c = I_{n_c} - \iota_{n_c} \iota_{n_c}' / n_c$, where $I_{n_c}$ is the identity matrix of dimension $n_c$, such that
$$Q_c y_c = \sum_{k=1}^{K} \lambda_k Q_c W_{k,c}\, y_c + Q_c X_c \beta + \sum_{k=1}^{K} Q_c W_{k,c} Z_c\, \theta_k + Q_c u_c = \sum_{k=1}^{K} \lambda_k W_{k,c}\, y^*_c + X^*_c \beta + \sum_{k=1}^{K} W_{k,c} Z^*_c\, \theta_k + u^*_c, \qquad (2)$$
where the second equality is because $W_{k,c}$ is row-normalized and symmetric. Let $y^*_c = Q_c y_c$, and $X^*_c$, $Z^*_c$, and $u^*_c$ are similarly defined. Note that the typical element of $y^*_c$ is $y_{ic} - \bar{y}_c$, the deviation of student $i$'s outcome from the classroom mean. We assume that $I_{n_c} - \sum_{k=1}^{K} \lambda_k W_{k,c}$ is invertible, 21 so the model is stable and has an equilibrium. Then, the reduced form of equation (2) can be written
$$y^*_c = \Big( I_{n_c} - \sum_{k=1}^{K} \lambda_k W_{k,c} \Big)^{-1} \Big( X^*_c \beta + \sum_{k=1}^{K} W_{k,c} Z^*_c\, \theta_k + u^*_c \Big). \qquad (3)$$
19 We show that model identification stems not only from variation in the sizes of demographic groups in each classroom, but also from variation in how each partition intersects with the other partitions across classrooms.
20 This is clear from equation (4) in the sequel.
21 Note that a sufficient condition for the invertibility is [condition missing in the source text].
In the sequel, let $S_c(k, r) = \{ i : h_{k,ic} = r \}$ for $k = 1, \dots, K$ and $r = 1, \dots, R_k$ (i.e. the index set of students in classroom $c$ whose $k$th partition variable is $r$) and let $n_c(k, r)$ be the number of students in $S_c(k, r)$. Remember $h_{k,ic}$, defined in the previous section, is the $k$th covariate which partitions students in a classroom into $R_k$ groups. To have a better intuition for equation (3), we first consider the case $K = 1$. When $K = 1$, after some matrix algebra on equation (3), student $i$'s outcome function (in scalar form) can be written as
[equation (4), with terms labeled (*) and (**), is missing in the source text]
where $\bar{z}_c^{(1, h_{1,ic})}$ is the average of $z_{ic}$ (the $i$th row of $Z_c$) over the demographic group, based on the first ($K = 1$) demographic partition, to which student $i$ belongs. So $z_{ic} - \bar{z}_c^{(1, h_{1,ic})}$ represents the difference in attributes between student $i$ and her demographic group average, and $\bar{z}_c^{(1, h_{1,ic})} - \bar{z}_c$ represents the difference in attributes between her demographic group average and the classroom average.
The equation is essentially the same as that derived in Lee (2007), but ours includes the additional terms in (**), which appear because $R_k > 1$ for all $k$ (i.e., because each classroom contains multiple demographic groups within each demographic partition). Boucher et al. (2014) provide an intuitive explanation of how the structural parameters in (*), $\lambda$, $\beta$, and $\theta$, can be identified based on variation in classroom sizes. For ease of discussion and without loss of generality, let us assume for now that all peer effects are positive. Since individuals are excluded from their own peer groups (i.e. exclusive averaging), individuals with attributes above the average essentially have a group of peers whose attributes are below average, which implies a perfect negative correlation between individual attributes and their mean peer attributes. If endogenous and exogenous peer effects exist, it is clear from equation (1) that such negative correlation reduces the dispersion in outcomes (relative to the case where peer effects are zero). However, the degree of the smoothing is not constant across demographic groups of different sizes. The smoothing effect is larger in smaller groups since exclusive averaging does not greatly affect peer means when group size is large. This is seen in equation (4), where the marginal effect of $z_{ic} - \bar{z}_c^{(1, h_{1,ic})}$ is reduced by $\theta$ (i.e. exogenous peer effects) linearly and by $\lambda$ (i.e. endogenous peer effects) nonlinearly, and the reduction decreases as demographic group size, $n_c(1, r)$, increases. This implies that, as group size varies across classrooms, so does the impact of each structural parameter on the outcome in the model. Thus, these variations can be used to separately identify the parameters. Note that the variation in group sizes in our model is richer than that in the model of Lee (2007).
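The size dependence of the smoothing effect can be illustrated with a short derivation. This is a sketch under simplifying assumptions, not the paper's equation (4): it takes $K = 1$, a scalar attribute $z_{ic}$ with coefficient $\beta$, and introduces the shorthand $g$ for student $i$'s demographic group, $m = n_c(1, r)$ for its size, and $a_g$ for the sum of the classroom effect and the group dummy coefficient, none of which is notation from the original text.

```latex
% Sketch for K = 1 with scalar z; g is student i's group and m = n_c(1, r) its size.
% Under exclusive averaging, the peer mean of y is (m * ybar_g - y_ic) / (m - 1).
\begin{align*}
y_{ic} &= \lambda\,\frac{m\,\bar{y}_{g} - y_{ic}}{m-1} + \beta z_{ic}
          + \theta\,\frac{m\,\bar{z}_{g} - z_{ic}}{m-1} + a_{g} + u_{ic}, \\
\bar{y}_{g} &= \lambda\,\bar{y}_{g} + \beta\,\bar{z}_{g} + \theta\,\bar{z}_{g} + a_{g} + \bar{u}_{g}, \\
y_{ic} - \bar{y}_{g} &= \frac{(m-1)\beta - \theta}{m-1+\lambda}\,\bigl(z_{ic} - \bar{z}_{g}\bigr)
          + \frac{m-1}{m-1+\lambda}\,\bigl(u_{ic} - \bar{u}_{g}\bigr).
\end{align*}
```

The second line averages the first over the group, and the third subtracts it from the first. In this sketch the within-group coefficient equals $(\beta - \theta)/(1 + \lambda)$ when $m = 2$ and approaches $\beta$ as $m$ grows, so $\theta$ enters linearly and $\lambda$ nonlinearly, and the reduction fades with group size.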
The sizes of multiple demographic groups within a classroom vary across classrooms, which allows us to use the variation in $n_c(k, r)$ for $k = 1, \dots, K$ and $r = 1, \dots, R_k$, not just $n_c$. 22
Another important feature of our model is the additional term (**). Interestingly, the social effects ($\lambda$ and $\theta$) in this term no longer reduce the dispersion in outcomes, but amplify it. When there are disparities in attributes across demographic groups, the social effects exaggerate the gap. This is intuitive since the social effects work within demographic groups in our model, not across them. This gives us an important implication: peer interactions may equalize students' outcomes by improving the performance of low ability students more than that of high ability students, but they may also increase the gap across different peer groups when students interact exclusively within their peer groups. Therefore, in this setting, if there is a heterogeneous social shock that affects demographic groups disproportionately (or if there is a pre-existing gap in performance between groups), peer interaction will exaggerate the gap. 23 This suggests that if a gap exists in average performance for members of different groups within a partition, and students interact within their group, mere exposure in the classroom to other groups is not sufficient to close performance gaps. 24
Unfortunately, when $K > 1$, it is not possible to obtain a simple reduced form equation analogous to (4). Instead, we resort to a more general identification result from the literature to understand how additional demographic partitions and their distinct demographic groups help identify the parameters of the model. 25 From an instrumental variable perspective, identification of the within-transformed equation (2) reduces to whether we can find valid instruments for the endogenous terms, $\{W_{k,c}\, y^*_c\}_{k=1}^{K}$. Note from the reduced form equation (3) that $W_{k,c}\, y^*_c$ can be written
$$W_{k,c}\, y^*_c = W_{k,c} \Big( I_{n_c} - \sum_{l=1}^{K} \lambda_l W_{l,c} \Big)^{-1} \Big( X^*_c \beta + \sum_{l=1}^{K} W_{l,c} Z^*_c\, \theta_l + u^*_c \Big). \qquad (5)$$
For an instrumental variable approach to work, however, the deterministic parts of $\{W_{k,c}\, y^*_c\}_{k=1}^{K}$, which simply exclude the error term, $u^*_c$, from (5), should not be perfectly collinear with the exogenous regressors in the model, $X^*_c$ and $\{W_{k,c} Z^*_c\}_{k=1}^{K}$. Otherwise, identifying instruments for the endogenous terms that satisfy the exogeneity, relevance, and rank conditions do not exist in the system. Therefore, identification of (2) requires the deterministic parts of $\{W_{k,c}\, y^*_c\}_{k=1}^{K}$ to be linearly independent of $X^*_c$ and $\{W_{k,c} Z^*_c\}_{k=1}^{K}$ for some $c$. If this is satisfied, the elements of the deterministic part of $W_{k,c}\, y^*_c$ that are independent of the existing regressors can be used as valid instruments for them. This is the insight of Bramoullé et al. (2009) and others, and we show below that the condition is in general satisfied in our case if there is sufficient variation in group sizes and partition structures.
First, observe that $W_{k,c} W_{l,c} Z_c^*$, for $k, l = 1, \dots, K$, are elements of the deterministic parts of $\{W_{k,c} y_c^*\}_{k=1}^K$ [...] the number of students in the intersection of group $s$ on partition $k$ and group $r$ on partition $l$. The first equality implies that the term includes the attributes of student $i$'s peers' peers, where direct peers are based on the $k$th partition and indirect peers are based on the $l$th partition, and the peer sums are divided by (one less than) the sizes of the demographic groups associated with them. Note that [...] is similarly given. Then, from the second equality, it is clear that $W_{k,c} W_{l,c} Z_c^*$ for $k, l = 1, \dots, K$ includes terms similar to $Z_c^*$, $W_{l,c} Z_c^*$, and $W_{k,c} Z_c^*$, but they will not be linearly dependent as long as there is variation in group sizes, i.e. $n_c(k, r)$, and also variation in how each partition intersects with the others, i.e. $S_c(k, s) \cap S_c(l, r)$. 26

Since the deterministic parts of $\{W_{k,c} y_c^*\}_{k=1}^K$ include $W_{k,c} W_{l,c} Z_c^*$, this ultimately implies that identification of equation (2) is possible when such variation exists. When the model is identified, it is clear that $W_{k,c} W_{l,c} Z_c^*$ for $l = 1, \dots, K$ can be used as identifying instruments for $W_{k,c} y_c^*$. 27
The role of variation in demographic group sizes and partition structures in the identification of the model is intuitive. Consider the case with two partitions, gender and ethnicity. In this example, assume each partition has two groups (girl and boy, and races 1 and 2). If peer effects work through both gender and ethnic lines, peer effects within the girl and boy groups will first reduce the dispersion in outcomes within the two gender groups, but increase the gap between the two, as equation (4) implies. However, peer interaction within groups in the ethnic partition (ethnic groups) may work against these effects, lessening the gender gap within the ethnic groups but increasing dispersion in outcomes within gender groups by enlarging the racial gap in outcomes there. The size of all of these smoothing and dispersing effects, and how one effect interplays with the others, depend on the sizes of groups and how groups of one partition intersect with groups of the other partitions.

26 Using the second equality, one can show that the terms are perfectly linearly dependent if i) some partitions completely overlap (e.g. all boys are race 1 and all girls are race 2, so $W_{gender,c} = W_{race,c}$ for all $c$) or ii) group sizes are constant across partitions and classrooms, and also the sizes of the intersections of the groups, $n_c([k, s] \cap [l, r])$, are constant across $s$ or $r$.

27 One can easily verify that these instruments satisfy the exogeneity, relevance, and rank conditions for valid instruments. A more formal sufficient identification condition for our model in this regard may be derived using the identification results for high-order spatial autoregressive models.
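The identification argument above can be illustrated numerically. The following is a minimal sketch, not the authors' code: it builds row-normalized partition matrices for a single hypothetical classroom of eight students and checks the rank of the demeaned columns $[Z, W_g Z, W_r Z, W_g W_r Z]$. The helper names `partition_matrix` and `regressor_rank`, the group labels, and the single covariate `z` are all assumptions made for illustration.

```python
import numpy as np

def partition_matrix(labels):
    """Row-normalized within-group weights: 1/(n_g - 1) for same-group peers."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)  # a student is not her own peer
    sizes = same.sum(axis=1, keepdims=True)
    # Students with no same-group peer keep a row of zeros.
    return np.divide(same, sizes, out=np.zeros_like(same), where=sizes > 0)

def regressor_rank(gender, race, z):
    """Rank of [z, W_g z, W_r z, W_g W_r z] after within-classroom demeaning."""
    w_g, w_r = partition_matrix(gender), partition_matrix(race)
    cols = np.column_stack([z, w_g @ z, w_r @ z, w_g @ w_r @ z])
    cols = cols - cols.mean(axis=0)  # within transformation for one classroom
    return np.linalg.matrix_rank(cols)

# Hypothetical covariate for one classroom of eight students.
z = np.random.default_rng(0).standard_normal(8)

# (i) Partitions overlap completely: gender and race labels coincide.
overlap = regressor_rank([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1], z)
# (ii) Constant group sizes and constant intersection sizes.
balanced = regressor_rank([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], z)
# Group sizes and intersections vary across groups.
varied = regressor_rank([0, 0, 0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0, 1, 1], z)
print(overlap, balanced, varied)
```

In the first two configurations the four columns are linearly dependent, so the peers-of-peers term adds no identifying variation; in the third, with unequal group sizes and intersections, the columns have full rank.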

Empirical model
There are a few differences between equation (1) and the model that we ultimately estimate. First, we have a panel of data, so we utilize both cross-sectional and time variation in peer networks and covariates (i.e. $W_{k,ct}$ and $X_{ct}$) to estimate model parameters. This also allows us to add a one-period lag of the dependent variable, $y_{c,t-1}$, on the right-hand side of the model to control for students' persistent heterogeneity in test scores. Due to the addition of the time dimension, classroom fixed effects, $\delta_c$, are now expanded to classroom-by-time fixed effects, $\delta_{ct}$.
Second, we have assumed so far that each partition is complete. That is, we have assumed that for each partition, every student must belong to exactly one demographic group in each partition.
However, there are groups in some partitions that do not facilitate a connection between students.
For example, some students ride the bus while others do not. Those that ride the bus together may form stronger in-classroom connections. These connections may be the result of friendships strengthened while traveling on the bus together, or they may reflect homophily based on shared experiences due to shared geography. Neither of these reasons for increased interaction applies to students who do not ride the bus. In this case, the weighting matrix for bus ridership may include a block of zeros for students who do not have connections with other students in this partition (i.e. those who do not ride the bus are excluded from the bus ridership network). These zeros serve as additional exclusion restrictions that help identify the model. Therefore, our empirical model is given by equation (6), where $n_{ct}$ is the number of students in classroom $c$ at time $t$. We include seven homophily peer [...] et al. (2010). Estimation details are in Appendix C.

The estimation approach requires that the determinant of $I_{n_{ct}} - \sum_{k=1}^K \lambda_k W_{k,ct}$ be strictly positive for any $c$ and $t$ in equation (6) to ensure that the log-likelihood function is well defined. When weights matrices are row-normalized (as in our case), the condition is satisfied if $\left|\sum_{k=1}^K \lambda_k\right| < 1$. Therefore, we only consider parameter values satisfying $\left|\sum_{k=1}^K \lambda_k\right| < 1$ in our estimation procedure. This ultimately implies that the peer effects of different partitions cannot all be large; only some of them can. As we are mostly interested in the relative importance of these factors, this does not pose a problem.
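The two points in this passage, the block of zeros for students outside a network and the positive-determinant requirement, can be sketched for a single classroom as follows. This is an illustrative sketch under stated assumptions rather than the estimation code: the label `-1` for non-riders, the helper names, and the parameter values are hypothetical.

```python
import numpy as np

def partition_matrix(labels):
    """Row-normalized weights; label -1 marks students excluded from the network."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]) & (labels[:, None] >= 0)
    same = same.astype(float)
    np.fill_diagonal(same, 0.0)
    sizes = same.sum(axis=1, keepdims=True)
    # Excluded students (and students alone in a group) get a row of zeros.
    return np.divide(same, sizes, out=np.zeros_like(same), where=sizes > 0)

def log_det_term(lams, mats):
    """Sign and log|det| of I - sum_k lambda_k W_k for one classroom."""
    n = mats[0].shape[0]
    s = np.eye(n) - sum(lam * w for lam, w in zip(lams, mats))
    return np.linalg.slogdet(s)

gender = [0, 0, 0, 1, 1, 1]
bus_route = [0, 0, -1, -1, 1, 1]   # -1: student does not ride the bus
mats = [partition_matrix(gender), partition_matrix(bus_route)]

lams = [0.20, 0.05]                # hypothetical peer effect values
assert abs(sum(lams)) < 1          # restriction imposed during estimation
sign, logdet = log_det_term(lams, mats)
print(sign, logdet)
```

Using `slogdet` rather than `det` avoids overflow or underflow in larger classrooms and returns the sign separately, which is the quantity the restriction is meant to keep positive.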
When the peer effect estimates are ranked to examine the relative importance of the partitions in explaining peer interaction, the ranking statistics are not deterministic, but contain uncertainty due to sampling variability. We apply the method of 'ranking and selection' to properly infer the most important partitions from the ranked estimates. The procedures select a subset of $\lambda_1, \dots, \lambda_K$ that are statistically larger than the others at a pre-specified error rate, $\alpha \in (0, 0.5)$, while accounting for the inherent multiplicity and uncertainty in the ranked estimates, the $\hat\lambda_k$'s. 29 Let $\lambda_{[K]} \ge \lambda_{[K-1]} \ge \dots \ge \lambda_{[1]}$ be the population rankings of peer effects, so $\lambda_{[K]} = \max_{k=1,\dots,K} \lambda_k$. The rankings of peer effect estimates are similarly denoted as $\hat\lambda_{(K)} \ge \hat\lambda_{(K-1)} \ge \dots \ge \hat\lambda_{(1)}$, so $\hat\lambda_{(K)} = \max_{k=1,\dots,K} \hat\lambda_k$. Ranking and selection procedures recognize that the partition estimated to have the $j$th largest peer effect, $(j)$, may not in general correspond to the partition that has the $j$th largest peer effect in the population, $[j]$.
To account for such uncertainty, assuming (asymptotic) normality of the estimates $\hat\lambda_1, \dots, \hat\lambda_K$ and a general variance-covariance structure, the procedures identify a set of peer effect indices, $\zeta \subset \{1, 2, \dots, K\}$, that satisfies $\Pr\{[K] \in \zeta\} \ge 1 - \alpha$, where typically $\alpha = 0.05$. In other words, the procedures estimate the (minimal) set of the population indices that includes the unknown population index associated with the largest parameter value, $[K]$, with probability at least $1 - \alpha$. If the inference is very sharp, $\zeta$ may be a singleton, but $\zeta$ may include all the indices if the inference is very weak. We employ the procedures to account for multiplicity and uncertainty in the ranked peer effect estimates when determining the most important homophilous networks in peer effects among NYC elementary school students. 30

28 With seven homophily weights matrices and four covariates in $Z_{ct}$, twenty-eight contextual effects are estimated in our model. We do not report the contextual effect estimates in our results below to save space (these estimates are available upon request from the authors).

29 See Horrace and Parmeter (2017), who recently apply ranking and selection to economics journal citation counts to determine a subset of the 'best' journals.
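A subset-selection rule of this kind can be sketched as follows. This is one common construction, offered as an assumption-laden illustration and not necessarily the exact procedure used in the paper: for each candidate $k$, a critical value is simulated from the joint normal distribution of the estimation errors, and $k$ is retained if its estimate is not significantly below any other estimate. The function name `best_subset`, the number of draws, and the example inputs are hypothetical.

```python
import numpy as np

def best_subset(est, cov, alpha=0.05, draws=20000, seed=0):
    """Indices that cannot be ruled out as the largest parameter (needs K >= 2)."""
    est = np.asarray(est, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k_total = est.size
    rng = np.random.default_rng(seed)
    # Simulated estimation errors from N(0, cov).
    noise = rng.multivariate_normal(np.zeros(k_total), cov, size=draws)
    selected = []
    for k in range(k_total):
        others = [j for j in range(k_total) if j != k]
        # Standard error of each difference (estimate_j - estimate_k).
        se = np.sqrt(cov[others, others] + cov[k, k] - 2 * cov[others, k])
        # Critical value: (1 - alpha) quantile of the largest studentized difference.
        t_sim = (noise[:, others] - noise[:, [k]]) / se
        crit = np.quantile(t_sim.max(axis=1), 1 - alpha)
        if np.all(est[k] >= est[others] - crit * se):
            selected.append(k)
    return selected

estimates = [0.20, 0.05, 0.04]                      # hypothetical peer effects
sharp = best_subset(estimates, np.eye(3) * 1e-6)    # very precise estimates
weak = best_subset(estimates, np.eye(3))            # very noisy estimates
print(sharp, weak)
```

With precise estimates the selected set collapses to the single largest estimate, while with noisy estimates every index is retained, mirroring the sharp and weak inference cases described above.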

Main Results
We begin with our results for mathematics and reading scores for fourth and fifth graders in Table 3. The test scores are normalized Z-scores. To save space we do not report the contextual effect results, only the peer effects ($\lambda_k$, $k = 1, \dots, 7$) and the marginal effects ($\beta$) of the covariates ($X_{ct}$) in equation (6). Table 3 shows the results for four models. Models (1) and (2) use mathematics test score as the outcome, and models (3) and (4) use reading test score as the outcome. The first model for each outcome defines a student's neighborhood as their zip code of residence (models (1) and (3)) and the second model for each outcome defines a student's neighborhood as their census tract of residence (models (2) and (4)). Notice that this change has very little effect on estimates for other variables in the model.

[...] (2), where we define Neighborhood using census tract rather than zip code, the estimates are very similar, with a slight increase in the point estimate for Neighborhood. The four networks that were significant in model (1) remain significant in model (2), and the insignificant estimates remain insignificant as well.

30 The cardinality of $\zeta$ is increasing in $K$ since the inference needs to make more pairwise comparisons as $K$ increases, which makes it harder to distinguish $[K]$ at a fixed error rate, $\alpha$. This is the concept of 'multiplicity.' To conduct the inference we need critical values drawn from a k-dimensional multivariate normal distribution with covariance structure determined by the Hessian of the converged log-likelihood. These critical values were all around 2.5, larger than the usual critical value of 1.96 from a univariate normal distribution, and hence account for the multiplicity.

31 We should note that we have not included fixed effects for specific bus stops and bus routes due to computational constraints, although we have included an indicator for whether students ride the bus.
As a result, these estimates may be biased by correlated effects related to the bus. However, these results are already much weaker than those for other networks, and the additional controls are unlikely to significantly increase the influence of these networks.

Table 3 notes: Models (1) and (3) define the neighborhood using the student's residential zip code. Models (2) and (4) define the neighborhood using census tract. Models include 28 contextual social effects, constructed using the lagged test score, age, age squared, and FRPL status along with each of the seven networks included in the model, as well as neighborhood and classroom fixed effects. Hispanic is the omitted reference group for ethnicity marginal effects.

Turning to the results for Reading Test Z-Scores, even though the sample selection process is identical to that for the mathematics scores, there are some notable differences in peer effects across the two sets of results. First, the mathematics peer effects are typically larger than the reading peer effects except for Gender (compare 0.054 for mathematics to 0.150 for reading) and Bus Route (compare statistically significant 0.028 in reading to insignificant 0.008 in mathematics). Second, the peer effects rankings are about the same for reading and mathematics scores. The biggest difference is that Gender is ranked third in mathematics (0.054), but is ranked first in reading (0.150). Additionally, Bus Route is significant in reading (0.028) and ranks before Neighborhood.
Country of birth and Bus Stop remain statistically insignificant for both outcomes. Only the census tract definition of Neighborhood is statistically significant in reading. For both outcomes, the estimate is larger for the census tract definition of Neighborhood, suggesting that the census tract is a better representation of an elementary student's neighborhood peer group. 32 The ranked peer effect estimates contain sampling errors, so we examine whether the rankings of peer effect estimates are statistically significant using the ranking and selection procedures. We find that only Ethnicity is in the subset of the best in the case of mathematics test scores, while both Ethnicity and Gender are in the best set in the case of reading test scores. This is essentially because the Ethnicity and Gender peer effect estimates are considerably larger than the others, so they remain in the best category even after accounting for sampling errors.
A few remarks follow: first, the three strongest networks are consistent between both outcomes (Ethnicity, Language, and Gender). Second, our estimates imply that social effects are larger in mathematics than in reading. 33 This is consistent with the story that students learn language and reading skills both at home and at school, but mathematics is learned primarily at school. Next, the primary Language spoken at home is strong for both outcomes, but stronger in mathematics than reading. The smaller effect and importance of Language in reading may come from two mechanisms. The first is that schools teach English specifically, so other language groups may play diminished roles. The second is the aforementioned idea that students primarily learn mathematics at school, such that we might expect a smaller coefficient for reading estimates in general. We explore the primary Language spoken at home and Country of birth networks further in Section B.
Additionally, these results suggest that influential peers are generally those that share obvious [...]

It is also important to notice the dimensions along which we observe performance gaps. There are significant performance gaps along both ethnicity and gender, which we observe in the marginal effects ($\beta$). Because these are also important dimensions of peer interaction, as seen in the peer effect estimates, equation (4) implies that these performance gaps are widened by classroom peer effects. Recall that endogenous peer effects mitigate performance gaps between high and low performing students, as low performing students are helped more by their higher performing peers. However, they may increase existing gaps between groups if peer interactions exist within the boundaries of groups. Therefore, within ethnicity and gender groups, there may be significant peer effects, and these effects may exacerbate performance gaps between different gender or ethnic groups. Using the point estimates from [...]

Table 4 notes: Estimates shown are the parameters of interest from the respective single-network models. Models include classroom fixed effects, four contextual effects, lagged test score, age, age squared, and fixed effects for gender, poverty, ethnicity, census tract, country of birth, language spoken at home, and whether the student rides the bus. Ethnicity is defined with Hispanic as the reference group.

Table 4 shows peer effect estimates when the model includes only one of our seven homophilous peer networks. The goal of this exercise is to show the stark differences in peer effect estimates when they are estimated individually. 34 Nearly all estimates are larger and statistically significant, and the rankings of the estimates are quite different from before. This implies that the current practice of considering only a few characteristics individually when studying homophily may suffer from omitted variable bias, and that it is indeed important to simultaneously control for all networks to tease out the individual contribution of each homophilous factor to peer interaction.

34 These models include the same contextual effects and marginal effects as in Table 3.

The Effect of Classroom Composition
To this point, we have estimated peer effects for the seven networks of interest given the observed distribution of students in our sample. The NYC public school system is large and diverse, spanning five boroughs, each with its own characteristics. We split our sample by borough in order to explore whether our estimates change based on context, and by how much. Table 5 reports the results for mathematics and reading when we split our sample by borough. For both outcomes, the top three effects are consistent with the full sample in most splits, but there is some significant variation in the estimates that is worth unpacking.
Starting with mathematics, we notice that in all models the Ethnicity network remains the most important, but point estimates range from 0.136 in the Bronx to 0.285 in Queens. There are likely two causes of this, both of which are tied to the ethnic composition of the boroughs. The first is that the relevance of the Ethnicity network may vary between ethnic groups. The second is that whether a student participates in a large or small network may affect the importance of the network. We can think of this second mechanism as a question of the level of diversity in a classroom, and we discuss this mechanism further in Section 6.1. The second and third most important networks are usually primary Language spoken at home and Gender, although there is more variation than there was in Ethnicity, and our estimates may be less precise due to smaller samples in these splits.

An Exploration of Diversity
In this section, we ask how the effect of the Ethnicity network changes as the classroom becomes more diverse. For this analysis, we construct a measure of diversity. Notice that there are four groups of interest in our Ethnicity network: Hispanic, Black, White, and Asian/other. To deal with more than two groups, we use Theil's entropy index (Theil, 1972; White, 1986), which we normalize such that values fall between zero and one. In this index, $p_{ck}$ is the proportion of students in classroom $c$ who are members of ethnicity $k$, and $h_c$ is the normalized entropy index for classroom $c$, which we call the diversity index. The index is maximized when there is equal representation of all groups (i.e. $\forall k$, $p_{ck} = 1/K$), and minimized at zero when only one group is represented (i.e. $\exists k$, $p_{ck} = 1$). We calculate this diversity index for each classroom in our sample, and then split the data based on diversity quantile. Table 6 shows the results from the subsamples above and below the median diversity index.
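The diversity index can be sketched in a few lines. The displayed formula is not reproduced in the text above, so the sketch assumes the standard normalized entropy, $h_c = -\sum_k p_{ck} \ln p_{ck} / \ln K$, which has the stated properties (zero with a single group, one with equal representation). The function name `diversity_index` and the example classrooms are hypothetical.

```python
import math
from collections import Counter

def diversity_index(ethnicities, n_groups=4):
    """Normalized entropy of ethnic shares in one classroom (assumed formula)."""
    counts = Counter(ethnicities)
    n = len(ethnicities)
    # Groups with zero members contribute nothing to the entropy sum.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n_groups)

homogeneous = diversity_index(["Hispanic"] * 20)
balanced = diversity_index(["Hispanic", "Black", "White", "Asian/other"] * 5)
skewed = diversity_index(["Hispanic"] * 17 + ["Black", "White", "Asian/other"])
print(homogeneous, balanced, skewed)
```

Dividing by $\ln K$ with $K$ fixed at the four groups of the Ethnicity network, rather than by the number of groups present, keeps classrooms comparable on a common zero-to-one scale.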
For both outcomes we see that the Ethnicity network is more influential in the more diverse environment. The difference is substantial. In mathematics, the peer effect drops from 0.233 to 0.192 (a percent difference of 19%) and in reading it drops from 0.130 to 0.092 (34%). In Table 7, Panel A divides the mathematics sample into quartiles by diversity. Column (1) contains the most diverse classrooms, and column (4) contains the least diverse classrooms. Notice that columns (1)-(3) have similar estimates for the Ethnicity network, but the estimate in column (4) is much smaller. Comparing column (1) to column (4), we see a drop from 0.224 to 0.153 (38%). This pattern is consistent in Panel B, for reading. Columns (1)-(3) have similar estimates of the Ethnicity peer effect, but the estimate in column (4) is starkly different. Again comparing columns (1) and (4), we see a drop from 0.128 to 0.058 (75%).
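The reported percentages are consistent with a symmetric percent difference, in which the gap is divided by the average of the two estimates rather than by either one. That reading is an inference from the numbers, not something the text states; the short check below, with a hypothetical helper `percent_difference`, reproduces the four figures under that assumption.

```python
def percent_difference(a, b):
    """Symmetric percent difference: |a - b| relative to the mean of a and b."""
    return abs(a - b) / ((a + b) / 2) * 100

# (more diverse estimate, less diverse estimate) pairs quoted in the text.
pairs = {
    "math, median split": (0.233, 0.192),
    "reading, median split": (0.130, 0.092),
    "math, quartiles 1 vs 4": (0.224, 0.153),
    "reading, quartiles 1 vs 4": (0.128, 0.058),
}
rounded = {name: round(percent_difference(a, b)) for name, (a, b) in pairs.items()}
print(rounded)
```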
There may be two potential mechanisms at play, which we discuss with an example of two groups for simplicity: a majority and a minority. If the two groups are of equal size, there is no majority, and diversity is maximized. This is analogous to column (1) in Table 7. On the other hand, column (4) occurs when the minority group is quite small, so that minority students have few to no peers from their group. In this case, the majority group is large enough that its group identity is largely irrelevant as a tool for subdividing the classroom and selecting friends. However, minority students may be isolated from the classroom, and their group peer effect is near zero in such a situation. These two channels are likely to lead to the full drop that we observe: majority students find ethnicity to be a less influential network when most of their peers share the same ethnicity, 35 and minority students become isolated. Despite the difference in context, these results may help to explain why Arcidiacono and Nicholson (2005) find peer effects along gender lines but not ethnicity lines in US medical schools, as medical schools are known to lack diversity (Lett et al., 2019).
It is important to note that this exploration is correlational. 36 We have not randomly assigned students into diverse classrooms, and school diversity is likely tied to neighborhood choices and other endogenous factors. Nonetheless, these results are striking and suggest that Ethnicity (and possibly other networks) retains its importance over most of the distribution, but drops off in importance when diversity is low. This exercise raises two questions that are beyond the scope of this paper. The first is whether unrelated networks are substitutes for one another. That is, if one network (e.g., Ethnicity) is not diverse, does the importance of another network (e.g., Gender) increase? 37 We see related behavior in Carrell et al. (2013), where USAFA students from top and bottom ability quartiles were placed in squadrons without the middle performers. This decrease in diversity led to more within-squadron segregation along ability lines. Second, do different groups within the network place differing values on membership in their group? Hsieh and Lin (2017) find that middle and high school females are more affected by their peers than their male counterparts. Similarly, they find that white students are more affected by their peers than other racial groups.
It is useful to look at the story told by Table 5. In Table 5, we see that primary Language spoken at home is much more important in Queens, the borough that has the highest proportion of non-English speakers and is also the largest borough in our sample. 38 Coupled with the findings in Table 7, the change in importance that we see in primary Language suggests a similar pattern: diversity matters for network importance.

35 This story is consistent with the findings of Mayer and Puller (2008). They find that minority students in particular are more likely to make connections with other minorities at their university. This effect is larger when the minority group makes up a smaller portion of the student body.

36 That said, we see some similar patterns to those presented here when we split by borough, which can be thought of as causal (attending a school outside a student's borough of residence is very rare).

37 The results in Table 6 are consistent with the idea that Ethnicity and Gender are substitutes based on above and below the median diversity level. However, this relationship is less clear in Table 7 when broken into quartiles based on diversity. The pattern shown in Table 5 is similarly noisy. Further exploration of this question is desirable.

Conclusion
In this paper, we find that shared ethnicity, gender, and language spoken at home networks are important sources of spillovers for mathematics and reading test scores in NYC elementary schools.
This supports the assumption made by much of the literature that gender and ethnicity networks are important. When modeled jointly with these characteristics, neighborhood, bus stop, bus route, and country of birth appear less important for academic peer effects. Although we find no significant effect, this is, to our knowledge, the first study to explore the importance of bus peers in the classroom. These findings may increase our understanding of network formation within the classroom and our ability to predict group behavior after a change in the network. 39

In the standard social interaction model, endogenous spillovers mitigate performance gaps by lifting the performance of low performing students more than the performance of high performing students. In this paper, we show that if such spillovers work within the boundaries of groups, the performance gap between students in different groups increases (i.e. an increased gender gap or racial disparity in academic performance). Since ethnicity is an important within-classroom network for social spillovers, this implies that known performance gaps, such as the racial performance gap, may be exacerbated even in diverse classroom settings. 40 We also show that altering the make-up of the classroom changes the impact of these networks.
In particular, we show that low ethnic diversity is correlated with low impact for that network. This is likely due to the irrelevance of the network for the majority group, but the negative impact of isolation on the minority group may also contribute to the effect. Further research is needed to disentangle these mechanisms. We also provide suggestive evidence that this is true for diversity in other networks, such as primary language spoken at home.
Lastly, note that the results we present in this paper are multiplicative peer effects which control for contextual effects. This means that interventions targeting a subset of students in the classroom may produce spillovers to those not targeted by the intervention. Our results inform which groups receive the strongest effects of these spillovers in the presence of groups. For example, a program providing additional mathematics tutoring will produce the strongest spillovers to students sharing ethnicity with the tutored students; extra tutoring in reading will result in the strongest spillovers for students sharing either gender or ethnicity. Students who share a primary language spoken at home will also see important spillovers for both outcomes. This type of information should be useful for designing optimal interventions not only in education but also across a variety of fields, and we provide a roadmap for how to utilize demographic data to obtain such useful information.
sorting along these lines, and highlight the importance of facilitating cross-group interaction as a means of mitigating these gaps.

Appendix A Simulation Experiments
We conduct two experiments to examine 1) whether the NYC elementary school data contain enough variation in the sizes of demographic groups across classrooms to consistently estimate model parameters, and 2) whether the relative sizes of the estimates of peer effects based on different partitions reflect their relative importance in friendship formation, when friendship is formed based on homophily. Specifically, in the first experiment, we simulate datasets using the proposed model and [...]

For the first experiment, we create $C$ classrooms with classroom sizes randomly drawn from the empirical distribution of class sizes in the NYC data. Then, we create three partition networks by randomly assigning a gender, ethnicity, and bus ridership status to each student in a classroom. The three networks represent three distinct network types in terms of the number of related groups and the level of variation in their sizes. First, the gender network ($W_{1,c}$) includes only two groups, so the size of each gender group in a classroom tends to be large, 1 and it may not vary much across classrooms, in part due to the common school practice of roughly matching the proportions of male and female students. This implies that identification of gender peer effects may be challenging. Second, the ethnicity network ($W_{2,c}$), generated based on the empirical distributions of the four largest ethnic groups in the NYC data, contains a moderate number of groups, and their size variations are relatively large, as shown in Table 1 in the main text. So, this type of network may not pose the identification difficulty that the gender network may have. The last network, the bus rider network ($W_{3,c}$), is very sparse, as only about 10% of students are bus riders and there may be several bus routes that further partition the riders. 2 It is well known that this type of network sparsity, or exclusion restriction, helps identification of peer effects, which is well borne out in our simulation results.
Regressors include dummies created from the gender, ethnicity, and bus ridership variables, a continuous variable, $z_{ic} \sim N(0, 1)$, and its interactions with weights matrices for exogenous peer effects. The errors are drawn independently and identically from $N(0, 1)$. With all variables generated, the outcome variables are generated from the reduced form of equation (1) in the main text for each classroom. This exercise includes thirteen parameters in total: three endogenous peer effects ($\lambda$), three exogenous peer effects ($\theta$), six coefficients for the rest of the regressors ($\beta$), and the variance of the error ($\sigma^2$). We set $\lambda_1 = \lambda_2 = \lambda_3 = 0.1$, and the rest of the parameter values to one. We simulate 1,000 times for each $C \in \{50, 100, 500\}$.

Table A.1 reports the bias and standard deviation (SD) of the estimates over 1,000 replications for each case. 3 Overall, both the bias and the variance of the estimates decrease quickly as $C$ increases, and the bias is considerably smaller than the standard deviation, confirming that the empirical distributions of the sizes of groups in the NYC data provide sufficient variation for the consistent estimation of the model parameters. In particular, the biases and variances of $\hat\lambda_1$ and $\hat\theta_1$ (i.e. the gender endogenous and exogenous peer effects) are always larger than those of the other peer effect estimators, as we expected.
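The data-generating step of the first experiment can be sketched for one classroom as below. This is a simplified illustration, not the simulation code behind Table A.1: it draws group labels uniformly rather than from the NYC empirical distributions, uses a single bus route, omits the demographic dummies, and sets the coefficient on $z$ to one. The function names and the classroom size are assumptions.

```python
import numpy as np

def partition_matrix(labels):
    """Row-normalized weights; label -1 marks students excluded from the network."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]) & (labels[:, None] >= 0)
    same = same.astype(float)
    np.fill_diagonal(same, 0.0)
    sizes = same.sum(axis=1, keepdims=True)
    return np.divide(same, sizes, out=np.zeros_like(same), where=sizes > 0)

def simulate_classroom(n, lams, thetas, rng):
    """Outcome from the reduced form y = (I - sum lam_k W_k)^{-1} (z + sum th_k W_k z + eps)."""
    gender = rng.integers(0, 2, n)                 # two groups (uniform draw: assumption)
    ethnicity = rng.integers(0, 4, n)              # four groups (uniform draw: assumption)
    bus = np.where(rng.random(n) < 0.1, 0, -1)     # about 10% riders on one route
    mats = [partition_matrix(g) for g in (gender, ethnicity, bus)]
    z = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    exog = z + sum(th * (w @ z) for th, w in zip(thetas, mats)) + eps
    s = np.eye(n) - sum(lam * w for lam, w in zip(lams, mats))
    y = np.linalg.solve(s, exog)                   # solve instead of forming the inverse
    return {"y": y, "z": z, "eps": eps, "mats": mats}

lams, thetas = [0.1, 0.1, 0.1], [1.0, 1.0, 1.0]
out = simulate_classroom(25, lams, thetas, np.random.default_rng(1))
print(out["y"][:5])
```

Repeating this draw over $C$ classrooms and many replications, and estimating the model on each simulated dataset, would give the bias and standard deviation summaries described above.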
For the second experiment, we simulate a friendship network, $W_c$, for each classroom using a network formation model frequently used in the literature (e.g. Goldsmith-Pinkham and Imbens, 2013) [...] and $w^*_{ijc} = 0$ otherwise. $U_{ic}(j)$ is the utility for $i$ of forming a link with $j$ and is specified as $U_{ic}(j) = \alpha_G 1(G_{ic} = G_{jc}) + \alpha_E 1(E_{ic} = E_{jc}) + \epsilon_{ijc}$, with $U_{jc}(i)$ similarly defined. $G_{ic}$ and $E_{ic}$ are the gender and ethnicity variables simulated in the first experiment, and $\epsilon_{ijc} \sim$ iid $N(0, 1)$.
Then, W c is obtained by row-normalizing W * c . The network formation model implies that there is a higher chance for two students to form a link if they share the same gender or ethnicity. After including the same set of regressors used in the first experiment, the outcome variable is generated from the reduced form of equation (1) in the main text where λW c now substitutes for K k=1 λ k W k,c . Then, we apply our partitioned peer effect model with three networks as in the first experiment to these data to investigate if the peer effects estimates deliver any information about the determinants of friendship formation. 5 We set C = 1, 000, λ = 0.3, and simulate 1,000 times for each (α G , α E ) ∈ {(1, 1), (0.5, 1), (1, 0.5), (0.1, 0.1)} where we vary the parameter values to change the relative importance of the two factors in network formation. One key assumption for the causal interpretation of educational peer effect models is that classroom assignment is random (e.g., De Giorgi et al., 2010). While we cannot prove this random assignment, we provide evidence to support this claim for this sample. For each homophilous peer network we include in our model, we would like to show that classroom assignment is not a function of the associated variable. To do this, we consider a series of multinomial logit models of the form: 7 We limit the sample for each regression to a single grade (g) in a single school (s) in a single year (t), which includes 4.44 classrooms on average, and repeat regressions until we cover the whole sample. 8 Class i is a categorical variable for assignment of student i to one of these classrooms, and x ikr is a binary indicator for group r based on demographic characteristic k of student i that we use as a partition variable in our analysis. For example, if k is the ethnicity network and r is the Hispanic group, x ikr is an indicator of whether student i is Hispanic. We convert demographic indicators with many sparse categories (i.e. 
'language spoken at home' and 'bus route') to binary indicators (i.e., 'English speaker' and 'bus rider,' respectively). The result is a series of estimates (β krc ) and tstatistics, capturing the marginal effect and significance of the relevant demographic variable (x ikr ) on class assignment. Then, for each school-grade-year we randomly re-assign the students to the classrooms using a uniform distribution, and re-run the regressions. Using a quantile-quantile plot of the simulated and actual t-statistics, we compare the distributions of t-statistics. Each dot in the panels of Figures B.1 and B.2 represents a pair of t-statistics (one actual, the other simulated) for each grade-school-year. Each panel represent a single demographic indicator and contains a line 7 The variables and coefifcients in the equation are subscripted with gst, but it is suppressed for notational simplicity. 8 We exclude school-grades for which there is only one classroom available from this process, as there is no assignment decision to be made. 39 with a slope of one (a 45 degree line), so deviations in the dot patterns from this line represent a difference in the significance of the non-randomized and randomized regressions, and (perhaps) a departure from random assignment to classrooms. Figure B.1 contains the plots for female, 'English speaker,' 'foreign born' and 'bus rider,' while figure B.2 contains plots for Hispanic, black, white and Asian. All plots appear to support random assignment of students to classrooms.  We run a series multinomial logits to estimate the importance of gender, whether English is spoken at home, whether a student is foreign born, or whether bus ridership is important for class assignment. In each graph, we plots the t-statistics from these regressions against t-statisitics from a similar exercise in which we randomly assign students to classrooms. 
Thus we are plotting these two distributions against one another, and if the distributions are the same, we expect a straight line of slope one. We argue that these plots provide evidence that class assignment is consistent with a random process.

Figure B.2: We run a series of multinomial logits to estimate the importance of each ethnic group in class assignment. In each pair of graphs, the left plots the t-statistics from these regressions against the t-statistics from a similar exercise in which we randomly assign students to classrooms. Thus we are plotting these two distributions against one another, and if the distributions are the same, we should expect a straight line of slope one. We argue that these plots provide evidence that class assignment is consistent with a random process.
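The logic of the randomization check described above can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: for brevity it swaps the multinomial logit t-statistic for a Pearson chi-square statistic on the classroom-by-indicator table, and the data, the function names (`chi_square_stat`, `qq_pairs`), and the cell sizes are all hypothetical. What it keeps is the comparison itself: one statistic per school-grade-year under the actual assignment, one under a uniform random reassignment, and sorted pairs that should lie near the 45-degree line when assignment is as good as random.

```python
import random

def chi_square_stat(classes, indicator):
    """Pearson chi-square statistic for a classroom-by-indicator table.

    Swapped in here for the multinomial logit t-statistic (an assumption
    made to keep the sketch dependency-free).
    """
    n = len(classes)
    share_one = sum(indicator) / n  # overall share with indicator == 1
    stat = 0.0
    for room in sorted(set(classes)):
        members = [x for c, x in zip(classes, indicator) if c == room]
        for value, share in ((1, share_one), (0, 1.0 - share_one)):
            expected = len(members) * share
            observed = sum(1 for x in members if x == value)
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def qq_pairs(cells, rng):
    """Return sorted (simulated, actual) statistic pairs, one per cell."""
    actual, simulated = [], []
    for classes, indicator in cells:
        actual.append(chi_square_stat(classes, indicator))
        # Uniform random reassignment of students to the same classrooms.
        rooms = sorted(set(classes))
        reassigned = [rng.choice(rooms) for _ in classes]
        simulated.append(chi_square_stat(reassigned, indicator))
    return list(zip(sorted(simulated), sorted(actual)))

# Hypothetical data: 200 school-grade-year cells, 4 classrooms, 100 students,
# with "actual" assignment generated at random so the null holds by design.
rng = random.Random(0)
cells = []
for _ in range(200):
    classes = [rng.randrange(4) for _ in range(100)]
    indicator = [1 if rng.random() < 0.5 else 0 for _ in range(100)]
    cells.append((classes, indicator))

pairs = qq_pairs(cells, rng)
largest_gap = max(abs(a - s) for s, a in pairs)
print("largest vertical distance from the 45-degree line:", largest_gap)
```

Plotting `pairs` as a scatter gives the analogue of one panel in Figures B.1 and B.2; systematic departures above the line would indicate that the indicator predicts class assignment more strongly than chance allows.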

B.2 Alternative Specifications
Tables B.1-B.3 modify the specification in Table 3 in the main text to see if our main results are sensitive to different specifications. 9 The general finding of this exercise is that the relative rankings of the peer effects tend to be robust, while the magnitudes vary slightly. For this reason, the baseline model reported in Section 5 and Table 3 in the main text is our preferred specification. The baseline model provides a clear understanding of the relative importance of all partitions of interest, and of the impact of the characteristics these partitions represent on elementary school test scores. Table B.1 contains the same model as in Table 3 in the main text (equation 6), but removes the bus networks. The bus networks tend to be sparser than the other networks, and we want to be sure that this sparseness is not corrupting the other estimated peer effects. For mathematics test scores, both the ranking and the estimates of the peer effects are relatively unchanged without the bus networks.
Only Zip Code increases in both models (although it is never significant in the reading models), and Ethnicity increases for reading. But these changes are slight and not meaningful. The increase in the Zip Code estimate is consistent with the idea that the bus networks are small geographic networks, so that removing them from the model slightly increases the influence of the Zip Code network. Like most large cities, New York is geographically segregated by ethnicity and income, so it may be surprising that Ethnicity is not more affected by the removal of the bus networks. 10 Next, we further explore the primary Language spoken at home and Country of birth networks.
Nearly 65% of our sample speaks English at home, and over 90% of our sample was born in the US. We remove the connections due to these majority groups in order to test whether their inclusion is driving our estimates for these networks. Table B.2 shows estimates of the baseline model with connections for English language and birth in the US removed. We see that the importance of Ethnicity declines in both models, and the network strength for both primary Language spoken at home and Country of birth increases. For reading, the increase in the Country of birth network is large enough that it becomes statistically significant (at the 5% level), which it was not in the baseline model. In mathematics, primary Language was already the second most important network, and we see its estimate increase while the Ethnicity estimate decreases. This suggests that part of what makes the Ethnicity network so important is shared cultural elements such as language.
Because gender and ethnicity are the most commonly investigated demographic determinants of educational outcomes in the literature, we run the model with just these two networks for comparison. Table B.3 contains the results with only Gender and Ethnicity peer networks. In these models, Ethnicity peers are important for mathematics scores (0.247), with Gender being only mildly important (0.055); for reading scores, by contrast, Gender (0.150) and Ethnicity (0.129) are much more similar in importance. This is consistent with the findings in Table 3 in the main text, but we notice that the inclusion of other networks, such as primary Language spoken at home, reduces the effect of Ethnicity but not of Gender. This suggests that Ethnicity is a proxy for some of these other important networks, while Gender is orthogonal to them.

Notes to Table B.1: About 25% of the sample is assigned a bus, but only a subset of these students share a bus route or stop with a classmate. We run the baseline model without the bus networks, as we may be concerned about sparse networks. Models include 20 contextual social effects, constructed using the lag test score, age, age squared, and FRPL status along with each of the networks included in the model, as well as neighborhood and classroom fixed effects. Hispanic is the omitted reference group for ethnicity marginal effects.

Notes to Table B.2: Models include 28 contextual social effects, constructed using the lag test score, age, age squared, and FRPL status along with each of the seven networks included in the model, as well as neighborhood and classroom fixed effects. Hispanic is the omitted reference group for ethnicity marginal effects.

Notes to Table B.3: Models include eight contextual social effects, constructed using the lag test score, age, age squared, and FRPL status along with each of the networks included in the model, as well as neighborhood and classroom fixed effects. Hispanic is the omitted reference group for ethnicity marginal effects.

Appendix C Estimation procedure
Our empirical model is given by

$$y_{ct} = \sum_{k=1}^{K} \lambda_k W_{k,ct}\, y_{ct} + X_{ct}\beta + \sum_{k=1}^{K} W_{k,ct} Z_{ct}\theta_k + y_{c,t-1}\gamma + \delta_{ct} \cdot \iota_{n_{ct}} + u_{ct} \qquad \text{(C.1)}$$

In order to apply a quasi-MLE, we assume that each element in $u_{ct}$ is iid$(0, \sigma^2)$. To remove the time-varying classroom fixed effects, following , we consider a transformation that eliminates the classroom fixed effects while maintaining interdependency between the errors. Let the orthonormal matrix of the within transformation matrix $Q_{ct} = I_{n_{ct}} - \iota_{n_{ct}}\iota_{n_{ct}}'/n_{ct}$ be $[P_{ct},\ \iota_{n_{ct}}/\sqrt{n_{ct}}]$. The columns in $P_{ct}$ are eigenvectors of $Q_{ct}$ corresponding to the eigenvalue one, such that $P_{ct}'\iota_{n_{ct}} = 0$, $P_{ct}'P_{ct} = I_{n_{ct}-1}$ and $P_{ct}P_{ct}' = Q_{ct}$. Then, premultiplying equation (C.1) by $P_{ct}'$ yields

$$y^*_{ct} = \sum_{k=1}^{K} \lambda_k W^*_{k,ct}\, y^*_{ct} + X^*_{ct}\beta + \sum_{k=1}^{K} W^*_{k,ct} Z^*_{ct}\theta_k + y^*_{c,t-1}\gamma + u^*_{ct} \qquad \text{(C.2)}$$

where $y^*_{ct} = P_{ct}'y_{ct}$ and $X^*_{ct}$, $y^*_{c,t-1}$, $Z^*_{ct}$, and $u^*_{ct}$ are similarly defined, and $W^*_{k,ct} = P_{ct}'W_{k,ct}P_{ct}$ for $k = 1, ..., K$. In equation (C.2), we use the fact that $P_{ct}'W_{k,ct} = W^*_{k,ct}P_{ct}'$.
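The properties of the transformation stated above can be checked numerically. The following Python sketch is illustrative only and is not the estimation code behind the paper: it builds one valid choice of $P$ from Helmert contrasts (an assumption, since any orthonormal basis of the space orthogonal to $\iota$ would do), uses a hypothetical four-student network, and verifies that $P'\iota = 0$, $P'P = I_{n-1}$, $PP' = Q$, and $P'W = (P'WP)P'$ when $W$ is row-normalized with no isolated student.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def identity(m):
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

def helmert_p(n):
    """n x (n-1) matrix whose columns are orthonormal and orthogonal to ones."""
    p = [[0.0] * (n - 1) for _ in range(n)]
    for j in range(1, n):
        scale = math.sqrt(j * (j + 1))
        for i in range(j):
            p[i][j - 1] = 1.0 / scale
        p[j][j - 1] = -j / scale
    return p

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

n = 4
# Hypothetical adjacency matrix (no isolated student), then row-normalized.
adjacency = [[0, 1, 1, 1], [1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
W = [[x / sum(row) for x in row] for row in adjacency]

P = helmert_p(n)
Pt = transpose(P)
Q = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
W_star = matmul(matmul(Pt, W), P)  # transformed network matrix P'WP

err_iota = max(abs(row[0]) for row in matmul(Pt, [[1.0]] * n))    # P'iota = 0
err_ptp = max_abs_diff(matmul(Pt, P), identity(n - 1))            # P'P = I
err_ppt = max_abs_diff(matmul(P, Pt), Q)                          # PP' = Q
err_commute = max_abs_diff(matmul(Pt, W), matmul(W_star, Pt))     # P'W = W*P'
print(err_iota, err_ptp, err_ppt, err_commute)
```

The last identity relies on row-normalization: because $W\iota = \iota$, the term $P'W\iota\iota'/n$ equals $P'\iota\iota'/n = 0$, so $P'W = P'WQ = P'WPP'$.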
If no one is isolated from any of the networks, following Lee et al. where $S_{ct}(\Lambda) = I_{n_{ct}} - \sum_{k=1}^{K} \lambda_k W_{k,ct}$, and $\epsilon_{ct}(\psi) = y_{ct} - \sum_{k=1}^{K} \lambda_k W_{k,ct}\, y_{ct} - X_{ct}\beta - \sum_{k=1}^{K} W_{k,ct} Z_{ct}\theta_k - y_{c,t-1}\gamma$. The parameter space for $\Lambda$ needs to be restricted to ensure that $|S_{ct}(\Lambda)|$ and $1 - \sum_{k=1}^{K} \lambda_k$ are strictly positive for all $c$ and $t$, so that the likelihood is well defined. $|S_{ct}(\Lambda)|$ will be strictly positive when $\left|\sum_{k=1}^{K} \lambda_k\right| < 1$ since our network matrices are row-normalized.
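One way to see why both $|S_{ct}(\Lambda)|$ and $1 - \sum_k \lambda_k$ must be positive is the determinant identity $|P'SP| = |S| / (1 - \sum_k \lambda_k)$, which follows from $S\iota = (1 - \sum_k \lambda_k)\iota$ when every network is row-normalized with no isolated student. The Python sketch below checks this identity numerically. It is a hedged illustration, not the paper's code: the five-student classroom, the group labels, the values $\lambda = (0.3, 0.2)$, the Helmert choice of $P$, and all function names are assumptions.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def helmert_p(n):
    """n x (n-1) matrix whose columns are orthonormal and orthogonal to ones."""
    p = [[0.0] * (n - 1) for _ in range(n)]
    for j in range(1, n):
        scale = math.sqrt(j * (j + 1))
        for i in range(j):
            p[i][j - 1] = 1.0 / scale
        p[j][j - 1] = -j / scale
    return p

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    size, d = len(m), 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, size):
            f = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= f * m[c][k]
    return d

def shared_network(labels):
    """Row-normalized network linking classmates with the same label."""
    rows = [[1.0 if i != j and a == b else 0.0 for j, b in enumerate(labels)]
            for i, a in enumerate(labels)]
    return [[x / sum(row) for x in row] for row in rows]

# Hypothetical classroom of five students; no one is isolated in either network.
networks = [shared_network([0, 0, 0, 1, 1]),   # e.g. shared gender
            shared_network([0, 1, 1, 0, 1])]   # e.g. shared ethnicity
lambdas = [0.3, 0.2]
n = 5

# S(Lambda) = I - sum_k lambda_k W_k
S = [[(1.0 if i == j else 0.0) - sum(l * w[i][j] for l, w in zip(lambdas, networks))
      for j in range(n)] for i in range(n)]
P = helmert_p(n)
S_star = matmul(matmul(transpose(P), S), P)  # transformed matrix P'SP

det_s = det(S)
det_star = det(S_star)
print(det_s, det_star, det_star * (1.0 - sum(lambdas)))
```

In this example the diagonal of $S$ is one and the absolute off-diagonal row sums equal 0.5, so $S$ is strictly diagonally dominant and its determinant is positive.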
To simplify the estimation, we concentrate out $B$ and $\sigma^2$ in (C.4) using the fact that the QMLE of

The QMLE of $\Lambda$, $\hat{\Lambda}$, is the maximizer of (C.6), and the QMLEs of $B$ and $\sigma^2$ are $\hat{B}(\hat{\Lambda})$ and $\hat{\sigma}^2(\hat{\Lambda})$, respectively.