Network Competition and Team Chemistry in the NBA

Abstract–We consider a heterogeneous social interaction model where agents interact with peers within their own network but also interact with agents across other (non-peer) networks. To address potential endogeneity in the networks, we assume that each network has a central planner who makes strategic network decisions based on observable and unobservable characteristics of the peers in her charge. The model forms a simultaneous equation system that can be estimated by quasi-maximum likelihood. We apply a restricted version of our model to data on National Basketball Association games, where agents are players, networks are individual teams organized by coaches, and competition is head-to-head. That is, at any time a player only interacts with two networks: their team and the opposing team. We find significant positive within-team peer-effects and both negative and positive opposing-team competitor-effects in NBA games. The former are interpretable as “team chemistries” which enhance the individual performances of players on the same team. The latter are interpretable as “team rivalries,” which can either enhance or diminish the individual performance of opposing players.


Introduction
We consider a world with R interrelated networks where agents interact with peers within their own network but also interact with non-peers from other networks, but in different ways. For example, we can think of teams of individual agents that cooperate within their network but compete across networks. Competition between two or more R&D alliances comes to mind. A given firm may cooperate with an R&D ally to achieve an intellectual property discovery, but firms across alliances compete. Airline alliances (e.g., SkyTeam, Star Alliance, and OneWorld) cooperate within their networks but compete across networks. In these examples, multiple networks or teams may simultaneously compete, but in some instances, such as sports, team competition is head-to-head. In most cases, within-network peer interaction is complementary, and cross-network interaction is competitive. However, our model allows for the converse to be true. For example, in sports competition a team's performance may be worsened or enhanced when they face a better team. We restrict attention to models where an agent's single outcome (e.g., sales performance) is a function of the simultaneous outcomes of their peers and competitors. In particular, we are not concerned with the case of Liu (2014) or Cohen-Cole, Liu, and Zenou (2017), where there is a single peer network (no competitors) with multiple outcome variables (e.g., a single network where peers allocate effort to simultaneous outcomes, such as labor and leisure hours).
In these examples, social interaction decisions are likely to be guided by a central planner for each network (e.g., a sales manager), and the choices of the planner may induce what Manski (1993) called a correlated effect, where individuals in the "same group tend to behave similarly because they… face similar institutional environments." The usual solution to the correlated effects problem is to include a network-level fixed or random effect in the model specification. However, if the planner selects the network strategically and simultaneously with output (Olley and Pakes 1996), then the network itself may be endogenous. Following Horrace, Liu, and Patacchini (2016), we augment the outcome equation with a team selection equation that models the central planners' network choices. We consider both parametric (Lee 1983) and semiparametric (Dahl 2002) approaches to the selection problem. Horrace, Liu, and Patacchini (2016) considered a network production function where a manager selects workers into a network to produce output, but they ignored cross-network competition. In this sense, our article generalizes their study. The conceptual foundations for this kind of selection correction can be traced to articles by Brock and Durlauf. Brock and Durlauf (2006) generalized the multinomial-choice model with social interactions proposed in Brock and Durlauf (2002) and suggested that multinomial control functions may be used for self-selection corrections.
Social network interactions have been studied extensively. 1 However, simultaneous cross-network interactions remain unexplored. A few articles model simultaneous activity for a single, time-invariant network, and are multivariate extensions of the single-equation spatial autoregressive model (Cliff and Ord 1973, 1981) to simultaneous outcomes. Some are for a cross-section of data, and others are for panel data. For example, Kelejian and Prucha (2004) generalized the spatial autoregressive (SAR) model to a simultaneous system for cross-sectional data. Baltagi and Deng (2015) extended the model to panel data with random-effects, while Cohen-Cole, Liu, and Zenou (2017) extended it to a simultaneous system with fixed-effects. Yang and Lee (2017) studied identification and quasi-maximum likelihood (QML) estimation of the model of Kelejian and Prucha (2004) for cross-sectional data. Empirical examples of these simultaneous models include the effect of regional networks on migration and housing prices (Jeanty, Partridge, and Irwin 2010); on migration, employment, and income (Gebremariam, Gebremedhin, and Schaeffer 2011); on rents for studio, one-bedroom, and two-bedroom apartments (Baltagi and Bresson 2011); and on simultaneous fiscal policies (Allers and Elhorst 2011), among others.
All these simultaneous peer-effect models are clearly related, but none consider multiple peer networks that may be engaged in simultaneous competition around a single outcome variable, nor do they attempt to model the actions of a manager. Our model does both and in certain cases could be applied to traditional panel data, where outcomes of peers and competitors are observed in every period. Here, however, only the actions of the manager are observed in every period, and she controls which peers will compete and be observed in any period to produce the outcome (e.g., sales). Therefore, our networks are time-varying, the model is conceptualized for repeated cross-sections (not panel), and asymptotic arguments are for large $n_r$, the number of peers competing in network $r = 1, \ldots, R$, with a fixed number of networks, $R$, following Yang and Lee (2017). Nonetheless, our networks vary over time ($t = 1, \ldots, T$), so large $T$ helps with consistent QML estimation of peer-effects, as our simulations show. Simulations also show that when the networks are small ($n_r = 5$), bootstrap bias correction improves finite sample performance of the estimator.
Team chemistry (a within-team peer-effect) receives substantial empirical attention as a factor influencing performance in sports and business. Unfortunately, team chemistry is difficult to measure. 2 McCaffery and Bickart (2013) estimated team chemistry as a function of biological synchrony among players, while Kraus, Huang, and Keltner (2010) find evidence that early-season, on-court tactile communication is a predictor of later-season success at both the individual and team levels. Horrace, Liu, and Patacchini (2016) developed a network production function model that estimates within-team peer-effects on player productivity in men's college basketball, but without competitor-effects. 3 Our research contributes to this empirical literature on team chemistry by controlling for the play and strategy of the opposing team, and may be conceptualized as a generalization of their model.
We apply a restricted version of our model (i.e., where competition is head-to-head) to the 2015-2016 National Basketball Association (NBA) regular season to simultaneously estimate within-network and cross-network peer-effects for all 30 teams in the league, a total of 30 × 30 heterogeneous effects. We find within-team peer-effects that are statistically significant and positive in the NBA. These are performance multipliers that enhance the individual performance of players on the same team and may be interpreted as "team chemistries." We also find both positive and negative opposing-team competitor-effects. That is, the team chemistry of your opponents may either enhance or diminish the individual performance of players on your team. These effects may be loosely interpreted as "team rivalries." We correct for managerial selectivity bias using both the semiparametric Dahl (2002) and parametric Lee (1983) approaches, and estimate the coach's selection probabilities from a selection equation in two ways: standard multinomial logit (MNL) and a random forest (RF) algorithm from the machine-learning literature.
The rest of this article is organized as follows. The next section introduces the econometric specification and estimation approaches, and Section 3 provides a simulation study of the proposed estimator. Section 4 presents the results of the empirical exercise, and Section 5 concludes.

Outcome Function
We begin with a general statement of the outcome function, but consider a restricted version in our application. There are $R$ networks (alliances, chains, or teams), and each network $r = 1, \ldots, R$ contains $n_r$ peers, with $N = \sum_r n_r$. Peers cooperate within their own network but compete with members of the other networks around a common outcome. The outcome function for the $r$th network is

$$y_{rt} = \sum_{k=1}^{R} \lambda_{rk} W_{rkt} y_{kt} + X_{1rt}\beta_{1r} + \iota_{n_r} x_{2rt}'\beta_{2r} + u_{rt}, \tag{1}$$

where $y_{rt}$ is an $n_r \times 1$ outcome vector for the $r$th network, $X_{1rt}$ is an $n_r \times p_1$ exogenous input matrix that varies over peers $i = 1, \ldots, n_r$ and $t$ (peer-level variability), $\iota_{n_r}$ is an $n_r \times 1$ vector of ones, $x_{2rt}$ is a $p_2 \times 1$ exogenous input vector that varies over $t$ only (network-level variability), and $u_{rt}$ is an $n_r \times 1$ disturbance vector. The $W_{rrt}$ is an $n_r \times n_r$ weight matrix for interaction within the $r$th network, while $W_{rkt}$ for $k \neq r$ is an $n_r \times n_k$ matrix for the effect from the $k$th network to the $r$th network. We assume the matrices have network structure and are row-normalized, so that $\lambda_{rr}$ is the average within-network peer-effect for the $r$th network, and $\lambda_{rk}$ is the average cross-network competitor-effect from network $k \neq r$ to the $r$th network. The terms $\beta_{1r}$ and $\beta_{2r}$ are vectors of input coefficients for the $r$th network. The existing literature assumes that $\lambda_{rk} = 0$ for $k \neq r$. We allow within-network and cross-network effects to be positive or negative. We may also refer to $W_{rrt}$ as the peer network and $W_{rkt}$ as the competitor network. 4

Following Horrace, Liu, and Patacchini (2016), each network $r$ has a manager, who populates her network with peers in each period $t$, selecting $n_r$ peers from a larger group of peers at her disposal. Peers work together within their networks and compete across networks to produce $y_{rt}$. 5 The strategic decisions of the managers have implications for the econometric model. First, we only observe the outcome $y_{irt}$ (say) for peer $i$ in period $t$ when he is selected into network $r$ by his manager.
As such, our data are not a panel per se and should be considered repeated cross-sections. This is an important distinction between our model and other social network models for panel data, where each peer is observed in each period and networks are often fixed over time (e.g., Lee and Yu 2010, 2014). Indeed, lack of a true panel is fundamental to the need for managerial selection correction.
Also, the simultaneous actions of the managers induce a network-level correlated effect (Manski 1993). Without loss of generality we assume that all managers have the same number of workers at their disposal, $n_0 > n_r$. Let $d_{irt}$ be an indicator variable such that $d_{irt} = 1$ if worker $i$ is assigned to network $r$ in period $t$, and $d_{irt} = 0$ otherwise. Then all managerial decisions at time $t$ can be characterized by the $n_0 \times R$ matrix

$$D_t = [d_{1t}, \ldots, d_{Rt}], \qquad d_{rt} = (d_{1rt}, \ldots, d_{n_0 rt})'.$$

The correlated effect then implies $E(u_{rt} \mid D_t) \neq 0$, and $D_t$ is clearly correlated with all the variables and networks on the right-hand side of (1). Ultimately, we use arguments from the game theory literature for static games of incomplete information with multiple equilibria (i.e., Bajari et al. 2010) to model managerial decisions and their effects on network peers and competitors, but for now we assume that the actions of the managers affect all active peers in network $r$ in the same way. Hence, the correlated effects reduce to time-varying network fixed-effects. That is, $E(u_{rt} \mid D_t) = \iota_{n_r}\alpha_{rt}$, where $\alpha_{rt}$ is a scalar. Let $u^*_{rt} = u_{rt} - \iota_{n_r}\alpha_{rt}$. Also, let $X_{rt} = [X_{1rt}, \iota_{n_r}x_{2rt}']$ be an $n_r \times p$ matrix with $p = p_1 + p_2$ and $\beta_r = [\beta_{1r}', \beta_{2r}']'$. Then the entire system at time $t$ is

$$y_t = \sum_{r=1}^{R}\sum_{k=1}^{R} \lambda_{rk} G_{rkt}\, y_t + X_t\beta + \alpha_t + u^*_t, \tag{2}$$

where $y_t = (y_{1t}', \ldots, y_{Rt}')'$, $X_t = \mathrm{Diag}(X_{1t}, \ldots, X_{Rt})$, $\alpha_t = (\iota_{n_1}'\alpha_{1t}, \ldots, \iota_{n_R}'\alpha_{Rt})'$, $u^*_t = (u^{*\prime}_{1t}, \ldots, u^{*\prime}_{Rt})'$, and $\beta = (\beta_1', \ldots, \beta_R')'$. $G_{rkt}$ is an $N \times N$ block matrix with $R$ row blocks and $R$ column blocks. The blocks in $G_{rkt}$ are all blocks of zeros except for the $r$th row block in the $k$th column block position, which equals $W_{rkt}$. For example, if there are $R = 2$ networks, then

$$G_{11t} = \begin{bmatrix} W_{11t} & 0 \\ 0 & 0 \end{bmatrix},\quad G_{12t} = \begin{bmatrix} 0 & W_{12t} \\ 0 & 0 \end{bmatrix},\quad G_{21t} = \begin{bmatrix} 0 & 0 \\ W_{21t} & 0 \end{bmatrix},\quad G_{22t} = \begin{bmatrix} 0 & 0 \\ 0 & W_{22t} \end{bmatrix},$$

where 0 is a conformable matrix of zeros. We require the following regularity conditions.

Assumption 1. Let $u^*_{irt}$ be the $i$th element of $u^*_{rt}$. The $u^*_{irt}$ are iid$(0, \sigma_r^2)$ and a moment of order higher than the fourth exists.
Assumption 3. The $W_{rkt}$ are uniformly bounded in both row and column sums in absolute value. Also, $S_t(\Lambda)^{-1}$ is uniformly bounded, uniformly in $\Lambda \in \mathcal{L}$.
Assumptions 1-3 are standard and follow Lee, Liu, and Lin (2010) and Yang and Lee (2017) with a few differences. 6 Following Yang and Lee (2017), consistent QML estimation of the outcome function in (2) proceeds as $n_r \to \infty$ (with $n_0 \to \infty$ necessarily). However, due to independence across $t$ in Assumption 1, the time dimension helps with consistency. 7 In particular, it allows us to appeal to the arguments of Lee, Liu, and Lin (2010). They consider a cross-section ($T = 1$) of a large number of independent networks with a homogeneous peer-effect, and prove that the effect may be consistently estimated with QML, as either the number of peers in a network or the number of independent networks increases. Similar to Lee, Liu, and Lin (2010), the system-wide network $\sum_{r=1}^{R}\sum_{k=1}^{R}\lambda_{rk}G_{rkt}$ in (2) can be seen as a single network at time $t$. The network changes over time, as managers change peer composition, and can be seen as $T$ networks in a single period, so asymptotic arguments similar to Lee, Liu, and Lin (2010) apply. That is, QML is consistent as $n_r T \to \infty$ for fixed $R$ with $u^*_{irt}$ independent across $t$, and this is borne out in our simulation study. 8 Relative to Lee, Liu, and Lin (2010), the only complication that QML estimation of (2) presents is that we have multiple peer-effects ($\lambda_{rr}$) and competitor-effects ($\lambda_{rk}$). However, they are time invariant, so time variation in the networks $W_{rrt}$ and $W_{rkt}$ helps reduce the mean squared errors of their estimates.
The compact parameter space of Assumption 2 is also standard and ensures consistency and desirable asymptotic properties of QML. The implication is that $S_t(\Lambda)$ can be inverted, ensuring stability and an equilibrium where $y_t = S_t(\Lambda)^{-1}(X_t\beta + u_t)$. A positive determinant ensures the likelihood function is well-defined. Following Yang and Lee (2017), where $\tau_{it}$ is an eigenvalue of $M_t(\Lambda)$, a sufficient condition satisfying this assumption is $\max_i \tau_{it} < 1$. By the spectral radius theorem, $\max_i |\tau_{it}|$ is no greater than any induced matrix norm of $M_t(\Lambda)$, so the condition is satisfied when $\|M_t(\Lambda)\|_\infty < 1$ or $\|M_t(\Lambda)\|_1 < 1$.
Accordingly, a sufficient condition for Assumption 2 is that $\mathcal{L}$ is restricted such that either $\sum_{k=1}^{R}|\lambda_{rk}| < (\max_{r,k}\|W_{rkt}\|_\infty)^{-1}$ or $\sum_{r=1}^{R}|\lambda_{rk}| < (\max_{r,k}\|W_{rkt}\|_1)^{-1}$ for all $r, k$. The first condition reduces to $\sum_{k=1}^{R}|\lambda_{rk}| < 1$ when the network matrices are row-normalized. 9 This means the sum of the absolute values of the within- and cross-network effects from or to a network must be bounded, while the row or column sum of $W_{rkt}$ is also bounded.
These conditions have important implications for our model. First, $\Lambda$ should be sparse if $R$ is large, so the effective number of network effects from or to a network is bounded. Second, the number of linkages for each peer in $W_{rkt}$ for all $r, k$ should be fixed as $n_r$ increases. That is, $W_{rkt}$ should be sparse as $n_r$ increases. This is discussed in Lee (2004) and is borne out in our simulations, as the network effect cannot be consistently estimated if $W_{rkt}$ is not sparse when $n_r$ increases. Here, $R$ is fixed. Increasing $R$ will produce a richer network interaction structure, but it increases model complexity. Moderate to small $R$ may be appropriate in our context as network-pairwise interactions may only exist when there are few networks in the market. In a market with many networks, the effect of a single network on other networks may be negligible.
Assumption 3 is standard and limits spatial dependence to a manageable degree (e.g., Kelejian and Prucha 1999). This assumption will be met if both the sufficient conditions for Assumption 2 are satisfied. We discuss sufficient conditions for identification of the model in the next section.
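The stability condition and the reduced form discussed in this section can be illustrated numerically. The following is a minimal sketch, not the authors' code: the network sizes, the values in `lam`, the random link rule, and the use of a single common coefficient vector (rather than network-specific $\beta_r$) are all assumptions made for illustration.

```python
import numpy as np

def row_normalized_weights(n_rows, n_cols, rng, zero_diagonal=False):
    """Random 0/1 link matrix scaled so that every row sums to one."""
    w = (rng.random((n_rows, n_cols)) < 0.5).astype(float)
    if zero_diagonal:
        np.fill_diagonal(w, 0.0)              # no self-links within a network
    for i in np.flatnonzero(w.sum(axis=1) == 0):
        w[i, (i + 1) % n_cols] = 1.0          # give isolated agents one link
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [5, 5]                                # n_1, n_2 (illustrative)
lam = np.array([[0.4, -0.2],                  # hypothetical lambda_rk values
                [0.1,  0.3]])
R, N = len(sizes), sum(sizes)

W = [[row_normalized_weights(sizes[r], sizes[k], rng, zero_diagonal=(r == k))
      for k in range(R)] for r in range(R)]

# M = sum_r sum_k lambda_rk G_rkt: block (r, k) equals lambda_rk * W_rkt.
M = np.block([[lam[r, k] * W[r][k] for k in range(R)] for r in range(R)])
S = np.eye(N) - M

# With row-normalized weights, the sufficient condition is sum_k |lambda_rk| < 1.
row_bound = np.abs(lam).sum(axis=1).max()
spectral_radius = np.abs(np.linalg.eigvals(M)).max()
inf_norm = np.linalg.norm(M, ord=np.inf)

# Reduced form y_t = S^{-1}(X_t beta + alpha_t + u_t) with made-up inputs.
X = rng.normal(size=(N, 2))
beta = np.array([1.0, -0.5])                  # simplification: common beta
alpha = np.repeat([0.3, -0.1], sizes)         # network-level effects alpha_rt
u = rng.normal(size=N)
rhs = X @ beta + alpha + u
y = np.linalg.solve(S, rhs)

print(f"max_r sum_k |lambda_rk| = {row_bound:.2f}")
print(f"spectral radius = {spectral_radius:.3f}, infinity norm = {inf_norm:.3f}")
```

Because every row of each $W_{rkt}$ sums to one, the infinity norm of the system-wide network equals the largest row sum of $|\lambda_{rk}|$, which bounds the spectral radius and guarantees that the system can be solved for $y_t$.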

QML Estimation of the Outcome Function
We focus on QML estimation of the outcome function in (2). 10 Following Lee, Liu, and Lin (2010), we remove the bias term $\alpha_t$ to avoid the incidental parameter problem (Neyman and Scott 1948) by transforming the model with the projector $J_Q = \mathrm{Diag}(Q_1, \ldots, Q_R)$, where $Q_r$ is the "within" transformation matrix, $Q_r = I_{n_r} - \frac{1}{n_r}\iota_{n_r}\iota_{n_r}'$. Now, $Q_r\iota_{n_r} = 0$ and $Q_r u^*_{rt} = Q_r u_{rt}$. Therefore, we have

$$J_Q y_t = \sum_{r=1}^{R}\sum_{k=1}^{R}\lambda_{rk} J_Q G_{rkt}\, y_t + J_Q X_{1t}\beta_1 + J_Q u_t, \tag{3}$$

where $X_{1t} = \mathrm{Diag}(X_{11t}, \ldots, X_{1Rt})$.
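To see why the within transformation removes everything that is constant within a network, consider the following minimal sketch. The network size and the numerical values of the fixed effect and of the network-level term are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r = 5
iota = np.ones(n_r)
Q = np.eye(n_r) - np.outer(iota, iota) / n_r    # Q_r = I - (1/n_r) iota iota'

alpha_rt = 2.7      # network fixed effect (hypothetical value)
x2_beta2 = -1.3     # x_2rt' beta_2r, a scalar that is constant within the network
u = rng.normal(size=n_r)

network_level_part = iota * (alpha_rt + x2_beta2)
transformed = Q @ (network_level_part + u)

print(np.allclose(Q @ iota, 0.0))               # True: Q_r annihilates iota
print(np.allclose(transformed, Q @ u))          # True: only Q_r u survives
print(np.linalg.matrix_rank(Q))                 # 4, i.e., n_r - 1
```

The same calculation shows that the transformation wipes out the network-level regressors along with the fixed effect, and that $Q_r$ has rank $n_r - 1$, so the transformed disturbances are linearly dependent.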
Extensive study of identification conditions for network models and multivariate SAR models can be found in Bramoullé, Djebbari, and Fortin (2009), Cohen-Cole, Liu, and Zenou (2017), and Yang and Lee (2017). Let $\Xi_r$ be an $n_r \times N$ matrix consisting of $R$ horizontally concatenated blocks, with $I_{n_r}$ in the $r$th position and zeroes in the other $R - 1$ positions. Hence, $\Xi_1 = (I_{n_1}, 0)$, $\Xi_R = (0, I_{n_R})$, and $\Xi_r = (0, I_{n_r}, 0)$ for $r \neq 1, R$, where the 0 matrices are appropriately conformable. Let $\Gamma_{rkt} = \Xi_r G_{rkt} S_{0t}^{-1} X_{1t}\beta_{1,0}$ and $S_{0t} = I_N - \sum_{r=1}^{R}\sum_{k=1}^{R}\lambda_{rk,0}G_{rkt}$. In the online supplementary materials, we show that the true parameters $\Lambda_0$ (as defined in Assumption 2) and $\beta_{1,0}$ can be identified from (3) under the following condition.

Assumption 4. The matrices $[Q_r X_{1rt}, Q_r\Gamma_{r1t}, \ldots, Q_r\Gamma_{rRt}]$ have full column rank $\forall r$ and some $t$.
The identification condition corresponds to conditions in the literature cited above (e.g., Yang and Lee 2017), and it will generally be satisfied here, because we have multiple network matrices and exogenous regressors from each $r$, which produce enough variation to identify the coefficients in our model. 11 The disturbances in (3) are linearly dependent because the covariance matrix $\sigma_r^2 Q_r$ is singular. Following Lee, Liu, and Lin (2010, p. 150), we consider "an essentially equivalent but more effective transformation" to eliminate the network fixed-effects. Let the orthonormal matrix of $Q_r$ be $[P_r, \iota_{n_r}/\sqrt{n_r}]$. The columns in $P_r$ are eigenvectors of $Q_r$ corresponding to the eigenvalue one, such that $P_r'\iota_{n_r} = 0$, $P_r'P_r = I_{n_r - 1}$, and $P_rP_r' = Q_r$. Then, premultiplying (2) by $J_P'$, where $J_P = \mathrm{Diag}(P_1, \ldots, P_R)$, leads to

$$J_P' y_t = \sum_{r=1}^{R}\sum_{k=1}^{R}\lambda_{rk} J_P' G_{rkt}\, y_t + J_P' X_{1t}\beta_1 + J_P' u_t. \tag{4}$$

Let $\bar{y}_t = J_P' y_t$, $\bar{X}_{1t} = J_P' X_{1t}$, $\bar{u}_t = J_P' u_t$, and $\bar{G}_{rkt} = J_P' G_{rkt} J_P$. Now, $J_P' G_{rkt} = \bar{G}_{rkt} J_P'$. 12 This implies

$$\bar{y}_t = \sum_{r=1}^{R}\sum_{k=1}^{R}\lambda_{rk}\bar{G}_{rkt}\,\bar{y}_t + \bar{X}_{1t}\beta_1 + \bar{u}_t. \tag{5}$$

We derive the QML function and the concentrated QML function of (5) in the online supplementary materials. Call the QML estimates $\hat{\Lambda}$ and $\hat{\beta}_1$ the "first-step" estimates. If all we care about is the peer- and competitor-effects, then this is the only estimation step. However, if estimation of $\beta_{2r}$ and managerial selection biases are important, then there are two additional estimation steps to consider. In the "second-step," we estimate probabilities that the manager and opposing managers select teams of peers and competitors into the peer- and competitor-networks, respectively. Estimation of these "selection probabilities" proceeds with MNL or RF, and allows us to calculate the network-level selection bias of each manager as a single index. 13 These procedures include exogenous state variables, $z_{rt}$, which explain the managers' selections and will be discussed later. In a "final-step," we estimate $\beta_2$ with a regression of the residuals from the first-step on network-level covariates, $x_{2t}$, and the index.
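The properties of the orthonormal transformation can be checked numerically for a single network. This is a minimal sketch under illustrative assumptions: the network size and the random weight matrix are made up, and $P_r$ is obtained from an eigendecomposition of $Q_r$, which is one of several valid ways to construct it.

```python
import numpy as np

rng = np.random.default_rng(3)
n_r = 5
iota = np.ones(n_r)
Q = np.eye(n_r) - np.outer(iota, iota) / n_r

# Eigenvectors of Q_r with eigenvalue one form the n_r x (n_r - 1) matrix P_r.
eigenvalues, eigenvectors = np.linalg.eigh(Q)
P = eigenvectors[:, eigenvalues > 0.5]

# A random row-normalized within-network weight matrix with no self-links.
W = rng.random((n_r, n_r))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)
W_bar = P.T @ W @ P                              # transformed network

checks = {
    "P' iota = 0": np.allclose(P.T @ iota, 0.0),
    "P' P = I": np.allclose(P.T @ P, np.eye(n_r - 1)),
    "P P' = Q": np.allclose(P @ P.T, Q),
    "P' W = W_bar P'": np.allclose(P.T @ W, W_bar @ P.T),
}
for name, ok in checks.items():
    print(name, ok)
```

The last check relies on row-normalization: since $W\iota = \iota$ and $P'\iota = 0$, the transformed model can be written entirely in terms of the transformed outcome, which is what allows (5) to be estimated without the fixed effects.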

Bias Due to Strategic Interactions
Thus far, we have assumed that the endogeneity induced by the manager's choices can be controlled with a network-level fixed-effect. The fixed-effect is then removed from the model with the within transformation, so that the peer-effects and competitor-effects can be consistently estimated as $n_r \to \infty$ with QMLE. We may also estimate coefficients on the explanatory variables that vary at the peer-level ($X_{1t}$). Unfortunately, coefficients on the explanatory variables that vary at the network-level ($x_{2t}$) are not identified. To address this issue, we set up a static game in period $t$ to formulate and estimate the selection bias. 14 We follow and adapt the basic methodologies in the game theory literature for static games of incomplete information with multiple equilibria. Arguments follow Bajari et al. (2010). 15

Each network $r = 1, \ldots, R$ has a network manager who takes action $a_{rt}$ at time $t$ from her finite and discrete choice set of actions, $a_{rt} \in A = \{0, 1, \ldots, K\}$, so $a_t = (a_{1t}, \ldots, a_{Rt})$ is a vector of actions for all managers. 16 Ideally, $A$ would be the set of all possible combinations of $n_r$ workers from the $n_0$ workers at the manager's disposal (i.e., all possible values of $d_{rt}$, so the set of observed actions of all managers corresponds to a mapping from the matrix $D_t$). However, in practice considering all combinations may be infeasible and econometrically undesirable. One solution would be to restrict $A$ to only those choices that are observed in the sample (see Horrace, Liu, and Patacchini 2016). For example, an NBA coach may never choose a lineup of his five worst players, so we can exclude this action from the choice set. Alternatively, if there is some variable that maps the characteristics of each potential combination of workers into fewer choices, then this may be more practical. This may also allow us to specify adjacency matrices that satisfy Assumption 4. For example, an NBA coach may be interested in creating positional mismatches (e.g., guards defending forwards and vice versa) between his players and the opposing team's players. Here, the choice of the number of guards may summarize the coach's actions, so $K = 6$, and it may imply that network interactions should be based on player position types. We use this action set and a network adjacency matrix based on player position in our NBA application.

11 An example that does not satisfy the condition is when the $W_{rkt}$ are complete, which is the case for our empirical application. In Section 4, we discuss an exclusion restriction in the weighting matrices to address the issue. When exogenous network effects ($W_{rrt}X_{rt}$, say) are included in the model, the $R$ matrices in Assumption 4 include additional sub-matrices that are higher orders of $G_{rrt}$, which will make the condition more difficult to satisfy.

12 $\bar{G}_{rkt}J_P' = J_P'G_{rkt}J_Q = J_P'G_{rkt}\left(I_N - \mathrm{Diag}(\iota_{n_1}\iota_{n_1}'/n_1, \ldots, \iota_{n_R}\iota_{n_R}'/n_R)\right) = J_P'G_{rkt}$, because $G_{rkt}$ is row-normalized so $J_P'G_{rkt}\,\mathrm{Diag}(\iota_{n_1}\iota_{n_1}'/n_1, \ldots, \iota_{n_R}\iota_{n_R}'/n_R) = 0$.

13 Other flexible estimation methods may be used, such as kernel smoothing or local polynomial regressions (e.g., Bajari et al. 2010). RF is considered here as it is known to be effective in handling high-dimensional data and in predicting nonlinear relationships.
Let $a_{-rt} = (a_{1t}, \ldots, a_{(r-1)t}, a_{(r+1)t}, \ldots, a_{Rt})$ be the vector of actions for all managers, excluding manager $r$. Each manager has a vector of exogenous state variables $z_{rt}$, so the vector of state variables (i.e., market conditions) for all $R$ managers is $z_t = (z_{1t}', \ldots, z_{Rt}')'$. 17 The state variable $z_t$ is common knowledge and observable to the econometrician. Manager $r$ is also subject to an idiosyncratic shock over her possible actions, $e_{rt}(a_{rt}) \in \{e_{rt}(0), \ldots, e_{rt}(K)\}$. These shocks are iid over $a_{rt}$ and over $r$ and $t$ with density $G_e(e_{rt})$. The shocks are private information to manager $r$, and are unobservable to the econometrician. Let $e_{rt}$ be the $(K+1)$-vector of $e_{rt}(a_{rt})$ for all $a_{rt} \in A$. Then, managers simultaneously choose their actions based on their individual information sets $\{z_t, e_{rt}\}$, so manager $r$'s decision rule is a scalar function $a_{rt} = \eta_r(z_t, e_{rt})$. Under the chosen actions, networks produce a single outcome, $y_t$. Let manager $r$'s scalar utility function be $v_r(a_{rt}, z_t, e_{rt}) = \pi_r(a_{rt}, a_{-rt}, z_t) + e_{rt}(a_{rt})$. Then, the conditional choice probability of $r$ choosing $a_r$ at a given realization of $z_t$ is given by

$$\delta_r(a_{rt} = k \mid z_t) = \int 1\{\eta_r(z_t, e_{rt}) = k\}\, G_e(e_{rt})\, de_{rt}, \tag{6}$$

which may be interpreted as the beliefs formed by $r$'s opponents regarding $r$'s decision. Here, $\delta_r$ is a scalar function. Since network manager $r$ does not know the other managers' decisions at the time of her decision, her strategy is based on her expected payoff for choosing action $a_{rt}$,

$$\sum_{a_{-rt}} \pi_r(a_{rt}, a_{-rt}, z_t)\,\delta_{-r}(a_{-rt} \mid z_t) + e_{rt}(a_{rt}) = \phi_r(a_{rt}, z_t) + e_{rt}(a_{rt}), \qquad \delta_{-r}(a_{-rt} \mid z_t) = \prod_{k \neq r}\delta_k(a_{kt} \mid z_t). \tag{7}$$

Here, $\phi_r$ is the deterministic part of the expected payoff function and is a scalar function. We can see that the expected payoff function is similar to the standard random utility model. The only difference is that the probability distributions over other managers' actions affect manager $r$'s utility. Then, it is straightforward that $a_{rt}$ will be chosen for a given realization of $z_t$ if and only if

$$\phi_r(a_{rt}, z_t) + e_{rt}(a_{rt}) > \phi_r(a_{rt}', z_t) + e_{rt}(a_{rt}') \quad \text{for all } a_{rt}' \neq a_{rt} \tag{8}$$

in equilibrium. 18 The equilibrium can be seen as a Bayesian-Nash equilibrium (BNE) in a probability space with measure $\delta_r$, $r = 1, \ldots, R$, and summarized by a best-response mapping that maps a compact set to itself (i.e., $[0, 1]^R \to [0, 1]^R$). It is also continuous in $\delta_r$, so equilibrium existence follows from Brouwer's fixed-point theorem. 19 The next section formulates the selection bias from $\delta_r$.

14 There may be other econometric remedies to address the endogeneity issue. Recently, two categories of methodologies have been proposed to address endogeneity in formation of spatial or network links: one is a (Bayesian) "one-step full information" approach by Goldsmith and Imbens (2013) and Hsieh and Lee (2016), and the other is a "multiple-step control function" approach by Qu and Lee (2015) and Horrace, Liu, and Patacchini (2016).

15 Generalizing the following model to a dynamic game is left for future research.

16 Following Bajari et al. (2010), we assume without loss of generality that $A$ is the same for all managers.

17 Per Hoshino (2019), it is desirable that $z_{rt}$ contain variables excluded from (2).
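The best-response mapping described above can be illustrated with a small fixed-point iteration. This is a minimal sketch, not the estimation procedure of the article: it assumes two managers, three actions, Type-I extreme value shocks (so that choice probabilities take the logit form), and entirely hypothetical payoff matrices. The interaction terms are kept small so that simple iteration converges; in general, multiple equilibria are possible and iteration need not converge.

```python
import numpy as np

def softmax(v):
    """Logit choice probabilities implied by deterministic payoffs v."""
    e = np.exp(v - v.max())
    return e / e.sum()

def best_response(delta, payoffs):
    """Map beliefs (delta_1, delta_2) into choice probabilities for both managers."""
    phi_1 = payoffs[0] @ delta[1]    # manager 1 averages over manager 2's actions
    phi_2 = payoffs[1] @ delta[0]    # manager 2 averages over manager 1's actions
    return np.vstack([softmax(phi_1), softmax(phi_2)])

K = 2                                 # actions 0, 1, ..., K (illustrative)
actions = np.arange(K + 1)
match = np.eye(K + 1)                 # 1 when both managers pick the same action

# Hypothetical payoffs pi_r(a_r, a_-r): rows index own action, columns the rival's.
payoffs = [
    0.2 * actions[:, None] - 0.3 * match,    # manager 1 prefers a mismatch
    0.1 * actions[:, None] + 0.3 * match,    # manager 2 prefers a match
]

delta = np.full((2, K + 1), 1.0 / (K + 1))   # start from uniform beliefs
for iteration in range(1000):
    updated = best_response(delta, payoffs)
    gap = np.abs(updated - delta).max()
    delta = updated
    if gap < 1e-12:
        break

print(np.round(delta, 4))             # equilibrium choice probabilities by manager
```

At the fixed point, each manager's choice probabilities are a best response to beliefs that coincide with the other manager's actual choice probabilities, which is the equilibrium notion used in the text.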

Formulation of the Selection Bias
A standard approach to formulating the selection bias is to assume the outcome error, $u_{rt}$, and the payoff function error, $e_{rt}(a_{rt})$, are statistically dependent. This idea can be traced back to Heckman's (1979) selection model and its extension in Lee (1983). Let $g_{ir}(\cdot)$ be the joint distribution of $u_{irt}$ and the selection error $\epsilon^*_{rt}$, which depends on $\phi_r(z_t)$ and may differ across $i$, in general. Following Bourguignon, Fournier, and Gurgand (2007), if the two errors are dependent, the conditional mean of $u_{irt}$ given the manager's action, stated in (9), is nonzero. The correlation may exist when the two errors contain a common component, unobserved by the econometrician but observed (or predicted) by the manager. In (9), the joint distribution $g_{ir}(\cdot)$ varies by peers in a network, which may be true if individual peers respond to the common shock differently (or the manager selects workers based on individual performances). However, in team production it may be reasonable to assume that the manager is optimizing team performance and not individual performance. Therefore, peer-level heterogeneity may be negligible, conditional on her selection. Also, note again that we do not have a true panel of data, so it is difficult to model and account for peer-level heterogeneity, in general. Therefore, we assume the following to arrive at a network-specific selection bias.
Assumption 5. The joint distributions of $u_{irt}$ and $\epsilon^*_{rt}$ conditional on $\phi_r(z_t)$ are identical for every peer $i = 1, \ldots, n_r$ in network $r$ at time $t$.
Then $E(u_{rt} \mid a_{rt}, \phi_r(z_t)) = \iota_{n_r}\alpha_r(\phi_r(z_t))$, which are network-specific fixed-effects due to the strategic actions of the network managers. This is what Manski (1993) called a correlated effect. We simply follow the standard solution to the correlated effects problem by including a network-level fixed-effect, but we explicitly model the correlated effect by modeling a managerial selection equation. This allows us to separately identify the managerial effect from other network-level coefficients ($\beta_{2r}$) in (2).
Per Dahl (2002), estimation of the unknown control function $\alpha_r(\phi_r(z_t))$ suffers from "the curse of dimensionality" due to the presence of a large number of alternatives and its dependence on the unknown function $\phi_r(z_t)$. To make estimation feasible, restrictions need to be imposed on the control function. We consider parametric and semiparametric approaches following Lee (1983) and Dahl (2002), respectively.
1. Lee's approach: Let the distribution of * rt be F r . Following Lee (1983) and Horrace, Liu, and Patacchini (2016), we can reduce the dimensionality of the selection bias by the transformation J r (·) ≡ −1 (F r (·)), where −1 is the inverse of the standard normal CDF. Then, J r ( * rt ) becomes a standard normal random variable. For notational simplicity, let J r ( * rt ) ≡ rt . Per Schmertmann (1994), Dahl (2002), and Bourguignon, Fournier, and Gurgand (2007), it is implicitly assumed in Lee's approach that the joint distribution of u irt and rt does not depend on ϕ r (z t ). That is, 21 To obtain an explicit 21 See Schmertmann (1994), Dahl (2002), and Bourguignon, Fournier, and Gurgand (2007) for more detail about the implications of the restrictions on the joint distribution of the errors in Lee's and Dahl's approaches. The parametric form of the bias, we further assume rt and u irt are iid over i and t with joint normal distribution, then it can be shown that, a scalar, where δ r (a rt |z t ) is the selection probability from (6). 2. Dahl's approach: We make the index sufficiency assumption of Dahl (2002) such that where ψ r (·) is an unknown scalar function that may be estimated non-parametrically. Per Dahl, this assumes that the selection probability δ r (a rt |z t ) exhausts all information about the behaviors of the two errors. That is, the joint distribution of u irt and * rt depends on ϕ r (z t ) only through δ r (a rt |z t ).

Estimation of the Selection Bias
To estimate the strategic bias induced by the managers and the coefficients on the network-varying exogenous variables ($x_{2rt}$), we consider a three-step estimation procedure:

1. First step: Estimate the network effects $\hat\lambda_{rk}$ and $\hat\beta_{1r}$ from (5) and, for $r = 1, \ldots, R$ and $t = 1, \ldots, T$, compute the residuals
$$\hat\upsilon_{rt} = \iota'_{n_r}\Big(y_{rt} - \sum_{k=1}^{R}\hat\lambda_{rk}W_{rkt}y_{kt} - X_{1rt}\hat\beta_{1r}\Big)/n_r.$$
2. Second step: Estimate the selection probability $\hat\delta_r(a_{rt}|z_t)$ for $r = 1, \ldots, R$ and $t = 1, \ldots, T$ using a parametric or nonparametric model. We use MNL for a parametric approach and RF for a nonparametric approach. The statistical properties of the MNL are well known, and one can use the Murphy and Topel (1985) correction to account for sampling variability of the estimated probabilities when calculating standard errors of the final-step estimates below. For RF we use the bias reduction method proposed by Chernozhukov et al. (2018), which uses Neyman-orthogonal scores to remove bias in semiparametric models where the nonparametric component is estimated with a machine learning method. The method allows the final-step estimators to be $\sqrt{T}$-consistent, even when RF converges more slowly. To ensure that RF consistently estimates $\hat\delta_r$, we restrict the maximum number of splits in a tree (equivalently, the depth of a tree) based on the asymptotic results of Scornet, Biau, and Vert (2015) and Wager and Walther (2015). 22 In simulations, we use two types of cross-validation methods to tune the maximum number of splits, and examine their effect on the mean squared error of the final-step estimates. 23
3. Final step: The selection bias and $\beta_{2r}$ can be estimated from the OLS regression
$$\hat\upsilon_{rt} = x_{2rt}\beta_{2r} + \gamma_r(\hat\delta_r(a_{rt}|z_t)) + \xi_{rt},$$
where $\xi_{rt}$ is an iid error term and $\gamma_r(\hat\delta_r(a_{rt}|z_t))$ is either given by the right-hand side of (10) with $\hat\delta$ substituted for $\delta$ (Lee's approach) or by $\psi_r(\cdot)$ in (11) (Dahl's approach). For Dahl's approach, we use polynomial approximations of $\psi_r$ and select the polynomial order using the Akaike information criterion (AIC). 24 Newey (1994, 1997, 2009) and Andrews (1991) provided regularity conditions and basis functions for semiparametric models, like Dahl's approach. 25 In our simulations and application, we use the bootstrap to calculate standard errors of the final-step estimates, thereby accounting for variability in the estimates upon which they are based. 26

21 The assumption implies that the correlations between $u_{irt}$ and $e_{rt}(a_{rt}) - e_{rt}(a'_{rt})$ must have the same sign for all $a'_{rt} \neq a_{rt}$, which means in our context that the random shock should have the same implication for the different actions (i.e., it can have a positive or negative effect, but the effect should be the same for all the actions). This is relaxed slightly in Dahl's approach, where the bivariate covariances can differ in sign but only as a function of $\delta_r(a_{rt}|z_t)$.
22 The consistency results in Scornet, Biau, and Vert (2015) and Wager and Walther (2015) are for regression trees, while we use classification trees. We conjecture their regularity condition is appropriate here. Simulations support this conjecture.
23 In our simulations and application, RF is implemented in MATLAB using the command fitcensemble.
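The first-step residual that feeds the final-step regression can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function names `matvec` and `first_step_residual` and the list-of-lists data layout are hypothetical, and the parameter estimates are taken as given rather than obtained by QML.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def first_step_residual(y_r, W_list, y_list, lam_list, X1, beta1):
    """Network-level residual for network r in one period.

    y_r      : outcomes of the n_r agents in network r
    W_list   : weight matrices W_rk, one per network k = 1..R
    y_list   : outcome vectors y_k, one per network k = 1..R
    lam_list : estimated network effects lambda_rk, one per network k
    X1       : n_r x p matrix of agent-level regressors
    beta1    : estimated coefficients on X1
    Returns the average over agents of y - sum_k lam_rk W_rk y_k - X1 beta1.
    """
    n = len(y_r)
    interaction = [0.0] * n
    for lam, W, y_k in zip(lam_list, W_list, y_list):
        Wy = matvec(W, y_k)
        for i in range(n):
            interaction[i] += lam * Wy[i]
    xb = matvec(X1, beta1)
    return sum(y_r[i] - interaction[i] - xb[i] for i in range(n)) / n
```

The scalar returned for each network and period plays the role of the dependent variable in the final-step OLS regression on $x_{2rt}$ and the selection bias term.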

Simulation Experiments
Our simulation study is for $R = 3$ networks. To give a concrete example, we consider three real estate companies competing in a region. Each group has $n_r = 5$ (or $n_r = 10$) agents, and each agent in a group is responsible for a different part of the region. We assume that agents whose areas of responsibility are adjacent compete if they belong to different networks and cooperate if they belong to the same network. We randomly assign agents to areas within the region and use a contiguity weighting scheme to specify connections. We generate random matrices $W_{rkt}$ for $r, k = 1, 2, 3$, where each entry is randomly assigned 0 or 1, while maintaining symmetry of the overall interaction matrix. We then row-normalize each $W_{rkt}$. The first network has the strongest within-network effect ($\lambda_{11} = 0.4$) but is also strongly affected by the other two networks: negatively by the second network ($\lambda_{12} = -0.2$) and positively by the third network ($\lambda_{13} = 0.1$). The second network is only affected by the first network and has no within-network effect. The third network is only affected by the first, and has an intermediate within-network effect ($\lambda_{33} = 0.2$), relative to the others. 27 The agents in the three networks produce outcomes over $T = 500$ (or $T = 1000$) time periods. The data generating process for the outcome function is
$$\begin{aligned}
y_{1t} &= \lambda_{11}W_{11t}y_{1t} + \lambda_{12}W_{12t}y_{2t} + \lambda_{13}W_{13t}y_{3t} + X_{11t}\beta_{11} + \iota_{n_r}x_{21t}\beta_{21} + \iota_{n_r}\alpha_{1t} + u_{1t},\\
y_{2t} &= \lambda_{21}W_{21t}y_{1t} + \lambda_{22}W_{22t}y_{2t} + \lambda_{23}W_{23t}y_{3t} + X_{12t}\beta_{12} + \iota_{n_r}x_{22t}\beta_{22} + \iota_{n_r}\alpha_{2t} + u_{2t},\\
y_{3t} &= \lambda_{31}W_{31t}y_{1t} + \lambda_{32}W_{32t}y_{2t} + \lambda_{33}W_{33t}y_{3t} + X_{13t}\beta_{13} + \iota_{n_r}x_{23t}\beta_{23} + \iota_{n_r}\alpha_{3t} + u_{3t},
\end{aligned} \tag{12}$$
where $X_{1rt}$ and $x_{2rt}$ for $r = 1, 2, 3$ are the peer-level exogenous variables and the network-level exogenous variables, respectively. The $X_{1rt}$ contains two regressors drawn from a bivariate normal distribution with a variance of 1 and a correlation of 0.5. The network-level exogenous variable, $x_{2rt}$, contains an intercept and two regressors, $x_{2rt}(0)$, $x_{2rt}(1)$, and $x_{2rt}(2)$ (respectively), independently drawn from a standard normal distribution. The network-level exogenous variables will be specified in greater detail later, so as to induce a selection bias into the system. They will be eliminated from the model with the within transformation but will be recovered in the estimation of the network managers' strategic bias in a final step. We set all elements of $\beta_{1r}$ (i.e., $\beta_{1r}(1)$ and $\beta_{1r}(2)$) and $\beta_{2r}$ (i.e., $\beta_{2r}(0)$, $\beta_{2r}(1)$, and $\beta_{2r}(2)$) equal to 1 for $r = 1, 2, 3$, and each element in $u_{rt}$ is distributed iid $N(0, 2)$ for $r = 1, 2, 3$.

The $\alpha_{rt}$ for $r = 1, 2, 3$ are the strategic bias terms, and we adapt the experimental design of Schmertmann (1994) and Bourguignon, Fournier, and Gurgand (2007) to model them. For simplicity, we assume the choice-specific utility functions for the managers of the three networks are the same (i.e., $\phi_r = \phi$). For each time period $t$, the manager of network $r$ has three choices for her network, $A = \{1, 2, 3\}$. Her decision is then made by the following rule 28: an option $s \in A$ is chosen if and only if $\phi(s, z_t) + e_{rts} > \phi(s', z_t) + e_{rts'}$ for all $s' \neq s$.

24 Newey, Powell, and Walker (1990) used the generalized cross-validation (GCV) of Craven and Wahba (1979) in this context, which is approximately equivalent to AIC (Wang et al. 2007).
25 For consistency and asymptotic normality, the number of basis functions should increase with the sample size. The number of basis functions will be selected by the econometrician in practice.
26 There is no theoretical guarantee that the bootstrap delivers valid inference here. However, simulation results reported in Table 2 (in the main text) and Tables 4-6 (in the online supplementary materials) show that the bootstrap standard errors approximate the empirical standard errors of the final-step estimates reasonably well for MNL and RF.
27 We experimented with different values of these parameters and the results were qualitatively the same.
The $z_t$ contains an intercept and two regressors, $z_t(0)$, $z_t(1)$, $z_t(2)$, and the two regressors are independently drawn from $U(0, 1)$. The $e_{rts}$ for $s = 1, 2, 3$ are independently drawn from a Gumbel distribution, which naturally leads to the familiar MNL specification. The Gumbel DGP therefore gives MNL an estimation advantage. The choice-specific utility function, $\phi(s, z_t)$, is specified in two ways, linear and nonlinear, such that
1. Linear: $\phi(1, z_t) = 0$; $\phi(2, z_t) = 1 + z_t(1)$; and $\phi(3, z_t) = 2 + z_t(2)$.
2. Nonlinear: $\phi(1, z_t) = 0$; $\phi(2, z_t) = \mathrm{Beta}_{5,1}(z_t(1))$; and $\phi(3, z_t) = \mathrm{Beta}_{5,1}(z_t(2))$, where $\mathrm{Beta}_{a,b}(\cdot)$ is the beta density with shape parameters $a$ and $b$.
Here, $\phi(1, z_t)$ is normalized to zero for identification of MNL. Given the two specifications of $\phi$ above, we generate two sets of $T$ choices for each manager $r$. Next we set up a correlation between $z_t$ and $x_{2rt}$ such that $x_{2rt}(j) = 0.9 \times z_t(j)$ plus a draw from a $N(0, (1 - 0.9^2)/12)$ random variable for $j = 1, 2$ and $r = 1, 2, 3$. 29 Following Schmertmann (1994), let $\zeta_{rt} = (\zeta_{rts'}, \zeta_{rts''}) = (e_{rts'} - e_{rts}, e_{rts''} - e_{rts})$, where $s$ is the selected option, and $s'$ and $s''$ are the nonselected options. Also, let $F_e(\zeta_{rt}) = 1 + \exp(-\zeta_{rts'}) + \exp(-\zeta_{rts''})$. We generate the selection bias in two ways: 30

28 Simulating the selection equation precedes simulating the outcome function, but for the purpose of exposition we described the outcome function first.
29 The correlation between $z_t$ and $x_{2rt}$ ensures that selection bias affects the coefficients on $x_{2rt}$, not only the intercept. The large correlation makes the bias apparent so that estimators are easily contrasted.
30 See Schmertmann (1994) and Bourguignon, Fournier, and Gurgand (2007) for other forms of the selection bias.
1. Lee's model (monotonicity): the bias is given by (13), where $\rho$ is a scalar, which we discuss below.
2. Dahl's model (nonmonotonicity): the bias is given by (14).
Equation (13) is simply the conditional mean of the residuals of the outcome equation under Lee's (1983) approach, and (14), proposed by Schmertmann (1994), is an example of a violation of the assumptions in Lee (1983). In particular, it violates monotonicity of the selection bias. We expect Lee's approach will work well for (13), whereas Dahl's approach will work better for (14). Following Schmertmann (1994), we adjust the experimental parameters, $\rho$, $\rho_2$, $c_1$, and $c_2$, so that the bias term explains roughly 25% of the variation in the outcome. Therefore, we have a total of four different DGPs for the selection bias: two types of choice-specific utility functions (linear and nonlinear), and two types of bias generating processes (monotonic and nonmonotonic). With all variables generated, the outcome variables, $y_{rt}$ for $r = 1, 2, 3$, are generated from the reduced form of (12) for each $r$ and $t$.

For estimation, we consider several different approaches. In the first step, we estimate the network effects with and without selectivity bias correction. That is, "with selectivity bias correction" means estimation of the within-transformed model in (5), and "without selectivity bias correction" means estimation of (2) ignoring $\alpha_t$. In the second step we estimate selection probabilities using standard MNL and RF. For RF we experiment with tuning the maximum number of splits in a tree using two cross-validation methods: one based on out-of-bag classification error and another based on 5-fold cross-validated error. We compare these methods to the common practice of naively setting the maximum number of splits to $T - 1$ (a "fully grown" tree or a "tree without tuning"). To save space, a discussion of these methods and complete simulation results are reported in the online supplementary materials.
However, we found that 5-fold cross-validation worked best in terms of bias and variance of the final-step estimates, so in the main text we only report RF results based on this tuning method.
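The managers' choice rule in the linear design can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, a single fixed state vector $z_t$ is assumed for the comparison, and the code covers only the selection step (not the bias terms or the outcome equations). It draws Gumbel shocks, picks the utility-maximizing option, and computes the implied MNL probabilities for comparison.

```python
import random
from math import exp, log

ACTIONS = (1, 2, 3)

def phi_linear(s, z):
    """Linear design: phi(1)=0, phi(2)=1+z(1), phi(3)=2+z(2); z = (z0, z1, z2)."""
    return (0.0, 1.0 + z[1], 2.0 + z[2])[s - 1]

def gumbel(rng):
    """Standard Gumbel draw by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -log(-log(u))

def choose(z, rng, phi=phi_linear):
    """Return the option with the largest utility phi(s, z) + Gumbel shock."""
    return max(ACTIONS, key=lambda s: phi(s, z) + gumbel(rng))

def mnl_probs(z, phi=phi_linear):
    """Choice probabilities implied by iid Gumbel shocks (multinomial logit)."""
    weights = [exp(phi(s, z)) for s in ACTIONS]
    total = sum(weights)
    return [w / total for w in weights]
```

With many simulated choices at a fixed $z_t$, the empirical choice shares should approach `mnl_probs(z)`, which is the sense in which the Gumbel design favors MNL in the second step.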
Next, using the probability estimates we formulate the selection bias term $\gamma_r$ using both the Lee approach in (10) and the Dahl approach, for which we use polynomial approximations of $\psi_r$ in (11). Then, we estimate the network-level coefficients, $\beta_{2r}$, in two ways: with and without the estimated selection bias term $\gamma_r$ included in the regression of $\hat\upsilon_{rt}$ on $x_{2rt}$. These are reported along with $\rho$ from Lee's model in (13) when appropriate. We perform 1000 draws for each simulation design.

Simulation Results: Peer-Effects (λ rr ) and Competitor-Effects (λ rk ) Estimation
We do not report coefficient results for $X_{1rt}$, but they are available upon request. Table 1 reports the empirical mean, standard deviation (SD), and root mean squared error (RMSE) of the network effect estimates over 1000 simulation draws. The first column contains the true values of the effects, and the second and third columns contain estimates without selectivity bias correction for the $\{n_r, T\} = \{5, 500\}$ and $\{10, 1000\}$ designs, respectively. These estimates are contaminated by the managers' network selections and are (mostly upward) biased. For example, in the first row $\lambda_{11} = 0.4$ is estimated with a mean of 0.6451 in the $\{5, 500\}$ design (column 2), and the bias worsens to a mean of 0.7598 as $\{n_r, T\}$ increases to $\{10, 1000\}$ (column 3). Columns 4-7 are for various $\{n_r, T\}$ designs, but with selectivity bias correction (i.e., the within transformation in (5)). The last two columns employ both the selectivity bias correction and a bootstrap finite sample bias correction, discussed below. Notice that the empirical SD of the estimates with "no bias correction" is smaller than with "selection bias correction only." For $\lambda_{11} = 0.4$ in the $\{5, 500\}$ design, compare SD (0.0138) with no bias correction (column 2) to SD (0.0192) with bias correction (column 4). This occurs because the within transformation of the bias corrected estimates reduces the effective sample size. We can also see that the number of peers in a network, $n_r$, and the number of time periods, $T$, have different effects on the bias and variance of the selection bias corrected estimates (columns 4-7). First, as $T$ increases, SD decreases, but the opposite is true for $n_r$. This is because, in our setup, the network becomes more dense as $n_r$ increases due to the increased number of effective peers or competitors. This makes it harder to estimate the network effect consistently because within-variation in a network decreases.
Lee (2004) analyzed this phenomenon, and Lee, Liu, and Lin (2010) provided simulated evidence suggesting that this may be avoided if network weights are constant as network size increases. Thus, if network sparsity is maintained, increasing $n_r$ or $T$ will have the same effect on the variance of the estimates. 31 Moreover, moderately sized $n_r$ is important for reducing bias. For most cases in Table 1, when $n_r = 10$, SD is approximately equal to RMSE, implying that the empirical distributions of the estimates are well centered on the true values. However, when $n_r = 5$, there is a sizable finite sample bias across all the network effect estimates which persists as $T$ increases. (We conjecture that Assumption 4 may only be weakly satisfied in this case due to a lack of variability in the network topology as $T$ increases.) This may be a concern in our empirical application where the network size is only 5. Therefore, we use a parametric (residual-based) bootstrap to remove the bias here and in our application. From the full sample, we obtain initial QML estimates, $\hat\theta_0$, and the residuals. The residuals contain the errors as well as selectivity bias and the effects of the group-level regressors. Then we generate $b = 1, \ldots, B$ bootstrap samples using $\hat\theta_0$ as the true parameter in the following way: we randomly sample with replacement a set of residuals for our three networks for a given $t$, so that we can maintain network dependency in our data over time. Then we use the reduced form of (12) to generate bootstrap outcomes $y^*$ and compute bootstrap estimates $\hat\theta^{*(b)}$. Bootstrap bias correction is then done by
$$\tilde\theta = 2\hat\theta_0 - \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*(b)}.$$

31 Additional unreported simulations confirm these sparsity results.

NOTE: For each parameter $\lambda_{rk}$ we report the mean, the (standard deviation), and the RMSE over 1000 simulated draws. Two types of bias correction are employed: selection bias correction only (columns 4-7), and selection bias correction + bootstrap finite sample bias correction (columns 8 and 9).
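The two mechanical pieces of the bootstrap bias correction described above, resampling whole periods of residuals and forming twice the initial estimate minus the bootstrap average, can be sketched as follows. This is a hedged Python illustration: the function names and the list-based data layout are assumptions, and the re-estimation step (QML on each bootstrap sample) is deliberately left out because it depends on the full model.

```python
import random
from statistics import mean

def resample_periods(residuals_by_period, rng):
    """Draw T periods with replacement.

    Each element of residuals_by_period holds the residuals of all networks
    for one period t, so networks observed together stay together.
    """
    T = len(residuals_by_period)
    return [residuals_by_period[rng.randrange(T)] for _ in range(T)]

def bias_corrected(theta0, boot_estimates):
    """Return 2 * theta0 - average of the bootstrap estimates, per parameter.

    theta0         : list of initial estimates
    boot_estimates : list of B lists, one vector of estimates per bootstrap sample
    """
    return [2.0 * t0 - mean(column)
            for t0, column in zip(theta0, zip(*boot_estimates))]
```

For example, an initial estimate of 0.65 with bootstrap estimates averaging 0.75 is corrected downward to 0.55, since the bootstrap suggests an upward bias of about 0.10.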
We set $B = 300$ in our simulations and empirical application. 32 A similar bootstrap bias correction is considered in Kim and Sun (2016) for nonlinear panel data models with fixed-effects. We also use these bootstrap samples to compute the standard errors of the estimates of the network-level coefficients, $\beta_{2r}$, in the final-step estimates below. 33 For the first step, the estimated peer- and competitor-effects in the last two columns of Table 1 are well centered on their true values even when $n_r = 5$.

32 QML estimation in the first step is computationally costly, so having a large $B$ is not practical. We examined the effect of $B$ on the estimation results and found that $B = 300$ provides reasonable results without computational burden.
33 For these estimates, the bootstrap can be seen as a nonparametric bootstrap.

Simulation Results: Network-Level Coefficients (β 2r ) Estimation
Table 2 reports selected estimation results of the network-level coefficients. Even though they are selected results (to save space), they are not atypical. Full results are in the online supplementary materials. All these results are computed from the final-step regression of $\hat\upsilon_{rt}$ (the residuals from the first step) on $x_{2rt}$ with or without the selection bias correction term ($\gamma_r$) as described in Section 2.2.2. We report both MNL and RF results but only RF results with 5-fold cross-validation tuning. 34 The first-step estimation includes bootstrap finite sample bias correction and selectivity bias correction in all cases, corresponding to the last two columns of Table 1. Table 2 presents averages over all $R = 3$ networks of the empirical bias (Bias), root mean squared error (RMSE), and bootstrap standard error coverage rate with a target rate of 0.95 (95% coverage rate) for estimates of the network-level coefficients ($\beta_{2r}$) over the 1000 simulation draws for each of the four DGPs described above. For Dahl's approach, estimates of the intercept and the polynomial terms are omitted from the table. 35

34 5-fold cross-validation worked best in terms of RMSE of the final-step estimates. The results based on the other tuning method and nontuning results are in the online supplementary materials.
35 Empirical standard deviation (SD) and average bootstrap standard error estimates (avg. bootstrap SE) for the estimates are included in the online supplementary materials.

Columns 2-4 under "no bias term" contain the regression results without the selection bias term, $\gamma_r$. These estimates show much larger RMSE than the other estimates in the table (columns 5-16), which include the bias correction term. For example, for "DGP1" in the $\{5, 500\}$ design, compare the RMSE for $\beta_2(1)$ of 0.42 (column 3) to 0.25 (column 6). The next four columns (5-8) and the following two columns (9 and 10) under "MNL" contain results for Lee's and Dahl's approaches, respectively, when MNL is used to estimate selection probabilities. Similarly, the last six columns (11-16) under "RF" contain a corresponding set of results when RF is used for estimation of the selection probabilities. As expected, Lee's approach with MNL outperforms the others in DGP1 in terms of bias and RMSE, whereas Dahl's approach with MNL shows the best performance in DGP2.
Interestingly, when we switch between DGP1 and DGP2, performance deterioration of Lee's approach is mainly due to increased bias, while it is an increased variance for Dahl's approach. For DGP3 and DGP4, when RF is used to estimate the selection probabilities there is smaller bias when compared to MNL estimation of the probabilities. In these cases, Lee's approach with MNL exhibits similar deterioration as in DGP2, but Dahl with MNL shows a somewhat different pattern: the bias increases but variance decreases in many cases. Notice that Dahl's RMSE is improved when we move from DGP1 to DGP3. Overall, if we compare the performance of the MNL estimates to the RF estimates across the various DGPs, RF appears more robust to different designs in terms of bias, RMSE and the coverage rate.

Empirical Model and Variables
We apply our network competition model to NBA data for 30 teams over the 2015-2016 regular season. The primitive play-by-play data were purchased and downloaded from BigDataBall.com. We then formatted the data to the player-period level, where a period represents any contiguous game period in which the same ten players are on the court. This formatting is similar to that done in the calculation of the player statistic Real Plus Minus. We tabulate player box-score data to obtain Wins Produced for each player in each period for our outcome variable. 36 The league plays 1230 regular-season games per season (41 home games for each of 30 teams per regular season). Our data span 3690 regular-season games, consisting of roughly 30 time periods per game. This produces 112,204 time periods in which we observe the play of 10 players $i$ at a time, producing a total of 1,122,040 observations. In each game a coach typically has 15 players to fill a network of five players at a time. 37 Following Horrace, Liu, and Patacchini (2016), we drop time periods shorter than 30 seconds and overtime periods. 38 This results in 83,334 time periods for the league, roughly 833,000 observations in total.

36 See, for example, Berri (1999). Wins Produced is a continuous weighted average of individual player offensive and defensive statistics that will be defined in what follows. Wins Produced is highly predictive of team success and is measurable at the individual level.
37 Understanding the effect of player injuries (or player ineligibility) on the coaches' decisions is left for future research. Sports injuries are analogous to worker absenteeism.
38 Horrace, Liu, and Patacchini (2016) also dropped time periods where the number of player types in any active lineup is less than 2, where "player types" are Guards or Forwards.
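The construction of periods from play-by-play data can be sketched as follows. This is a minimal Python illustration, not the authors' data pipeline: the input layout (time-stamped snapshots of the ten players on the court), the function name `build_periods`, and the dictionary keys are all assumptions. A period runs for as long as the same ten players are on the court, and periods shorter than 30 seconds are dropped; overtime handling is omitted.

```python
def build_periods(snapshots, game_end, min_seconds=30):
    """Group play-by-play snapshots into constant-lineup periods.

    snapshots : list of (time_in_seconds, players_on_court), sorted by time;
                players_on_court is an iterable of ten player identifiers
    game_end  : time in seconds at which the last period ends
    Returns a list of dicts with keys 'start', 'end', and 'lineup',
    keeping only periods that last at least min_seconds.
    """
    periods = []
    for t, players in snapshots:
        lineup = frozenset(players)
        if periods and periods[-1]["lineup"] == lineup:
            continue  # same ten players: the current period continues
        if periods:
            periods[-1]["end"] = t  # a substitution closes the open period
        periods.append({"start": t, "end": None, "lineup": lineup})
    if periods:
        periods[-1]["end"] = game_end
    return [p for p in periods if p["end"] - p["start"] >= min_seconds]
```

Note that a dropped short period is not merged with its neighbors in this sketch; the surrounding periods simply remain separate observations.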

Outcome Function
Since competition in sports is head-to-head, we can consider a restricted version of the general model, where there is only one peer network and one competitor network in each time period. In this case, the outcome function for teams $r$ and $k$ in period $t$ of game $g$ is
$$\begin{aligned}
y_{rtg} &= \lambda_{rr}W_{rrtg}y_{rtg} + \lambda_{rk}W_{rktg}y_{ktg} + X_{1,rtg}\beta_{1r} + \iota_5 x_{2,rtg}\beta_{2r} + \iota_5\alpha_{rt} + u_{rtg},\\
y_{ktg} &= \lambda_{kk}W_{kktg}y_{ktg} + \lambda_{kr}W_{krtg}y_{rtg} + X_{1,ktg}\beta_{1k} + \iota_5 x_{2,ktg}\beta_{2k} + \iota_5\alpha_{kt} + u_{ktg},
\end{aligned} \tag{15}$$
where $y_{rtg}$ and $y_{ktg}$ are the $5 \times 1$ outcome vectors of team $r$'s and team $k$'s chosen lineups in period $t$ of game $g$, respectively; $W_{rrtg}$ and $W_{kktg}$ are the $5 \times 5$ zero-diagonal and row-normalized matrices for the within-network interactions; and $W_{rktg}$ and $W_{krtg}$ are similarly defined matrices for cross-network interactions. The $X_{1,rtg}$ and $X_{1,ktg}$ are matrices of the player-varying exogenous variables for team $r$'s and team $k$'s lineups in period $t$ of game $g$, respectively. The $x_{2,rtg}$ and $x_{2,ktg}$ are matrices of the network-level exogenous variables which, as previously noted, will be eliminated from the model with the within transformation, but whose coefficients will be recovered in the estimation of the coach's strategic bias. The $u_{rtg}$ and $u_{ktg}$ are $5 \times 1$ error term vectors, in which each element is assumed to be iid$(0, \sigma_r^2)$ and iid$(0, \sigma_k^2)$, respectively. The likelihood function of (15) is a special case of the likelihood derived in the online supplementary materials.
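Because the two equations in (15) are simultaneous, outcomes for both teams must be solved jointly. The following Python sketch illustrates this under stated assumptions: all non-interaction terms (regressors, strategic bias, and errors) are collected into the hypothetical vectors `c_r` and `c_k`, the function names are illustrative, and the system is solved by fixed-point iteration rather than by the matrix inverse of the reduced form. The iteration converges when $|\lambda_{rr}| + |\lambda_{rk}| < 1$ and $|\lambda_{kk}| + |\lambda_{kr}| < 1$ with row-normalized weights.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def solve_head_to_head(lam, W, c_r, c_k, tol=1e-12, max_iter=10000):
    """Solve y_r = lam_rr W_rr y_r + lam_rk W_rk y_k + c_r (and likewise for k).

    lam, W : dicts with keys 'rr', 'rk', 'kk', 'kr'
    c_r, c_k : vectors collecting every term that does not involve y
    """
    y_r, y_k = list(c_r), list(c_k)
    for _ in range(max_iter):
        own_r, cross_r = matvec(W["rr"], y_r), matvec(W["rk"], y_k)
        own_k, cross_k = matvec(W["kk"], y_k), matvec(W["kr"], y_r)
        new_r = [lam["rr"] * a + lam["rk"] * b + c
                 for a, b, c in zip(own_r, cross_r, c_r)]
        new_k = [lam["kk"] * a + lam["kr"] * b + c
                 for a, b, c in zip(own_k, cross_k, c_k)]
        change = max(abs(n - o) for n, o in zip(new_r + new_k, y_r + y_k))
        y_r, y_k = new_r, new_k
        if change < tol:
            break
    return y_r, y_k
```

The same routine can generate simulated or bootstrap outcomes once parameter values and weight matrices are fixed, since it only requires the right-hand-side terms that do not depend on the outcomes.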

Selection Equation
Ideally, coaches would select from all possible five-player lineups at their disposal, and each network would be specified as complete (i.e., $w_{ij} = 1/4$ for $i \neq j$, $w_{ij} = 0$ otherwise), so each selected peer would interact with all other selected peers in the lineup in the same way. Then, the set of all actions $A$ would map directly into the network specification. However, if we specify complete networks, then (15) is not identified. Identification requires sparsity (exclusion restrictions) in the networks for Assumption 4 to be satisfied. Instead, we assume that NBA coaches employ a strategy to create favorable offensive mismatches against the opposing team. That is, coaches are interested in creating easy scoring opportunities, where opposing guards are defending the coach's taller forwards (close to the basket) or where opposing forwards are defending the coach's quicker guards (away from the basket). 39 Therefore, we choose the number of active guards in the current period ($Nguard_{rt}$) as the dependent variable in our selection equation. The model is specified as an MNL model such that the action set for team $r$ in period $t$ is $\{0, 1, \ldots, 5\}$, representing the number of guards in the current period. Since the coach's decision rule may be highly nonlinear, we also estimate the selection equation using RF with 5-fold CV. Selected actions are a function of state variables $z_{rtg}$, which we describe below.

38 Defining heterogeneous types aids in identification of the model, as we shall see. However, we found that their restriction eliminated too much of our data, so we relaxed the restriction to "less than 1," which removed only 240 out of 112,204 time periods.
39 The precedent for this assumption on strategy can be found in Calvo, Garcia, and Navandar (2017) and Marmarinos et al. (2016).

Network
Given our assumption that coaches are interested in creating positional mismatches, we would like to specify our network adjacency matrices to reflect this. With this in mind, we use the same-type peer-effect weight matrix considered in Horrace, Liu, and Patacchini (2016), where "types" are the player positions: Guards or Forwards, with Forwards including Centers. That is, the same-type weight matrix is $W$, where $W_0 = [w_{0,ij}]$ is an adjacency matrix with $w_{0,ij} = 1$ if the $i$th and $j$th players are both Guards or both Forwards. 40 We then row-normalize $W_0$ so that $W\iota_5 = \iota_5$. This network specification assumes that each individual is affected only by the same type of agents in his network and the same type of agents from the opposing network (an exclusion restriction). This restriction is required for identification of the model, in particular to separately identify the network parameters from the other input parameters. If there are no heterogeneous types, we cannot distinguish the competitor-effect from the network fixed-effect induced by a manager. For more details on identification and the use of heterogeneous type restrictions, see Horrace, Liu, and Patacchini (2016) and Horrace and Jung (2018). For example, assume the lineup for team $r$ is [F, F, G, F, G], and for team $k$ it is [G, F, F, G, G] in period $t$ of game $g$, where F = Forward and G = Guard. Then, the network matrices in (18)
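For the example lineups above, same-type weight matrices can be built with the following Python sketch. It is an illustration under stated assumptions rather than a reproduction of the paper's matrices: the function name `same_type_weights` is hypothetical, the within-team matrix is given a zero diagonal (as stated for the within-network matrices in the outcome function), and a player with no same-type counterpart is assumed to receive a row of zeros.

```python
def same_type_weights(own, other=None):
    """Row-normalized same-type weight matrix.

    own, other : lists of position labels, 'G' (Guard) or 'F' (Forward).
    With other=None, the within-team matrix is built (zero diagonal).
    Otherwise, the cross-team matrix links each own player to the
    opposing players of the same type.
    """
    within = other is None
    columns = own if within else other
    W = []
    for i, pos_i in enumerate(own):
        row = [1.0 if pos_i == pos_j and not (within and i == j) else 0.0
               for j, pos_j in enumerate(columns)]
        total = sum(row)
        # Assumption: a player with no same-type counterpart keeps a zero row.
        W.append([w / total for w in row] if total > 0 else row)
    return W

team_r = ["F", "F", "G", "F", "G"]
team_k = ["G", "F", "F", "G", "G"]
W_rr = same_type_weights(team_r)          # within-team weights for team r
W_rk = same_type_weights(team_r, team_k)  # cross-team weights, r facing k
for row in W_rr:
    print(row)
```

In this example the first Forward on team r places weight 1/2 on each of the two other Forwards on his team and weight 1/2 on each of the two opposing Forwards.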

Variables
We use the Wins Produced measure based on the work of sports economist David Berri (Berri 1999; Berri, Schmidt, and Brook 2006) for the outcome. In its definition, $3PT_{irtg}$, $2PT_{irtg}$, and $FT_{irtg}$ are 3-point field goals made, 2-point field goals made, and free throws made, respectively; $REB_{irtg}$ is rebounds, $STL_{irtg}$ is steals, $BLK_{irtg}$ is blocks, $MFG_{irtg}$ is missed field goals, $MFT_{irtg}$ is missed free throws, $TO_{irtg}$ is turnovers, and $Mins_{irtg}$ is minutes played by player $i$ of team $r$ in period $t$ of game $g$. Wins Produced per minute (or wins per minute) estimates a player's marginal win productivity based upon player-level variables related to team-winning. It represents a leading measure of NBA player production. 41 The player-varying exogenous input variables in the outcome equation ($X_{1,rtg}$) are $Experience_{irtg}$ and $Fatigue_{irtg}$. The $Experience_{irtg}$ variable is minutes played from the start of the game to the end of period $t - 1$, and $Fatigue_{irtg}$ is minutes continuously played until the end of period $t - 1$. We also included player dummies to control for player-specific heterogeneity. The network-level exogenous variables ($x_{2,rtg}$) are the $RPI_{rtg}$ of the opposing team from the end of the previous season (relative percentage index, a measure of the opposition's power rating from the previous season); $Home_{rtg}$, an indicator variable for a home game; and $2ndHalf_{rtg}$, an indicator equal to 1 if $t$ is in the second half.

We use nine exogenous variables for the determinants in the selection equation ($z_{rtg}$). The $CurScoreDiff_{rtg}$ variable is the "current" score differential between the two teams in the last period ($t - 1$), and $CumScoreDiff_{rtg}$ is a "cumulative" version of the score differential from the beginning of the game up to and including period $t - 1$. The $CumFoul_{rtg}$ ($CurFoul_{rtg}$) variable is the cumulative (current) number of fouls committed at the end of period $t - 1$ (in period $t - 1$). The $CurTime_{rtg}$ variable is the game time at the start of period $t$; the $Duration_{rtg}$ variable is the duration of period $t - 1$. Because we are using the number of Guards in the game as our selection equation outcome, we include predetermined measures of the number of Guards as determinants of selection. The $NguardR_{rtg}$ variable is the Number of guards available to the coach (on the Roster) at the beginning of the game. 42 The $NguardOPP_{rtg}$ variable is the Number of guards that the OPPosing coach had in the game last period. We also include the one-period lag of the dependent variable ($Nguard_{r(t-1)g}$) in the selection equation. Descriptive statistics for the variables in the outcome and selection equations are in the online supplementary materials. In the next section we report the first-step structural estimates of team chemistry and team rivalries for the 2015-2016 NBA regular season. All other estimates are reported and discussed in the online supplementary materials.

40 We realize that this is a fairly restrictive network specification. An alternative would be to create network connections based on the frequency that players pass to one another, but this is left for future research.

Results
Table 3 contains the structural estimates of team chemistry (within-team peer-effects) and team rivalry (cross-team competitor-effects) from first-stage QML estimation of the within-transformed version of (15). The table contains a subset of the estimated structural parameters by division. That is, we do not report all competitor-effects, $\lambda_{rk}$ and $\lambda_{kr}$, for all match-ups for the 30 teams. Instead, to save space we report competitor-effects for within-division rivals for each of six divisions in the table (e.g., Atlantic Division of the Eastern Conference). These divisions (based on geography) tend to be the most competitive rivalries, and divisional teams play the most head-to-head games over the season (four games). This choice reduces the number of reported competitor-effect estimates from $30 \times 29 = 870$ to a manageable $6 \times (5 \times 4) = 120$ with relatively smaller standard errors. 43 Estimates in the table incorporate bootstrap finite sample correction, and standard errors are based on the asymptotic distribution. 44

Peer-effects measure team chemistry conditional on strategies, abilities, and opposition, and do not measure team quality. Just as a talented shooter can play well even with sub-optimal shot selection, a talented team can perform well even given low peer-effects. Peer-effects are bounded by the unit circle, so a peer-effect close to 1 (−1) indicates good (poor) conditional team chemistry, as player performance is positively (negatively) linked to average teammate performance. Consider Table 3, where Milwaukee (MIL) has the largest positive peer-effect ($\lambda_{rr}$) of 0.044. That is, when the team's average Wins Produced increases, the team's performance is enhanced by virtue of its good chemistry, conditional on coaching strategy and other environmental and performance variables. The estimates are structural parameters and not the reduced-form effects. 45 Reduced-form effects by division are presented in the online supplementary materials.
In Table 3 about one third (eleven) of the teams exhibit positive and statistically significant (at the 5% level) estimated peer-effects ($\lambda_{rr}$). After Milwaukee, Detroit (DET) has the second largest positive and significant peer-effect (0.036). The smallest positive and significant peer-effect is Houston (HOU) at 0.023, and only slightly better were Phoenix (PHX) and Portland (POR), tied at 0.026. Of the eleven teams with positive and significant peer-effects, seven were in the Eastern Conference and only four in the Western Conference. The division with the most positive and significant peer-effects (three) is the Central Division: Cleveland (CLE), Detroit (DET), and Milwaukee (MIL). Only two teams exhibit negative and significant peer-effects, Philadelphia (PHI, −0.028) and Miami (MIA, −0.027), both in the Eastern Conference. Finally, the bootstrap finite sample correction tended to reduce the magnitude of the peer-effects (team chemistries), while its effect on the competitor-effects was ambiguous.
Turning to the competitor-effects (λ_rk), the first row of Table 3 contains results for the Toronto Raptors (TOR). Their results are read as follows. When Toronto plays Boston (BOS), the competitor-effect of Boston on Toronto's performance is −0.272. That is, when these two teams meet, Boston's team chemistry decreases the Wins Produced performance of individual Toronto players on average. Looking across the row, we see that the Knicks (NYK) increase Toronto's performance (0.557). Finally, Brooklyn's (BKN, −0.159) and Philadelphia's (PHI, −0.041) team chemistries diminish the performance of Toronto. However, only the Knicks' competitor-effect is statistically significant. Looking at the second row, we see that Toronto enhances the play of Boston (0.280). The largest competitor-effect in magnitude is Atlanta's (ATL) effect on Charlotte (CHA) at −0.726, and the second largest is that of Cleveland (CLE) on Milwaukee (MIL) at −0.707. Not surprisingly, when team A enhances the individual play of team B, team B almost always makes team A play worse. This is reflected in the signs changing around the diagonal in each divisional matrix. Of course, these are estimates of the structural parameters. See the online supplementary materials for presentation and discussion of reduced-form estimates of the peer- and competitor-effects, and other coefficient estimates from the outcome function in (15).

43 The interaction map for the entire league is available from the authors by request.
44 We use the asymptotic results of Lee, Liu, and Lin (2010).
45 The same holds for the coefficients on the exogenous variables: they do not reflect the marginal effects of the exogenous variables under consideration, but they give us insight into the relative magnitudes of the marginal effects within and across teams.

NOTE: This is a subset of results from estimation of the complete model. The interaction map for the entire league is available from the authors by request. Estimates include both selection bias correction and bootstrap finite sample bias correction. Standard errors are computed using the asymptotic results of Lee, Liu, and Lin (2010).
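As an illustration of the sign pattern around the diagonal, the following minimal sketch checks whether the two directed competitor-effects between a pair of teams have opposite signs. It uses only the estimates quoted in the text. The names `effects` and `sign_flip_pairs` are hypothetical and are not part of the estimation procedure.

```python
# effects[(a, b)] is the estimated competitor-effect of opponent b on the
# players of team a (row a, column b of the divisional matrix).
# Only the entries quoted in the text are included; all other entries of
# the matrix are deliberately left out rather than guessed.
effects = {
    ("TOR", "BOS"): -0.272,
    ("TOR", "NYK"): 0.557,
    ("TOR", "BKN"): -0.159,
    ("TOR", "PHI"): -0.041,
    ("BOS", "TOR"): 0.280,
}


def sign_flip_pairs(effects):
    """For each pair of teams with both directions known, report whether
    the two directed effects have opposite signs."""
    result = {}
    for (a, b), effect_ab in effects.items():
        effect_ba = effects.get((b, a))
        # Skip pairs with a missing direction, and visit each pair once
        # (keyed in alphabetical order).
        if effect_ba is None or a > b:
            continue
        result[(a, b)] = effect_ab * effect_ba < 0
    return result


print(sign_flip_pairs(effects))
```

Here only the Boston and Toronto pair has both directions reported, and its two estimates (0.280 and −0.272) have opposite signs. Applying the same check to a full divisional matrix would show how often the pattern holds.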

Conclusions
Adapting and extending the SAR model, we develop a network competition model, allowing estimation of both within-network effects (peer-effects) and cross-network interaction effects (competitor-effects). The estimates provide a more complete picture of market interactions, which may be useful in understanding and predicting how exogenous shocks to a single network translate to an entire market. We apply our network competition model to 30 teams in the 2015-2016 NBA regular season and find evidence of mostly positive peer-effects (team chemistry). We find both positive and negative competitor-effects (team rivalries). That is, teams with good network play may enhance or diminish the performance of opposing teams. The model is somewhat restrictive relative to other simultaneous SAR models in the literature, so future research should explore generalization of the basic model. For example, inclusion of exogenous network effects (e.g., W_rrt X_1t) in the outcome function should be considered, as should time and/or cross-sectional dependence in its error, u_rt (relaxation of Assumption 1). It may also prove fruitful to explore formulations of managerial selection bias that vary across peers within a network (relaxation of Assumption 5). For our NBA application, it may be interesting to develop peer-effect weighting schemes based on passing and ball sharing.

Supplementary Materials
The online supplementary materials contain additional Monte Carlo simulation results, additional empirical results, and proofs of results that are not provided in the main text of the article.