Date of Award

Summer 8-27-2021

Degree Type


Degree Name

Doctor of Philosophy (PhD)


First Advisor

Jales, Hugo B.

Second Advisor

Flores-Lagunes, Alfonso

Subject Categories

Economics | Social and Behavioral Sciences


This dissertation studies two topics in labor economics: the distribution of earnings and the effect of social networks on starting wages.

Chapter 1, co-authored with Professor Hugo Jales, evaluates theoretical models of the distribution of earnings. The shape of a country's earnings distribution is informative about its income inequality. In most countries, the distribution of earnings is asymmetric and skewed to the right. What are the sources of this right-skewness? All major theories of the distribution of earnings aim to explain this stylized fact and to shed light on the mechanisms that give rise to it and to other features of the earnings distribution. While these models contribute to our understanding of earnings distributions, their relative performance in a real empirical setting has never been investigated. Which economic mechanism, labor market institution, or non-economic factor best explains the distribution of earnings in a labor market? In Chapter 1, we evaluate the empirical relevance of economic theories of the distribution of earnings in the context of Brazil's labor market. We rank the theories by goodness of fit, measuring the closeness between each fitted distribution and the true distribution by the root mean squared error (RMSE) between the empirical CDF and the unconditional CDF implied by each theoretical model. Using the 2001 cross-section of PNAD (Pesquisa Nacional por Amostra de Domicílios), a nationally representative Brazilian household survey, we find that in this empirical setting the model that explicitly incorporates the minimum wage, Meyer and Wise's minimum wage model, fits the data better than the models that do not account for the minimum wage.
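The ranking criterion can be illustrated with a short sketch. It evaluates the empirical CDF at every observed earnings value, compares it with a candidate model's CDF, and sorts candidates by RMSE (lower is better). The theories studied in Chapter 1 (including Meyer and Wise's minimum wage model) are not implemented here; the lognormal candidate and the names `cdf_rmse`, `lognormal_cdf`, and `rank_models` are hypothetical stand-ins chosen only to show the mechanics, and the toy data are invented.

```python
import math
from bisect import bisect_right


def ecdf_at(sorted_sample, x):
    """Empirical CDF F_n(x) = (# observations <= x) / n; handles ties."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def cdf_rmse(sample, model_cdf):
    """RMSE between the empirical CDF and a model-implied CDF,
    evaluated at every observed data point."""
    xs = sorted(sample)
    sq_errors = [(model_cdf(x) - ecdf_at(xs, x)) ** 2 for x in xs]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def lognormal_cdf(mu, sigma):
    """Hypothetical candidate model (not one of the dissertation's
    theories): log earnings ~ Normal(mu, sigma^2)."""
    def cdf(x):
        if x <= 0:
            return 0.0
        z = (math.log(x) - mu) / sigma
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf


def rank_models(sample, models):
    """Return (name, rmse) pairs sorted from best (lowest RMSE) to worst."""
    scores = [(name, cdf_rmse(sample, cdf)) for name, cdf in models.items()]
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    earnings = [1.0, 1.2, 1.5, 2.0, 2.2, 3.0, 4.5, 7.0]  # toy data
    logs = [math.log(w) for w in earnings]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    models = {
        "lognormal (fitted)": lognormal_cdf(mu, sigma),
        "lognormal (shifted)": lognormal_cdf(mu + 1.0, sigma),
    }
    for name, rmse in rank_models(earnings, models):
        print(f"{name}: RMSE = {rmse:.4f}")
```

In an actual application each theory's parameters would first be estimated, and its implied unconditional CDF would then be plugged in as `model_cdf`.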

Chapter 2 examines a data-driven approach to modeling the distribution of earnings. While there are theories that attempt to explain the sources of income inequality, a data-driven, atheoretical approach can offer new insights into this important question. Although (log) wages are often modeled as a linear function of human capital, demographic, and firm characteristics, machine learning algorithms allow flexible functional forms and can select which characteristics to include. In Chapter 2, using the 2001 cross-section of PNAD, a nationally representative Brazilian household survey, I predict whether a worker's log wage is less than or equal to the 1st, 2nd, ..., 99th sample percentile of the log wage distribution, based on the worker's observable characteristics. This yields 99 classification tasks, one for each percentile. To predict these binary outcomes, I use supervised machine learning algorithms such as penalized logistic regression and random forest classification. Taken together, the sequence of binary classifications traces out the conditional cumulative distribution function (CDF) of log wage. For each prediction task, I evaluate the out-of-sample predictive performance of the machine learning algorithms against a benchmark linear logistic regression with three human capital variables, using the area under the precision-recall curve as the performance measure. Two findings stand out. First, penalized logistic regressions have higher predictive accuracy than the benchmark model. Second, lasso logistic regression selects different sets of covariates at different points of the log wage distribution, and many of the selected covariates are interaction terms.
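The construction of the 99 prediction tasks and the evaluation metric can be sketched as follows. This is a minimal illustration under stated assumptions: percentiles use the nearest-rank convention (the dissertation may use another), the area under the precision-recall curve is estimated by average precision (one common step-wise estimator), and the classifiers themselves (penalized logistic regression, random forests) are not implemented. The function names, toy data, and classifier scores are hypothetical.

```python
def sample_percentiles(values, ks=range(1, 100)):
    """Sample percentiles q_k, k = 1..99, using the nearest-rank rule
    (one of several conventions; an assumption of this sketch)."""
    xs = sorted(values)
    n = len(xs)
    # ceil(k * n / 100) computed in integer arithmetic to avoid round-off.
    return {k: xs[max(0, -(-k * n // 100) - 1)] for k in ks}


def binary_targets(log_wages, thresholds):
    """One 0/1 target vector per percentile: y_k = 1 if log wage <= q_k."""
    return {k: [1 if w <= q else 0 for w in log_wages]
            for k, q in thresholds.items()}


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve:
    AP = sum_n (R_n - R_{n-1}) * P_n over distinct score thresholds."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("precision-recall curve undefined without positives")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    ap, tp, fp, i = 0.0, 0, 0, 0
    while i < len(pairs):
        threshold, new_tp = pairs[i][0], 0
        # Group tied scores so they enter the curve as a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                new_tp += 1
            else:
                fp += 1
            i += 1
        tp += new_tp
        ap += (new_tp / n_pos) * (tp / (tp + fp))
    return ap


if __name__ == "__main__":
    log_wages = [0.2, 0.5, 0.7, 1.0, 1.1, 1.4, 1.8, 2.3, 2.9, 3.5]  # toy data
    q = sample_percentiles(log_wages)
    y = binary_targets(log_wages, q)
    print("share at or below median:", sum(y[50]) / len(log_wages))
    # Hypothetical classifier scores for task k = 50 (higher = more likely y = 1).
    scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
    print("AP for k = 50:", average_precision(y[50], scores))
```

Fitting one classifier per `k` and reading off its predicted probability for a given worker at each threshold yields the points of that worker's estimated conditional CDF. Comparing each model's out-of-sample AP with the benchmark's AP, task by task, gives the kind of per-percentile comparison described above.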

Chapter 3 investigates heterogeneity in the effect of social networks on starting wages. Using social networks is a prevalent way to find jobs. However, the effect of using social contacts on wages may depend on the type of job. Using data from the National Longitudinal Survey of Youth 1997 (NLSY97), I estimate log real starting wage regressions for the whole sample and for each of 22 major occupational groups. The main variable of interest is a binary indicator equal to 1 if, when looking for work at the time he found his current job, the currently employed worker used the method "contacted friends or relatives," possibly together with other job search methods. I find heterogeneity in the effect: it is negative for 13 of the 22 occupational groups and statistically significant for 3 of those 13. Workers in management occupations, healthcare practitioner and technical occupations, and construction and extraction occupations who contacted friends or relatives to look for work suffer a starting wage penalty compared with workers who did not, while workers in transportation and material moving occupations who contacted friends or relatives enjoy a starting wage premium. To explore possible mechanisms behind these findings, I use the Occupational Information Network (O*NET 1998) to characterize the task content of each occupation. I find that, with respect to log real starting wage, contacting friends or relatives while looking for work reduces the positive effect of holding a job with a higher score on the O*NET nonroutine analytical (math) measure, lessens the negative effect of holding a job with a higher score on the O*NET "number facility" measure, and intensifies the negative effect of holding a job with a higher score on the O*NET "coordinate" measure.
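The mechanism analysis describes how contacting friends or relatives changes the effect of an O*NET task score on log real starting wage, which is the kind of result an interaction term delivers. The sketch below assumes the specification log w = b0 + b1·D + b2·T + b3·D·T, where D is the "contacted friends or relatives" indicator and T is one O*NET task score. It is a hypothetical illustration, not the dissertation's estimating equation: it uses plain OLS via the normal equations, omits control variables and standard errors, and runs on invented, noise-free data. Under this assumed specification, the effect of T is b2 + b3·D, so a positive b2 with a negative b3 would correspond to contacting friends or relatives reducing a positive task effect.

```python
def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def interaction_design(contacted, task_score):
    """Rows [1, D, T, D*T]: D = contacted friends/relatives (0/1),
    T = a hypothetical O*NET task score."""
    return [[1.0, d, t, d * t] for d, t in zip(contacted, task_score)]


def task_slope(b, contacted):
    """Effect of the task score on log wage: b2 + b3 * D."""
    return b[2] + b[3] * contacted


if __name__ == "__main__":
    # Invented, noise-free data generated from known coefficients.
    b_true = [2.0, -0.05, 0.10, -0.04]
    D = [0, 0, 0, 1, 1, 1, 0, 1]
    T = [0.1, 0.5, 0.9, 0.2, 0.6, 1.0, 0.3, 0.8]
    X = interaction_design(D, T)
    y = [sum(bi * xi for bi, xi in zip(b_true, row)) for row in X]
    b = ols(X, y)
    print("estimates:", [round(v, 4) for v in b])
    print("task effect, did not contact:", round(task_slope(b, 0), 4))
    print("task effect, contacted:", round(task_slope(b, 1), 4))
```

With real survey data one would normally use a statistics package that also reports standard errors; the hand-rolled solver here only keeps the example dependency-free.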

