Homework 1

Eco 231 - Undergraduate Econometrics

Spring 2021

1. Assume that you are hired to investigate the causal effect of being raised in a high-poverty
neighborhood in the US on outcomes during adulthood (such as health, well-being, social
networks, and economic self-sufficiency). Use the sources in the Datasets file on Blackboard
and succinctly answer the following questions:

(a) Name a suitable dataset that can help you answer the question above. Provide its name
and the website where it can be downloaded.

(b) What is the sample size in this dataset? Is this a reasonable number for your research?

(c) Briefly describe the data you found in part (a). Using the codebook, discuss which variables
are crucial to answer the research question posed in the statement above (no more than 10


2. Suppose you are a researcher interested in studying the relationship between household
characteristics and future educational outcomes of children. You have been advised that one
dataset that satisfies your requirements is the Early Childhood Longitudinal Study, Birth Cohort.
Try to find the data through the sources discussed in the Stata lecture. To answer the following
questions, you will also need to locate the codebooks of this database. (Note: You do not
need the data; the codebooks and webpage PDFs contain all the information you require.)

(a) Briefly describe the objectives of this study and the different rounds of the survey. Mention
the methods employed for data collection. At what ages are the interviews conducted? (Your
answer should not exceed 10 lines.)

(b) Describe the restrictions on the use of this database.

(c) How many children are classified as low birth weight in the first round of the survey?

(d) Describe the groups of variables available in the first round. Classify them into child
characteristics, mother characteristics, and household characteristics.

(e) Choose two variables you could employ as baseline characteristics of the household. Describe
how these variables would be relevant for studying future outcomes of children.

(f) Calculate the nonresponse rate in each of the two following rounds of the survey, relative to
the initial number of individuals interviewed.


(g) Suppose you are interested in studying how socio-emotional skills are developed before the
age of two. Describe which assessments included in this study could be employed for this
purpose. Does the study have similar assessments for older ages?

(h) Describe which measurements can be used to analyze the cognitive skills of children in kinder-


3. This problem asks you to work directly with Stata. Suppose you are a researcher interested in
studying the labor market outcomes of recent college graduates. One suitable public-use dataset
for this purpose is the National Survey of College Graduates (NSCG). In order to answer the
following questions, you will need to use the attached documentation to identify the variables of


(a) Explore the survey using the interview questionnaire. Based on this, write down one scientific
question (related to the topic mentioned above) that could be answered using the NSCG.

(b) Use the interview questionnaire provided with the database to identify the variables related
to hours worked per week, weeks worked per year, and yearly earnings. Notice that information
about weeks worked can be derived using two variables. Also note that in the NSCG15 a value
of 98 for hours worked per week denotes a logical skip.

(c) After handling invalid values properly, create a table showing the mean and standard
deviation of the three variables described in part (b) for men and women separately.

(d) In order to see the distribution of hours worked per week, create a histogram of this variable
for men and women. Plot the density on the y-axis and use a bin width of 10 for the x-axis.

(e) Create a new variable lnhourwage defined as the (natural) logarithm of yearly earnings
divided by total hours worked during the year. Produce a table showing the mean, standard
deviation, and 10th and 90th percentiles of this variable for men and women separately. Drop
observations that yield a negative value of this variable.

(f) Use the interview questionnaire to identify the variable that indicates whether a respondent
changed employer and/or job between 2013 and 2015, as well as the variables describing the
reason for the change in case the employer is different between these two years. What is the
proportion of respondents who stayed with the same employer and job during this period?

(g) As a researcher, you are also interested in studying how the gender wage gap varies across
major fields. Using the variable related to the first bachelor's degree (nbamemg) and your
variable lnhourwage, create a table showing the mean hourly wage for women and men
across different majors. Which major has the largest wage gap?

(h) Run a regression of hourly wages on education separately for men and women. How does the
coefficient on education differ by gender?

(i) Create a variable of potential experience ptlexper, defined as age - education - 6. Run a
regression of hourly wages on education, potential experience, and potential experience squared
separately for men and women. Interpret your results.
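
As a rough orientation for the Stata steps in parts (c), (d), (e), and (i), the workflow might look like the sketch below. It runs on synthetic data so that it is self-contained. Every variable name other than lnhourwage and ptlexper (female, hrswk, wkswk, earnings, educ, age) is hypothetical and is not the actual NSCG name or coding, and using lnhourwage as the dependent variable in the regression is an assumption. The real variables must be identified from the questionnaire.

```stata
* Illustrative sketch only: synthetic data and hypothetical variable names,
* NOT the actual NSCG variable names or value codes.
clear
set seed 12345
set obs 500

* Hypothetical stand-ins for the variables identified in part (b)
generate female   = runiform() < 0.5             // 1 = woman, 0 = man
generate hrswk    = floor(20 + 40*runiform())    // hours worked per week
replace  hrswk    = 98 if runiform() < 0.05      // 98 mimics the logical-skip code
generate wkswk    = floor(30 + 23*runiform())    // weeks worked per year
generate earnings = 20000 + 80000*runiform()     // yearly earnings
generate educ     = 16 + floor(6*runiform())     // years of education
generate age      = educ + 6 + floor(30*runiform())

* Part (c): recode the logical-skip value to missing, then summarize by gender
replace hrswk = . if hrswk == 98
tabstat hrswk wkswk earnings, by(female) statistics(mean sd)

* Part (d): density histogram of weekly hours by gender, bin width 10
histogram hrswk, width(10) density by(female)

* Part (e): log hourly wage = ln(yearly earnings / total hours in the year)
generate lnhourwage = ln(earnings / (hrswk * wkswk))
* Missing values compare as larger than any number, so they are not dropped here
drop if lnhourwage < 0
tabstat lnhourwage, by(female) statistics(mean sd p10 p90)

* Part (i): potential experience, its square, and the regression by gender
generate ptlexper  = age - educ - 6
generate ptlexper2 = ptlexper^2
bysort female: regress lnhourwage educ ptlexper ptlexper2
```

With the real data, the `generate` lines that build synthetic variables would be replaced by loading the NSCG file. The invalid-value handling would also need to cover every special code listed in the documentation, not only 98.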