# Discussion Question

Discussion and content attached

600-700 word count

CHAPTER 7: REGRESSION ANALYSIS

Agenda

1. Understanding Regression Analysis

2. Data Requirements

3. Specifying Regression Models

4. Regression Assumptions

5. Interpreting Regression Results

6. Validation and Use of Regression Models

7. Steps for Stata Actions

Understanding Regression Analysis

Importance of regression analysis:

• Regression analysis is one of the most important methods in quantitative market research

• Marketing and strategy departments of major companies (e.g., Procter & Gamble, BMW Group, Nestlé) regularly use regression analysis for decision-making

• Regression analysis can:

1. Indicate whether independent variables have a significant relationship with a dependent variable.

2. Indicate the relative strength of different independent variables’ effects on a dependent variable.

3. Make predictions.

In this lecture we introduce linear regression (OLS)

The Regression Analysis Estimate: Drawing a Line from Observations

• The regression line is optimal if the sum of the squared distances to all observation points is minimized (distance = error = residual = e)

› Because the distances are squared, positive and negative deviations do not cancel each other out

› Larger deviations are weighted disproportionately heavily

• If the assumptions of the regression are violated, the estimated line might not be the true line!
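To make the idea of "minimizing squared distances" concrete, here is a minimal Python sketch for a single independent variable. The data points and the function names (`fit_line`, `sum_squared_residuals`) are made up for illustration; the lecture itself uses Stata for estimation.

```python
# Minimal ordinary least squares (OLS) fit for one independent variable.
# The data points below are hypothetical.

def fit_line(x, y):
    """Return (alpha, beta) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def sum_squared_residuals(x, y, alpha, beta):
    """Sum of squared distances between observed and predicted y."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = fit_line(x, y)
best = sum_squared_residuals(x, y, alpha, beta)
# A slightly steeper line through the same intercept fits worse.
other = sum_squared_residuals(x, y, alpha, beta + 0.1)
print(alpha, beta, best, other)
```

Any line other than the fitted one yields a larger sum of squared residuals, which is what "optimal" means on this slide.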

Notation:

y = α + β1x1 + e

• y: dependent variable
• α: constant (intercept)
• β1: regression parameter
• x1: independent variable
• e: error or residual

For multiple regression:

y = α + β1x1 + β2x2 + β3x3 + e

[Figure: regression line through the observations, marking the constant (α), the error (e), and the predicted (or estimated) value ŷ]

The Process of Conducting a Regression Analysis

1. Consider the regression analysis data requirements

2. Specify and estimate the regression model

3. Test the assumptions of regression analysis

4. Interpret the regression results

5. Validate the regression results

6. Use the regression model

Consider Data Requirements for Regression Analysis

• Before conducting a regression analysis, check whether the available data allow for it:

– Is the sample large enough? Green’s rule of thumb: at least 104 + k observations, where k is the number of independent variables.

– Do the dependent and independent variables have (sufficient) variation?

– Is the dependent variable interval or ratio scaled? (If not, use other methods such as logistic or ordinal regression.)

– Are the independent variables collinear? (If they are, some remedies exist!)
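The first two checks are simple enough to sketch in a few lines of Python. The function names and the example numbers are hypothetical; only the 104 + k rule and the "standard deviation greater than 0" criterion come from the lecture.

```python
from statistics import pstdev

def minimum_sample_size(k):
    """Rule of thumb from the slides: at least 104 + k observations."""
    return 104 + k

def has_variation(values):
    """A variable is only usable if its standard deviation exceeds 0."""
    return pstdev(values) > 0

k = 8              # hypothetical number of independent variables
n_available = 120  # hypothetical number of observations

print(n_available >= minimum_sample_size(k))  # sample size check
print(has_variation([3, 3, 3, 3]))            # a constant has no variation
print(has_variation([1, 4, 2, 5]))
```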

Specify the Regression Model

Specify Regression Model: Different Methods

• Stata will show you several estimation options.

• You should keep the Default standard errors, which corresponds to ordinary least squares (OLS).

• However, when heteroskedasticity is present, use Robust standard errors.

• On the Reporting tab you will find several options, including reporting the Standardized beta coefficients.

Rules of thumb:

1. Don’t put all possible variables in your regression model

2. Avoid overlapping / correlated variables (select among them or use factor analysis)

3. Consider the number of variables in relation to the sample size

Regression Assumptions

Formal assumptions:

• The regression model can be expressed in a linear way.

• The expected mean error of the regression model is zero.

• The variance of the errors is constant (homoskedasticity).

• The errors are independent (no autocorrelation).

Optional assumption (to determine significance of parameters):

• The errors need to be approximately normally distributed.

Linearity

• Linearity is an assumption that asks whether we can write the regression model as

y = α + β1x1 + ... + βzxz + e

• A separate issue is whether the relationship between y and x is best expressed as a linear relationship. If not, we can use transformations:

– x²

– log(x)

• After such transformations the model is still linear!

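** In Stata, the “robust” option addresses a different issue: it requests robust standard errors, the remedy for heteroskedasticity. Transformed variables such as x² or log(x) are generated as new variables before estimation.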
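To illustrate why a transformed predictor keeps the model linear in its parameters, the following Python sketch fits y = α + β·log(x) with the ordinary least squares formulas. The data are constructed for illustration (exactly log-shaped, no noise) and `fit_line` is a hypothetical helper, not a Stata feature.

```python
import math

def fit_line(x, y):
    """Return (alpha, beta) of the least squares line y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta

# Made-up data: y depends on x through log(x), so a straight line
# in x itself would fit poorly.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [1.0 + 2.0 * math.log(xi) for xi in x]

# Transform the predictor first, then estimate as usual.
log_x = [math.log(xi) for xi in x]
alpha, beta = fit_line(log_x, y)
print(alpha, beta)
```

The relationship between y and x is curved, yet the estimation step is unchanged because the model is still a sum of parameters times (transformed) variables.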

Homoskedasticity

Is the variance of the errors constant, or does it decrease or increase as a function of one or more independent variables?

[Figure: homoskedastic distribution vs. heteroskedastic distribution]

Are the Errors Independent?

• Primarily relevant for time series: does one observation influence another? Examples:

– Satisfaction of the same group of customers, asked monthly

– Success of new products launched one after another

• Use the Durbin-Watson test after ordering the observations along a time dimension.

[Figure: Durbin-Watson test values (n = 30, k = 1)]
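As a rough illustration of what the Durbin-Watson test measures, the statistic can be computed from time-ordered residuals as the sum of squared successive differences divided by the sum of squared residuals. The Python sketch below uses made-up residuals; values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive and values toward 4 negative autocorrelation. Judging significance still requires the critical values.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered in time."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]
# Hypothetical residuals that keep their sign for a while (positive autocorrelation).
persistent = [1.0, 1.0, -1.0, -1.0]

print(durbin_watson(alternating))  # above 2
print(durbin_watson(persistent))   # below 2
```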

Are the Residuals Approximately Normally Distributed?

• This optional assumption can be visualized using a histogram of the residuals.

• Also conduct a Shapiro-Wilk test on the residuals.

Tests of Normality

| | Kolmogorov-Smirnov (a): Statistic | Kolmogorov-Smirnov (a): df | Kolmogorov-Smirnov (a): Sig. | Shapiro-Wilk: Statistic | Shapiro-Wilk: df | Shapiro-Wilk: Sig. |
|---|---|---|---|---|---|---|
| Unstandardized Residual | .131 | 30 | .200* | .939 | 30 | .084 |

a. Lilliefors Significance Correction

*. This is a lower bound of the true significance.

Interpret the Regression Results

• To interpret a regression model, look at the following two elements:

Overall Model Fit

Effects of Individual Variables

Overall Model Fit

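Model fit measures (n = sample size, k = number of independent variables; SS_T, SS_R, and SS_E are the total, regression, and error sums of squares):

$$SS_T = SS_R + SS_E$$

$$R^2 = \frac{SS_R}{SS_T}$$

$$R^2_{adj} = 1 - (1 - R^2) \cdot \frac{n - 1}{n - k - 1}$$

$$F = \frac{R^2}{1 - R^2} \times \frac{n - k - 1}{k}$$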
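The fit measures on this slide (R², adjusted R², and the F-value) can be worked through numerically. The Python sketch below uses hypothetical sums of squares and a hypothetical sample (n = 30, k = 2); the function names are illustrative only.

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation explained by the regression."""
    return ss_regression / ss_total

def adjusted_r_squared(r2, n, k):
    """R-squared penalized for the number of independent variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """F-value for the overall significance of the model."""
    return (r2 / (1 - r2)) * ((n - k - 1) / k)

# Hypothetical values: regression and error sums of squares.
ss_regression = 40.0
ss_error = 60.0
ss_total = ss_regression + ss_error

n, k = 30, 2
r2 = r_squared(ss_regression, ss_total)
adj = adjusted_r_squared(r2, n, k)
f = f_statistic(r2, n, k)
print(r2, adj, f)
```

Note that the adjusted R² is always somewhat lower than R² when k > 0, and the gap widens as more variables are added relative to the sample size.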

Effects of Individual Variables

• The effects of individual variables:

• Consider the significance of the parameter of each variable (not the intercept) in turn. The hypothesis tested for each variable is:

– H0: β1 = 0 (regression line is horizontal)

– H1: β1 ≠ 0 (regression line is not horizontal)

– If the Sig. column shows a value below .05, we consider the effect significant.

• For significant variables also consider:

– The sign (direction of the regression line)

– The size of the βs:

• Unstandardized: expresses the effect of a one-unit change in the independent variable on the dependent variable.

• Standardized: expresses the effect relative to the other variables. Standardized effects can be compared with each other, but not with those of dummy variables.

Significance? (p-value)

Direction? (sign)

Size? (b vs. standardized b)

Validate the Regression Results

• Validating the results is necessary to understand whether the model is robust.

• There are three basic ways to validate regression results, which are often used in combination:

– Split-sample validation: Split the sample randomly into two groups consisting of 70% (estimation sample) and 30% (validation sample) of the observations. Run the regression model on each group and check whether the parameter estimates and significance levels are about the same. What counts as “about” is open to discussion, but typically the signs of significant parameters are expected to be the same and significant parameters should remain approximately so. Small shifts in marginally significant effects (e.g., p = 0.045 changing to 0.051) are not problematic.

– Cross-validation: Use a new dataset, if available, to test for stability.

– Add a few other variables to the model that might be relevant and see whether these change the results.

Coefficients (a)

| Model 1 | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 29011.585 | 18448.456 | | 1.573 | .127 |
| Average supermarket price | -24003.037 | 16694.676 | -.241 | -1.438 | .162 |
| Index of promotional activities | 44.227 | 13.567 | .547 | 3.260 | .003 |

a. Dependent Variable: Weekly sales in USD
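As a quick check on how the columns of this output relate, each t-value is the unstandardized coefficient divided by its standard error. The short Python sketch below recomputes the t column from the B and Std. Error values reported above; the dictionary layout is just an illustrative choice.

```python
# (B, Std. Error, reported t) taken from the coefficients output above.
coefficients = {
    "(Constant)": (29011.585, 18448.456, 1.573),
    "Average supermarket price": (-24003.037, 16694.676, -1.438),
    "Index of promotional activities": (44.227, 13.567, 3.260),
}

t_values = {}
for name, (b, std_error, reported_t) in coefficients.items():
    # t = coefficient / standard error
    t_values[name] = b / std_error
    print(f"{name}: t = {t_values[name]:.3f} (reported {reported_t:.3f})")
```

Only the index of promotional activities has a large enough t-value to be significant at the .05 level, which matches its Sig. value of .003.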

Use the Regression Model

• Using the regression model is the final and key step. Use involves prediction of effects or interpreting effects (see previous steps).

• Prediction: consider the following model and results:

y = α + β1x1 + β2x2 + e

• What is our predicted outcome if we set the price to 1.10 USD and the level of promotional activities to 50?

29,011.585 – 24,003.037 · 1.10 USD + 44.227 · 50 promotional activities = 4,819.594 USD sales per week
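The same prediction can be reproduced with a small Python sketch. The coefficients are taken from the output above; the function name `predict_weekly_sales` is hypothetical.

```python
# Coefficients from the regression output (dependent variable: weekly sales in USD).
ALPHA = 29011.585            # constant
BETA_PRICE = -24003.037      # average supermarket price
BETA_PROMOTION = 44.227      # index of promotional activities

def predict_weekly_sales(price, promotion):
    """Predicted weekly sales: alpha + beta1 * price + beta2 * promotion."""
    return ALPHA + BETA_PRICE * price + BETA_PROMOTION * promotion

predicted = predict_weekly_sales(price=1.10, promotion=50)
print(round(predicted, 3))  # 4819.594
```

Changing the inputs shows the direction of each effect: a higher price lowers predicted sales, while more promotional activity raises them.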

Steps for Stata Actions

• Let’s summarize and recap the major theoretical decisions we need to make if we want to run a regression model.

• These decisions are then “translated” into Stata actions.

Step 1. Consider the regression analysis data requirements

Sufficient sample size: Check whether the sample size is at least 104 + k, where k indicates the number of independent variables. If the expected effects are weak (the R2 is .10 or lower), use at least 30 · k observations, that is, 30 observations per independent variable. The available sample size can be determined easily by calculating the correlation matrix: note the number of observations (obs=…) immediately under the correlate command.

correlate commitment s9 s10 s19 s21 s23 status age gender

Do the dependent and independent variables show variation? Calculate the standard deviation of the variables by going to ► Statistics ► Summaries, tables, and tests ► Summary and descriptive statistics ► Summary statistics (enter the dependent and independent variables). At the very least, the standard deviation (indicated by Std. Dev. in the output) should be greater than 0.

summarize commitment s9 s10 s19 s21 s23 i.status age i.gender

Is the dependent variable interval or ratio scaled? See Chap. 3 to determine the measurement level.

Is (multi)collinearity present? The presence of (multi)collinearity can only be assessed after the regression analysis has been conducted. To run a regression model, go to ► Statistics ► Linear models and related ► Linear regression. Under Dependent variable enter the dependent variable, add all the independent variables under Independent variables, and click on OK.

Then check the VIF: ► Statistics ► Postestimation ► Specification, diagnostic, and goodness-of-fit analysis ► Variance inflation factors. Then click on Launch and OK. The VIF should be below 10 (although the acceptable threshold can be higher or lower in some cases; see section 7.3.1.4 for specifics).

estat vif
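Outside Stata, the logic behind the VIF can be sketched in a few lines of Python. For a given independent variable, the VIF equals 1 / (1 − R²), where R² comes from regressing that variable on the other independent variables; with exactly two independent variables this R² is simply their squared correlation. The data and function names below are hypothetical, and the sketch only covers the two-variable case.

```python
def correlation(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both variables when the model has exactly two predictors."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical, fairly strongly correlated predictors (r = 0.8).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

vif = vif_two_predictors(x1, x2)
print(vif)
```

Even a correlation of 0.8 gives a VIF well below the threshold of 10 mentioned above, which shows that this threshold only flags quite severe collinearity.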

Step 2. Specify and estimate the regression model

Model specification:

1. Pick distinct variables

2. Try to build a robust model

3. Consider the variables that are needed to give advice

4. Consider whether the number of independent variables is appropriate in relation to the sample size

Estimate the regression model: ► Statistics ► Linear models and related ► Linear regression. Under Dependent variable enter the dependent variable, add all the independent variables under Independent variables, and click on OK.

regress commitment s9 s10 s19 s21 s23 i.status age gender

Use robust standard errors when heteroskedasticity is present:

regress commitment s9 s10 s19 s21 s23 i.status age gender, robust

Step 3. Test the regression analysis assumptions

Can the regression model be specified linearly? Consider whether you can write the regression model as: y = α + β1x1 + β2x2 + ⋯ + e

Is the relationship between the independent and dependent variables linear? Plot the dependent variable against the independent variables using a scatterplot matrix to see if the relation (if any) appears to be linear: ► Graphics ► Scatterplot matrix. Add all the variables and click on Marker properties where, under Symbol, you can choose Point for a clearer matrix. Note that you cannot add variables that start with i. (i.e., categorical variables). Then click on OK.

graph matrix commitment s9 s10 s19 s21 s23 status age gender, msymbol(point)

Conduct Ramsey’s RESET test to test for non-linearities. Go to ► Statistics ► Postestimation ► Specification, diagnostic, and goodness-of-fit analysis ► Ramsey regression specification-error test for omitted variables. Then click on Launch and OK.

estat ovtest

Is the expected mean error of the regression model zero? Choice made on theoretical grounds.

Is the variance of the errors constant (homoskedastic)?

Breusch-Pagan test: This can only be checked right after running a regression model. Go to ► Statistics ► Postestimation ► Specification, diagnostic, and goodness-of-fit analysis ► Tests for heteroskedasticity (hettest). Click on Launch and then OK. Check that the Breusch-Pagan / Cook-Weisberg test for heteroskedasticity is not significant. If it is, you can use robust standard errors to remedy this.

estat hettest

White’s test: This can only be checked right after running a regression model. Go to ► Statistics ► Postestimation ► Specification, diagnostic, and goodness-of-fit analysis ► Information matrix test (imtest). Click on Launch and then OK.

estat imtest

Are the errors correlated (autocorrelation)? This can only be checked after running a regression model and declaring the time aspect. This means you need a variable that indicates how the observations are organized over time. This variable, for example week, should be declared in Stata using the tsset command, for example, tsset week. Then conduct the Durbin–Watson test. You can select this test by going to ► Statistics ► Postestimation ► Specification, diagnostic, and goodness-of-fit analysis ► Durbin-Watson statistic to test for first-order serial correlation. Click on Launch and then OK. The Durbin-Watson test for first-order serial correlation should not be significant. The critical values can be found on the website accompanying this book (Web Appendix Downloads).

tsset week

estat dwatson

Are the errors normally distributed? This can only be checked after running a regression model. You should first save the errors by going to ► Statistics ► Postestimation ► Predictions ► Predictions and their SEs, leverage statistics, distance statistics, etc. Then click on Launch. Enter the name of the error variable (we use error in this chapter), making sure Residuals (equation-level scores) is ticked, and click on OK.

Then calculate the Shapiro-Wilk test to test the normality of the errors. To select the Shapiro-Wilk test, go to ► Statistics ► Summaries, tables, and tests ► Distributional plots and tests ► Shapiro-Wilk normality test. Under Variables enter error and click on OK. Check whether the Shapiro-Wilk test reports a p-value greater than 0.05 under Prob>z.

To visualize, create a histogram of the errors with a normal density curve overlaid: ► Graphics ► Histogram and enter error. Under ► Density plots, tick Add normal-density plot.

predict error, residuals

swilk error

histogram error, normal

Step 4. Interpret the regression model

Consider the overall model fit: Check the R2 and significance of the F-value.

Consider the effects of the independent variables separately: Check the (standardized) β. Also check the sign of the β. Consider the significance of the t-value (under P>|t| in the regression table).

To compare models: Calculate the AIC and BIC: ► Statistics ► Postestimation ► Specification, diagnostic, and goodness-of-fit analysis ► Information criteria - AIC and BIC. Click on Launch and then OK. Check the AIC and BIC, and ascertain whether the simpler model has AIC or BIC values that are at least 2, but preferably 10, lower than those of the more complex model.

estat ic

Calculate the standardized effects: Check Standardized beta coefficients under the Reporting tab of the regression dialog box, which can be found under ► Statistics ► Linear models and related ► Linear regression ► Reporting. Then rank the variables by the absolute values of their standardized coefficients, from highest to lowest.

Calculate the effect size: Make sure you have used OLS regression (and not robust regression). Then go to ► Statistics ► Postestimation ► Specification, diagnostic, and goodness-of-fit analysis ► Eta-squared and omega-squared effect sizes. Then click on Launch and OK.

Interpret each eta squared as the percentage of variance explained (i.e., as that variable’s R2). For individual variables, an effect of 0.02 is small, 0.15 is medium, and 0.30 or greater is large.

Step 5. Validate the model

Are the results robust? This can only be done easily using the command window. First create a random variable that assigns roughly 70% of the cases to the estimation sample:

set seed 12345

gen validate = runiform() < 0.7

Then run the regression model, first on the 70% and then on the remaining 30% of the cases. Do this by going to ► Statistics ► Linear models and related ► Linear regression. Then click on by/if/in and under If: (expression) enter validate==1. Then repeat and enter validate==0. In the command window, the if condition goes before the comma that introduces the options:

regress commitment s9 s10 s19 s21 s23 i.status age gender if validate==1, robust

regress commitment s9 s10 s19 s21 s23 i.status age gender if validate==0, robust

Compare the results of the two models to check that they are approximately the same.
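The split-sample logic can also be sketched outside Stata. The Python example below generates synthetic data (an assumption made purely for illustration, with a true slope of 3), assigns roughly 70% of the cases to the estimation sample in the same spirit as runiform() < 0.7, and fits a one-variable OLS line to each part. The helper `fit_line` is hypothetical.

```python
import random

def fit_line(x, y):
    """Return (alpha, beta) of the least squares line y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta

random.seed(12345)  # fixed seed so the split is reproducible

# Synthetic data for illustration only: y = 2 + 3x + noise.
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 3.0 * xi + random.gauss(0, 1) for xi in x]

# Roughly 70% estimation sample, 30% validation sample.
in_estimation = [random.random() < 0.7 for _ in range(n)]
est = [(xi, yi) for xi, yi, keep in zip(x, y, in_estimation) if keep]
val = [(xi, yi) for xi, yi, keep in zip(x, y, in_estimation) if not keep]

a_est, b_est = fit_line([p[0] for p in est], [p[1] for p in est])
a_val, b_val = fit_line([p[0] for p in val], [p[1] for p in val])

print(len(est), len(val))
print(b_est, b_val)  # slopes should have the same sign and similar size
```

If the two slopes differed in sign or by a large margin, that would be a signal that the model is not stable across subsamples.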

Takeaways

• Regression analysis is one of the most important methods in empirical economic research.

• The regression analysis puts an optimal line through the observed points.

• Violations of the assumptions influence the regression results in mostly unexpected ways. We should check whether:

– The regression model can be expressed in a linear way.

– The expected mean error of the regression model is zero.

– The variance of the errors is constant (homoskedasticity).

– The errors are independent (no autocorrelation).

• Interpretation of results: consider model fit and the effects of individual variables.