Week 10 (Mar 29-31)
On Monday this week we will cover the material about randomized experiments and matching that we didn’t get to last week. On Wednesday we will start talking about computing adjusted or partial effects of variables that account for potential confounding relationships by using multiple regression.
In-class activities, case studies, exercises, etc. for this week
We will work on these in breakout rooms. I suggest having one person share their screen and serve as note-taker, or quickly sharing a Google doc so you can work together. The activities will be a component of your next homework.
Activity 1: Green buildings
Activity 2: Hold my beer
Other materials used this week
Saratoga housing matching example
Multiple regression example: School expenditures - Refer to this script for the most relevant parts of the code in today's video, and use it as your example for fitting multiple regression models.
Multiple regression example: School expenditures (full script) - This is the full script I used in the video, if you’re curious
Readings, videos, and activities to do for next week
For Monday 4/5:
Complete the “Hold my beer” case study above; we will discuss it at the beginning of class.
Watch Categorical variables in Regression Models, which reviews categorical variables in simple and multiple linear regression models.
For Wednesday 4/7:
Watch Interaction terms in Regression Models
Exercises and other things due next week
For Wednesday 4/7 hand in:
Completed Green buildings case study, and:
Problem 1
More on the green buildings case study:
- Construct a matched dataset using age, class_a, class_b, cluster_rent, renovated, amenities, and heating_costs. Continue to use the option ratio=3.
- Using this matched dataset, estimate the effect of the green rating. Provide a 95% confidence interval for the effect. Does it appear that the green rating increases revenue? Why or why not?
- Some of you noticed that as more matching variables were added, the overall balance sometimes degraded. This is not uncommon. One way to address this is to use a multiple regression model to estimate the effect of the green rating. Using multiple regression here does two things: it addresses residual imbalance, and it can give more precise inferences about the treatment effect when the model includes strong predictors of the outcome. Repeat part 2 using a multiple regression model.
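If the idea of regression adjustment feels abstract, the following is a minimal sketch on simulated data, not the course script and not the green buildings data. It regresses an outcome on a treatment indicator plus one covariate and reads the treatment effect off the indicator's coefficient. Every name here (`solve`, `ols`, `age`, `treated`, `revenue`) and the simulated effect of 2.0 are assumptions made up for illustration. The interval uses plain OLS standard errors and a normal quantile of 1.96, and it ignores any matching weights.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients and standard errors; X must include an intercept column."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)  # residual variance estimate
    se = []
    for j in range(p):
        unit = [1.0 if k == j else 0.0 for k in range(p)]
        # j-th diagonal entry of (X'X)^{-1}, obtained by solving against a unit vector
        se.append(math.sqrt(sigma2 * solve(xtx, unit)[j]))
    return beta, se

# Simulated (hypothetical) data: younger units are more likely to be treated,
# so age confounds the treated-versus-untreated comparison.
random.seed(1)
rows = []
for _ in range(400):
    age = random.uniform(0, 50)
    treated = 1.0 if random.random() < (0.7 if age < 25 else 0.3) else 0.0
    revenue = 20.0 + 2.0 * treated - 0.1 * age + random.gauss(0, 1)
    rows.append((treated, age, revenue))

X = [[1.0, t, a] for t, a, _ in rows]  # intercept, treatment indicator, covariate
y = [r for _, _, r in rows]
beta, se = ols(X, y)

effect, effect_se = beta[1], se[1]
ci = (effect - 1.96 * effect_se, effect + 1.96 * effect_se)  # approximate 95% interval
print(f"adjusted effect: {effect:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The coefficient on the treatment indicator is the estimated difference in average outcome between treated and untreated units with the covariate held fixed, which is the "adjusted" or "partial" effect we discussed in class.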
Problem 2
- Revisit the Saratoga housing example above. Recall that the balance here was quite poor after matching. Use multiple regression (with the same matching variables I used) on the matched dataset to estimate the increase in average sale price for houses with versus without a fireplace. Give the estimate, its interpretation, and a 95% confidence interval. How does the result compare to the one that does not use multiple regression?
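For the comparison in the last question, it may help to recall what the estimate without multiple regression looks like: a simple difference in group means with its own confidence interval. Below is a minimal sketch, assuming two small made-up lists of sale prices (in thousands of dollars). These are not the Saratoga data, and `diff_in_means_ci` is a hypothetical helper name. It uses the unequal-variance standard error with a normal quantile of 1.96; with groups this small a t quantile would be more appropriate.

```python
import math
import statistics

def diff_in_means_ci(group_a, group_b, z=1.96):
    """Difference in sample means (a minus b) with an approximate 95% interval."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Unequal-variance (Welch-style) standard error of the difference
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return diff, (diff - z * se, diff + z * se)

# Made-up illustrative prices in thousands of dollars, not real data
with_fireplace = [250, 270, 240, 300, 280, 260]
without_fireplace = [200, 230, 210, 250, 220, 240]

diff, ci = diff_in_means_ci(with_fireplace, without_fireplace)
print(f"difference in means: {diff:.1f}, 95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```

This unadjusted difference attributes every gap between the groups to the fireplace, so if the matched groups still differ on other variables, a regression-adjusted estimate can move away from it.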