Week 9 (Mar 22-24)

On Monday this week we will talk about permutation tests, which let you do a hypothesis test when you don’t know the sampling distribution of the test statistic under the null hypothesis.
We will focus on tests of the null hypothesis of no difference between groups. On Wednesday we will start talking about methods for isolating causal effects: if we observe a difference between two groups (say, a treated group and an untreated group), is that difference caused by their membership in those groups, or simply associated with it?
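To make the idea concrete, here is a minimal sketch of a permutation test for a difference in group means. It is written in plain Python so the shuffling step is spelled out by hand. The function name `permutation_test`, the numbers, and the choice of 5000 shuffles are all made up for illustration and do not come from the course materials.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null_stats = []
    for _ in range(n_perm):
        # Shuffling the pooled outcomes breaks any link between
        # group membership and outcome, which is what the null says.
        rng.shuffle(pooled)
        null_stats.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    # Two-sided: count shuffles at least as far from zero as the observed gap.
    extreme = sum(abs(s) >= abs(observed) for s in null_stats)
    return observed, null_stats, extreme / n_perm

# Made-up outcomes, for illustration only (not course data).
treated = [12.1, 14.3, 13.8, 15.0, 14.6, 13.2, 15.4, 14.9]
untreated = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4, 10.9, 12.0]

observed, null_stats, p_value = permutation_test(treated, untreated)
print(f"observed difference: {observed:.3f}")
print(f"two-sided p-value:   {p_value:.4f}")
```

The list `null_stats` is the simulated sampling distribution of the test statistic under the null hypothesis, so a histogram of it is the picture you would compare the observed difference against.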

In-class activities, case studies, exercises, etc. for this week

We will work on these in breakout rooms. I suggest having one person share their screen and serve as note-taker, or quickly sharing a Google doc so you can work together. The activities will be a component of your next homework.

Activity 1: Permutation test walkthrough. Because I’m a softie, I decided not to ask you to do this over break/before class. We’ll take some time to work through it in breakout rooms to get ready for our next activity. You should skip the last subsection entitled “Using other statistics in the permutation test” for now.

Activity 2: Let’s revisit the Saratoga housing data we saw in our last regular session (see here).

Let’s start by revisiting the comparison between houses with any and no fireplaces (in the first section of part B).

1. Run all the relevant code up to and including the preamble of part B to load the dataset and construct the any_fireplaces variable.
2. Start by bootstrapping the regression coefficient on the any_fireplaces dummy variable. Plot a histogram of the sampling distribution and compute a 95% confidence interval. What does this tell us about the difference in sale prices for houses with one or more vs. no fireplaces?
3. Now we’ll do a hypothesis test, testing the null hypothesis that there is no difference in average sale prices.
   a. Using the same regression model, perform a permutation test of the null hypothesis using the relevant coefficient as a test statistic. (Hint: Use the “shuffle” function inside the formula like in the walkthrough.)
   b. Draw a histogram of the sampling distribution of the regression coefficient under the null hypothesis.
   c. How should we define extreme in this case? Why? Using this definition, compute the p-value and comment on the strength of evidence against the null hypothesis.
4. Compare the two histograms from problem 2 and problem 3. Describe in your own words the difference between them and the information they convey about the difference in sale prices for houses with and without any fireplaces.
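The mechanics of the bootstrap and the permutation test in this activity can be sketched side by side. This is a rough illustration in plain Python, not the code from the walkthrough. It assumes the regression has the dummy as its only predictor, in which case the least-squares coefficient equals the difference in group means. The prices are simulated stand-ins, not the Saratoga data, and `dummy_coef` is a hypothetical helper name.

```python
import random
from statistics import mean

def dummy_coef(rows):
    """OLS coefficient on a lone 0/1 predictor: the difference in group means."""
    with_fp = [price for price, has_fp in rows if has_fp == 1]
    without_fp = [price for price, has_fp in rows if has_fp == 0]
    return mean(with_fp) - mean(without_fp)

rng = random.Random(1)
# Simulated (price in $1000s, any_fireplaces) pairs. NOT the Saratoga data.
rows = ([(rng.gauss(240, 40), 1) for _ in range(60)]
        + [(rng.gauss(175, 40), 0) for _ in range(40)])
observed = dummy_coef(rows)

# Bootstrap: resample whole rows with replacement, so each price
# stays attached to its own fireplace label.
boot = []
for _ in range(2000):
    sample = [rng.choice(rows) for _ in rows]
    boot.append(dummy_coef(sample))

# Permutation: keep the prices fixed and shuffle the labels,
# which destroys any association between the two.
prices = [price for price, _ in rows]
labels = [has_fp for _, has_fp in rows]
perm = []
for _ in range(2000):
    rng.shuffle(labels)
    perm.append(dummy_coef(list(zip(prices, labels))))

boot_sorted = sorted(boot)
n = len(boot_sorted)
# Approximate 2.5th and 97.5th percentiles of the bootstrap estimates.
ci = (boot_sorted[n * 25 // 1000], boot_sorted[n * 975 // 1000 - 1])
p_value = sum(abs(s) >= abs(observed) for s in perm) / len(perm)

print(f"observed coefficient:  {observed:.1f}")
print(f"bootstrap mean:        {mean(boot):.1f}")
print(f"permutation mean:      {mean(perm):.1f}")
print(f"95% bootstrap CI:      ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"two-sided p-value:     {p_value:.4f}")
```

In this sketch the bootstrap estimates in `boot` cluster around the observed coefficient, while the shuffled estimates in `perm` cluster around zero.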

Activity 3: Green buildings

Other materials used this week

Permutation testing webapp

Saratoga housing matching example

Readings, videos, and activities to do

For Wednesday 3/24 read:

Experiments
Quasi-experiments and matching, from the “Matching” header on page 117 up to but not including the subsection “Matching isn’t a silver bullet: a bigger example” on page 121.

Permutation test reading. You’re responsible for this material, but there is no set deadline for you to read it.

For Monday 3/29 finish reading the “Matching” section of Quasi-experiments and matching (the subsection “Matching isn’t a silver bullet: a bigger example”). Students in the 2PM section should watch the final 10-15 minutes of either the 12:30 or 3:30 Zoom recording, where I briefly discuss matching (sorry again we ran over!).

For Wednesday 3/31 read Multiple regression: the basics and watch this video.

Exercises and other things due next week

Due next Wednesday 3/31:

Completed versions of:

Week 8 Activity 1
Week 9 Activity 2

And these two activities we didn’t get to in class:

Revisit the Milk demand case study. Plot a histogram of the bootstrap approximation to the sampling distribution of the elasticity and compute a 95% confidence interval. Do the data provide strong evidence that milk is an elastic good? Why or why not?
Revisit the “Market Models” case study. For the Apple and Target models, plot a histogram of the bootstrap approximation to the sampling distribution of the slope and compute a 95% confidence interval. Is there evidence that one or both of these stocks is less volatile than the market as a whole?
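For both of these exercises the recipe is the same: resample the rows, refit the line, and read off the middle 95% of the slopes. Here is a hedged sketch of that recipe in plain Python for a simple one-predictor regression. The returns are simulated with a slope of 0.6 purely for illustration and are not the Apple, Target, or milk data. `ols_slope` is a hypothetical helper, and the percentile interval is only one of several ways to build a bootstrap confidence interval.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x in a model with an intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(2)
# Simulated daily returns with a true slope of 0.6. NOT real stock data.
market = [rng.gauss(0.0, 0.02) for _ in range(250)]
stock = [0.6 * m + rng.gauss(0.0, 0.01) for m in market]
slope_hat = ols_slope(market, stock)

# Bootstrap: resample (market, stock) pairs with replacement and refit.
pairs = list(zip(market, stock))
boot_slopes = []
for _ in range(2000):
    sample = [rng.choice(pairs) for _ in pairs]
    xs, ys = zip(*sample)
    boot_slopes.append(ols_slope(xs, ys))

boot_slopes.sort()
n = len(boot_slopes)
# Approximate 2.5th and 97.5th percentiles of the bootstrap slopes.
ci_low = boot_slopes[n * 25 // 1000]
ci_high = boot_slopes[n * 975 // 1000 - 1]

print(f"estimated slope: {slope_hat:.3f}")
print(f"95% bootstrap CI: ({ci_low:.3f}, {ci_high:.3f})")
print("interval lies entirely below 1:", ci_high < 1)
```

The final comparison is against 1 rather than 0 because the questions above are about whether the slope sits on one side of a reference value, not about whether it differs from zero.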

Written on March 21, 2021