STA 210 - Fall 2022 - SLR: Mathematical models for inference

Announcements

Lab 02 due
- Today at 11:59pm (Thursday labs)
- Tue, Sep 20 at 11:59pm (Friday labs)
HW 01: due Wed, Sep 21 at 11:59pm
Statistics experience - due Fri, Dec 09 at 11:59pm
Lab 01 solutions posted in Resources folder in Sakai
See Week 04 for this week’s activities.

Topics

Define mathematical models to conduct inference for the slope
Use mathematical models to
- calculate a confidence interval for the slope
- conduct a hypothesis test for the slope

Computational setup

# load packages
library(tidyverse)   # for data wrangling and visualization
library(tidymodels)  # for modeling
library(openintro)   # for the duke_forest dataset
library(scales)      # for pretty axis labels
library(knitr)       # for pretty tables
library(kableExtra)  # also for pretty tables
library(patchwork)   # arrange plots

# set default theme and larger font size for ggplot2
ggplot2::theme_set(ggplot2::theme_bw(base_size = 20))

The regression model, revisited

df_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(price ~ area, data = duke_forest)

tidy(df_fit) |>
  kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	116652.325	53302.463	2.188	0.031
area	159.483	18.171	8.777	0.000

Inference, revisited

Earlier we computed a confidence interval and conducted a hypothesis test via simulation:
- CI: Bootstrap the observed sample to simulate the distribution of the slope
- HT: Permute the observed sample to simulate the distribution of the slope under the assumption that the null hypothesis is true
Now we’ll do both using theoretical results: the Central Limit Theorem defines the distribution of the slope, and we use features of that distribution (shape, center, spread) to compute the bounds of the confidence interval and the p-value for the hypothesis test.
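As a rough preview of the theory-based approach, a 95% confidence interval for the slope can be sketched as estimate ± critical value × standard error, with the critical value taken from a t distribution with n − 2 degrees of freedom. The estimate and standard error below are copied from the regression output above; the object names (b1, se_b1, t_star, ci) are illustrative only.

```r
library(openintro)  # for the duke_forest dataset

# values copied from the regression output for the area term
b1    <- 159.483  # slope estimate
se_b1 <- 18.171   # standard error of the slope

# critical value from a t distribution with n - 2 degrees of freedom
n      <- nrow(duke_forest)
t_star <- qt(0.975, df = n - 2)

# 95% confidence interval: estimate +/- critical value * standard error
ci <- b1 + c(-1, 1) * t_star * se_b1
ci
```

Because the inputs are rounded to three decimals, the bounds may differ slightly from what software reports for the fitted model.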

Mathematical representation of the model

$$Y = \beta_0 + \beta_1 X + \epsilon$$

where the errors are independent and normally distributed:

independent: Knowing the error term for one observation doesn’t tell you anything about the error term for another observation
normally distributed: $\epsilon \sim N(0, \sigma_\epsilon^2)$
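A small simulation may help make the model concrete: the response is generated as a linear mean function plus independent normal errors with mean 0 and constant standard deviation. The parameter values, sample size, and object names below are hypothetical, chosen only for illustration, and are not estimates from the Duke Forest data.

```r
set.seed(210)

# hypothetical parameter values, chosen only for illustration
beta0 <- 2
beta1 <- 0.5
sigma <- 1

n <- 500
x <- runif(n, min = 0, max = 10)

# errors: independent draws from N(0, sigma^2)
epsilon <- rnorm(n, mean = 0, sd = sigma)

# response = mean function + error
y <- beta0 + beta1 * x + epsilon

# the least-squares fit should land near the values used to simulate
sim_fit <- lm(y ~ x)
coef(sim_fit)
```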

Mathematical representation, visualized

Mean: $\beta_0 + \beta_1 X$, the predicted value based on the regression model
Variance: $\sigma_\epsilon^2$, constant across the range of $X$
- How do we estimate $\sigma_\epsilon^2$?

Regression standard error

Once we fit the model, we can use the residuals to estimate the regression standard error (the spread of the distribution of the response, for a given value of the predictor variable):

$$\hat{\sigma}_\epsilon = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\sum_{i=1}^n e_i^2}{n - 2}}$$

Why divide by $n - 2$?
Why do we care about the value of the regression standard error?
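The regression standard error can be computed by hand from the residuals: square them, sum them, divide by n − 2, and take the square root. This is a minimal sketch that refits the model with base R's lm(), which should match the fit from the "lm" engine used earlier. The object names fit, e, and sigma_hat are illustrative.

```r
library(openintro)  # for the duke_forest dataset

# same least-squares fit as the "lm" engine, using base R directly
fit <- lm(price ~ area, data = duke_forest)

e <- residuals(fit)  # e_i = y_i - y_hat_i
n <- nobs(fit)       # number of observations used in the fit

# regression standard error: sqrt(sum of squared residuals / (n - 2))
sigma_hat <- sqrt(sum(e^2) / (n - 2))

# should agree with the value R reports for the fitted model
c(by_hand = sigma_hat, from_lm = sigma(fit))
```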

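Standard error of $\hat{\beta}_1$

$$SE_{\hat{\beta}_1} = \hat{\sigma}_\epsilon \sqrt{\frac{1}{(n - 1) s_X^2}}$$

where $s_X^2$ is the sample variance of the predictor,

or…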

term	estimate	std.error	statistic	p.value
(Intercept)	116652.33	53302.46	2.19	0.03
area	159.48	18.17	8.78	0.00
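To check where the std.error for area comes from, the slope's standard error can be computed directly as the regression standard error times the square root of 1 / ((n − 1) × the sample variance of the predictor). This sketch again uses base R's lm() for the fit; the object names are illustrative.

```r
library(openintro)  # for the duke_forest dataset

fit <- lm(price ~ area, data = duke_forest)

x <- model.frame(fit)$area  # predictor values used in the fit
n <- nobs(fit)              # number of observations used in the fit
sigma_hat <- sigma(fit)     # regression standard error

# SE of the slope: sigma_hat * sqrt(1 / ((n - 1) * s_X^2))
se_b1 <- sigma_hat * sqrt(1 / ((n - 1) * var(x)))

# compare with the standard error reported for the area term
c(by_hand = se_b1, from_lm = coef(summary(fit))["area", "Std. Error"])
```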

Understanding the p-value

Magnitude of p-value	Interpretation
p-value < 0.01	strong evidence against $H_0$
0.01 < p-value < 0.05	moderate evidence against $H_0$
0.05 < p-value < 0.1	weak evidence against $H_0$
p-value > 0.1	effectively no evidence against $H_0$
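The statistic and p.value columns of the regression output can be reproduced by hand, assuming the null hypothesis is that the slope equals 0 and the alternative is two-sided. The test statistic is the estimate divided by its standard error, and the p-value is the two-tailed area under a t distribution with n − 2 degrees of freedom. The estimate and standard error are copied from the output for the area term; the object names are illustrative.

```r
library(openintro)  # for the duke_forest dataset

# values copied from the regression output for the area term
b1    <- 159.483  # slope estimate
se_b1 <- 18.171   # standard error of the slope

# test statistic under the null hypothesis that the slope is 0
t_stat <- (b1 - 0) / se_b1

# two-sided p-value from a t distribution with n - 2 degrees of freedom
n <- nrow(duke_forest)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

c(statistic = t_stat, p.value = p_value)
```

The p-value is far below 0.01, which the table above reads as strong evidence against the null hypothesis.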
