library(tidyverse)
library(tidymodels)
library(knitr)
Lab 03: Coffee grades
Inference for simple linear regression using mathematical models
Introduction
In today’s lab you will analyze data from over 1,000 different coffees to explore the relationship between a coffee’s sweetness and it’s flavor grade.
Learning goals
By the end of the lab you will…
- be able to use mathematical models to conduct inference for the slope
- be able to assess conditions for simple linear regression
Packages
The follow packages are used in the lab.
The Data
The dataset for this lab comes from the Coffee Quality Database and was obtained from the #TidyTuesday GitHub repo. It includes information about the origin, producer, measures of various characteristics, and the quality measure for over 1,000 coffees. The coffees can be reasonably be treated as a random sample.
This lab will focus on the following variables:
sweetness
: Sweetness grade, 0 (least sweet) - 10 (most sweet)flavor
: Flavor grade, 0 (worst flavor) - 10 (best flavor)
Click here for the definitions of all variables in the data set. Click here for more details about how these measures are obtained.
<- read_csv("data/coffee-grades.csv") coffee
Exercises
Goal: The goal of this analysis is to use linear regression to understand variability in coffee flavor grades based on the sweetness grade.
Exercise 1
Visualize the relationship between the sweetness and flavor grades. Write two observations from the plot.
Exercise 2
Fit the linear model using sweetness grade to understand variability in the flavor grade. Neatly display the model using three digits, and include the 98% confidence interval for the model coefficients in the output.
Exercise 3
Interpret the slope in the context of the data.
Assume you are a coffee drinker. Would you drink a coffee represented by the intercept? Why or why not?
Exercise 4
Fill in the name of your model object from Exercise 2 to obtain the estimated regression standard error, \(\hat{\sigma}_\epsilon\).
glance(_____$fit)$sigma
Describe what this value means in the context of the data.
Exercise 5
Do the data provide evidence of a statistically significant linear relationship between sweetness and flavor grades? Conduct a hypothesis test using mathematical models to answer this question. In your response
- State the null and alternative hypotheses in words and in mathematical notation.
- What is the test statistic? State with the test statistic means in the context of this problem.
- What distribution was used to calculate the p-value? Be specific.
- State the conclusion in the context of the data using a threshold of \(\alpha = 0.02\) to make your decision.
Exercise 6
What is critical value used to calculate the 98% confidence interval displayed in Exercise 2? Show the code and output used to get your response.
Is the confidence interval consistent with the conclusions from the hypothesis test? Briefly explain why or why not.
Exercise 7
We’d like to check the model conditions to assess the reliability of the results in Exercises 4 - 6.
Make a scatterplot of the residuals vs. fitted values. Add a horizontal dotted line at \(residuals = 0\).
Make a histogram of the residuals.
Exercise 8
Is the linearity condition satisfied? Briefly explain your response using the plot of residuals vs. fitted to support your assessment.
Is the constant variance condition satisfied? Briefly explain your response using the plot of residuals vs. fitted to support your assessment.
Exercise 9
Is the normality condition satisfied? Briefly explain your response.
Exercise 10
Is the independence condition satisfied? Briefly explain your response.
Submission
In this class, the final PDF documents will be submitted to Gradescope.
To submit your assignment:
- Go to http://www.gradescope.comand click Log in in the top right corner.
- Click School Credentials ➡️ Duke NetID and log in using your NetID credentials.
- Click on your STA 210 course.
- Click on the assignment, and you’ll be prompted to submit it.
- Mark the pages associated with each exercise. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).
- Select the first page of your .PDF submission to be associated with the “Workflow & formatting” & “Reproducible report” sections.
Grading (50 pts)
Component | Points |
---|---|
Ex 1 - 10 | 45 |
Workflow & formatting | 31 |
Reproducible report | 22 |
Footnotes
The “Workflow & formatting” grade is to assess the reproducible workflow. This includes implementing version control with at least at least 3 informative commit messages and having a neatly formatted PDF document that is easily readable with an updated name and date.↩︎
This means we will be able to render the .qmd file in your GitHub repo and exactly reproduce the .pdf file in the Github repo and on Gradescope.↩︎