AE 07: Exam 01 review
Restaurant tips
Packages
library(tidyverse)
library(tidymodels)
library(knitr)
Restaurant tips
What factors are associated with the amount customers tip at a restaurant? To answer this question, we will use data collected in 2011 by a student at St. Olaf who worked at a local restaurant.[1]
The variables we’ll focus on for this analysis are:
- Tip: amount of the tip
- Party: number of people in the party
- Alcohol: whether alcohol was purchased with the meal
View the data set to see the remaining variables.
tips <- read_csv("data/tip-data.csv")
Exploratory analysis
1. Visualize, summarize, and describe the relationship between Party and Tip.
# add your code here
Modeling
Let’s start by fitting a model using Party to predict the Tip at this restaurant.
2. Write the statistical model.
3. Fit the regression line and write the regression equation. Name the model tips_fit and neatly display the results with 3 digits and the 95% confidence interval for the coefficients.
# add your code here
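As a rough illustration of what "fit the line and display the coefficients with a 95% confidence interval" involves, here is a minimal base R sketch. It assumes simulated data (the data frame sim and its columns party and tip are hypothetical, not the tips data) and uses lm() rather than the tidymodels workflow loaded above, so treat it as a sketch of the idea rather than the expected answer.

```r
# Illustrative sketch on simulated data (NOT the tips data).
set.seed(1234)

# Hypothetical data: 50 parties with a linear trend in tip plus noise
n <- 50
sim <- data.frame(party = sample(1:6, n, replace = TRUE))
sim$tip <- 1 + 2 * sim$party + rnorm(n, sd = 1.5)

# Fit the simple linear regression of tip on party size
sim_fit <- lm(tip ~ party, data = sim)

# Combine estimates, standard errors, test statistics, and p-values
# with the 95% confidence interval for each coefficient
coef_table <- cbind(coef(summary(sim_fit)), confint(sim_fit, level = 0.95))

# Display with 3 digits
round(coef_table, 3)
```

The two rightmost columns give the lower and upper bounds of the interval for the intercept and the slope.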
4. Interpret the slope.
5. Does it make sense to interpret the intercept? Explain your reasoning.
Inference
Inference for the slope
6. The following code can be used to create a bootstrap distribution for the slope (and the intercept, though we’ll focus primarily on the slope in our inference). Describe what each line of code does, supplemented by any visualizations that might help with your description.
set.seed(1234)
boot_dist <- tips |>
  specify(Tip ~ Party) |>
  generate(reps = 100, type = "bootstrap") |>
  fit()
7. Use the bootstrap distribution created in Exercise 6, boot_dist, to construct a 90% confidence interval for the slope using the percentile method, and interpret it in the context of the data.
# add your code here
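To make the mechanics of bootstrapping and the percentile method concrete, the following base R sketch carries out the same steps by hand. It assumes simulated data (sim, party, and tip are hypothetical names, not the tips data), and boot_slope() is a helper written only for this illustration. The infer pipeline in Exercise 6 packages these steps more conveniently.

```r
# Illustrative sketch on simulated data (NOT the tips data).
set.seed(1234)

# Hypothetical data: 50 parties with a linear trend in tip plus noise
n <- 50
sim <- data.frame(party = sample(1:6, n, replace = TRUE))
sim$tip <- 1 + 2 * sim$party + rnorm(n, sd = 1.5)

# One bootstrap replicate: resample rows with replacement (same sample size),
# refit the model, and keep the slope
boot_slope <- function(data) {
  resampled <- data[sample(nrow(data), replace = TRUE), ]
  unname(coef(lm(tip ~ party, data = resampled))["party"])
}

# Repeat 100 times to build the bootstrap distribution of the slope
boot_slopes <- replicate(100, boot_slope(sim))

# Percentile method: the bounds of a 90% interval are the
# 5th and 95th percentiles of the bootstrap slopes
boot_ci <- quantile(boot_slopes, probs = c(0.05, 0.95))
boot_ci
```

With only 100 replicates the percentile bounds are fairly noisy, so a larger number of replicates would give a more stable interval.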
8. Conduct a hypothesis test at the equivalent significance level using permutation with 100 reps. State the hypotheses and the significance level you’re using explicitly. Also include a visualization of the null distribution of the slope with the observed slope marked as a vertical line.
set.seed(1234)
# add your code here
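The logic of a permutation test for the slope can be sketched by hand in base R. This example assumes simulated data (sim, party, and tip are hypothetical, not the tips data) and a helper, perm_slope(), written only for this illustration. The p-value below uses one common two-sided definition (the proportion of permuted slopes at least as extreme in absolute value as the observed slope), which may differ slightly from what a package such as infer reports.

```r
# Illustrative sketch on simulated data (NOT the tips data).
set.seed(1234)

# Hypothetical data: 50 parties with a linear trend in tip plus noise
n <- 50
sim <- data.frame(party = sample(1:6, n, replace = TRUE))
sim$tip <- 1 + 2 * sim$party + rnorm(n, sd = 1.5)

# Observed slope from the original data
obs_slope <- unname(coef(lm(tip ~ party, data = sim))["party"])

# Under the null hypothesis (slope = 0) the response is unrelated to the
# predictor, so shuffling the response values simulates data from the null
perm_slope <- function(data) {
  shuffled <- data
  shuffled$tip <- sample(shuffled$tip)
  unname(coef(lm(tip ~ party, data = shuffled))["party"])
}

# Null distribution of the slope from 100 permutations
null_slopes <- replicate(100, perm_slope(sim))

# Two-sided p-value: share of permuted slopes at least as extreme as observed
p_value <- mean(abs(null_slopes) >= abs(obs_slope))
p_value

# Null distribution with the observed slope marked as a vertical line
hist(null_slopes,
     xlim = range(c(null_slopes, obs_slope)),
     main = "Null distribution of the slope",
     xlab = "Slope")
abline(v = obs_slope, col = "red", lwd = 2)
```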
9. Check the relevant conditions for Exercises 7 and 8. Do any violations of these conditions make you reconsider your inferential findings?
# add your code here
10. Now repeat Exercises 7 and 8 using approaches based on mathematical models. You can reference output from previous exercises and/or write new code as needed.
11. Check the relevant conditions for Exercise 10. Do any violations of these conditions make you reconsider your inferential findings? You can reference previous graphs and conditions and add any new code as needed.
# add your code here
Inference for a prediction
12. Based on your model, predict the tip for a party of 4.
# add your code here
13. Suppose you’re asked to construct a confidence interval and a prediction interval for your finding in the previous exercise. Which one would you expect to be wider, and why? In your answer, clearly state the difference between these intervals.
14. Now construct the intervals and comment on whether your guess is confirmed.
# add your code here
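The two kinds of interval discussed above can be compared side by side with predict() in base R. This is a minimal sketch on simulated data (sim, party, and tip are hypothetical, not the tips data): the confidence interval is for the mean response at a given party size, while the prediction interval is for a single new observation at that party size.

```r
# Illustrative sketch on simulated data (NOT the tips data).
set.seed(1234)

# Hypothetical data: 50 parties with a linear trend in tip plus noise
n <- 50
sim <- data.frame(party = sample(1:6, n, replace = TRUE))
sim$tip <- 1 + 2 * sim$party + rnorm(n, sd = 1.5)

sim_fit <- lm(tip ~ party, data = sim)
new_party <- data.frame(party = 4)

# Interval for the MEAN tip among all parties of 4
conf_int <- predict(sim_fit, newdata = new_party,
                    interval = "confidence", level = 0.95)

# Interval for the tip of ONE new party of 4
pred_int <- predict(sim_fit, newdata = new_party,
                    interval = "prediction", level = 0.95)

# Same point estimate (fit), different lower (lwr) and upper (upr) bounds
rbind(confidence = conf_int[1, ], prediction = pred_int[1, ])
```

Both intervals are centered on the same fitted value. The prediction interval also accounts for the variability of an individual observation around the regression line, not just the uncertainty in the estimated line.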
Multiple linear regression
15. Make a plot to visualize the relationship between Party and Tip with the points colored by Alcohol. Describe any patterns that emerge.
# add your code here
16. Fit a multiple linear regression model predicting Tip from Party and Alcohol. Display the results with kable() and three digits.
# add your code here
17. Interpret the coefficients of Party and Alcohol.
18. According to this model, is the rate of change in tip amount with party size the same regardless of alcohol consumption, or does it differ? Explain your reasoning.
Footnotes
Dahlquist, Samantha, and Jin Dong. 2011. “The Effects of Credit Cards on Tipping.” Project for Statistics 212-Statistics for the Sciences, St. Olaf College.