Lab 06: Adelie Penguins

Logistic regression intro

Important

Due:

  • Monday, November 07 , 11:59pm (Thursday labs)
  • Tuesday, November 08, 11:59pm (Friday labs)

Introduction

In this assignment, you’ll get to put into practice the logistic regression skills you’ve developed to analyze data about Palmer Penguins.

Learning goals

By the end of the lab you will be able to

  • conduct exploratory data analysis for logistic regression
  • fit logistic regression models and write the regression equation
  • use the model to calculate predicted probabilities
  • continue developing a collaborative workflow with your teammates

Getting started

  • A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.

  • Go to the sta210-fa22 organization on GitHub. Click on the repo with the prefix lab-06. It contains the starter documents you need to complete the lab.

  • Each person on the team should clone the repository and open a new project in RStudio. Throughout the lab, each person should get a chance to make commits and push to the repo.

Workflow: Using Git and GitHub as a team

Important

There are no Team Member markers in this lab; however, you should use a similar workflow as in Lab 04. Only one person should type in the group’s Qmd file at a time. Once that person has finished typing the group’s responses, they should render, commit, and push the changes to GitHub. All other teammates can pull to see the updates in RStudio.

Every teammate must have at least one commit in the lab. Everyone is expected to contribute to discussion even when they are not typing.

Packages

The following packages are used in the lab.

library(tidyverse)
library(tidymodels)
library(palmerpenguins)
library(knitr)

# add other packages as needed

Data: Palmer Penguins

We will go back to the Palmer penguins data used in HW 02.

Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. (Gorman, Williams, and Fraser 2014)


These data can be found in the palmerpenguins package. We’re going to be working with the penguins dataset from this package. The dataset contains data for 344 penguins. There are 3 different species of penguins in this dataset, collected from 3 islands in the Palmer Archipelago, Antarctica. Click here to see the codebook.

We will focus on the following variables:

variable class description
species integer Penguin species (Adelie, Gentoo, Chinstrap)
island integer Island where recorded (Biscoe, Dream, Torgersen)
bill_depth_mm integer Bill depth in mm

Exercises

The goal of this analysis is to use logistic regression to understand the relationship between bill depth, island, and whether a penguin is from the Adelie species. First, we need to create a new response variable to identify whether a penguin is from the Adelie species.

penguins <- penguins |>
  mutate(adelie = factor(if_else(species == "Adelie", 1, 0)))

And let’s check to make sure the new variable looks how we would expect before we continue with the analysis.

penguins |> 
  count(adelie, species)
# A tibble: 3 × 3
  adelie species       n
  <fct>  <fct>     <int>
1 0      Chinstrap    68
2 0      Gentoo      124
3 1      Adelie      152

Exercise 1

Let’s start by examining the relationship between adelie and island.

Visualize the relationship between adelie and island. What is something you observe about the relationship between these two variables based on the plot?

Tip

If you need inspiration, click here for example plots and code to visualize the relationship between two categorical variables.

Exercise 2

What does the values_fill argument do in the following chunk? The documentation for the function will be helpful in answering this question.

penguins |>
  count(island, adelie) |>
  pivot_wider(names_from = adelie, values_from = n, values_fill = 0)
# A tibble: 3 × 3
  island      `0`   `1`
  <fct>     <int> <int>
1 Biscoe      124    44
2 Dream        68    56
3 Torgersen     0    52

Exercise 3

  • Calculate the probability a randomly selected penguin is from the Adelie species if it was recorded on Biscoe island.

  • Calculate the odds a randomly selected penguin is from the Adelie species if it was recorded on Biscoe island.

Exercise 4

You want to fit a model using island to predict the odds of being from the Adelie species. Let \(\pi\) be the probability a penguin is from the Adelie species. The model has the form shown below.

\[ \log\Big(\frac{\pi}{1-\pi}\Big) = \beta_0 + \beta_1~ Dream + \beta_2 ~ Torgersen \]

  • Fit the model and neatly display the model output using three digits.
  • What are the predicted odds of a penguin being from the Adelie species if it was recorded on Biscoe island?

  • What are the predicted odds of a penguin being from the Adelie species if it was recorded on Dream island?

Exercise 5

Next, we’d like to add bill depth to the model. We’ll start by examining the relationship between these two variables.

Visualize the relationship between bill_depth_mm and adelie. What is something you observe about the relationship between these two variables based on the plot?

Exercise 6

  • Add bill depth to the previous model so that there are two predictors, island and bill_depth_mm. Neatly display the model output using three digits.

  • Write the estimated regression equation.

Exercise 7

Use the model from Exercise 6.

  • How do you expect the log-odds of being from the Adelie species to change when going from a penguin with bill depth 17 mm to a penguin with bill depth 20 mm? Assume both penguins were recorded on the Dream island.

  • How do you expect the odds of being from the Adelie species to change when going from a penguin with bill depth 17 mm to a penguin with bill depth 20 mm? Assume both penguins were recorded on the Dream island.

Exercise 8

Use the model from Exercise 6.

  • How do you expect the log-odds of being from the Adelie species to change when going from a penguin with bill depth 18 mm recorded on Biscoe island to a penguin with bill depth 21 mm recorded on Dream island?

  • How do you expect the odds of being from the Adelie species to change when going from a penguin with bill depth 18 mm recorded on Biscoe island to a penguin with bill depth 21 mm recorded on Dream island?

Submission

Warning

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.

Remember – you must turn in a PDF file to the Gradescope page before the submission deadline for full credit.

One team member submit the assignment:

  • Go to http://www.gradescope.com and log in using your NetID credentials.
  • Click on your STA 210 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Select all team members’ names, so they receive credit on the assignment. Click here for video on adding team members to assignment on Gradescope.
  • Mark the pages associated with each exercise. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).
  • Select the first page of your PDF submission to be associated with the “Workflow & formatting” section.

Grading

Total points available: 50 points.

Component Points
Ex 1 - 8 45
Workflow & formatting 51

References

Gorman, Kristen B., Tony D. Williams, and William R. Fraser. 2014. “Ecological Sexual Dimorphism and Environmental Variability Within a Community of Antarctic Penguins (Genus Pygoscelis).” Edited by André Chiaradia. PLoS ONE 9 (3): e90081. https://doi.org/10.1371/journal.pone.0090081.

Footnotes

  1. The “Workflow & formatting” grade is to assess the reproducible workflow. This includes having at least one meaningful commit from each team member and updating the team name and date in the YAML.↩︎