Lab: Project Proposal

Oct 27 - 28, 2022

Welcome

Goals

  • Review models with log-transformed response or predictor variables
  • Project proposal

Models with log-transformed variables

Movies data

The goal of this analysis is to predict the total gross revenue of a movie using opening weekend statistics. The data set includes movies released in the U.S. in 2009 that opened on 500 or more theater screens. The data were obtained from the Handbook of Regression Analysis.

The variables we’ll use are

  • TotalGross: total U.S. gross revenue in millions of dollars

  • Opening: opening weekend gross revenue in millions of dollars

  • Screens: the number of screens on which the movie opened

Exploratory data analysis

Below are the distributions and measures of center for the response and each predictor variable.

variable mean median
TotalGross 77.03 42.67
Opening 23.33 15.83
Screens 2751.55 2756

Log-transformed response variable

We’ll start by considering the following model with a log-transformed response variable. Note that OpeningCent and ScreensCent are the mean-centered versions of Opening and Screens.

\[ \log(TotalGross) = \beta_0 + \beta_1 \times OpeningCent + \beta_2 \times ScreensCent + \epsilon \hspace{8mm} \epsilon \sim N(0, \sigma^2_{\epsilon}) \]

The model output is below:

term estimate std.error statistic p.value
(Intercept) 3.8470 0.0420 91.6190 0
OpeningCent 0.0251 0.0026 9.6891 0
ScreensCent 0.0005 0.0001 6.1691 0


  • Interpret the intercept in the context of the data.

  • Interpret the effect of Opening in the context of the data.
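As a starting point for both questions: coefficients from a model with a log-transformed response are usually interpreted after exponentiating them. The sketch below is a minimal Python illustration, not part of the original lab materials. It uses only the estimates from the model output above, and the variable names are hypothetical. Because the predictors are mean-centered, the intercept describes a movie with the average opening weekend gross and the average number of screens.

```python
import math

# Estimates copied from the model output above (response is log(TotalGross)).
intercept = 3.8470      # (Intercept)
beta_opening = 0.0251   # OpeningCent

# Back-transform the intercept: predicted median TotalGross (in millions of
# dollars) for a movie at the mean of Opening and the mean of Screens.
median_total_gross = math.exp(intercept)

# Multiplicative change in predicted median TotalGross for each additional
# $1 million in opening weekend gross, holding Screens constant.
opening_multiplier = math.exp(beta_opening)
opening_pct_change = (opening_multiplier - 1) * 100

print(f"Back-transformed intercept: {median_total_gross:.2f} million dollars")
print(f"Multiplier per extra $1M opening: {opening_multiplier:.4f}")
print(f"Percent change per extra $1M opening: {opening_pct_change:.2f}%")
```

Under these assumptions, the back-transformed intercept is about 46.85 million dollars, and each additional $1 million in opening weekend gross multiplies the predicted median total gross by about 1.0254, an increase of roughly 2.5%, holding the number of screens constant.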

Log-transformed predictor variable

Next, let’s consider the following model with a log-transformed predictor. Note that OpeningCent and ScreensCent are the mean-centered versions of Opening and Screens.

\[TotalGross = \beta_0 + \beta_1 \times \log(OpeningCent) + \beta_2 \times ScreensCent + \epsilon \hspace{8mm} \epsilon \sim N(0, \sigma^2_{\epsilon}) \]

The model output is below:

term estimate std.error statistic p.value
(Intercept) 57.8738 27.3014 2.1198 0.0400
log(OpeningCent) 31.2932 11.0651 2.8281 0.0071
ScreensCent 0.0483 0.0368 1.3119 0.1967


  • Interpret the intercept in the context of the data.

  • Interpret the effect of a 10% increase in Opening in the context of the data.
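For a log-transformed predictor, a multiplicative change in the predictor corresponds to an additive change in the response: multiplying the logged quantity by a factor C changes the expected response by the coefficient times log(C). The sketch below is a minimal Python illustration, not part of the original lab materials. It uses only the estimate from the model output above, the variable names are hypothetical, and it assumes log is the natural logarithm. As the model is written, the intercept is the expected TotalGross when the logged term equals 0 (that is, when the logged quantity equals 1) and ScreensCent equals 0.

```python
import math

# Estimate copied from the model output above (response is TotalGross).
beta_log_opening = 31.2932   # log(OpeningCent)

# A 10% increase multiplies the logged quantity by 1.10.
factor = 1.10

# Expected additive change in TotalGross (millions of dollars),
# holding Screens constant: beta * log(factor).
expected_change = beta_log_opening * math.log(factor)

print(f"Expected change in TotalGross: {expected_change:.2f} million dollars")
```

Under these assumptions, a 10% increase in the logged predictor is associated with an expected increase in total gross of about 2.98 million dollars, holding the number of screens constant.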

Project proposal

  • Choose one of the usable data sets proposed in the Topic Ideas

  • See the proposal instructions