Oct 27 - 28, 2022
The goal of this analysis is to predict the total gross revenue of a movie using opening weekend statistics. The data set includes movies released in the U.S. in 2009 that opened on 500 or more theater screens. The data were obtained from Handbook of Regression Analysis.
The variables we’ll use are:
TotalGross: total U.S. gross revenue in millions of dollars
Opening: opening weekend gross revenue in millions of dollars
Screens: the number of screens on which the movie opened
Below are the distributions and measures of center for the response and each predictor variable.
variable | mean | median |
---|---|---|
TotalGross | 77.03 | 42.67 |
Opening | 23.33 | 15.83 |
Screens | 2751.55 | 2756 |
We’ll start by considering the following model with a log-transformed response variable. Note that OpeningCent and ScreensCent are the mean-centered versions of Opening and Screens.
\[ \log(TotalGross) = \beta_0 + \beta_1 \times OpeningCent + \beta_2 \times ScreensCent + \epsilon \hspace{8mm} \epsilon \sim N(0, \sigma^2_{\epsilon}) \]
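Mean-centering subtracts each variable's sample mean, so a centered value of zero corresponds to a movie with the average opening gross or the average number of screens. A minimal sketch of that step in Python follows. The numbers are made up for illustration and are not rows from the 2009 movie data, and the helper name `center` is hypothetical.

```python
from statistics import mean

# Hypothetical opening-weekend grosses (millions of dollars) and screen counts.
# These are illustrative values only, not the actual data set.
opening = [5.0, 12.5, 20.0, 42.5]
screens = [1200, 2500, 3000, 3300]


def center(values):
    """Return the values shifted so that their mean is zero."""
    m = mean(values)
    return [v - m for v in values]


opening_cent = center(opening)
screens_cent = center(screens)

print(opening_cent)
print(screens_cent)
```

After centering, the intercept of a model describes a movie at the sample averages of both predictors, rather than a movie with zero opening gross on zero screens.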
The model output is below:
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 3.8470 | 0.0420 | 91.6190 | 0 |
OpeningCent | 0.0251 | 0.0026 | 9.6891 | 0 |
ScreensCent | 0.0005 | 0.0001 | 6.1691 | 0 |
Interpret the intercept in the context of the data.
Interpret the effect of Opening in the context of the data.
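One possible way to approach both interpretations is to back-transform the estimates from the log scale, assuming the natural log was used. Under the usual reading of a log-transformed response, exponentiating the intercept gives the predicted median TotalGross for a movie with average Opening and average Screens, and exponentiating the OpeningCent coefficient gives the multiplicative change in TotalGross for each additional $1 million in opening weekend revenue, holding Screens constant. The sketch below only carries out that arithmetic on the estimates from the table, and the variable names are illustrative.

```python
import math

# Estimates copied from the model output table for the log-response model.
intercept = 3.8470
b_opening = 0.0251

# Back-transform the intercept from the log scale to millions of dollars.
baseline_gross = math.exp(intercept)

# Multiplicative change in TotalGross per extra $1 million of opening gross.
opening_factor = math.exp(b_opening)
percent_increase = (opening_factor - 1) * 100

print(round(baseline_gross, 2))
print(round(percent_increase, 2))
```

With these estimates the back-transformed intercept is roughly 46.85 million dollars, and each additional $1 million in opening gross corresponds to an increase of roughly 2.5% in predicted total gross.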
Next, let’s consider the following model with a log-transformed predictor. Note that OpeningCent and ScreensCent are the mean-centered versions of Opening and Screens.
\[TotalGross = \beta_0 + \beta_1 \times \log(OpeningCent) + \beta_2 \times ScreensCent + \epsilon \hspace{8mm} \epsilon \sim N(0, \sigma^2_{\epsilon}) \]
The model output is below:
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 57.8738 | 27.3014 | 2.1198 | 0.0400 |
log(OpeningCent) | 31.2932 | 11.0651 | 2.8281 | 0.0071 |
ScreensCent | 0.0483 | 0.0368 | 1.3119 | 0.1967 |
Interpret the intercept in the context of the data.
Interpret the effect of a 10% increase in Opening in the context of the data.
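For a log-transformed predictor, multiplying the predictor by a constant C changes the expected response by the coefficient times log(C). The sketch below applies that rule to a 10% increase (C = 1.10), assuming the natural log was used. Note that the model as written takes the log of OpeningCent, so strictly speaking the 10% change applies to the logged predictor as it appears in the model.

```python
import math

# Estimate for log(OpeningCent), copied from the model output table.
b_log_opening = 31.2932

# A 10% increase multiplies the predictor by 1.10,
# so its natural log rises by log(1.10).
change = b_log_opening * math.log(1.10)

print(round(change, 2))
```

With this estimate, a 10% increase corresponds to an expected increase of roughly 2.98 million dollars in TotalGross, holding Screens constant.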
Choose one of the usable data sets proposed in the Topic Ideas