Multiple linear regression
Multiple linear regression (MLR)
Based on the analysis goals, we will use a multiple linear regression model of the following form:
\[
\begin{aligned}\hat{\text{sale_price}} ~ = & ~
\hat{\beta}_0 + \hat{\beta}_1 \text{bedrooms} + \hat{\beta}_2 \text{bathrooms} + \hat{\beta}_3 \text{living_area} \\
&+ \hat{\beta}_4 \text{lot_size} + \hat{\beta}_5 \text{year_built} + \hat{\beta}_6 \text{property_tax}\end{aligned}
\]
Similar to simple linear regression, this model assumes that at each combination of the predictor variables, the values of sale_price follow a Normal distribution.
Regression Model
Recall: The simple linear regression model assumes
\[
Y|X\sim N(\beta_0 + \beta_1 X, \sigma_{\epsilon}^2)
\]
Similarly: The multiple linear regression model assumes
\[
Y|X_1, X_2, \ldots, X_p \sim N(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p, \sigma_{\epsilon}^2)
\]
The MLR model
For a given observation \((x_{i1}, x_{i2}, \ldots, x_{ip}, y_i)\),
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_{i} \hspace{8mm} \epsilon_i \sim N(0,\sigma_\epsilon^2)
\]
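The distributional assumption can be checked numerically with a minimal sketch. The coefficient values, the error standard deviation, and the predictor combination below are hypothetical and chosen only for illustration. The sketch fixes one combination of predictors, draws many responses from the model, and compares the sample mean and standard deviation with \(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p\) and \(\sigma_\epsilon\).

```python
import random
import statistics

random.seed(1)

# Hypothetical parameters (not from the housing data): beta_0, beta_1, ..., beta_p
beta = [2.0, 1.5, -0.7, 0.3]
sigma = 0.5  # hypothetical error standard deviation, sigma_epsilon

# One fixed combination of the p = 3 predictors
x = [1.0, 2.0, -1.0]


def mean_response(x, beta):
    """Return beta_0 + beta_1 * x_1 + ... + beta_p * x_p."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


mu = mean_response(x, beta)

# Draw responses y = mu + epsilon with epsilon ~ N(0, sigma^2)
draws = [mu + random.gauss(0.0, sigma) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_sd = statistics.stdev(draws)

print(f"model mean = {mu:.3f}, sample mean = {sample_mean:.3f}")
print(f"model SD   = {sigma:.3f}, sample SD   = {sample_sd:.3f}")
```

With a large number of draws, the sample mean and standard deviation should land close to the model values, which is what the Normal assumption at a fixed predictor combination describes.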
Prediction
At any combination of the predictors, the mean value of the response \(Y\) is
\[
\mu_{Y|X_1, \ldots, X_p} = \beta_0 + \beta_1 X_{1} + \beta_2 X_2 + \dots + \beta_p X_p
\]
Using multiple linear regression, we can estimate the mean response for any combination of predictors
\[
\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_{1} + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_p X_{p}
\]
Model fit
| Term | Estimate | Std. error | t statistic | p-value |
|---|---|---|---|---|
| (Intercept) | -7148818.957 | 3820093.694 | -1.871 | 0.065 |
| bedrooms | -12291.011 | 9346.727 | -1.315 | 0.192 |
| bathrooms | 51699.236 | 13094.170 | 3.948 | 0.000 |
| living_area | 65.903 | 15.979 | 4.124 | 0.000 |
| lot_size | -0.897 | 4.194 | -0.214 | 0.831 |
| year_built | 3760.898 | 1962.504 | 1.916 | 0.059 |
| property_tax | 1.476 | 2.832 | 0.521 | 0.604 |
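In the model fit output above, the third number for each term appears to be the estimate divided by its standard error, which is the usual test statistic for \(H_0: \beta_j = 0\). The sketch below recomputes that ratio from the reported values as a consistency check. The function name `t_statistic` is an illustrative choice, and small differences are expected because the inputs are rounded to three decimals.

```python
# (term, estimate, std_error, reported_statistic) copied from the model fit output
rows = [
    ("(Intercept)", -7148818.957, 3820093.694, -1.871),
    ("bedrooms", -12291.011, 9346.727, -1.315),
    ("bathrooms", 51699.236, 13094.170, 3.948),
    ("living_area", 65.903, 15.979, 4.124),
    ("lot_size", -0.897, 4.194, -0.214),
    ("year_built", 3760.898, 1962.504, 1.916),
    ("property_tax", 1.476, 2.832, 0.521),
]


def t_statistic(estimate, std_error):
    """Ratio of a coefficient estimate to its standard error."""
    return estimate / std_error


recomputed = {term: t_statistic(est, se) for term, est, se, _ in rows}

for term, est, se, reported in rows:
    print(f"{term:<13} recomputed = {recomputed[term]:7.3f}   reported = {reported:7.3f}")
```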
Model equation
\[
\begin{aligned}\hat{\text{sale_price}} = & -7148818.957 - 12291.011 \times \text{bedrooms}\\[5pt]
&+ 51699.236 \times \text{bathrooms} + 65.903 \times \text{living_area}\\[5pt]
&- 0.897 \times \text{lot_size} + 3760.898 \times \text{year_built}\\[5pt]
&+ 1.476 \times \text{property_tax}
\end{aligned}
\]
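As a rough illustration of how the fitted equation produces a prediction, the sketch below plugs one set of predictor values into it. The coefficients are the reported estimates; the house itself is hypothetical, invented for this example and not taken from the data, and `predict_price` is an illustrative helper name.

```python
# Estimated coefficients from the fitted model
coefficients = {
    "intercept": -7148818.957,
    "bedrooms": -12291.011,
    "bathrooms": 51699.236,
    "living_area": 65.903,
    "lot_size": -0.897,
    "year_built": 3760.898,
    "property_tax": 1.476,
}

# Hypothetical house (assumed values, not an observation from the data)
house = {
    "bedrooms": 3,
    "bathrooms": 1,
    "living_area": 1050,   # square feet
    "lot_size": 6000,
    "year_built": 1948,
    "property_tax": 8000,
}


def predict_price(house, coefficients):
    """Estimated mean sale price: intercept plus coefficient times value for each predictor."""
    total = coefficients["intercept"]
    for name, value in house.items():
        total += coefficients[name] * value
    return total


predicted = predict_price(house, coefficients)
print(f"Estimated mean sale price: ${predicted:,.2f}")
```

The result is an estimate of the mean sale price for houses with these characteristics, not a guaranteed price for any single house.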
Interpreting \(\hat{\beta}_j\)
- The estimated coefficient \(\hat{\beta}_j\) is the expected change in the mean of \(y\) when \(x_j\) increases by one unit, holding the values of all other predictor variables constant.
- Example: The estimated coefficient for living_area is 65.90. This means for each additional square foot of living area, we expect the sale price of a house in Levittown, NY to increase by $65.90, on average, holding all other predictor variables constant.
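The "holding all other predictors constant" interpretation can be verified with a small sketch: two houses that differ only by one square foot of living area should have predicted prices that differ by exactly the living_area coefficient, whatever the other values are. The two baseline houses below are hypothetical and `predict_price` is an illustrative helper name.

```python
# Estimated coefficients from the fitted model
coefficients = {
    "intercept": -7148818.957,
    "bedrooms": -12291.011,
    "bathrooms": 51699.236,
    "living_area": 65.903,
    "lot_size": -0.897,
    "year_built": 3760.898,
    "property_tax": 1.476,
}


def predict_price(house, coefficients):
    """Estimated mean sale price for one set of predictor values."""
    return coefficients["intercept"] + sum(
        coefficients[name] * value for name, value in house.items()
    )


def one_sqft_effect(house):
    """Change in predicted price when living_area grows by 1, all else fixed."""
    bigger = dict(house, living_area=house["living_area"] + 1)
    return predict_price(bigger, coefficients) - predict_price(house, coefficients)


# Two hypothetical houses with different values for every predictor
house_a = {"bedrooms": 3, "bathrooms": 1, "living_area": 1050,
           "lot_size": 6000, "year_built": 1948, "property_tax": 8000}
house_b = {"bedrooms": 4, "bathrooms": 2, "living_area": 1600,
           "lot_size": 7500, "year_built": 1951, "property_tax": 11000}

difference_a = one_sqft_effect(house_a)
difference_b = one_sqft_effect(house_b)

print(f"house A: {difference_a:.3f}, house B: {difference_b:.3f}")
```

Both differences match the coefficient up to floating-point rounding, which is why the interpretation does not depend on the particular values of the other predictors.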