Multiple linear regression
Multiple linear regression (MLR)
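Based on the analysis goals, we will use a multiple linear regression model of the following form:
$$\widehat{\text{sale\_price}} = \hat{\beta}_0 + \hat{\beta}_1\,\text{bedrooms} + \hat{\beta}_2\,\text{bathrooms} + \hat{\beta}_3\,\text{living\_area} + \hat{\beta}_4\,\text{lot\_size} + \hat{\beta}_5\,\text{year\_built} + \hat{\beta}_6\,\text{property\_tax}$$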
Similar to simple linear regression, this model assumes that at each combination of the predictor variables, the values of sale_price follow a Normal distribution.
Regression Model
Recall: The simple linear regression model assumes
$$Y \mid X \sim N(\beta_0 + \beta_1 X,\ \sigma_\epsilon^2)$$
Similarly: The multiple linear regression model assumes
$$Y \mid X_1, X_2, \dots, X_p \sim N(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p,\ \sigma_\epsilon^2)$$
The MLR model
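For a given observation $(x_{i1}, x_{i2}, \dots, x_{ip}, y_i)$
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_\epsilon^2)$$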
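One way to make the model statement concrete is to simulate data from it. The sketch below is illustrative only: the sample size, coefficient values, and error standard deviation are made-up assumptions, not values from the housing data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)      # fixed seed so the sketch is reproducible

n, p = 500, 3                            # hypothetical sample size and number of predictors
beta = np.array([2.0, 0.5, -1.0, 3.0])   # hypothetical (beta_0, beta_1, ..., beta_p)
sigma_eps = 1.5                          # hypothetical error standard deviation

X = rng.normal(size=(n, p))              # predictor values x_i1, ..., x_ip
eps = rng.normal(loc=0.0, scale=sigma_eps, size=n)  # eps_i ~ N(0, sigma_eps^2)

mean_y = beta[0] + X @ beta[1:]          # beta_0 + beta_1 x_i1 + ... + beta_p x_ip
y = mean_y + eps                         # y_i = mean response + random error

# The spread of y around its conditional mean should be close to sigma_eps
print(np.std(y - mean_y))
```

Each simulated response is its conditional mean plus an independent Normal error, which is exactly what the model equation states.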
Prediction
At any combination of the predictors, the mean value of the response Y is
$$\mu_{Y \mid X_1, \dots, X_p} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$
Using multiple linear regression, we can estimate the mean response for any combination of predictors
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_p X_p$$
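The section does not say how the estimates are computed. A common choice is ordinary least squares, and the sketch below assumes that. It fits simulated data with NumPy, then estimates the mean response at a new combination of predictors. All data and coefficient values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulated data with known (hypothetical) coefficients
n = 200
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, 2.0, -0.5])   # (beta_0, beta_1, beta_2)
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept, then the predictors
design = np.column_stack([np.ones(n), X])

# Ordinary least squares estimates of (beta_0, beta_1, beta_2)
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

# Estimated mean response at a new combination of predictors
x_new = np.array([1.0, 0.3, -1.2])       # leading 1 multiplies the intercept
y_hat = x_new @ beta_hat
print(beta_hat, y_hat)
```

With little noise and 200 observations, the estimates land close to the true coefficients, so the estimated mean response is close to the true mean at `x_new`.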
Model fit
| term | estimate | std. error | statistic | p-value |
|---|---|---|---|---|
| (Intercept) | -7148818.957 | 3820093.694 | -1.871 | 0.065 |
| bedrooms | -12291.011 | 9346.727 | -1.315 | 0.192 |
| bathrooms | 51699.236 | 13094.170 | 3.948 | 0.000 |
| living_area | 65.903 | 15.979 | 4.124 | 0.000 |
| lot_size | -0.897 | 4.194 | -0.214 | 0.831 |
| year_built | 3760.898 | 1962.504 | 1.916 | 0.059 |
| property_tax | 1.476 | 2.832 | 0.521 | 0.604 |
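The third number reported for each term appears to be the test statistic, that is, the estimate divided by its standard error. This reading is an assumption, since the output above carries no column labels of its own. The short check below recomputes that ratio from the first two numbers of each row.

```python
# (estimate, standard error) pairs copied from the model-fit output above
fit = {
    "(Intercept)": (-7148818.957, 3820093.694),
    "bedrooms": (-12291.011, 9346.727),
    "bathrooms": (51699.236, 13094.170),
    "living_area": (65.903, 15.979),
    "lot_size": (-0.897, 4.194),
    "year_built": (3760.898, 1962.504),
    "property_tax": (1.476, 2.832),
}

# Test statistic for each term: estimate / standard error
t_stats = {term: est / se for term, (est, se) in fit.items()}

for term, t in t_stats.items():
    print(f"{term:>13}: {t:7.3f}")
```

Rounded to three decimals, the ratios agree with the reported values, which supports reading the third number as a t statistic.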
Model equation
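$$\widehat{\text{sale\_price}} = -7148818.957 - 12291.011 \times \text{bedrooms} + 51699.236 \times \text{bathrooms} + 65.903 \times \text{living\_area} - 0.897 \times \text{lot\_size} + 3760.898 \times \text{year\_built} + 1.476 \times \text{property\_tax}$$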
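To see the fitted equation in use, the sketch below plugs in one house. The coefficients are the rounded estimates from the model fit. The house itself is hypothetical, with values chosen only for illustration, and the units (square feet, dollars) are assumed.

```python
# Rounded coefficient estimates from the fitted model
coef = {
    "(Intercept)": -7148818.957,
    "bedrooms": -12291.011,
    "bathrooms": 51699.236,
    "living_area": 65.903,
    "lot_size": -0.897,
    "year_built": 3760.898,
    "property_tax": 1.476,
}

# Hypothetical house (not an observation from the data set)
house = {
    "bedrooms": 3,
    "bathrooms": 1.5,
    "living_area": 1200,
    "lot_size": 6000,
    "year_built": 1950,
    "property_tax": 9000,
}

# Intercept plus the sum of coefficient * predictor value
predicted_price = coef["(Intercept)"] + sum(coef[name] * value for name, value in house.items())
print(round(predicted_price, 2))  # about 312593.56
```

Because the intercept and the year_built term are both very large and nearly cancel, predictions from rounded coefficients are sensitive to rounding. Treat the result as approximate.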
Interpreting $\hat{\beta}_j$
- The estimated coefficient $\hat{\beta}_j$ is the expected change in the mean of $y$ when $x_j$ increases by one unit, holding the values of all other predictor variables constant.
- Example: The estimated coefficient for living_area is 65.90. This means for each additional square foot of living area, we expect the sale price of a house in Levittown, NY to increase by $65.90, on average, holding all other predictor variables constant.
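The interpretation above follows from the fitted equation: if only $x_j$ changes, every other term cancels in the difference of the two estimated means. A short derivation, using the same notation as the prediction equation:

$$
\begin{aligned}
\hat{Y}(x_1, \dots, x_j + 1, \dots, x_p) - \hat{Y}(x_1, \dots, x_j, \dots, x_p)
&= \hat{\beta}_j (x_j + 1) - \hat{\beta}_j x_j \\
&= \hat{\beta}_j
\end{aligned}
$$

The same argument scales linearly. As an illustration (the 100 square feet is an arbitrary choice), a house with 100 additional square feet of living area and all other predictors unchanged has an estimated mean sale price that is higher by $100 \times 65.903 = 6590.30$ dollars.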