Friday, May 24, 2019

When Confronted with a Limited Number of Variables, Use Them Prudently in the Model

Intended for New Analysts and Researchers -


Table-1 shows that the weak predictive relationship between the Adjusted Sale Price (“ASP”) and both Land_SF and Bldg_Age has forced the model into a single-stage multiplicative form encompassing all of the continuous and descriptive variables. In a linear regression, the Land_SF and Bldg_Age coefficients would both have been insignificant, so the linear regression step was skipped altogether. The linear results would also have been implausible: free land (with the improvement carrying all of the value) is as absurd as a Bldg_Age coefficient of positive $19 per year, that is, value rising with age rather than depreciating.

Table-2 is the resulting correlation matrix of all of the available variables in log form (for use in a non-linear, multiplicative regression model). The strong relationship between ASP and both Living_SF and Town suggests that these two independent variables will be the most valuable contributors to the equation.
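As a rough illustration of how such a log-form correlation matrix can be produced, here is a minimal sketch in plain Python. The helper names (`pearson`, `log_correlation_matrix`) and the handful of sale records are made up for illustration; they are not the data behind Table-2. The sketch also assumes every value is strictly positive, since the log of zero is undefined.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log_correlation_matrix(columns):
    """Log-transform every column, then correlate each pair of columns."""
    logged = {name: [math.log(v) for v in values]
              for name, values in columns.items()}
    names = list(logged)
    return {a: {b: pearson(logged[a], logged[b]) for b in names}
            for a in names}

# Hypothetical records, invented for this sketch only.
sales = {
    "ASP":       [200000, 250000, 310000, 180000, 420000],
    "Living_SF": [1400, 1750, 2100, 1250, 2900],
    "Land_SF":   [7500, 7400, 7600, 7550, 7450],  # lots nearly identical
}

matrix = log_correlation_matrix(sales)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Reading across the ASP row of the result is the same exercise as reading Table-2: the variables with the largest absolute correlations are the likeliest strong contributors.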

Table-3 is the output from the multiplicative regression model. As expected, the living area variable (Living_SF) is the most important variable in this model (highest t Stat), followed by the Town variable (Town). Land_SF remains non-contributing (low t Stat and high P-value) because the vast majority of residential lots in this county tend to be predictably similar; with so little variation, the variable works against the multiple regression and is rejected by the model. The R-square of 0.8828, obtained without any residual analysis or outlier removal, points to a reasonably efficient model even at this early stage.
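For readers new to the idea, a multiplicative model is commonly fitted as an ordinary least-squares regression on log-transformed values and then converted back by exponentiation. The sketch below shows that mechanic in plain Python under stated assumptions: the data are synthetic (generated from known coefficients so the fit can be checked), only Living_SF and Pool are included, and the helpers `solve`, `ols` and `predict` are illustrative names. It is not the model behind Table-3.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic data: ASP = 100 * Living_SF^0.9 * 1.1^Pool (invented coefficients).
living_sf = [1200, 1500, 1800, 2100, 2400, 3000]
pool = [0, 1, 0, 1, 0, 1]
asp = [100 * sf ** 0.9 * 1.1 ** p for sf, p in zip(living_sf, pool)]

# Regress ln(ASP) on an intercept, ln(Living_SF) and the Pool binary.
X = [[1.0, math.log(sf), float(p)] for sf, p in zip(living_sf, pool)]
b0, b_living, b_pool = ols(X, [math.log(v) for v in asp])

def predict(sf, has_pool):
    """Back-transform the log-linear fit into its multiplicative form."""
    return math.exp(b0) * sf ** b_living * math.exp(b_pool * has_pool)

print("Living_SF exponent:", round(b_living, 4))
print("Pool multiplier:", round(math.exp(b_pool), 4))
```

In this form the Living_SF coefficient acts as an exponent, while the exponentiated Pool coefficient acts as a percentage-style multiplier on the predicted price.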

Because a single multiple regression equation is used, the three continuous variables (Land_SF, Living_SF and Bldg_Age) have not been converted into categorical variables; they remain in their original form, though log-transformed for multiplicative use. Pool and Garage are the two binary variables, while Town is a categorical variable coded with linear values.
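The variable setup described above might be encoded along the following lines. This is only a sketch: the town names and their codes are hypothetical (the actual town values are not listed here), the function name `design_row` is invented, and leaving the Town code untransformed is an assumption. The log of Bldg_Age also requires an age greater than zero.

```python
import math

# Hypothetical linear codes for Town, invented for this sketch.
TOWN_CODE = {"Town A": 1, "Town B": 2, "Town C": 3}

def design_row(land_sf, living_sf, bldg_age, pool, garage, town):
    """Build one regression input row for the single-stage model."""
    return [
        1.0,                     # intercept
        math.log(land_sf),       # continuous, log-transformed
        math.log(living_sf),     # continuous, log-transformed
        math.log(bldg_age),      # continuous, log-transformed (age must be > 0)
        1.0 if pool else 0.0,    # binary
        1.0 if garage else 0.0,  # binary
        float(TOWN_CODE[town]),  # categorical with linear values (assumed raw)
    ]

row = design_row(7500, 1800, 25, True, False, "Town B")
print(row)
```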

- Sid Som, MBA, MIM
President, Homequant, Inc.
