Tuesday, August 14, 2018

AVM is a Market Solution, Comparable Sales Analysis isn't (1 of 2)

-- Intended for Start-up Analysts and Researchers --


The same sales population, drawn from a single Zip Code, has been used across all three graphs (you may use another fixed geography, such as a Census Tract or School District). Restricting the sales to one Zip helps minimize the impact of location (of course, you can never make location totally irrelevant, as each block has a different appeal).

The above graph shows the noisy relationship between the uncorrected (raw) Sale Price and Bldg Size (Heated Living Area). The reason is very simple: each sale reflects an individual buyer's judgment and is therefore highly subjective. For instance, buyers are paying between $100K and $250K for a 1,500 SF home. While investors would target the lower end of that range, informed buyers would land in the middle, and uninformed buyers (someone who is bent on buying a pink house!) would succumb to the higher end. The R-squared is therefore extremely low (0.189), so Bldg Size alone explains very little of the variation in sale prices.
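For readers who want to reproduce the R-squared behind a scatter like this one, here is a minimal Python sketch of a one-variable least-squares fit of Sale Price on Bldg SF. The function name and the toy numbers are made up for illustration; they are not the sales from the Zip Code used in this post.

```python
def simple_r_squared(x, y):
    """R-squared of the least-squares line y = a + b*x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # With a single predictor, R-squared equals the squared correlation.
    return (sxy ** 2) / (sxx * syy)


# Hypothetical sales: three very different prices for the same 1,500 SF size.
bldg_sf = [1500, 1500, 1500, 2000, 2500, 3000]
sale_price = [100_000, 175_000, 250_000, 210_000, 240_000, 330_000]
raw_r2 = simple_r_squared(bldg_sf, sale_price)
```

The wider the spread of prices at any one size, the lower this number gets, which is the point the raw scatter makes.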




The Regression Value-1 graph shows that even a rudimentary regression model (with only three independent variables - Land SF, Bldg SF and Bldg Age) is capable of producing a decent market solution. The fit is significantly tighter, especially at the long (larger-size) end of the curve. The R-squared jumps from 0.19 to 0.91, accounting for about 91% of the variation in sale prices. But this model has a clear bi-modal issue between 1,000 and 2,200 SF, where the regression values are forked.
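A three-variable model of this kind can be sketched in plain Python as an ordinary least-squares fit. This is only an illustration under my own assumptions: the helper names (solve, fit_ols, predict, r_squared) are hypothetical, and in practice you would use your statistical package of choice rather than hand-rolled normal equations.

```python
def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """rows: (land_sf, bldg_sf, bldg_age) tuples. Returns [intercept, b1, b2, b3]."""
    x = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Regression value for one home."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

Calling fit_ols on the sales rows and then predict on each row gives the regression values that would be scattered against Bldg SF.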

FYI – If you see such stacked values, you have to investigate the underlying reason(s). A simple way to identify the issue is to scatter the normalized regression values against each of the other independent variables and look for an explanation.




The above investigation guided me to the solution. When the normalized regression values from the first model were scattered against the Bldg Age variable (above graph), it was evident that many buyers were paying a premium for the younger homes, causing the stack. In fact, a sizeable portion of those buyers were willing to pay over $130/SF for the younger homes, while very few offered such a premium for the older ones. More precisely, none paid over $160/SF for the older stock.
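The same check can be done numerically. The sketch below assumes the normalization is regression value divided by Bldg SF (suggested by the $/SF figures above) and then compares the share of younger and older homes above a chosen $/SF level. The function names and the age cutoff are my assumptions for illustration; the post does not state the exact age break.

```python
def value_per_sf(values, bldg_sf):
    """Normalize regression values to dollars per square foot."""
    return [v / sf for v, sf in zip(values, bldg_sf)]


def share_above(per_sf, ages, level, age_cutoff):
    """Return (younger_share, older_share) of homes whose $/SF exceeds level.

    Homes with age <= age_cutoff count as younger; age_cutoff is an assumed
    parameter, not a figure taken from the data behind the graphs.
    """
    younger = [p for p, a in zip(per_sf, ages) if a <= age_cutoff]
    older = [p for p, a in zip(per_sf, ages) if a > age_cutoff]

    def frac(group):
        return sum(p > level for p in group) / len(group) if group else 0.0

    return frac(younger), frac(older)
```

A large gap between the two shares at, say, level=130 is the numeric counterpart of the stack seen in the graph.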



So the Bldg Age variable had to be transformed from continuous to binary (younger homes vs. the rest). Re-running the regression model with the transformed Bldg Age produced the above (Regression Value-2) graph. The value fork has disappeared, translating to a much tighter fit, with an improved R-squared, a lower intercept and a steeper slope approaching 45 degrees.
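The transformation itself is a one-liner. Here is a minimal sketch, assuming each sale is a (land_sf, bldg_sf, bldg_age) tuple and that "younger" means an age at or below some cutoff. Both the function name and the cutoff value are hypothetical; pick the break that the age scatter suggests for your own market.

```python
def add_young_flag(rows, age_cutoff):
    """Replace continuous Bldg Age with a binary flag: 1 = younger home, 0 = the rest."""
    return [
        (land_sf, bldg_sf, 1 if bldg_age <= age_cutoff else 0)
        for land_sf, bldg_sf, bldg_age in rows
    ]


# Hypothetical rows: (Land SF, Bldg SF, Bldg Age)
sales = [(6000, 1500, 5), (7000, 1800, 40), (5500, 1200, 12)]
model_rows = add_young_flag(sales, age_cutoff=15)
```

The flagged rows then go into the same regression in place of the original ones, so the model estimates a single premium for the younger homes instead of a per-year age effect.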

We will discuss the Comp Sales Analysis in a future post (2 of 2).

Good Luck!

Sid Som, MBA, MIM
President, Homequant, Inc.

