Thursday, June 28, 2018

How to Analyze and Present Large and Complex Home Sales Data – in 30 Minutes (2 of 2)

-- Intended for Start-up Analysts and Researchers --

In our prior post (1 of 2) we talked about analyzing and presenting a large and complex dataset in 30 minutes. Would you handle it differently if you had 60 minutes? Here is one approach you might like to consider:

1. Just because you are starting out, do not underestimate yourself. The very fact that you have been tasked with this critical presentation speaks volumes, so take full advantage of the visibility to set yourself apart from the competition. These meetings are often attended by other department heads and high-level client representatives, and a significant amount of time can be lost to unrelated (business) discussions. The best way to prepare for such contingencies is to split the presentation into a two-phase solution where phase-1 leads seamlessly to phase-2.

2. In a business environment, it's never a good idea to start with a complicated stat/econ model. Start a bit slow, but use your analytical acumen and presentation skills to gradually bring everyone onto the same page, thus retaining maximum control over the presentation (time and theme). Therefore, the phase-1 solution should be the same as the full 30-min solution we detailed before, including the sub-market analysis. Even if the meeting drifts into unrelated business chit-chat, off and on, you will still be able to squeeze in the phase-1 solution, thus offering at least a baseline solution. By contrast, if you bring one all-encompassing solution and the time runs out, you will end up offering virtually nothing.

3. Now that you have finished presenting phase-1 and established a meaningful baseline, you are ready to transition to the more advanced phase-2 solution. In other words, it's time to show off your modeling knowledge. In phase-1 you presented a baseline Champ-Challenger analysis (Champ=Median Sale Price, MoM; Challenger=Median SP/SF, MoM). You used the "Median" to avoid having to clean up the dataset for major outliers. Here is the caveat, though: individual sales are mostly judgment calls. For example, someone bent on buying a pink house would overpay, while an investor would underpay by luring a seller with a cash offer. In the middle (the middle 68% of the bell curve), the so-called informed buyers would use five comps, usually hand-picked by the salespeople, to value their subjects, which is not an exact science either.
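For readers who like to see the phase-1 baseline in code, here is a minimal sketch of the Champ-Challenger medians and their MoM change. The records, the `sales` layout and the function names are all made up for illustration; a real dataset would have many more sales per month.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sales records: (sale month, sale price, living area in SF).
sales = [
    ("2018-01", 300000, 1500),
    ("2018-01", 340000, 2000),
    ("2018-01", 900000, 2400),   # a major outlier; the median is unaffected
    ("2018-02", 310000, 1500),
    ("2018-02", 350000, 2000),
    ("2018-02", 365000, 1700),
]

def monthly_medians(records):
    """Return {month: (median sale price, median sale price per SF)}."""
    prices, ppsf = defaultdict(list), defaultdict(list)
    for month, price, sf in records:
        prices[month].append(price)
        ppsf[month].append(price / sf)
    return {m: (median(prices[m]), median(ppsf[m])) for m in sorted(prices)}

def mom_change(series):
    """Month-over-month percent change for a chronologically ordered list."""
    return [(b / a - 1) * 100 for a, b in zip(series, series[1:])]

medians = monthly_medians(sales)
champ = [v[0] for v in medians.values()]        # Median Sale Price
challenger = [v[1] for v in medians.values()]   # Median SP/SF

print([round(x, 2) for x in mom_change(champ)])
print([round(x, 2) for x in mom_change(challenger)])
```

Note how the 900,000 sale does not move the January median, which is the reason for using the "Median" without a prior outlier clean-up.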

4. Now, let's envision where you would be at this stage - 30 minutes on hand and brimming with confidence. But it's not enough time to try to develop and present a true multi-stage, multi-cycle AVM (see my recent post on 'How to Build A Better AVM'). So, settle for a straightforward Regression-based modeling solution, leaving time to add a few new slides to the original presentation. Build the model as one log equation with a limited number of variables (though covering all three major categories). Variables you might like to choose: Living Area, Age, Bldg Style, Grade, Condition and School/Assessing District. Avoid 2nd-tier variables (e.g., Garage SF, View, Site Elevation, etc.).
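As a rough illustration of "one log equation," the sketch below fits ln(price) on ln(Living Area), Age and a single Grade flag by ordinary least squares, using only the Python standard library. The ten sales, the choice of three variables and the helper names (`solve`, `fit_log_model`, `predict`) are assumptions made for this example; Bldg Style, Condition and District would enter the same way as additional 0/1 columns, and in practice you would likely use a stats package instead of hand-rolled algebra.

```python
import math

# Hypothetical sales: (sale price, living area SF, age in years, good-grade flag).
sales = [
    (285000, 1400, 45, 0), (310000, 1550, 30, 0), (365000, 1800, 25, 1),
    (342000, 1900, 50, 0), (420000, 2200, 15, 1), (398000, 2100, 20, 0),
    (455000, 2500, 10, 1), (372000, 2000, 35, 1), (298000, 1500, 55, 1),
    (480000, 2600, 5, 1),
]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_log_model(rows):
    """OLS on ln(price) via the normal equations (X'X) beta = X'y."""
    X = [[1.0, math.log(sf), float(age), float(grade)]
         for _, sf, age, grade in rows]
    y = [math.log(price) for price, _, _, _ in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty), X, y

beta, X, y = fit_log_model(sales)

def predict(sf, age, grade):
    """Convert the log-scale prediction back to dollars."""
    z = beta[0] + beta[1] * math.log(sf) + beta[2] * age + beta[3] * grade
    return math.exp(z)

print([round(b, 4) for b in beta])     # intercept, ln(SF), age, grade
print(round(predict(2000, 20, 1)))     # value of a hypothetical subject
```

Because the equation is in logs, the coefficients read as approximate percentage effects, which is also why the raw output tends to be hard for a business audience to relate to.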

5. Derive the time adjustment factors from phase-1 (it is a MoM series) and create the Time Adjusted Sale Price (ASP), the dependent variable in your Regression model. Explain this connection in your presentation so the audience (including your SVP/EVP boss) knows the two phases are not mutually exclusive; rather, one is the stepping stone to the other. At this point, you could face this question: "Why did you split it up into two?" Keep your answer short and truthful: "It's a time-based contingency plan."
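One simple way to turn the monthly medians into time adjustment factors is sketched below. It assumes the latest month is the valuation date and that each month's factor is the ratio of the latest median to that month's median; that convention, the sample prices and the `time_factors` name are illustrative assumptions, not the only way to do it.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sales: (sale month, sale price).
sales = [
    ("2018-01", 300000), ("2018-01", 340000), ("2018-01", 320000),
    ("2018-02", 330000), ("2018-02", 350000), ("2018-02", 310000),
    ("2018-03", 336000), ("2018-03", 360000), ("2018-03", 320000),
]

def time_factors(records):
    """Factor per month = latest month's median / that month's median."""
    by_month = defaultdict(list)
    for month, price in records:
        by_month[month].append(price)
    med = {m: median(p) for m, p in by_month.items()}
    base = med[max(med)]   # "YYYY-MM" strings sort chronologically
    return {m: base / v for m, v in med.items()}

factors = time_factors(sales)

# Time Adjusted Sale Price (ASP): every sale restated at the latest month's level.
asp = [round(price * factors[month]) for month, price in sales]

print({m: round(f, 4) for m, f in sorted(factors.items())})
print(asp)
```

Showing the factor table on a slide makes the link between the two phases concrete: the same medians presented in phase-1 produce the dependent variable of the phase-2 model.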

6. Keep the Regression output handy but do not insert it into the main presentation, as it is a log model (the audience may not be able to relate to the log parameter estimates). If the issue comes up, talk about the three important aspects of the model: a) variable selection (how you managed to represent all three categories), b) the most important variables as judged by the model (walk down the t-stats and p-values) and c) the overall accuracy of the model (R-squared, F-statistic, confidence, etc.).

7. Present model results in two simple steps. Value Step: ASP vs. Regression values. Show the entire percentile curve - 1st to 99th. Point out the smoothness of the Regression values vis-a-vis ASP. Even arm's-length sales tend to be somewhat irrational on both ends of the curve (<=5th and >=95th). The standard deviation of the Regression values would be much lower than ASP's. Ratio Step: Run stats on the Regression Ratio (Regression Value to ASP). It's easier to explain the Regression Ratios than the raw dollar values, so spend more time on the ratios.
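The two steps can be sketched as follows. The `asp` and `reg` lists are invented numbers standing in for your Time Adjusted Sale Prices and the matching Regression values, and the "inclusive" percentile method is just one reasonable choice.

```python
from statistics import median, pstdev, quantiles

# Hypothetical paired values for ten sales (a real file would have thousands).
asp = [210000, 255000, 280000, 300000, 315000,
       330000, 350000, 380000, 430000, 560000]
reg = [235000, 262000, 284000, 297000, 318000,
       327000, 345000, 371000, 410000, 480000]

def percentile_curve(values):
    """Return the 1st through 99th percentiles (99 cut points)."""
    return quantiles(values, n=100, method="inclusive")

# Value Step: compare the two percentile curves and their spread.
asp_curve = percentile_curve(asp)
reg_curve = percentile_curve(reg)
for p in (5, 25, 50, 75, 95):
    print(p, round(asp_curve[p - 1]), round(reg_curve[p - 1]))
print(round(pstdev(asp)), round(pstdev(reg)))

# Ratio Step: Regression Value to ASP, one ratio per sale.
ratios = [r / a for r, a in zip(reg, asp)]
print(round(median(ratios), 4), round(min(ratios), 4), round(max(ratios), 4))
```

In this toy data the Regression values are pulled toward the middle at both tails, which is the smoothing effect to point out on the percentile slide.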

8. Time permitting, run the above stats both ways - with and without outliers. Define outliers by the Regression Ratios. Keep it simple; for example, remove all ratios below the 5th and above the 95th percentile, or below 70 and above 143, etc. Then rerun Std Dev, COV, COD, etc. on the outlier-free output. These stats would be significantly better than the prior (with outliers) ones. Another common outlier question is: "Why no waterfront in your model?" The answer is simple: Generally, waterfront parcels comprise less than 5% of the population, so it is difficult to test whether the sold ones are representative. FYI - in an actual AVM, if sold waterfront parcels properly represent the waterfront population, waterfront could be tried in the model, as long as it clears the multicollinearity test.
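Here is a small sketch of the with/without-outliers comparison, using the fixed 70/143 cutoffs expressed as fractions (0.70 and 1.43, which are roughly reciprocals of each other). The ratios are invented, and the COD and COV formulas follow the usual ratio-study definitions: COD is the average absolute deviation from the median ratio as a percent of the median, and COV is the standard deviation as a percent of the mean.

```python
from statistics import mean, median, stdev

# Hypothetical Regression Ratios (Regression Value / ASP).
ratios = [0.52, 0.78, 0.91, 0.95, 0.97, 0.99,
          1.00, 1.02, 1.04, 1.08, 1.15, 1.62]

def cod(r):
    """Coefficient of dispersion, in percent, around the median ratio."""
    med = median(r)
    return 100 * mean([abs(x - med) for x in r]) / med

def cov(r):
    """Coefficient of variation, in percent, around the mean ratio."""
    return 100 * stdev(r) / mean(r)

def trim(r, low=0.70, high=1.43):
    """Keep only ratios inside the [low, high] band."""
    return [x for x in r if low <= x <= high]

kept = trim(ratios)

for label, r in (("with outliers", ratios), ("without outliers", kept)):
    print(label, len(r), round(stdev(r), 4), round(cov(r), 2), round(cod(r), 2))
```

Swapping the fixed band for the 5th/95th percentile rule only changes the `trim` step; the stats are computed the same way on whatever survives.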

9. Last but not least, be prepared to face an obvious question: "What is the point of developing this model?" Here is the answer: A sale price is more than a handful of top-line comps. It comprises an array of important variables like size, age, land and building characteristics, fixed and micro locations, etc., so only a multivariate model can do justice to sale price by properly capturing and representing all of these variables. The output from this Regression model is the statistically significant market replica of the sales population. Moreover, this model can be applied to the unsold population to generate very meaningful market values. Simply put, this Regression model is an econometric market solution. Granted, the unsold population could be comp'd, but that's a very time-consuming and subjective process.

Ace the next presentation. Be a hero. Prove to your bosses you are a future CEO.

Good Luck!

- Sid Som, MBA, MIM
President, Homequant, Inc.
homequant@gmail.com

