Friday, April 21, 2017

Testing the Representativeness of a Sales Sample

While a market model is developed using recent arm's-length sales, the real objective is to generate efficient values for the unsold population, which is generally 95% of the entire population from which a sales sample is derived. Therefore, it is critical to scientifically test the representativeness of the sample.

While most AVMs are built around Multiple Regression Analysis (MRA) and Time Series, an efficient AVM process should also include Sampling considering its inherent power and strong association with the MRA. A properly derived sample is almost as good as the population, so sampling is extremely useful when the population is large.

The power of sampling makes the population more manageable, helps optimize allocation of resources, and points more scientifically to the inefficient areas of modeling or work-flow.

Also, there is no such thing as a generic 5% sample. A 5% sample of a homogeneous (residential) population could be excessive, while it could be inadequate for a very heterogeneous population like commercial properties, ranging from multifamily to automotive to daycare to dining to entertainment to office to retail to industrial to warehouse, etc.

A sample, by definition, is representative of the population, but there is no such thing as a perfect sample. A perfect sample is the population. While testing the representativeness of the sample, selecting variables from the different strata of variables – quantitative variables, qualitative variables, general location variables, etc. – is of utmost importance.

The sales sample has to pass the “representative” test not only at the overall level, but also in a series of stratified tests, e.g., geography, size, age, price, exterior characteristics, etc. Once the sales sample is established, it must be split up between a modeling sample (70-80%) and a holdout sample (30-20%).

Since the model is tested on the holdout sample before being applied to the population, the holdout sample must have attributes very similar to those of the modeling sample. Of course, both must also represent the parent sales sample. If the sales sample is large, the split could be larger: modeling sample 65% and holdout sample 35%.

Moreover, the Percentile distribution (“Pctl”) curve – at least from the 25th Percentile to the 75th Percentile – is appropriate to test the representativeness of the quantitative variables, while the frequency percentile distribution is fitting for the qualitative and general location variables. Tests pertaining to the quantitative variables are more important than their qualitative counterparts.

The sample representative test for the quantitative variables must be performed in three sequential steps: (1) Median-based tests, (2) Body of the Curve (25th to 75th Percentile distribution) Tests, and (3) Expanded Curve (5th to 95th Percentile distribution) Tests.
Median-based Tests

Though the above sample is 6.90% of the population – admittedly, quite high for a homogeneous sample – it will be reduced later when it gets split up between modeling and holdout samples. Again, even the most representative samples will marginally differ from their parent populations in almost all categories. Therefore, in evaluating the representativeness of the quantitative variables, a 10% differential could be accepted as the rule of thumb. In Table-1 above, all of the variables, except Bldg Age, have passed the 10% test. Bldg Age – derived as Current Year–Year Built – is used in place of the Year Built in order to ease into the modeling, thus allowing direct introduction as an independent variable into the MRA equation. 

The median Bldg Age in the sales sample is 40% lower than the population’s, clearly pointing to the fact that the recent buyers preferred younger homes without sacrificing other attributes. When the Gross Bldg Area (GBA) variable is available in the database, it must also be evaluated in order to make the comparison more apples-to-apples, meaning the additional non-living improved areas must also be compared as they contribute to the overall value of the home. Of course, the Living Area variable must always be preferred as an independent variable to the GBA in the actual MRA. Some county databases may contain Heated Area instead of or in addition to the Living Area. Heated Area could be as good. While Bath (BA) counts are important comparison metrics, Bedroom (BR) count is not.
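As a rough illustration of the median-based test, the Python sketch below derives Bldg Age from Year Built and checks each variable against the 10% rule of thumb. The data, the `CURRENT_YEAR` constant and the helper names are hypothetical, chosen only so that Bldg Age fails by 40% while Living Area passes; they are not the Table-1 figures.

```python
from statistics import median

CURRENT_YEAR = 2017  # assumed valuation year for this illustration

def median_gap_pct(sample, population):
    """Percent by which the sample median differs from the population median."""
    pop_median = median(population)
    return (median(sample) - pop_median) * 100 / pop_median

def passes_median_test(sample, population, tolerance_pct=10.0):
    """Apply the 10% rule of thumb to one quantitative variable."""
    return abs(median_gap_pct(sample, population)) <= tolerance_pct

# Hypothetical data, not taken from the tables in this post.
population_year_built = [1960, 1972, 1980, 1987, 1995, 2001, 2008]
sample_year_built = [1985, 1992, 1999, 2010, 2014]
population_living_area = [1200, 1500, 1800, 2100, 2400]
sample_living_area = [1250, 1550, 1850, 2050]

# Bldg Age = Current Year - Year Built
population_age = [CURRENT_YEAR - y for y in population_year_built]
sample_age = [CURRENT_YEAR - y for y in sample_year_built]

for name, s, p in [("Bldg Age", sample_age, population_age),
                   ("Living Area", sample_living_area, population_living_area)]:
    print(name, round(median_gap_pct(s, p), 1), passes_median_test(s, p))
```

A variable that fails here is not discarded outright; it is simply flagged for the closer percentile-curve review.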

Given that only a limited number of truly meaningful quantitative variables are generally available, any variable showing a significant divergence – as Bldg Age does here – must be subjected to more in-depth examination; more often than not, it can then be saved and utilized. When the median shows a significant divergence, despite a reasonably large sample size, the entire percentile curve – from the 5th to the 95th – must be closely analyzed. The curve often points to a cluster in the short or long end (of the curve), forcing the sample median to be skewed and consequently diverge from the population median.

Chart-1 above shows a significant cluster at the short end (through the 25th) of the sample curve which is pulling the sample median down, although the long end of the curve (75th to 95th) shows meaningful convergence. This phenomenon often occurs in the marketplace when builders offer additional incentives (free upgrades, low interest financing, interest buy down, etc.) to push the excess inventory out. As the excess inventory gets absorbed, the local market returns to normalcy, so future sales samples mimic the population more closely. This tends to be a very short-term phenomenon, perhaps an aberration, and as such gets smoothed out over time. The Bldg Age variable here is therefore perfectly usable in actual modeling.

Of course, when one of the basic quantitative variables exhibits such short-term aberration, the Living Area variable must also be co-examined alongside, to avoid having to unintentionally dismiss or gloss over any emerging or ongoing structural shift in the market. A structural shift could be in the offing if both variables move in tandem, in clear divergence from the population. For example, due to the vibrant local economy driven by the IT industry, if the millennial population starts to invest in smaller but younger homes, both sample medians would contract, diverging from the population medians.

If more and more millennials continue to move into the area and follow suit, a structural shift (a new trend) in the market would occur. While such a shift would be quickly captured in the sales sample, it would take much longer to reflect in the population stats. Meanwhile, in order to make the sales sample usable in market modeling, it would require a lot more liquidity, e.g., instead of the most recent twelve months' worth of sales, eighteen to twenty-four months' worth of sales could be experimented with, thus significantly reducing or perhaps ironing out the recent aberration (assuming, of course, it’s statistically non-structural).

The Chart-2 above shows the two Living Area percentile curves are moving in tandem, without any divergence in short, mid or long ends, proving that no such structural change in the market has been taking place. Nonetheless, this test is critical when one important quantitative variable shows any misalignment, thereby instilling the necessary statistical confidence into the modeling spectrum.

Though many practitioners do not go past the median-based test, it is necessary but not sufficient. Models developed off of the median-based test samples are prone to hidden errors on both ends of the curve, as those ends are totally overlooked at the point of sampling. If one is forced to accept the median-based solution, one must additionally compare the relative Standard Deviations (SD) and Coefficients of Variation (COV). In real estate economics, COV is normalized by the median, rather than the mean, as the median reduces the incidence of outliers in the data series.
Body of the Curve (25th to 75th)

Even the body of the curve comparison (Table-4) shows very consistent (within 10%) results, except for the previously-identified Bldg Age variable. The only other out-of-range node happens to be the 75th Percentile of the Land Area, which is common in suburban counties.

In urban areas, the vast majority of inner city residential lots are zero-lot-line lots, while the remainders are generally smaller and are fairly homogeneous, thus compressing the deviation. While the GBA tends to become erratic on the outer end of the curve, it is nevertheless quite consistent here. In analyzing and comparing the body of the curve, Q-Range (75th – 25th) is a good measure of variability.
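The Q-Range can be sketched in a few lines of Python. The `percentile` helper uses linear interpolation between ranks, which is assumed here to behave like Excel's Percentile function; the Land Area values are hypothetical.

```python
def percentile(values, p):
    """Percentile with linear interpolation between the two nearest ranks."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def q_range(values):
    """Q-Range: 75th percentile minus 25th percentile."""
    return percentile(values, 75) - percentile(values, 25)

# Hypothetical Land Area values in SF.
population_land = [5000, 6000, 7000, 8000, 9000, 10000, 12000, 15000, 20000]
sample_land = [5200, 6100, 7100, 8000, 9100, 10200, 11900, 14800, 19000]

print(q_range(population_land), q_range(sample_land))
```

Two Q-Ranges of similar size suggest that the sample and the population have comparable variability across the body of the curve.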

The body of the curve methodology would be perfectly fine when the intended model is of lesser complexity. In other words, if the model does not demand ultra precision and accuracy, one could stop right at this step, to avoid having to drill down to the following more complex step.

Expanded Curve (5th to 95th)

When a highly accurate model is required, this expanded approach is recommended considering it depicts both ends of the curve as well. Predictably, the outer end (90th – 95th) of the Land Area is exponentially diverging. Therefore, if the Land Area variable becomes significant in the model, model values pertaining to those parcels (Land Area >=90th) must be subjected to additional tests and scrutiny. Conversely, the Bldg Age is fast converging on the outer end of the curve.
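A minimal sketch of the expanded-curve comparison follows. It measures the sample-to-population gap at each percentile node and lists the nodes that fall outside a 10% tolerance. The node list, the helper names and the Land Area data are assumptions made for the illustration, with only the largest sample lots inflated.

```python
def percentile(values, p):
    """Percentile with linear interpolation between the two nearest ranks."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

NODES = (5, 10, 25, 50, 75, 90, 95)  # assumed set of curve nodes

def curve_gaps(sample, population, nodes=NODES):
    """Percent gap between sample and population at each percentile node."""
    gaps = {}
    for p in nodes:
        pop_value = percentile(population, p)
        gaps[p] = (percentile(sample, p) - pop_value) * 100 / pop_value
    return gaps

def out_of_range(gaps, tolerance_pct=10.0):
    """Nodes whose gap exceeds the tolerance in either direction."""
    return [p for p, gap in gaps.items() if abs(gap) > tolerance_pct]

# Hypothetical Land Area values (SF): 5,000 to 25,000 in 1,000 SF steps.
population_land = list(range(5000, 25001, 1000))
sample_land = population_land[:-2] + [30000, 40000]  # two oversized lots

gaps = curve_gaps(sample_land, population_land)
print(out_of_range(gaps))
```

Parcels at or beyond a flagged node are the ones whose model values deserve the additional scrutiny.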

Since the COV is a normalized measure, it is a better measure of variability than the SD. For example, though the GBA’s SD (Table 5) is significantly higher than Living Area’s, the two COVs are very close to each other. Similarly, despite the Bldg Age’s lower SD (Table 5) than the Population’s (Table 6), the COVs are transposed.
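To see why a normalized measure travels better than the SD, consider the sketch below. It computes the COV as the standard deviation divided by the median, in percent. Using the sample standard deviation is an assumption, and the two data series are hypothetical: the GBA series is simply the Living Area series scaled up by 20%.

```python
from statistics import median, stdev

def cov_by_median(values):
    """COV in percent: sample standard deviation divided by the median."""
    return stdev(values) * 100 / median(values)

# Hypothetical data; every GBA value is 20% above the matching Living Area.
living_area = [1000, 1500, 2000, 2500, 3000]
gba = [1200, 1800, 2400, 3000, 3600]

print(round(stdev(living_area)), round(stdev(gba)))
print(round(cov_by_median(living_area), 2), round(cov_by_median(gba), 2))
```

The SDs differ because the scales differ, yet the COVs coincide, so the COV is the fairer yardstick when variables of different magnitude are compared.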

Testing Qualitative Variables

In both instances (Tables 8-9) the sample adequately represents the population, proving that the qualitative variables are statistically well-aligned from the modeling point of view. Since this dataset has been extracted from a Southern coastal state, the Waterfront variable is obviously meaningful. Alternatively, for a NE state, the Style variable would make sense in view of the diversity of home styles there.     

Table-10 demonstrates that the homebuyers had an inverse love affair with the two most liquid towns, i.e., TOWN 7 and TOWN 8, albeit both towns have successfully met the 10% threshold. Homebuyers were, however, steadier in approaching TOWN 5, the third most liquid town on the lineup. In any event, the town-wise distribution amply confirms the representativeness of the market sample as well.
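For qualitative and location variables, the comparison is between frequency shares rather than percentiles. The sketch below is one way to compute it. The town counts are hypothetical, and reading the 10% threshold as a gap in percentage points is an assumption, not something the tables settle.

```python
from collections import Counter

def share_pct(labels):
    """Percent share of each category."""
    counts = Counter(labels)
    return {label: 100 * n / len(labels) for label, n in counts.items()}

def share_gaps(sample, population):
    """Sample share minus population share, in percentage points."""
    pop, sam = share_pct(population), share_pct(sample)
    return {label: sam.get(label, 0.0) - pop[label] for label in pop}

# Hypothetical town assignments, not the Table-10 counts.
population_towns = ["TOWN 5"] * 300 + ["TOWN 7"] * 400 + ["TOWN 8"] * 300
sample_towns = ["TOWN 5"] * 30 + ["TOWN 7"] * 44 + ["TOWN 8"] * 26

gaps = share_gaps(sample_towns, population_towns)
print(gaps)
print(all(abs(g) <= 10 for g in gaps.values()))
```

In this made-up case one town is over-represented and another under-represented by the same margin, while the third matches the population exactly.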

Holdout Sample
Now that the primary sales sample has been established, it needs to be split up into two parts: Modeling Sample and Holdout Sample. The latter verifies the accuracy and consistency of the model before it is applied to the population. When the final model, developed off of a modeling sample, shows a COV of 12, the holdout application should show a very similar COV (e.g., between 11.50 and 12.50), after removal of the outliers involving the same range. If the holdout application produces a COV of 14.00, the model must be re-examined. This interim step often helps identify the areas of model failures and weaknesses, and the model is frequently recalibrated, refined or fine-tuned based on the holdout results. Absent this step, the direct model application to the population would be unscientific, or at least statistically untested.
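The holdout check can be expressed as a simple tolerance test. In the sketch below, the COV is assumed to be computed on model-value-to-sale-price ratios and normalized by the median; the half-point tolerance mirrors the 11.50 to 12.50 example, and all names and ratios are hypothetical.

```python
from statistics import median, stdev

def cov_by_median(ratios):
    """COV in percent: sample standard deviation divided by the median."""
    return stdev(ratios) * 100 / median(ratios)

def holdout_consistent(model_cov, holdout_cov, tolerance=0.5):
    """True when the holdout COV stays within the tolerance of the model COV."""
    return abs(holdout_cov - model_cov) <= tolerance

# COV values from the example in the text.
print(holdout_consistent(12.0, 12.4))  # within 11.50 to 12.50
print(holdout_consistent(12.0, 14.0))  # outside the band, re-examine the model

# Hypothetical ratios (model value / sale price) for a tiny holdout set.
holdout_ratios = [0.9, 1.0, 1.1]
print(round(cov_by_median(holdout_ratios), 2))
```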

Again, while the primary sales sample is tested against the population as a whole, the modeling and hold-out sub-samples are tested against the primary sales sample they are derived from. Needless to say, both sub-samples must retain the attributes of the primary. Unlike the extensive, multi-tier tests the primary sales sample is subjected to, the median-based tests would suffice for the sub-samples, considering this is a review step rather than a modeling step.

Table-11 shows how the primary sales sample has been split up into modeling (70%) and holdout (30%) sub-samples while retaining the original statistical properties of all of the seven analysis variables. Post split, the modeling sample has been reduced to 4.83% (6,632/137,012). Depending on the size of the sales sample, the modeling and holdout splits could be adjusted. In this case, the sample was large enough to justify a 70%-30% split. Had the sample been smaller, a 75%-25% (even 80%-20%) split would have made sense. Since the model would be developed off of the modeling sample, its liquidity is more important.
Splitting up a Sample in Excel

  1. In splitting up the sales sample, Excel’s Random Number Generator was used.
  2. A new column B was created to store the resulting random numbers. Note: In order to access the Random Number Generator, Excel’s Data Analysis module needs to be activated.
  3. The sales sample was then sorted by the random number column.
  4. The top 30% of the sales sample contributed to the holdout sample, while the bottom 70% was retained as the modeling sample. Note: It’s a manual step. No other manipulation is needed to generate the sub-samples.
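For readers working outside Excel, the same four steps can be sketched in Python: attach a random number to every sale, sort by it, and cut at 30%. The function name, the fixed seed and the stand-in records are assumptions made for reproducibility of the illustration.

```python
import random

def split_sample(records, holdout_share=0.30, seed=42):
    """Random split: top share of the shuffled sales goes to the holdout."""
    rng = random.Random(seed)                  # fixed seed keeps the split repeatable
    keyed = [(rng.random(), rec) for rec in records]   # the random number column
    keyed.sort(key=lambda pair: pair[0])       # sort by the random numbers
    cut = round(len(keyed) * holdout_share)
    holdout = [rec for _, rec in keyed[:cut]]  # top 30%
    modeling = [rec for _, rec in keyed[cut:]] # bottom 70%
    return modeling, holdout

records = list(range(1000))                    # stand-in for 1,000 sale records
modeling, holdout = split_sample(records)
print(len(modeling), len(holdout))
```

Both sub-samples should then be run through the median-based tests against the primary sales sample before any modeling begins.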

Excel’s Median and Percentile (statistical) functions were utilized to derive median and percentile distributions of the quantitative variables.

Excel’s Frequency function was utilized to derive the frequency distributions of the qualitative variables.

Reprinted from the forthcoming book 'A Fully Illustrated Guide to Automated Valuation Modeling (AVM). 100% Excel-based (No need to learn a complicated Stat package)'

Friday, April 14, 2017

How would One Analyze a Group of Residential Sales? Town Analyst Explains

The above graphic shows how to time-adjust all Orlando sales at 6% annual growth (0.5% X 12) to a target date of 04-30-2017.
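The time adjustment itself is simple arithmetic. The sketch below assumes straight-line growth of 0.5% per month (so twelve months add up to 6%, with no compounding) and counts whole calendar months between the sale month and the target month. Both choices, along with the function names and the sample sale, are assumptions rather than the portal's documented method.

```python
from datetime import date

MONTHLY_GROWTH = 0.005  # 0.5% per month, i.e. 6% a year without compounding

def months_between(sale_date, target_date):
    """Whole calendar months from the sale month to the target month."""
    return (target_date.year - sale_date.year) * 12 + (target_date.month - sale_date.month)

def time_adjusted_price(price, sale_date, target_date, monthly_growth=MONTHLY_GROWTH):
    """Sale price projected to the target date at a flat monthly growth rate."""
    return price * (1 + monthly_growth * months_between(sale_date, target_date))

target = date(2017, 4, 30)
adjusted = time_adjusted_price(200_000, date(2016, 10, 15), target)  # hypothetical sale
print(round(adjusted))
```

A sale from six months before the target date gains 3%, while a sale from the target month itself is left unchanged.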

This graphic shows the results of the prior sales query. The median time-adjusted sale price in Orlando is $222,600, while the county's median market value is $166,307, resulting in a sales ratio (County MV/Adjusted SP) of 0.747.
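The ratio quoted above can be reproduced directly from the two medians (the variable names are illustrative):

```python
county_median_market_value = 166_307
median_adjusted_sale_price = 222_600

# Sales ratio = County MV / Adjusted SP
sales_ratio = county_median_market_value / median_adjusted_sale_price
print(round(sales_ratio, 3))  # 0.747
```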

This analysis is extracted from Homequants TownAnalyst portal. Try analyzing your Town or County's sales and assessment. It's all free and no registration is required.

Thursday, April 13, 2017

How would Homeowners Relate to Over and Under Assessment, Spatially. Town Analyst Explains

As indicated in the prior Blog post, since the Median Countywide Sales Ratio points to a ratio of .77, all Towns in this County must be valued close to this ratio so that the Assessment Roll becomes fair and equitable.

While one would expect to see all Blue and Green balloons (legends on top right) on the spatial chart above, many Red and Yellow ones - even side by side - exist, indicating the presence of some seemingly serious under and over-assessed parcels. This scenario is quite normal as - more often than not - similar properties in the same neighborhood do not necessarily sell for the same price, which, in turn, introduces (visual) conflicts amongst individual ratios.

That is why Town-wise statistical summaries are always better front-line indicators. Of course, if an arms-length subject sale significantly deviates from its Town's, it may require a re-examination.    

Obviously, sales must be individually validated and the outliers removed from the sales universe. Better yet, those sales should then be modeled and the resulting model-defined outliers removed as well, thus paving the way for a more scientific universe of ratio-eligible sales.

Please visit the Town Analyst site to analyze if your Town is fairly assessed relative to the County. It's completely free and requires no registration whatsoever.

Disclaimer: This analysis is strictly illustrative. Any commercial or legal use of it is totally prohibited. Always consult a Tax Attorney on statutory requirements.    

How would Homeowners Know if their Town is Over-assessed. Town Analyst Explains

Figures: County Sales Ratio; Over-assessed Town
Sales Ratio (County Market Value to Adjusted Sales) is a better indicator of the proper assessment level than the Assessment Ratio (County Assessed Value to Adjusted Sales), as the latter often includes Town-wise special assessment, abatement, exemption, etc. 

All sales therefore must be adjusted to the taxable status date so the two values are comparable and the resulting ratios are statistically significant. 

In this example, since the Countywide Median Sales Ratio is .77, all Towns in that County must be valued close to this ratio so that the Assessment Roll becomes fair and equitable. Better yet, compare the 25th to 75th percentile of the ratio curve. If the sales are individually validated for ratio eligibility (ours are not), a much wider range – say, 5th to 95th - could be considered.

This Town however shows significantly higher Sales Ratios across the 25th-to-75th percentile curve, making it an over-assessed Town in that County.
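The town-versus-county comparison can be sketched as follows. The ratios are hypothetical, the percentile helper uses linear interpolation, and the check only tests direction (town above county at every node); how large a gap counts as "significant" is left to the analyst.

```python
def percentile(values, p):
    """Percentile with linear interpolation between the two nearest ranks."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def ratio_profile(ratios, nodes=(25, 50, 75)):
    """Sales ratios at the chosen percentile nodes."""
    return {p: percentile(ratios, p) for p in nodes}

def looks_over_assessed(town_ratios, county_ratios, nodes=(25, 50, 75)):
    """True when the town's ratio exceeds the county's at every node."""
    town = ratio_profile(town_ratios, nodes)
    county = ratio_profile(county_ratios, nodes)
    return all(town[p] > county[p] for p in nodes)

# Hypothetical sales ratios (County MV / Adjusted SP).
county_ratios = [0.60, 0.70, 0.75, 0.77, 0.80, 0.85, 0.95]
town_ratios = [0.80, 0.85, 0.88, 0.90, 0.93, 0.97, 1.05]

print(ratio_profile(county_ratios))
print(ratio_profile(town_ratios))
print(looks_over_assessed(town_ratios, county_ratios))
```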

While this is the Town-wise summary of ratios, in a separate Blog post we will further drill down to the areas in the Town - spatially - where the incidence of over-assessment could be more severe than their counterparts. 

Please visit the Town Analyst site to analyze your County and Town. It's completely free and requires no registration whatsoever.

Disclaimer: This analysis is strictly illustrative. Any commercial or legal use of it is totally prohibited. Always consult a Tax Attorney on statutory requirements.     

Tuesday, April 11, 2017

Homequant, TownAnalyst & LocValu Disrupting Home Valuation & Assessment

Why Property Tax Appeals Consultants MUST Use AVM Values - Upfront - to Validate Values on Tax Roll

Here are the reasons why Property Tax Appeals Consultants must use AVM values - upfront - to scrutinize the Tax Roll:
1. Comparing independently developed Automated Valuation Model (AVM) Values to County Market Values (CMV) will point to the areas of failure, meaning over- and under-valued (and hence over- and under-assessed) properties on the Roll. Often, the higher value properties are under-assessed while the lower value properties are over-assessed. If the comparison of "AVM to CMV" points in that direction, the Property Tax Appeals Consultants ("Consultants") must work up a small sample - using comps - to further authenticate the discovery. If the comps sample validates the discovery, Consultants must pay special attention to that over-valued (assessed) segment of the population.
2. In choosing AVM Vendors, Consultants must make sure that those AVM Values are developed specifically for the Tax Status Date. If the Tax Status Date is 01-01-2017 but the AVM Values were developed in June 2016, those values would produce a flawed picture when connecting to the County Values. It is therefore advisable to work with AVM Vendors that develop specialized models for the Appeals industry, per se.
3. Many AVM Vendors also sell Comps Reports. However, the Appeals Consultants must be careful in working with the specialized AVM Vendors that additionally tie their AVMs to the Comps production. In other words, the specialized AVM Vendors who use the model coefficients to adjust their comps via the Comps Adjustment Matrix do not necessarily produce the most optimal comp reports as AVM (top down) and comps reports (bottom up) are diametrically opposite solutions. If a Consultant is looking for a long-term AVM vendor, this is a question always worth asking, meaning if they tie their comps (reports) to the model coefficients.
4. In the course of the due diligence, Consultants may ask for a sample Adjustment Matrix for the comps production. The sample itself could say a lot about the quality of their valuation process. For example, if the Comps Adjustment Matrix shows a 'Lot SF' coefficient of .10 (10 cents per Lot SF, assuming it's transferred from the regression model producing the AVM model values), just dump them and move on (Hint: it's not multi-collinearity). It would be a clear indication that they are working with totally unqualified "make-shift" modelers. While no AVM Vendor would be forthcoming to show their AVM models, they might share a sample Comps Adjustment Matrix. It could be telling!

FYI - many such Consultants use our free Homequant site to work up samples. Our site, unlike other free valuation sites, allows users to arrive at their own value conclusions. Homequant is a true valuation site with advanced features like specific comps selection and adjustment mechanism, distance matrix, time adjustment, flexible valuation date, multiple ranking methods, interactive spatial interface, comps grid, value analysis, to name a few. In addition, no log-in or registration is required.

To learn more about AVM, you may read our trend-setting books on AVM (search 'Sid Som's Books' on Amazon).

Sid Som
President, Homequant, Inc. 

Monday, April 10, 2017

The concept of "User-defined Subjects" helps Valuation Community at large

Often, potential entrepreneurs fail to make inroads into the Valuation business due to the high cost of the unsold population data, which comprises roughly 95% of the entire population. But that's changing - and forever!
1. Because of Homequant's discovery of the "User-defined Subject", those entrepreneurs can now jump into the business, knowing full well that the sold data alone, generally available free of cost or for a token price directly from the taxing jurisdictions, would do the trick, without having to invest a fortune in acquiring, maintaining and warehousing the unsold data which, in terms of quality, is questionable at best to begin with.
2. Additionally, unlike an institutional client, an average home buyer – in Free Home Valuation industry where Homequant is positioned – is generally interested in valuing a handful of subject properties so researching and capturing such data tends to further motivate our users. Thus, the user-defined subject data - by design - are cleaner and more reliable.

Valuation entrepreneurs can consider the following sales-based opportunities:
a) Sales validation - validated sales can be sold (back) to the taxing jurisdictions, data vendors, appraisal companies, AVM houses, appeals consultants, mortgage companies, etc.
b) Sales statistics - generic and custom sales statistics can be marketed to many private houses - from appraisal houses to brokerage networks to banks to news agencies to statutory review boards to courts, etc. 

The President of Homequant recently explained their invention: “There are roughly 90 million single family homes in the US and, on average, 5% of that universe annually sells. By inventing the concept of the defined or simulated subject, we are able to value those 95% unsold properties by storing only the 5% sold data. The home valuation industry will soon recognize the significance of our invention.”

To read the entire article, follow this link:
Why is the most accurate free home valuation system today!

Homebuyers MUST Validate Home Values at a Self-directed Site like

Instead of blindly accepting some model-driven estimates from online brokerage sites (we all know those estimates are not explainable) or totally relying on salespeople's comps, prospective homebuyers MUST do their own validations at Self-directed Sites. A free Self-directed Site like will allow users to go through the valuation process in a step-by-step manner via the following FOUR steps:

1. Defining the Subject:  Subject property data from Public Records is often unreliable. As a homeowner, you know your home better than any such records. In Self-directed Sites, you can define/enter your own home data, without being forced to accept the data from Public Records. Here is an example of the basic subject data that you would be entering into a Self-directed system...

2. Selecting Comps:  A list of sales - by default - does not become comps; nor do some cute model-derived estimates provide true home values. Comps - however close they are to the subject - require proper selection and adjustments. In a Self-directed Site you will get to enter your own comp selection and adjustment criteria. Here is an example...

3. Ensuring Proper Location:  The concept of Spatial comps (picking comps right off the map) is always a better way as it helps avoid picking comps from the "other side of the freeway" which could be a pricier economic neighborhood, thus inflating the value of your subject. Here is an example...

4. Understanding Final Value:  A statistically significant range, say 25th to 75th percentile, is a more meaningful way to understand comps-based subject valuation than one unique parameter.  Here is an example...

You may try out your subjects at the Homequant site. It is absolutely free and requires no registration whatsoever...

Homebuyers MUST Verify Appraiser's or Assessor's Values in Four Easy Steps

Step 1 - Verify Subject Info

Subject Info is often inaccurate in Public Records, so verify it. For example, our subject property - a single family home - is located in N. Las Vegas, has 1,400 SF of building area, is 30 years old and sits on a 7,000 SF lot.

Step 2 - Verify Comps Criteria

Sales must be properly selected and adjusted to become comps. In this example, sales within one mile radius and 20% of the Subject's physical attributes (Land, Bldg and Age) are considered to be comps. Also, all comps are adjusted for time (sale dates) at 1% growth per month and are projected out to 12-31-2014 (a tentative purchase date). Therefore, the Subject will be valued as of that date.

To converge all Comps to the Subject's physical attributes, they are adjusted at $10 per Land SF, $100 per Bldg SF and $1,000 per Year of Depreciation. If the subject is located in a pricy neighborhood, increase the adjustments as appropriate.
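Put together, the selection band and the adjustment rates above might look like the sketch below. It is only an illustration: the one-mile distance filter is omitted, the time adjustment is applied before the attribute adjustments, and the sign convention (a comp that is larger or younger than the subject is adjusted downward) is an assumption. The sample sales are hypothetical.

```python
SUBJECT = {"land_sf": 7000, "bldg_sf": 1400, "age": 30}  # subject from Step 1
RATES = {"land_sf": 10, "bldg_sf": 100, "age": 1000}     # dollars per unit, from the text
MONTHLY_GROWTH = 0.01                                     # 1% per month

def is_comp(sale, subject=SUBJECT, band=0.20):
    """True when Land, Bldg and Age are each within 20% of the subject's."""
    return all(abs(sale[k] - subject[k]) <= band * subject[k] for k in subject)

def adjusted_sale_price(sale, months_to_target, subject=SUBJECT):
    """Time-adjust the sale, then converge its attributes to the subject's."""
    price = sale["price"] * (1 + MONTHLY_GROWTH * months_to_target)
    price += (subject["land_sf"] - sale["land_sf"]) * RATES["land_sf"]
    price += (subject["bldg_sf"] - sale["bldg_sf"]) * RATES["bldg_sf"]
    # Assumed sign: a comp older than the subject is adjusted upward.
    price += (sale["age"] - subject["age"]) * RATES["age"]
    return price

# Hypothetical sales.
sale_a = {"price": 150_000, "land_sf": 7500, "bldg_sf": 1500, "age": 28}
sale_b = {"price": 210_000, "land_sf": 7000, "bldg_sf": 2000, "age": 30}

print(is_comp(sale_a), is_comp(sale_b))
print(round(adjusted_sale_price(sale_a, months_to_target=6)))
```

Here the first sale qualifies as a comp, while the second is rejected because its building area is more than 20% above the subject's.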

Step 3 - Verify Final Comps

From a pool of ten Comps - based on the selection criteria set forth - the five most recent ones are chosen. Closest distance and least adjustment are the two other selection methods. Before finalizing, always take a look at them spatially (on the map), as comps from the "other side" of a major artery/freeway could be inappropriate. Here are the comps spatially:

Step 4 - Verify Valuation

The final five are producing the subject valuation. While the Median Adjusted Sale Price (ASP) is the most probable value for this subject, 25th to 75th Percentile range is the most probable range. Alternatively, an investor may consider up to the 25th Percentile, while someone bent on outbidding the competition to acquire the property could start above the 75th Percentile. It is therefore quite relative. 
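As a small worked example, the value parameters for a final five can be computed as below. The Adjusted Sale Prices are hypothetical and the percentile helper uses linear interpolation.

```python
from statistics import median

def percentile(values, p):
    """Percentile with linear interpolation between the two nearest ranks."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

# Hypothetical Adjusted Sale Prices (ASP) of the final five comps.
final_five_asp = [138_000, 140_000, 142_000, 145_000, 150_000]

most_probable_value = median(final_five_asp)
probable_range = (percentile(final_five_asp, 25), percentile(final_five_asp, 75))

print(most_probable_value, probable_range)
```

An investor would look at or below the lower bound of that range, while a buyer determined to outbid the competition would start above the upper bound.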

Now, try out your own subjects at our Homequant site. It is absolutely free and requires no registration whatsoever...

Homebuyers MUST Self-select Five Best Comps to Value a Subject

In this example, the subject (a single family home) is located in Orlando, with the following attributes: 15 years old, comprising 1,500 SF of bldg area on a 6,000 SF lot.

Users need to create a pool of up to ten best comparable sales ("comps") within 1 mile radius of the town, but constraining all three variables - Land, Bldg and Age - to 20% range and time adjusting all incoming sales at 6% annual (.5% per month) growth:

Once the pool is created by the scoring algorithm using the criteria set forth above, users need to select Five best comps (distance radius method, meaning closest distance, has been used to evaluate the pool leading to the final five):

Let's take a Spatial look into the comps pool as well as the final five (green ones):

Now, let's take a look at the Comps Grid to understand how to interpret the final value. While the most Probable Value is the Median Adjusted Sale Price (ASP), the 25th to 75th Percentile Range is the most Probable Value Range for a potential home buyer, although an aggressive investor might consider a lower value range, say up to the 25th Percentile, while a buyer bent on outbidding the competition might consider a more lax value range, say above the 75th Percentile.

Visit the Homequant site to learn how to value a subject using optimal selection criteria and adjustments, zeroing in on the final five, as well as the value parameters.

Three Similar Sales from Three Prior Quarters - Unadjusted for Time of Sale - are NOT Comps

Three Sales from three prior quarters - unadjusted for time - are not comparable sales ("comps"). They must be adjusted for the time of sale. 

The following snapshot shows how to time-adjust comps at 6% GROWTH annually (.5% per mo) to arrive at the subject value as of 04-30-2017, so that the older comps gain more in time value than the newer ones...

Visit the Homequant site to learn how to use time adjusted comps to value subject properties:

Homebuyers MUST Choose Least Adjustment in Comps Selection over Distance or Sales Recency

Of the three methods to evaluate comparable sales ("comps") - distance radius, sales recency and least adjustment - least adjustment is the most powerful method. 

Since most comps are pooled from a limited distance within the same neighborhood and older sales are generally time adjusted, distance and sales recency become less powerful methods than least adjustment which, in addition to sales time adjustment, incorporates adjustments for property features as well. 

1. Least Adjustment

Note - while determining least adjustment, signs are ignored. Therefore, +3,000 and -3,000 are tied. In the table above, +2,940 is smaller in absolute value than -8,510, so it is considered the best comp here, requiring the least adjustment.
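Ranking by least adjustment amounts to sorting on the absolute value of the net adjustment. In the sketch below, the +2,940 and -8,510 figures come from the note above; the third comp and the comp labels are hypothetical.

```python
comps = [
    {"id": "A", "net_adjustment": -8510},
    {"id": "B", "net_adjustment": 2940},
    {"id": "C", "net_adjustment": 5200},   # hypothetical third comp
]

# Signs are ignored: rank on the size of the adjustment only.
ranked = sorted(comps, key=lambda c: abs(c["net_adjustment"]))
best = ranked[0]

print([c["id"] for c in ranked])
print(best["id"], best["net_adjustment"])
```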

2. Distance Radius

3. Sales Recency

Visit the Homequant site to learn how to use these methods to value subject properties using comp sales:

Sunday, April 9, 2017

Homebuyers MUST Learn to Differentiate between Sales and Comparable Sales

A list of sales - by default - does not become comparable sales ("comps"). Sales - even from the same neighborhood - must be quantitatively adjusted for characteristics and time to become comps. Once adjusted, the differences in property characteristics, distance and time (1/2016 and 1/2017 sales are not the same) become irrelevant. 

So, ask your Broker to show how the comps have been adjusted. Here is a snapshot of the adjustment process:

Visit the Homequant site and use your comps selection and adjustment to value your own subjects: