Saturday, May 18, 2019

Why Valuation Modeling Error Rates across Property Tax Rolls are not Comparable

Analysis of the Assessment Rolls of major Jurisdictions requires advanced technical training and quantitative knowledge. It's hilarious when a local staff reporter settles the score annually with a lengthy but superficial article, and the politicians run with it, silencing the unhappy taxpayers. And the cycle continues, year in and year out.

A recent local newspaper article indicated that the percent error rates "of the five largest cities for which studies have been completed in the last two years...include New York at 17.6 percent, Chicago at 25.1 percent and Philadelphia at 20.2 percent. Houston’s error rate was 7 percent in its most recent study, and Phoenix’s was 8.1 percent."

Though Automated Valuation Modeling ("AVM") was used to develop all of the above Assessment Rolls ("Roll"), the modeling error rates cited above (generally measured by the Coefficient of Dispersion, or "COD", of the underlying AVM) are not comparable.
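
Since every comparison in this post turns on the COD, here is a minimal Python sketch of the usual definition, following the IAAO Standard on Ratio Studies. Each sale's ratio is its Roll value divided by its validated sale price. The COD is the average absolute deviation of those ratios from their median, expressed as a percentage of the median. The function names and the five sales are hypothetical, not any Jurisdiction's actual code or data.

```python
from statistics import median


def sales_ratios(roll_values, sale_prices):
    """Assessment-to-sale ratio (Roll value / sale price) for each sale."""
    return [v / p for v, p in zip(roll_values, sale_prices)]


def cod(ratios):
    """Coefficient of Dispersion: average absolute deviation from the median
    ratio, expressed as a percentage of that median (lower = more uniform)."""
    med = median(ratios)
    avg_abs_dev = sum(abs(r - med) for r in ratios) / len(ratios)
    return 100.0 * avg_abs_dev / med


# Hypothetical Roll values for five sales that each sold for $300,000.
ratios = sales_ratios([270_000, 285_000, 300_000, 315_000, 330_000],
                      [300_000] * 5)
print(round(cod(ratios), 2))  # 6.0
```

Under this definition, a COD of 15 means that, on average, a sale's ratio deviates from the median ratio by 15 percent of that median.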

While there are general AVM guidelines, they are nothing like a standardized test such as the SAT or GRE. In fact, the development of AVMs is highly subjective, depending largely on the acumen of the in-house modeler(s) or the hired consultant. Since the actual models are not published, any external, after-the-fact re-validation of those model CODs is even more subjective and circular.

So, why do I say the above CODs are not comparable? Here are the reasons:

1. Sales Validation -- All market AVMs are developed from recent arm's-length sales. Thus, all sales have to be validated, and then a random or stratified random sample of the arm's-length sales serves as the modeling sample. Of course, there is no hard science behind the sales validation process. So, if Jurisdiction X treats all of its borderline cases as arm's-length, while Jurisdiction Y aggressively removes them from an identical universe, the resulting AVM of the former will, ceteris paribus, produce a higher COD than the latter's. Unfortunately, when the local reporter compares the competing CODs, s/he will have no idea how the respective jurisdictions validated their sales.

2. Sales Sampling -- From the universe of validated arm's-length sales, a sample properly representing the overall population is then derived. In fact, the sales sample must statistically "represent" the population. Otherwise, the resulting AVM will be invalid, paving the way for a flawed Assessment Roll (statutorily, an Assessment Roll must be fair and equitable). Again, there is no hard and fast rule for extracting the sales sample. If Jurisdiction X applies the representativeness test across the 1st-to-99th percentile range while Jurisdiction Y takes the more lax approach of the 5th-to-95th percentile range, the AVM of X will, ceteris paribus, have a higher COD than Y's. Does the local reporter even know of this requirement, let alone perform the test?

3. Removal of Outliers -- As part of the model optimization, a set of outliers is systematically identified and removed. While there are various methods to identify and remove outliers, the (sales) ratio percentile range is a common one. Of course, some modelers use a very conservative range or approach, while others (those obsessed with better stats, i.e., lower CODs) are more aggressive. Ceteris paribus, the modeler who conservatively removes only the outliers below the 1st and above the 99th percentile will report a much higher model COD than someone who aggressively removes everything below the 5th and above the 95th percentile. Case in point: Chicago's 25.1 vs. Houston's 7. Think about the reporter who would try to justify either. Perhaps some reporters already have.

4. Sub-market Modeling -- Many modelers and consultants build their AVMs bottom-up instead of the customary top-down. Let me explain what I mean by bottom-up modeling. Let's say that the Assessment is for a County as a whole, and the County comprises five Towns. If the modeling takes place at the Town level (bottom-up) instead of at the customary County level (top-down), the average COD would be lower than that of the customary top-down model, even though the objective remains unchanged: to produce a fair and equitable County-wide Roll. The problem with this type of bottom-up modeling is that there will be significant noise along the Town lines, generating a significant number of inconsistent values. Will the local reporter ever know any of this, considering that the models are rarely made public?

5. Spatial Tests -- Irrespective of #4 above, Town-wise results are rarely published. Again, while the County-wide COD could be compliant, the Town-wise CODs could be far apart. If Town-1 is highly urban (requiring complex modeling, hence a higher COD) whereas Town-5 is highly suburban (easier modeling, hence a much lower COD), the CODs are expected to be quite different. Of course, the modeling criteria (sales sampling, outliers, etc.) must remain uniform across all Towns. Absent publication of the actual models, taxpayer advocacy groups must, at least, insist on CODs by major sub-market (e.g., Town), in addition to the system-wide COD. They must also insist on knowing whether the modeling criteria were uniform across all major sub-markets. Do you think the local reporter will know how the modeling took place?

6. Equity Analysis -- A system-wide COD is just the beginning. It does not confirm that the Roll is fair and equitable. Let's assume that the reported COD is 15, which is compliant, a priori. Now, let's also assume that the unreported Town-wise average sales ratios range between 85 and 115. Since Rolls tend to be regressive, it's highly likely that the 85 ratio pertains to the richest Town in the County while the 115 represents one of the middle-class Towns. In essence, the poor and middle-class neighborhoods perennially subsidize their rich counterparts. While the rich make a big splash about their Roll values, they stay totally quiet when they sell their homes at twice those Roll values. The average ratio of 85 does not mean that all homes in that Town are assessed at or around that level. In fact, its 1st-to-99th percentile range could be 70 to 100 (and is generally wider), while the Town with an average ratio of 115 could have a 1st-to-99th percentile range of 100 to 130. Now, compare the bottom of the first range (70 to 75) with the top of the second (125 to 130). At the same tax rate, a home assessed at 130 percent of its market value bears nearly 1.9 times (130/70) the effective tax rate of one assessed at 70 percent. How fair is that, local reporter?

7. Data Maintenance -- Intra (i.e., within the Jurisdiction) comparison: Homes that sell are dressed and staged, so the sale data are inherently cleaner and more up-to-date than the unsold property data, thereby producing lower CODs for the modeling sample. Also, sold parcels with data inconsistencies drop out as model outliers, only to resurface when the model is applied to the population. It's a classic game of hide and seek, unless those data errors are corrected before the model is applied. Of course, nobody knows what happens behind the curtain. Generally, the local MLS plays a big role in (indirectly) forcing the Jurisdiction to keep the sale data up-to-date (obviously, sale data are easy pickings for the media and other interested groups). Inter (i.e., across Jurisdictions) comparison: Two adjoining Jurisdictions may have vastly different outlooks on managing the population data. One may be very proactive while the other is reactive, at best. Ceteris paribus, the lot fraction defective of the former's Roll would be significantly lower, generating far fewer tax appeals (a good metric to follow) than the latter's. Will this be known to the local reporter?

8. Model Testing -- Modelers and consultants who apply their draft models to mutually exclusive hold-out samples will, ceteris paribus, produce sounder and more reliable Rolls than those who skip this extremely important modeling step. This step helps identify the errors and inconsistencies in draft models (from sample selection to outliers to optimization to spatial ratios and CODs), often to the extent that the models get sent back and reworked from square one. The hold-out sample must have the same attributes as the modeling sample (and, in turn, as the population), so this test is one of the most established ways to finalize a model and ensure its successful application. Again, the Jurisdiction that methodically performs this step produces a sounder and more reliable Roll, with potentially far fewer tax appeals than its counterpart that boldly skips it. Will the local reporter ask this question?

9. Forward Sales Ratio Study -- A forward sales ratio study would be an ideal way to begin the Roll investigation process. For example, if the Roll was developed from 2018 calendar-year sales, it could be tested against a set of forward sales ratios (comprising validated Q1/Q2-2019 sales, etc.). To bolster the forward sales sample, seasoned listings could also be added. When time-adjusted back to the valuation date, the forward sales ratio test must produce results that closely parallel those of the published Roll. Therefore, before rushing to hire expensive consultants, advocacy groups should consider hiring local analysts to compile the forward sales sample and run the simple sales ratio test. Those ratios must be studied multi-dimensionally, meaning by location/major sub-market, value range, non-waterfront vs. waterfront, non-GIS vs. GIS, etc. If the results turn out very different, a challenger AVM is in order. At that point, instead of hiring a so-called industry consultant (who would not shoot himself in the foot), an outside economic consulting firm is always preferable, as such a firm will provide real analysis with a coordinated strategic action plan.
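
Returning to reason #2 (Sales Sampling): the exact representativeness test is not spelled out here, so the following Python sketch shows only one plausible version. It compares a sample's percentiles of a property attribute (here, hypothetical Roll values) with the population's over a chosen percentile band. It fails the sample if any relative gap exceeds a tolerance. The 10 percent tolerance, the attribute, and the function names are assumptions.

```python
from statistics import quantiles


def percentile_gaps(sample, population, lo=1, hi=99):
    """Relative gap between sample and population percentiles, lo..hi."""
    s = quantiles(sample, n=100)      # s[k - 1] is the k-th percentile
    p = quantiles(population, n=100)
    return {k: abs(s[k - 1] - p[k - 1]) / p[k - 1] for k in range(lo, hi + 1)}


def is_representative(sample, population, lo=1, hi=99, tol=0.10):
    """True if every percentile in the lo..hi band is within tol of the population's."""
    return max(percentile_gaps(sample, population, lo, hi).values()) <= tol


# Hypothetical population of 1,000 Roll values and two candidate samples.
population = [100_000 + 1_000 * i for i in range(1_000)]
spread_sample = population[::5]   # every 5th parcel, across the whole range
cheap_sample = population[:200]   # only the 200 lowest-valued parcels
print(is_representative(spread_sample, population))  # True
print(is_representative(cheap_sample, population))   # False
```

The 1st-to-99th band contains every percentile in the 5th-to-95th band. A sample that passes the wider test therefore always passes the narrower one, but not vice versa. That is why Jurisdiction X's requirement is the stricter of the two.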
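
For reason #3 (Removal of Outliers), the Python sketch below trims sales ratios outside a percentile band and recomputes the COD. It shows mechanically why a 5th-to-95th cut reports a lower COD than a 1st-to-99th cut on the same ratios. The evenly spaced ratios are made-up test data, and the helper names are hypothetical. Actual outlier rules vary by Jurisdiction.

```python
from statistics import median, quantiles


def cod(ratios):
    """Coefficient of Dispersion, in percent of the median ratio."""
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


def trim_ratio_outliers(ratios, lo_pct, hi_pct):
    """Keep only ratios between the lo_pct-th and hi_pct-th percentiles."""
    cuts = quantiles(ratios, n=100)   # cuts[k - 1] is the k-th percentile
    lo, hi = cuts[lo_pct - 1], cuts[hi_pct - 1]
    return [r for r in ratios if lo <= r <= hi]


# Hypothetical ratios spread evenly from 0.50 to 1.50.
ratios = [0.50 + i / 100 for i in range(101)]
conservative = trim_ratio_outliers(ratios, 1, 99)
aggressive = trim_ratio_outliers(ratios, 5, 95)
print(len(conservative), round(cod(conservative), 2))  # 99 24.75
print(len(aggressive), round(cod(aggressive), 2))      # 91 22.75
```

On these evenly spread ratios the gap is modest. The longer the tails of a real ratio distribution, the more the choice of band generally moves the reported COD.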
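
Reason #4 (Sub-market Modeling) can be illustrated with a deliberately crude Python toy. Here the "model" simply rescales sales ratios by a median factor, estimated either once for the County (top-down) or separately for each Town (bottom-up). The two Towns and their ratios are hypothetical, and a real AVM would be far richer. The point is only the direction of the effect on the reported COD.

```python
from statistics import median


def cod(ratios):
    """Coefficient of Dispersion, in percent of the median ratio."""
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


def recalibrate(ratios_by_town, bottom_up):
    """Toy stand-in for an AVM: rescale each Town's ratios by a median factor
    estimated per Town (bottom-up) or once for the whole County (top-down)."""
    if bottom_up:
        factors = {t: median(r) for t, r in ratios_by_town.items()}
    else:
        county = median([x for r in ratios_by_town.values() for x in r])
        factors = {t: county for t in ratios_by_town}
    return [x / factors[t] for t, r in ratios_by_town.items() for x in r]


# Hypothetical pre-calibration ratios for two Towns in one County.
towns = {"Town-1": [0.85, 0.90, 0.95], "Town-2": [1.05, 1.10, 1.15]}
print(round(cod(recalibrate(towns, bottom_up=False)), 2))  # 10.0
print(round(cod(recalibrate(towns, bottom_up=True)), 2))   # 3.37
```

The bottom-up version reports the lower COD. However, the pooled COD says nothing about how consistently values line up across Town boundaries, which is where bottom-up models produce inconsistent values.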
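
Reasons #5 and #6 argue for publishing Town-level results alongside the system-wide COD. A minimal Python sketch of such a breakdown follows. It reports each Town's median ratio (used as the Town's "average" ratio, as is customary in ratio studies), COD, and 1st-to-99th percentile ratio range. The Towns and ratios are hypothetical, chosen so that the system-wide COD comes out at 15, as in the example in reason #6.

```python
from statistics import median, quantiles


def cod(ratios):
    """Coefficient of Dispersion, in percent of the median ratio."""
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


def town_report(ratios_by_town):
    """Median ratio, COD, and 1st-to-99th percentile ratio range per Town."""
    report = {}
    for town, ratios in ratios_by_town.items():
        # 'inclusive' keeps the interpolated percentiles inside the observed
        # ratios, which matters for the tiny samples used here.
        cuts = quantiles(ratios, n=100, method="inclusive")
        report[town] = {
            "median": median(ratios),
            "cod": cod(ratios),
            "p01_p99": (cuts[0], cuts[98]),
        }
    return report


towns = {
    "Town-A": [0.75, 0.80, 0.85, 0.90, 0.95],  # hypothetical wealthier Town
    "Town-B": [1.05, 1.10, 1.15, 1.20, 1.25],  # hypothetical middle-class Town
}
all_ratios = [r for rs in towns.values() for r in rs]
print(round(cod(all_ratios), 2))  # 15.0 (system-wide)
for town, row in town_report(towns).items():
    low, high = row["p01_p99"]
    print(town, round(row["median"], 2), round(row["cod"], 2),
          round(low, 3), round(high, 3))
```

The system-wide COD of 15 looks compliant, yet the two Town medians sit at 0.85 and 1.15. A single headline COD conceals exactly this pattern.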
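
Reason #8's hold-out test might be sketched in Python as follows. First, randomly split the validated sales into mutually exclusive modeling and hold-out sets. Then calibrate on the modeling set only, and check the hold-out median ratio and COD. The one-factor "model", the 25 percent hold-out share, the seed, and the synthetic sales are all assumptions. A real split could also be stratified so that the hold-out set mirrors the modeling sample and the population.

```python
import random
from statistics import median


def cod(ratios):
    """Coefficient of Dispersion, in percent of the median ratio."""
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


def split_holdout(sales, holdout_share=0.25, seed=2019):
    """Randomly split sales into mutually exclusive modeling and hold-out sets."""
    shuffled = list(sales)
    random.Random(seed).shuffle(shuffled)
    k = int(len(shuffled) * holdout_share)
    return shuffled[k:], shuffled[:k]


def fit_factor(sales):
    """Hypothetical one-parameter 'model': median of sale price / raw estimate."""
    return median(price / raw for raw, price in sales)


def holdout_stats(sales, holdout_share=0.25, seed=2019):
    """Calibrate on the modeling set only; return hold-out median ratio and COD."""
    modeling, holdout = split_holdout(sales, holdout_share, seed)
    factor = fit_factor(modeling)
    ratios = [factor * raw / price for raw, price in holdout]
    return median(ratios), cod(ratios)


# Synthetic (raw estimate, sale price) pairs: prices run about 10% above the
# raw estimates, with a small repeating +/-4% wobble.
sales = [(raw, raw * 1.10 * (1 + ((i % 5) - 2) / 50))
         for i, raw in enumerate(range(100_000, 300_000, 2_000))]
med, dispersion = holdout_stats(sales)
print(round(med, 3), round(dispersion, 2))
```

If the hold-out median drifts away from 1.0, or its COD is markedly worse than the modeling sample's, that is the signal to send the draft model back for rework.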
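
Finally, reason #9's forward sales ratio test could be sketched in Python as below. Each post-valuation sale price is deflated back to the valuation date, and the Roll-to-adjusted-price ratios are computed. The January 1, 2019 valuation date, the flat 0.5 percent monthly appreciation rate, and the three sales are hypothetical. In practice, the time adjustment would come from a local market index or the model's own time variables. For simplicity, months are counted here as whole calendar months.

```python
from datetime import date
from statistics import median


def cod(ratios):
    """Coefficient of Dispersion, in percent of the median ratio."""
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


def months_between(start, end):
    """Whole calendar months from start to end (days are ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def forward_ratios(forward_sales, valuation_date, monthly_rate):
    """Roll value / time-adjusted price, deflating each post-valuation sale
    price back to the valuation date at an assumed constant monthly rate."""
    ratios = []
    for roll_value, price, sale_date in forward_sales:
        months = months_between(valuation_date, sale_date)
        adjusted_price = price / (1 + monthly_rate) ** months
        ratios.append(roll_value / adjusted_price)
    return ratios


valuation_date = date(2019, 1, 1)  # hypothetical valuation date
forward_sales = [                  # hypothetical (Roll value, price, sale date)
    (400_000, 412_000, date(2019, 3, 15)),
    (250_000, 253_000, date(2019, 2, 10)),
    (600_000, 630_000, date(2019, 6, 1)),
]
ratios = forward_ratios(forward_sales, valuation_date, monthly_rate=0.005)
print(round(median(ratios), 3), round(cod(ratios), 2))
```

The resulting median ratio and COD would then be compared with the published Roll's statistics. The same comparison would be repeated by sub-market, value range, waterfront status, and the other dimensions listed in reason #9.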

So, what is the solution? We need to replace this inherently regressive Property Tax System with middle-class-friendly Progressive Consumption Taxes (see the link below).

No wonder well-respected billionaire businessmen like Warren Buffett and Sam Zell have written off the print media.

