Tag: Independent Testing

The Proper Way to Select an AVM

After determining that a transaction or property is suitable for valuation by an Automated Valuation Model (AVM), the first decision one must make is “Which AVM to use?” There are many options – over 20 commercially available AVMs – significantly more than just a few years ago.  While cost and hit rate may be considerations, model accuracy is the ultimate goal. A few additional estimates that are off by more than 20 percent can seriously increase costs. Inaccuracy can increase second-looks, cause loans not to close at all or even stimulate defaults down the road.

Which is the best AVM?

We test the majority of residential models currently available, and in the nationwide test in Figure 1 below, Model AM-39 (not its real name) came out on top. It has the lowest mean absolute error (MAE), 0.1 percentage points below the second-place model and a full percentage point better than the fifth-ranked model. That is good, but it's not everything. Model AM-39 also has the highest percentage of estimates within +/- 10% (PPE10%), and it has the second-lowest rate of extreme overvaluations (>=20%, or RT20 Rate), an especially bad type of error known as a right-tail error.
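For readers who want to see how these measures relate to one another, here is a minimal sketch in Python. It assumes that each error is expressed as a fraction of the benchmark sale price and that a model returning no value is recorded as `None`; the `Record` type and the `summarize` function are illustrative names, not part of any actual testing system.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    benchmark: float           # sale price used as the benchmark
    estimate: Optional[float]  # AVM estimate, or None when no value was returned


def summarize(records: List[Record]) -> dict:
    """Compute hit rate, MAE, PPE10 and RT20 rate for one model's test records."""
    hits = [r for r in records if r.estimate is not None]
    if not hits:
        raise ValueError("no estimates to evaluate")
    # Signed error as a fraction of the benchmark (assumption: positive = overvaluation).
    errors = [(r.estimate - r.benchmark) / r.benchmark for r in hits]
    n = len(errors)
    return {
        "hit_rate": len(hits) / len(records),             # share of requests that got a value
        "mae": sum(abs(e) for e in errors) / n,           # mean absolute error
        "ppe10": sum(abs(e) <= 0.10 for e in errors) / n,  # share within +/- 10%
        "rt20": sum(e >= 0.20 for e in errors) / n,        # share overvalued by 20% or more
    }


sample = [
    Record(100_000, 108_000),
    Record(100_000, 95_000),
    Record(100_000, 125_000),
    Record(100_000, 100_000),
    Record(100_000, None),
]
stats = summarize(sample)
```

Note that the accuracy measures here are computed only over the records for which a value was returned, which is why hit rate has to be read alongside them.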

Figure 1: National AVM Ranking

If you were shopping for an AVM, you might think that Model AM-39 is the obvious choice. This model performs at the top of the list in just about every measure, right? Well, not so fast. Those measurements are based on testing AVMs across the entire nation, and if you are only doing business in certain geographies, you might only care about which model is most accurate in those areas. Figure 2 shows a ranking of models in Nevada, and if your heart was set on Model AM-39, then you would be relieved to see that it is still in the top 5. In fact, it performs even better when limited to the State of Nevada. However, three models outperform Model AM-39, with Model X-24 leading the pack in accuracy (albeit with a lower Hit Rate).

Figure 2: Nevada AVM Rankings

So, now you might be sold on Model X-24, but you might still look a little deeper. If, for example, you were a credit union in Clark County, you might focus on performance there. While Clark County is pretty diverse, it’s quite different from most other counties in Nevada. In this case, Figure 3 shows that the best model is still Model X-24, and it performs very well at avoiding extreme overvaluations.

Figure 3: Clark County AVM Rankings

However, if your Clark County credit union is focused on entry-level home loans with property values below $100K, you might want to check just that segment of the market. Figure 4 shows that Model X-24 continues to be the best performer in Clark County for this price tier. Note that the other top models, including Model AM-39, show significant weaknesses as their overvaluation tendency climbs into the teens. This is not a slight difference, and it could be important. Model AM-39 is seven times more likely than Model X-24 to overvalue a property by 20% or more, and those are high-risk errors.

Figure 4: Clark County AVM Rankings, <$100K Price Tier

Look carefully at the model results in Figure 4 and you’ll see that Model X-24, while being the most accurate and precise, has the lowest hit rate. That means that about 40% of the time, it does not return a value estimate. The implication is: you really want a second and a third AVM option.

Now let’s consider a different lending pattern for the Clark County credit union. Consider a high-value property lending program and look at Figure 5, which analyzes how the models perform on properties over $650K. Figure 5 shows that Model X-24 is no longer in the top five models. The best performer in Clark County for this price tier is Model AM-39, with 92% within +/-10% and zero overvaluation errors in excess of 20%. The other models in the top five also do a good job of valuing properties in this tier.

Figure 5: Clark County AVM Ranking, >$650K Price Tier

Figure 6 summarizes this exercise, which demonstrates the proper thinking when selecting models. First, focus on the market segment that you do business in – don’t use the model that performs best outside your service area. Second, rather than using a single model, you should use several models prioritized into what we call a “Model Preference Table®” in which models are ranked #1, #2, #3 for every segment of your market. Then, as you need to request an evaluation, the system should call the AVM in the #1 spot, and if it doesn’t get an answer, try the next model(s) if available.
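The cascade described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segment keys, the `value_with_cascade` function, the stub models and the placeholder name `MODEL-B` are all hypothetical, and only the #1 picks are taken from the Clark County example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[str, str]                   # hypothetical key: (county, price tier)
AvmCall = Callable[[str], Optional[float]]  # address -> estimate, or None for no hit


def value_with_cascade(
    address: str,
    segment: Segment,
    preference_table: Dict[Segment, List[str]],
    models: Dict[str, AvmCall],
) -> Optional[Tuple[str, float]]:
    """Call the models ranked for this segment in order; return the first hit."""
    for name in preference_table.get(segment, []):
        estimate = models[name](address)
        if estimate is not None:
            return name, estimate
    return None  # no ranked model returned a value


# Illustrative table: the #1 picks follow the Clark County example,
# the lower ranks are placeholders.
PREFERENCE_TABLE = {
    ("Clark, NV", "<$100K"): ["X-24", "AM-39"],
    ("Clark, NV", ">$650K"): ["AM-39", "MODEL-B"],
}

# Stub models standing in for real vendor calls.
MODELS = {
    "X-24": lambda address: None,          # simulates a no-hit
    "AM-39": lambda address: 95_000.0,     # simulates a returned estimate
    "MODEL-B": lambda address: 700_000.0,  # placeholder model
}

result = value_with_cascade(
    "123 Main St", ("Clark, NV", "<$100K"), PREFERENCE_TABLE, MODELS
)
```

In this sketch the first-ranked model returns no value, so the request falls through to the second-ranked model. A segment with no entry in the table returns `None`, which a production system would presumably route to another valuation method.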

Figure 6: Summary of AVM Rankings

In this way, you get the most competent model for the job. Even though one model will test better overall, it won’t be the best model everywhere and for every property type and price range.  In our example, the #1 model in the nation was not the preferred model in every market segment we focused on. If we had focused on another geography or market segment, we almost certainly would have seen a reordering of the rankings and possibly even different models showing up in the top 5. The next quarter’s results might be different as well, because all the models’ developers are constantly recalibrating their algorithms; inputs and conditions are changing, and no one can afford to stand still.

The Wild, Wild West of Automated Valuations

Recently the OCC, FDIC and the Federal Reserve proposed raising the de minimis threshold for residential properties below which appraisals are not required to complete a home loan. Currently, most homes transacting at $250K and above require an appraisal, but Federal regulators propose to raise that level to $400K. A November 30th Wall Street Journal article raises some interesting issues about the topic. It reported that the number of appraisers is down 21% since the housing crisis, yet more homes require an appraiser, since more and more homes exceed the threshold each year. The article also states that these factors open the door for cheaper, faster and “largely untested” property valuations based on computer algorithms, also known as Automated Valuation Models (AVMs).

At AVMetrics, we have been continuously testing AVMs for over 15 years, so we’ve seen how they’ve performed over time. As an example, the accompanying chart shows model performance accuracy as measured by mean absolute error, a statistical metric of valuation error.  We utilize many statistical measures of evaluating model accuracy and precision, and they all show significant improvement in AVMs over time. And, as these automated tools get better and the workforce of appraisers continues to shrink, the FFIEC members’ proposed change seems warranted, but that doesn’t mean they don’t have their critics.

Mean Absolute Error of all tested AVM models for the last 10 years

Ratish Bansal of Appraisal Inc was quoted in The Journal describing the state of AVMs as “a wild, wild West,” inviting, “abuse of all kind.” Furthermore, he contrasts that with the voluminous regulatory standards covering the use of appraisals.

We note that many of those voluminous standards represent nearly the same quality control that was in place before the Credit Crisis.  In other words, appraisals are not a guarantee against collateral risk.  They are simply one tool in the toolbox – an effective, but comparatively time-consuming and expensive tool. Also of note, far from being the “wild, wild west,” AVMs are also governed by regulators, most notably through Appendix B of the Appraisal and Evaluation Guidelines (OCC 2010-42) and the Model Risk Management guidance (OCC 2011-12). These regulatory guidelines require that AVM developers be qualified, users of AVMs use robust controls, incentives be appropriate, and models be tested regularly and thoroughly with out-of-sample benchmarks. They require documentation of risk assessments and stipulate that a Board of Directors must oversee the use of all models. In other words, if AVMs were “the wild, wild west,” they would be rooted in a town under the oversight of the legendary Wyatt Earp.

My strong feeling is that appraisals should not be a sole and exclusive tool when evaluations can be effectively employed in appropriate, lower-risk scenarios. Appraisers are a valuable and limited resource, and they should be employed at (to use appraisal terminology) their highest and best use.  Trying to be a “manual AVM” is not the highest and best use of a highly qualified appraiser.  Their expertise should be focused on the qualitative aspects of property valuation such as the property condition and market and locational influences. They should also be focused on performing complex valuation assignments in non-homogeneous markets.  AVMs do not capture and analyze the qualitative aspects of a property very well, and they still stumble in markets with highly diverse house stock or houses with less quantifiable attributes such as view properties.

However, several companies are developing ways of merging the robust data processing capabilities of an AVM with the qualitative assessment skills of appraisers.  Today, these products typically use an AVM at their core and then satisfy additionally required evaluation criteria (physical property condition, market and location influences) with an additional service.  For example, a lender can wrap a Property Condition Report (PCR) around the AVM and reconcile that data in support of a lending decision.  This type of “Hybrid valuation” is on the track we’re headed down.  Many companies have already created these types of products for commercial and proprietary use.

We at AVMetrics believe in using the right tool for the job, and we believe there is a place for automated valuations in prudent lending practices. We think the smarter approach would be to marginally raise the de minimis threshold, but simultaneously to provide additional guidance for considering other aspects of a lending decision, specifically, collateral considerations and eligibility criteria for appraisal exemptions such as neighborhood homogeneity, property conformity, market conditions and more.

How AVMetrics Tests AVMs

Testing an AVM’s accuracy can actually be quite tricky.  It is easy to get an AVM estimate of value, and you can certainly accept that a fair sale on the open market is the benchmark against which to compare the AVM estimate, but that is really just the starting point.

There are four keys to fair and effective AVM testing, and applying all four can be challenging for many organizations.

  1. Your raw data must be cleaned up to ensure that there aren’t any “unusable” or “discrepant” characters in the data; differences such as “No.,” “#” and “Num” must be normalized.
  2. Once your test data is “scrubbed clean,” it must be assembled in a universal format, and it must be large enough to provide reliable test results, even down to the segment level (each property type within each price level within each county, etc.). This might require hundreds of thousands of records.
  3. Timing must be managed so that each model receives the same sample data at the same time with the same response deadline.
  4. Last, and most difficult, the benchmark sales data must not be available to the models being tested.  In other words, if the model has access to the very recent sales price, it will be able to provide a near-perfect estimate by simply estimating that the value hasn’t changed (or changed very little) in the days or weeks since the sale. 
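As an illustration of the normalization mentioned in the first key, the following Python sketch rewrites the unit markers “No.”, “#” and “Num” to a single form. The function name, the regular expression and the choice of “#” as the canonical marker are assumptions made for this example; a real data-scrubbing process would handle far more variation.

```python
import re

# Matches "No", "No.", "Num", "Num." or "#" when followed by a digit (case-insensitive).
_UNIT_MARKER = re.compile(r"(?:\bno\.?|\bnum\.?|#)\s*(?=\d)", re.IGNORECASE)


def normalize_unit(address: str) -> str:
    """Rewrite unit markers to '# ', collapse whitespace and upper-case the address."""
    cleaned = _UNIT_MARKER.sub("# ", address)
    return " ".join(cleaned.split()).upper()


variants = ["123 Main St No. 4", "123 Main St #4", "123  main st Num 4"]
normalized = {normalize_unit(a) for a in variants}
```

After normalization the three spellings collapse to one string, so the same property is not counted as three different records.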

AVMetrics tests every commercially available AVM continuously and aggregates this testing into a report quarterly; AVMetrics’ testing process meets these criteria and many more, providing a truly objective measure of AVM performance. 

The process starts with the identification of an appropriate sample of properties for which benchmark values have very recently been established.  These are the actual sales prices for arm’s-length transactions between willing buyers and sellers—the best and most reliable indicator of market value.  To properly conduct a “blind” test, these benchmark values must be unavailable or “unknown” to the vendors testing their model(s).  AVMetrics provides in excess of a half million test records annually to AVM vendors (without information as to their benchmark values).  The AVM vendors receive the records simultaneously, run these properties through their model(s) and return the predicted value of each property within 48 hours, along with a number of other model-specific outputs.  These outputs are received by AVMetrics, where the results are evaluated against the benchmark values.  A number of controls are used to ensure fairness, including the following:

  • ensuring that each AVM vendor receives the exact same property list (so no model has any advantage)
  • ensuring that each AVM is given the exact same parameters (since many allow input parameters that can affect the final valuation)
  • ensuring through multiple checks that no model had access to the recent sale data, which would provide an unfair advantage

In addition to quantitative testing, AVMetrics circulates a comprehensive vendor questionnaire twice annually.  Vendors that wish to participate in the testing process complete, for each model being tested, roughly 100 questions on parameters, data, methodology, staffing and internal testing.  The answers enable AVMetrics, and more importantly our clients, to understand model differences within both testing and production contexts, and they enable us and our clients to satisfy certain regulatory requirements covering the evaluation and selection of models (see OCC 2010-42).

AVMetrics next performs a variety of statistical analyses on the results, breaking down each individual market, each price range, and each property type, and develops results which characterize each model’s success in terms of precision, usability and accuracy.  AVMetrics analyzes trends at the global, market and individual model levels, identifying where there are strengths and weaknesses, and improvements or declines in performance.
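To make the idea of a segment-level breakdown concrete, here is a minimal Python sketch that groups test results by segment and ranks the models within each one. It is not AVMetrics’ actual methodology: it ranks on mean absolute error alone, and the row layout, the `rank_by_segment` function and the `min_count` parameter are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (model, segment, estimate, benchmark sale price)
Row = Tuple[str, Tuple[str, str], float, float]


def rank_by_segment(rows: Iterable[Row], min_count: int = 1) -> Dict[Tuple[str, str], List[str]]:
    """Rank models within each segment by mean absolute error, lowest first."""
    errors = defaultdict(list)
    for model, segment, estimate, benchmark in rows:
        errors[(segment, model)].append(abs(estimate - benchmark) / benchmark)

    scored = defaultdict(list)
    for (segment, model), errs in errors.items():
        if len(errs) >= min_count:  # skip cells with too few records to be reliable
            scored[segment].append((sum(errs) / len(errs), model))

    return {segment: [model for _, model in sorted(pairs)] for segment, pairs in scored.items()}


rows = [
    ("A", ("Clark", "<$100K"), 105_000, 100_000),
    ("B", ("Clark", "<$100K"), 90_000, 100_000),
    ("A", ("Clark", ">$650K"), 840_000, 700_000),
    ("B", ("Clark", ">$650K"), 714_000, 700_000),
]
rankings = rank_by_segment(rows)
```

With these made-up rows, model A ranks first in the lower price tier and model B ranks first in the higher one, the same kind of reordering that a segment-level analysis is meant to reveal.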

The last step in the process is for AVMetrics to provide an anonymized comprehensive comparative analysis for each model vendor, showing where their models stack up against all of the models in the test; this invaluable information facilitates the continuous improvement of each vendor’s model offerings.

Same Scandal, New Perpetrator

It seems like only yesterday we were lamenting the hubris of Volkswagen, loading software into their TDI models to fake out emissions tests on millions of vehicles.  Here we are again, this time with Mitsubishi.  The only real surprise is that these companies don’t learn.

Hyundai in 2012, Ford in 2014, Volkswagen in 2015, and now Mitsubishi, although this is not even their first scandal.  In the early 2000s, Mitsubishi was embarrassed by defects that were covered up.

It’s surprising that these companies cannot see that the root cause is a faulty business process.  Instead, they root out the responsible parties and do a mea culpa, or the CEOs resign in shame for their leadership failures (as in the case of Volkswagen last year).  Why doesn’t anyone realize that if your system is to self-test for emissions and mileage, eventually you are going to have a problem, because that is not a foolproof system?

The faulty business process is their lack of independent testing.  These emissions and mileage results are vital business inputs, and the integrity of those results is mission critical.  Where are their controls?

Our industry is financial services, where federal regulations have long required independent testing in many areas.  Our specific segment of the industry is the Automated Valuation Model (AVM) business, which has a regulatory mandate for independent validation.  Financial institutions use many different kinds of computer models to improve decision making, and AVMs are one kind of model.  They estimate property values, and for banks that make loans on property, that comes in handy in dozens of ways.

But if there are systematic problems with AVMs (for example, if they overvalued everything by 20%), it could cause a huge problem for banks and credit unions.  This is where we come in.  We independently test and validate every commercially available residential AVM on a continuous basis, thoroughly, rigorously and impartially.  And everyone benefits.  Banks and credit unions benefit, borrowers benefit, and even the AVM developers benefit because of the feedback we provide to them as well as the broader consumer confidence in their products.

Certainly it is incumbent upon leaders to create a culture of integrity.  One way of doing that is to do more than admonish people to be honest.  Instead, create a system where there is independent testing, and make sure that everyone knows that their results will be tested.  Voila!  When people know they are being checked, integrity soars, and everyone wins.  Don’t just demand integrity; build it into the process!

Lee Kennedy, principal of AVMetrics, which he founded in 2005, has specialized in collateral valuation, AVM testing and related regulation for over three decades.  Over the years, AVMetrics has guided companies through regulatory challenges, helped them meet their AVM validation requirements, and commented on pending regulations to help bring clarity and sanity to the situation.  Lee is an author, speaker and expert witness on the testing and use of AVMs.  Lee’s conviction is that independent, rigorous validation is the healthiest way to ensure that models serve their business purposes.  Every commercially available AVM vendor trusts AVMetrics to provide feedback to them on their models, facilitating each model’s continuous improvement.

Congressional Rep. Steve Knight and the Importance of Independent Oversight

Steve Knight and Lee Kennedy

On March 10, 2016, AVMetrics™ was pleased to welcome Rep. Steve Knight to the offices.  The team talked to Rep. Knight about a wide range of issues, from the local economy to environmental issues.

Rep. Knight has been touring the District and meeting with constituents to develop an ever-greater understanding of the issues that his constituents face.  Lee and the team explained AVMetrics’™ core business: the independent testing of AVMs.  AVMetrics’™ core philosophy is that independent testing creates an environment of integrity, and that businesses, systems, and markets with integrity do not suffer from fraud, corruption or bubbles.  AVMetrics’™ mission is to ensure integrity in the AVM market to guarantee that AVMs remain a respected and reliable tool.

Rep. Knight also discussed the recent Porter Ranch gas leak disaster, having recently testified before Congress on the subject.  The group discussed the advantages of AVMetrics’™ philosophy of independent oversight as an aspect of governmental response to prevent future disasters.

As a veteran of the U.S. Army, Rep. Knight was interested in AVMetrics’™ challenges and successes as a veteran-owned business.  AVMetrics™ has been a veteran-owned business for almost 12 years but only last year was certified as a “Veteran Owned Small Business” by the U.S. Dept of Veterans Affairs.