How AVMetrics Tests AVMs Using Our New Testing Methodology

Testing an AVM’s accuracy can actually be quite tricky. You might think that you simply compare an AVM valuation to a corresponding actual sales price – technically a fair sale on the open market – but that’s just the beginning. Here’s why it’s hard:

  • You need to get those matching values and benchmark sales in large quantities – like hundreds of thousands – if you want to cover the whole nation and be able to test different price ranges and property types (AVMetrics compiled close to 4 million valid benchmarks in 2021).
  • You need to scrub out foreclosure sales and other bad benchmarks.
  • And perhaps most difficult, you need to test the AVMs’ valuations BEFORE the corresponding benchmark sale is made public. If you don’t, then the AVM builders, whose business is up-to-date data, will incorporate that price information into their models and essentially invalidate the test. (You can’t really have a test where the subject knows the answer ahead of time.)

Here’s a secret about that third part: some of the AVM builders are also the premier providers of real estate data, including MLS data. What if the models use MLS listing-price feeds to “anchor” their valuations to the listing price of a home? If they are the source of the data, how can you test them before they get the data? We now know how.

We have spent years developing and implementing a solution because we wanted to level the playing field for every AVM builder and model. We ask each AVM to value every home in America each month, and each one returns roughly 110 million valuations. We regularly test more than 25 commercially available AVMs. That adds up to a lot of data.

A few years ago, it wouldn’t have been feasible to accumulate data at that scale. But now that computing and storage costs have fallen enough to make it feasible, the AVM builders themselves are enthusiastic about it. They like the idea of a fair and square competition. We now have valuations for every property BEFORE it’s sold, and in fact, before it’s listed.

As we have for well over a decade now, we gather actual sales to use as the benchmarks against which to measure the accuracy of the AVMs.  We scrub these actual sales prices to ensure that they are for arm’s-length transactions between willing buyers and sellers — the best and most reliable indicator of market value. Then we use proprietary algorithms to match benchmark values to the most recent usable AVM estimated value. Using our massive database, we ensure that each model has the same opportunity to predict the sales price of each benchmark.
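Our matching algorithms are proprietary, but the core mechanic can be sketched in a few lines of Python with pandas. The DataFrames and column names below (avm_values, benchmarks) are invented for illustration; the key step is pairing each scrubbed sale with the latest valuation produced strictly before the home was listed:

```python
import pandas as pd

# Assumed, simplified schemas (illustrative only):
#   avm_values: model, property_id, valuation_date, avm_value
#   benchmarks: property_id, list_date, sale_price

# Give every model a chance at every benchmark sale.
models = avm_values[["model"]].drop_duplicates()
bench = benchmarks.merge(models, how="cross").sort_values("list_date")
vals = avm_values.sort_values("valuation_date")

# For each (property, model), keep the most recent valuation dated
# strictly BEFORE the listing, so no model can "see the answer."
matched = pd.merge_asof(
    bench, vals,
    left_on="list_date",
    right_on="valuation_date",
    by=["property_id", "model"],
    direction="backward",          # latest valuation before the cutoff
    allow_exact_matches=False,     # strictly before the listing date
)
```

Rows with no valuation before the listing simply drop out of the accuracy statistics; the production cutoff and scrubbing rules are considerably more involved than this sketch.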

AVMetrics next performs a variety of statistical analyses on the results, breaking them down by individual market, price range and property type, and develops results that characterize each model’s success in terms of precision, usability, error and accuracy.  AVMetrics analyzes trends at the global, market and individual-model levels. We also identify strengths and weaknesses and where performance improved or declined.
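To make that concrete, here is a minimal sketch of the per-segment statistics, continuing the matched DataFrame from the sketch above and using the metric names that appear in the rankings later in this article (MAE, PPE10, RT20); the segment columns are invented for illustration:

```python
# Signed percentage error: positive means the model overvalued the home.
matched["pct_error"] = (
    matched["avm_value"] - matched["sale_price"]
) / matched["sale_price"]

def segment_metrics(g):
    err = g["pct_error"].dropna()
    return pd.Series({
        "MAE":   err.abs().mean(),            # mean absolute error
        "PPE10": (err.abs() <= 0.10).mean(),  # share within +/-10% of the sale price
        "RT20":  (err >= 0.20).mean(),        # right-tail rate: overvalued by 20%+
        "hits":  float(err.size),             # usable estimates in the segment
    })

# One ranking per market / price range / property type niche.
rankings = (
    matched
    .groupby(["state", "price_tier", "property_type", "model"])
    .apply(segment_metrics)
    .sort_values("MAE")
)
```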

In the spirit of continuous improvement, AVMetrics provides each model builder with an anonymized, comprehensive comparative analysis showing where their models stack up against all of the models in the test; this invaluable information supports their ongoing efforts to improve their models.

Finally, in addition to quantitative testing, AVMetrics circulates a comprehensive vendor questionnaire semi-annually.  Vendors that wish to participate in the testing process answer roughly 100 questions about parameters, data, methodology, staffing and internal testing for each model being tested.  The responses enable AVMetrics and our clients to understand model differences in both testing and production contexts. The questionnaire also helps us and our clients satisfy certain regulatory requirements for documenting the evaluation and selection of models (see OCC 2010-42 and 2011-12).


The Proper Way to Select an AVM

After determining that a transaction or property is suitable for valuation by an Automated Valuation Model (AVM), the first decision one must make is “Which AVM to use?” There are many options – over 20 commercially available AVMs – significantly more than just a few years ago.  While cost and hit rate may be considerations, model accuracy is the ultimate goal. Even a few estimates that are off by more than 20 percent can seriously increase costs. Inaccuracy can trigger second looks, cause loans not to close at all, or even lead to defaults down the road.

Which is the best AVM?

We test the majority of residential models currently available, and in the nationwide test in Figure 1 below, Model AM-39 (not its real name) came out on top. It has the lowest mean absolute error (MAE), beating the 2nd-place model by 0.1 points.  Model AM-39 is a full percentage point better than the 5th-ranked model, which is good, but that’s not everything. Model AM-39 also has the highest percentage of estimates within +/-10% of the sales price (PPE10) and the 2nd-lowest rate of extreme overvaluations of 20% or more (the RT20 rate), an especially bad type of error because those significant overvaluations fall in the right tail of the error distribution.

Figure 1: National AVM Ranking

If you were shopping for an AVM, you might think that Model AM-39 is the obvious choice. This model performs at the top of the list in just about every measure, right? Well, not so fast. Consider that those measurements are based on testing AVMs across the entire nation, and if you only do business in certain geographies, you might only care about which model is most accurate in those areas. Figure 2 shows a ranking of models in Nevada, and if your heart was set on Model AM-39, you would be relieved to see that it is still in the top 5. In fact, it performs even better when limited to the State of Nevada. However, three models outperform Model AM-39, with Model X-24 leading the pack in accuracy (albeit with a lower hit rate).

Figure 2: Nevada AVM Rankings

So now you might be sold on Model X-24, but you might still look a little deeper. If, for example, you were a credit union in Clark County, you might focus on performance there. While Clark County is pretty diverse, it’s quite different from most other counties in Nevada. In this case, Figure 3 shows that the best model is still Model X-24, and it performs very well at avoiding extreme overvaluations.

Figure 3: Clark County AVM Rankings

However, if your Clark County credit union is focused on entry-level home loans with property values below $100K, you might want to check just that segment of the market. Figure 4 shows that Model X-24 continues to be the best performer in Clark County for this price tier. Note that the other top models, including Model AM-39, show significant weaknesses as their overvaluation rates climb into the teens. This is not a slight difference, and it could be important: Model AM-39 is seven times more likely than Model X-24 to overvalue a property by 20%, and those are high-risk errors.

Figure 4: Clark County AVM Rankings, <$100K Price Tier

Look carefully at the model results in Figure 4 and you’ll see that Model X-24, while the most accurate and precise, has the lowest hit rate. That means that about 40% of the time it does not return a value estimate. The implication: you really want a second and a third AVM option.

Now let’s consider a different lending pattern for the Clark County credit union: a high-value property lending program. Figure 5 analyzes the over-$650K properties and how the models perform in that price tier. It shows that Model X-24 is no longer in the top five models. The best performer in Clark County for this price tier is Model AM-39, with 92% of estimates within +/-10% and zero overvaluation errors in excess of 20%. The other models in the top five also do a good job of valuing properties in this tier.

Figure 5: Clark County AVM Rankings, >$650K Price Tier

Figure 6 summarizes this exercise, which demonstrates the proper thinking when selecting models. First, focus on the market segment in which you do business – don’t use the model that performs best outside your service area. Second, rather than using a single model, use several models prioritized into what we call a “Model Preference Table®”, in which models are ranked #1, #2, #3 for every segment of your market. Then, when you request a valuation, the system calls the AVM in the #1 spot and, if it doesn’t get an answer, tries the next model(s) if available, as sketched in the code below.

Figure 6: Summary of AVM Rankings
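In software, a Model Preference Table reduces to an ordered lookup with fallback. Here is a minimal sketch of that logic; the segment keys and model names (including “QV-7”) are invented, and a real table covers every niche in a client’s footprint:

```python
from typing import Callable, Optional, Tuple

# Hypothetical rankings: each market segment maps to models, best first.
MPT = {
    ("Clark County, NV", "<$100K"): ["X-24", "QV-7", "AM-39"],
    ("Clark County, NV", ">$650K"): ["AM-39", "QV-7", "X-24"],
}

def request_valuation(
    segment: Tuple[str, str],
    property_id: str,
    call_avm: Callable[[str, str], Optional[float]],
) -> Tuple[Optional[str], Optional[float]]:
    """Call the #1 model for the segment; on a no-hit, fall to the next."""
    for model in MPT.get(segment, []):
        value = call_avm(model, property_id)   # None means the model had no hit
        if value is not None:
            return model, value
    return None, None                          # every ranked model missed
```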

In this way, you get the most competent model for the job. Even though one model will test better overall, it won’t be the best model everywhere and for every property type and price range.  In our example, the #1 model in the nation was not the preferred model in every market segment we focused on. If we had focused on another geography or market segment, we almost certainly would have seen a reordering of the rankings and possibly even different models showing up in the top 5. The next quarter’s results might be different as well, because all the models’ developers are constantly recalibrating their algorithms; inputs and conditions are changing, and no one can afford to stand still.

Cascade vs Model Preference Table® – What’s the Difference?

In the AVM world, there is a bit of confusion about what exactly is a “cascade.” It’s time to clear that up.  Over the years, the terms “cascade” and “Model Preference Table®” have been used interchangeably, but at AVMetrics, we draw an important distinction that the industry would do well to adopt as a standard.

In the beginning, as AVM users contemplated which of several available models to use, they hit on the idea of starting with the preferred model, and if it failed to return a result, trying a second model, and then a third, etc.  This rather obvious sequential logic required a ranking, which was available from testing, and was designed to avoid “value shopping.”[1]  More sophisticated users ranked AVMs across many different niches, starting with geographical regions, typically counties.  Using a table, models were ranked across all regions, providing the necessary tool to allow a progression from primary AVM to secondary AVM and so on.

We use the term “Model Preference Table” for this straightforward ranking of AVMs, which can actually be fairly sophisticated if models are ranked within niches that combine geography, property type and price range.

More sophisticated users realized that just because a model returned a value does not mean that they should use it.  Models typically deliver some form of confidence in the estimate, either as a confidence score, a reliability grade, a “forecasted standard deviation” (FSD) or a similar measure derived through testing processes.  Based on these self-assessed outputs, an AVM result can be accepted or rejected in favor of the next AVM in the Model Preference Table.  This application reflects the merger of MPT rankings with decision logic, which in our terminology makes it a “cascade.”
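The distinction shows up directly in code: a cascade wraps the same ranked lookup in accept/reject decision logic driven by the model’s own confidence measure. A minimal sketch, assuming each AVM call now returns a value together with its FSD (the 25% limit is purely illustrative):

```python
FSD_LIMIT = 0.25   # across-the-board confidence limit (illustrative)

def cascade_valuation(segment, property_id, call_avm):
    """Accept the first ranked estimate that is confident enough."""
    for model in MPT.get(segment, []):
        result = call_avm(model, property_id)  # (value, fsd), or None on a no-hit
        if result is None:
            continue                           # no estimate: try the next model
        value, fsd = result
        if fsd <= FSD_LIMIT:                   # decision logic on top of ranking
            return model, value
    return None, None                          # nothing met the standard
```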

Criteria                            AVM    MPT®   Cascade   “Custom” Cascade
Value Estimate                       X      X        X              X
AVM Ranking                                 X        X              X
Logic + Ranking                                      X              X
Risk Tolerance + Logic + Ranking                                    X


The final nuance is between a simple cascade and a “custom” cascade.  The former simply sets across-the-board risk/confidence limits and rejects value estimates when they fail to meet the standard.  For example, the builder of a simple cascade could choose to reject any value estimate with an FSD > 25%.  A “custom cascade” integrates the risk tolerances of the organization into the decision logic.  That might include lower FSD limits in certain regions or above certain property values, or it might reflect changing appetites for risk based on the application, e.g., HELOC lending decisions vs portfolio marketing applications.
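In the same sketch, a custom cascade simply replaces the single FSD_LIMIT with tolerances keyed to the organization’s segments and applications; every threshold below is invented for illustration:

```python
# Risk tolerances by (application, price tier); None acts as a wildcard tier.
FSD_LIMITS = {
    ("HELOC", ">$650K"):   0.13,   # tighter for high-value lending decisions
    ("HELOC", None):       0.18,   # default for HELOC lending
    ("marketing", None):   0.25,   # looser for portfolio marketing
}

def fsd_limit(application: str, price_tier: str) -> float:
    """Most specific tolerance wins; fall back to the application default."""
    return FSD_LIMITS.get(
        (application, price_tier),
        FSD_LIMITS.get((application, None), 0.25),
    )
```

The cascade function above would then take its limit from fsd_limit rather than a global constant, so the same ranked table serves different risk appetites.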

We think that these terms represent significant differences that shouldn’t be ignored or conflated when discussing the application of AVMs.


Lee Kennedy, founder and principal of AVMetrics, established the firm in 2005 and has specialized in collateral valuation, AVM testing and related regulation for over three decades.  Over the years, AVMetrics has guided companies through regulatory challenges, helped them meet their AVM validation requirements, and commented on pending regulations. Lee is an author, speaker and expert witness on the testing and use of AVMs. Lee’s conviction is that independent, rigorous validation is the healthiest way to ensure that models serve their business purposes.

[1] OCC 2005-22 (and the 2010 Interagency Appraisal and Evaluation Guidelines) warn against “value shopping” by advising, “If several different valuation tools or AVMs are used for the same property, the institution should adhere to a policy for selecting the most reliable method, rather than the highest value.”
