Author: AVMetrics

Introducing PTM™ – Revolutionizing AVM Testing for Accurate Property Valuations

When it comes to residential property valuation, Automated Valuation Models (AVMs) have a lurking problem. AVM testing is broken and has been for some time, which means that we don’t really know how much we can or should rely on AVMs for accurate valuations.

Testing AVMs seems straightforward: take the AVM’s estimate and compare it to an arm’s length market transaction. The approach is theoretically sound and widely agreed upon but unfortunately no longer possible.

Once you see the problem, you cannot unsee it. The issue lies in the fact that most, if not all, AVMs have access to multiple listing data, including property listing prices. Studies have shown that many AVMs anchor their predictions to these listing prices. While this makes them more accurate when they have listing data, it casts serious doubt on their ability to accurately assess property values in the absence of that information.

Three months of data showing estimates by three AVMs for a single property in Austin, TX.
Three AVMs valuing a home before and after it was listed in the MLS from Realtor.com’s RealEstimateSM.

All this opens up the question: what do we want to use AVMs for? If all we want is to get a good estimate of what price a sale will close at, once we know the listing price, then they are great. However, if the idea is to get an objective estimate of the property’s likely market value to refinance a mortgage or to calculate equity or to measure default risk, then they are… well, it’s hard to say. Current testing methodology can’t determine how accurate they are.

But there is promise on the horizon. After five years of meticulous development and collaboration with vendors/models, AVMetrics is proud to unveil our game-changing Predictive Testing Methodology (PTM™), designed specifically to circumvent the problem that is invalidating all current testing. AVMetrics’ new approach will replace the current methods cluttering the landscape and finally provide a realistic view of AVMs’ predictive capabilities.1

At the heart of PTM™ lies our extensive Model Repository Database (MRD™), housing predictions from every participating AVM for every residential property in the United States – an astonishing 100 to 120 million properties per AVM. With monthly refreshes, this database houses more than a billion records per model and thereby offers unparalleled insights into AVM performance over time.

But tracking historical estimates at massive scale wasn’t enough. To address the influence of listing prices on AVM predictions, we’ve integrated a national MLS database into our methodology. By pinpointing the moment when AVMs gained visibility into listing prices, we can assess predictions for sold properties just before this information influenced the models, which is the key to isolating confirmation bias. While the concept may seem straightforward, the execution is anything but. PTM™ navigates a complex web of factors to ensure a level playing field for all models involved, setting a new standard for AVM testing.
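The core idea, scoring the last estimate produced before the listing date rather than the latest one, can be illustrated with a minimal sketch. This is not the PTM™ implementation, which the text above describes as considerably more involved; the function names, dates and dollar figures are hypothetical and chosen only for illustration.

```python
from datetime import date

def last_prelisting_estimate(estimates, listing_date):
    """Return the most recent (as_of, value) pair dated strictly before listing_date, or None."""
    prior = [e for e in estimates if e[0] < listing_date]
    return max(prior, key=lambda e: e[0]) if prior else None

def percentage_error(estimate, sale_price):
    """Signed error of the estimate relative to the sale price."""
    return (estimate - sale_price) / sale_price

# Hypothetical monthly estimates from one AVM for one property.
estimates = [
    (date(2023, 1, 1), 410_000),
    (date(2023, 2, 1), 415_000),
    (date(2023, 3, 1), 450_000),  # produced after the listing date below
]
listing_date = date(2023, 2, 20)  # hypothetical date the listing became visible
sale_price = 455_000              # hypothetical closing price

picked = last_prelisting_estimate(estimates, listing_date)
naive = max(estimates, key=lambda e: e[0])  # latest estimate, listing price already visible

prelisting_error = percentage_error(picked[1], sale_price)
naive_error = percentage_error(naive[1], sale_price)
```

In this made-up example the post-listing estimate looks far more accurate than the pre-listing one, which is the effect that testing against the latest estimate would hide.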

So, how do we restore confidence in AVMs? With PTM™, we’re enabling accurate AVM testing, which in turn paves the way for more accurate property valuations. Those, in turn, empower stakeholders to make informed decisions with confidence. Join us in revolutionizing AVM testing and moving into the future of improved property valuation accuracy. Together, we can unlock new possibilities and drive meaningful change in the industry.

1 The majority of the commercially available AVMs support this testing methodology, and more than two solid years of testing have been conducted on more than 25 models.

Feds to Lenders: Take AVMs Seriously

Regulators are signaling that they are going to be looking at how AVMs are used and whether lenders have appropriately tested them and continuously monitor them for valuation discrimination. This represents a change in the focus on AVMs, and it means all lenders need to focus on AVM validation to avoid unfavorable attention from government regulators.

On Feb 12, the FFIEC issued a statement on examinations from regulators. It specifically stated that it didn’t represent a change in principles, nor a change in guidance, and not even a change in focus. It was just a friendly announcement about the exam process, which will focus on whether institutions can identify and mitigate bias in residential property valuations.

Law firm Husch Blackwell published their interpretation a week later. Their analysis included consideration of the June 2023 FFIEC statement on the proposed AVM quality control rule, which would include bias as a “fifth factor” when evaluating AVMs. They interpret these different announcements as part of a theme, an extended signal to the industry that all valuations, and AVMs in particular, are going to receive additional scrutiny. Whether that is because bias is as important as quality or because being unbiased is an inherent aspect of quality, the subject of bias is drawing attention, but the result will be a thorough examination of all practices around valuation, including AVMs, from oversight to validation, training, auditing, etc.

AVM quality has theoretically been an issue that could be enforced by regulators in some circumstances for over a decade. What we’re seeing is not just an expansion from accuracy into questions of bias. We’re also seeing an expansion from banks into all lenders, including non-bank lenders. And, they are signaling that examinations will focus on bias, which is an expansion from the theoretical requirement to an actual, manifest, serious requirement.

#1 AVM in Each County Updated for Q4 2023

Every quarter we analyze all the top AVMs and compile the results. Click on this GIF to see the top AVM in each county for each quarter. As you watch the quarters change, you can see that the colors representing the top honors change frequently.

A gif showing the most recent 8 quarters of AVM performance with the #1 AVM in each county represented by a unique color
The number 1 AVM in each county for the last two years. Each AVM is represented by a unique color.

The main point is how frequently AVM performance changes. That should be no surprise, since market conditions change and AVMs have different strengths and tendencies. Phoenix has more tract housing, and some AVMs are optimized for that. Cities in the northeast have more row housing, and some models are better there. But AVMs also change – a lot. Whole new models are introduced, but every model is constantly being improved as builders add new data feeds and use new techniques to get better results (with respect to new techniques, over at the AVMNews, we curate articles about AVMs, and we highlight several hundred new research articles about AVMs every year).

Q4 Change Highlights

As ever, if you watch a part of the map, you’ll see several changes. But, in Q4, as markets stabilized at higher interest rate levels, we saw a changing of the guard. Here are some places to watch:

  1. On the west coast, leadership changed in Los Angeles County and Seattle’s King County.
  2. Most of the counties of Atlanta, GA changed, as did the main counties of Charlotte, NC.
  3. Some less-populated areas had almost wholesale changes, such as Idaho, the Dakotas, Montana, Colorado, Iowa and rural Michigan (but not New Mexico or Utah).

Takeaways

  1. Things change – a lot. Don’t rely on the results from last year or earlier this year. Heck, you can’t even trust last quarter! We compile these results quarterly, but our testing is non-stop, and we can produce new optimizations monthly based on a rolling 3 months or any other time period. Often, 3 months of data are required to get a large enough sample in smaller regions, but we can slice it every way imaginable.
  2. Use more than one AVM. It’s not obvious from a map showing just one AVM in each county, but if you think about what’s going on to produce these results, you’ll realize that AVMs have different strengths and there are a lot of them climbing all over each other to get to the top of the ranking. So, when you’re valuing a particular property, you just don’t know if it will be a good candidate for even the best AVM. When that AVM produces a result with low confidence, there’s a very good chance that another AVM will produce a reasonable estimate. Why not be able to take three, four or five bites at the apple?

#1 AVM in Each County Updated for Q3 2023

Every quarter we analyze all the top AVMs and compile the results. Click on this GIF to see the top AVM in each county for each quarter. As you watch the quarters change, you can see that the colors representing the top honors change frequently.

map of the united states cycling between 8 images showing a different color for each AVM that is #1 in the county. The colors change rapidly and substantially indicating a very dynamic market where leadership as "the best AVM" changes a lot.
Q3 2023 update

The main point is how frequently AVM performance changes. That should be no surprise, since market conditions change, and AVMs have different strengths and tendencies. Phoenix has more tract housing, and some AVMs are optimized for that. Cities in the northeast have more row housing, and some models are better there. But AVMs also change – a lot. Whole new models are introduced, but every model is constantly being improved as builders add new data feeds and use new techniques to get better results (with respect to new techniques, over at the AVMNews, we curate articles about AVMs, and we highlight several dozen new research articles about AVMs every year).

Q3 Change Highlights

As ever, if you watch a part of the map, you’ll see several changes. But, in Q3, as markets stabilized at higher interest rate levels, we saw a changing of the guard. Here are some places to watch:

  1. On the west coast, leadership changed in Orange County and many smaller counties.
  2. Several less-populated states had almost wholesale changes, such as the Dakotas, Montana, New Mexico and Mississippi.
  3. Dozens of suburban counties changed around other metro areas, from Houston and Dallas to Chicago and D.C.

Takeaways

Things change – a lot. Don’t rely on the results from last year or earlier this year. Heck, you can’t even trust last quarter! We compile these results quarterly, but our testing is non-stop, and we can produce new optimizations monthly based on a rolling 3 months or any other time period. Often, 3 months of data are required to get a large enough sample in smaller regions, but we can slice it every way imaginable.

Use more than one AVM. It’s not obvious from a map showing just one AVM in each county, but if you think about what’s going on to produce these results, you’ll realize that AVMs have different strengths and there are a lot of them climbing all over each other to get to the top of the ranking. So, when you’re valuing a particular property, you just don’t know if it will be a good candidate for even the best AVM. When that AVM produces a result with low confidence, there’s a very good chance that another AVM will produce a reasonable estimate. Why not be able to take three bites at the apple?

Our Perspective on Brookings’ AVM Whitepaper

As the publisher of the AVMNews, we felt compelled to respond to Brookings’ very thorough whitepaper on AVMs (Automated Valuation Models) published on October 12, 2023, and share our thoughts on the recommendations and insights presented therein.

First and foremost, we would like to acknowledge the thoroughness and dedication with which Brookings conducted their research. Their whitepaper contains valuable observations, clear explanations and wise recommendations that unsurprisingly align with our own perspective on AVMs.

Here’s our stance on key points from Brookings’ whitepaper:

  1. Expanding Public Transparency: We wholeheartedly support increased transparency in the AVM industry. In fact, Lee’s recent service on the TAF IAC AVM Task Force led to a report recommending greater transparency measures. Transparency not only fosters trust but also enhances the overall reliability of AVMs.
  2. Disclosing More Information to Affected Individuals: We are strong advocates for disclosing AVM accuracy and precision measures to the public. Lee’s second Task Force report also recommended the implementation of a universal AVM confidence score. This kind of information empowers individuals with a clearer understanding of AVM results.
  3. Guaranteeing Evaluations Are Independent: Ensuring the independence of evaluations is paramount. Compliance with this existing requirement should be non-negotiable, and we fully support this recommendation.
  4. Encouraging the Search for Less Discriminatory AVMs: Promoting the development and use of less discriminatory AVMs aligns with our goals. We view this as a straightforward step toward fairer AVM practices.

Regarding Brookings’ additional points 5, 6, and 7, we find them to be aspirational but not necessarily practical in the current landscape. In the case of #6, regulating Zillow, it appears that existing and proposed regulations adequately cover entities like Zillow, provided they use AVMs in lending.

While we appreciate the depth of Brookings’ research, we would like to address a few misconceptions within their paper:

  1. Lender Grade vs. Platform AVMs: We firmly believe that there is a distinction between lender-grade and platform AVMs, as evidenced by our testing and assessments. Variations exist not only between AVM providers but also within the different levels of AVMs offered by a single provider.
  2. “AVM Evaluators… Are Not Demonstrably Informing the Public:” We take exception to this statement. We actively contribute to public knowledge through articles, analyses, newsletters (AVMNews and our State of AVMs), a quarterly GIF, a comprehensive Glossary, and participation in industry groups and task forces. We also serve the public by making AVM education available, and we would have been more than willing to collaborate or consult with Brookings during their research.

But, we’re obligated not to just give away our analysis or publish it. Our partners in the industry provide us their value estimates and we provide our analysis back to them. It’s a major way in which they improve, because they’re able to see 1) an independent test of accuracy, and 2) a comparison to other AVMs. They can see where they’re being beaten, which means opportunity for improvement. But, in order to participate, they require some confidentiality to protect their IP and reputation.

We should comment on the concept of independence that Brookings emphasized. As the only independent AVM evaluator, we consider independent evaluation exceedingly important. Brookings mentioned in passing that Mercury is not independent, but they also mentioned Fitch as an independent evaluator. We agree with Brookings that a vendor who also sells, builds, resells, uses or advocates for certain AVMs may be biased (or may appear to be biased) in auditing them; validation must be able to “effectively challenge” the models being tested.

We do not believe Fitch satisfies ongoing independent testing, validation and documentation of testing which requires resources with the competencies and influences to effectively challenge AVM models. Current guidelines require validation to be performed in real-world conditions, to be ongoing, and to be reported on at least annually.  When there are changes to the models, the business environment or the marketplace, the models need to be re-validated.

Fitch’s assessment of AVM providers is focused on each vendor’s model testing results, review of management and staff experience, data sourcing, technology effectiveness and quality control procedures. Fitch’s methodology of relying on analyses obtained from the AVM providers’ model testing results would not categorize them as an “independent AVM evaluator,” as reliance on testing done by the AVM providers themselves does not meet any definition of “independent” per existing regulatory guidance. AVMetrics is in no way beholden to the AVM developers or the resellers in any way; we draw no income from selling, developing, or using AVM products.

For almost two decades, we have continued to test AVMs against hundreds of thousands (sometimes millions) of transactions per quarter and use a variety of techniques to level the playing field between AVMs. We provide detailed and transparent statistical summaries and insights to our newsletter readers, and we publish charts that give insights into the depth and thoroughness of our analysis, whereas we have not observed this from other testing entities. Our research spanning eighteen years shows that even overall good-performing models are less reliable in certain circumstances, so one of the less obvious risks that we would highlight is reliance on a “good” model that is poor in a specific geography, price level or property type. Models should be tested in each one of these subcategories in order to assess their reliability and risk profile. Identifying “reliable models” isn’t straightforward. Performance varies over time as market conditions change and models are tweaked. Performance also varies between locations, so a model that is extremely reliable overall may not be effective in a specific region. Furthermore, models that are effective overall may not be effective at all price levels, for example: low-priced entry-level homes or high-priced homes. Finally, very effective models will also produce estimates that they admit have lower confidence scores (and higher FSDs), and which should in all prudence be avoided, but without adequate testing and understanding may be inadvertently relied upon. Proper testing and controls can mitigate these problems.
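To illustrate why subcategory testing matters, here is a minimal sketch that compares one model’s overall error with its error by county and by price tier. The metric (median absolute percentage error), the field names and every number are assumptions made for this example, not our actual test design or results.

```python
from collections import defaultdict
from statistics import median

def abs_pct_error(record):
    """Absolute error of the estimate as a fraction of the sale price."""
    return abs(record["estimate"] - record["sale_price"]) / record["sale_price"]

def error_by_segment(records, key):
    """Median absolute percentage error grouped by the given record field."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(abs_pct_error(r))
    return {segment: median(errors) for segment, errors in groups.items()}

# Hypothetical test results for a single model.
records = [
    {"county": "A", "price_tier": "entry", "estimate": 105_000, "sale_price": 100_000},
    {"county": "A", "price_tier": "mid", "estimate": 303_000, "sale_price": 300_000},
    {"county": "B", "price_tier": "entry", "estimate": 120_000, "sale_price": 100_000},
    {"county": "B", "price_tier": "mid", "estimate": 306_000, "sale_price": 300_000},
]

overall = median(abs_pct_error(r) for r in records)
by_tier = error_by_segment(records, "price_tier")
by_county = error_by_segment(records, "county")
```

With these made-up figures, the overall error looks modest while the entry-level tier and county B are several times worse, which is the kind of hidden weakness an overall score cannot reveal.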

Regarding cascades, the Brookings paper leans on cascades as an important part of the solution for less discriminatory AVMs. We agree with Brookings: a cascade is the most sophisticated way to use AVMs. It maximizes accuracy and minimizes forecast error and risk. By subscribing to multiple AVMs, you can rank-order them to choose the highest performing AVM for each situation, which we call using a Model Preference Table™. The best possible AVM selection approach is a cascade, which combines that MPT™ with business logic to define when an AVM’s response is acceptable and when it should be set aside for the next AVM or another form of valuation. The business logic can incorporate the Forecast Standard Deviation provided by the model and the institution’s own risk-tolerance to determine when a value estimate is acceptable.
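The cascade logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the ranked list stands in for one row of a preference table, the only business rule is a maximum FSD, and the model names, values and the 0.13 threshold are hypothetical.

```python
def run_cascade(preference, responses, max_fsd):
    """Walk the ranked models and return the first acceptable (model, value, fsd).

    preference: model names in rank order (stand-in for one preference table row)
    responses: dict of model name -> (value, fsd), or None when the model returns no value
    max_fsd: the institution's tolerance for Forecast Standard Deviation
    """
    for model in preference:
        response = responses.get(model)
        if response is None:
            continue  # no value from this model; try the next one
        value, fsd = response
        if fsd <= max_fsd:
            return model, value, fsd
    return None  # nothing acceptable; use another form of valuation

# Hypothetical ranking and responses for one property.
preference = ["Model A", "Model B", "Model C"]
responses = {
    "Model A": (402_000, 0.21),  # ranked first, but FSD is above tolerance
    "Model B": None,             # returned no value
    "Model C": (395_000, 0.09),
}

result = run_cascade(preference, responses, max_fsd=0.13)
strict_result = run_cascade(preference, responses, max_fsd=0.05)
```

Here the top-ranked model is set aside because its FSD exceeds the tolerance, and the cascade settles on the third model. With a stricter tolerance no model qualifies, and the property would move on to another form of valuation.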

Mark Sennott (industry insider) recently published a whitepaper describing current issues with cascades, namely that some AVM resellers will give favorable positions to AVMs based on favors, pricing or other factors that do NOT include performance as evaluated by independent firms like AVMetrics. This goes to the additional transparency for which Brookings advocates. We’re all in favor.

We actually see a strong parallel between Mark Sennott’s whitepaper and the Brookings paper. Brookings makes the case to regulators, whereas Sennott was speaking to the AVM industry, but both of them argue for more transparency and responsible leadership by the industry. Sennott appears to be very prescient, in retrospect.

In order to ensure that adequate testing is done regularly, we recommend that a control be implemented to create transparency around how the GSEs or other originators are performing their testing. This could be done in a variety of ways. One method might require the GSE or lending institution to indicate their last AVM testing date on each appraisal waiver. Regardless of how it’s done, the goal would be to create a mechanism that would increase commitment to appropriate testing. The GSEs could provide a leadership role by demonstrating how they would like lending institutions to demonstrate their independent AVM testing as required by OCC 2010-42 and 2011-12.

In conclusion, we appreciate Brookings’ dedication to asking questions and providing perspective on the AVM industry. We share their goals for transparency, fairness, and accuracy. We believe that open dialogue and collaboration by all the valuation industry participants are the keys to advancing the responsible use of AVMs.

We look forward to continuing our contributions to the AVM community and working toward a brighter future for this essential technology.

#1 AVM in Each County Updated for Q2 2023

Every quarter we analyze all the top AVMs and compile the results. This GIF shows the top AVM in each county for each quarter, and as it spools through the quarters, you can see that the top honors change hands frequently.

map of the united states cycling between 8 images showing a different color for each AVM that is #1 in the county. The colors change rapidly and substantially indicating a very dynamic market where leadership as "the best AVM" changes a lot.
Click the image to see the GIF cycle between quarters.

The main point is how frequently AVM performance changes. That should be no surprise, since market conditions change, and AVMs have different strengths and tendencies. Phoenix has more tract housing, and some AVMs are optimized for that. Cities in the northeast have more row housing, and some models are better there. But AVMs also change – a lot. Whole new models are introduced, but every model is constantly being improved as builders add new data feeds and use new techniques to get better results. (With respect to new techniques, over at the AVMNews, we curate articles about AVMs, and we highlight several dozen new research articles about AVMs every year.)

Q2 Change Highlights

As ever, if you watch a part of the map, you’ll see several changes. But, in Q2, as markets stabilized at higher interest rate levels, we saw a changing of the guard. Here are some places to watch:

  1. On the west coast, leadership in Los Angeles, Silicon Valley and Seattle changed.
  2. Almost all of the Rocky Mountain states changed.
  3. Most of the counties around Washington D.C. and New York City changed. 

Takeaways

Things change – a lot. Don’t rely on the results from last year or earlier this year. Heck, you can’t even trust last quarter! We compile these results quarterly, but our testing is non-stop, and we can produce new optimizations monthly based on a rolling 3 months or any other time period. Often, 3 months of data are required to get a large enough sample in smaller regions, but we can slice it every way imaginable.

Use more than one AVM. It’s not obvious from a map showing just one AVM in each county, but if you think about what’s going on to produce these results, you’ll realize that AVMs have different strengths and there are a lot of them climbing all over each other to get to the top of the ranking. So, when you’re valuing a particular property, you just don’t know if it will be a good candidate for even the best AVM. When that AVM produces a result with low confidence, there’s a very good chance that another AVM will produce a reasonable estimate. Why not be able to take three bites at the apple?

#1 AVM in Each County Updated for Q4 2022

Q4’s update is remarkable for the amount of change in the map. Every quarter we analyze all the top AVMs and compile the results. This GIF shows the top AVM in each county for each quarter, and as it spools through the quarters, you can see where the top honors change hands.

Map of the united states in which every county is a color representing the best AVM according to the legend. Every second the map updates to the next quarter, starting in Q1 of 2021 and going through 7 updates to Q4 of 2022. The colors change quite rapidly, showing a lot of dynamism in the AVM rankings.
Click the image to see all eight quarters of 2021 and 2022. The number one AVM in each county is represented by its corresponding color in the legend.

The main point is how frequently AVM performance changes. That should be no surprise, since market conditions change, and AVMs have different strengths and tendencies. Phoenix has more tract housing, and some AVMs are optimized for that. Cities in the northeast have more row housing, and some models are better there. But AVMs also change – a lot. Whole new models are introduced, but every model is constantly being improved as builders add new data feeds and use new techniques to get better results. (With respect to new techniques, over at the AVMNews, we curate articles about AVMs, and we highlight several dozen new research articles about AVMs every year.)

Q4 Change Highlights

As ever, if you watch a part of the map, you’ll see several changes. But, in Q4, with markets changing significantly as interest rates rose and then fell, we saw a real upending of the order. Here are some places to watch:

  1. Most of the west coast changed from blue to the orange of Model B, except Orange County, ironically, which is tan for Model H.
  2. Seattle and Portland changed from blue to the Model B orange.
  3. Several upper Rocky Mountain states changed from pink to the green of Model K. (Visually it’s striking, but in terms of population, admittedly less important.)
  4. Almost every county in Utah changed.
  5. A lot of rural Texas changed from gray to the blue of Model A, so those guys took some territory back.
  6. But, Model A also gave away leadership in Chicago and the surrounding counties, which went from blue to orange (Model B) or tan (Model H).
  7. New York was completely shuffled. Surprisingly, the same changes held in NY City and upstate: counties changed from orange to blue (Model A got some more back), and those that were green or blue changed to orange or tan.
  8. All the counties around Washington D.C. went from blue to orange (Model B wins again).
  9. Just west of that, in West Virginia, everything changed from blue to the Kelly green of Model AA.

Takeaways

Things change – a lot. Don’t rely on the results from last year or earlier this year. Heck, you can’t even trust last quarter! We compile these results quarterly, but our testing is non-stop, and we can produce new optimizations monthly based on a rolling 3 months or any other time period. Often, 3 months of data are required to get a large enough sample in smaller regions, but we can slice it every way imaginable.

Use more than one AVM. It’s not obvious from a map showing just one AVM in each county, but if you think about what’s going on to produce these results, you’ll realize that AVMs have different strengths and there are a lot of them climbing all over each other to get to the top of the ranking. So, when you’re valuing a particular property, you just don’t know if it will be a good candidate for even the best AVM. When that AVM produces a result with low confidence, there’s a very good chance that another AVM will produce a reasonable estimate. Why not be able to take three bites at the apple?

Honors for the #1 AVM Changes Hands in Q3

Graphic showing which AVM was tops in each county over the last 8 quarters. Shows constantly changing colors. 16 or 17 AVMs claim the top spot in at least one county each quarter.
The graphic shows which AVM was tops in each county over the last 8 quarters.

We’ve got the update for Q3 2022. Our top AVM GIF shows the #1 AVM in each county going back 8 quarters. This graphic demonstrates why we never recommend using a single AVM. Again, there are 19 AVMs in the most recent quarter that are “tops” in at least one county!

The expert approach is to use a Model Preference Table® to identify the best AVM in each region. (Actually, our MPT® typically identifies the top 3 AVMs in each county.) Or, you could use a cascade to tap into the best AVM for whatever your application.
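As a rough illustration of how a table of the top 3 AVMs per county could be derived from test results, consider the sketch below. It is not how the MPT® is actually built: the ranking metric (median absolute percentage error), the minimum sample rule, the county label and all figures are assumptions made for this example.

```python
from collections import defaultdict
from statistics import median

def build_preference_table(results, top_n=3, min_samples=2):
    """Rank models within each county by median absolute percentage error.

    results: iterable of (county, model, estimate, sale_price) tuples
    """
    errors = defaultdict(list)
    for county, model, estimate, sale_price in results:
        errors[(county, model)].append(abs(estimate - sale_price) / sale_price)

    by_county = defaultdict(list)
    for (county, model), errs in errors.items():
        if len(errs) >= min_samples:  # skip models with too few test sales
            by_county[county].append((median(errs), model))

    # Lowest error first; keep only the top_n model names per county.
    return {c: [m for _, m in sorted(scored)[:top_n]] for c, scored in by_county.items()}

# Hypothetical test results for one county.
results = [
    ("County X", "Model A", 306_000, 300_000),
    ("County X", "Model A", 208_000, 200_000),
    ("County X", "Model B", 303_000, 300_000),
    ("County X", "Model B", 206_000, 200_000),
    ("County X", "Model C", 330_000, 300_000),
    ("County X", "Model C", 216_000, 200_000),
    ("County X", "Model D", 315_000, 300_000),
    ("County X", "Model D", 214_000, 200_000),
    ("County X", "Model E", 300_000, 300_000),  # only one test sale, so excluded
]

table = build_preference_table(results)
```

Rebuilding such a table on fresh test data each period is what would let the rankings follow the changes described in this post.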

This time, the Seattle area and the Los Angeles region stayed light blue, just like the previous quarter. But, most of the populous counties in Northern California changed hands. Sacramento was the exception, but Santa Clara, Alameda, Contra Costa, San Mateo and some smaller counties like Calaveras (which means “skulls”) changed sweaters. Together they account for 6 million northern Californians who just got a new champion AVM.

A number of rural states changed hands almost completely… again. New Mexico, Wyoming, North Dakota, South Dakota, Montana and Nebraska as well as Arkansas, Mississippi, Alabama and rural Georgia crowned different champions for most counties. I could go on.

All that goes to show the importance of using multiple AVMs and getting intelligence on how accurate and precise each AVM is.