By Joseph R. Barr, Polemicist & Statistician.
Full of emotional turmoil, the ups and the downs, the topsy and the turvy, this has been an interesting election year, indeed! With polls pointing in every possible direction, occasionally in the direction of the truth, whichever it happens to be, one might wonder whether it’s possible to predict election results, and if so, why the pollsters got Brexit and the 2016 US presidential election wrong.
One might wonder how it is possible for polls to point simultaneously in opposite directions: one day Hillary on top, the next Trump. As fans of The Simpsons know, Homer, the cartoonish character, is apt to change his views at the drop of a hat, depending on a TV commercial he’d just viewed. Of course, the writers of The Simpsons intended to create a maniacal Homer as a caricature of a mythical American, but given the events of the past few weeks, one may wonder whether the caricature represents a real persona that profusely populates this land.
Apropos, one may wonder how it is possible that one day’s poll shows Clinton ahead by 4 percentage points and the next shows Trump ahead by 3. Does that mean that 7 percent of the electorate have changed their minds? If so, was it in response to something on TV or social media? The question is whether this society is completely manic or whether the statistical techniques pertinent to polling were lacking, and if so, whether it’s possible, given obvious constraints, to design better, more robust polling techniques.
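One rough way to size up that question is to ask how much a poll's lead can move from sampling noise alone. The sketch below assumes a simple random sample and uses hypothetical numbers (1,000 respondents, 46 percent versus 44 percent); real polls use weighting and face non-sampling errors, so this is only a lower bound on the uncertainty, not a description of any pollster's method.

```python
import math


def lead_margin_of_error(share_a, share_b, n, z=1.96):
    """Approximate margin of error for the lead (share_a - share_b) in one poll.

    Assumes a simple random sample of n respondents (multinomial counts);
    z=1.96 corresponds to roughly 95% confidence.
    """
    variance = (share_a + share_b - (share_a - share_b) ** 2) / n
    return z * math.sqrt(variance)


# Hypothetical poll: 1,000 respondents, 46% versus 44% (illustrative numbers only).
single_poll_moe = lead_margin_of_error(0.46, 0.44, 1000)

# Comparing the leads of two independent polls of similar size and similar
# shares multiplies the noise by roughly sqrt(2).
swing_moe = math.sqrt(2) * single_poll_moe

print(f"Margin of error on one poll's lead: +/- {100 * single_poll_moe:.1f} points")
print(f"Margin of error on the change between two polls: +/- {100 * swing_moe:.1f} points")
```

Under these assumptions the lead in a single poll is uncertain by about 6 points, and the change between two such polls by about 8, so a 7-point swing need not mean that anyone changed their mind.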
In the view of this author, the home price index offers an apt comparison, inasmuch as sample selection is problematic, snagging election predictions and home price futures alike.
Homes are bought and sold in the open market, and transaction information is generally in public view. So it seems rather straightforward to build an index by aggregating sales. For example, one may take all the sales over some time period in some city and average the prices (or do it more robustly by trimming the mean or taking the median price). However, econometricians are aware that this simple aggregation method, employed by, for example, the National Association of Realtors1 (NAR), doesn’t quite faithfully represent economic reality.
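As an illustration of the simple aggregation described above, the following sketch computes the mean, a trimmed mean, and the median of a handful of sale prices. The prices and the 10 percent trimming fraction are made up for the example, and `trimmed_mean` is a hypothetical helper, not anyone's published method.

```python
import statistics


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping the lowest and highest trim_fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


# Hypothetical sale prices in one city over one period, including one mansion.
sales = [95_000, 120_000, 135_000, 150_000, 160_000,
         175_000, 190_000, 210_000, 240_000, 1_250_000]

mean_price = statistics.mean(sales)
trimmed_price = trimmed_mean(sales, trim_fraction=0.1)
median_price = statistics.median(sales)

print(f"Mean:         {mean_price:,.0f}")
print(f"Trimmed mean: {trimmed_price:,.0f}")
print(f"Median:       {median_price:,.0f}")
```

A single expensive sale pulls the plain mean far above the trimmed mean and the median, which is why the more robust aggregates are usually preferred. Robustness to outliers, however, does nothing about which homes happen to sell.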
The problem is that the annualized level of transactions rarely exceeds 2.5 percent of the housing stock, and furthermore, certain areas are more heavily traded than others; urban and suburban areas tend to transact more often than exurban or rural areas. In an area where few transactions take place, the average home price (or trimmed mean, or median price) provides incomplete information about home pricing. Indeed, it’s common for cheaper homes to be sold during downturns, as more affluent people are able to wait and sell when prices are more favorable.
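A toy example may help show how the mix of homes sold can distort a naive index. In the sketch below, no home changes in value between the two years; only the selection of which homes trade changes. The housing stock, the price tiers, and the sale patterns are all hypothetical.

```python
import statistics

# Hypothetical housing stock whose values are identical in both years.
cheap_homes = [100_000 + 5_000 * i for i in range(20)]       # 100k to 195k
expensive_homes = [400_000 + 10_000 * i for i in range(20)]  # 400k to 590k
housing_stock = cheap_homes + expensive_homes

# Normal year: both tiers trade at the same rate (every fourth home).
normal_year_sales = cheap_homes[::4] + expensive_homes[::4]

# Down year: affluent owners wait, so mostly cheaper homes trade.
down_year_sales = cheap_homes[::2] + expensive_homes[::10]

stock_median = statistics.median(housing_stock)
normal_median = statistics.median(normal_year_sales)
down_median = statistics.median(down_year_sales)

print(f"Median value of the whole stock: {stock_median:,.0f}")
print(f"Median sale price, normal year:  {normal_median:,.0f}")
print(f"Median sale price, down year:    {down_median:,.0f}")
```

The median sale price falls by nearly half in the down year even though every home is worth exactly what it was before. The sample of transactions, not the market, has changed, which is the same kind of selection problem a pollster faces when the people who answer differ from the people who vote.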
In other words, it’s necessary to calibrate the measurements, home sales, in order to glean better insight into pricing. The repeat-sales methodology (e.g., the celebrated Case-Shiller index) strives to calibrate home prices by measuring repeat sales (two sales of the same house). Other methods (e.g., Cross-Sectional Gradient Boosting2) evaluate every home and roll up prices to a specific geography (e.g., a city). Clearly, no method is flawless; each depends on the quality of the data, especially the velocity of arm’s length3 transactions. Figure 1 depicts a cross-sectional gradient boosting home price index.
Figure 1. A cross-sectional gradient boosting home price index, ZIP code 45212 (Cincinnati, Ohio).
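To make the repeat-sales idea mentioned above more concrete, here is a minimal sketch of the basic unweighted regression behind it: the log of the price ratio between two sales of the same house is regressed on indicators for the periods of the two sales. This is only the textbook core of the approach; the published Case-Shiller indices use a more elaborate weighted procedure. The function name, the data, and the three-period setup are assumptions made for illustration.

```python
import math


def repeat_sales_index(pairs, n_periods):
    """Estimate index levels from repeat sales by ordinary least squares.

    pairs: list of (first_period, first_price, second_period, second_price).
    Period 0 is the base (index 1.0). Every later period must appear in at
    least one pair, otherwise the system below is singular.
    """
    k = n_periods - 1  # unknown log-index values for periods 1..n_periods-1
    # Build the normal equations (X'X) b = X'y. Each row of X has -1 in the
    # column of the first sale period and +1 in that of the second sale
    # period; the base-period column is dropped.
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for s, price_s, t, price_t in pairs:
        row = [0.0] * k
        if s > 0:
            row[s - 1] = -1.0
        if t > 0:
            row[t - 1] = 1.0
        log_ratio = math.log(price_t / price_s)
        for i in range(k):
            xty[i] += row[i] * log_ratio
            for j in range(k):
                xtx[i][j] += row[i] * row[j]
    # Solve with Gauss-Jordan elimination and partial pivoting.
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    log_index = [0.0] + [a[i][k] / a[i][i] for i in range(k)]
    return [math.exp(b) for b in log_index]


# Hypothetical repeat sales: three houses at very different price levels,
# each appreciating 10% per period.
pairs = [
    (0, 100_000, 1, 110_000),
    (1, 220_000, 2, 242_000),
    (0, 300_000, 2, 363_000),
]
index = repeat_sales_index(pairs, n_periods=3)
print([round(level, 3) for level in index])
```

Because each house is compared only with itself, the wide spread in price levels across the three houses does not affect the estimated index, which is the sense in which repeat sales calibrate away the mix of homes sold.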
Pollsters failed to predict Brexit, and earlier they got the outcome of the March 2015 Israeli Knesset election wrong. Obviously it’s hard to predict close elections, but neither the Brexit nor the Knesset outcome was all that close. In fact, although Brexit won over “Stay” by just under 4 percentage points, the Knesset election wasn’t close at all: the Likud won 30 Knesset seats to 24 for the Zionist Union (the joint list led by the Labor Party), a wide gap by all accounts.
Predicting elections and forecasting home prices are imperfect arts; in both cases, statistical methodology helps transform post-mortem analysis into an art form (irony intended). Pre-mortem analysis is sensitive to sample bias, whose elimination remains, at times, elusive, as is the case in this year’s election.
Figure 2. Outcomes predicted by news organizations versus election results, with gray representing tossup states.
Bio: Joe Barr is the chief analytics officer at www.HomeUnion.com.
Joseph R. Barr, Christian L. Redfearn, Narayanan Srinivasan, et al., A Home Price Index: machine learning approach with cross-sectional gradient boosting, forthcoming, 2017.