A (non-exhaustive) list of ways to combine forecasts

…or, more generally, probability measures.

  • Weighted average. For binary questions we have even more freedom: do we average the probabilities \(\mathbb P(A)\), the odds \({\mathbb P(A)\over 1-\mathbb P(A)}\), or the log-odds? And do we combine them using
    • linear combinations, more commonly known as Bayesian Model averaging, or
    • geometric combinations (for binary questions)?
  • Extremising. For binary questions. If forecasters with independent sources arrive at the same conclusion—say, some event will happen with probability 80%—then we should in fact update towards something higher than 80%.
  • With restrictions on the probability distribution. Sometimes we want the resulting distribution to be in a certain class of distributions, e.g. Gaussian.
  • Via markets/betting. Beliefs (probability distributions) generally induce a set of bets you should be happy to take, and vice versa. Now imagine a number of traders, each with a certain amount of money, betting on some event derivatives market. Their bets move the prices in this market, and the prices in turn reveal the participants’ combined beliefs (weighted by the amount of money each traded with). How exactly beliefs are combined, however, depends a lot on how traders act. The first sentence of this paragraph already assumes that they are rational (e.g. if they believe an event is virtually certain to happen, they won’t take a bet where they lose $1 if it happens and win $1 otherwise). But what about their utility functions? Are they Kelly bettors? Fractional Kelly bettors? Or, more realistically, much more conservative? What about their trades on other markets? Are some of their trades hedges? It turns out that, in an idealised setting where all traders are Kelly bettors, the market price is not only a wealth-weighted average of traders’ beliefs, but the market also learns at the optimal rate: the price reacts exactly as if it were updated according to Bayes’ law. The same paper suggests that such a market also performs relatively well compared to the best trader – although I suspect that this might be an artifact of the unrealistic assumption of particularly aggressive traders (i.e. Kelly bettors).
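
To make the Kelly-bettor claim concrete, here is a minimal sketch of a single round of such a market. It rests on several assumptions that are not in the text above: one binary contract that pays 1 if the event happens, no fees, every trader holding a fixed belief strictly between 0 and 1, and every trader staking the full Kelly fraction. Under those assumptions the clearing price is the wealth-weighted mean of beliefs, and a trader with belief p ends up with wealth multiplied by p/price if the event happens and by (1-p)/(1-price) otherwise. The function names are made up for this illustration.

```python
def market_price(wealth, beliefs):
    """Clearing price: the wealth-weighted average of the traders' beliefs."""
    return sum(w * p for w, p in zip(wealth, beliefs)) / sum(wealth)


def settle(wealth, beliefs, happened):
    """Wealth of each full-Kelly bettor after the contract resolves."""
    price = market_price(wealth, beliefs)
    if happened:
        return [w * p / price for w, p in zip(wealth, beliefs)]
    return [w * (1 - p) / (1 - price) for w, p in zip(wealth, beliefs)]


def bayes_update(prior, beliefs, happened):
    """Posterior over 'which trader is right', treating beliefs as likelihoods."""
    likelihood = [p if happened else 1 - p for p in beliefs]
    unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical traders: initial wealth and probability assigned to the event.
wealth = [1.0, 1.0, 2.0]
beliefs = [0.9, 0.5, 0.2]

price = market_price(wealth, beliefs)
new_wealth = settle(wealth, beliefs, happened=True)

# Wealth shares after settlement versus a Bayesian posterior whose prior
# is the initial wealth share of each trader.
shares = [w / sum(new_wealth) for w in new_wealth]
posterior = bayes_update([w / sum(wealth) for w in wealth], beliefs, happened=True)

print("price:", price)
print("wealth shares:", shares)
print("Bayes posterior:", posterior)
```

In this toy setting total wealth is conserved and the post-settlement wealth shares coincide with the Bayesian posterior, so the next round's price is the Bayes-updated mixture of the traders' beliefs. Whether anything like this holds for real traders is exactly the question raised above.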

Comparisons

  • Effective Altruism Forum user Jsevillamol wrote three posts that collect theoretical properties of different aggregation methods and compare them (both theoretically and empirically), primarily for binary questions.
    • He concludes that the geometric mean of odds \(\sqrt[n]{o_1\dots o_n}\), where \(o_i = {p_i\over 1-p_i}\), should be one's first choice – this method of combining probabilities \(p_i\) has some theoretical appeal (it satisfies "external Bayesianity") and performs well on actual data.
    • Extremising may improve on this, but how much (if at all) you want to extremise is very situation-dependent. (Sometimes you’ll want to extremise because markets are underconfident, which is a psychological bias, and sometimes because different market participants actually have access to different pieces of information.)
    • He also notes that the arithmetic mean of probabilities ignores information from extreme predictions, i.e. uncertainty overrides nuanced predictions. For example, \({13\%+0.0001\%\over2}\approx{13\%+1\%\over2}\), but the difference between 0.0001% (one in a million) and 1% (one in a hundred) can be huge.
    • One of the main takeaways (from user Simon_M in the comments), however, is that weighting is much more important than how you aggregate your probabilities. That is, you want to give more weight to better forecasters and more recent forecasts.
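
The aggregation rules compared above can be sketched in a few lines of Python. This is an illustration rather than the code used in those posts: the function names are made up, the weights are whatever you choose to supply, and extremising is implemented as multiplying the pooled log-odds by a factor, which is one common choice among several.

```python
import math


def mean_of_probabilities(ps, weights=None):
    """Weighted arithmetic mean of probabilities (linear pooling)."""
    if weights is None:
        weights = [1.0] * len(ps)
    return sum(w * p for w, p in zip(weights, ps)) / sum(weights)


def geometric_mean_of_odds(ps, weights=None, extremise=1.0):
    """Weighted geometric mean of odds, returned as a probability.

    Computed as a weighted mean of log-odds. `extremise` > 1 pushes the
    result away from 50% (an assumed form of extremising); 1.0 leaves it
    unchanged. All probabilities must be strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(ps)
    log_odds = sum(
        w * math.log(p / (1 - p)) for w, p in zip(weights, ps)
    ) / sum(weights)
    return 1 / (1 + math.exp(-extremise * log_odds))


# The example from the text: 13% combined with 0.0001%, and 13% with 1%.
tiny, small = [0.13, 0.000001], [0.13, 0.01]

print(mean_of_probabilities(tiny), mean_of_probabilities(small))
print(geometric_mean_of_odds(tiny), geometric_mean_of_odds(small))

# Two forecasters at 80%, extremised with a factor of 1.5.
print(geometric_mean_of_odds([0.8, 0.8], extremise=1.5))
```

The two arithmetic means differ by less than a tenth, while the two geometric means of odds differ by well over an order of magnitude, which is the sense in which the arithmetic mean discards the information in the extreme forecast. The `weights` argument is where better forecasters or more recent forecasts would receive more weight.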