Many examples are from this Effective Altruism forum post. A more exhaustive (but less accessible) list of priors, edited by Andrew Gelman, can be found here. Without much further ado:

# Practical priors

• Basic (reading) comprehension failure seems to be around 2%.
• Curiously, this is around half of the Lizardman’s Constant of ~4% – the approximate fraction of responses to any poll, survey, or quiz that are comedic or malicious in nature.
• There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.
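As a quick sanity check on the 93.75% figure (the sample misses the median only when all five draws fall on the same side of it, each with probability $$1/2^5$$, so the success probability is $$1-2\cdot(1/2)^5=15/16$$), here is a minimal simulation sketch using a Uniform(0, 1) population with median 0.5:

```python
import random

random.seed(0)
trials = 100_000
hits = 0
for _ in range(trials):
    # Random sample of five from a Uniform(0, 1) population (median 0.5):
    sample = [random.random() for _ in range(5)]
    if min(sample) <= 0.5 <= max(sample):
        hits += 1

frac = hits / trials
print(round(frac, 3))  # close to 1 - 2 * 0.5**5 = 0.9375
```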
• Laplace’s rule of succession.
• TL;DR: After observing a few independent samples of the same binary random variable (“coin tosses”, say, with H occurrences of heads and T occurrences of tails), we should estimate the probability of heads not by the frequentist guess $$H\over H+T$$, but by $$H+1\over H+T+2$$.
• This amounts to being more Bayesian than frequentist, i.e. not only considering the data before us, but also starting from a uniform Beta(1, 1) prior.
• A different prior, the Jeffreys prior Beta(1/2, 1/2), instead estimates the probability of heads as $$H+{1\over2}\over H+T+1$$.
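The three estimators side by side, as a small sketch (function names are mine). On three heads out of three tosses, the frequentist estimate is a certain 1.0, while both Bayesian estimates hedge towards 1/2:

```python
def frequentist(h, t):
    return h / (h + t)

def laplace(h, t):   # rule of succession, i.e. a Beta(1, 1) prior
    return (h + 1) / (h + t + 2)

def jeffreys(h, t):  # Beta(1/2, 1/2) prior
    return (h + 0.5) / (h + t + 1)

print(frequentist(3, 0))  # 1.0
print(laplace(3, 0))      # 0.8
print(jeffreys(3, 0))     # 0.875
```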
• Gott’s principle/The Copernican principle.
• Often summarised as: With 50% probability, things will last twice as long as they already have.
• This is a continuous analogue of the special case of the German tank problem where we observed one tank.
• Let $$U={t\over T}$$, where $$t$$ is the time something existed until now and $$T$$ is the (unknown; to be estimated) total time of existence.
• In absence of further information, we should assume $$U$$ to be uniformly distributed in $$[0,1]$$ (“there is nothing special about this moment right now”).
• Thus $$T = {t\over U}$$, a special case of a Pareto distribution.
• Some properties of this distribution (without loss of generality, normalise time so that $$t=1$$):
• the density is $$f_T(x)={1\over x^2}$$ for all $$x\geq 1$$,
• since $$\mathbb P(T=1/U\geq x) = \mathbb P(U\leq 1/x) = 1/x$$ (take the derivative to get the density)
• This distribution is heavy-tailed, in particular it doesn’t even have a finite expectation.
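A minimal simulation sketch of Gott’s rule: draw $$U$$ uniformly, set $$T=t/U$$, and check that the total lifetime exceeds twice the current age about half the time:

```python
import random

random.seed(1)
t = 1.0  # normalise time so the current age is 1
# T = t / U with U ~ Uniform(0, 1), i.e. a Pareto distribution on [1, inf):
samples = [t / random.random() for _ in range(100_000)]

# With probability 1/2 the total lifetime is at least twice the current age:
frac = sum(T >= 2 * t for T in samples) / len(samples)
print(round(frac, 2))  # ≈ 0.5
# Note: the sample mean of `samples` never settles down as the sample
# grows, reflecting the infinite expectation of this heavy-tailed law.
```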
• Things will mostly stay as they have been.
• Liquid markets without too much friction (withdrawal costs, maximal deposits, trading fees, etc.) imply probabilities for everything they trade on.

# Theory

• The principle of maximum entropy
• For example the maximum entropy distribution
• …on a finite set is the uniform distribution,
• …on the positive reals with fixed mean is the exponential distribution,
• …on the reals with fixed mean and variance is the normal distribution.
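The finite-set case can be checked directly: among distributions on four outcomes, the uniform one has the largest entropy, namely $$\log 4$$. A minimal sketch:

```python
import math

def entropy(p):
    # Shannon entropy in nats, skipping zero-probability outcomes:
    return -sum(q * math.log(q) for q in p if q > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform) > entropy(skewed))  # True
print(round(entropy(uniform), 3))          # log(4) ≈ 1.386
```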
• Universality/Looking for fixed points
• If some quantity to be estimated can be seen to be (roughly) invariant under some flow, its probability distribution has to be (close to) a fixed point of that flow.
• By far the most common instances of this are the “flows” of adding/multiplying another independent variable and normalising. The multiplicative case reduces to the additive one by taking logarithms.
• In the case of finite variances, the additive case leads to a normal distribution, by the central limit theorem. With infinite variances, we may follow a similar approach to get α-stable Lévy processes.
• Example: The usual derivation of stock market prices being modelled as geometric Brownian motions.
• Another set of examples is that of flows whose fixed points are scale invariant. In this case we expect power laws to appear.
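The additive fixed point can be illustrated numerically: centre and scale a sum of iid Uniform(0, 1) variables (mean 1/2, variance 1/12 each) and it behaves like a standard normal, e.g. falling below 1 with probability $$\Phi(1)\approx 0.841$$. A minimal sketch:

```python
import random

random.seed(2)

def normalised_sum(n):
    # Sum of n iid Uniform(0, 1) variables, centred and scaled:
    s = sum(random.random() for _ in range(n))
    return (s - n * 0.5) / (n / 12) ** 0.5

draws = [normalised_sum(30) for _ in range(50_000)]
# For a standard normal, the fraction below 1 is Phi(1) ≈ 0.841:
frac = sum(z <= 1 for z in draws) / len(draws)
print(round(frac, 2))  # ≈ 0.84
```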

# Combining the above to get a forecast

More often than not, the questions we are interested in are more complex than any of the above. However, we may be able to break them down into subquestions for which the above priors are helpful, see e.g. Fermi problems.

Note, however, that the more subquestions there are to be estimated, the bigger the expected error (although, if the individual estimates are unbiased and independent, this error is smaller than one might think). Moreover, one needs to be careful as soon as one ends up using information coming from the tail of a distribution (e.g. a 95% or 99% confidence interval); slightly paraphrasing SimonM:

1. People (and their models) in general are not calibrated well, especially at the tails.
2. If they are, it takes a while to tell apart those that are from those that are not.
3. Tails are often dominated by model failure, so asking about 95% CIs tells you more about their model than about “real” probabilities.
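The point about errors being smaller than one might think can be made concrete: if a Fermi estimate multiplies n independent sub-estimates, each unbiased in log-space with log-error σ, the typical log-error of the product grows like $$\sigma\sqrt n$$ rather than $$\sigma n$$. A minimal simulation sketch (σ and the lognormal error model are my assumptions for illustration):

```python
import math
import random
import statistics

random.seed(3)
sigma = 0.5  # assumed per-factor error in log-space

def fermi_estimate(n_factors):
    # Product of n independent sub-estimates, each with true value 1
    # and a lognormal error, so the true product is also 1:
    return math.prod(math.exp(random.gauss(0, sigma)) for _ in range(n_factors))

typical = {}
for n in (1, 4, 16):
    errs = [abs(math.log(fermi_estimate(n))) for _ in range(20_000)]
    typical[n] = statistics.mean(errs)
    print(n, round(typical[n], 2))
# Going from 1 to 16 factors multiplies the typical log-error by
# about sqrt(16) = 4, not by 16.
```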