Eldrow is a Jane Street puzzle—see the link for a detailed explanation.


Pool of potential solutions

Given a sequence of guesses, think of the “pool of admissible next guesses”:

  • This set, as a function of “time”, i.e. the number of guesses, is strictly decreasing; in other words, one more guess can’t increase the number of admissible future guesses.
  • This set can be partitioned into up to |{🟩,🟨,⬛}\(^5=3^5=243\) equivalence classes. In particular there is a 1-to-1 correspondence between the marking of a guess and the equivalence class of the next guess.
  • So instead of computing the marking (by already assuming a solution), we may just fix the next guess; this also determines the marking.

Greedy strategies need not be optimal

  • Intuitively, the best initial guess ought to keep as many words as possible on the table; i.e. given a solution, we might want to choose an initial guess A with 200 possibly correct second guesses (given the constraints after guessing A) over an initial guess with 100 possibly correct second guesses.
  • To see why this is not necessarily correct, imagine playing Eldrow in a different language “English++” that consists of all words AAAAA,AAAAB, …, AAAAZ, and the words of the English language. Assume the solution is AAAAZ. Guessing AAAAA “only” leaves us with a pool of 25 possible correct guesses, while guessing any English word without As, say BIRCH, leaves us with however many English words there are without any of the letters B,I,R,C, and H – presumably significantly more than 25. But by design any of the made-up words except AAAAZ (the solution) would have lead to the longest possible sequence of admissible guesses, whereas having ruled out the letters B,I,R,C, and H almost certainly will leave us with a strictly shorter longest possible sequence of admissible guesses.

Exhaustive strategies are slow

  • In the made-up language that consists of words of the form AAAA?, where ? may be any letter from A to Z, any exhaustive strategy will have to loop through \(26!\approx 4\times10^{26}\) possibilities, whereas any human will immediately spot that any permutation of these “words” yields an admissible sequence of guesses attaining the maximal length.
  • It seems hard to estimate this for the English language: From eyeballing 3Blue1Brown’s video it looks like “typically” a word reveals between 2-5 bits; this gives a (geometric) average of about 3 bits or, equivalently, the claim that “a typical word roughly cuts the pool into an \(1\over8\)th its size”. But the total number of sequences can be written as a multiple of an expectation over a product of not really independent random variables, each \(\geq 1\), so this is only of limited use: One might expect such an expectation to be dominated by the tails of the individual random variables—this is somewhat similar to the “ergodicity problem in betting/economics”.

When can we be greedy?

…in order not to have to construct every possible sequence of guesses. More precisely, what are conditions on a guess \(g\), the pool of admissible subsequent guesses \(P\), and \(a\in P\) such that the set of sequences of guesses attaining maximum length (with the given constraints) contains a sequence that continues with \(g,a,\dots\)?

(Some notation: Call a position \(i\in{1,2,3,4,5}\) fixed if \(\forall b\in P: a_i = g_i = b_i\).)

  • The following criterion is sufficient: (I.e. if the following holds for guesses \(g\) and \(a\in P\), then we have that for every admissible sequence of guesses \(g,b,b',b'',\dots\), where \(b,b',b'',\dots\in P\setminus\{a\}\), that \(g,a,b,b',b'',\dots\) is an admissible sequence of guesses as well. Hence we may proceed greedily and assume \(a\) to be the next guess.)
    • Without loss of generality, we assume that no position is fixed (so we potentially have fewer than 5 positions),
      • e.g. for \(g=\)QUACK, \(a=\)QUINA, and \(P=\){QUOTE, QUEUE, QUINA}, we identify \(g\) with ACK, \(a\) with INA, and \(P\) with {OTE,EUE,INA}.
    • All such “reduced words” (with potentially less than 5 letters) satisfy for each letter \(\ell\) that
      1. \(\forall b\in P\setminus\{a\}: \#\{\ell\in b\} \geq \#\{\ell\in g\}\), if \(\# \{\ell\in a\}\geq \# \{\ell\in g\}\),
      2. \(\forall b\in P\setminus\{a\}: \#\{\ell\in b\} = \#\{\ell\in g\}\), if \(\#\{\ell\in a\}< \#\{\ell\in g\}\),
      3. \(\forall b\in P\setminus\{a\},i: a_i=b_i\Leftrightarrow g_i=a_i\),
      4. if \(\ell=g_i\neq a_i\), but \(a_j=\ell\) (i.e. \(g\) followed by \(a\) marks the i-th position of \(g\) yellow), then
        • either \(\forall b\in P\setminus\{a\}: b_i,b_j\neq\ell\),
        • or \(\forall b\in P\setminus\{a\}: b_j=\ell\neq b_i\).
  • Note that points 1.–3. above just imply that the sequence of guesses \(g,a,b\) is an admissible one:
    1. If \(a\) has at least as many As as \(g\), then all As of \(g\) must have been marked green or yellow, hence their number cannot decrease anymore.
    2. If \(a\) has fewer As than \(g\), then there is at least one A in \(g\) that must have been marked grey and we must have been able to infer the correct number of As with our guess \(a\) – hence all subsequent guesses must have this exact number of As.
    3. Clearly if the letters at the i-th position of the first two guesses coincide, then this determines the letter at the i-th position of the third guess. If, however, the letters at the i-th position of the first two guesses don’t coincide, then this means that the letter at the i-th position of the first guess was not supposed to be there, hence the letter at the i-th position of the third guess must not coincide with that of the first guess.
    4. This is vacuously true if \(g,a\) (and hence, by 2., \(g,b\)) don’t share any letters. If there exists such a letter \(\ell\), however, say A in the first position of \(a\), then 4. tells us that
      • either all \(b\)s have an A in positions that are different from where \(g\) and \(a\) have As,
      • or A will be marked green (in \(a\)), i.e. all \(b\)s must have an A where \(a\) has it.
  • Examples:
    • This covers our toy language consisting of words AAAA?: The first four letters are fixed, so we only need to consider the last letter, which for any three different guesses \(g,a,b\) is always different so that 4. is vacuously true.
    • For \(g=\)ACK, \(a=\)INA this is satisfied by \(P=\){EAE,PAN,INA} (the first subcase of 4.) and by \(P=\){XXA,XYA,INA} (the second subcase).
    • It is not satisfied in our example above since OTE (and EUE) don’t contain any As, so ACK,INA,OTE is not an admissible sequence.
    • A less trivial counterexample would be \(g=\)ACK, \(a=\)INA, \(P=\){EAE,POA,INA}: Now both ACK,INA,POA and ACK,INA,EAE would be admissible sequences of guesses, but the two sequences fix the A in different positions, so while ACK,POA,EAE (\(g,b,b’\) in our notation) and ACK,EAE,POA (\(g,b’,b\)) are both admissible sequences of guesses, \(g,a,b,b’\) and \(g,a,b’,b\) are not admissible.

Ways of sampling

  1. Loop through pairs of initial_guess and solution; fixing the solution determines the marking for each guess. For each such pair, loop through all possible next guesses, mark each one, then loop through all possible next guesses, etc.
  2. Loop through initial_guesses; not fixing the solution, we have up to \(3^5\) possible markings for each guess and then have to choose from one of the compatible guesses. For each marking, we loop through all possible next guesses, etc.
    • This is basically like Absurdle, which greedily selects the marking that results in the largest pool of future guesses (i.e. possible solutions).
    • Note that the above problem with greedy strategies seems to thwart rigorous results again: There is no guarantee that starting with all grey boxes actually is the “most adversarial way possible”. Neither is it (necessarily) the one that leaves the largest pool of solutions/future guesses.
  3. Loop through initial_guesses and second_guesses; this determines the marking of the initial guess; subsequent guesses can be found without ever marking guesses. Since an exhaustive search needs to loop through all possible markings anyway, this seems to be a strict improvement over the previous two methods.

Attempt #1 (Python)

Here is a brief implementation of the third way of sampling above. Note that this could be further optimised by incorporating the above observation about when we may be greedy.

words = [ABACK, ABASE, ..., ZONAL]

# Computes the length of the longest sequence 
# of admissible guesses with initial guess `g0`.
def depth(g0: str, pool: list[str]) -> int:
    if pool == []:
        return 0
        return 1+max([depth(g1,new_pool(g0,g1,pool)) for g1 in pool])

# Given the set of admissible guesses after `g0`, 
# this function returns the set of admissibel guesses 
# after guessing `g0` followed by `g1`.
def new_pool(g0: str,g1: str,pool: list[str]) -> list[str]:
    return [w for w in pool if satisfies(g0,g1,w)]

# Returns true iff the sequence of guesses `g0`,`g1`,`w` is admissible.
def satisfies(g0: str,g1: str,w: str) -> bool:
    if w == g0 or w == g1:
        return False
    for i in range(5):
        if g0[i] == g1[i]:
            if w[i] != g1[i]:
                return False
        elif w[i] == g0[i]:
            return False
    for l in set(g0+g1): # the set of letters in g0 or g1
        if g0.count(l) > g1.count(l):
            if w.count(l) != g1.count(l):
                return False
        elif w.count(l) < g0.count(l):
            return False
    return True