# Eldrow
Eldrow is a Jane Street puzzle—see the link for a detailed explanation.
## Observations
### Pool of potential solutions
Given a sequence of guesses, think of the “pool of admissible next guesses”:
- This set, as a function of “time”, i.e. the number of guesses, is strictly decreasing; in other words, one more guess can’t increase the number of admissible future guesses.
- This set can be partitioned into up to \(|\{🟩,🟨,⬛\}|^5=3^5=243\) equivalence classes. In particular, there is a 1-to-1 correspondence between the marking of a guess and the equivalence class of the next guess.
- So instead of computing the marking (by already assuming a solution), we may just fix the next guess; this also determines the marking.
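To make this concrete, here is a minimal Python sketch (my own illustration, not part of the puzzle statement): the marking is a function of the pair (guess, assumed solution), and fixing the next guess amounts to selecting the class of words that would produce the same marking.

```python
from collections import Counter

def mark(guess, solution):
    """Wordle marking of `guess` if `solution` were the answer."""
    marks = ["⬛"] * len(guess)
    counts = Counter(solution)
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:                              # greens first
            marks[i] = "🟩"
            counts[g] -= 1
    for i, g in enumerate(guess):
        if marks[i] == "⬛" and counts[g] > 0:   # then yellows, limited by
            marks[i] = "🟨"                      # the remaining letter counts
            counts[g] -= 1
    return "".join(marks)

# Grouping a pool by the marking a fixed guess would receive partitions it
# into (at most 3^5 = 243) equivalence classes of possible next guesses.
words = ["AAAAB", "AAAAC", "AAAAZ", "BIRCH", "QUACK"]
classes = {}
for w in words:
    classes.setdefault(mark("AAAAA", w), []).append(w)
```

Here `classes` has three keys: the made-up words all induce the marking 🟩🟩🟩🟩⬛ and hence fall into one class, while `BIRCH` and `QUACK` each sit in their own.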
### Greedy strategies need not be optimal
- Intuitively, the best initial guess ought to keep as many words as possible on the table; i.e. given a solution, we might want to choose an initial guess A with 200 possibly correct second guesses (given the constraints after guessing A) over an initial guess with 100 possibly correct second guesses.
- To see why this is not necessarily correct, imagine playing Eldrow in a different language “English++” that consists of all words `AAAAA`, `AAAAB`, …, `AAAAZ`, and the words of the English language. Assume the solution is `AAAAZ`. Guessing `AAAAA` “only” leaves us with a pool of 25 possible correct guesses, while guessing any English word without `A`s, say `BIRCH`, leaves us with however many English words there are without any of the letters `B`, `I`, `R`, `C`, and `H` – presumably significantly more than 25. But by design any of the made-up words except `AAAAZ` (the solution) would have led to the longest possible sequence of admissible guesses, whereas having ruled out the letters `B`, `I`, `R`, `C`, and `H` almost certainly will leave us with a strictly shorter longest possible sequence of admissible guesses.
### Exhaustive strategies are slow
- In the made-up language that consists of words of the form `AAAA?`, where `?` may be any letter from `A` to `Z`, any exhaustive strategy will have to loop through \(26!\approx 4\times10^{26}\) possibilities, whereas any human will immediately spot that any permutation of these “words” yields an admissible sequence of guesses attaining the maximal length.
- It seems hard to estimate this for the English language: From eyeballing 3Blue1Brown’s video it looks like “typically” a word reveals between 2 and 5 bits; this gives a (geometric) average of about 3 bits or, equivalently, the claim that “a typical word roughly cuts the pool down to \(\tfrac18\)th of its size”. But the total number of sequences can be written as a multiple of an expectation over a product of not really independent random variables, each \(\geq 1\), so this is only of limited use: One might expect such an expectation to be dominated by the tails of the individual random variables—this is somewhat similar to the “ergodicity problem in betting/economics”.
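For concreteness, the count for the toy language is easy to sanity-check:

```python
import math

# Each of the 26 words AAAA? differs only in its last letter, so every
# ordering of all 26 words is an admissible maximal sequence of guesses.
n_orderings = math.factorial(26)
print(f"{n_orderings:.2e}")  # ≈ 4.03e+26
```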
### When can we be greedy?
…in order not to have to construct every possible sequence of guesses. More precisely, what are conditions on a guess \(g\), the pool of admissible subsequent guesses \(P\), and \(a\in P\) such that the set of sequences of guesses attaining maximum length (with the given constraints) contains a sequence that continues with \(g,a,\dots\)?
(Some notation: Call a position \(i\in\{1,2,3,4,5\}\) fixed if \(\forall b\in P: a_i = g_i = b_i\).)
- The following criterion is sufficient: (I.e. if the following holds for guesses \(g\) and \(a\in P\), then for every admissible sequence of guesses \(g,b,b',b'',\dots\) with \(b,b',b'',\dots\in P\setminus\{a\}\), the sequence \(g,a,b,b',b'',\dots\) is an admissible sequence of guesses as well. Hence we may proceed greedily and assume \(a\) to be the next guess.)
- Without loss of generality, we assume that no position is fixed (so we potentially have fewer than 5 positions),
  - e.g. for \(g=\)`QUACK`, \(a=\)`QUINA`, and \(P=\){`QUOTE`, `QUEUE`, `QUINA`}, we identify \(g\) with `ACK`, \(a\) with `INA`, and \(P\) with {`OTE`, `EUE`, `INA`}.
- All such “reduced words” (with potentially fewer than 5 letters) satisfy for each letter \(\ell\) that
  1. \(\forall b\in P\setminus\{a\}: \#\{\ell\in b\} \geq \#\{\ell\in g\}\), if \(\# \{\ell\in a\}\geq \# \{\ell\in g\}\),
  2. \(\forall b\in P\setminus\{a\}: \#\{\ell\in b\} = \#\{\ell\in a\}\), if \(\#\{\ell\in a\}< \#\{\ell\in g\}\),
  3. \(\forall b\in P\setminus\{a\},i:\) if \(g_i=a_i\), then \(b_i=a_i\), and if \(g_i\neq a_i\), then \(b_i\neq g_i\),
  4. if \(\ell=g_i\neq a_i\), but \(a_j=\ell\) (i.e. \(g\) followed by \(a\) marks the i-th position of \(g\) yellow), then
     - either \(\forall b\in P\setminus\{a\}: b_i,b_j\neq\ell\),
     - or \(\forall b\in P\setminus\{a\}: b_j=\ell\neq b_i\).
- Note that points 1.–3. above just imply that the sequence of guesses \(g,a,b\) is an admissible one:
  - If \(a\) has at least as many `A`s as \(g\), then all `A`s of \(g\) must have been marked green or yellow, hence their number cannot decrease anymore.
  - If \(a\) has fewer `A`s than \(g\), then there is at least one `A` in \(g\) that must have been marked grey and we must have been able to infer the correct number of `A`s with our guess \(a\) – hence all subsequent guesses must have this exact number of `A`s.
  - Clearly if the letters at the i-th position of the first two guesses coincide, then this determines the letter at the i-th position of the third guess. If, however, the letters at the i-th position of the first two guesses don’t coincide, then this means that the letter at the i-th position of the first guess was not supposed to be there, hence the letter at the i-th position of the third guess must not coincide with that of the first guess.
- Point 4. is vacuously true if \(g,a\) (and hence, by 2., \(g,b\)) don’t share any letters. If there exists such a letter \(\ell\), however, say `A` in the first position of \(a\), then 4. tells us that
  - either all \(b\)s have an `A` in positions that are different from where \(g\) and \(a\) have `A`s,
  - or `A` will be marked green (in \(a\)), i.e. all \(b\)s must have an `A` where \(a\) has it.
- Examples:
  - This covers our toy language consisting of words `AAAA?`: The first four letters are fixed, so we only need to consider the last letter, which for any three different guesses \(g,a,b\) is always different so that 4. is vacuously true.
  - For \(g=\)`ACK`, \(a=\)`INA` this is satisfied by \(P=\){`EAE`, `PAN`, `INA`} (the first subcase of 4.) and by \(P=\){`XXA`, `XYA`, `INA`} (the second subcase).
  - It is not satisfied in our example above since `OTE` (and `EUE`) don’t contain any `A`s, so `ACK`, `INA`, `OTE` is not an admissible sequence.
  - A less trivial counterexample would be \(g=\)`ACK`, \(a=\)`INA`, \(P=\){`EAE`, `POA`, `INA`}: Now both `ACK`, `INA`, `POA` and `ACK`, `INA`, `EAE` would be admissible sequences of guesses, but the two sequences fix the `A` in different positions, so while `ACK`, `POA`, `EAE` (\(g,b,b'\) in our notation) and `ACK`, `EAE`, `POA` (\(g,b',b\)) are both admissible sequences of guesses, \(g,a,b,b'\) and \(g,a,b',b\) are not admissible.
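The criterion can be turned into a small checker. The sketch below is my own illustration (the function name is made up); conditions 2 and 3 are implemented in the form the worked examples above suggest: after a grey, subsequent guesses must match \(a\)'s letter count, and non-green letters of \(g\) are excluded position-wise.

```python
def is_greedy_safe(g, a, pool):
    """Check the sufficient criterion on reduced words.

    g: last guess, a: proposed next guess (an element of pool),
    pool: admissible guesses after g. True means we may greedily
    continue with a.
    """
    others = [b for b in pool if b != a]
    letters = set(g) | set(a) | {l for b in pool for l in b}
    for l in letters:
        if a.count(l) >= g.count(l):
            # 1. the count of l can no longer drop below its count in g
            if any(b.count(l) < g.count(l) for b in others):
                return False
        # 2. a grey copy of l in g pins down the exact count
        elif any(b.count(l) != a.count(l) for b in others):
            return False
    for b in others:
        for i in range(len(g)):
            # 3. greens stay fixed; elsewhere the letter of g is excluded
            if g[i] == a[i] and b[i] != a[i]:
                return False
            if g[i] != a[i] and b[i] == g[i]:
                return False
    for i in range(len(g)):
        l = g[i]
        if l == a[i] or l not in a:
            continue
        # 4. position i of g is marked yellow by a (l sits at some j in a)
        for j in (j for j in range(len(a)) if a[j] == l):
            case1 = all(b[i] != l and b[j] != l for b in others)  # l elsewhere
            case2 = all(b[j] == l and b[i] != l for b in others)  # l goes green
            if not (case1 or case2):
                return False
    return True

print(is_greedy_safe("ACK", "INA", ["EAE", "PAN", "INA"]))  # True
print(is_greedy_safe("ACK", "INA", ["OTE", "EUE", "INA"]))  # False
```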
## Ways of sampling
- Loop through pairs of `initial_guess` and `solution`; fixing the solution determines the marking for each guess. For each such pair, loop through all possible next guesses, mark each one, then loop through all possible next guesses, etc.
- Loop through `initial_guess`es; not fixing the solution, we have up to \(3^5\) possible markings for each guess and then have to choose from one of the compatible guesses. For each marking, we loop through all possible next guesses, etc.
  - This is basically like Absurdle, which greedily selects the marking that results in the largest pool of future guesses (i.e. possible solutions).
  - Note that the above problem with greedy strategies seems to thwart rigorous results again: There is no guarantee that starting with all grey boxes actually is the “most adversarial way possible”. Neither is it (necessarily) the one that leaves the largest pool of solutions/future guesses.
- Loop through `initial_guess`es and `second_guess`es; this determines the marking of the initial guess; subsequent guesses can be found without ever marking guesses. Since an exhaustive search needs to loop through all possible markings anyway, this seems to be a strict improvement over the previous two methods.
## Attempt #1 (Python)
Here is a brief implementation of the third way of sampling above. Note that this could be further optimised by incorporating the above observation about when we may be greedy.
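A minimal sketch of this idea in Python (an illustration, not the original code): fixing the next guess determines the marking of the current guess, so we recurse directly on the sub-pools the markings induce, without ever enumerating markings separately.

```python
from collections import Counter

def mark(guess, solution):
    """Wordle marking of `guess` against `solution` (handles repeated letters)."""
    marks = ["⬛"] * len(guess)
    counts = Counter(solution)
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            marks[i] = "🟩"
            counts[g] -= 1
    for i, g in enumerate(guess):
        if marks[i] == "⬛" and counts[g] > 0:
            marks[i] = "🟨"
            counts[g] -= 1
    return "".join(marks)

def longest_chain(pool):
    """Length of the longest admissible sequence of guesses drawn from `pool`.

    Fixing the next guess w determines the marking of the current guess,
    so all words sharing a marking stay admissible together; we recurse
    on those sub-pools, which shrink monotonically.
    """
    best = 1 if pool else 0
    for guess in pool:
        groups = {}
        for w in pool:
            if w != guess:
                groups.setdefault(mark(guess, w), []).append(w)
        for sub in groups.values():
            best = max(best, 1 + longest_chain(sub))
    return best

# Toy language AAAA? over a 5-letter alphabet: any permutation of the
# words is an admissible maximal sequence, so the answer is 5.
print(longest_chain(["AAAA" + c for c in "ABCDE"]))  # 5
```

The real search additionally records the sequence itself rather than just its length; and the brute force is exponential (exactly the \(26!\)-style blow-up described above for the toy language), so as written it is only usable on tiny pools.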
## Attempt #1.1 (OCaml)
My first dive into OCaml, and proper functional programming in general, proved a mixed bag (although ultimately I rather enjoyed it):
- Compiled OCaml is fast and doesn’t crash like Python due to memory errors/max recursion limits.
- The latter seems hard to get around since tail-call optimisation is not supported in Python (although some dirty workarounds exist), while OCaml encourages everyone to make recursions tail recursive by providing the tail-recursive `List.fold` function. Presumably my implementation still doesn’t make the most of this and various other features of OCaml—please drop me an email if you have ideas how to improve it!
- OCaml feels pretty bare-bones compared to Python. Apparently there is no native implementation to get the maximum of a list of integers! This results in a fairly steep learning curve. But needing to think about how all these convenient helpers are actually implemented also makes you realise how expensive your subroutines are and adapt accordingly, i.e. write efficient code.
- But OCaml is not only faster; I also noticed that I never had to look for errors: whenever code compiles, it ends up doing exactly what I want. I wish I could say the same about Python, where confusing conventions (e.g. concerning lists) regularly send me looking for errors.
- Getting OCaml code to run on/compile for Windows machines is too complicated. After failing to get cross compilation via Dune and various other tutorials to work, I gave up, and continued to run my code on my (slow) laptop.
This outputs the 14-word sequence `CRANE`, `TIGER`, `QUEER`, `ODDER`, `JOKER`, `HOVER`, `FOYER`, `BOXER`, `WOOER`, `SOWER`, `ROWER`, `POWER`, `MOWER`, `LOWER`.
Presumably this is still far from optimal:

- `CRANE` was one of the first words I tried,
- it has fairly high entropy, i.e. for a randomly chosen solution it reduces the pool of possible solutions by quite a bit (on average),
  - in particular it has higher entropy than `CIVIL`, which leaves a huge pool of possible solutions, but gives the shorter 13-word sequence `CIVIL`, `ZEBRA`, `QUEER`, `ODDER`, `JOKER`, `HOMER`, `GONER`, `FOYER`, `WOOER`, `TOWER`, `SOWER`, `ROWER`, `POWER`, hence providing more evidence against choosing initial guesses greedily,
- and running this algorithm with the initial guess `VIVID` takes a lot longer.
(Keep in mind that ending in `..(OW)ER` might just be an artifact from the potential existence of many 14/13-word sequences of guesses starting with `CRANE`/`CIVIL`, respectively, and this algorithm’s bias to traverse them in an alphabetical manner.)