Update (15.02.2018): The paper has been published on arXiv, slides of the talk can be found here, the .tex file with comments for each slide here. They are more up-to-date than the material presented below.

This article does by no means intend to present an entire proof. Instead the following should be viewed as complementary material motivating the most interesting steps, much like a talk would do.

## The Setting

The central object of interest is a (big) matrix $H=\frac{1}{\sqrt N}(X_{ij})_ {i,j=1}^N,$ where $X_{ij}$ are complex, centered random variables that are independent bar the constraint that $X_{ij}=\bar{X_{ji}}.$ There is also a growth condition on their moments, but in this expository article we try to focus on what is most important and hence will not go into the more technical details for which this is needed.

Let furthermore $S=(\mathbb E |H_{ij}|^2)_ {i,j=1}^N$ such that its entries are of the same order of magnitude (in $N$).

Denote the largest eigenvalue of $H$ by $\lambda_\text{max}$.

## Eigenvalues ↔ Trace

While bounding $\mathbb E\lambda_\text{max}$ seems hard, we could instead relate it to an object like the trace we already know quite a bit about from moment method/Stieltjes transform proofs.

To do that use Jensen to get $\mathbb E\lambda_\text{max} \leq \mathbb E(\lambda_\text{max}^{2k})^\frac{1}{2k}$ for positive integers $k$. Now since taking even powers makes all eigenvalues positive we have $\mathbb E\lambda_\text{max}^{2k} \leq \mathbb E tr H^{2k}.$

## Trace ↔ Trees

Standard arguments (e.g. following Zeitouni at al., in particular the pages following the definition of a graph associated with an S-word) reveal that $\mathbb E tr H^{2k}\rightarrow \sum_{\pi\in T_k}\text{val}(\pi),$ as $N\rightarrow\infty$ where $T_k$ is the set of planar, ordered and rooted trees with $k$ edges. We will refrain from giving a rigorous definition of $\text{val}(\pi)$ here and instead illustrate what one should think of it with the following example:

Let $\pi$ be the following tree:

Then $\text{val}(\pi) = \sum_{a,b,c,d,e=1}^N S_{ab}S_{bc}S_{cd}S_{be}.$ Note that non-isomorphic trees (in general) do give different values! Also note that this “definition” is slightly different from the one in the paper, since the paper is tailored to an efficient and logically coherent presentation rather than introducing concepts as close as possible to already existing ones. Conceptually it doesn’t make a difference though.

### Re-deriving a Known Result

In the “Stieltjes transform setting” one can use theorem 2.1 of Erdös et al. to establish the bound $\mathbb E\lambda_\text{max}\leq 2\|S\|^\frac{1}{2}$. This bound follows trivially from the above convergence result $\mathbb E tr H^{2k}\rightarrow \sum_{\pi\in T_k}\text{val}(\pi)$ by taking suprema over all the $S$ (i.e. $\sum_b\sup_a S_{ab} = \|S\|$).

### Improving the Result

The central idea of the paper is that we can actually bound the sums over the values of the trees more efficiently and in particular in a way that can be read off each summand’s tree representation.

Let us first show how we can do better in the case of the previous example:

which, by submultiplicativity of the norm, is an improvement by a factor of $z_3 := \frac{\|S^3\|}{\|S\|^3}\leq 1$ (note that we can get rid of the factor $N$ with the help of the $2k$-th root - to let $N$ and $k$ go to infinity simultaneously requires some conditions on the moments of the random variables though). In a similar manner (group $ab,be$ and $bc,cd$ together) we also get the (in general) different upper bound $\|S^2\|^2$, giving an improvement by a factor of $z_2^2 := (\frac{\|S^2\|}{\|S\|^2})^2$. More generally, define $z_j := \frac{\|S^j\|}{\|S\|^j}$ to be the improvement of a sequence of the form $S_{i_1i_2}S_{i_2i_3}\dots S_{i_{j-2}i_{j-1}}$.

## Trees ↔ Dyck Paths

To generalize this we interpret this method of obtaining upper bounds in a more intuitive (i.e. graph(ical)) setting: Note that those sequences correspond to branches of length $j$ in the tree representation. Now trees didn’t turn out to be the easiest representation to work with explicitly. When trying to write computer simulations to rule out some conjectures the bijection between ordered trees and Dyck paths turned out to be helpful and indeed, there is a nice interpretation of above-mentioned sequences of length $j$ in terms of Dyck paths given by the following bijection:

• Start at the root of the tree.
• Go around the tree clockwise, say.
• Every time you go from a parent to its child add an “up” to the Dyck path, every time you go from a child to its parent add a “down” to the Dyck path.

Now the “outermost” sequences translate into $j$-runs in the Dyck path. Note that not every sequence of length $j$ has a corresponding $j$-run in the Dyck path obtained by this method, but this can both improve or worsen the bound (think about that!).

Rewrite the sum over trees

where the expectation is not the one over the random entries of $H$, but over all trees equipped with the uniform measure. We might as well pull out the deterministic part $\|S\|^k$ to get

where $T_j(\pi)$ might count the $j$-up-runs, the $j$-down-runs or some combination of both in the Dyck path corresponding to the tree $\pi$. However, as it turns out we have to get a little bit more creative what to count if we want to obtain explicit results. To figure out what we want to count we first analyze the space of uniformly distributed Dyck paths to figure out what we can count.

## Dyck Paths ↔ Random Walks

While “uniformly distributed” sounds like something that should be easy to analyze it’s actually a non-trivial task to get uniform samples in a systematic (and somewhat efficient) way. Fortunately computer scientists already had to devise means to do so and Arnold et al’s paper provided the following handy formula for the probability of having to go up being at “time” $t$ and height $h$:

While this is slightly harder to analyze than, say, a fair coin toss (where $p_{t,h}$ would simply be $1/2$ for all $t,h$) it still has one important property: It is Markovian (though not time homogeneous). This allows for a discretisation of the time axis.

In the following it is helpful to think of this construction of uniform Dyck paths as a random walk in a potential supported in a triangle with repelling boundary effects. Essentially the idea boils down to the following: In the regions where strong repulsion takes place (near $h=0$ and $2k-t-h=0$) we have to get our hands dirty and prove everything “manually”, but everywhere else we can simply approximate it with constant probabilities.

In particular we have a strong upwards drift when we’re close to $h=0$ and a strong downwards drift when we’re close to $2k-t-h$ – play around a bit, taking different limits in $p_{t,h}$! E.g. $% $ for large $h$ and $p_{t,h}\approx \frac{1}{2} \frac{h+2}{h+1}>\frac{1}{2}$ for small $h$.

So we would like to count up-runs if $h$ is small and down-runs everywhere else. This (i.e. switching between what we count at different positions) introduces some problems of overcounting which can be overcome with the help of the $2k$-th root. So if we set $T_j$ to count just that we end up with the problem of estimating $\mathbb E_\frac{1}{2} \prod_j z_j^{T_{j}}$ instead, where $\mathbb E_\frac{1}{2}$ is the measure over paths of length $n$, say, where at all times and heights the probability of going up is $1/2$.

If we want to get $\lim_{n\rightarrow\infty}(\mathbb E_\frac{1}{2} \prod_j z_j^{T_{j}})^\frac{1}{n}$ (some simple calculations which can be found in the paper lead to this) we could of course try to compute $\mathbb E_\frac{1}{2} \prod_j z_j^{T_{j}}=: a_n$ explicitly for large $n$ and take the $n$-th root afterwards.

## Random Stopping Times

A more elegant way of approaching this problem is to remember Cauchy-Hadamard’s theorem, relating $\lim_n \|a_n\|^{1/n}$ to the radius of convergence of the power series $\sum_n a_n x^n$. We get this power series by introducing a random geometric stopping time $n^*$ s.t. $\mathbb P(n^* = n)=(1-w)w^{n-1}$ and taking expectation over this random stopping time:

where, by abuse of notation, we ignored dependence on $j$, highlighted the function’s dependence on the length of the path $n$ by introducing it as a parameter and (on the l.h.s.) take expectation over both the space of paths and the random stopping time.

To determine the radius of convergence of $\mathbb E z^{T(n^* -1)}$ we use the fact that we actually have a fairly explicit formula for it (following Holst et al., theorem 1).

For further details feel free to take a look at the actual paper and drop me a line!