The most up-to-date version of my bachelor’s thesis on this subject can be found on GitHub.
Now, the title may raise the question of why one should even be interested in the distribution of critical points of some spin glass model. One possible answer is provided by the paper The Loss Surfaces of Multilayer Networks by Choromanska, Henaff, Mathieu, Ben Arous and LeCun, which shows that, in the appropriate limit, the distribution of critical points (in particular, local and global minima) of the loss function of a deep neural network is just that of a spherical spin glass.
This is interesting because it partly explains why neural networks work so well in practice, where we cannot hope to find the global minimum and have to settle for a local one. Indeed, it turns out that, again in the appropriate limit (roughly “the size of the network going to infinity”), “almost all” local minima have values arbitrarily close to that of the global one. In other words: if we find a local minimum of a big neural network, then with high probability the corresponding configuration of parameters performs about as well as the one at the global minimum (or even better, because global minima tend to be prone to overfitting).
To be precise, the results are not only about local minima but more generally about critical points of finite index. Hence they are also interesting for numerical considerations (such as which optimization scheme to pick), since they give an idea of how saddle points are distributed. At this point one should also mention The statistics of critical points of Gaussian fields on large-dimensional spaces by Bray and Dean, which suggests that the eigenvalues of the Hessian at a critical point follow a shifted semicircle distribution.
In a recent paper, Auffinger, Ben Arous and Černý gave an asymptotic evaluation of the complexity of spherical p-spin spin glass models via random matrix theory. This yields an interesting layered structure of the low critical values of the Hamiltonian of these models. This work aims to provide an overview of the findings needed to prove and thoroughly understand the above-mentioned results, omitting the more technical proofs to keep it compact.
Intermediary results of independent interest include Wigner’s semicircle law and various large deviation principles.
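To get a feeling for Wigner’s semicircle law before diving into the proofs, here is a minimal numerical sketch. It is not taken from the thesis: the function names, the matrix size of 1000 and the normalization (off-diagonal entries of variance 1/n, so that the limiting spectrum fills [-2, 2]) are assumptions chosen for illustration.

```python
import numpy as np


def sample_wigner_eigenvalues(n, rng):
    """Eigenvalues of a symmetric Gaussian matrix, scaled so the spectrum fills [-2, 2].

    Hypothetical helper for illustration: off-diagonal entries get variance 1/n.
    """
    a = rng.standard_normal((n, n))
    h = (a + a.T) / np.sqrt(2.0 * n)  # symmetrize and rescale
    return np.linalg.eigvalsh(h)


def semicircle_density(x):
    """Semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2], zero elsewhere."""
    return np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2.0 * np.pi)


rng = np.random.default_rng(0)
eigs = sample_wigner_eigenvalues(1000, rng)

# Compare the empirical eigenvalue histogram with the semicircle density.
hist, edges = np.histogram(eigs, bins=20, range=(-2.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_gap = np.max(np.abs(hist - semicircle_density(centers)))
print(f"largest gap between histogram and semicircle density: {max_gap:.3f}")
```

Increasing the matrix size should shrink the gap further, which is the content of the law: the empirical spectral distribution converges to the semicircle as the dimension goes to infinity.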