proof by example
My personal blog on mathematics, machine learning and languages.
http://peter.muehlbacher.me/
How far off is Cauchy-Schwarz?
<blockquote>
<p>This post can also be found <a href="https://iidposts.wordpress.com/2019/05/27/how-far-off-is-cauchy-schwarz/">here</a> on a blog that is run jointly with <a href="https://warwick.ac.uk/fac/sci/maths/people/staff/mastrostefano/">D. Mastrostefano</a>.</p>
</blockquote>
<h1 id="introduction">Introduction</h1>
<p>Cauchy-Schwarz is both a very simple and a very powerful bound. We won’t even try to give an extensive list of possible applications, but concentrate instead on a classical one in probability theory.</p>
<p>First notice that Cauchy-Schwarz can simply be interpreted as a fancy way of saying that $\vert\cos(x)\vert\leq 1$ for all real $x$ by the following identity:</p>
<script type="math/tex; mode=display">\vert\langle a,b\rangle\vert = \vert\cos(\theta)\vert\|a\|\|b\| \leq \|a\|\|b\|,</script>
<p>where $\theta=\measuredangle(a,b)$ is the angle between $a$ and $b$.</p>
<p>In several scenarios the dimension of the underlying vector space is very large or infinite; think of random matrix theory, where one typically looks at sequences of vector spaces with dimension going to infinity, or function spaces. We will treat these two cases as prototypical examples of <em>(I)</em> analysis in terms of some diverging parameter and <em>(II)</em> infinite dimensional spaces, where the correlation between values plays an important role.</p>
<h1 id="some-examples">Some examples</h1>
<h2 id="i-on-mathbb-cn-with-ngg-1-how-much-off-is-cauchy-schwarz-typically">(I) On $\mathbb C^n$ with $n\gg 1$, how far off is Cauchy-Schwarz “typically”?</h2>
<p>The intuition is that in higher dimensions it becomes increasingly unlikely for two vectors to be approximately parallel (i.e. $\theta\approx 0$ or $\theta\approx \pi$ in which case Cauchy-Schwarz is sharp) since there are “too many directions”. To answer question (I) it is instructive to consider a pair of independent vectors, chosen uniformly at random from the set of unit vectors.</p>
<p>By rotational symmetry we may fix $a$ to be $(1,0,0,\dots,0)$ and choose $b$ uniformly at random from the set of unit vectors. Clearly its distribution is invariant under permuting indices, hence we expect (because of the constraint of having norm 1) that each entry will be of order $n^{-\frac{1}{2}}$ and thus $\mathbb E\vert \langle a,b\rangle\vert =O(n^{-\frac{1}{2}})$.</p>
<p>While this tells us that Cauchy-Schwarz is off by a factor of $O(\sqrt n)$ (note that by assumption $\|a\|\|b\|=1$ deterministically), it is not very illuminating. It is a good exercise to write out a proof that does not use the above symmetries: using the central limit theorem and the fact that the entries are asymptotically independent as $n\to\infty$, one notices that the missing factor of $O(\sqrt n)$ comes from Cauchy-Schwarz not taking into account any cancellations in the inner product.</p>
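This scaling is easy to check numerically. The following sketch (my own; the values of $n$ and the sample size are arbitrary) samples $b$ uniformly from the complex unit sphere and estimates $\mathbb E\vert\langle a,b\rangle\vert$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 20000

# b uniform on the complex unit sphere: normalise a standard complex Gaussian
g = rng.standard_normal((trials, n)) + 1j * rng.standard_normal((trials, n))
b = g / np.linalg.norm(g, axis=1, keepdims=True)

# with a = (1,0,...,0), |<a,b>| is just |b_1|
mean_abs = np.abs(b[:, 0]).mean()
print(np.sqrt(n) * mean_abs)  # stays O(1) as n grows, i.e. E|<a,b>| = O(n^{-1/2})
```

Rerunning with larger $n$ leaves the printed value essentially unchanged, which is exactly the claimed $O(n^{-\frac{1}{2}})$ behaviour.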
<h2 id="ii-how-much-off-is-mathbb-ef2leq-mathbb-ef2">(II) How far off is $(\mathbb Ef)^2\leq \mathbb E[f^2]$?</h2>
<p>First we establish how this relates to Cauchy-Schwarz. We consider (more general) inner products of the form $\langle f,g\rangle_\mu := \int_\mathbb{R} \bar f(x)g(x)d\mu(x)$, where $\mu$ is some probability measure on $\mathbb R$ and $f,g$ are square-integrable, complex-valued functions. Clearly we recover the usual Euclidean inner product (up to a factor) by choosing $\mu$ to be the normalised counting measure on $\{1,2,\dots,n\}\subseteq\mathbb R$. Now note that Cauchy-Schwarz gives us that</p>
<script type="math/tex; mode=display">\vert\langle \mathbf 1,f\rangle_\mu\vert^2 = \cos(\theta)^2 \langle f,f\rangle_\mu \leq \langle f,f\rangle_\mu, \qquad (*)</script>
<p>where $\mathbf 1$ is the constant 1 function. But note that the left hand side is just $(\mathbb E_\mu f)^2$, whereas the right hand side equals $\mathbb E_\mu [\vert f\vert^2]$, hence giving the claim in the heading.</p>
<p>It is hard to imagine angles between functions. So we give a different way to “derive” Cauchy-Schwarz here: Recall the formula giving the variance of a function in terms of expectations:</p>
<script type="math/tex; mode=display">\text{Var}_\mu f = \mathbb E_\mu [f^2]-(\mathbb E_\mu f)^2\geq 0.</script>
<p>Comparing with equation $(*)$ we thus get</p>
<script type="math/tex; mode=display">\vert\langle \mathbf 1,f\rangle_\mu\vert^2 = \langle f,f\rangle_\mu - \text{Var}_\mu f \leq \langle f,f\rangle_\mu.</script>
<p><u>Side remark (a curious identity):</u> Turning this “additive” bound into a “multiplicative” one (i.e. $\vert\langle \mathbf 1,f\rangle_\mu\vert^2 = \langle f,f\rangle_\mu (1-\frac{\text{Var}_\mu f}{\langle f,f\rangle_\mu})$) and comparing with the cosine formulation from before, we notice (using $\sin^2+\cos^2 = 1$) that $\frac{\text{Var}_\mu f}{\langle f,f\rangle_\mu} = \sin(\theta)^2$.</p>
<p>We conclude with the following heuristic for a fixed(!) function $f$ with $\|f\|_\mu=1$:</p>
<p>The smaller its variance, the better the bound $(\mathbb Ef)^2\leq \mathbb E[f^2]$. It remains to see how big the variance “typically” is. This “typical” will depend on the application one has in mind; the analogy to the previous section would be to choose $f$ randomly as white noise, but this is usually not what one has in mind when thinking about a “uniformly random” function. Depending on which field one comes from, the “canonical” choice might be drawn from a smoother family of functions, like Brownian motion on the interval $[0,1]$; it is not hard to convince oneself that “more smoothness” will typically decrease the (now itself random!) $\text{Var}_\mu f$. This can be made rigorous using <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_inequality#Poincar%C3%A9%E2%80%93Wirtinger_inequality">Poincaré-type inequalities</a>.</p>
Thu, 08 Aug 2019 15:14:05 +0100
http://peter.muehlbacher.me/math/2019/08/08/cauchy-schwarz/
Notes
<p>The contents have been moved to an <a href="https://github.com/petermuehlbacher/notes/blob/master/notes.ipynb">IPython notebook</a> for presentation purposes.</p>
Fri, 02 Dec 2016 16:46:05 +0000
From Counting Trees to Bounding Eigenvalues
<p><strong><em>Update (15.02.2018):</em></strong> <em>The paper has been published on <a href="http://arxiv.org/abs/1802.05175">arXiv</a>; slides of the talk can be found <a href="/files/talk.pdf">here</a>, and the <code class="highlighter-rouge">.tex</code> file with comments for each slide <a href="/files/talk.tex">here</a>. They are more up-to-date than the material presented below.</em></p>
<hr />
<p>This article by no means intends to present an entire proof. Instead, the following
should be viewed as complementary material motivating the
most interesting steps, much like a talk would.</p>
<h2 id="the-setting">The Setting</h2>
<p>The central object of interest is a (big) matrix <script type="math/tex">H=\frac{1}{\sqrt N}(X_{ij})_ {i,j=1}^N,</script>
where <script type="math/tex">X_{ij}</script> are complex, centered random variables that are independent bar
the constraint that <script type="math/tex">X_{ij}=\bar{X_{ji}}.</script> There is also a growth condition
on their moments, but in this expository article we try to focus on what is most
important and hence will not go into the more technical details for which this is needed.</p>
<p>Furthermore, let <script type="math/tex">S=(\mathbb E |H_{ij}|^2)_ {i,j=1}^N</script> and assume its entries are
all of the same order of magnitude (in $N$).</p>
<p>Denote the largest eigenvalue of <script type="math/tex">H</script> by <script type="math/tex">\lambda_\text{max}</script>.</p>
<h2 id="eigenvalues--trace">Eigenvalues ↔ Trace</h2>
<p>While bounding <script type="math/tex">\mathbb E\lambda_\text{max}</script> seems hard, we could instead
relate it to an object like the trace we already know quite a bit about from
moment method/Stieltjes transform proofs.</p>
<p>To do that, use Jensen’s inequality to get <script type="math/tex">\mathbb E\lambda_\text{max} \leq (\mathbb E\lambda_\text{max}^{2k})^\frac{1}{2k}</script> for positive integers $k$.
Now, since taking even powers makes all eigenvalues nonnegative, we have
<script type="math/tex">\mathbb E\lambda_\text{max}^{2k} \leq \mathbb E tr H^{2k}.</script></p>
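Both steps are easy to verify on a sample matrix; a minimal sketch (my own toy parameters, using a Gaussian Wigner matrix for concreteness):

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 100, 4

# Hermitian Wigner-type matrix H = (X + X*)/2 normalised by sqrt(N)
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (X + X.conj().T) / (2 * np.sqrt(N))

lam_max = np.linalg.eigvalsh(H).max()
trace_power = np.trace(np.linalg.matrix_power(H, 2 * k)).real

# lambda_max^{2k} is one nonnegative summand of tr H^{2k}
print(lam_max, trace_power ** (1 / (2 * k)))
```

The second printed number, $(\text{tr}\,H^{2k})^{1/2k}$, always dominates the first, and the gap shrinks as $k$ grows.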
<h2 id="trace--trees">Trace ↔ Trees</h2>
<p>Standard arguments (e.g. following <a href="http://cims.nyu.edu/~zeitouni/cupbook.pdf">Zeitouni et al.</a>, in particular the pages following the definition of a graph associated with an S-word) reveal that <script type="math/tex">\mathbb E tr H^{2k}\rightarrow \sum_{\pi\in T_k}\text{val}(\pi),</script> as <script type="math/tex">N\rightarrow\infty</script> where $T_k$ is the set of planar, ordered and rooted trees with <script type="math/tex">k</script> edges. We will refrain from giving a rigorous definition of <script type="math/tex">\text{val}(\pi)</script> here and instead illustrate what one should think of it with the following example:</p>
<p>Let <script type="math/tex">\pi</script> be the following tree:</p>
<p><img src="http://i.imgur.com/d08afISm.jpg" alt="tree with edges ab, bc, cd, be" title="tree with edges ab, bc, cd, be" /></p>
<p>Then <script type="math/tex">\text{val}(\pi) = \sum_{a,b,c,d,e=1}^N S_{ab}S_{bc}S_{cd}S_{be}.</script> Note that non-isomorphic trees (in general) do give different values! Also note that this “definition” is slightly different from the one in the paper, since the paper is tailored to an efficient and logically coherent presentation rather than introducing concepts as close as possible to already existing ones. Conceptually it doesn’t make a difference though.</p>
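For a toy (randomly generated, symmetric) profile $S$ this value can be computed directly; a sketch checking the tensor-contraction shortcut against the displayed five-fold sum:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
N = 6  # small on purpose, so the brute-force sum stays cheap

# toy symmetric "variance profile" S with nonnegative entries
S = rng.random((N, N))
S = (S + S.T) / 2

# val(pi) for the tree with edges ab, bc, cd, be, as one tensor contraction
val = np.einsum('ab,bc,cd,be->', S, S, S, S)

# brute-force five-fold sum, straight from the displayed formula
brute = sum(S[a, b] * S[b, c] * S[c, d] * S[b, e]
            for a, b, c, d, e in product(range(N), repeat=5))

print(val, brute)  # the two agree
```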
<h3 id="re-deriving-a-known-result">Re-deriving a Known Result</h3>
<p>In the “Stieltjes transform setting” one can use theorem 2.1 of <a href="https://arxiv.org/pdf/1506.05095v4.pdf">Erdős et al.</a> to establish the bound <script type="math/tex">\mathbb E\lambda_\text{max}\leq 2\|S\|^\frac{1}{2}</script>. This bound follows trivially from the above convergence result <script type="math/tex">\mathbb E tr H^{2k}\rightarrow \sum_{\pi\in T_k}\text{val}(\pi)</script> by taking suprema in each factor of <script type="math/tex">S</script> (i.e. using <script type="math/tex">\sum_b\sup_a S_{ab} = \|S\|</script>).</p>
<h3 id="improving-the-result">Improving the Result</h3>
<p>The central idea of the paper is that we can actually bound the sums over the values of the trees more efficiently and in particular in a way that can be read off each summand’s tree representation.</p>
<p>Let us first show how we can do better in the case of the previous example:</p>
<script type="math/tex; mode=display">\sum_{a,b,c,d,e=1}^N S_{ab}S_{bc}S_{cd}S_{be}\leq N\sum_{b,c,d,e=1}^N \sup_a S_{ab}S_{bc}S_{cd}\times \sup_{\tilde b}S_{\tilde be} = N\|S^3\|\|S\|,</script>
<p>which, by submultiplicativity of the norm, is an improvement by a factor of <script type="math/tex">z_3 := \frac{\|S^3\|}{\|S\|^3}\leq 1</script> (note that we can get rid of the factor <script type="math/tex">N</script> with the help of the <script type="math/tex">2k</script>-th root; letting <script type="math/tex">N</script> and <script type="math/tex">k</script> go to infinity simultaneously requires some conditions on the moments of the random variables, though). In a similar manner (group <script type="math/tex">ab,be</script> and <script type="math/tex">bc,cd</script> together) we also get the (in general) different upper bound <script type="math/tex">\|S^2\|^2</script>, giving an improvement by a factor of <script type="math/tex">z_2^2 := (\frac{\|S^2\|}{\|S\|^2})^2</script>. More generally, define <script type="math/tex">z_j := \frac{\|S^j\|}{\|S\|^j}</script> to be the improvement of a sequence of the form <script type="math/tex">S_{i_1i_2}S_{i_2i_3}\dots S_{i_ji_{j+1}}</script>.</p>
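A quick numerical check of this chain of bounds, under the assumption that $\|\cdot\|$ denotes the maximal row sum (the paper's norm may be defined differently; this is my own sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 40

# toy symmetric variance profile with nonnegative entries (my own example)
S = rng.random((N, N))
S = (S + S.T) / 2

norm = lambda A: np.linalg.norm(A, np.inf)    # assumption: ||.|| = max row sum
val = np.einsum('ab,bc,cd,be->', S, S, S, S)  # val(pi) of the example tree

naive = N * norm(S) ** 4                                     # sup in every factor
improved = N * norm(np.linalg.matrix_power(S, 3)) * norm(S)  # N ||S^3|| ||S||
z3 = norm(np.linalg.matrix_power(S, 3)) / norm(S) ** 3

print(z3)  # <= 1 by submultiplicativity of the induced norm
```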
<h2 id="trees--dyck-paths">Trees ↔ Dyck Paths</h2>
<p>To generalize this, we interpret this method of obtaining upper bounds in a more intuitive (i.e. graphical) setting: Note that those sequences correspond to branches of length $j$ in the tree representation.
Trees did not turn out to be the easiest representation to work with explicitly. When trying to write computer simulations to rule out some conjectures, the bijection between ordered trees and Dyck paths turned out to be helpful, and indeed there is a nice interpretation of the above-mentioned sequences of length <script type="math/tex">j</script> in terms of Dyck paths, given by the following bijection:</p>
<ul>
<li>Start at the root of the tree.</li>
<li>Go around the tree clockwise, say.</li>
<li>Every time you go from a parent to its child add an “up” to the Dyck path, every time you go from a child to its parent add a “down” to the Dyck path.
<img src="http://i.imgur.com/mWwk4MJ.jpg" alt="illustration of the algorithm" title="tree with edges ab, bc, cd, be and corresponding Dyck path" /></li>
</ul>
<p>Now the “outermost” sequences translate into <script type="math/tex">j</script>-runs in the Dyck path. Note that not every sequence of length <script type="math/tex">j</script> has a corresponding <script type="math/tex">j</script>-run in the Dyck path obtained by this method, but this can either improve or worsen the bound (think about that!).</p>
<p>Rewrite the sum over trees</p>
<script type="math/tex; mode=display">\sum_{\pi\in T_k}\text{val}(\pi) = |T_k|\sum_{\pi\in T_k}\frac{\text{val}(\pi)}{|T_k|} = |T_k|\mathbb E\text{val},</script>
<p>where the expectation is not the one over the random entries of <script type="math/tex">H</script>, but over all trees equipped with the uniform measure. We might as well pull out the deterministic part <script type="math/tex">\|S\|^k</script> to get</p>
<script type="math/tex; mode=display">\sum_{\pi\in T_k}\text{val}(\pi) = \|S\|^k\mathbb E\prod_j z_j^{T_j},</script>
<p>where <script type="math/tex">T_j(\pi)</script> might count the <script type="math/tex">j</script>-up-runs, the <script type="math/tex">j</script>-down-runs or some combination of both in the Dyck path corresponding to the tree <script type="math/tex">\pi</script>. However, as it turns out, we have to get a little bit more creative about what to count if we want to obtain explicit results. To figure out what we want to count, we first analyze the space of uniformly distributed Dyck paths to figure out what we <em>can</em> count.</p>
<h2 id="dyck-paths--random-walks">Dyck Paths ↔ Random Walks</h2>
<p>While “uniformly distributed” sounds like something that should be easy to analyze, it is actually a non-trivial task to get uniform samples in a systematic (and somewhat efficient) way. Fortunately, computer scientists have already had to devise means to do so, and <a href="http://dl.acm.org/citation.cfm?id=357091">Arnold et al.’s paper</a> provides the following handy formula for the probability of going up when at “time” <script type="math/tex">t</script> and height <script type="math/tex">h</script>:</p>
<script type="math/tex; mode=display">p_{t,h} := \frac{1}{2} \frac{h+2}{h+1} \frac{2k-t-h}{2k-t}</script>
<p>While this is slightly harder to analyze than, say, a fair coin toss (where <script type="math/tex">p_{t,h}</script> would simply be <script type="math/tex">1/2</script> for all <script type="math/tex">t,h</script>), it still has one important property: it is Markovian (though not time-homogeneous). This allows for a discretisation of the time axis.</p>
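A sketch of the resulting sampler (my own implementation of the formula; for $k=2$ the two Dyck paths UUDD and UDUD should each appear about half the time):

```python
import random

def uniform_dyck(k, rng=random):
    """Sample a uniformly random Dyck path of length 2k by going up
    with probability p_{t,h} at time t and height h."""
    path, h = [], 0
    for t in range(2 * k):
        p = 0.5 * (h + 2) / (h + 1) * (2 * k - t - h) / (2 * k - t)
        if rng.random() < p:
            path.append(+1); h += 1
        else:
            path.append(-1); h -= 1
    return path

random.seed(4)
samples = [tuple(uniform_dyck(2)) for _ in range(10000)]
freq = samples.count((1, 1, -1, -1)) / len(samples)
print(freq)  # close to 1/2
```

Note that $p_{t,h}=1$ at $h=0$ and $p_{t,h}=0$ at $h=2k-t$, so every sample is automatically a valid Dyck path.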
<p>In the following it is helpful to think of this construction of uniform Dyck paths as a random walk in a potential supported in a triangle with repelling boundary effects.
Essentially the idea boils down to the following: In the regions where strong repulsion takes place (near <script type="math/tex">h=0</script> and <script type="math/tex">2k-t-h=0</script>) we have to get our hands dirty and prove everything “manually”, but everywhere else we can simply approximate it with constant probabilities.</p>
<p>In particular we have a strong upwards drift when we’re close to <script type="math/tex">h=0</script> and a strong downwards drift when we’re close to <script type="math/tex">2k-t-h=0</script> – play around a bit, taking different limits in <script type="math/tex">p_{t,h}</script>! E.g. <script type="math/tex">% <![CDATA[
p_{t,h}\approx \frac{1}{2}\frac{2k-t-h}{2k-t}<\frac{1}{2} %]]></script> for large <script type="math/tex">h</script> and <script type="math/tex">p_{t,h}\approx \frac{1}{2} \frac{h+2}{h+1}>\frac{1}{2}</script> for small <script type="math/tex">h</script>.</p>
<p><img src="http://i.imgur.com/gnv6gQ4m.jpg" alt="" title="Dyck path induced random walk landscape" /></p>
<p>So we would like to count up-runs if <script type="math/tex">h</script> is small and down-runs everywhere else. This (i.e. switching between what we count at different positions) introduces some overcounting problems, which can be overcome with the help of the <script type="math/tex">2k</script>-th root. So if we set <script type="math/tex">T_j</script> to count just that, we end up with the problem of estimating <script type="math/tex">\mathbb E_\frac{1}{2} \prod_j z_j^{T_{j}}</script> instead, where <script type="math/tex">\mathbb E_\frac{1}{2}</script> denotes expectation over paths of length <script type="math/tex">n</script>, say, in which at all times and heights the probability of going up is <script type="math/tex">1/2</script>.</p>
<p>If we want to get <script type="math/tex">\lim_{n\rightarrow\infty}(\mathbb E_\frac{1}{2} \prod_j z_j^{T_{j}})^\frac{1}{n}</script> (some simple calculations which can be found in the paper lead to this) we could of course try to compute <script type="math/tex">\mathbb E_\frac{1}{2} \prod_j z_j^{T_{j}}=: a_n</script> explicitly for large <script type="math/tex">n</script> and take the <script type="math/tex">n</script>-th root afterwards.</p>
<h2 id="random-stopping-times">Random Stopping Times</h2>
<p>A more elegant way of approaching this problem is to remember the Cauchy-Hadamard theorem, relating <script type="math/tex">\limsup_n |a_n|^{1/n}</script> to the radius of convergence of the power series <script type="math/tex">\sum_n a_n x^n</script>. We get this power series by introducing a random geometric stopping time <script type="math/tex">n^*</script> s.t. <script type="math/tex">\mathbb P(n^* = n)=(1-w)w^{n-1}</script> and taking expectation over this random stopping time:</p>
<script type="math/tex; mode=display">\mathbb E z^{T(n^* -1)} = (1-w)\sum_n w^n\mathbb E z^{T(n)},</script>
<p>where, by abuse of notation, we ignore the dependence on <script type="math/tex">j</script>, highlight the function’s dependence on the length of the path <script type="math/tex">n</script> by introducing it as a parameter, and (on the l.h.s.) take expectation over both the space of paths and the random stopping time.</p>
<p>To determine the radius of convergence of <script type="math/tex">\mathbb E z^{T(n^* -1)}</script> we use the fact that we actually have a fairly explicit formula for it (following <a href="https://arxiv.org/pdf/1407.6831v1.pdf">Holst et al.</a>, theorem 1).</p>
<p>For further details feel free to take a look at the actual paper and drop me a line!</p>
Fri, 02 Dec 2016 16:46:05 +0000
http://peter.muehlbacher.me/math/2016/12/02/from-counting-trees-to-bounding-eigenvalues/
Setting up Anki to learn Chinese
<p>To be fair, I’d like to point out that this text will hardly contain any pieces of
information that cannot be found in the <a href="http://ankisrs.net/docs/manual.html">Anki manual</a>, Wikipedia and some other
sites. So if you’re interested and not too busy I’d recommend taking your time
and (at least) reading the manual.</p>
<h1 id="using-anki">Using Anki</h1>
<h3 id="basics">Basics</h3>
<p>Anki’s philosophy follows an object-oriented programming paradigm.
<strong>Notes</strong> are Anki’s equivalent of objects; their <strong>fields</strong> each store a single piece of information. To review you need <strong>cards</strong>, which are the visual representation of a (selected subset of) a note. Do not store the same information in separate <strong>decks</strong> (like English-Chinese, Chinese-English), so that in case you need to edit something you only need to do it once.</p>
<h3 id="special-effects">“Special effects”</h3>
<ul>
<li>typing the answer: {{type:FIELDNAME}}</li>
<li>conditional statements: {{#FIELDNAME}}…{{/FIELDNAME}}</li>
<li>hints: {{hint:FIELDNAME}}</li>
<li>the odd one: {{cloze:FIELDNAME}}</li>
</ul>
<h3 id="reviewing">Reviewing</h3>
<p>To test e.g. only English-Hanzi, do a logical search like “(card:3 or card:5) deck:中文”, provided you have the same cards (in particular in the same order – that’s what the number refers to) as mentioned below.</p>
<h1 id="setting-up-anki-for-chinese">Setting up Anki for Chinese</h1>
<p><a href="https://ankiweb.net/shared/info/3448800906">Chinese support</a> introduces (amongst others) the following fields:</p>
<ul>
<li>Hanzi</li>
<li>Meaning</li>
<li>Reading</li>
<li>Color</li>
<li>Sound</li>
<li>Traditional</li>
<li>Simplified</li>
</ul>
<p>I’d recommend also adding the following fields:</p>
<ul>
<li><em>Stroke Order Diagram</em></li>
<li><em>Sentence</em> (example sentences)</li>
<li><em>Similar</em> (Hanzi that look similar)</li>
</ul>
<p>As for the cards, I found the following to be helpful:</p>
<ol>
<li>Hanzi → Pinyin</li>
<li>Hanzi → Meaning</li>
<li>Meaning → Hanzi</li>
<li>Meaning → Pinyin</li>
<li>Listening → Hanzi</li>
<li>Listening → Meaning</li>
</ol>
<h1 id="does-japanese-help">Does Japanese help?</h1>
<p>Kind of. The “Traditional” field helps sometimes. See e.g. <a href="https://en.wikipedia.org/wiki/Kanji#Local_developments_and_divergences_from_Chinese">Wikipedia</a> on how Hanzi and Kanji diverged.</p>
Sun, 02 Oct 2016 12:49:05 +0100
http://peter.muehlbacher.me/languages/chinese/2016/10/02/setting-up-anki-for-chinese/
Making Deaf Dogs Hear Again
<p>A word of warning: This is supposed to be a fun project.
It is neither very challenging from a mathematical point of view,
nor is it (including the name and introduction) to be taken seriously.</p>
<p>If anything, there are some considerations concerning the usability of programs
visualizing sound (by Markus Y.) and a coding exercise on fractals and
sound processing in Python.</p>
<p>The most <a href="https://github.com/petermuehlbacher/makedeafdogshearagain">up-to-date version</a> (implementation and a very concise explanation) can be found on Github.</p>
Sun, 19 Jun 2016 18:06:05 +0100
http://peter.muehlbacher.me/math/2016/06/19/making-deaf-dogs-hear-again/
Energy Landscapes of Spherical Spin Glasses
<p>The most up-to-date version of my <a href="https://github.com/petermuehlbacher/thesis2/blob/master/thesis2.pdf">bachelor’s thesis</a> on this subject can be found on Github.</p>
<p>Now the title may raise the question why one should even be interested in the distribution of critical points
of some spin glass model. One possible answer is provided by LeCun’s paper on <a href="http://arxiv.org/pdf/1412.0233.pdf">The Loss Surfaces of Multilayer Networks</a>, which shows that, in the appropriate limit, the distribution of critical points (i.e. local/global minima) of deep neural networks is just that of spherical spin glasses.</p>
<p>This is interesting because it partly explains why neural networks work so well in practice, where we cannot hope to find the global minimum and have to work with a local one: Indeed, it turns out that, again in the appropriate limit (roughly “the size of the network going to infinity”), “almost all” local minima have to be arbitrarily close to the global one. In other words: Finding a local minimum in a big neural network, there is a high probability that the corresponding configuration of parameters will perform about as well as that of the global minimum (or even better, because global minima tend to be prone to overfitting).</p>
<p>To be precise, the results are not only about local minima, but generally about critical points of finite index. Hence these results are also interesting for numerical considerations (like which optimization scheme to pick), since they also give an idea of how saddle points are distributed. At this point one should also mention <a href="http://arxiv.org/pdf/cond-mat/0611023v1.pdf">The statistics of critical points of Gaussian fields on large-dimensional spaces</a>, suggesting that the distribution of eigenvalues of the Hessian at a critical point is a shifted semicircle.</p>
<h1 id="abstract">Abstract</h1>
<p>In a recent paper Auffinger, Ben Arous and Černý gave an asymptotic evaluation of the complexity of spherical p-spin spin glass models via random matrix theory. This yields an interesting layered structure of the low critical values for the Hamiltonian of these models. This work aims to provide an overview of the findings needed to prove and thoroughly understand the above-mentioned results, omitting the more technical proofs to keep it compact.</p>
<p>Intermediary results of independent interest include Wigner’s semicircle law and various large deviation principles.</p>
Sat, 05 Mar 2016 08:07:05 +0000
http://peter.muehlbacher.me/math/2016/03/05/energy-landscapes-of-spherical-spin-glasses/
Calculations in the Schwarzschild Metric
<h2 id="motivation">Motivation</h2>
<p>At some point in your life you may (or may not – in that case you can stop
reading here) want to calculate the propagation time
of a photon along a path $\gamma$ in some potential
field, which is given by</p>
<script type="math/tex; mode=display">\int_\gamma ds,</script>
<p>where $ds$ is the part corresponding to the time component of the potential
field</p>
<script type="math/tex; mode=display">d\tau^2=ds^2-dl^2-dp^2,</script>
<p>where</p>
<ul>
<li>$\tau$ is the proper time, which is constant for particles moving at the speed of light,</li>
<li>$ds=f_t(\mathbf x)dt$ for $t$ being the time coordinate,</li>
<li>$dl=f_r(\mathbf x)dr$ for $r$ being the radial spatial coordinate and</li>
<li>$dp=f_\phi(\mathbf x)d\phi$ for $\phi$ being the longitude.</li>
</ul>
<p>In a general spherical coordinate system there is also the colatitude $\theta$,
but we will ignore it henceforth: in the case of a radially
symmetric potential field (i.e.
$f_i(\mathbf x)=f_i(r), i\in\{t,r,\phi\}$ for $\mathbf x=(r,\phi)$),
which will be assumed throughout this article from now on, the coordinate system
can be tilted in such a way that $\theta$ is constant for
geodesics without external influences like mirrors or, more generally, any other
particles interacting with the photon in question.</p>
<h2 id="preliminary-results">Preliminary results</h2>
<p>When wanting to calculate something like $\int_\gamma ds$ several questions
arise:</p>
<ol>
<li>What is $\gamma$?</li>
<li>How does this $ds$ interact with the parameterization of $\gamma$?</li>
<li>Is the integral path independent?</li>
</ol>
<p>The first question will not be dealt with in detail here, but there are some
things worth pointing out (at least to non-physicists): First of all, given some
photon at position $\mathbf x(0)$ with initial velocity $\mathbf{\dot x}(0)$
a parameterization of its trajectory is given implicitly by the
“geodesic equations” (so-called “null-geodesics” in the case of a photon), which
are second order, non-linear differential equations for $\tau,t,r,\phi$ with
initial conditions given by $\mathbf x(0)$ and $\mathbf{\dot x}(0)$.
In particular, if you want to calculate something like $\int_A^B ds$ (which is
only suggestive as it is not well-defined unless we know that this integral is
path-independent!) you cannot just take $\gamma(t)$ to be $tA+(1-t)B$ in a
potential field induced by a non-Euclidean metric (e.g. the Schwarzschild
metric).</p>
<p>Concerning the second question: In general it will be hard to parameterize
$t$ with a path. Since we have initial conditions for the spatial coordinates,
the geodesic equations will yield solutions for those spatial coordinates, so
we would like to avoid any $dt$s and rather work with $dr$ and $d\phi$.
This can be achieved by noting that a particle travelling at the speed of light
does not experience proper time, implying that $d\tau\equiv 0$. Thus we can (and
usually do) substitute $ds$ by $\sqrt{dl^2+dp^2}$.</p>
<p>When it comes to path dependence we have to get a little bit more specific about
the functions $f_i(\mathbf x)=f_i(\mathbf x_r), i\in\{t,r,\phi\}$. Note that,
since $ds=f_tdt, \dots$, $ds$ is not an arbitrary measure, but only a
(potentially not translation-invariant) multiple of the Lebesgue measure. As a
result we can use the usual theory to determine if a line integral is
path-independent, which amounts to checking if the 1-form $ds=\sqrt{dl^2+dp^2}$
is exact (i.e. there exists an $F$, s.t.
$dF=\sum_i \frac{\partial F}{\partial x_i}dx_i = ds$).
In general (i.e. for non trivial $dl,dp$ and non-trivial geodesics $\gamma$)
no such $F$ exists (which can also be seen by showing that $ds$ is not <a href="https://en.wikipedia.org/wiki/Closed_and_exact_differential_forms">closed</a>).
This brings us to the actual topic of this article:</p>
<h2 id="calculations-in-the-schwarzschild-metric">Calculations in the Schwarzschild Metric</h2>
<p>To do that we first characterize the Schwarzschild metric by writing out the
functions $f_i(\mathbf x), i\in\{t,r,\phi\}$:</p>
<script type="math/tex; mode=display">f_t((r,\phi))=\sqrt{1-\frac{r_s}{r}},</script>
<script type="math/tex; mode=display">f_r((r,\phi))=\frac{1}{\sqrt{1-\frac{r_s}{r}}},</script>
<script type="math/tex; mode=display">f_\phi((r,\phi))=r,</script>
<p>where the latter holds only under the assumption $\theta\equiv\pi/2$, and $r_s$
is a constant, called the “Schwarzschild radius”, that scales with the mass of
the object inducing this metric and the gravitational constant $G$.</p>
<p>We already noticed that for photons we have <script type="math/tex">ds=\sqrt{dl^2+dp^2},</script> but what does
$\int_\gamma\sqrt{dl^2+dp^2}$ mean? First of all we will rewrite this integral,
which is an integral over a 1-Form $da$ (mathematically speaking), as an integral
with a “real” integrand by using the parameterization of the path $\gamma$ and
noting that 1-Forms are functions mapping each point $x$ of the “original” space
$X$ ($\mathbb R^4$ in our case, but we will only use two components and thus
treat it as $\mathbb R^2$) to some dual element $da(\mathbf x)$ of the tangent space
$T_x^* $, such that</p>
<script type="math/tex; mode=display">da(\mathbf x)(\nabla\mathbf x)=\langle f_a(\mathbf x),\nabla\mathbf x\rangle,</script>
<p>for every $da$ of the form $da(x)=f_a(x)dx$.</p>
<p>Thus, in the Schwarzschild metric</p>
<script type="math/tex; mode=display">\int_\gamma ds =
\int_0^{L(\gamma)} \sqrt{
\langle f_r(\gamma(t)),\dot\gamma(t)\rangle^2+
\langle f_\phi(\gamma(t)),\dot\gamma(t)\rangle^2}dt =</script>
<script type="math/tex; mode=display">\int_0^{L(\gamma)} \sqrt{
\frac{1}{1-\frac{r_s}{\gamma_r(t)}}\dot\gamma_r(t)^2+
(\gamma_r(t)\dot\gamma_\phi(t))^2}dt</script>
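To see what such an integral evaluates to, here is a small numerical sketch (my own toy numbers) for the simplest case of a radial ray ($\phi$ constant, so only the $dr$-term survives), in units where $r_s=1$:

```python
import numpy as np

rs = 1.0             # Schwarzschild radius (units where r_s = 1)
r1, r2 = 2.0, 10.0   # endpoints of a radial ray, outside the horizon

# midpoint-rule evaluation of \int_{r1}^{r2} dr / sqrt(1 - rs/r)
r = np.linspace(r1, r2, 200001)
r_mid = (r[:-1] + r[1:]) / 2
length = np.sum(np.diff(r) / np.sqrt(1.0 - rs / r_mid))

print(length)  # larger than the flat-space value r2 - r1 = 8
```

The result exceeds the Euclidean distance, as expected from the factor $1/\sqrt{1-r_s/r}>1$.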
<h2 id="intuition">Intuition</h2>
<p>For $d\tau\equiv 0$ we regain a positive definite metric by writing
$ds^2=dl^2+dp^2$, which looks pretty similar to the Euclidean case
$dl^2=dx^2+dy^2$, where $dl$ is the “infinitesimally small length element”.
The difference is that the Lebesgue measure is translation-invariant, i.e.
$dx(\mathbf x)\equiv dx$ and $dy(\mathbf x)\equiv dy$ while, in general
$dl(\mathbf x)\neq dl(\mathbf x')$ and $dp(\mathbf x)\neq dp(\mathbf x')$.</p>
<p>So it is reasonable to think of $f_r,f_\phi$ as non translation-invariant
weighting factors in the Euclidean metric.</p>
Tue, 15 Dec 2015 14:23:05 +0000
http://peter.muehlbacher.me/physics/math/2015/12/15/calculations-in-the-schwarzschild-metric/
Confounding the Years<h2 id="年内-and-年中">年内 and 年中</h2>
<h3 id="年ねん内ないwithin-a-year"><ruby>年<rt>ねん</rt>内<rt>ない</rt></ruby>—within a year</h3>
<p>(Maybe) counterintuitively this doesn’t mean anything like “in this/a year”,
but rather <strong>within the/a year</strong> or <strong>by the end of the year</strong>.
It can be further modified by adding a number in front like in:</p>
<blockquote>
<p>この<ruby>仕<rt>し</rt>事<rt>ごと</rt></ruby>は２
<ruby>年<rt>ねん</rt>内<rt>ない</rt></ruby>に<ruby>終<rt>お</rt></ruby>わらせなければならない。
<br />This work must be finished within two years.</p>
</blockquote>
<h3 id="年ねん中じゅうthe-whole-year"><ruby>年<rt>ねん</rt>中<rt>じゅう</rt></ruby>—the whole year</h3>
<p>Again, this one is kind of counterintuitive, as it doesn’t have anything to do
with the “middle of the year”; it means <strong>the whole year</strong>.</p>
<h2 id="年来-and-来年">年来 and 来年</h2>
<h3 id="年ねん来らいlongstanding"><ruby>年<rt>ねん</rt>来<rt>らい</rt></ruby>—longstanding</h3>
<p>This means <strong>for some years</strong> and can be specified by adding a number in front
of it like:</p>
<blockquote>
<p>５年来
<br />for five years</p>
</blockquote>
<h3 id="来らい年ねんnext-year"><ruby>来<rt>らい</rt>年<rt>ねん</rt></ruby>—next year</h3>
<p>Now that one is a little nicer than the rest, since it can be translated word
for word: next+year is <strong>next year</strong>.</p>
<h2 id="先年-and-去年">先年 and 去年</h2>
<h3 id="先せん年ねんformer-years"><ruby>先<rt>せん</rt>年<rt>ねん</rt></ruby>—former years</h3>
<p>That’s a particularly nasty one: previous+year does not mean previous
year (which would not seem too far-fetched after learning 先月=last month);
instead it means <strong>former years</strong> or <strong>a few years ago</strong>.</p>
<h3 id="去きょ年ねんlast-year"><ruby>去<rt>きょ</rt>年<rt>ねん</rt></ruby>—last year</h3>
<p>At least past+year means <strong>last year</strong>. There is still hope!</p>
<h3 id="近きん年ねん"><ruby>近<rt>きん</rt>年<rt>ねん</rt></ruby></h3>
<p>Another “exception” that doesn’t really fit this article’s structure, but
will be pointed out nonetheless, is 近年, meaning <strong>recent years</strong>. I find it a
little confusing because of 近日, which does not mean “recently” but “soon”.</p>
Fri, 06 Nov 2015 23:12:05 +0000
http://peter.muehlbacher.me/languages/japanese/2015/11/06/confounding-the-years/
MPGP 2015<p>Warning: the descriptions may be totally off and contain tons of errors, since there was at most one talk that I’d feel somewhat comfortable giving myself.</p>
<ul>
<li><strong><em>Existence/uniqueness for a crystalline curvature flow</em></strong> (Antonin Chambolle): Proved existence with some level-set methods that looked interesting; however, I have to admit I couldn’t really follow the talk once it became technical (which was pretty early on).</li>
<li><strong><em>Numerical Approximation of Interacting Particles in Lipid Membranes</em></strong> (Ralf Kornhuber): There were different approaches on how to deal with particles (e.g. receptors/proteins in general) that are in a membrane and how their orientation, rotation, etc. affects the PDEs’ domains and thus their behavior. Apparently this is pretty hard even after simplifying assumptions like only looking at a graph-like section of the membrane and neglecting some physical forces.</li>
<li><strong><em>An Adaptive Moving Mesh Method for Geometric Evolution Laws and Bulk-Surface PDEs</em></strong> (John MacKenzie): This talk was pretty “FEM-heavy”, and one of the few things I understood was that one shouldn’t naively choose the number of grid points on a surface proportional to its curvature, since some serious bunching occurs. To avoid that (I’m not sure if it was the speaker’s idea, but he talked about it quite a bit) one introduces some kind of floor function for the mesh node distribution, which yields better numerical results.</li>
<li><strong><em>Numerical tools for inter- and extrapolation in the space of shells</em></strong> (Benedikt Wirth): Based on a scheme for evaluating Bezier curves pointwise, the concept of Bezier curves (and hypersurfaces in general) has been generalized to Riemannian manifolds, using the log map as distance function. After playing around a little one can simplify a lot (under the assumption that the hypersurface is at least C1).</li>
<li><strong><em>Geometric graph-based methods for high dimensions</em></strong> (Andrea Bertozzi): That one was particularly interesting, since it combined common methods from PDEs with the spectral embedding framework from machine learning. I will try to read the related papers as soon as possible and write more about it then. As an interim comment: some kind of mincut can be found by evolving a phase field model under the flow induced by the Ginzburg-Landau functional. The speaker tried a discretization, and what is surprising is that after only very few iterations (~4 if I remember correctly) this method produces great results (~96% on the MNIST dataset, given only ~3%(???) as training data). [Warning: numbers may be completely mixed up]</li>
<li><strong><em>Finite Elements on polytopic meshes and beyond</em></strong> (Andrea Cangiani) & <strong><em>A Stable Finite Element Approximation for the Dynamics of Fluidic Biomembranes</em></strong> (John Barrett): I’ll be honest and admit I didn’t understand anything of those two talks (very FEM-heavy).</li>
<li><strong><em>Shape control through active materials: some case studies with applications to manipulation and locomotion</em></strong> (Antonio DeSimone): Great and very engaging talk. Started off with wondering how snakes can move if the only forces they experience are friction, translated that into PDEs and showed how the spontaneous curvature results in directed movement. Then this has been used to analyze how certain cells are moving.</li>
<li><strong><em>Diffuse interface methods for bulk-surface variational problems</em></strong> (Martin Burger): Various smaller topics, mainly error analysis which was quite technical. Particular attention has been paid to low frequency perturbations and apparently their model is quite robust against those.</li>
<li><strong><em>Numerical methods for large bilayer bending problems</em></strong> (Sören Bartels): Given two surfaces on top of each other that react (i.e. expand) differently to some external input (heat, electricity) they looked for a model describing that kind of behaviour. If I remember correctly they looked for a model that didn’t produce “cat ears” when evolving a rectangular surface. Gamma convergence has been shown and their model has some Laplacian of curvature in it, so we get a 4th order, nonlinear system of PDEs which are generally not understood too well.</li>
<li><strong><em>Anisotropic mean curvature flow as a formal singular limit of the nonlinear bidomain and multidomain models</em></strong> (Giovanni Bellettini): I wouldn’t call it “technical”, but it was by far the most “mathematical” talk, in the sense of following a typical definition>theorem>proof style, so it is pretty hard to summarize.</li>
<li><strong><em>Minimising a relaxed Willmore functional for graphs subject to Dirichlet boundary conditions</em></strong> (Klaus Deckelnick): For a closed surface the Willmore functional (the surface integral of the squared mean curvature) can be shown to be bounded by something like 2π to some power depending on the dimension. Here the speaker considered the case of Dirichlet boundary conditions (in particular, no longer a closed surface; closed in the sense of an empty boundary in the manifold’s induced topology), restricted to a graph-like setting, and proved several estimates.</li>
<li><strong><em>A phase field model for Willmore’s energy with topological constraint</em></strong> (Patrick Dondl): Also a very engaging talk that basically dealt with the question on how to formulate some “topological invariance” (no splitting into disjoint subsets) of phase field models into a penalty functional.</li>
<li><strong><em>Mathematical models for tissue growth</em></strong> (John King): It was that day’s last talk and about 5-10 variables have been introduced in the first few slides so I didn’t really get anything.</li>
<li><strong><em>A coupled surface-Cahn-Hilliard bulk-diffusion system modelling lipid raft formation in cell membranes</em></strong> (Harald Garcke): This model is a funny one, because many terms seem to have been chosen such that everything converges to something non-trivial, with the physical interpretations found afterwards (at least that was my impression).</li>
<li><strong><em>On anisotropic Willmore Flow</em></strong> (Paola Pozzi): Similar to Bellettini’s talk this also made heavy use of dual maps in order to extend the definition of an anisotropic Willmore flow (for strictly convex unit balls in the metric induced by some Minkowski functional) in such a way that the dual of the unit ball’s surface behaves like an n-sphere, i.e. (for a prescribed volume) it minimizes the (generalized) Willmore functional. Doing FEMs with this generalized Willmore functional seems to be incredibly hard though because of many, many nonlinearities that occur.</li>
<li><strong><em>Finite element methods for the stochastic mean curvature flow</em></strong> (Xiaobing H. Feng): Additive and multiplicative perturbations (white noise independent of or dependent on the solution, respectively) were dealt with, and some error bounds were established (uniformly in time and H^3 in space, if I remember correctly). Apparently the Ito calculus doesn’t seem to be the “right” choice; instead the Stratonovich integral has been used (the chain rule of ordinary calculus still holds). The problem with multiplicative white noise seems to be that it can change sign, which makes error estimates pretty hard. From what I’ve understood, one thus adds some term delta, so that f(u)(eps W’) becomes f(u)(delta+eps W’), proves that for delta going to zero this gives the correct solution, and then proves error estimates only for eps smaller than sqrt(2 delta) or something like that.</li>
<li><strong><em>Coupled bulk-surface free boundary problems arising from a mathematical model
of receptor-ligand dynamics</em></strong> (Charlie Elliott): This talk started off with a video of yesterday’s bonfire night showing some politicians and popes getting incinerated. After some technical difficulties it got quite mathematical after all, and again I couldn’t really follow, since it built on prior work one should have been familiar with. However, it looked like it’d be very interesting if you could follow it.</li>
</ul>
Thu, 05 Nov 2015 23:05:05 +0000
http://peter.muehlbacher.me/math/physics/2015/11/05/mpgp-2015/
Protein Docking From Scratch<p>During the last semester I did a seminar on artificial intelligence, but ended up
choosing protein docking as my topic. From a mathematical point of view it could
get interesting; however, just implementing the potential and its gradient
(which, mathematically speaking, is pretty trivial) was much more work than I
expected it to be, and in <a href="https://github.com/petermuehlbacher/reports/blob/master/docking/paper/docking.pdf">my report for the seminar</a> I elaborated on those
obstacles, ranging from really trivial, but time-devouring, ones like parsing
<a href="http://www.wwpdb.org/documentation/file-format">PDB files</a> to “harder” ones like determining parameters for the potential,
which is again an optimization problem. In this post I will give a brief
overview of the topics dealt with in the report.</p>
<h2 id="sections-that-might-be-interesting-for-people-who-want-to-get-into-protein-docking">Sections that might be interesting for people who want to get into protein docking</h2>
<h3 id="terminology">Terminology</h3>
<p>The first section introduces basic terminology so that one should be able to
get the gist of a text dealing with protein docking. It starts by defining
polypeptide chains and their structures (primary, secondary, tertiary and
quaternary), goes on to introduce the different coordinate systems that are
frequently used, and concludes with a definition of what protein docking is
about.</p>
<h3 id="lennard-jones-potential">Lennard-Jones potential</h3>
<p>In the second section the Lennard-Jones potential is introduced, a physical
interpretation and a discussion of its feasibility are given, and its parameter
estimation is discussed. I think the latter is particularly interesting: even though I
knew I was only dealing with <em>models</em> of reality and not “reality itself”, I
thought that, for example, setting the parameter accounting for the
Pauli repulsion to the value the physicists “derived” from experiments (or
calculations) would be fine. However, one should keep in mind that this is only a
model, which works best if fitted to the data, even if this fitting ends up
contradicting physicists’ intuition (e.g. two carbon atoms in two different positions
experiencing two different Pauli repulsion forces). The underlying physics
provides a good initial guess for the model, but does not have to be the final
one!</p>
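<p>To make the parameter discussion concrete, here is a minimal sketch of the Lennard-Jones potential in its standard 12-6 form; the values ε = σ = 1 are placeholders for illustration, not fitted force-field parameters:</p>

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones potential:
    V(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6),
    where epsilon is the well depth and sigma the distance at which V = 0.
    The r**-12 term models Pauli repulsion, the r**-6 term the
    van der Waals attraction."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# the minimum sits at r = 2**(1/6) * sigma, with depth -epsilon
r_min = 2.0 ** (1.0 / 6.0)
print(lennard_jones(r_min))  # ~ -1.0 (up to rounding)
```

<p>Fitting then means choosing ε and σ per atom pair so that the resulting energies match data, not the textbook constants.</p>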
<h2 id="sections-that-might-be-interesting-for-mathematicians">Sections that <em>might</em> be interesting for mathematicians</h2>
<h3 id="coordinate-conversion">Coordinate conversion</h3>
<p>People who are into numerical mathematics are very likely to tell you otherwise,
but in my opinion there is no really interesting math behind it.</p>
<h3 id="computation-of-the-gradient">Computation of the gradient</h3>
<p>Again, from a mathematical point of view the definition of a gradient is simple.
However, applied mathematicians dislike difference quotients not only because of
their inherent susceptibility to cancellation, but also because of the time they
take in the multivariate case for functions that cost a lot of time or
computational resources to evaluate.</p>
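<p>A quick illustration of the cancellation issue (my own toy example, not from the report): shrinking the step size in a forward difference quotient first reduces the truncation error, but eventually floating-point cancellation takes over and the estimate gets worse again.</p>

```python
f = lambda x: x ** 3   # exact derivative at x = 1 is 3

def forward_diff(h):
    # (f(1+h) - f(1)) / h: subtracting two nearly equal numbers
    # destroys significant digits once h is tiny
    return (f(1.0 + h) - f(1.0)) / h

for h in (1e-2, 1e-8, 1e-14):
    print(h, abs(forward_diff(h) - 3.0))
```

<p>Automatic differentiation avoids this entirely, since no small differences are ever formed.</p>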
<p>This is where automatic differentiation comes in. Both the forward and the
backward mode are analysed in this section, based on <a href="http://solon.cma.univie.ac.at/numbook.txt">Neumaier’s book on numerical analysis</a>.
The backward mode is particularly interesting, as it is the way to go for
potentials depending on many inputs. Naturally it has different names in
different areas; the machine learning community, for example, loves to post like
three new “graphical tutorials” on backpropagation, which is nothing but
backward automatic differentiation, which in turn is basically a clever use of
the chain rule.</p>
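<p>Since the report leans on backward (reverse-mode) automatic differentiation, here is a minimal toy version of the idea (entirely my own sketch, far simpler than AutoGrad): record the computation graph on the forward pass, then propagate adjoints backwards via the chain rule.</p>

```python
class Var:
    """A value that records how it was computed, so that d(output)/d(input)
    can be accumulated by walking the graph backwards (chain rule)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent, d(self)/d(parent)) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # naive recursive traversal; real implementations (e.g. AutoGrad)
        # process the nodes in reverse topological order instead
        self.grad += seed
        for parent, local_deriv in self.parents:
            parent.backward(seed * local_deriv)

x, y = Var(3.0), Var(4.0)
z = x * y + x          # z = x*y + x, so dz/dx = y + 1, dz/dy = x
z.backward()
print(x.grad, y.grad)  # -> 5.0 3.0
```

<p>One backward pass yields the gradient with respect to <em>all</em> inputs at once, which is exactly why this mode wins for potentials with many inputs.</p>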
<p>At this point I would like to thank <a href="http://frankknox.harvard.edu/people/dougal-maclaurin">Dougal Maclaurin</a> and
<a href="http://people.seas.harvard.edu/~dduvenaud/">David Duvenaud</a> not only for writing the <a href="https://github.com/HIPS/autograd/">AutoGrad library</a> (a Python
implementation for backwards automatic differentiation), but also for their
quick support and helping me out when I encountered difficulties using it.</p>
<h2 id="sections-that-might-be-interesting-for-people-who-like-to-implement-stuff">Sections that might be interesting for people who like to implement stuff</h2>
<h3 id="implementation">Implementation</h3>
<p>This was definitely the part that took the most time and was the least
interesting (again, from a mathematical point of view). However, it may be
interesting as a starting point for people who want to play around with PDB
files because I found the documentation to be pretty horrible, so I tried to
summarize the parts that I needed here.</p>
<h3 id="results">Results</h3>
<p>I should really work on the results, but limited time and computational
resources were (and still are) a real showstopper. Maybe I will update it some
time, but honestly speaking (most of the) other tasks have a higher priority.</p>
Thu, 20 Aug 2015 22:45:00 +0100
http://peter.muehlbacher.me/chemistry/math/2015/08/20/protein-docking/