Jekyll2021-09-16T12:24:17+00:00https://fragile-credences.github.io/feed.xmlfragile credencesTom Adamczewski's blogHow much of the fall in fertility could be explained by lower mortality?2021-08-05T00:00:00+00:002021-08-05T00:00:00+00:00https://fragile-credences.github.io/fertility-mortality<p><a href="https://ourworldindata.org/child-mortality#when-more-infants-survive-fertility-goes-down"><img src="../assets/images/ourworldindata_scatter-fertility-vs-infant-survival.png" alt="ourworldindata_scatter-fertility-vs-infant-survival" /></a></p>
<p>Many people think that lower child mortality causes fertility to decline.</p>
<p>One prominent theory for this relationship, as described by <a href="https://ourworldindata.org/child-mortality#when-more-infants-survive-fertility-goes-down">Our World in Data</a><sup id="fnref:quote-context" role="doc-noteref"><a href="#fn:quote-context" class="footnote" rel="footnote">1</a></sup>, is that “infant survival reduces the parents’ demand for children”<sup id="fnref:aside-1" role="doc-noteref"><a href="#fn:aside-1" class="footnote" rel="footnote">2</a></sup>. (Infants are children under 1 year old.)</p>
<p>In this article, I want to look at how we can precisify that theory, and what magnitude the effect could possibly take. What fraction of the decline in birth rates could the theory explain?</p>
<p><strong>Important.</strong> I don’t want to make claims here about how parents <em>actually</em> make fertility choices. I only want to examine the implications of various models, and specifically how much of the observed changes in fertility the models could explain.</p>
<h2 id="constant-number-of-children">Constant number of children</h2>
<p>One natural interpretation of “increasing infant survival reduces the parents’ demand for children” is that parents are adjusting the number of births to keep the number of surviving children constant.</p>
<p>Looking at Our World in Data’s graph, we can see that in most of the countries depicted, the infant survival rate went from about 80% to essentially 100%. This is a factor of 1.25. Meanwhile, there were 1/3 as many births. If parents were adjusting the number of births to keep the number of surviving children constant, the decline in infant mortality would explain a change in births by a factor of 1/1.25=0.8, a -0.2 change that is only <strong>30%</strong> of the -2/3 change in births.</p>
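This back-of-the-envelope calculation can be checked in a few lines of Python, using the rounded figures read off the graph (a 0.8 → 1.0 survival rate and births falling to one third):

```python
# Figures read off the Our World in Data graph discussed above.
survival_before, survival_after = 0.80, 1.00
births_ratio_observed = 1 / 3  # births fell to one third of their former level

# If parents hold the number of surviving children constant,
# births must scale by the inverse of the survival ratio.
births_ratio_implied = survival_before / survival_after  # 0.8

fraction_explained = (1 - births_ratio_implied) / (1 - births_ratio_observed)
print(round(fraction_explained, 2))  # → 0.3
```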
<p>The basic mathematical reason this happens is that even when mortality is tragically high, the survival rate is still thankfully much closer to 1 than to 0, so even a very large proportional fall in mortality will only amount to a small proportional increase in survival.</p>
<p>Some children survive infancy but die later in childhood. Although Our World in Data’s quote focuses on infant mortality, it makes sense to consider older children too. I’ll look at under-5 mortality, which generally has better data than older age groups, and also captures a large fraction of all child mortality<sup id="fnref:over-5-data" role="doc-noteref"><a href="#fn:over-5-data" class="footnote" rel="footnote">3</a></sup>.</p>
<h3 id="england-1861-1951">England (1861-1951)</h3>
<p>England is a country with an early demographic transition and good data available.</p>
<p><a href="https://link.springer.com/content/pdf/10.1007/s00148-004-0208-z.pdf">Doepke 2005</a> quotes the following numbers:</p>
<table>
<thead>
<tr>
<th> </th>
<th>1861</th>
<th>1951</th>
</tr>
</thead>
<tbody>
<tr>
<td>Infant mortality</td>
<td>16%</td>
<td>3%</td>
</tr>
<tr>
<td>1-5yo mortality</td>
<td>13%</td>
<td>0.5%</td>
</tr>
<tr>
<td>0-5 yo mortality</td>
<td>27%</td>
<td>3.5%</td>
</tr>
<tr>
<td><strong>Survival to 5 years</strong></td>
<td><strong>73%</strong></td>
<td><strong>96.5%</strong></td>
</tr>
</tbody>
<tbody>
<tr>
<td>Fertility</td>
<td>4.9</td>
<td>2.1</td>
</tr>
</tbody>
</table>
<p>Fertility fell by 57%, while survival to 5 years rose by 32%. Hence, if parents aim to keep the number of surviving children constant, the change in child survival can <a href="https://docs.google.com/spreadsheets/d/1vsQLOVcay-nYTfEZFVST4yO3NiETo2EoafF5PybGBd4/edit#gid=0&range=D45">explain <strong>43%</strong></a><sup id="fnref:file" role="doc-noteref"><a href="#fn:file" class="footnote" rel="footnote">4</a></sup> of the actual fall in fertility. (It would have explained only 23% had we erroneously considered only the change in infant survival.)</p>
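The same calculation can be packaged as a small helper, here applied to the England figures quoted above (survival to age 5 of 73% → 96.5%, fertility 4.9 → 2.1, and infant survival of 84% → 97%):

```python
def fraction_explained(surv_before, surv_after, fert_before, fert_after):
    """Share of the observed fertility fall explained if parents hold
    the number of surviving children constant."""
    implied_fall = 1 - surv_before / surv_after   # fall in births implied by survival
    observed_fall = 1 - fert_after / fert_before  # actual fall in births
    return implied_fall / observed_fall

# England, 1861 -> 1951: survival to age 5
print(round(fraction_explained(0.73, 0.965, 4.9, 2.1), 2))  # → 0.43

# Using infant survival only (0.84 -> 0.97) understates the effect:
print(round(fraction_explained(0.84, 0.97, 4.9, 2.1), 2))   # → 0.23
```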
<h3 id="sub-saharan-africa-1990-2017">Sub-Saharan Africa (1990-2017)</h3>
<p>If we look now at sub-Saharan Africa data from the World Bank, the 1990-2017 change in fertility is from 6.3 to 4.8, a 25% decrease, whereas the 5-year survival rate went from 0.82 to 0.92, a 12% increase. So the fraction of the actual change in fertility that could be explained by the survival rate is <strong>44%</strong>. (This would have been 23% had we looked only at infant survival).</p>
<iframe width="100%" height="371" seamless="" frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQZrbi2ne3PKflmX_s_3iJ209viIQv23K5Ho5mHUZ8oRMwuf-e2Z2q7kfdX6NQESXtAkBlOqSk3GUzP/pubchart?oid=1122146329&format=interactive"></iframe>
<p><em><a href="https://docs.google.com/spreadsheets/d/1vsQLOVcay-nYTfEZFVST4yO3NiETo2EoafF5PybGBd4/edit#gid=0">Source data and calculations</a>. Chart not showing up? <a href="/assets/images/fertility.svg">Go to the <code class="language-plaintext highlighter-rouge">.svg</code> file.</a></em></p>
<p>So far, we have seen that this very simple theory of parental decision-making can explain 30-44% of the decline in fertility, while also noticing that considering childhood mortality beyond infancy was important to giving the theory its full due.</p>
<p>However, in more sophisticated models of fertility choices, the theory looks worse.</p>
<h2 id="a-more-sophisticated-model-of-fertility-decisions">A more sophisticated model of fertility decisions</h2>
<p>Let us imagine that instead of holding it constant, parents treat the number of surviving children as one good among many in an optimization problem.</p>
<p>An increase in the child survival rate can be seen as a decrease in the cost of surviving children. Parents will then substitute away from other goods and increase their target number of surviving children. If your child is less likely to die as an infant, you may decide to aim to have <em>more</em> children: the risk of experiencing the loss of a child is lower.<sup id="fnref:aside-99" role="doc-noteref"><a href="#fn:aside-99" class="footnote" rel="footnote">5</a></sup></p>
<p>For a more formal analysis, we can turn to the <a href="https://www.jstor.org/stable/pdf/1912563.pdf">Barro and Becker (1989)</a> model of fertility. I’ll be giving a simplified version of the presentation in <a href="https://link.springer.com/content/pdf/10.1007/s00148-004-0208-z.pdf">Doepke 2005</a>.</p>
<p>In this model, parents care about their own consumption as well as their number of surviving children. The parents maximise<sup id="fnref:uf" role="doc-noteref"><a href="#fn:uf" class="footnote" rel="footnote">6</a></sup></p>
\[U(c,n) = u(c) + n^\epsilon V\]
<p>where</p>
<ul>
<li>\(n\) is the number of surviving children and \(V\) is the value of a surviving child</li>
<li>\(\epsilon\) is a constant \(\in (0,1)\)</li>
<li>\(u(c)\) is the part of utility that depends on consumption<sup id="fnref:uc" role="doc-noteref"><a href="#fn:uc" class="footnote" rel="footnote">7</a></sup></li>
</ul>
<p>The income of a parent is \(w\), and there is a cost per birth of \(p\) and an additional cost of \(q\) per surviving child<sup id="fnref:aside-2" role="doc-noteref"><a href="#fn:aside-2" class="footnote" rel="footnote">8</a></sup>. The parents choose \(b\), the number of births. \(s\) is the probability of survival of a child, so that \(n=sb\).</p>
<p>Consumption is therefore \(c=w-(p+qs)b\) and the problem becomes
\(\max_{b} U = u(w-(p+qs)b) + (sb)^\epsilon V\)</p>
<p>Letting \(b^{*}(s)\) denote the optimal number of births as a function of \(s\), what are its properties?</p>
<p>The simplest one is that \(sb^*(s)\), the number of <em>surviving</em> children, is increasing in \(s\). This is the substitution effect we described intuitively earlier in this section. This means that if \(s\) is multiplied by a factor \(x\) (say 1.25), \(b^*(s)\) will be multiplied by <em>more than</em> \(1/x\) (more than 0.8).</p>
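This comparative static can be illustrated numerically. The sketch below grid-searches the optimal \(b\) for the objective above, with CRRA consumption utility and parameter values chosen purely for illustration (they are not Doepke’s calibration):

```python
import math

def optimal_births(s, w=1.0, p=0.2, q=0.2, eps=0.5, sigma=0.5, V=1.0):
    """Maximise u(w - (p + q*s)*b) + (s*b)**eps * V over b by grid search,
    with u(c) = c**(1 - sigma) / (1 - sigma). Illustrative parameters only."""
    def utility(b):
        c = w - (p + q * s) * b  # consumption left after child costs
        if c <= 0:
            return -math.inf
        return c ** (1 - sigma) / (1 - sigma) + (s * b) ** eps * V
    return max((i / 1000 for i in range(1, 5001)), key=utility)

b_low, b_high = optimal_births(0.73), optimal_births(0.965)
# Births fall slightly, while surviving children (s * b) increase:
print(round(b_low, 2), round(b_high, 2))
print(round(0.73 * b_low, 2), round(0.965 * b_high, 2))
```

With these assumed parameters, higher survival slightly reduces births but raises the number of surviving children, consistent with \(sb^*(s)\) being increasing in \(s\).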
<p>When we looked at the simplest model, with a constant number of children, we guessed that it could explain 30-44% of the fall in fertility. That number is a <strong>strict upper bound</strong> on what the current model could explain.</p>
<p>What we really want to know, to answer the original question, is how \(b^*(s)\) itself depends on \(s\). To do this, we need to get a little bit more into the relative magnitude of the cost per birth \(p\) and the additional cost \(q\) per surviving child. As Doepke writes,</p>
<blockquote>
<p>If a major fraction of the total cost of children accrues for every birth, fertility [i.e. \(b^*(s)\)] would tend to increase with the survival probability; the opposite holds if children are expensive only after surviving infancy<sup id="fnref:aside-3" role="doc-noteref"><a href="#fn:aside-3" class="footnote" rel="footnote">9</a></sup>.</p>
</blockquote>
<p>This tells us that falling mortality could actually cause fertility to <em>increase</em> rather than decrease.<sup id="fnref:p_q" role="doc-noteref"><a href="#fn:p_q" class="footnote" rel="footnote">10</a></sup></p>
<p>To go further, we need to plug in actual values for the model parameters. Doepke does this, using numbers that reflect the child mortality situation of England in 1861 and 1951, but also what seem to be some pretty arbitrary assumptions about the parent’s preferences (the shape of \(u\) and the value of \(\epsilon\)).</p>
<p>With these assumptions, he finds that “the total fertility rate falls from 5.0 (the calibrated target) to 4.2 when mortality rates are lowered to the 1951 level”<sup id="fnref:quote-context-2" role="doc-noteref"><a href="#fn:quote-context-2" class="footnote" rel="footnote">11</a></sup>, a 16% decrease. This is <strong>28%</strong> of the actually observed fall in fertility to 2.1.</p>
<h3 id="extensions-of-barro-becker-model">Extensions of Barro-Becker model</h3>
<p>The paper then considers various extensions of the basic Barro-Becker model to see if they could explain the large decrease in fertility that we observe.</p>
<p>For example, it has been hypothesized that when there is <em>uncertainty</em> about whether a child will survive (hitherto absent from the models), parents want to avoid the possibility of ending up with zero surviving children. They therefore have many children as a precautionary measure. Declining mortality (which reduces uncertainty since survival rates are thankfully greater than 0.5) would have a strong negative impact on births.</p>
<p>However, Doepke also considers a third model, which incorporates not only stochastic mortality but also sequential fertility choice, where parents may condition their fertility decisions on the observed survival of children that were born previously. The sequential aspect reduces the uncertainty that parents face over the number of surviving children they will end up with.</p>
<p>The stochastic and sequential models make no clear-cut predictions based on theory alone. Using the England numbers, however, Doepke finds a robust conclusion. In the stochastic+sequential model, for almost all reasonable parameter values, the expected number of surviving children still increases with \(s\) (my emphasis):</p>
<blockquote>
<p>To illustrate this point, let us consider the extreme case [where] utility from consumption is close to linear, while risk aversion with regards to the number of surviving children is high. … [W]hen we move (with the same parameters) to the more realistic sequential model, where parents can replace children who die early, … despite the high risk aversion with regards to the number of children, total fertility drops only to 4.0, and <strong>net fertility rises</strong> to 3.9, just as with the benchmark parameters. … Thus, in the sequential setup the conclusion that mortality decline raises net fertility is robust to different preference specifications, even if we deliberately emphasize the precautionary motive for hoarding children.</p>
</blockquote>
<p>So even here, the fall in mortality would only explain 35% of the actually observed change in fertility. It seems that the ability to “replace” children who did not survive in the sequential model is enough to make its predictions pretty similar to the simple Barro-Becker model.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:quote-context" role="doc-endnote">
<p>The quote in context on Our World in Data’s <a href="https://ourworldindata.org/child-mortality#when-more-infants-survive-fertility-goes-down">child mortality page</a>: “the causal link between infant [<1 year old] survival and fertility is established in both directions: Firstly, increasing infant survival reduces the parents’ demand for children. And secondly, a decreasing fertility allows the parents to devote more attention and resources to their children.” <a href="#fnref:quote-context" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:aside-1" role="doc-endnote">
<p>As an aside, my impression is that if you asked an average educated person “Why do women in developing countries have more children?”, their first idea would be: “because child mortality is higher”. It’s almost a trope, and I feel that it’s often mentioned pretty glibly, without actually thinking about the decisions and trade-offs faced by the people concerned. That’s just an aside though – the theory clearly has prima facie plausibility, and is also cited in serious places like academia and Our World in Data. It deserves closer examination. <a href="#fnref:aside-1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:over-5-data" role="doc-endnote">
<p>It should be possible to conduct the Africa analysis for different ages using <a href="http://ghdx.healthdata.org/gbd-results-tool">IHME</a>’s more granular data, but it’s a bit more work. (There appears to be no direct data on deaths <em>per birth</em> as opposed to per capita, and data on fertility is contained in a different dataset from the main Global Burden of Disease data.) <a href="#fnref:over-5-data" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:file" role="doc-endnote">
<p>All things decay. Should this Google Sheets spreadsheet become inaccessible, you can download <a href="/assets/files/fertility-mortality.xlsx">this <code class="language-plaintext highlighter-rouge">.xlsx</code> copy</a> which is stored together with this blog. <a href="#fnref:file" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:aside-99" role="doc-endnote">
<p>In this light, we can see that the constant model is not really compatible with parents viewing additional surviving children as a (normal) good. Nor of course is it compatible with viewing children as a bad, for then parents would choose to have 0 children. Instead, it could for example be used to represent parents aiming for a socially normative number of surviving children. <a href="#fnref:aside-99" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:uf" role="doc-endnote">
<p>I collapse Doepke’s \(\beta\) and \(V\) into a single constant \(V\), since they can be treated as such in Model A, the only model that I will present mathematically in this post. <a href="#fnref:uf" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:uc" role="doc-endnote">
<p>Its actual expression, that I omit from the main presentation for simplicity, is \(u(c)=\frac{c^{1-\sigma}}{1-\sigma}\), the constant relative risk-aversion utility function. <a href="#fnref:uc" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:aside-2" role="doc-endnote">
<p>There is nothing in the model that compels us to call \(p\) the “cost per birth”, this is merely for ease of exposition. The model itself only assumes that there are two periods for each child: in the first period, costing \(p\) to start, children face a mortality risk; and in the second period, those who survived the first face zero mortality risk and cost \(q\). <a href="#fnref:aside-2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:aside-3" role="doc-endnote">
<p>Once again, Doepke calls the model’s early period “infancy”, but this is not inherent in the model. <a href="#fnref:aside-3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:p_q" role="doc-endnote">
<p>It’s difficult to speculate about the relative magnitude of \(p\) and \(q\), especially if, departing from Doepke, we make the early period of the model, say, the first 5 years of life. If the first period is only infancy, it seems plausible to me that \(q \gg p\), but then we also fail to capture any deaths after infancy. On the other hand, extending the early period to 5 incorrectly assumes that parents get no utility from children before they reach the age of 5. <a href="#fnref:p_q" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:quote-context-2" role="doc-endnote">
<p>The following additional context may be helpful to understand this quote:</p>
<blockquote>
<p>The survival parameters are chosen to correspond to the situation in England in 1861. According to Preston et al. (1972) the infant mortality rate (death rate until first birthday) was \(16 \%\), while the child mortality rate (death rate between first and fifth birthday) was \(13 \%\). Accordingly, I set \(s_{i}=0.84\) and \(s_{y}=0.87\) in the sequential model, and \(s=s_{i} s_{y}=0.73\) in the other models. Finally, the altruism factor \(\beta\) is set in each model to match the total fertility rate, which was \(4.9\) in 1861 (Chenais 1992). Since fertility choice is discrete in Models B and C, I chose a total fertility rate of \(5.0\) as the target.</p>
<p>Each model is thus calibrated to reproduce the relationship of fertility and infant and child mortality in 1861. I now examine how fertility adjusts when mortality rates fall to the level observed in 1951, which is \(3 \%\) for infant mortality and \(0.5 \%\) for child mortality. The results for fertility can be compared to the observed total fertility rate of \(2.1\) in 1951.</p>
<p>In Model A (Barro-Becker with continuous fertility choice), the total fertility rate falls from \(5.0\) (the calibrated target) to \(4.2\) when mortality rates are lowered to the 1951 level. The expected number of surviving children increases from \(3.7\) to \(4.0\). Thus, there is a small decline in total fertility, but (as was to be expected given Proposition 1) an increase in the net fertility rate.</p>
</blockquote>
<p><a href="#fnref:quote-context-2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>The special case of the normal likelihood function2021-07-31T00:00:00+00:002021-07-31T00:00:00+00:00https://fragile-credences.github.io/bayes-normal-likelihood<p><strong>Summary<sup id="fnref:ack" role="doc-noteref"><a href="#fn:ack" class="footnote" rel="footnote">1</a></sup></strong>: <em>The likelihood function implied by an estimate \(b\) with standard deviation \(\sigma\) is the probability density function (PDF) of a \(\mathcal{N}(b,\sigma^2)\). Though this might sound intuitive, it’s actually a special case. If we don’t firmly grasp that it’s an exception, it can be confusing. In general, the likelihood function is not equal to any PDF.</em></p>
<p>Suppose that a study has the point estimator \(B\) for the parameter \(\Theta\). The study results are an estimate \(B=b\) (typically a regression coefficient), and an estimated standard deviation<sup id="fnref:standard-error" role="doc-noteref"><a href="#fn:standard-error" class="footnote" rel="footnote">2</a></sup> \(\hat{sd}(B)=s\).</p>
<p>In order to know how to combine this information with a prior over \(\Theta\) and update our beliefs, we need to know the <em>likelihood function</em> implied by the study. The likelihood function is the probability of observing the study data \(B=b\) given different values for \(\Theta\). It is formed from the probability of the observation that \(B=b\) conditional on \(\Theta=\theta\), but viewed and used as a function of \(\theta\) only<sup id="fnref:notation" role="doc-noteref"><a href="#fn:notation" class="footnote" rel="footnote">3</a></sup>:</p>
\[\mathcal{L}: \theta \mapsto P(B =b \mid \Theta = \theta)\]
<p>The event “\(B=b\)” is often shortened to just “\(b\)” when the meaning is clear from context, so that the function can be more briefly written \(\mathcal{L}: \theta \mapsto P(b \mid \theta)\).</p>
<p><strong>So, what is \(\mathcal{L}\)?</strong> In a typical regression context, \(B\) is assumed to be approximately normally distributed around \(\Theta\), due to the central limit theorem. More precisely, \(\frac{B - \Theta}{sd(B)} \sim \mathcal{N}(0,1)\), and equivalently \(B\sim \mathcal{N}(\Theta,sd(B)^2)\).</p>
<p>\(sd(B)\) is seldom known, and is often replaced with its estimate \(s\), allowing us to write \(B\sim \mathcal{N}(\Theta,s^2)\), where only the parameter \(\Theta\) is unknown<sup id="fnref:unknown-variance" role="doc-noteref"><a href="#fn:unknown-variance" class="footnote" rel="footnote">4</a></sup>.</p>
<p>We can plug this into the definition of the likelihood function:</p>
\[\mathcal{L}: \theta \mapsto P(b\mid \theta)= \text{PDF}_{\mathcal{N}(\theta,s^2)}(b) = {\frac {1}{s\sqrt {2\pi }}}\exp \left(-{\frac {1}{2}}\left({\frac {b-\theta
}{s
}}\right)^{2} \right)\]
<p>We could just leave it at that. \(\mathcal{L}\) is the function<sup id="fnref:distribution-function" role="doc-noteref"><a href="#fn:distribution-function" class="footnote" rel="footnote">5</a></sup> above, and that’s all we need to compute the posterior. But a slightly different expression for \(\mathcal{L}\) is possible. After factoring out the square,</p>
\[\mathcal{L}: \theta \mapsto {\frac {1}{s
{\sqrt {2\pi }}}}\exp \left(-{\frac {1}{2}} {\frac {(b-\theta)^2
}{s^2
}} \right),\]
<p>we make use of the fact that \((b-\theta)^2 = (\theta-b)^2\) to rewrite \(\mathcal{L}\) with the positions of \(\theta\) and \(b\) flipped:</p>
\[\mathcal{L}: \theta \mapsto {\frac {1}{s
{\sqrt {2\pi }}}}\exp \left(-{\frac {1}{2}}\left({\frac {\theta-b
}{s
}}\right)^{2} \right).\]
<p>We then notice that \(\mathcal{L}\) is none other than</p>
\[\mathcal{L}: \theta \mapsto \text{PDF}_{\mathcal{N}(b,s^2)}(\theta)\]
<p>So, for all \(b\) and for all \(\theta\), \(\mathcal{L}: \theta \mapsto \text{PDF}_{\mathcal{N}(\theta,s^2)}(b) = \text{PDF}_{\mathcal{N}(b,s^2)}(\theta)\).</p>
<p>The key thing to realise is that this is a special case due to the fact that the functional form of the normal PDF is invariant to substituting \(b\) and \(\theta\) for each other. For many other distributions of \(B\), we cannot apply this procedure.</p>
<p>This special case is worth commenting upon because it has personally led me astray in the past. I often encountered the case where \(B\) is normally distributed, and I used the equality above without deriving it and understanding where it comes from. It just had a vaguely intuitive ring to it. I would occasionally slip into thinking it was a more general rule, which always resulted in painful confusion.</p>
<p>To understand the result, let us first illustrate it with a simple numerical example. Suppose we observe an Athenian man \(b=200\) cm tall. For all \(\theta\), the likelihood of this observation if Athenian men’s heights followed an \(\mathcal{N}(\theta,10)\) is the same number as the density of observing an Athenian \(\theta\) cm tall if Athenian men’s heights followed a \(\mathcal{N}(200,10)\)<sup id="fnref:neg" role="doc-noteref"><a href="#fn:neg" class="footnote" rel="footnote">6</a></sup>.</p>
<p><img src="../assets/images/density-likelihood.png" alt="" /></p>
<p><em>Graphical representation of \(\text{PDF}_{\mathcal{N}(\theta,10)}(200) = \text{PDF}_{\mathcal{N}(200,10)}(\theta)\)</em></p>
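The Athenian example can be verified directly with a hand-rolled normal density (taking 10 to be the standard deviation, and a small assumed grid of \(\theta\)-values):

```python
import math

def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

b, s = 200, 10  # observed height, assumed standard deviation
for theta in (170, 185, 200, 215):
    likelihood = normal_pdf(b, theta, s)  # P(b | theta)
    density = normal_pdf(theta, b, s)     # PDF of N(b, s^2) at theta
    assert math.isclose(likelihood, density)
```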
<p>When encountering this equivalence, you might, like me, sort of nod along. But puzzlement would be a more appropriate reaction. To compute the likelihood of our 200 cm Athenian under different \(\Theta\)-values, we can substitute a <em>totally different question</em>: “assuming that \(\Theta=200\), what is the probability of seeing Athenian men of different sizes?”.</p>
<p>The puzzle is, I think, best resolved by viewing it as a special case, an algebraic curiosity that only applies to some distributions. Don’t even try to build an intuition for it, because it does not generalise.</p>
<p>To help understand this better, let’s look at a case where the procedure cannot be applied.</p>
<p>Suppose for example that \(B\) is binomially distributed, representing the number of successes among \(n\) independent trials with success probability \(\Theta\). We’ll write \(B \sim \text{Bin}(n, \theta)\).</p>
<p>\(B\)’s probability mass function is</p>
\[g: k \mapsto \text{PMF}_{\text{Bin}(n, \theta)}(k) = {n \choose k} \theta^k (1-\theta)^{n-k}\]
<p>Meanwhile, the likelihood function for the observation of \(b\) successes is</p>
\[\mathcal{M}: \phi \mapsto \text{PMF}_{\text{Bin}(n, \phi)}(b) = {n \choose b} \phi^b (1-\phi)^{n-b}\]
<p>Attempting to obtain the likelihood function by taking the PMF \(g\) and setting its parameter \(\theta\) equal to \(b\) would not just give incorrect values: it would be a domain error. Regardless of how we set its parameters, \(g\) could never be equal to the likelihood function \(\mathcal{M}\), because \(g\) is defined on \(\{0,1,...,n\}\), whereas \(\mathcal{M}\) is defined on \([0,1]\).</p>
<p><img src="../assets/images/LikelihoodFunctionAfterHHT.png" alt="img" /></p>
<p><em>The likelihood function \(\mathcal{Q}: P_H \mapsto P_H^2(1-P_H)\) for the binomial probability of a biased coin landing heads-up, given that we have observed \(\{Heads, Heads, Tails\}\). It is defined on \([0,1]\). (The constant factor \(3 \choose 2\) is omitted, a common practice with likelihood functions, because these constant factors have no meaning and make no difference to the posterior distribution.)</em></p>
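The coin example can be reproduced numerically. This sketch keeps the constant factor \(3 \choose 2\) in, and checks that the likelihood is defined on \([0,1]\) and peaks at \(2/3\):

```python
from math import comb

n, b = 3, 2  # three tosses, two heads observed

def likelihood(p_h):
    """M(p_h) = C(3, 2) * p_h^2 * (1 - p_h), defined on p_h in [0, 1]."""
    return comb(n, b) * p_h ** b * (1 - p_h) ** (n - b)

def pmf(k, p_h):
    """PMF of Bin(3, p_h), defined on k in {0, 1, 2, 3}."""
    return comb(n, k) * p_h ** k * (1 - p_h) ** (n - k)

# The likelihood peaks at p_h = b/n = 2/3, as expected:
grid = [i / 100 for i in range(101)]
print(max(grid, key=likelihood))  # → 0.67
```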
<p>It’s hopefully now quite intuitive that the case where \(B\) is normally distributed was a special case.<sup id="fnref:simple" role="doc-noteref"><a href="#fn:simple" class="footnote" rel="footnote">7</a></sup></p>
<p>Let’s recapitulate.</p>
<p>The likelihood function is the probability of \(b\mid\theta\) viewed as a function of \(\theta\) only. It is absolutely not a density of \(\theta\).</p>
<p>In the special case where \(B\) is normally distributed, we have the confusing ability of being able to express this function as if it were the density of \(\theta\) under a distribution that depends on \(b\).</p>
<p>I think it’s best to think of that ability as an algebraic coincidence, due to the functional form of the normal PDF. We should think of \(\mathcal{L}\) in the case where \(B\) is normally distributed as just another likelihood function.</p>
<p>Finally, I’d love to know if there is some way to view this special case as enlightening rather than just a confusing exception.</p>
<p>I believe that saying \(\text{PDF}_{\theta,\Gamma}(b)=\text{PDF}_{b,\Gamma}(\theta)\) (where \(\text{PDF}_{\psi,\Gamma}\) denotes the PDF of a distribution with one parameter \(\psi\) that we wish to single out and a vector \(\Gamma\) of other parameters) is equivalent to saying that the PDF is <em>symmetric around its singled-out parameter</em>. For example, a \(\mathcal{N}(\mu,\sigma^2)\) is symmetric around its parameter \(\mu\). But this hasn’t seemed insightful to me. Please write to me if you know an answer to this.</p>
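For contrast, the exponential distribution (one assumed example of an asymmetric case) shows the swap failing numerically: the exponent \(\lambda x\) is symmetric in argument and parameter, but the prefactor is not.

```python
import math

def exp_pdf(x, lam):
    """Density of an Exponential(lam) distribution at x >= 0."""
    return lam * math.exp(-lam * x)

# Same exponent e^{-1} in both cases, but different prefactors:
a = exp_pdf(2.0, 0.5)  # 0.5 * e^{-1}
b = exp_pdf(0.5, 2.0)  # 2.0 * e^{-1}
print(a == b)  # → False
```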
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:ack" role="doc-endnote">
<p>Thanks to Gavin Leech and Ben West for feedback on previous versions of this post. <a href="#fnref:ack" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:standard-error" role="doc-endnote">
<p>I do not use the confusing term ‘standard error’, which I believe should mean \(sd(B)\) but is often also used to denote its estimate \(s\). <a href="#fnref:standard-error" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:notation" role="doc-endnote">
<p>I use uppercase letters \(\Theta\) and \(B\) to denote random variables, and lower case \(\theta\) and \(b\) for particular values (realizations) these random variables could take. <a href="#fnref:notation" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:unknown-variance" role="doc-endnote">
<p>A more sophisticated approach would be to let \(sd(B)\) be another unknown parameter over which we form a prior; we would then update our beliefs jointly about \(\Theta\) and \(sd(B)\). See for example <a href="https://sci-hub.se/10.1002/9781118593165.ch17">Bolstad & Curran (2016), Chapter 17, “Bayesian Inference for Normal with Unknown Mean and Variance”</a>. <a href="#fnref:unknown-variance" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:distribution-function" role="doc-endnote">
<p>I don’t like the term “<a href="https://www.google.com/search?q=%22likelihood+distribution%22">likelihood distribution</a>”, I prefer “likelihood function”. In formal parlance, mathematical <a href="https://en.wikipedia.org/wiki/Distribution_(mathematics)">distributions</a> are a generalization of functions, so it’s arguably technically correct to call any likelihood function a likelihood distribution. But in many contexts, “distribution” is merely used as short for “probability distribution”. So “likelihood distribution” runs the risk of making us think of “likelihood <em>probability</em> distribution” – but the likelihood function is not generally a probability distribution. <a href="#fnref:distribution-function" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:neg" role="doc-endnote">
<p>We are here ignoring any inadequacies of the \(B\sim N(\Theta,s^2)\) assumption, including but not limited to the fact that one cannot observe men with negative heights. <a href="#fnref:neg" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:simple" role="doc-endnote">
<p>Another simple reminder that the procedure couldn’t possibly work in general is that in general the likelihood function is not even a PDF at all. For example, a broken thermometer that always gives the temperature as 20 degrees has \(P(B=20 \mid \theta) = 1\) for all \(\theta\), which evidently does not integrate to 1 over all values of \(\theta\).</p>
<p>To take a different tack, the fact that the likelihood function is <a href="http://theoryandpractice.org/stats-ds-book/distributions/invariance-of-likelihood-to-reparameterizaton.html#how-does-the-likelihood-transform-to-reparameterization">invariant to reparametrization</a> also illustrates that it is not a probability density of \(\theta\) (thanks to Gavin Leech for the link). <a href="#fnref:simple" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>How to circumvent Sci-Hub ISP block2021-05-15T00:00:00+00:002021-05-15T00:00:00+00:00https://fragile-credences.github.io/scihub-proxy<p><img src="/assets/images/sci-hub-proxy.png" alt="" /></p>
<p>In the UK, many internet service providers (ISPs) block <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5832410/">Sci-Hub</a>. However, a simple proxy is enough to circumvent this (you don’t even need a VPN). Routing requests through a suitable<sup id="fnref:bl" role="doc-noteref"><a href="#fn:bl" class="footnote" rel="footnote">1</a></sup> proxy lets you open Sci-Hub in your regular browser as if it weren’t blocked<sup id="fnref:reverse-lookup" role="doc-noteref"><a href="#fn:reverse-lookup" class="footnote" rel="footnote">2</a></sup>.</p>
<p>Routing all your traffic through a proxy may come with privacy and security concerns, and will slow your connection a bit. We want to use our proxy only for accessing Sci-Hub.</p>
<p>You can use extensions like <a href="https://chrome.google.com/webstore/detail/proxy-switchyomega/padekgcemlokbadohgkifijomclgjgif">Proxy SwitchyOmega</a> to tell your browser to automatically use certain proxies, or no proxy at all, for sets of websites that you define.</p>
<p>Unfortunately, this extension, and others like it, require permissions to insert arbitrary JavaScript into <em>any</em> page you visit (the web store accurately explains that the extension can “read and change all your data on the websites you visit”). That’s likely due to insufficiently granular permission definitions by Chrome, and is not the fault of the presumably well-intentioned extension authors. But it freaks me out a little bit (<a href="https://robertheaton.com/2018/07/02/stylish-browser-extension-steals-your-internet-history/">bad things have happened</a>).</p>
<p>Luckily, we can achieve the same effect by writing our own <em>proxy auto-configuration file</em>. A proxy auto-configuration or PAC file contains just a single JavaScript function like this:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">FindProxyForURL</span> <span class="p">(</span><span class="nx">url</span><span class="p">,</span> <span class="nx">host</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Sci-Hub requests</span>
  <span class="k">if</span> <span class="p">(</span><span class="nx">shExpMatch</span><span class="p">(</span><span class="nx">host</span><span class="p">,</span> <span class="dl">'</span><span class="s1">sci-hub.se</span><span class="dl">'</span><span class="p">)</span> <span class="o">||</span> <span class="nx">shExpMatch</span><span class="p">(</span><span class="nx">host</span><span class="p">,</span> <span class="dl">'</span><span class="s1">*.sci-hub.se</span><span class="dl">'</span><span class="p">))</span> <span class="p">{</span>
    <span class="c1">// Your proxy address and port number</span>
    <span class="k">return</span> <span class="dl">'</span><span class="s1">PROXY 192.0.2.1:9279</span><span class="dl">'</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="c1">// All other requests</span>
  <span class="k">return</span> <span class="dl">'</span><span class="s1">DIRECT</span><span class="dl">'</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
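<p>If you want to sanity-check the PAC logic before pointing your operating system at the file, you can run it in Node.js with a small stand-in for <code>shExpMatch</code>. This stub is my own assumption for testing purposes — in real use, the PAC evaluator built into the OS or browser provides <code>shExpMatch</code> — and it only supports the <code>*</code> wildcard, which is all these rules need. The address <code>192.0.2.1:9279</code> is a placeholder; substitute your own proxy’s address and port.</p>

```javascript
// Stand-in for the PAC runtime's shExpMatch, for local testing only.
// It converts a shell-style pattern ('*' wildcard) into an anchored RegExp.
function shExpMatch(str, shexp) {
  const pattern = shexp
    .split('*')
    .map(part => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
    .join('.*'); // each '*' matches any sequence of characters
  return new RegExp('^' + pattern + '$').test(str);
}

// Same rules as the PAC file above; 192.0.2.1:9279 is a placeholder proxy address.
function FindProxyForURL(url, host) {
  if (shExpMatch(host, 'sci-hub.se') || shExpMatch(host, '*.sci-hub.se')) {
    return 'PROXY 192.0.2.1:9279';
  }
  return 'DIRECT';
}

console.log(FindProxyForURL('https://sci-hub.se/some/doi', 'sci-hub.se')); // PROXY 192.0.2.1:9279
console.log(FindProxyForURL('https://example.com/', 'example.com'));       // DIRECT
```

<p>Running this confirms that only Sci-Hub hosts (including subdomains) are routed through the proxy, while all other traffic goes direct.</p>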
<p>We can instruct the operating system to read this file. Search for instructions on Google (<a href="https://www.google.com/search?q=proxy+auto+configuration+file+mac+os">example</a>).</p>
<p>When you use the proxy to access Sci-Hub for the first time in a browser session, the browser will ask you for the username and password to your proxy server. If you’re using Chrome, I’d recommend saving the credentials into the browser’s password manager to avoid having to enter them again.<sup id="fnref:save-password" role="doc-noteref"><a href="#fn:save-password" class="footnote" rel="footnote">3</a></sup></p>
<p>There are many free proxies on the Internet, but I find that using the services of an actual for-profit proxy company is well worth it, for the greater speed and reliability. Currently <a href="https://www.webshare.io/proxy-server?referral_code=1uknewljmt9y">webshare.io</a> (referral link) offers 1 GB per month free, which is quite a lot of Sci-Hub PDFs. After that you can get 250 GB for $2.99 per month.<sup id="fnref:webshare" role="doc-noteref"><a href="#fn:webshare" class="footnote" rel="footnote">4</a></sup></p>
<h3 id="step-by-step-instructions">Step by step instructions</h3>
<ol>
<li><a href="https://www.webshare.io/?referral_code=1uknewljmt9y">Create an account on webshare.io</a> (referral link)</li>
<li>Choose a proxy from your list, and copy its address and port number into your PAC file, following the pattern above.</li>
<li>Set your operating system to read its proxy settings from this PAC file<sup id="fnref:proxyWIN" role="doc-noteref"><a href="#fn:proxyWIN" class="footnote" rel="footnote">5</a></sup>. Instructions for this are easy to Google (<a href="https://www.google.com/search?q=proxy+auto+configuration+file+mac+os">example</a>).</li>
<li>Open Sci-Hub in your browser. Enter your proxy username and password and optionally save these credentials in the browser. You can find the credentials in your webshare.io account.</li>
<li>Don’t forget to only use Sci-Hub to look at really old papers that have lapsed into the public domain :)</li>
</ol>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:bl" role="doc-endnote">
<p>Obviously, the proxy must not itself be on a network that blocks Sci-Hub. I have not come across any proxy that blocks Sci-Hub in this way. <a href="#fnref:bl" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:reverse-lookup" role="doc-endnote">
<p>Changing your DNS resolver to a public one <a href="https://developers.google.com/speed/public-dns/">like Google’s</a> instead of your ISP’s is not sufficient as of 2021, for two ISPs I’ve tested, and I suspect for all UK ISPs that implement blocking. (Many people believe changing the DNS resolver is sufficient. Probably ISPs used to implement simple DNS level blocking and have recently upped their game.) My guess is that instead of merely blocking the request to resolve <code class="language-plaintext highlighter-rouge">sci-hub.se</code> at the DNS resolver level, the ISPs are also doing a <a href="https://en.wikipedia.org/wiki/Reverse_DNS_lookup">reverse lookup</a> on every requested IP address to check whether it corresponds to a blacklisted domain. <a href="#fnref:reverse-lookup" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:save-password" role="doc-endnote">
<p>You want to use the Chrome password manager because third-party password managers such as 1Password are not able to auto-fill credentials when logging in to a proxy server (as opposed to logging into a webpage). Note that if you have a third-party password manager extension installed this will disable the browser setting “Offer to save passwords”. I recommend that you temporarily disable your password manager extension, log in to your proxy server, save the password into Chrome, and then enable the extension again. <a href="#fnref:save-password" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:webshare" role="doc-endnote">
<p>Their home page exemplifies a dark pattern by not showing the pricing by the GB; it just says you get ‘up to unlimited’ bandwidth. You’ll be able to see the actual pricing after you create an account. <a href="#fnref:webshare" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:proxyWIN" role="doc-endnote">
<p>One gotcha is that Windows 10 <a href="https://docs.microsoft.com/en-us/troubleshoot/browsers/cannot-read-pac-file">forces you</a> to call your PAC file from a web server; it cannot be a local file (??!). To work around this, you can upload your file as a <a href="https://gist.github.com/">Gist</a> and link to the <code class="language-plaintext highlighter-rouge">/raw</code>. <a href="#fnref:proxyWIN" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Modified respirator to shield myself and others from COVID2021-01-02T00:00:00+00:002021-01-02T00:00:00+00:00https://fragile-credences.github.io/covid-respirator<p><strong>Summary:</strong> <em>I have tried many types of masks and respirators during the 2020 pandemic. My recommendation is to use ‘elastomeric’ respirators common in industry, and to either filter or completely block off their exhalation valve. The result is a comfortable respirator that I believe offers a high level of protection against airborne diseases to myself and others. I am not an infectious disease expert.</em></p>
<h1 class="no_toc" id="contents">Contents</h1>
<ol id="markdown-toc">
<li><a href="#elastomeric-respirators" id="markdown-toc-elastomeric-respirators">Elastomeric respirators</a></li>
<li><a href="#cdc-recommendation-for-exhalation-valves" id="markdown-toc-cdc-recommendation-for-exhalation-valves">CDC recommendation for exhalation valves</a></li>
<li><a href="#recommendation-a-3m-6500ql-series-with-kn95-and-surgical-mask" id="markdown-toc-recommendation-a-3m-6500ql-series-with-kn95-and-surgical-mask">Recommendation A: 3M 6500QL Series with KN95 and surgical mask</a> <ol>
<li><a href="#choice-of-respirator" id="markdown-toc-choice-of-respirator">Choice of respirator</a></li>
<li><a href="#choice-of-filters-for-a-3m-respirator" id="markdown-toc-choice-of-filters-for-a-3m-respirator">Choice of filters for a 3M respirator</a></li>
</ol>
</li>
<li><a href="#recommmendation-b-miller-lpr-100-with-tape-and-surgical-mask" id="markdown-toc-recommmendation-b-miller-lpr-100-with-tape-and-surgical-mask">Recommendation B: Miller LPR-100 with tape and surgical mask</a> <ol>
<li><a href="#how-we-need-to-modify-the-respirator" id="markdown-toc-how-we-need-to-modify-the-respirator">How we need to modify the respirator</a></li>
<li><a href="#exhalation-valve" id="markdown-toc-exhalation-valve">Exhalation valve</a></li>
<li><a href="#inhalation-valves" id="markdown-toc-inhalation-valves">Inhalation valves</a></li>
<li><a href="#also-use-a-surgical-mask" id="markdown-toc-also-use-a-surgical-mask">Also use a surgical mask</a></li>
<li><a href="#choice-of-respirator-1" id="markdown-toc-choice-of-respirator-1">Choice of respirator</a></li>
</ol>
</li>
<li><a href="#potential-concerns" id="markdown-toc-potential-concerns">Potential concerns</a> <ol>
<li><a href="#repeated-use" id="markdown-toc-repeated-use">Repeated use</a></li>
<li><a href="#cdc-guidelines" id="markdown-toc-cdc-guidelines">CDC guidelines</a></li>
<li><a href="#concerns-specific-to-tape-method" id="markdown-toc-concerns-specific-to-tape-method">Concerns specific to tape method</a> <ol>
<li><a href="#c02-rebreathing" id="markdown-toc-c02-rebreathing">CO2 rebreathing</a></li>
<li><a href="#exhaling-through-filter" id="markdown-toc-exhaling-through-filter">Exhaling through filter</a></li>
<li><a href="#discussion-of-tape-technique-by-others" id="markdown-toc-discussion-of-tape-technique-by-others">Discussion of tape technique by others</a></li>
</ol>
</li>
</ol>
</li>
<li><a href="#overall-recommendation" id="markdown-toc-overall-recommendation">Overall recommendation</a></li>
</ol>
<h1 id="elastomeric-respirators">Elastomeric respirators</h1>
<p>The effectiveness of a mask can be broken down into two parts: how well the mask fits on your face, and the filtration efficiency of the mask. A further important consideration is comfort.</p>
<p>Surgical and cloth masks are comfortable but have poor fit and filtration efficiency. I believe it’s possible to do much better.</p>
<p>Respirators that meet the NIOSH <a href="https://en.wikipedia.org/wiki/NIOSH_air_filtration_rating">N95 or N99 standards</a> for filtration efficiency, such as the N95 respirators pictured below, are popular in healthcare settings.</p>
<p><img src="/assets/images/mask/3M_N95_Particulate_Respirator.jfif" width="50%" /><img src="/assets/images/mask/n95_sailor.webp" width="50%" /> <br />
<em>3M N95 respirator (left), N95 respirator in a medical setting (right)</em></p>
<p>In my experience, these have two main downsides:</p>
<ul>
<li>They may be scarce during pandemics. You should probably leave limited supplies for healthcare workers.</li>
<li>The tight elastic band that ensures a good fit also makes the respirators very uncomfortable for extended use.</li>
</ul>
<p>KN95 and KF94 are respectively the Chinese and Korean standards for masks that claim the same efficacy as N95s. They come with ear loops rather than behind-the-head elastic bands, so they have a far looser seal than N95s. I suppose you could add your own elastic bands to them to improve the seal, but then they would be just as uncomfortable as N95s. Therefore, they are not a competitive option.</p>
<p><img src="/assets/images/mask/kn95_stock.webp" width="50%" /><br />
<em>A KN95 mask</em></p>
<p>There also exist N95+ masks designed for industrial tasks that produce harmful airborne particles (such as welding or paint spraying). They are called elastomeric respirators, or sometimes industrial respirators.</p>
<p><img src="/assets/images/mask/miller_respirator_stock.webp" width="50%" /><img src="/assets/images/mask/3m-half-facepiece-reusable-respirator.webp" width="50%" />
<em>High filtration efficiency elastomeric respirators for industrial use. Miller LPR-100 (left), 3M 6200 (right).</em></p>
<p>Compared to healthcare N95s, these respirators:</p>
<ul>
<li>are more widely available</li>
<li>achieve superior fit by using elastomers shaped like a human face (there is no need to bend a metal nose bridge)</li>
<li>are <strong>much more comfortable</strong>, mainly because they:
<ul>
<li>spread the pressure onto a wider area of skin</li>
<li>come in multiple sizes</li>
<li>have adjustable straps</li>
</ul>
</li>
<li>won’t fog up your glasses</li>
</ul>
<p>A downside is that it’s more difficult to be audible through an elastomeric respirator than through an N95 or surgical mask. I am able to be understood by raising my voice, but smooth social interactions are not guaranteed. It’s probably not a great setup for spending time with your friends; you can use a KN95 for that.</p>
<p>The fatal flaw<sup id="fnref:fatal" role="doc-noteref"><a href="#fn:fatal" class="footnote" rel="footnote">1</a></sup> of these elastomerics when it comes to disease control is that they have an exhalation valve that allows unfiltered air to exit the mask. In PPE jargon, they do not provide source control. (This may be about to change in 2021, see this footnote<sup id="fnref:MSA" role="doc-noteref"><a href="#fn:MSA" class="footnote" rel="footnote">2</a></sup>. I will try to keep this post updated.)</p>
<video loop="" muted="" controls="" style="max-height: 75vh; max-width: 50%">
<source src="/assets/images/mask/valve.webm" type="video/webm" />
Your browser does not support the video tag.
</video>
<p><em>The exhalation valve opening on the Miller LPR-100</em></p>
<p>We can modify these respirators to filter their exhalation valve (recommendation A), or completely close it off (recommendation B).</p>
<p>(If infection through the mucosal lining of the eyes is an important concern to you, and you don’t wear glasses, you should also wear safety goggles.)</p>
<h1 id="cdc-recommendation-for-exhalation-valves">CDC recommendation for exhalation valves</h1>
<p>During the 2020 pandemic, the US CDC issued the following recommendation, in a <a href="https://blogs.cdc.gov/niosh-science-blog/2020/09/08/source-control/">blog post</a> from September 8, 2020<sup id="fnref:cdc" role="doc-noteref"><a href="#fn:cdc" class="footnote" rel="footnote">3</a></sup>:</p>
<blockquote>
<p>If only a respirator with an exhalation valve is available and source control is needed, cover the exhalation valve with a surgical mask, procedure mask, or a cloth face covering that does not interfere with the respirator fit.</p>
</blockquote>
<h1 id="recommendation-a-3m-6500ql-series-with-kn95-and-surgical-mask">Recommendation A: 3M 6500QL Series with KN95 and surgical mask</h1>
<p>If you want to follow something similar to CDC guidance, I recommend:</p>
<ul>
<li>A <a href="https://www.3m.com/3M/en_US/worker-health-safety-us/personal-protective-equipment/half-face-respirator/">3M 6500QL series</a> respirator</li>
<li>A part of a KN95/KF95 mask tightly covering the exhalation valve</li>
</ul>
<p><img src="/assets/images/mask/3m_original.webp" width="32%" /><img src="/assets/images/mask/3m_modified_0.webp" width="32%" /><img src="/assets/images/mask/3m_modified_1.webp" width="32%" /><br />
<em>3M 6502QL. Unmodified (left), KN95 material covering valve (middle and right)</em></p>
<p>You’ll likely want to add a surgical mask on top of that:</p>
<ul>
<li>as a backup</li>
<li>for the very small amount of additional filtration it provides</li>
<li>to avoid misunderstandings with strangers</li>
</ul>
<p><img src="/assets/images/mask/3m_modified_surgical.webp" width="33%" /><br />
<em>3M 6502QL with KN95 material covering valve and a surgical mask on top</em></p>
<p>Surgical masks are not primarily designed to filter aerosols<sup id="fnref:surgical_fda" role="doc-noteref"><a href="#fn:surgical_fda" class="footnote" rel="footnote">4</a></sup>. It seems clear to me that KN95s and KF94s are superior to a surgical mask for covering an exhalation valve (let alone a cloth mask). (There is a <a href="https://www.fda.gov/medical-devices/coronavirus-disease-2019-covid-19-emergency-use-authorizations-medical-devices/personal-protective-equipment-euas">list</a> of such respirators that have received an emergency use authorization from the FDA. There are probably many low-quality masks fraudulently marketed as KN95 and KF94 at the moment, so make sure you buy from an approved manufacturer.)</p>
<p>In the models I have seen, the material in KN95s is far more flexible than in N95s, allowing you to shape it so that it tightly covers an exhalation valve. It’s slightly fiddly but definitely possible with a bit of dexterity and perseverance. Using a thinner surgical mask would be easier, but the KN95’s extra protection for third parties is well worth it.</p>
<p>Here are the steps you should follow (see video):</p>
<ul>
<li>cut a KN95 in half along the fold</li>
<li>cut one half to size further</li>
<li>use a rubber band and tape to attach the material over the respirator valve. This is better explained with a video than in words. The main thing to know is that you should use the two small ridges in the plastic below the valve to secure the rubber band.</li>
<li>add tape on the upper end of the KN95 material</li>
</ul>
<video controls="" style="max-height: 100vh; max-width:100%">
<source src="/assets/images/mask/instructions.webm" type="video/webm" />
Your browser does not support the video tag.
</video>
<p><em>Instructions</em></p>
<p>Unfortunately, for any valve covering approach, there is a trade-off between fit and the surface area usable for exhale filtration. I have not been able to achieve a good fit when placing a KN95 or surgical mask more loosely over the valve, which would give more surface area. In my setup a small rectangle of KN95 has to do all the filtration, which likely lowers the efficiency. However, the N95 specification is for a flow rate of 85 liters per minute, roughly fourteen times the 6 liters per minute breathed by an individual at rest<sup id="fnref:flowrate" role="doc-noteref"><a href="#fn:flowrate" class="footnote" rel="footnote">5</a></sup>, so I am not very concerned.</p>
<p><img src="/assets/images/mask/ridges.webp" width="50%" /><br />
<em>Ridges on 6502QL. Note that in the real setup the KN95 will go below the elastic band.</em></p>
<p><img src="/assets/images/mask/3m_modified_valve_rectangle.webp" width="50%" /><br />
<em>Location of the valve underneath the KN95 material. View from below the respirator.</em></p>
<h2 id="choice-of-respirator">Choice of respirator</h2>
<p>I have tried two industrial respirator models, the <a href="https://www.google.com/search?q=3M%206502QL">3M 6502QL</a> and the <a href="https://www.google.com/search?q=miller+lpr-100">Miller LPR-100</a>. I prefer the build quality and aesthetics of the Miller (see <a href="#choice-of-respirator-1">below</a>), but its shape makes it almost impossible to get a good seal if you attempt to cover the valve with a surgical mask or KN95. So for this technique, I recommend the <a href="https://www.google.com/search?q=3M+6500QL+series">3M 6500QL series</a>.</p>
<p>I am aware of three 3M half-facepiece reusable respirator groups, the 6000 series, the 6500 series, and the 7500 series.</p>
<p><img src="/assets/images/mask/3m_lineup.webp" />
<em>3M half-facepiece reusable respirators (<a href="https://www.3m.com/3M/en_US/company-us/all-3m-products/~/All-3M-Products/Personal-Protective-Equipment/Reusable-Respirators/?N=5002385+8711017+8720539+8720550+3294857497&rt=r3">3M.com</a>)</em></p>
<p>Since I have only tried a respirator of the 6500 series, I do not have a strong view on which is preferable. I would recommend the 6500, mostly because I have already demonstrated that it’s possible to cover the valve. The 6000 series does not have a downward-facing exhalation valve and may be harder to work with. I’m agnostic about the relative merits of the 7500.</p>
<p>The 6500 series has a quick latch version (difference explained <a href="https://www.3m.com/3M/en_US/worker-health-safety-us/personal-protective-equipment/half-face-respirator/">here</a>), which is the one I used. I’d recommend the quick latch 6500QL series, because it seems that the latch makes the fit of the KN95 material to the respirator more secure (see video). By the way, attaching a mask on top of the valve makes the quick-latch mechanism much less effective; I never use it.</p>
<p>Each series comes in three sizes, large, medium and small. I am a male with a medium-to-large head, and I use a medium (the 6502QL).</p>
<p>Regarding whether airlines will accept this setup, I have heard both some positive anecdotes and one negative anecdote.</p>
<h2 id="choice-of-filters-for-a-3m-respirator">Choice of filters for a 3M respirator</h2>
<p>I use the 3M 2097 P100 filters.</p>
<p>You should use lightweight filters that are rated N100, R100 or P100. The “100” <a href="https://en.wikipedia.org/wiki/NIOSH_air_filtration_rating">means</a> that at least 99.97% of airborne particles are filtered out when tested at the most penetrating particle size of 0.3 micrometres. The letters N, R and P refer to whether the filter remains effective when exposed to oil-based aerosols; this should be irrelevant for our purposes.</p>
<p>The weight of the filters is a crucial determinant of comfort. I originally used the 3M respirator with the 3M 60926 cartridges, which filter gases and vapors as well as particles. This was a mistake, as filtering gases and vapors is irrelevant from the point of view of infectious disease, and these cartridges are much heavier than the 3M 2097 P100 filters. Switching to the lighter filters made a world of difference; now wearing the 3M doesn’t bother me at all.</p>
<p><img src="/assets/images/mask/3m_weight_cartridge.webp" width="50%" /><img src="/assets/images/mask/3m_weight_filter.webp" width="50%" /><br />
<em>The 3M 6502QL respirator weighs 395 g. with 3M 60926 cartridges, but only 128 g. with 3M 2097 filters, a 68% reduction.</em></p>
<h1 id="recommmendation-b-miller-lpr-100-with-tape-and-surgical-mask">Recommendation B: Miller LPR-100 with tape and surgical mask</h1>
<p>I believe that, in expectation, the previous method offers slightly worse protection to third parties than a well-fit valveless medical N95, because our makeshift exhalation valve filter may not be entirely effective.</p>
<p>This section details another technique which may be able to achieve the best of both worlds: the comfort and availability of industrial masks, and the third-party protection offered by valveless masks.</p>
<h2 id="how-we-need-to-modify-the-respirator">How we need to modify the respirator</h2>
<p>Let’s look at how the valves in industrial masks work. I’ll be using the Miller LPR-100, but the 3M is built similarly.</p>
<p>The exhalation valve is at the front. There are also two inhalation valves, one on each side between the mouth and the filter. These only allow air to come into the mask from outside, forcing all the exhaled air to go through the exhalation valve (instead of some of it going back through the filter).</p>
<p><img src="/assets/images/mask/miller_valves_0.webp" width="50%" /><img src="/assets/images/mask/miller_valves_1.webp" width="50%" />
<em>Miller LPR-100 valves</em></p>
<p>We need to disable both the inhalation and exhalation valves:</p>
<ul>
<li>The unfiltered exhalation valve should be completely sealed off.</li>
<li>In order to allow the user to exhale, the inhalation valves need to be turned into simple holes that allow two-way air circulation.</li>
</ul>
<p>This will mean that both inhaled and exhaled air will go through the P100 filters.</p>
<h2 id="exhalation-valve">Exhalation valve</h2>
<p>We can seal off the exhalation valve from the outside with tape<sup id="fnref:altmethods" role="doc-noteref"><a href="#fn:altmethods" class="footnote" rel="footnote">6</a></sup>. On the Miller respirator, there is a little plastic cage covering the valve, and this cage can be taped over. Note that tape sticks very poorly to the elastomer (the dark blue material on the Miller). This is why I only place tape on the plastic; this seems to be sufficient.</p>
<p><img src="/assets/images/mask/tape_0.webp" width="50%" /><img src="/assets/images/mask/tape_1.webp" width="50%" />
<em>Tape on exhalation cage</em></p>
<p>I am using <a href="https://www.amazon.com/gp/product/B00004Z4DU/ref=ppx_yo_dt_b_asin_title_o01_s00?ie=UTF8&psc=1">painter’s tape</a> because it’s supposed to pull off without leaving a residue of glue. It’s possible that it would be better to use tape with a stronger adhesive. (A friend of mine commented: “Some ideas for sealing off the exhalation valve: (1) Butyl tape/self-vulcanizing tape. Not so much a sticky tape as a ribbon of moldable putty, so no adhesive residue. This stuff is pretty much unparalleled if you need to make a fully gas- and watertight seal around an irregularly shaped opening in a pinch without making a mess. The fact that it has no adhesive does put some constraints on the geometry of the part you’re sealing off, but I think it would work (better than painter’s tape, at least) on the Miller. (2) Vinyl tape/electrical tape. It’s relatively water-resistant and can be stretched to some extent. The adhesive also sticks to polymers pretty well (although it does leave a lot of residue after some time, but you can clean that off with a bit of IPA).”)</p>
<p>You can check the seal of your tape by pressing the mask onto your face and attempting to exhale (with the inhalation valves intact). Air should only be able to escape through the sides of the mask.</p>
<h2 id="inhalation-valves">Inhalation valves</h2>
<p>The inhalation valves are removable and can be pulled out. They are very thin and feel like they might be about to break when you pull them out, but I have been able to pull four of them out without a problem.</p>
<p><img src="/assets/images/mask/miller_valves_1.webp" width="50%" /><img src="/assets/images/mask/valve_hand.webp" width="50%" />
<em>Touching a valve (left), a valve after it has been pulled out (right)</em></p>
<p><img src="/assets/images/mask/valves_plugs.webp" width="50%" /><img src="/assets/images/mask/valves_removed_0.webp" width="50%" />
<em>The two inhalation valves (left), the filter now visible through the holes (right)</em></p>
<p>Pushing the valves back in is easy.</p>
<p>The tape can be removed and the valves re-inserted, making my modification fully reversible.</p>
<h2 id="also-use-a-surgical-mask">Also use a surgical mask</h2>
<p>Even if you’re using the tape technique, I recommend also covering the respirator with a surgical mask, since this has no downsides and might have some benefit. The seal on the exhalation valve might not be perfect and may get worse over time, so an extra layer of filtration, however imperfect, is a good backup.</p>
<p>It’s also beneficial because it makes what you’re doing legible to others. You don’t want to explain this weird tape business to strangers, even if it’s for their protection.</p>
<p><img src="/assets/images/mask/miller_respirator_tom.webp" width="32%" /><img src="/assets/images/mask/miller_tape.webp" width="32%" /><img src="/assets/images/mask/industrial_w_surgical.webp" width="32%" />
<em>Miller LPR-100. Unmodified (left), with tape (middle), with tape and surgical mask (right)</em></p>
<h2 id="choice-of-respirator-1">Choice of respirator</h2>
<p>For this technique, I recommend the <a href="https://www.google.com/search?q=miller+lpr-100">Miller LPR-100</a><sup id="fnref:m" role="doc-noteref"><a href="#fn:m" class="footnote" rel="footnote">7</a></sup>.</p>
<p>I recommend the Miller over the 3M because:</p>
<ul>
<li>its build quality feels superior to me</li>
<li>it looks better</li>
<li>it blocks less of your field of view</li>
</ul>
<p>Since the Miller is better than the 3M, and 3M is such a huge player in this market, I think there’s a decent chance that the Miller is in fact one of the very best options that exist.</p>
<p>The Miller weighs 139 g., a negligible difference from the 3M’s 128 g.</p>
<p>I also like the fact that you can buy a neat <a href="https://www.google.com/search?q=miller%20283374">rigid case</a> to hold the Miller respirator. The case is called the 283374.</p>
<p><img src="/assets/images/mask/miller_case_0.webp" width="50%" /><img src="/assets/images/mask/miller_case_1.webp" width="50%" /><br />
<em>Miller case, 283374</em></p>
<p>The Miller model comes with replaceable P100-rated filters, while the 3M can be used with many types of filters and cartridges.</p>
<p>If you want to implement this technique on the 3M, it should be possible; all steps will be similar.</p>
<h1 id="potential-concerns">Potential concerns</h1>
<p>The 3M+KN95 method we discussed earlier can be seen as a simple adaptation of CDC guidelines, so I have fewer concerns about it.</p>
<p>However, the tape technique involves a more fundamental alteration. This might seem unwise. How do I know I haven’t messed up something crucial, endangering myself and others?</p>
<p>Before discussing the specific concerns, it’s useful to consider: what are the relevant alternatives to my recommendation?</p>
<p>My best guess is that constantly wearing a correctly fitted medical N95, with the really tight elastic bands, is very slightly safer for others in expectation than the tape method (due to risks of things going wrong, like the tape getting unstuck). However, it is not a likely alternative for everyday use. First, in my experience, N95s are more difficult to fit correctly than industrial masks. Second, for me, these respirators are prohibitively uncomfortable. I have seen few people use them. I think the realistic alternatives for most people are cloth and surgical masks. I am relatively confident that both of my techniques are an improvement on that, for both the user and third parties.</p>
<!-- The probability mass I assign to harm comes mostly from unknown unknowns and the fact that official agencies don't recommend my technique, not from any specific evidence of risks. -->
<!-- I currently don't plant to prioritize writing a full detailed justification of these beliefs for a sceptical audience. Writing such a thing is a great deal of effort and there are benefits to getting the idea out earlier. In order to let you come to the best conclusion for you given your knowledge and risk tolerance, I'm trying to lay out most of the evidence against my proposal that I've considered (without giving the full reasoning for why I find this evidence insufficiently persuasive on balance). -->
<p>By the way, I am not an expert in disease control. I studied economics and philosophy and then worked as a researcher.</p>
<h2 id="repeated-use">Repeated use</h2>
<p>Healthcare N95s are supposed to be used only once before being decontaminated. However, I plan to use the same filters many times. Is this a problem?</p>
<p>Why are N95s supposed to be used once? According to this <a href="https://www.cdc.gov/niosh/topics/hcwcontrols/recommendedguidanceextuse.html">CDC guidance</a>,</p>
<blockquote>
<p>the most significant risk [of extended use and reuse] is of contact transmission from touching the surface of the contaminated respirator. … Respiratory pathogens on the respirator surface can potentially be transferred by touch to the wearer’s hands and thus risk causing infection through subsequent touching of the mucous membranes of the face. …</p>
<p>While studies have shown that some respiratory pathogens remain infectious on respirator surfaces for extended periods of time, in microbial transfer [touching the respirator] and reaerosolization [coughing or sneezing through the respirator] studies more than ~99.8% have remained trapped on the respirator after handling or following simulated cough or sneeze.</p>
</blockquote>
<p>Since I plan to leave the respirator unused for hours or days between each use, and any viral dose on the exterior of the filters is likely to be very small, I don’t think this is a huge concern overall. I am very open to contrary evidence.</p>
<p>By the way, based on this guidance, it seems to me we should also worry less about reusing respirators and masks in general, even without decontamination. (Decontamination makes a lot more sense for health care workers who are exposed to COVID patients).</p>
<p>It’s good to remember to avoid touching the filters.</p>
<h2 id="cdc-guidelines">CDC guidelines</h2>
<p>As explained above, the CDC recommends a surgical or cloth mask to cover the valve. There is no evidence that they considered either of the techniques I described above when issuing their blog post.</p>
<p>The tape method is a greater deviation from the CDC guidelines than the KN95-covering method, so if you care about following official guidance you could use the latter.</p>
<h2 id="concerns-specific-to-tape-method">Concerns specific to tape method</h2>
<p>I assign a relatively low chance that the tape method is worse than the CDC recommendation of covering the valve with a surgical mask (my views depend considerably on the tightness of the surgical mask seal), and a very low chance that it’s worse than a surgical mask alone. The probability mass I assign to harm is a combination of concerns about exhaling through the filter reducing its efficacy, and unknown unknowns.</p>
<h3 id="c02-rebreathing">CO2 rebreathing</h3>
<p>Without the valve, part of the air you inhale will be air that you just exhaled, which contains more CO2. I have not personally noticed any effects from this.</p>
<h3 id="exhaling-through-filter">Exhaling through filter</h3>
<p>Could exhaling through the filter be a bad thing somehow? I wasn’t able to find any source making an explicit statement on this, but I think it’s unlikely to be a problem.</p>
<p>One reason to worry is that the founder of <a href="https://narwallmask.com/">Narwall Mask</a> has told me that, according to one filtration expert he spoke to, one-way airflow greatly prolongs the life of the filters compared to two-way airflow. However, based on my small amount of research, I don’t think the life of the filters would be affected to a degree that is practically important.</p>
<p>The MSA valveless elastomeric respirator that I mentioned in this footnote<sup id="fnref:MSA:1" role="doc-noteref"><a href="#fn:MSA" class="footnote" rel="footnote">2</a></sup> appears to have filters that can be used for more than 1 month of daily use during the workday; and moreover, we can see in the respirator’s <a href="https://msa.webdamdb.com/directdownload.php?ti=42341334&tok=sKRw3WQqRXhcJ/V4EfuwtARR">brochure</a> that these filters, with model number 815369, are the same as those that are used in MSA’s line of regular, valved elastomeric respirators (see <a href="http://msa.webdamdb.com/bp/#/folder/1749983/50030424">here</a>). From this I conclude that: two-way airflow through regular P100 filters was considered an acceptable design choice by MSA; and these filters can be used two-way for at least a month of hospital use.</p>
<p>In addition, healthcare N95s (without valves) are designed to be exhaled through. They are only rated for a day of use, but I believe this is <em>not</em> because the filter loses efficacy (see <a href="#repeated-use">section on repeated use</a>).</p>
<p>Exhaled air has a <a href="https://en.wikipedia.org/wiki/Humidity#Relative_humidity">relative humidity</a> close to 100%. Could exposure to humid air reduce the efficacy of the filters? In <a href="https://academic.oup.com/annweh/article/59/5/629/2196149">this study</a> of N95 filters, penetration rose from around 2% to around 4% when relative humidity went from 10% to 80%, and this effect increased with duration of continuous use. The flow rate was 85 L/min.</p>
<p><img src="/assets/images/mask/humidity_0.webp" width="75%" /><br />
<em>Combination of figures 3 and 5, <a href="https://academic.oup.com/annweh/article/59/5/629/2196149">Mahdavi et al.</a></em></p>
<p>Note that this study, which simulates inhalation of humid air, does not address (except very indirectly) the question of how the <em>exhalation</em> humidity affects the <em>inhalation</em> filtration.</p>
<h3 id="discussion-of-tape-technique-by-others">Discussion of tape technique by others</h3>
<ul>
<li><a href="https://www.cdc.gov/niosh/docs/2021-107/pdfs/2021-107.pdf?id=10.26616/NIOSHPUB2021107">This NIOSH study</a> tested three modifications of valved respirators: covering the valve on the interior with surgical tape, covering the valve on the interior with an electrocardiogram (ECG) pad, and stretching a surgical mask over the exterior of the respirator.
<ul>
<li>They found that “penetration was 23% for the masked-over mitigation; penetration was 5% for the taped mitigation; penetration was 2% for the [ECG pad] mitigation”. I would be very interested in more discussion of why the ECG pad did so much better than the surgical tape; the authors don’t say much. One guess could be that the ECG pad has a more powerful adhesive, which would suggest that it’s important to choose a strongly adhesive tape if implementing my technique.</li>
<li>When discussing the choice of modification strategies, the authors wrote that “two concerns are that the adhesive could pull away from the surface, thereby not blocking airflow to the same degree over time, and that these adhesives could contain chemicals that have toxicological effects.”
<img src="/assets/images/mask/niosh_study.webp" alt="study" /></li>
</ul>
</li>
<li>In an <a href="https://multimedia.3m.com/mws/media/1791526O/respiratory-protection-faq-general-public-tb.pdf">FAQ released by 3M</a>, in response to the question of whether one should tape over the exhalation valve, they wrote “3M does not recommend that tape be placed over the exhalation valve”, but do not give any reasons for this beyond the fact that it may become “more difficult to breathe through … if the exhalation valve is taped shut”.</li>
<li>The state of Maine’s Department of Public Safety <a href="https://www.maine.gov/ems/sites/maine.gov.ems/files/inline-files/2020-08-21%20Operational%20Bulletin%20Regarding%20Masks%20with%20Exhalation%20Valves.pdf">recommends against tape-covering</a>, but merely because “this would be considered altering the device and violates the manufacturer’s recommendation”.</li>
</ul>
<h1 id="overall-recommendation">Overall recommendation</h1>
<p>I think it’s about 50/50 which of my two methods is better all things considered. They’re close enough that I think the correct decision depends on how much you care about protecting yourself vs source control. If source control is a minor consideration to you, I’d go with the KN95 valve coverage method, otherwise the tape method.</p>
<p>(As I said in a previous footnote<sup id="fnref:MSA:2" role="doc-noteref"><a href="#fn:MSA" class="footnote" rel="footnote">2</a></sup>, if a valveless elastomeric mask is widely available by the time you read this, that is absolutely a superior option to the hacks I have developed.)</p>
<p>(The <a href="https://narwallmask.com/">Narwall Mask</a> is a commercial solution based on a snorkel mask that may be appealing if you don’t mind (i) the lack of NIOSH approval, (ii) buying from a random startup, and (iii) the full-facepiece design, which you may even prefer.)</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:fatal" role="doc-endnote">
<p>Or is it fatal? I had always assumed it was a fatal flaw, until I found some experts arguing otherwise. In <a href="https://www.ajicjournal.org/article/S0196-6553(20)30888-9/fulltext">this commentary</a>, the authors say: “Data characterizing particle release through exhalation valves are presently lacking; it is our opinion that such release will be limited by the complex path particles must navigate through a valve. We expect that fewer respiratory aerosols escape through the exhalation valve than through and around surgical masks, unrated masks, or cloth face coverings, all of which have much less efficient filters and do not fit closely to the face”.</p>
<p>I have been able to find some data; this <a href="https://www.cdc.gov/niosh/docs/2021-107/pdfs/2021-107.pdf?id=10.26616/NIOSHPUB2021107">recent NIOSH study</a> finds that valved N95s have 1-40% penetration. “some models … had less than 20% penetration even without any mitigation. Other models … had much greater penetration with a median penetration above 40%.” Note that for these tests, the flow rates of 25-85 L/min are higher than the 6 L/min of a person at rest, and that lower flow rates had lower penetration.</p>
<p>Penetration rates of tens of percent are not very good, and not acceptable by my standards, but this is less bad than I expected: perhaps competitive with surgical masks, and better than cloth masks!</p>
<p><img src="/assets/images/mask/niosh_study_fig5.webp" alt="Niosh" /> <a href="#fnref:fatal" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:MSA" role="doc-endnote">
<p>In fact, as of November 25 2020, the company MSA Safety announced in a <a href="https://www.prnewswire.com/news-releases/first-elastomeric-respirator-without-exhalation-valve-approved-by-niosh-301180276.html">press release</a> that the <strong>first elastomeric respirator without an exhalation valve has been approved by NIOSH</strong>. It’s called the Advantage 290 Respirator. The <a href="https://us.msasafety.com/p/0001000002W0001120">product page</a> has some good documentation.</p>
<p>This <a href="https://www.journalacs.org/article/S1072-7515(20)30471-3/fulltext">journal article</a> from September 2020, although it does not mention MSA, appears to be about the Advantage 290. (This is based on the picture in Fig 1. resembling the picture in the press release, and the fact that the hospitals in the paper are in Pennsylvania and New York states, while MSA is headquartered in Pennsylvania). The article explains how it was rolled out to thousands of healthcare workers (a first wave had 1,840 users). They claim that the cost was “approximately $20 for an elastomeric mask and $10 per cartridge”, which is amazingly low.</p>
<p>They write: “After more than 1 month of usage, we have found that filters have not needed to be changed more frequently than once a month”.</p>
<p>Unfortunately, it seems to be difficult to get one’s hands on one of these right now. The website invites you to contact sales, and the lowest option for “your budget” is “less than $9,999”.</p>
<p>Moreover, even if you were able to get the Advantage 290, it might be too selfish to do so, since this respirator is likely to otherwise be used by healthcare workers. On the other hand, the price signal you create would in expectation lead to greater quantities being produced, partially offsetting the effect. If you are able to get one by paying a large premium over the hospital price, this may even be net positive for others.</p>
<p>If this respirator became available in large quantities, everything I say here would be obsolete.</p>
<p>By the way, I am astonished that it took until November 2020 for a PPE company to create a valveless elastomeric respirator; this seems to be a very useful product for any infectious disease situation. <a href="#fnref:MSA" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:MSA:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:MSA:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:cdc" role="doc-endnote">
<p>It’s unclear to me how much one should downweight this recommendation due to appearing on a CDC blog rather than as more formal CDC guidance. In the post, the recommendations are called “tips”. <a href="#fnref:cdc" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:surgical_fda" role="doc-endnote">
<p>The <a href="https://www.fda.gov/medical-devices/personal-protective-equipment-infection-control/n95-respirators-surgical-masks-and-face-masks#s2">FDA says</a>: “While a surgical mask may be effective in blocking splashes and large-particle droplets, a face mask, by design, does not filter or block very small particles in the air that may be transmitted by coughs, sneezes, or certain medical procedures.” <a href="#fnref:surgical_fda" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:flowrate" role="doc-endnote">
<p>3M <a href="https://risk.arizona.edu/sites/default/files/3mrespiratorsandsurgicalmaskcomparison.pdf">claims</a> that “85 liters per minute (lpm) represents a very high work rate, equivalent to the breathing rate of an individual running at 10 miles an hour”. <a href="http://webcache.googleusercontent.com/search?q=cache:7nf66Oofo38J:www.umich.edu/~exphysio/mvs110lecture/Readings/Reading7PhysiolSupportSys.doc">These lecture notes</a> say that a person has a pulmonary ventilation of 6 L/min at rest, 75 L/min during moderate exercise, and 150L/min during vigorous exercise. <a href="#fnref:flowrate" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:altmethods" role="doc-endnote">
<p>I tried two other methods before I settled on using tape: gluing a thin silicone wafer over the valve on the <em>inside</em> of the mask, and applying glue to the valve directly. Both these methods are entirely inferior and should not be used. <a href="#fnref:altmethods" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:m" role="doc-endnote">
<p>The model number is ML00895 for the M/L size, and ML00894 for the S/M size. <a href="#fnref:m" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<p><em>Summary: I have tried many types of masks and respirators during the 2020 pandemic. My recommendation is to use ‘elastomeric’ respirators common in industry, and to either filter or completely block off their exhalation valve. The result is a comfortable respirator that I believe offers a high level of protection against airborne diseases to myself and others. I am not an infectious disease expert.</em></p>

<h1 id="efficient-validity-checking-in-monadic-predicate-logic">Efficient validity checking in monadic predicate logic</h1>

<p><em>2020-11-27 · <a href="https://fragile-credences.github.io/monadic-predicate">https://fragile-credences.github.io/monadic-predicate</a></em></p>

<p>Monadic predicate logic (with identity) is decidable. (See Boolos, Burgess, and Jeffrey 2007, Ch. 21. The result goes back to Löwenheim–Skolem 1915.)</p>
<p>How can we <a href="https://monadic-predicate.herokuapp.com/">write a program</a> to check whether a formula is logically valid (and hence also a theorem)?</p>
<p>First, we have to parse the formula, meaning to convert it from a string into a format that represents its syntax in a machine-readable way. That format is an <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a> like this:</p>
<pre style="font-family:monospace">
Formula:
∀x(Ax→(Ax∧Bx))
Abstract syntax tree:
∀
├── x
└── →
├── A
│ └── x
└── ∧
├── A
│ └── x
└── B
└── x
</pre>
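<p>As an aside, here is one minimal way such a tree could be represented in Python. This is only an illustrative sketch: the <code>Node</code> class and its fields are my own invention, not the actual data structure used by the program.</p>

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One node of the abstract syntax tree: a label plus child subtrees."""
    label: str  # e.g. '∀', '→', '∧', a predicate letter, or a variable
    children: List["Node"] = field(default_factory=list)

# ∀x(Ax→(Ax∧Bx)), mirroring the tree diagram above:
tree = Node('∀', [
    Node('x'),
    Node('→', [
        Node('A', [Node('x')]),
        Node('∧', [
            Node('A', [Node('x')]),
            Node('B', [Node('x')]),
        ]),
    ]),
])
```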
<p>Writing the parser was a fun lesson in a fundamental aspect of computer science. But there was nothing novel about this exercise, and not much interesting to say about it.</p>
<p>The focus of this post, instead, is the part of the program that actually checks whether this syntax tree represents a logically valid formula.</p>
<p>To start with, we might try to evaluate the formula under every possible model of a given size. How big does the model need to be?</p>
<p>We can make use of the Löwenheim-Skolem theorem (looking first at the case without identity):</p>
<blockquote>
<p>If a sentence of monadic predicate logic (without identity) is satisfiable, then it has a model of size no greater than \(2^k\), where \(k\) is the number of predicates in the sentence. (Lemma 21.8 BBJ).</p>
</blockquote>
<p>A sentence’s negation is satisfiable if and only if the sentence is not valid, so the theorem equivalently states: a sentence is valid iff it is true under every model of size no greater than \(2^k\).</p>
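<p>The theorem licenses a brute-force decision procedure, which can be sketched as follows. This is an outline rather than the post’s actual code: <code>evaluate</code> stands in for a function (not shown here) that computes the sentence’s truth value in a given model.</p>

```python
import itertools

def is_valid(evaluate, k):
    """Brute-force validity check for a sentence with k monadic predicates.

    A model of size m is a tuple of m 'constants', each of which is a
    k-tuple of truth values (one per predicate). By the theorem, the
    sentence is valid iff evaluate(model) is True for every model of
    size at most 2**k."""
    possible_constants = list(itertools.product([True, False], repeat=k))
    for m in range(1, 2**k + 1):
        for model in itertools.product(possible_constants, repeat=m):
            if not evaluate(model):
                return False  # found a countermodel, so not valid
    return True

# With k=1 and row[0] the value of predicate A:
# ∀x(Ax→Ax) is valid, ∀xAx is not.
print(is_valid(lambda model: all(not row[0] or row[0] for row in model), 1))  # True
print(is_valid(lambda model: all(row[0] for row in model), 1))                # False
```

Note the double exponential here (up to \((2^k)^{2^k}\) models to check), which is why a brute-force search like this is only a starting point.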
<p>For a sentence with \(k\) predicates, every constant \(c\) in the model is assigned a list of \(k\) truth-values, representing for each predicate \(P\) whether \(P(c)\). We can use <code class="language-plaintext highlighter-rouge">itertools</code> to find every possible such list, i.e. every possible assignment to a constant.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">import</span> <span class="nn">itertools</span>
<span class="o">>>></span> <span class="n">k</span> <span class="o">=</span> <span class="mi">2</span>
<span class="o">>>></span> <span class="n">possible_predicate_combinations</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">product</span><span class="p">([</span><span class="bp">True</span><span class="p">,</span><span class="bp">False</span><span class="p">],</span><span class="n">repeat</span><span class="o">=</span><span class="n">k</span><span class="p">)]</span>
<span class="o">>>></span> <span class="n">possible_predicate_combinations</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
</code></pre></div></div>
<p>The list of every possible assignment to a constant has a length of \(2^k\).</p>
<p>We can then ask <code class="language-plaintext highlighter-rouge">itertools</code> to give us, for a model of size \(m\), every possible combination of \(m\) such lists of possible constant-assignments. We let \(m\) be at most \(2^k\), because of the theorem.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="o">**</span><span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
<span class="o">...</span>     <span class="n">possible_models</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">product</span><span class="p">(</span><span class="n">possible_predicate_combinations</span><span class="p">,</span><span class="n">repeat</span><span class="o">=</span><span class="n">m</span><span class="p">)]</span>
<span class="o">...</span>     <span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">possible_models</span><span class="p">),</span><span class="s">"possible models of size"</span><span class="p">,</span><span class="n">m</span><span class="p">)</span>
<span class="o">...</span>     <span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">possible_models</span><span class="p">:</span>
<span class="o">...</span>         <span class="k">print</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">model</span><span class="p">))</span>
<span class="mi">4</span> <span class="n">possible</span> <span class="n">models</span> <span class="n">of</span> <span class="n">size</span> <span class="mi">1</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="mi">16</span> <span class="n">possible</span> <span class="n">models</span> <span class="n">of</span> <span class="n">size</span> <span class="mi">2</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="mi">64</span> <span class="n">possible</span> <span class="n">models</span> <span class="n">of</span> <span class="n">size</span> <span class="mi">3</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">...</span>
<span class="mi">256</span> <span class="n">possible</span> <span class="n">models</span> <span class="n">of</span> <span class="n">size</span> <span class="mi">4</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">)]</span>
<span class="p">...</span>
</code></pre></div></div>
<p>What’s unfortunate here is that for our \(k\)-predicate sentence, we will need to check \(\sum_{m=1}^{2^k} (2^k)^m =\frac{2^k ((2^k)^{2^k} - 1)}{2^k - 1}\) models. The sum is very roughly equal to its last term, \((2^k)^{2^k} = 2^{k2^k}\). For \(k=3\), this is a number in the tens of millions; for \(k=4\), it’s a number with 20 digits.</p>
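<p>To get a sense of these magnitudes, we can compute the exact count (a quick sketch, not part of the original post):</p>

```python
# Models for a k-predicate sentence: one domain size m for each
# m = 1 .. 2^k, with (2^k)^m models over a domain of size m.
def model_count(k):
    M = 2 ** k
    return sum(M ** m for m in range(1, M + 1))

for k in range(1, 5):
    print(k, model_count(k))
# k=3 gives about 1.9e7; k=4 gives about 2e19
```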
<p>So checking every model is computationally impossible in practice. Fortunately, we can do better.</p>
<p>Let’s look back at the Löwenheim-Skolem theorem and try to understand why \(2^k\) appears in it:</p>
<blockquote>
<p>If a sentence of monadic predicate logic (without identity) is satisfiable, then it has a model of size no greater than \(2^k\) , where \(k\) is the number of predicates in the sentence. (Lemma 21.8 BBJ).</p>
</blockquote>
<p>As we’ve seen, \(2^k\) is the number of possible combinations of predicates that can be true of a constant in the domain. Visually, this is the number of subsets in a <em><a href="https://en.wikipedia.org/wiki/Partition_of_a_set">partition</a></em> of the possibility space:</p>
<p><img src="/assets/images/possibility-space-partition.png" alt="" /></p>
<p>If a model had a size of, say, \(2^k + 1\), one of the subsets in the partition would need to contain more than one element. But this additional element would be superfluous insofar as the truth-value of the sentence is concerned. The partition subset corresponds to a predicate-combination that would already be true with just one element in the subset, and will continue to be true if more elements are added. Take, for example, the subset labeled ‘8’ in the drawing, which corresponds to \(R \land \neg Q \land \neg P\). The sentence \(\exists x (R(x) \land \neg Q(x) \land \neg P(x))\) is true whether there are one, two, or a million elements in subset 8. Similarly, the truth-value of \(\forall x (R(x) \land \neg Q(x) \land \neg P(x))\) does not depend on the number of elements in subset 8.</p>
<p>This not only illuminates the theorem, but also shows that the vast majority of the multitudinous \(\sum_{m=1}^{2^k} (2^k)^m\) models we considered earlier are equivalent. All that matters for our sentence’s truth-value is whether each of the subsets is <em>empty</em> or non-empty. This means there are in fact only \(2^{(2^k)}-1\) model equivalence classes to consider. We need to subtract one because the subsets cannot all be empty, since the domain needs to be non-empty.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">k</span> <span class="o">=</span> <span class="mi">2</span>
<span class="o">>>></span> <span class="n">eq_classes</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">product</span><span class="p">([</span><span class="s">'Empty'</span><span class="p">,</span><span class="s">'Non-empty'</span><span class="p">],</span><span class="n">repeat</span><span class="o">=</span><span class="mi">2</span><span class="o">**</span><span class="n">k</span><span class="p">)]</span>
<span class="o">>>></span> <span class="n">eq_classes</span><span class="p">.</span><span class="n">remove</span><span class="p">((</span><span class="s">'Empty'</span><span class="p">,)</span><span class="o">*</span><span class="mi">2</span><span class="o">**</span><span class="n">k</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">eq_classes</span>
<span class="p">[(</span><span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Empty'</span><span class="p">),</span>
<span class="p">(</span><span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">,</span> <span class="s">'Non-empty'</span><span class="p">)]</span>
</code></pre></div></div>
<p>We are now ready to consider the extension to monadic predicate logic with identity. With identity, it’s possible to check whether any two members of a model are distinct or identical. This means we can distinguish the case where a partition subset contains one element from the case where it contains several. But we can still only distinguish up to a certain number of elements in a subset. That number is bounded above by the number of variables in the sentence<sup id="fnref:equality" role="doc-noteref"><a href="#fn:equality" class="footnote" rel="footnote">1</a></sup> (e.g. if you only have two variables \(x\) and \(y\), it’s not possible to construct a sentence that asserts there are three different things in some subset). Indeed we have:</p>
<blockquote>
<p>If a sentence of monadic predicate logic with identity is satisfiable, then it has a model of size no greater than \(2^k \times r\), where \(k\) is the number of monadic predicates and \(r\) the number of variables in the sentence. (Lemma 21.9 BBJ)</p>
</blockquote>
<p>By analogous reasoning to the case without identity, we need only consider \((r+1)^{(2^k)}-1\) model equivalence classes. All that matters for our sentence’s truth-value is whether each of the subsets has \(0, 1, 2, \ldots,\) or \(r\) elements in it.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">k</span> <span class="o">=</span> <span class="mi">2</span>
<span class="o">>>></span> <span class="n">r</span> <span class="o">=</span> <span class="mi">2</span>
<span class="o">>>></span> <span class="n">eq_classes</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">product</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span><span class="n">repeat</span><span class="o">=</span><span class="mi">2</span><span class="o">**</span><span class="n">k</span><span class="p">)]</span>
<span class="o">>>></span> <span class="n">eq_classes</span><span class="p">.</span><span class="n">remove</span><span class="p">((</span><span class="mi">0</span><span class="p">,)</span><span class="o">*</span><span class="mi">2</span><span class="o">**</span><span class="n">k</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">eq_classes</span>
<span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">...</span>
</code></pre></div></div>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:equality" role="doc-endnote">
<p>I believe it should be possible to find a tighter bound based on the number of times the equals sign actually appears in the sentence. For example, if equality is only used once, e.g. in \(\exists x \exists y \neg(x =y) \land \phi\) where \(\phi\) does not contain equality, it seems clear that the number of variables in \(\phi\) should have no bearing on the model size that is needed. My hunch is that more generally you need \(n*(n-1)/2\) uses of ‘\(=\)’ to assert that \(n\) objects are distinct, so, for example if ‘\(=\)’ appears 5 times you can distinguish 3 objects in a subset, or with 12 ‘\(=\)’s you can distinguish 5 objects. It’s only an intuition and I haven’t checked it carefully. <a href="#fnref:equality" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Monadic predicate logic (with identity) is decidable. (See Boolos, Burgess, and Jeffrey 2007, Ch. 21. The result goes back to Löwenheim-Skolem 1915).Protecting yourself from Vanguard’s poor security practices2020-11-25T00:00:00+00:002020-11-25T00:00:00+00:00https://fragile-credences.github.io/vanguard<p>The index fund company Vanguard supports two-factor authentication (2fa) with SMS. SMS is known to be the worst form of 2fa, because it is vulnerable to so-called SIM-swapping attacks. In this type of attack, the malicious party impersonates you and tells your telephone company you’ve lost your SIM card. They request that your number be moved to a new SIM card that they possess. The attacker could call up the company and ask them to activate a spare SIM card that they’ve acquired earlier, or they could visit a store and ask to be given a new SIM card for your number. Then they can receive your security codes.</p>
<p>The security of SMS-based 2fa is only as good as your phone operator’s protections against SIM-swapping, meaning probably not very good. The attacker only needs to convince one mall telco shop employee that they’re you, and they can likely try as many times as they want.</p>
<p>Vanguard claims to also support hardware security keys as a second factor. These are widely regarded as the gold standard for 2fa. Not only are they a true piece of hardware that can’t be SIM-swapped, they also ensure you’re protected even if you get fooled by a phishing attempt (by sending a code that is a function of the URL you are on).</p>
<p>So good news, right? No, because Vanguard made the inexplicable decision to force everyone who uses a security key to also keep SMS 2fa enabled as a fallback option. This utterly defeats the point. The attacker can just click ‘lost security key’ and get an SMS code instead. Users who enable the security key feature actually make their account <em>less</em> secure, because it now has two possible attack surfaces instead of one.</p>
<p><img src="/assets/images/vanguard-2fa-redbox.png" alt="" /></p>
<p>People have been complaining about this for years, ever since Vanguard first introduced security keys. On the Bogleheads forum (where intense Vanguard fanatics congregate), this issue was recognized <a href="https://www.bogleheads.org/forum/viewtopic.php?p=3144646#p3144646">in this thread from 2016</a>, <a href="https://www.bogleheads.org/forum/viewtopic.php?t=234202">this one from 2017</a>, <a href="https://www.bogleheads.org/forum/viewtopic.php?f=10&t=251560">this one from 2018</a>, and several others. There are <a href="https://www.google.com/search?q=vanguard+security+key+site:www.reddit.com">plenty of complaints</a> on reddit too. It’s fair to assume some of these people will have contacted Vanguard directly too.</p>
<p>It’s disappointing that a company with over 6 trillion dollars of assets under management offers its clients a security “feature” that makes their accounts less secure.</p>
<p><strong>The workaround I’ve found is to use a Google Voice number to receive SMS 2fa codes</strong> (don’t bother with the useless security key). Of course, you must set the Google Voice number not to forward SMS messages to your main phone number, which would defeat the purpose. Then, the messages can only be read by being logged in to the Google account. A Google account can be made into an extremely hardened target. The <a href="https://landing.google.com/advancedprotection/">advanced protection program</a> is available for the sufficiently paranoid.</p>
<p>If you don’t receive the SMS for some reason, you can also receive the authentication code with an automated call to the same number.</p>
<p>You need to have an existing US phone number to create a Google Voice account.</p>
<p>By the way, using Google Voice may not work for all companies that force you to use SMS 2fa. I have verified that it works for Vanguard. <a href="https://support.google.com/voice/thread/13363202?hl=en&msgid=13363524">This poster</a> claims that “many financial institutions will now only send their 2FA codes to true mobile phone numbers. Google Voice numbers are land lines, with the text messaging function spliced on via a third-party messaging gateway”.</p>Eliciting probability distributions from quantiles2020-08-28T00:00:00+00:002020-08-28T00:00:00+00:00https://fragile-credences.github.io/quantiles<p>We often have intuitions about the probability distribution of a variable that we would like to translate into a formal specification of a distribution. Transforming our beliefs into a fully specified probability distribution allows us to further manipulate the distribution in useful ways.</p>
<p>For example, you believe that the cost of a medication is a positive number that’s about 10, but with a long right tail: say, a 10% probability of being more than 100. To use this cost estimate in a Monte Carlo simulation, you need to know exactly what distribution to plug in. Or perhaps you have a prior about the effect of creatine on cognitive performance, and you want to formally <a href="http://bayesupdate.com/">update that prior using Bayes’ rule</a> when a new study comes out. Or you want to make a forecast about a candidate’s share of the vote and evaluate the accuracy of your forecast using a <a href="https://en.wikipedia.org/wiki/Scoring_rule">scoring rule</a>.</p>
<p>In most software, you have to specify a distribution by its parameters, but these parameters are rarely intuitive. The normal distribution’s mean and standard deviation are somewhat intuitive, but this is the exception rather than the rule. The lognormal’s mu and sigma correspond to the mean and standard deviation of the variable’s <em>logarithm</em>, something I personally have no intuitions about. And I don’t expect good results if you ask someone to supply a beta distribution’s alpha and beta shape parameters.</p>
<p>I have built a tool that creates a probability distribution (of a given family) from user-supplied <em>quantiles</em>, sometimes also called percentiles. Quantiles are points on the cumulative distribution function: \((p,x)\) pairs such that \(P(X<x)=p\). To illustrate what quantiles are, we can look at the example distribution below, which has a 50th percentile (or median) of -1 and a 90th percentile of 10.</p>
<p><img src="/assets/images/quantiles/quantiles-example.png" width="50%" /><br />
<em>A cumulative distribution function with a median of -1 and a 90th percentile of 10</em></p>
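<p>To make the definition concrete, the \((p,x)\) relationship can be checked numerically with scipy (an illustrative sketch, not part of the tool):</p>

```python
from scipy import stats

# A quantile is a (p, x) pair such that P(X < x) = p.
# For the standard normal, the 0.9 quantile:
x = stats.norm.ppf(0.9)   # inverse CDF, the "percent point function"
p = stats.norm.cdf(x)     # applying the CDF recovers p = 0.9
```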
<p>The <a href="https://github.com/tadamcz/make-distribution">code is on GitHub</a>, and the webapp is <a href="http://makedistribution.com">here</a>.</p>
<p>Let’s run through some examples of how you can use this tool. At the end, I will discuss how it compares to other probability elicitation software, and why I think it’s a valuable addition.</p>
<h1 id="traditional-distributions">Traditional distributions</h1>
<p>The tool supports the normal and lognormal distributions, and more of the usual distribution families could easily be added. The user supplies the distribution family, along with an arbitrary number of quantiles. If more quantiles are provided than the distribution has parameters (more than two in this case), the system is over-determined. The tool then uses least squares to find the best fit.</p>
<p>This is some example input:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">family</span> <span class="o">=</span> <span class="s">'lognormal'</span>
<span class="n">quantiles</span> <span class="o">=</span> <span class="p">[(</span><span class="mf">0.1</span><span class="p">,</span><span class="mi">50</span><span class="p">),(</span><span class="mf">0.5</span><span class="p">,</span><span class="mi">70</span><span class="p">),(</span><span class="mf">0.75</span><span class="p">,</span><span class="mi">100</span><span class="p">),(</span><span class="mf">0.9</span><span class="p">,</span><span class="mi">150</span><span class="p">)]</span>
</code></pre></div></div>
<p>And the corresponding output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>More than two quantiles provided, using least squares fit
Lognormal distribution
mu 4.313122980928514
sigma 0.409687416531683
quantiles:
0.01 28.79055927521217
0.1 44.17183774344628
0.25 56.64439363937313
0.5 74.67332855521319
0.75 98.44056294458953
0.9 126.2366766332274
0.99 193.67827989071688
</code></pre></div></div>
<p><img src="../assets/images/quantiles/lognormal.png" width="70%" /></p>
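<p>The tool’s actual implementation is in the GitHub repository linked above; as a sketch of the general idea (my own parametrization and loss, not necessarily the tool’s), an over-determined fit like this one can be set up with scipy’s least-squares routine:</p>

```python
import numpy as np
from scipy import optimize, stats

quantiles = [(0.1, 50), (0.5, 70), (0.75, 100), (0.9, 150)]
ps = np.array([p for p, _ in quantiles])
xs = np.array([x for _, x in quantiles])

# Lognormal quantile function: exp(mu + sigma * Phi^{-1}(p))
def residuals(params):
    mu, sigma = params
    return np.exp(mu + sigma * stats.norm.ppf(ps)) - xs

fit = optimize.least_squares(residuals, x0=[np.log(70), 0.5])
mu, sigma = fit.x
```

<p>The fitted values depend on the space in which the residuals are measured (here the \(x\)-axis), so they need not match the tool’s output exactly.</p>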
<h1 id="metalog-distribution">Metalog distribution</h1>
<p>The feature I am most excited about, however, is the support for a new type of distribution developed specifically for the purposes of flexible elicitation from quantiles, called the meta-logistic distribution. It was first described in <a href="http://www.metalogdistributions.com/images/quantiles/The_Metalog_Distributions_-_Keelin_2016.pdf">Keelin 2016</a>, which puts it at the cutting edge compared to the venerable normal distribution invented by Gauss and Laplace around 1810. The meta-logistic, or metalog for short, does not use traditional parameters. Instead, it can take on as many terms as the user provides quantiles, and adopts whatever shape is needed to fit these quantiles very closely. Closed-form expressions exist for its quantile function (the inverse of the CDF) and for its PDF. This leads to attractive computational properties (see footnote)<sup id="fnref:computational" role="doc-noteref"><a href="#fn:computational" class="footnote" rel="footnote">1</a></sup>.</p>
<p>Keelin explains that</p>
<blockquote>
<p>[t]he metalog distributions provide a convenient way to translate CDF data into smooth, continuous, closed-form distribution functions that can be used for real-time feedback to experts about the implications of their probability assessments.</p>
</blockquote>
<p>The metalog quantile function is derived by modifying the logistic quantile function,</p>
\[\mu + s \ln{\frac{y}{1-y}} \quad\text{ for } 0 < y < 1\]
<p>by letting \(\mu\) and \(s\) depend on \(y\) instead of being constant.</p>
<p>As Keelin writes, given a systematically increasing \(s\) as one moves from left to right, a right skewed distribution would result. And a systematically decreasing \(\mu\) as one moves from left to right would make the distribution spikier in the middle with correspondingly heavier tails.</p>
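<p>To see the first of these effects numerically, here is a toy version (my own illustration, not the actual metalog parametrization) in which \(s\) increases linearly in \(y\):</p>

```python
import numpy as np

def quantile(y, mu=0.0, a=1.0, b=0.5):
    s = a + b * y                       # s grows from left to right
    return mu + s * np.log(y / (1 - y))

# The median stays at mu, but the 0.9 quantile sits further above the
# median than the 0.1 quantile sits below it: right skew.
median = quantile(0.5)
left = median - quantile(0.1)
right = quantile(0.9) - median
```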
<p>By modifying \(s\) and \(\mu\) in ever more complex ways we can make the metalog take on almost any shape. In particular, in most cases the metalog CDF passes through all the provided quantiles exactly<sup id="fnref:feasibility" role="doc-noteref"><a href="#fn:feasibility" class="footnote" rel="footnote">2</a></sup>. Moreover, we can specify the metalog to be unbounded, to have arbitrary bounds, or to be semi-bounded above or below.</p>
<p>Instead of thinking about which of several highly constraining distribution families to use, just choose the metalog and let your quantiles speak for themselves. As Keelin says:</p>
<blockquote>
<p>one needs a distribution that has flexibility far beyond that of traditional distributions – one that enables “the data to speak for itself” in contrast to imposing unexamined and possibly inappropriate shape constraints on that data.</p>
</blockquote>
<p>For example, we can fit an unbounded metalog to the same quantiles as above:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">family</span> <span class="o">=</span> <span class="s">'metalog'</span>
<span class="n">quantiles</span> <span class="o">=</span> <span class="p">[(</span><span class="mf">0.1</span><span class="p">,</span><span class="mi">50</span><span class="p">),(</span><span class="mf">0.5</span><span class="p">,</span><span class="mi">70</span><span class="p">),(</span><span class="mf">0.75</span><span class="p">,</span><span class="mi">100</span><span class="p">),(</span><span class="mf">0.9</span><span class="p">,</span><span class="mi">150</span><span class="p">)]</span>
<span class="n">metalog_leftbound</span> <span class="o">=</span> <span class="bp">None</span>
<span class="n">metalog_rightbound</span> <span class="o">=</span> <span class="bp">None</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Meta-logistic distribution
quantiles:
0.01 11.968367580205552
0.1 50.000000000008185
0.25 58.750000000005215
0.5 70.0
0.75 100.00000000000519
0.9 150.00000000002515
0.99 281.7443263650518
</code></pre></div></div>
<p><img src="../assets/images/quantiles/metalog.png" width="70%" /></p>
<p>The metalog’s actual parameters (as opposed to the user-supplied quantiles) have no simple interpretation and are of no use unless the next piece of software you’re going to use knows what a metalog is. Therefore the program doesn’t return the parameters. Instead, if we want to manipulate this distribution, we can use the expressions of the PDF and CDF that the software provides, or alternatively export a large number of samples into another tool that accepts distributions described by a list of samples (such as the Monte Carlo simulation tool <a href="https://getguesstimate.com">Guesstimate</a>). By default, 5000 samples will be printed; you can copy and paste them.</p>
<h1 id="approaches-to-elicitation">Approaches to elicitation</h1>
<p>How does this tool compare to other approaches for creating subjective belief distributions? Here are the strategies I’ve seen.</p>
<h2 id="belief-intervals">Belief intervals</h2>
<p>The first approach is to provide a belief interval that is mapped to some fixed quantiles, e.g. a 90% belief interval (between the 0.05 and 0.95 quantile) like on <a href="http://getguesstimate.com">Guesstimate</a>. <a href="http://metaculus.com">Metaculus</a> provides a graphical way to input the same data, allowing the user to drag the quantiles across a line under a graph of the PDF. This is the simplest and most user-friendly approach. The tool I built incorporates the belief interval approach while going beyond it in two ways. First, you can provide completely arbitrary quantiles, instead of specifically the 0.05 and 0.95 – or some other belief interval symmetric around 0.5. Second, you can provide more than two quantiles, which allows the user to query intuitive information about more parts of the distribution.</p>
<p><img src="/assets/images/quantiles/guesstimate.png" /><br />
<em>Guesstimate</em></p>
<p><img src="/assets/images/quantiles/metaculus.png" width="70%" /><br />
<em>Metaculus</em></p>
<h2 id="drawing">Drawing</h2>
<p>Another option is to draw the PDF on a canvas, in free form, using your mouse. This is the very innovative approach of <a href="http://probability.dev">probability.dev</a>.<sup id="fnref:canvas" role="doc-noteref"><a href="#fn:canvas" class="footnote" rel="footnote">3</a></sup></p>
<p><img src="/assets/images/quantiles/dev.png" /><br />
<em>probability.dev</em></p>
<h2 id="oughts-elicit">Ought’s Elicit</h2>
<p>Ought’s <a href="https://elicit.ought.org">Elicit</a> lets you provide quantiles like my tool, or equivalently bins with some probability mass in each bin<sup id="fnref:b" role="doc-noteref"><a href="#fn:b" class="footnote" rel="footnote">4</a></sup>. The resulting distribution is by default piecewise uniform (the cdf is piecewise linear), but it’s possible to apply smoothing. It has all the features I want; the drawback is that it only supports bounded distributions<sup id="fnref:elicit" role="doc-noteref"><a href="#fn:elicit" class="footnote" rel="footnote">5</a></sup>.</p>
<p><img src="/assets/images/quantiles/ought.png" /><br />
<em>Elicit</em></p>
<h2 id="mixtures">Mixtures</h2>
<p>A meta-level approach that can be applied to any of the above is to allow the user to specify a mixture distribution, a weighted average of distributions. For example, 1/3 weight on a <code class="language-plaintext highlighter-rouge">normal(5,5)</code> and 2/3 weight on a <code class="language-plaintext highlighter-rouge">lognormal(1,0.75)</code>. My opinion on mixtures is that they are good if the user is thinking about the event disjunctively; for example, she may be envisioning two possible scenarios, each of which she has a distribution in mind for. But on Metaculus and Foretold my impression is that mixtures are often used to indirectly achieve a single distribution whose rough shape the user had in mind originally.</p>
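<p>To make the example concrete, here is a minimal sketch (not any particular tool’s implementation) of the mixture just described. I’m assuming the usual \((\mu, \sigma)\) parametrization of the lognormal, which in SciPy notation is <code>lognorm(s=0.75, scale=exp(1))</code>:</p>

```python
import numpy as np
from scipy import integrate, stats

# 1/3 weight on a normal(5,5), 2/3 weight on a lognormal(1,0.75)
components = [stats.norm(5, 5), stats.lognorm(s=0.75, scale=np.exp(1))]
weights = [1 / 3, 2 / 3]

def mixture_pdf(x):
    # the mixture density is the weighted average of the component densities
    return sum(w * d.pdf(x) for w, d in zip(weights, components))

def mixture_samples(n, seed=0):
    # sample by first picking a component, then drawing from it
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(components), size=n, p=weights)
    return np.array([components[i].rvs(random_state=rng) for i in idx])
```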
<h1 id="the-future">The future</h1>
<p>This is an exciting space with many recent developments. Guesstimate, Metaculus, Elicit and the metalog distribution have all been created in the last 5 years.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:computational" role="doc-endnote">
<p>For the quantile function expression, see <a href="http://www.metalogdistributions.com/images/quantiles/The_Metalog_Distributions_-_Keelin_2016.pdf">Keelin 2016</a>, definition 1. The fact that this is in closed form means, first, that sampling randomly from the distribution is computationally trivial. We can use the inverse transform method: we take random samples from a uniform distribution over \([0,1]\) and plug them into the quantile function. Second, plotting the CDF for a certain range of probabilities (e.g. from 1% to 99%) is also easy.</p>
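<p>To illustrate, here is a minimal sketch of the inverse transform method (not part of the tool), using the plain logistic distribution, whose closed-form quantile function is the base that the metalog modifies:</p>

```python
import numpy as np

def logistic_quantile(p, mu=0.0, s=1.0):
    # closed-form quantile function of the logistic distribution,
    # the base case that the metalog generalizes
    return mu + s * np.log(p / (1 - p))

# inverse transform sampling: push uniform draws through the quantile function
rng = np.random.default_rng(0)
samples = logistic_quantile(rng.uniform(size=100_000))
```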
<p>The expression for the PDF is unusual in that it is a function of the cumulative probability \(p \in (0,1)\), instead of a function of values of the random variable. See <a href="http://www.metalogdistributions.com/images/quantiles/The_Metalog_Distributions_-_Keelin_2016.pdf">Keelin 2016</a>, definition 2. As Keelin explains (p. 254), to plot the PDF as is customary we can use the quantile function \(q(p)\) on the horizontal axis and the PDF expression \(f(p)\) on the vertical axis, and vary \(p\) in, for example, \([0.01,0.99]\) to produce the corresponding values on both axes.</p>
<p>Hence, for (i) querying the quantile function of the fitted metalog, sampling, and plotting the CDF, and (ii) plotting the PDF, everything can be done in closed form.</p>
<p>To query the CDF, however, numerical equation solving is applied. Since the quantile function is differentiable, Newton’s method can be applied and is fast. (Numerical equation solving is also used to query the PDF as a function of values of the random variable – but I don’t see why one would need densities except for plotting.) <a href="#fnref:computational" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:feasibility" role="doc-endnote">
<p>In most cases, there exists a metalog whose CDF passes through all the provided quantiles exactly. In that case, there exists an expression of the metalog parameters that is in closed form as a function of the quantiles (“\(a = Y^{−1}x\)”, <a href="http://www.metalogdistributions.com/images/quantiles/The_Metalog_Distributions_-_Keelin_2016.pdf">Keelin 2016</a>, p. 253. Keelin denotes the metalog parameters \(a\), the matrix \(Y\) is a simple function of the quantiles’ y-coordinates, and the vector \(x\) contains the quantiles’ x-coordinates. The metalog parameters \(a\) are the numbers that are used to modify the logistic quantile function. This modification is done according to equation 6 on p. 254.)</p>
<p>If there is no metalog that fits the quantiles exactly (i.e. the expression for \(a\) above does not imply a valid probability distribution), we have to use optimization to find the feasible metalog that fits the quantiles most closely. In this software implementation, “most closely” is defined as minimizing the absolute differences between the quantiles and the CDF (see <a href="https://github.com/isaacfab/rmetalog/issues/13">here</a> for more discussion).</p>
<p>In my experience, if a small number of quantiles describing a PDF with sharp peaks are provided, the closest feasible metalog fit to the quantiles may not pass through all the quantiles exactly. <a href="#fnref:feasibility" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:canvas" role="doc-endnote">
<p>Drawing the PDF instead of the CDF makes it difficult to hit quantiles. But drawing the CDF would probably be less intuitive – I often have the rough shape of the PDF in mind, but I never have intuitions about the rough shape of the CDF. The canvas-based approach also runs into difficulty with the tail of unbounded distributions. Overall I think it’s very cool but I haven’t found it that practical. <a href="#fnref:canvas" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:b" role="doc-endnote">
<p>To provide quantiles, simply leave the Min field empty – it defaults to the left bound of the distribution. <a href="#fnref:b" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:elicit" role="doc-endnote">
<p>I suspect this is a fundamental problem of the approach of starting with piecewise uniforms and adding smoothing. You need the tails of the CDF to asymptote towards 0 and 1, but it’s hard to find a mathematical function that does this while also (i) having the right probability mass under the tail (ii) stitching onto the piecewise uniforms in a natural way. I’d love to be proven wrong, though; the user interface and user experience on Elicit are really nice. (I’m aware that Elicit allows for ‘open-ended’ distributions, where probability mass can be assigned to an out-of-bounds outcome, but one cannot specify how that mass is distributed inside the out-of-bounds interval(s). So there is no true support for unbounded distributions. The ‘out-of-bounds’ feature exists because Elicit seems to be mainly intended as an add-on to Metaculus, which supports such ‘open-ended’ distributions but no truly unbounded ones.) <a href="#fnref:elicit" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>We often have intuitions about the probability distribution of a variable that we would like to translate into a formal specification of a distribution. Transforming our beliefs into a fully specified probability distribution allows us to further manipulate the distribution in useful ways.Debugging surprising behavior in SciPy numerical integration2020-07-01T00:00:00+00:002020-07-01T00:00:00+00:00https://fragile-credences.github.io/bayes<style>
.highlight{
width: 125%;
max-width: 95vw;
min-width: 100%;
position: absolute;
left: 50%;
transform: translateX(-50%);
position: relative;
}
</style>
<p>I wrote a <a href="https://github.com/tmkadamcz/bayes-continuous">Python app</a> to apply Bayes’ rule to continuous distributions. It looks like this:</p>
<div style="border: 1px solid grey"> <a href="http://bayesupdate.com/"><img src="/assets/images/bayes.png" /></a></div>
<p><em>Screenshot</em></p>
<p>I’m learning a lot about <a href="https://en.wikipedia.org/wiki/Numerical_analysis">numerical analysis</a> from this project. The basic idea is simple:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">unnormalized_posterior_pdf</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="n">prior</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="n">likelihood</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="c1"># integrate unnormalized_posterior_pdf over the reals
</span><span class="n">normalization_constant</span> <span class="o">=</span> <span class="n">integrate</span><span class="p">.</span><span class="n">quad</span><span class="p">(</span><span class="n">unnormalized_posterior_pdf</span><span class="p">,</span><span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">posterior_pdf</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="n">unnormalized_posterior_pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">/</span><span class="n">normalization_constant</span>
</code></pre></div></div>
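<p>As written, the snippet assumes <code>prior</code> and <code>likelihood</code> objects and the usual imports. Here is a self-contained version with hypothetical conjugate choices (a normal(0,1) prior and a normal(1,1) likelihood), for which the posterior mean is known analytically to be 0.5:</p>

```python
import numpy as np
from scipy import integrate, stats

# hypothetical conjugate example: normal(0,1) prior, normal(1,1) likelihood,
# so the exact posterior is normal with mean 0.5
prior = stats.norm(0, 1)
likelihood = stats.norm(1, 1)

def unnormalized_posterior_pdf(x):
    return prior.pdf(x) * likelihood.pdf(x)

# integrate the unnormalized pdf over the reals to get the normalization constant
normalization_constant = integrate.quad(unnormalized_posterior_pdf, -np.inf, np.inf)[0]

def posterior_pdf(x):
    return unnormalized_posterior_pdf(x) / normalization_constant

# sanity check: the numerically computed mean should match the conjugate formula
posterior_mean = integrate.quad(lambda x: x * posterior_pdf(x), -np.inf, np.inf)[0]
```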
<p>However, when testing my code on complicated distributions, I ran into some interesting puzzles.</p>
<p>A first set of problems was caused by the SciPy numerical integration routines that my program relies on. They were sometimes returning incorrect results or <code class="language-plaintext highlighter-rouge">RuntimeError</code>s. These problems appeared when the integration routines had to deal with ‘extreme’ values: small normalization constants or large inputs into the cdf function. I eventually learned to hold the integration algorithm’s hand a little bit and show it where to go.</p>
<p>A second set of challenges had to do with how long my program took to run: sometimes 30 seconds to return the percentiles of the posterior distribution. While 30 seconds might be acceptable for someone who desperately needed that Bayesian update, I didn’t want my tool to feel like a punch-card mainframe. I eventually managed to make the program more than 10 times faster. The tricks I used all followed the same strategy. In order to make it less expensive to repeatedly evaluate the posterior’s cdf by numerical integration, I tried to find ways to make the interval to integrate narrower.</p>
<p>You can follow along with all the tests described in this post using <a href="/assets/files/bayesblog.py">this file</a>, while the code doing the calculations for the webapp is <a href="https://github.com/tmkadamcz/bayes-continuous/blob/master/bayes.py">here</a>.</p>
<h1 id="small-normalization-constants">Small normalization constants</h1>
<p><img src="/assets/images/bayes-far-apart.png" alt="Alt text" /></p>
<p>When the prior and likelihood are far apart, the unnormalized posterior takes tiny values.</p>
<p>It turns out that SciPy’s integration routine, <code class="language-plaintext highlighter-rouge">integrate.quad</code>, (incidentally, written in <a href="https://github.com/scipy/scipy/tree/master/scipy/integrate/quadpack">actual Fortran</a>!) has trouble integrating such a low-valued pdf.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prior</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="p">.</span><span class="mi">5</span><span class="p">,</span><span class="n">scale</span><span class="o">=</span><span class="n">math</span><span class="p">.</span><span class="n">exp</span><span class="p">(.</span><span class="mi">5</span><span class="p">))</span> <span class="c1"># a lognormal(.5,.5) in SciPy notation
</span><span class="n">likelihood</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Posterior_scipyrv</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">rv_continuous</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">d1</span><span class="p">,</span><span class="n">d2</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">Posterior_scipyrv</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">d1</span><span class="o">=</span> <span class="n">d1</span>
<span class="bp">self</span><span class="p">.</span><span class="n">d2</span><span class="o">=</span> <span class="n">d2</span>
<span class="bp">self</span><span class="p">.</span><span class="n">normalization_constant</span> <span class="o">=</span> <span class="n">integrate</span><span class="p">.</span><span class="n">quad</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">unnormalized_pdf</span><span class="p">,</span><span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">unnormalized_pdf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">d1</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">d2</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_pdf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">unnormalized_pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">normalization_constant</span>
<span class="n">posterior</span> <span class="o">=</span> <span class="n">Posterior_scipyrv</span><span class="p">(</span><span class="n">prior</span><span class="p">,</span><span class="n">likelihood</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'normalization constant:'</span><span class="p">,</span><span class="n">posterior</span><span class="p">.</span><span class="n">normalization_constant</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"CDF values:"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">30</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">posterior</span><span class="p">.</span><span class="n">cdf</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
</code></pre></div></div>
<p>The cdf converges to… 52,477. This is not so good.</p>
<p>Because the cdf does converge, but to an incorrect value, we can conclude that the normalization constant is to blame. Because the cdf converges to a number greater than 1, <code class="language-plaintext highlighter-rouge">posterior.normalization_constant</code>, about <code class="language-plaintext highlighter-rouge">3e-12</code>, is an underestimate of the true value.</p>
<p>If we shift the likelihood distribution just a little bit to the left, to <code class="language-plaintext highlighter-rouge">likelihood = stats.norm(18,1)</code>, the cdf converges correctly, and we get a normalization constant of about <code class="language-plaintext highlighter-rouge">6e-07</code>. Obviously, the normalization constant should not jump five orders of magnitude from <code class="language-plaintext highlighter-rouge">6e-07</code> to <code class="language-plaintext highlighter-rouge">3e-12</code> as a result of this small shift.</p>
<p>The program is not integrating the unnormalized pdf correctly.</p>
<p>Difficulties with integration usually have to do with the shape of the function. If your integrand zig-zags up and down a lot, the algorithm may miss some of the peaks. But here, the shape of the posterior is almost the same whether we use <code class="language-plaintext highlighter-rouge">stats.norm(18,1)</code> or <code class="language-plaintext highlighter-rouge">stats.norm(20,1)</code><sup id="fnref:extra" role="doc-noteref"><a href="#fn:extra" class="footnote" rel="footnote">1</a></sup>. So the problem really seems to occur once we are far enough in the tails of the prior that the unnormalized posterior pdf takes values below a certain absolute (rather than relative) threshold. I don’t yet understand why. Perhaps some of the values are becoming too small to be represented with standard floating point numbers.</p>
<p>This seems rather bizarre, but here’s a piece of evidence that really demonstrates that low absolute values are what’s tripping up the integration routine that calculates the normalization constant. We just multiply the unnormalized pdf by 10000 (which will cancel out once we normalize).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">unnormalized_pdf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="mi">10000</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">d1</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">d2</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
<p>Now the cdf converges to 1 perfectly (??!).</p>
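<p>Putting the pieces together, a condensed, self-contained version of this scaling workaround looks like this (the factor of 10000 is arbitrary and cancels out after normalization):</p>

```python
import math
import numpy as np
from scipy import integrate, stats

prior = stats.lognorm(s=0.5, scale=math.exp(0.5))  # a lognormal(.5,.5)
likelihood = stats.norm(20, 1)

SCALE = 10000  # arbitrary constant; cancels out after normalization

def scaled_unnormalized_pdf(x):
    # lift the integrand's values out of the problematic tiny range
    return SCALE * prior.pdf(x) * likelihood.pdf(x)

normalization_constant = integrate.quad(scaled_unnormalized_pdf, -np.inf, np.inf)[0]

def posterior_pdf(x):
    # the factor of SCALE appears in both numerator and denominator
    return scaled_unnormalized_pdf(x) / normalization_constant
```

<p>Any sufficiently large constant works here; the point is only to keep the integrand’s absolute values in a range the routine handles well.</p>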
<h1 id="large-inputs-into-cdf">Large inputs into cdf</h1>
<p>We take a prior and likelihood that are unproblematically close together:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prior</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="p">.</span><span class="mi">5</span><span class="p">,</span><span class="n">scale</span><span class="o">=</span><span class="n">math</span><span class="p">.</span><span class="n">exp</span><span class="p">(.</span><span class="mi">5</span><span class="p">))</span><span class="c1"># a lognormal(.5,.5) in SciPy notation
</span><span class="n">likelihood</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">posterior</span> <span class="o">=</span> <span class="n">Posterior_scipyrv</span><span class="p">(</span><span class="n">prior</span><span class="p">,</span><span class="n">likelihood</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">posterior</span><span class="p">.</span><span class="n">cdf</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
</code></pre></div></div>
<p>At first, the cdf goes to 1 as expected, but suddenly all hell breaks loose and the cdf <em>decreases</em> to some very tiny values:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>22 1.0000000000031484
23 1.0000000000095246
24 1.0000000000031442
25 2.4520867144186445e-09
26 2.7186998869943613e-12
27 1.1495658559228458e-15
</code></pre></div></div>
<p>What’s going on? When asked to integrate the pdf from minus infinity up to some large value like 25, <code class="language-plaintext highlighter-rouge">quad</code> doesn’t know where to look for the probability mass. When the upper bound of the integral is in an area that still has enough probability mass, like 23 or 24 in this example, <code class="language-plaintext highlighter-rouge">quad</code> finds its way to the mass. But if you ask it to find a peak very far away, it fails.</p>
<p>A piece of confirmatory evidence is that if we make the peak spikier and harder to find, by setting the likelihood’s standard deviation to 0.5 instead of 1, the cdf fails earlier:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>22 1.000000000000232
23 2.9116983489798973e-12
</code></pre></div></div>
<p>We need to hold the integration algorithm’s hand and show it where on the real line the peak of the distribution is located. In SciPy’s <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.quad.html">quad</a>, you can supply the <code class="language-plaintext highlighter-rouge">points</code> argument to point out places ‘where local difficulties of the integrand may occur’, but only when the integration interval is finite. The solution I came up with is to split the interval into two halves.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">split_integral</span><span class="p">(</span><span class="n">f</span><span class="p">,</span><span class="n">splitpoint</span><span class="p">,</span><span class="n">integrate_to</span><span class="p">):</span>
<span class="n">a</span><span class="p">,</span><span class="n">b</span> <span class="o">=</span> <span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span>
<span class="k">if</span> <span class="n">integrate_to</span> <span class="o"><</span> <span class="n">splitpoint</span><span class="p">:</span>
<span class="c1"># just return the integral normally
</span> <span class="k">return</span> <span class="n">integrate</span><span class="p">.</span><span class="n">quad</span><span class="p">(</span><span class="n">f</span><span class="p">,</span><span class="n">a</span><span class="p">,</span><span class="n">integrate_to</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">integral_left</span> <span class="o">=</span> <span class="n">integrate</span><span class="p">.</span><span class="n">quad</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">splitpoint</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">integral_right</span> <span class="o">=</span> <span class="n">integrate</span><span class="p">.</span><span class="n">quad</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">splitpoint</span><span class="p">,</span> <span class="n">integrate_to</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">return</span> <span class="n">integral_left</span> <span class="o">+</span> <span class="n">integral_right</span>
</code></pre></div></div>
<p>This definitely won’t work for every difficult integral, but should help for many cases where most of the probability mass is not too far from the <code class="language-plaintext highlighter-rouge">splitpoint</code>.</p>
<p>For <code class="language-plaintext highlighter-rouge">splitpoint</code>, a simple choice is the average of the prior and likelihood’s expected values.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Posterior_scipyrv</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">rv_continuous</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">d1</span><span class="p">,</span><span class="n">d2</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">splitpoint</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">d1</span><span class="p">.</span><span class="n">expect</span><span class="p">()</span><span class="o">+</span><span class="bp">self</span><span class="p">.</span><span class="n">d2</span><span class="p">.</span><span class="n">expect</span><span class="p">())</span><span class="o">/</span><span class="mi">2</span>
</code></pre></div></div>
<p>We can now override the built-in <code class="language-plaintext highlighter-rouge">cdf</code> method, and specify our own method that uses <code class="language-plaintext highlighter-rouge">split_integral</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Posterior_scipyrv</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">rv_continuous</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">_cdf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="n">split_integral</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">pdf</span><span class="p">,</span><span class="bp">self</span><span class="p">.</span><span class="n">splitpoint</span><span class="p">,</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
<p>Now things run correctly:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>22 1.0000000000000198
23 1.0000000000000198
24 1.0000000000000198
25 1.00000000000002
26 1.0000000000000202
...
98 1.0000000000000198
99 1.0000000000000193
</code></pre></div></div>
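<p>To see the splitting trick in isolation, here is a standalone toy example (not from the webapp): evaluating the cdf of a normal distribution centered far from zero. Placing the split point at the mean means each half-integral has the peak right at an endpoint, where <code>quad</code> locates the mass without difficulty:</p>

```python
import numpy as np
from scipy import integrate, stats

def split_integral(f, splitpoint, integrate_to):
    # integrate f from -inf to integrate_to, splitting the interval at
    # splitpoint so quad is pointed at the region containing the mass
    a = -np.inf
    if integrate_to < splitpoint:
        return integrate.quad(f, a, integrate_to)[0]
    integral_left = integrate.quad(f, a, splitpoint)[0]
    integral_right = integrate.quad(f, splitpoint, integrate_to)[0]
    return integral_left + integral_right

# a sharp peak far from the origin, which quad over an infinite
# interval can otherwise miss entirely
dist = stats.norm(500, 1)
cdf_at_600 = split_integral(dist.pdf, splitpoint=500, integrate_to=600)
```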
<h1 id="defining-support-of-posterior">Defining support of posterior</h1>
<p>So far I’ve only talked about problems that cause the program to return the wrong answer. This section is about a problem that only causes inefficiency, at least when it isn’t combined with other problems.</p>
<p>If you don’t specify the support of a continuous random variable in SciPy, it defaults to the entire real line. This leads to inefficiency when querying quantiles of the distribution. If I want to know the 50th percentile of my distribution, I call <code class="language-plaintext highlighter-rouge">ppf(0.5)</code>. As I described <a href="/sampling">previously</a>, <code class="language-plaintext highlighter-rouge">ppf</code> works by numerically solving the equation \(cdf(x)=0.5\). The <code class="language-plaintext highlighter-rouge">ppf</code> method automatically passes the support of the distribution into the equation solver and tells it to only look for solutions inside the support. When a distribution’s support is only a proper subset of the reals, searching over the entire real line is inefficient.</p>
<p>To remedy this, we can define the support of the posterior as the intersection of the prior and likelihood’s support. For this we need a small function that calculates the intersection of two intervals.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">intersect_intervals</span><span class="p">(</span><span class="n">two_tuples</span><span class="p">):</span>
<span class="n">d1</span> <span class="p">,</span> <span class="n">d2</span> <span class="o">=</span> <span class="n">two_tuples</span>
<span class="n">d1_left</span><span class="p">,</span><span class="n">d1_right</span> <span class="o">=</span> <span class="n">d1</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">d1</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">d2_left</span><span class="p">,</span><span class="n">d2_right</span> <span class="o">=</span> <span class="n">d2</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">d2</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">if</span> <span class="n">d1_right</span> <span class="o"><</span> <span class="n">d2_left</span> <span class="ow">or</span> <span class="n">d2_right</span> <span class="o"><</span> <span class="n">d1_left</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">"the distributions have no overlap"</span><span class="p">)</span>
<span class="n">intersect_left</span><span class="p">,</span><span class="n">intersect_right</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">d1_left</span><span class="p">,</span><span class="n">d2_left</span><span class="p">),</span><span class="nb">min</span><span class="p">(</span><span class="n">d1_right</span><span class="p">,</span><span class="n">d2_right</span><span class="p">)</span>
<span class="k">return</span> <span class="n">intersect_left</span><span class="p">,</span><span class="n">intersect_right</span>
</code></pre></div></div>
<p>We can then call this function:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Posterior_scipyrv</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">rv_continuous</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">d1</span><span class="p">,</span><span class="n">d2</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">Posterior_scipyrv</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
<span class="n">a1</span><span class="p">,</span> <span class="n">b1</span> <span class="o">=</span> <span class="n">d1</span><span class="p">.</span><span class="n">support</span><span class="p">()</span>
<span class="n">a2</span><span class="p">,</span> <span class="n">b2</span> <span class="o">=</span> <span class="n">d2</span><span class="p">.</span><span class="n">support</span><span class="p">()</span>
<span class="c1"># 'a' and 'b' are scipy's names for the bounds of the support
</span> <span class="bp">self</span><span class="p">.</span><span class="n">a</span> <span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">b</span> <span class="o">=</span> <span class="n">intersect_intervals</span><span class="p">([(</span><span class="n">a1</span><span class="p">,</span><span class="n">b1</span><span class="p">),(</span><span class="n">a2</span><span class="p">,</span><span class="n">b2</span><span class="p">)])</span>
</code></pre></div></div>
<p>To test this, let’s use a beta distribution, which is defined on \([0,1]\):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>prior = stats.beta(1,1)
likelihood = stats.norm(1,3)
</code></pre></div></div>
<p>We know that the posterior will also be defined on \([0,1]\). By defining the support of the posterior inside the <code class="language-plaintext highlighter-rouge">__init__</code> method of <code class="language-plaintext highlighter-rouge">Posterior_scipyrv</code>, we give SciPy access to this information.</p>
<p>We can time the resulting speedup in calculating <code class="language-plaintext highlighter-rouge">posterior.ppf(0.99)</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="s">"support:"</span><span class="p">,</span><span class="n">posterior</span><span class="p">.</span><span class="n">support</span><span class="p">())</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="s">"result:"</span><span class="p">,</span><span class="n">posterior</span><span class="p">.</span><span class="n">ppf</span><span class="p">(</span><span class="mf">0.99</span><span class="p">))</span>
<span class="n">e</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="n">e</span><span class="o">-</span><span class="n">s</span><span class="p">,</span><span class="s">'seconds to evalute ppf'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>support: (-inf, inf)
result: 0.9901821216897447
3.8804399967193604 seconds to evaluate ppf
support: (0.0, 1.0)
result: 0.9901821216904315
0.40013647079467773 seconds to evaluate ppf
</code></pre></div></div>
<p>We’re able to achieve an almost 10x speedup, with very meaningful impact on user experience. For less extreme quantiles, like <code class="language-plaintext highlighter-rouge">posterior.ppf(0.5)</code>, I still get a 2x speedup.</p>
<p>As long as we continue to use <code class="language-plaintext highlighter-rouge">split_integral</code> to calculate the cdf, a too-widely-defined support causes only inefficiency. But if we leave the cdf problem unaddressed, it can combine with the too-wide support to produce outright errors.</p>
<p>For example, suppose we use a beta distribution again for the prior, but we don’t use the split integral for the cdf, nor do we define the support of the posterior as \([0,1]\) instead of \({\rm I\!R}\).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prior</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">beta</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">likelihood</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">3</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Posterior_scipyrv</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">rv_continuous</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">d1</span><span class="p">,</span><span class="n">d2</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">Posterior_scipyrv</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">d1</span><span class="o">=</span> <span class="n">d1</span>
<span class="bp">self</span><span class="p">.</span><span class="n">d2</span><span class="o">=</span> <span class="n">d2</span>
<span class="bp">self</span><span class="p">.</span><span class="n">normalization_constant</span> <span class="o">=</span> <span class="n">integrate</span><span class="p">.</span><span class="n">quad</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">unnormalized_pdf</span><span class="p">,</span><span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">unnormalized_pdf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">d1</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">d2</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_pdf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">unnormalized_pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">normalization_constant</span>
<span class="n">posterior</span> <span class="o">=</span> <span class="n">Posterior_scipyrv</span><span class="p">(</span><span class="n">prior</span><span class="p">,</span><span class="n">likelihood</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"cdf values:"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="n">i</span><span class="o">/</span><span class="mi">5</span><span class="p">,</span><span class="n">posterior</span><span class="p">.</span><span class="n">cdf</span><span class="p">(</span><span class="n">i</span><span class="o">/</span><span class="mi">5</span><span class="p">))</span>
</code></pre></div></div>
<p>The cdf fails quickly now:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3.2 0.9999999999850296
3.4 0.0
3.6 0.0
</code></pre></div></div>
<p>When the integration algorithm is looking over all of \((-\infty,3.4]\), it has no way of knowing that all the probability mass is in \([0,1]\). The posterior distribution has only one big bump in the middle, so it’s not surprising that the algorithm misses it.</p>
<p>If we now ask the equation solver in <code class="language-plaintext highlighter-rouge">ppf</code> to find quantiles, without telling it that all the solutions are in \([0,1]\), it will try to evaluate points like <code class="language-plaintext highlighter-rouge">cdf(4)</code>, which returns 0 – but <code class="language-plaintext highlighter-rouge">ppf</code> assumes that the cdf is increasing. This leads to catastrophe. Running <code class="language-plaintext highlighter-rouge">posterior.ppf(0.5)</code> gives a <code class="language-plaintext highlighter-rouge">RuntimeError: Failed to converge after 100 iterations</code>. At first I wondered why beta distributions would always give me <code class="language-plaintext highlighter-rouge">RuntimeError</code>s…</p>
<h1 id="optimization-cdf-memoization">Optimization: CDF memoization</h1>
<p>When we call <code class="language-plaintext highlighter-rouge">ppf</code>, the equation solver calls <code class="language-plaintext highlighter-rouge">cdf</code> for the same distribution many times. This suggests we could optimize things further by storing known cdf values, and only doing the integration from the closest known value to the desired value. This will result in the same number of integration calls, but each will be over a smaller interval (except the first). This is a form of memoization.</p>
<p>We can also squeeze out some additional speedup by considering the cdf to be 1 forevermore once it reaches values close to 1.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Posterior_scipyrv</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">rv_continuous</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">_cdf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">x</span><span class="p">):</span>
<span class="c1"># exploit considering the cdf to be 1
</span> <span class="c1"># forevermore once it reaches values close to 1
</span> <span class="k">for</span> <span class="n">x_lookup</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">cdf_lookup</span><span class="p">:</span>
<span class="k">if</span> <span class="n">x_lookup</span> <span class="o"><</span> <span class="n">x</span> <span class="ow">and</span> <span class="n">np</span><span class="p">.</span><span class="n">around</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">cdf_lookup</span><span class="p">[</span><span class="n">x_lookup</span><span class="p">],</span><span class="mi">5</span><span class="p">)</span><span class="o">==</span><span class="mf">1.0</span><span class="p">:</span>
<span class="k">return</span> <span class="mi">1</span>
<span class="c1"># check lookup table for largest integral already computed below x
</span> <span class="n">sortedkeys</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">cdf_lookup</span> <span class="p">,</span><span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">sortedkeys</span><span class="p">:</span>
<span class="c1">#find the greatest key less than x
</span> <span class="k">if</span> <span class="n">key</span><span class="o"><</span><span class="n">x</span><span class="p">:</span>
<span class="n">ret</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">cdf_lookup</span><span class="p">[</span><span class="n">key</span><span class="p">]</span><span class="o">+</span><span class="n">integrate</span><span class="p">.</span><span class="n">quad</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">pdf</span><span class="p">,</span><span class="n">key</span><span class="p">,</span><span class="n">x</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">cdf_lookup</span><span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)]</span> <span class="o">=</span> <span class="n">ret</span>
<span class="k">return</span> <span class="n">ret</span>
<span class="c1"># Initial run
</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">split_integral</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">pdf</span><span class="p">,</span><span class="bp">self</span><span class="p">.</span><span class="n">splitpoint</span><span class="p">,</span><span class="n">x</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">cdf_lookup</span><span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)]</span> <span class="o">=</span> <span class="n">ret</span>
<span class="k">return</span> <span class="n">ret</span>
</code></pre></div></div>
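<p>The snippet above assumes that <code class="language-plaintext highlighter-rouge">self.cdf_lookup</code> was initialized to an empty dict (and <code class="language-plaintext highlighter-rouge">self.splitpoint</code> set) in <code class="language-plaintext highlighter-rouge">__init__</code>, which isn’t shown. The memoization idea itself can be sketched standalone; the standard normal pdf and the fixed-step Simpson integrator below are stand-ins of my own, not the post’s code:</p>

```python
import math

def pdf(x):
    # standard normal density, a stand-in for the posterior pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=200):
    # simple fixed-step Simpson rule, a stand-in for integrate.quad
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3

cdf_lookup = {}

def cdf(x, left=-10.0):
    # integrate only from the largest already-computed point below x
    below = [k for k in cdf_lookup if k < x]
    if below:
        key = max(below)
        ret = cdf_lookup[key] + simpson(pdf, key, x)
    else:
        ret = simpson(pdf, left, x)
    cdf_lookup[x] = ret
    return ret
```

<p>Calling <code class="language-plaintext highlighter-rouge">cdf(1)</code> after <code class="language-plaintext highlighter-rouge">cdf(0)</code> now only integrates over \([0,1]\).</p>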
<p>If we return to our earlier prior and likelihood</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prior</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="p">.</span><span class="mi">5</span><span class="p">,</span><span class="n">scale</span><span class="o">=</span><span class="n">math</span><span class="p">.</span><span class="n">exp</span><span class="p">(.</span><span class="mi">5</span><span class="p">))</span> <span class="c1"># a lognormal(.5,.5) in SciPy notation
</span><span class="n">likelihood</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<p>and make calls to <code class="language-plaintext highlighter-rouge">ppf([0.1, 0.9, 0.25, 0.75, 0.5])</code>, the memoization gives us about a 5x speedup:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>memoization False
[2.63571613 5.18538207 3.21825988 4.56703016 3.88645864]
length of lookup table: 0
2.1609253883361816 seconds to evaluate ppf
memoization True
[2.63571613 5.18538207 3.21825988 4.56703016 3.88645864]
length of lookup table: 50
0.4501194953918457 seconds to evaluate ppf
</code></pre></div></div>
<p>These speed gains again occur over a range that makes quite a difference to user experience: going from multiple seconds to a fraction of a second.</p>
<h1 id="optimization-ppf-with-bounds">Optimization: ppf with bounds</h1>
<p>In my <a href="https://github.com/tmkadamcz/bayes-continuous">webapp</a>, I give the user some standard percentiles: 0.1, 0.25, 0.5, 0.75, 0.9.</p>
<p>Given that <code class="language-plaintext highlighter-rouge">ppf</code> works by numerical equation solving on the cdf, if we give the solver a smaller domain in which to look for the solutions, it should find them more quickly. When we calculate multiple percentiles, each percentile we calculate helps us close in on the others. If the 0.1 percentile is 12, we have a lower bound of 12 on any percentile \(p>0.1\). If we have already calculated a percentile on each side, we have both a lower and an upper bound.</p>
<p>We can’t directly pass the bounds to <code class="language-plaintext highlighter-rouge">ppf</code>, so we have to wrap the method, which is found <a href="https://github.com/scipy/scipy/blob/4c0fd79391e3b2ec2738bf85bb5dab366dcd12e4/scipy/stats/_distn_infrastructure.py#L1681-L1699">here in the source code</a>. (To help us focus, I give a simplified presentation below that cuts out some code designed to deal with unbounded supports. The code below will not run correctly.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Posterior_scipyrv</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">rv_continuous</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">ppf_with_bounds</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">q</span><span class="p">,</span> <span class="n">leftbound</span><span class="p">,</span> <span class="n">rightbound</span><span class="p">):</span>
<span class="n">left</span><span class="p">,</span> <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_get_support</span><span class="p">()</span>
<span class="c1"># SciPy ppf code to deal with case where left or right are infinite.
</span> <span class="c1"># Omitted for simplicity.
</span>
<span class="k">if</span> <span class="n">leftbound</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">left</span> <span class="o">=</span> <span class="n">leftbound</span>
<span class="k">if</span> <span class="n">rightbound</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">right</span> <span class="o">=</span> <span class="n">rightbound</span>
<span class="c1"># brentq is the equation solver (from Brent 1973)
</span> <span class="c1"># _ppf_to_solve is simply cdf(x)-q, since brentq
</span> <span class="c1"># finds points where a function equals 0
</span> <span class="k">return</span> <span class="n">optimize</span><span class="p">.</span><span class="n">brentq</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">_ppf_to_solve</span><span class="p">,</span><span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="n">q</span><span class="p">)</span>
</code></pre></div></div>
<p>To get some bounds, we run the extreme percentiles first, narrowing in on the middle percentiles from both sides. For example in <code class="language-plaintext highlighter-rouge">0.1, 0.25, 0.5, 0.75, 0.9</code>, we want to evaluate them in this order: <code class="language-plaintext highlighter-rouge">0.1, 0.9, 0.25, 0.75, 0.5</code>. We store each of the answers in <code class="language-plaintext highlighter-rouge">result</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Posterior_scipyrv</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">rv_continuous</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">compute_percentiles</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">percentiles_list</span><span class="p">):</span>
<span class="n">result</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">percentiles_list</span><span class="p">.</span><span class="n">sort</span><span class="p">()</span>
<span class="c1"># put percentiles in the order they should be computed
</span> <span class="n">percentiles_reordered</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">percentiles_list</span><span class="p">,</span><span class="nb">reversed</span><span class="p">(</span><span class="n">percentiles_list</span><span class="p">)),</span> <span class="p">())[:</span><span class="nb">len</span><span class="p">(</span><span class="n">percentiles_list</span><span class="p">)]</span> <span class="c1"># see https://stackoverflow.com/a/17436999/8010877
</span>
<span class="k">def</span> <span class="nf">get_bounds</span><span class="p">(</span><span class="nb">dict</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span>
<span class="c1"># get bounds (if any) from already computed `result`s
</span> <span class="n">keys</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">dict</span><span class="p">.</span><span class="n">keys</span><span class="p">())</span>
<span class="n">keys</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
<span class="n">keys</span><span class="p">.</span><span class="n">sort</span><span class="p">()</span>
<span class="n">i</span> <span class="o">=</span> <span class="n">keys</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">leftbound</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">[</span><span class="n">keys</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]]</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">leftbound</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="nb">len</span><span class="p">(</span><span class="n">keys</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">rightbound</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">[</span><span class="n">keys</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]]</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">rightbound</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">return</span> <span class="n">leftbound</span><span class="p">,</span> <span class="n">rightbound</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">percentiles_reordered</span><span class="p">:</span>
<span class="n">leftbound</span> <span class="p">,</span> <span class="n">rightbound</span> <span class="o">=</span> <span class="n">get_bounds</span><span class="p">(</span><span class="n">result</span><span class="p">,</span><span class="n">p</span><span class="p">)</span>
<span class="n">res</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">ppf_with_bounds</span><span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="n">leftbound</span><span class="p">,</span><span class="n">rightbound</span><span class="p">)</span>
<span class="n">result</span><span class="p">[</span><span class="n">p</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">around</span><span class="p">(</span><span class="n">res</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span>
<span class="n">sorted_result</span> <span class="o">=</span> <span class="p">{</span><span class="n">key</span><span class="p">:</span><span class="n">value</span> <span class="k">for</span> <span class="n">key</span><span class="p">,</span><span class="n">value</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">items</span><span class="p">())}</span>
<span class="k">return</span> <span class="n">sorted_result</span>
</code></pre></div></div>
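<p>The reordering one-liner above is a bit cryptic: it interleaves the sorted list with its reverse and keeps the first half, which produces exactly the outside-in order 0.1, 0.9, 0.25, 0.75, 0.5. A quick check (my own illustration):</p>

```python
percentiles = [0.1, 0.25, 0.5, 0.75, 0.9]

# zip pairs the list with its reverse: (0.1, 0.9), (0.25, 0.75), (0.5, 0.5), ...
# sum(..., ()) flattens the pairs into one tuple; the slice keeps the first half
reordered = sum(zip(percentiles, reversed(percentiles)), ())[:len(percentiles)]
print(reordered)  # (0.1, 0.9, 0.25, 0.75, 0.5)
```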
<p>The speedup is relatively minor when calculating just 5 percentiles.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Using ppf bounds? True
total time to compute percentiles: 3.1997928619384766 seconds
Using ppf bounds? False
total time to compute percentiles: 3.306936264038086 seconds
</code></pre></div></div>
<p>It grows a little bit with the number of percentiles, but calculating a large number of percentiles would just lead to information overload for the user.</p>
<p>This was surprising to me. Using the bounds dramatically cuts the width of the interval for equation solving, but leads to only a minor speedup. Passing <code class="language-plaintext highlighter-rouge">full_output=True</code> to <code class="language-plaintext highlighter-rouge">optimize.brentq</code>, we can see the number of function evaluations that <code class="language-plaintext highlighter-rouge">brentq</code> uses: it is highly non-linear in the width of the interval. The solver gets quite close to the solution very quickly, so giving it a narrow interval hardly helps.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Using ppf bounds? True
brentq looked between 0.0 10.0 and took 11 iterations
brentq looked between 0.52 10.0 and took 13 iterations
brentq looked between 0.52 2.24 and took 8 iterations
brentq looked between 0.81 2.24 and took 9 iterations
brentq looked between 0.81 1.73 and took 7 iterations
total time to compute percentiles: 3.1997928619384766 seconds
Using ppf bounds? False
brentq looked between 0.0 10.0 and took 11 iterations
brentq looked between 0.0 10.0 and took 10 iterations
brentq looked between 0.0 10.0 and took 10 iterations
brentq looked between 0.0 10.0 and took 10 iterations
brentq looked between 0.0 10.0 and took 9 iterations
total time to compute percentiles: 3.306936264038086 seconds
</code></pre></div></div>
<p>Brent’s method is a very efficient equation solver.</p>
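<p>Even for plain bisection, the simplest solver, the iteration count grows only with the logarithm of the bracket width, so shrinking the bracket buys little. A toy illustration (mine, not from the timing runs above):</p>

```python
def bisect_count(f, lo, hi, tol=1e-12):
    # count how many halvings plain bisection needs to reach the tolerance
    count = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
        count += 1
    return count

f = lambda x: x - 0.9  # trivial increasing function with root at 0.9

wide = bisect_count(f, 0.0, 10.0)     # bracket of width 10
narrow = bisect_count(f, 0.81, 1.73)  # bracket of width 0.92: barely fewer steps
```

<p>Halving the width saves exactly one iteration, so an 11x narrower bracket saves only three or four.</p>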
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:extra" role="doc-endnote">
<p>It has a very similar shape to the likelihood (because the likelihood has much lower variance than the prior). <a href="#fnref:extra" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>How long does it take to sample from a distribution?2020-05-31T00:00:00+00:002020-05-31T00:00:00+00:00https://fragile-credences.github.io/sampling<p>Suppose a study comes out about the effect of a new medication and you want to precisely compute how to update your beliefs given this new evidence. You might use Bayes’ theorem for continuous distributions.</p>
\[p(\theta | x) =\frac{p(x | \theta) p(\theta) }{p(x)}=\frac{p(x | \theta) p(\theta) }{\int_\Theta p(x | \theta) p(\theta) d \theta}\]
<p>The normalization constant (the denominator of the formula) is an integral that is not too difficult to compute, as long as the distributions are one-dimensional.</p>
<p>For example, with:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">stats</span>
<span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">integrate</span>
<span class="n">prior</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">scale</span><span class="o">=</span><span class="n">math</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span><span class="n">s</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">likelihood</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span><span class="n">scale</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">unnormalized_posterior_pdf</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="n">prior</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="n">likelihood</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">normalization_constant</span> <span class="o">=</span> <span class="n">integrate</span><span class="p">.</span><span class="n">quad</span><span class="p">(</span>
<span class="n">unnormalized_posterior_pdf</span><span class="p">,</span><span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
</code></pre></div></div>
<p>the integration runs in less than 100 milliseconds on my machine. So we can get a PDF for an arbitrary 1-dimensional posterior very easily.</p>
<p>But taking a single sample from the (normalized) distribution takes about a second:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Normalize unnormalized_posterior_pdf
# using the method above and return the posterior as a
# scipy.stats.rv_continuous object.
# This takes about 100 ms
</span><span class="n">posterior</span> <span class="o">=</span> <span class="n">update</span><span class="p">(</span><span class="n">prior</span><span class="p">,</span><span class="n">likelihood</span><span class="p">)</span>
<span class="c1"># Take 1 random sample, this takes about 1 s
</span><span class="n">posterior</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<p>And this difference can be even starker for higher-variance posteriors (with <code class="language-plaintext highlighter-rouge">s=4</code> in the lognormal prior, I get 250 ms for the normalization constant and almost 10 seconds for 1 random sample).</p>
<p>For a generic continuous random variable, <code class="language-plaintext highlighter-rouge">rvs</code> uses <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling">inverse transform sampling</a>. It first generates a random number from the uniform distribution between 0 and 1, then passes this number to <code class="language-plaintext highlighter-rouge">ppf</code>, the percent point function (more commonly known as the quantile function) of the distribution. This function is the inverse of the CDF. For a given percentile, it tells you what value corresponds to that percentile of the distribution. Randomly selecting a percentile \(x\) and evaluating the \(x\)th percentile of the distribution is equivalent to randomly sampling from the distribution.</p>
<p>How is <code class="language-plaintext highlighter-rouge">ppf</code> evaluated? The CDF, which in general (and in fact most of the time<sup id="fnref:s" role="doc-noteref"><a href="#fn:s" class="footnote" rel="footnote">1</a></sup>) has no explicit expression at all, is inverted by numerical equation solving, also known as root finding. For example, evaluating <code class="language-plaintext highlighter-rouge">ppf(0.7)</code> is equivalent to solving <code class="language-plaintext highlighter-rouge">cdf(x)-0.7=0</code>, which can be done with numerical methods. The simplest such method is the <a href="https://en.wikipedia.org/wiki/Bisection_method">bisection algorithm</a>, but more efficient ones have been developed (<code class="language-plaintext highlighter-rouge">ppf</code> uses <a href="https://en.wikipedia.org/wiki/Brent%27s_method">Brent’s method</a>). The interesting thing for the purposes of runtime is that the root finding algorithm must repeatedly call <code class="language-plaintext highlighter-rouge">cdf</code> in order to narrow in on the solution. Each call to <code class="language-plaintext highlighter-rouge">cdf</code> means an expensive integration of the PDF.</p>
<p><img src="/assets/images/cdf-bisection.png" alt="CDF Bisection" title="CDF Bisection" /><br />
<em>The bisection algorithm to solve <code class="language-plaintext highlighter-rouge">cdf(x)-0.7=0</code></em></p>
<p>An interesting corollary is that getting one random number is just as expensive as computing a chosen percentile of the distribution using <code class="language-plaintext highlighter-rouge">ppf</code> (assuming that drawing a random number between 0 and 1 takes negligible time). For approximately the cost of 10 random numbers, you could characterize the distribution by its deciles.</p>
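<p>The whole procedure can be sketched with SciPy’s own quadrature and root-finding routines (a toy illustration, not SciPy’s actual implementation; the lognormal here stands in for an arbitrary posterior known only through its PDF):</p>

```python
import numpy as np
from scipy import stats, integrate, optimize

# Pretend we only have access to the PDF, as with a numerically
# normalized posterior (a lognormal stands in for it here).
pdf = stats.lognorm(s=1).pdf

def cdf(x):
    # Each CDF evaluation is an expensive numerical integration of the PDF
    result, _ = integrate.quad(pdf, 0, x)
    return result

def ppf(q):
    # Invert the CDF by root finding with Brent's method;
    # brentq calls cdf repeatedly to narrow in on the solution
    return optimize.brentq(lambda x: cdf(x) - q, 1e-9, 100)

def rvs(n, rng=np.random.default_rng(0)):
    # Inverse transform sampling: push uniform draws through ppf
    return np.array([ppf(u) for u in rng.uniform(size=n)])

samples = rvs(3)
# Characterizing the distribution by its deciles costs about
# the same as drawing nine random samples
deciles = [ppf(q) for q in np.arange(0.1, 1.0, 0.1)]
```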
<p>On the other hand, sampling from a distribution whose family is known (like the lognormal) is extremely fast with <code class="language-plaintext highlighter-rouge">rvs</code>. I’m getting 10,000 samples in a millisecond (<code class="language-plaintext highlighter-rouge">prior.rvs(size=10000)</code>). This is not because there exists an analytical expression for its inverse CDF, but because there are very efficient algorithms<sup id="fnref:algos" role="doc-noteref"><a href="#fn:algos" class="footnote" rel="footnote">2</a></sup> for sampling from these specific distributions<sup id="fnref:python" role="doc-noteref"><a href="#fn:python" class="footnote" rel="footnote">3</a></sup>.</p>
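<p>One way to see the gap (an illustrative sketch; the class name is made up) is to define the same lognormal as a custom <code class="language-plaintext highlighter-rouge">rv_continuous</code> with only a PDF, so that SciPy has to fall back on the generic machinery:</p>

```python
from scipy import stats

class CustomLognorm(stats.rv_continuous):
    # Defined only by its PDF: SciPy falls back to generic inverse
    # transform sampling (numerical CDF inversion) for rvs.
    def _pdf(self, x):
        return stats.lognorm.pdf(x, 1)

custom = CustomLognorm(a=0, name="custom_lognorm")

# A single draw already triggers many numerical integrations:
slow = custom.rvs(size=1)

# The built-in lognormal overrides rvs with a tailored algorithm,
# so ten thousand draws are near-instant:
fast = stats.lognorm(s=1).rvs(size=10_000)
```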
<p>So far I have only spoken about 1-dimensional distributions. The difficulty of computing the normalization constant in multiple dimensions is often given as a reason for using numerical approximation methods like Markov chain Monte Carlo (MCMC). For example, <a href="https://towardsdatascience.com/bayesian-inference-problem-mcmc-and-variational-inference-25a8aa9bce29">here</a>:</p>
<blockquote>
<p>Although in low dimension [the normalization constant] can be computed without too much difficulties, it can become intractable in higher dimensions. In this last case, the exact computation of the posterior distribution is practically infeasible and some approximation techniques have to be used […].
Among the approaches that are the most used to overcome these difficulties we find Markov Chain Monte Carlo and Variational Inference methods.</p>
</blockquote>
<p>However, the difficulty of sampling from a posterior distribution that isn’t in a familiar family could be a reason to use such techniques even in the one-dimensional case. This is true despite the fact that we can easily get an analytic expression for the PDF of the posterior.</p>
<p>For example, with the MCMC package emcee, I’m able to get 10,000 samples from the posterior in 8 seconds, less than a millisecond per sample and a 1,000x improvement over <code class="language-plaintext highlighter-rouge">rvs</code>!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ndim</span><span class="p">,</span> <span class="n">nwalkers</span><span class="p">,</span> <span class="n">nruns</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">500</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">log_prob</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="k">if</span> <span class="n">posterior</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">></span><span class="mi">0</span><span class="p">:</span>
<span class="k">return</span> <span class="n">math</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">posterior</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span>
<span class="n">sampler</span> <span class="o">=</span> <span class="n">emcee</span><span class="p">.</span><span class="n">EnsembleSampler</span><span class="p">(</span><span class="n">nwalkers</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">log_prob</span><span class="p">)</span>
<span class="n">p0</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="n">nwalkers</span><span class="p">,</span> <span class="n">ndim</span><span class="p">)</span> <span class="c1"># e.g. uniform random starting positions</span>
<span class="n">sampler</span><span class="p">.</span><span class="n">run_mcmc</span><span class="p">(</span><span class="n">p0</span><span class="p">,</span> <span class="n">nruns</span><span class="p">)</span> <span class="c1">#p0 are the starting samples
</span></code></pre></div></div>
<p>These samples are drawn from a distribution that only approximates the posterior, whereas <code class="language-plaintext highlighter-rouge">rvs</code> is as precise as SciPy’s root-finding and integration algorithms. In practice, though, running the chain for longer and checking standard convergence diagnostics can make the MCMC approximation very good.</p>
<p><a href="/assets/files/integration-sampling-runtime.py">Here</a>’s the code for running the timings on your machine.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:s" role="doc-endnote">
<p>“For a continuous distribution, however, we need to integrate the probability density function (PDF) of the distribution, which is impossible to do analytically for most distributions (including the normal distribution).” Wikipedia on <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling">Inverse transform sampling</a>. <a href="#fnref:s" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:algos" role="doc-endnote">
<p>“For the normal distribution, the lack of an analytical expression for the corresponding quantile function means that other methods (e.g. the Box–Muller transform) may be preferred computationally. It is often the case that, even for simple distributions, the inverse transform sampling method can be improved on: see, for example, the ziggurat algorithm and rejection sampling. On the other hand, it is possible to approximate the quantile function of the normal distribution extremely accurately using moderate-degree polynomials, and in fact the method of doing this is fast enough that inversion sampling is now the default method for sampling from a normal distribution in the statistical package R.” Wikipedia on <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling">Inverse transform sampling</a>. <a href="#fnref:algos" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:python" role="doc-endnote">
<p>The way it works in Python is that, in the definition of the class Lognormal (a subclass of the continuous random variable class), the generic inverse transform <code class="language-plaintext highlighter-rouge">rvs</code> method is overwritten with a more tailored sampling algorithm. SciPy will know to apply the more efficient method when <code class="language-plaintext highlighter-rouge">rvs</code> is called on an instance of class Lognormal. <a href="#fnref:python" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<h1 id="hidden-subsidies-for-cars">Hidden subsidies for cars</h1>
<p><em>2019-12-06, <a href="https://fragile-credences.github.io/cars">https://fragile-credences.github.io/cars</a></em></p>
<p>Personal vehicles are ubiquitous. They dominate cities. They are actually so entrenched that they can blend into the background, no longer rising to our attention. Having as many cars as we do can seem to be the ‘natural’ state of affairs.</p>
<p>Our level of car use could perhaps be called natural if it were the result of people’s preferences interacting in well-functioning markets. No reader of this blog, I take it, would believe such a claim. The negative externalities of cars are well-documented: pollution, congestion, noise, and so on.</p>
<p>The <em>subsidies</em> for cars are less obvious, but I think they’re also important.</p>
<p>In our relationship to cars in the urban environment, we’re almost like David Foster Wallace’s <a href="https://fs.blog/2012/04/david-foster-wallace-this-is-water/">fish</a> who asked ‘what the hell is water?’. I want to flip that perspective and point out some specific government policies that increase the number of cars in cities.</p>
<p><img src="/assets/images/evelyn-hofer-manhattan.jpg" alt="Manhattan, 1964 by Evelyn Hofer" /><br />
<em>“Manhattan, 1964” by Evelyn Hofer</em></p>
<h1 id="free-or-cheap-street-parking">Free or cheap street parking</h1>
<p>Privately provided parking in highly desirable city centres can cost hundreds of dollars a month. But the government provides car storage on the side of the street for a fraction of that, often for free.<sup id="fnref:sh" role="doc-noteref"><a href="#fn:sh" class="footnote" rel="footnote">1</a></sup></p>
<h1 id="the-width-of-roads">The width of roads</h1>
<p>Streets and sidewalks sit on large amounts of strategically placed land that is publicly owned. Most of that land is devoted to cars. On large thoroughfares, I’d guess cars take easily 70% of the space, leaving only thin slivers on each side for pedestrians.</p>
<p><a href="https://oldurbanist.blogspot.com/2011/06/density-on-ground-cities-and-building.html">This blogger</a> estimates, apparently by eyeballing Google Maps, that streets take up 43% of the land in Washington DC, 25% in Paris, and 20% in Tokyo.</p>
<p>Space that is now used for parked cars or moving cars could be used, for example, by shops and restaurants, for bikeshare stations, to plant trees, for <a href="http://pavementtoparks.org/parklets/">parklets</a>, or even to add more housing. And if there was a market for this land I’m sure people would come up with many other clever uses.</p>
<h1 id="highways">Highways</h1>
<p>Even if highways aren’t actually inside the city, they have important indirect effects on urban life. Whether the government pays for highways or train lines to connect cities to each other is a policy choice with clear effects on day to day life in the city, even for those who do not travel.</p>
<p>In the United States, this implicit subsidy for cars is large. According to the Department of Transportation, in 2018 $49 billion out of the department’s budget of $87 billion was spent on highways<sup id="fnref:dotsource" role="doc-noteref"><a href="#fn:dotsource" class="footnote" rel="footnote">2</a></sup>.</p>
<p>In this post I don’t want to get into the very complicated question of how much governments should optimally spend on highways. For all I know the U.S. policy may be optimal. My point is only that <em>any</em> government spending on highways indirectly subsidises the presence of cars in cities. This is non-obvious and worth pointing out. When the government pays for a Metro in your city, the subsidy to Metros is plain to see. Meanwhile, the subsidy to cars via a huge network of roads across the country passes unnoticed by many.</p>
<p>To be fair, in the United States federal spending on highways is largely financed by taxes on vehicle fuel. So it’s not clear whether federal highways policy is a <em>net</em> subsidy to cars. However, the way highway spending is financed varies by country. For example, <a href="https://www.loc.gov/law/help/infrastructure-funding/germany.php">in Germany</a>, “federal highways are funded by the federation through a combination of general revenue and receipts from tolls imposed on truck traffic”.</p>
<h1 id="minimum-parking-requirements">Minimum parking requirements</h1>
<p>Many zoning codes require new buildings to include some fixed number of off-street parking spaces. This isn’t as much of a problem in the European cities I’m familiar with, but in the US, parking minimums are far beyond what the market would provide, and are a significant cost to developers. One paper estimated that the cost of parking in Los Angeles increases the cost of office space by 27-67%<sup id="fnref:ref" role="doc-noteref"><a href="#fn:ref" class="footnote" rel="footnote">3</a></sup>.</p>
<h1 id="suburban-sprawl">Suburban sprawl</h1>
<p>The United States built sprawling suburbs in the postwar period. I still remember the famous aerial view of Levittown, the prototypical prefabricated suburb, from my middle school history book.</p>
<p>The growth of suburbia was aided by specific government policies that tipped the scales in favour of individual homes in the suburbs, and against apartments in cities. The growth of suburbia led to more cars in the city, because people who live in suburbs are much more likely to drive to work.</p>
<p>Devon Zuegel has an <a href="https://medium.com/by-the-bay/financing-suburbia-6076dae990f8">excellent exposition</a> of how federal mortgage insurance subsidized suburbia<sup id="fnref:devon-details" role="doc-noteref"><a href="#fn:devon-details" class="footnote" rel="footnote">4</a></sup>:</p>
<blockquote>
<p>[The federal housing administration] provides insurance on mortgages that meet certain criteria, repaying the principal to lenders if borrowers default. […] Mortgages had to meet an opinionated set of criteria to qualify for the federal insurance. […] The ideal house had “sunshine, ventilation, scenic outlook, privacy, and safety”, and “effective landscaping and gardening” added to its worth. The guide recommended that houses should be set back at least 15 feet from the road, and well-tended lawns that matched the neighbors’ yards helped the rating. […] [The FHA manual] prescribed minimum street widths and other specific measurements.</p>
</blockquote>
<p>The federal government was effectively prescribing how millions of Americans should live, down to their landscaping and gardening! I wonder if Khrushchev brought up this interesting fact about American life in his conversations with Eisenhower. ;)</p>
<h1 id="further-reading">Further reading</h1>
<ul>
<li>A <a href="https://www.vtpi.org/land.pdf">study</a> from the Canadian Victoria Transport Policy Institute, <em>Transportation Land Valuation</em></li>
<li>Anything by Donald Shoup, an economist and urban planner</li>
<li>Some cool <a href="https://oldurbanist.blogspot.com/2011/12/we-are-25-looking-at-street-area.html">colour-coded maps</a> of U.S. cities, showing the surface area devoted to surface parking, above-ground parking garages, and park space.</li>
<li>Barcelona’s <a href="https://www.google.com/search?q=barcelona+superblocks">superblocks</a></li>
</ul>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:sh" role="doc-endnote">
<p>If you want more on this topic, economist and urban planner Donald Shoup has a 733-page tome called <em><a href="https://en.wikipedia.org/wiki/The_High_Cost_of_Free_Parking">The High Cost of Free Parking</a></em>. <a href="#fnref:sh" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:dotsource" role="doc-endnote">
<p>See the supporting summary table on page 82 of <a href="https://www.transportation.gov/sites/dot.gov/files/docs/mission/budget/333126/budgethighlightsfinal040519.pdf">this document</a>. The sum of spending for the Federal Highway Administration, the Federal Motor Carrier Safety Administration, and the National Traffic Safety Administration comes to $49 billion. Thanks to Devin Jacob for the pointer. <a href="#fnref:dotsource" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:ref" role="doc-endnote">
<p><a href="http://shoup.bol.ucla.edu/Trouble.pdf">Shoup 1999</a>, <em>The trouble with minimum parking requirements</em>, in section 3.1, estimates that parking requirements in Los Angeles increase the cost of office space by 27% for aboveground parking, and 67% for underground parking. <a href="#fnref:ref" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:devon-details" role="doc-endnote">
<p>Devon wrote a two-part series: <a href="https://medium.com/by-the-bay/financing-suburbia-6076dae990f8">Part 1</a>, quoted above, deals with federal mortgage policy, and lays out a convincing case that it included large implicit subsidies. <a href="https://medium.com/by-the-bay/exempting-suburbia-13e339f4e37a">Part 2</a> is about “how suburban sprawl gets special treatment in our tax code”. It shows that owning and building homes is heavily subsidized, for example by the gargantuan mortgage interest deduction. I agree that this means people are encouraged to consume more housing, but I don’t see how it differentially encourages suburban housing. Devon quotes economist Edward Glaeser, who says that</p>
<blockquote>
<p>More than 85 percent of people in detached homes are owner-occupiers, in part because renting leads to home depreciation. More than 85 percent of people in larger buildings rent. Since ownership and structure type are closely connected, subsidizing homeownership encourages people to leave urban high-rises and move into suburban homes.</p>
</blockquote>
<p>So the key link in the argument is the connection between ownership and structure type. I’d like to see it spelled out and sourced better. Could the observed correlation just be due to a selection effect? If there’s a true causal effect, do large buildings have more renters because it’s genuinely more efficient that way, or is there some market failure that prevents people from being apartment-owners in the city? <a href="#fnref:devon-details" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>