# Philosophy success story II: the analysis of computability

December 3, 2017

a computing machine is really a logic machine. Its circuits embody the distilled insights of a remarkable collection of logicians, developed over centuries.

— Martin Davis (2000)

This is part of my series on success stories in philosophy. See this page for an explanation of the project and links to other items in the series. I also have a related discussion of conceptual analysis here.


The analysis of computability is one of the few examples in history of a non-trivial conceptual analysis that has consensus support. Some might want to quibble that computer science or mathematics rather than philosophy deserves the credit for it. I’m not interested in which artificial faction of the academy should lay claim to the spoils of war. My point is that the search for, proposal of, and eventual vindication of a formalisation of an everyday concept is philosophical in method. If you wish to conclude from this that computer scientists produce better philosophy than philosophers, so be it.

To us living today, the formal idea of an algorithm is so commonsensical that it can be hard to imagine a worldview lacking this precise concept. Yet less than a century ago, that was exactly the world that Turing, Gödel, Russell, Hilbert, and everybody else were living in.

The notion of algorithm, of course, is so general that people have been using algorithms for thousands of years. The use of tally marks to count sheep is a form of algorithm. The sieve of Eratosthenes, an algorithm for finding all prime numbers up to a given limit, was developed in ancient Greece. The success story is therefore not the development of algorithms, but the understanding and formalisation of the concept itself. This improved understanding helped dissolve some confusions.
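To make the "mindless process" character of such an ancient algorithm concrete, here is a minimal sketch of the sieve of Eratosthenes in Python. The function name `primes_up_to` is just an illustrative choice, and the small optimisation of starting each crossing-out at `p * p` is a modern convenience rather than a claim about the Greek original.

```python
def primes_up_to(limit: int) -> list[int]:
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[n] stays True until n is crossed out as a multiple of a prime.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p * p
            # (smaller multiples were already crossed out by smaller primes).
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Every step is a table lookup or a crossing-out; at no point is a leap of insight required, which is exactly the intuitive notion the later sections try to pin down.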

# The intuitive notion of effective calculability

Soare 1996:

In 1642 the French mathematician and scientist, Blaise Pascal, invented an adding machine which may be the first digital calculator.

Wikipedia:

In 1673, Gottfried Leibniz demonstrated a digital mechanical calculator, called the Stepped Reckoner. […] It could:

• add or subtract an 8-digit number to / from a 16-digit number
• multiply two 8-digit numbers to get a 16-digit result
• divide a 16-digit number by an 8-digit divisor

These primitive calculating devices show that people in the 17th century had to have some intuitive notion of “that which can be calculated by a machine” or by a “mindless process” or “without leaps of insight”. They were, at the very least, an existence proof, showing that addition and subtraction could be performed by such a process.

Wikipedia also tells us:

Before the precise definition of computable function, mathematicians often used the informal term effectively calculable to describe functions that are computable by paper-and-pencil methods.

And:

In 1935 Turing and everyone else used the term “computer” for an idealized human calculating with extra material such as pencil and paper, or a desk calculator, a meaning very different from the use of the word today.

# The Church-Turing analysis of computability

Stanford has a nice and concise explanation:

In the 1930s, well before there were computers, various mathematicians from around the world invented precise, independent definitions of what it means to be computable. Alonzo Church defined the Lambda calculus, Kurt Gödel defined Recursive functions, Stephen Kleene defined Formal systems, Markov defined what became known as Markov algorithms, Emil Post and Alan Turing defined abstract machines now known as Post machines and Turing machines.

Surprisingly, all of these models are exactly equivalent: anything computable in the lambda calculus is computable by a Turing machine and similarly for any other pairs of the above computational systems. After this was proved, Church expressed the belief that the intuitive notion of “computable in principle” is identical to the above precise notions. This belief, now called the “Church-Turing Thesis”, is uniformly accepted by mathematicians.

# Computability theory applied

I take Turing’s (and his contemporaries’) philosophical contribution to be the conceptual analysis of “computable” as “computable by a Turing machine”, i.e. the assertion of the Church-Turing Thesis. As we will often see in this series, once we have a formalism, we can go to town and start proving things left and right about the formal object. What was once a topic of speculation becomes amenable to mathematics. (For much more on this topic, see my other post on why good philosophy often looks like mathematics.) Here are some examples.

## The halting problem

Given Pascal’s and Leibniz’s machines, one might have thought it natural that any function (set $F$ of ordered pairs such that if $\langle a,b \rangle \in F$ and $\langle a,c \rangle \in F$ then $b=c$) which can be precisely specified can be computed in the intuitive sense. But Turing showed that this is not true. For example, the halting problem is not computable; and the Entscheidungsproblem (Turing’s original motivation for developing his formalism) cannot be solved.
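The core idea behind the halting result can be sketched in a few lines of Python. This is only an illustration of the diagonal trick, not Turing's actual proof, which works with machine descriptions written on a tape; the names `build_contrarian` and `always_says_loops` are hypothetical. Given any claimed halting decider, we build a program that does the opposite of whatever the decider predicts about it.

```python
from typing import Callable

Program = Callable[[], object]
Decider = Callable[[Program], bool]  # True means "this program halts"


def build_contrarian(claimed_halts: Decider) -> Program:
    """Build a program that does the opposite of what the decider predicts for it."""

    def contrarian() -> str:
        if claimed_halts(contrarian):
            while True:  # the decider said "halts", so loop forever
                pass
        return "halted"  # the decider said "loops", so halt at once

    return contrarian


def always_says_loops(program: Program) -> bool:
    """A (bad) candidate decider that predicts every program runs forever."""
    return False


contrarian = build_contrarian(always_says_loops)
prediction = always_says_loops(contrarian)
result = contrarian()
print(prediction, result)  # False halted
```

Only the "loops" branch can be demonstrated by actually running the code: a decider that answers "halts" is refuted by a program that never returns, which we can reason about but should not execute. Either way the candidate decider is wrong about at least one program, and the same construction applies to any candidate.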

## Further applications in mathematics

Here are some lists of examples of non-computable functions:

There is an analogy here, by the way, to the previous success story: many people thought it natural that any continuous function must be differentiable; the discovery of a function that is everywhere continuous and nowhere differentiable seemed problematic; and the formalisation of the concept of continuity solved the problem.

## The modern computer

The greatest practical impact of Turing’s discoveries was to lay the conceptual ground for the development of modern computers. (Wikipedia has a good summary of the history of computing.)

In his 1936 paper On Computable Numbers, with an Application to the Entscheidungsproblem, once armed with his new formalism, Turing immediately proves an interesting result: the general notion of “computable by some Turing machine” can itself be expressed in terms of Turing machines. In particular, a Universal Turing Machine is a Turing machine that can simulate an arbitrary Turing machine on arbitrary input.1

This was the first proof that there could be a “universal” programmable machine, capable of computing anything that we know how to compute, when given the recipe. Sometimes in history, as in the case of heavier-than-air flying machines, infamously pronounced impossible by Lord Kelvin, the proof is in the pudding. With the computer, the proof preceded the pudding by several decades.
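The idea of one machine that runs any other machine "when given the recipe" is easy to illustrate today. In the sketch below, the Python function `run` plays the universal role: it takes a rule table describing some Turing machine, plus an input, and simulates it. This is not Turing's own construction (his universal machine reads the rules from its own tape); the rule-table format, the state names, and the two example machines are all assumptions made for illustration.

```python
BLANK = "_"


def run(rules, tape_input, start="scan", halt="halt", max_steps=10_000):
    """Simulate the machine described by `rules` on `tape_input`.

    `rules` maps (state, symbol) to (symbol_to_write, move, next_state),
    where move is "L" or "R".
    """
    tape = dict(enumerate(tape_input))  # sparse tape: position -> symbol
    state, head, steps = start, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted (machine may not halt)")
        symbol = tape.get(head, BLANK)
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    if not tape:
        return ""
    cells = [tape.get(i, BLANK) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK)


# Machine 1: flip every bit, then halt at the first blank.
FLIP_BITS = {
    ("scan", "0"): ("1", "R", "scan"),
    ("scan", "1"): ("0", "R", "scan"),
    ("scan", BLANK): (BLANK, "R", "halt"),
}

# Machine 2: append one stroke to a unary number.
APPEND_ONE = {
    ("scan", "1"): ("1", "R", "scan"),
    ("scan", BLANK): ("1", "R", "halt"),
}

print(run(FLIP_BITS, "10110"))  # 01001
print(run(APPEND_ONE, "111"))   # 1111
```

The point of the design is that `run` never changes: a new computation only requires a new table of rules, which is data rather than hardware.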

Jack Copeland (The Essential Turing, 2004, p.21f) writes:

In the years immediately following the Second World War, the Hungarian-American logician and mathematician John von Neumann—one of the most important and influential figures of twentieth-century mathematics—made the concept of the stored-programme digital computer widely known, through his writings and his charismatic public addresses […] It was during Turing’s time at Princeton that von Neumann became familiar with the ideas in ‘On Computable Numbers’. He was to become intrigued with Turing’s concept of a universal computing machine. […] The Los Alamos physicist Stanley Frankel […] has recorded von Neumann’s view of the importance of ‘On Computable Numbers’:

I know that in or about 1943 or ’44 von Neumann was well aware of the fundamental importance of Turing’s paper of 1936 ‘On computable numbers . . .’, which describes in principle the ‘Universal Computer’ of which every modern computer […] is a realization. […] Many people have acclaimed von Neumann as the ‘father of the computer’ (in a modern sense of the term) but I am sure that he would never have made that mistake himself. He might well be called the midwife, perhaps, but he firmly emphasized to me, and to others I am sure, that the fundamental conception is owing to Turing—insofar as not anticipated by Babbage, Lovelace, and others. In my view von Neumann’s essential role was in making the world aware of these fundamental concepts introduced by Turing […].

# Epilogue: the long reach of the algorithm

The following is an example of progress in philosophy which, while quite clear-cut in my view, hasn’t achieved consensus in the discipline, so I wouldn’t count it as a success story quite yet. It also has more to do with the development of advanced computers and subsequent philosophical work than with the conceptual analysis of computability. But Turing, as the father of the algorithm, does deserve a nod of acknowledgement for it, so I included it here.

Peter Millican gives an excellent, concise summary of the point (Turing Lectures, HT16, University of Oxford):

Information processing, and informationally sensitive processes, can be understood in terms of symbolic inputs and outputs, governed by explicit and automatic processes. So information processing need not presuppose an “understanding” mind, and it therefore becomes possible in principle to have processes that involved sophisticated information processing without conscious purpose, in much the same way as Darwin brought us sophisticated adaptation without intentional design.

On the idea of natural selection as an algorithm, see Dennett.

1. The universal machine achieves this by reading both the description of the machine to be simulated and that machine’s input from its own tape. Extremely basic sketch: if $M'$ simulates $M$, $M'$ will print out, in sequence, the complete configurations that $M$ would produce. It will have a record of the last complete configuration at the right of the tape, and a record of $M$’s rules at the left of the tape. It will shuttle back and forth, checking the latest configuration from the right, then finding the rule that it matches at the left, then moving back to build the next configuration accordingly on the right. (Peter Millican, Turing Lectures, HT16, University of Oxford)

# Philosophy success story I: predicate logic

December 3, 2017

This is part of my series on success stories in philosophy. See this page for an explanation of the project and links to other items in the series.

# Background

Frege “dedicated himself to the idea of eliminating appeals to intuition in the proofs of the basic propositions of arithmetic”. For example:

A Kantian might very well simply draw a graph of a continuous function which takes values above and below the origin, and thereby ‘demonstrate’ that such a function must cross the origin. But both Bolzano and Frege saw such appeals to intuition as potentially introducing logical gaps into proofs.

In 1872, Weierstrass described a real-valued function that is continuous everywhere but differentiable nowhere. All the mathematics Weierstrass was building on had been established by using “obvious” intuitions. But now, the intuitive system so built up had led to a highly counter-intuitive result. This showed that intuitions can be an unreliable guide: by the lights of intuition, Weierstrass’s result introduced a contradiction in the system. So, Frege reasoned, we should ban intuitive proof-steps in favour of a purely formal system of proof. This formal system would (hopefully) allow us to derive the basic propositions of arithmetic. Armed with such a system, we could then simply check whether Weierstrass’s result, and others like it, hold or not.

So Frege developed predicate logic. In what follows I’ll assume familiarity with this system.

While originally developed for this mathematical purpose, predicate logic turned out to be applicable to a number of philosophical issues; this process is widely considered among the greatest success stories of modern philosophy.

# The problem of multiple generality

## How people were confused (a foray into the strange world of suppositio)

Dummett 1973:

Aristotle and the Stoics had investigated only those inferences involving essentially sentences containing not more than one expression of generality.

Aristotle’s system, which permits only four logical forms, seems comically limited1 by today’s standards, yet Kant “famously claimed, in Logic (1800), that logic was the one completed science, and that Aristotelian logic more or less included everything about logic there was to know.” (Wikipedia).

Some medieval logicians attempted to go beyond Aristotle and grappled with the problem of multiple generality. As Dummett writes (my emphasis),

Scholastic logic had wrestled with the problems posed by inferences depending on sentences involving multiple generality – the occurrence of more than one expression of generality. In order to handle such inferences, they developed ever more complex theories of different types of ‘suppositio’ (different manners in which an expression could stand for or apply to an object): but these theories, while subtle and complex, never succeeded in giving a universally applicable account.
It is necessary, if Frege is to be understood, to grasp the magnitude of the discovery of the quantifier-variable notation, as thus resolving an age-old problem the failure to solve which had blocked the progress of logic for centuries. […] for this resolved a deep problem, on the resolution of which a vast area of further progress depended, and definitively, so that today we are no longer conscious of the problem of which it was the solution as a philosophical problem at all.

Medieval philosophers got themselves into terrible depths of confusion when trying to deal with these sentences having more than one quantifier. For example, from “for each magnitude, there is a smaller magnitude”, we want to validate “each magnitude is larger than at least one magnitude” but not “there is a magnitude smaller than every magnitude”. The medievals analysed this in terms of context-dependence of the meanings of quantified terms:

The general phenomenon of a term’s having different references in different contexts was called suppositio (substitution) by medieval logicians. It describes how one has to substitute a term in a sentence based on its meaning—that is, based on the term’s referent. (Wikipedia)

The scholastics specified many different types of substitution, and which operations were legitimate for each; but they never progressed beyond a set of ham-fisted, ad-hoc rules.

To show examples, I had to go to modern commentaries of the scholastics, since the actual texts are simply impenetrable.

Swiniarski 1970. Ockham’s Theory of Personal Supposition

Broadie 1993, which is Oxford University Press’ Introduction to medieval logic:

a term covered immediately by a sign of universality, for example, by ‘all’ or ‘every’, has distributive supposition, and one covered mediately by a sign of affirmative universality has merely confused supposition. A term is mediately covered by a given sign if the term comes at the predicate end of a proposition whose subject is immediately covered by the sign. Thirdly, a term covered, whether immediately or mediately, by a sign of negation has confused distributive supposition (hereinafter just ‘distributive supposition’). Thus in the universal negative proposition ‘No man is immortal’, both the subject and the predicate have distributive supposition, and in the particular negative proposition ‘Some man is not a logician’, the predicate has distributive supposition and the subject has determinate supposition. […]

Given the syntactic rules presented earlier for determining the kind of supposition possessed by a given term, it follows that changing the position of a term in a proposition can have an effect on the truth value of that proposition. In:

(10) Every teacher has a pupil

‘pupil’ has merely confused supposition, and consequently the proposition says that this teacher has some pupil or other and that teacher has some pupil or other, and so on for every teacher. But in:

(11) A pupil every teacher has

‘pupil’ has determinate supposition, and since ‘teacher’ has distributive supposition descent must be made first under ‘pupil’ and then under ‘teacher’. Assuming there to be just two teachers and two pupils, the first stage of descent takes us to:

(12) Pupil1 every teacher has or pupil2 every teacher has.

The next stage takes us to:

(13) Pupil1 teacher1 has and pupil1 teacher2 has, or pupil2 teacher1 has and pupil2 teacher2 has.

(13) implies that some one pupil is shared by all the teachers, and that is plainly not implied by (10), though it does imply (10).

In all this talk of supposition, we can discern a flailing attempt to deal with ambiguities of quantifier scope, but these solutions are, of course, hopelessly ad hoc. Not to mention that the rules of supposition were in flux, and their precise content is still a matter of debate among specialists of Scholasticism2.

And now just for fun, a representative passage from Ockham:

And therefore the following rule can be given concerning negations of this kind : Negation which mobilizes what is immobile, immobilizes what is mobile, that is, when such a negation precedes a term which supposits determinately it causes the term to supposit in a distributively confused manner, and when it precedes a term suppositing in a distributively confused manner it causes the term to supposit determinately.

According to one commentator (Swiniarski 1970), in this passage “Ockham formulates a De Morgan-like rule concerning the influence of negations which negate an entire proposition and not just one of the terms.” I’ll let you be the judge. For at this point it is I who supposit in a distributively confused manner.

## How predicate logic dissolved the confusion

The solution is now familiar to anyone who has studied logic. Wikipedia gives a simple example:

Using modern predicate calculus, we quickly discover that the statement is ambiguous. “Some cat is feared by every mouse” could mean

• For every mouse m, there exists a cat c, such that c is feared by m, i.e. $\forall m (M(m) \rightarrow \exists c (C(c) \land F(m,c)))$

But it could also mean

• there exists one cat c, such that for every mouse m, c is feared by m, i.e. $\exists c (C(c) \land \forall m (M(m) \rightarrow F(m,c)))$.

Of course, this is only the simplest case. Predicate logic allows arbitrarily deep nesting of quantifiers, helping us understand sentences which the scholastics could not even have made intuitive sense of, let alone provide a formal semantics for.
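The two readings come apart as soon as we evaluate them on a small model. Here is a minimal sketch in Python, where `all` and `any` stand in for $\forall$ and $\exists$ over finite domains. The mice, cats, and the `fears` relation are made up for illustration, and restricting the quantifiers to the sets `mice` and `cats` takes the place of the predicates $M$ and $C$.

```python
mice = {"jerry", "stuart"}
cats = {"tom", "felix"}
# (m, c) in fears means: mouse m fears cat c.
fears = {("jerry", "tom"), ("stuart", "felix")}


def every_mouse_fears_some_cat(mice, cats, fears):
    # Reading 1: for every mouse m there exists a cat c with F(m, c).
    return all(any((m, c) in fears for c in cats) for m in mice)


def some_cat_feared_by_every_mouse(mice, cats, fears):
    # Reading 2: there exists a cat c such that for every mouse m, F(m, c).
    return any(all((m, c) in fears for m in mice) for c in cats)


print(every_mouse_fears_some_cat(mice, cats, fears))      # True
print(some_cat_feared_by_every_mouse(mice, cats, fears))  # False

# Once both mice fear the same cat, the stronger reading becomes true too.
fears_shared = fears | {("stuart", "tom")}
print(some_cat_feared_by_every_mouse(mice, cats, fears_shared))  # True
```

The only difference between the two functions is the order in which `all` and `any` are nested, which is precisely the difference of quantifier scope.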

# Definite descriptions

## How people were confused

The problem here is with sentences like “Unicorns have horns” which appear to refer to non-existent objects. People were quite confused about them:

Meinong, an Austrian philosopher active at the turn of the 20th century, believed that since non-existent things could apparently be referred to, they must have some sort of being, which he termed sosein (“being so”). A unicorn and a pegasus are both non-being; yet it’s true that unicorns have horns and pegasi have wings. Thus non-existent things like unicorns, square circles, and golden mountains can have different properties, and must have a ‘being such-and-such’ even though they lack ‘being’ proper. The strangeness of such entities led to this ontological realm being referred to as “Meinong’s jungle”. (Wikipedia)

The delightfully detailed Stanford page on Meinong provides further illustration:

Meinong tries to give a rational account of the seemingly paradoxical sentence “There are objects of which it is true that there are no such objects” by referring to two closely related principles: (1) the “principle of the independence of so-being from being” [“Prinzip der Unabhängigkeit des Soseins vom Sein”], and (2) the “principle of the indifference of the pure object to being” (“principle of the outside-being of the pure object” [“Satz vom Außersein des reinen Gegenstandes”]) (1904b, §3–4). […]

Meinong repeatedly ponders the question of whether outside-being is a further mode of being or just a lack of being (1904b, §4; 1910, §12; 1917, §2; 1978, 153–4, 261, 358–9, 377). He finally interprets outside-being as a borderline case of a kind of being. Every object is prior to its apprehension, i.e., objects are pre-given [vorgegeben] to the mind, and this pre-givenness is due to the (ontological) status of outside-being. If so, the most general determination of so-being is being an object, and the most general determination of being is outside-being. The concept of an object cannot be defined in terms of a qualified genus and differentia. It does not have a negative counterpart, and correlatively outside-being does not seem to have a negation either (1921, Section 2 B, 102–7).

In fact, as John P. Burgess writes:

as Scott Soames reveals, in his Philosophical Analysis in the Twentieth Century, volume I: The Dawn of Analysis, Russell himself had briefly held a similar view [to Meinong’s]. It was through the development of his theory of descriptions that Russell was able to free himself from anything like commitment to Meinongian “objects.”

## How predicate logic dissolved the confusion

Russell’s On denoting, as the most famous case of a solved philosophical problem, needs no introduction. (Wikipedia gives a good treatment, and so does Sider’s Logic for Philosophy, section 5.3.3.)

Russell’s analysis of definite descriptions could have stood on its own as a success story. The tools of predicate logic were not, strictly speaking, necessary to discover the two possible interpretations of empty definite descriptions. In fact it may seem surprising that no-one made this discovery earlier. But as literate people of the 21st century, it can be hard for us to imagine the intellectual poverty of a world without predicate logic. So we must not be too haughty. The most likely conclusion, it seems to me, is that Russell’s insight was, in fact, very difficult to achieve without the precision afforded by Frege’s logic.
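For readers who want the formulas on the page, here is the standard textbook rendering of Russell's example “The present King of France is bald”. The predicate letters $K$ (“is a present King of France”) and $B$ (“is bald”) are just labels chosen for this illustration. The sentence is analysed as asserting existence, uniqueness, and baldness all at once:

$$\exists x \, \big( K(x) \land \forall y \, (K(y) \rightarrow y = x) \land B(x) \big)$$

Since nothing satisfies $K$, this is simply false, with no need for a non-existent King to be its subject. The negated sentence “The present King of France is not bald” then has two readings, depending on the scope of the negation:

$$\exists x \, \big( K(x) \land \forall y \, (K(y) \rightarrow y = x) \land \lnot B(x) \big) \qquad \text{(narrow scope: false)}$$

$$\lnot \exists x \, \big( K(x) \land \forall y \, (K(y) \rightarrow y = x) \land B(x) \big) \qquad \text{(wide scope: true)}$$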

# The epsilon-delta definition of a limit

## How people were confused

As Wikipedia writes:

The need for the concept of a limit came into force in the 17th century when Pierre de Fermat attempted to find the slope of the tangent line at a point $x$ of a function such as $f(x)=x^{2}$. Using a non-zero, but almost zero quantity, $E$, Fermat performed the following calculation:

$$\text{slope} = \frac{f(x+E)-f(x)}{E} = \frac{x^{2}+2xE+E^{2}-x^{2}}{E} = 2x+E = 2x$$

The key to the above calculation is that since $E$ is non-zero one can divide $f(x+E)-f(x)$ by $E$, but since $E$ is close to $0$, $2x+E$ is essentially $2x$. Quantities such as $E$ are called infinitesimals. The problem with this calculation is that mathematicians of the era were unable to rigorously define a quantity with properties of $E$ although it was common practice to ‘neglect’ higher power infinitesimals and this seemed to yield correct results.

SEP states:

Infinitesimals, differentials, evanescent quantities and the like coursed through the veins of the calculus throughout the 18th century. Although nebulous—even logically suspect—these concepts provided, faute de mieux, the tools for deriving the great wealth of results the calculus had made possible. And while, with the notable exception of Euler, many 18th century mathematicians were ill-at-ease with the infinitesimal, they would not risk killing the goose laying such a wealth of golden mathematical eggs. Accordingly they refrained, in the main, from destructive criticism of the ideas underlying the calculus. Philosophers, however, were not fettered by such constraints. […]

Berkeley’s arguments are directed chiefly against the Newtonian fluxional calculus. Typical of his objections is that in attempting to avoid infinitesimals by the employment of such devices as evanescent quantities and prime and ultimate ratios Newton has in fact violated the law of noncontradiction by first subjecting a quantity to an increment and then setting the increment to 0, that is, denying that an increment had ever been present. As for fluxions and evanescent increments themselves, Berkeley has this to say:

And what are these fluxions? The velocities of evanescent increments? And what are these same evanescent increments? They are neither finite quantities nor quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?

Kline 1972 also tells us:

Up to about 1650 no one believed that the length of a curve could equal exactly the length of a line. In fact, in the second book of La Geometrie, Descartes says the relation between curved lines and straight lines is not nor ever can be known.

## How predicate logic dissolved the confusion

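Simply let $f$ be a real-valued function defined on $\mathbb{R}$. Let $c$ and $L$ be real numbers. We can rigorously define a limit as:

$$\lim_{x \to c} f(x) = L \iff \forall \varepsilon > 0 \; \exists \delta > 0 \; \forall x \, \big( 0 < |x - c| < \delta \rightarrow |f(x) - L| < \varepsilon \big)$$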

From this it’s easy to define the slope as the limit of a rate of increase, to define continuity, and so on.

Note that there are two nested quantifiers here, and an implication sign. When we remind ourselves how much confusion just one nested quantifier caused ante-Frege, it’s not surprising that this new definition was not discovered prior to the advent of predicate logic.
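The quantifier structure of the definition reads like a game: a challenger picks $\varepsilon$, and we must answer with a $\delta$ that works for every $x$ within $\delta$ of $c$. Here is a minimal numerical sketch for Fermat's example $f(x) = x^{2}$. The particular answer $\delta = \min(1, \varepsilon / (2|c| + 1))$ is one convenient choice (it works because $|x^{2} - c^{2}| = |x - c|\,|x + c| < \delta\,(2|c| + 1)$ when $\delta \le 1$), and the function names are hypothetical. Sampling can only spot-check the “for all $x$” clause, not prove it.

```python
def delta_for_square(c: float, eps: float) -> float:
    """One delta that answers eps for f(x) = x**2 at the point c."""
    return min(1.0, eps / (2 * abs(c) + 1))


def spot_check(c: float, eps: float, samples: int = 10_000) -> bool:
    """Sample points with 0 < |x - c| < delta and test |f(x) - L| < eps."""
    def f(x: float) -> float:
        return x * x

    limit = f(c)
    delta = delta_for_square(c, eps)
    for i in range(1, samples):
        offset = delta * i / samples  # strictly between 0 and delta
        for x in (c - offset, c + offset):
            if not abs(f(x) - limit) < eps:
                return False
    return True


print(spot_check(3.0, 0.01))   # True
print(spot_check(-2.0, 1e-6))  # True
```

Notice that $\delta$ is computed from $\varepsilon$ and $c$ before any $x$ is chosen, mirroring the order $\forall \varepsilon \, \exists \delta \, \forall x$.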

# On the connection between the analysis of definite descriptions and that of limit

John P. Burgess, in The Princeton companion to mathematics, elaborates on the conceptual link between these two success stories:

[Definite descriptions] illustrate in miniature two lessons: first, that the logical form of a statement may differ significantly from its grammatical form, and that recognition of this difference may be the key to solving or dissolving a philosophical problem; second, that the correct logical analysis of a word or phrase may involve an explanation not of what that word or phrase taken by itself means, but rather of what whole sentences containing the word or phrase mean. Such an explanation is what is meant by a contextual definition: a definition that does not provide an analysis of the word or phrase standing alone, but rather provides an analysis of contexts in which it appears.

In the course of the nineteenth-century rigorization, the infinitesimals were banished: what was provided was not a direct explanation of the meaning of $df (x)$ or $dx$, taken separately, but rather an explanation of the meaning of contexts containing such expressions, taken as wholes. The apparent form of $df (x)/dx$ as a quotient of infinitesimals $df (x)$ and $dx$ was explained away, the true form being $(d/dx)f (x)$, indicating the application of an operation of differentiation $d/dx$ applied to a function $f (x)$.

# “That depends what the meaning of ‘is’ is”

Bill Clinton’s quote has become infamous, but he’s got a point. There are at least four meanings of ‘is’. They can be clearly distinguished using predicate logic.

Hintikka 2004:

Perhaps the most conspicuous feature that distinguishes our contemporary « modern » logic created by Frege, Peirce, Russell and Hilbert from its predecessors is the assumption that verbs for being are ambiguous between the is of predication (the copula), the is of existence, the is of identity, and the is of subsumption. This assumption will be called the Frege-Russell ambiguity thesis. This ambiguity thesis is built into the usual notation of first-order logic and more generally into the usual notation for quantifiers of any order, in that the allegedly different meanings of verbs like « is » are expressed differently in it. The is of existence is expressed by the existential quantifier $(\exists x)$, the is of predication by juxtaposition (or, more accurately speaking, by a singular term’s filling the argument slot of a predicative expression), the is of identity by $=$, and the is of subsumption by a general conditional.
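To see the four notations side by side, here is one stock example of each. The example sentences and the predicate letters are illustrative choices, not taken from Hintikka:

$$\begin{aligned}
&\text{Predication:} && \text{“Socrates is wise”} && W(s) \\
&\text{Existence:} && \text{“There is a unicorn”} && \exists x \, U(x) \\
&\text{Identity:} && \text{“Hesperus is Phosphorus”} && h = p \\
&\text{Subsumption:} && \text{“A whale is a mammal”} && \forall x \, (\mathit{Whale}(x) \rightarrow \mathit{Mammal}(x))
\end{aligned}$$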

1. Not to mention arbitrary in its limitations.

2. For instance, Parsons (1997), writes: “On the usual interpretation, there was an account of quantifiers in the early medieval period which was obscure; it was “cleaned up” by fourteenth century theorists by being defined in terms of ascent and descent. I am suggesting that the cleaning up resulted in a totally new theory. But this is not compelling if the obscurity of the earlier view prevents us from making any sense of it at all. In the Appendix, I clarify how I am reading the earlier accounts. They are obscure, but I think they can be read so as to make good sense. These same issues arise in interpreting the infamous nineteenth century doctrine of distribution; I touch briefly on this.”

# Philosophy success stories

December 3, 2017

Philosophical problems are never solved for the same reason that treasonous conspiracies never succeed: as successful conspiracies are never called “treason,” so solved problems are no longer called “philosophy.”

— John P. Burgess


In this new series of essays, I aim to collect some concrete examples of success stories of philosophy (more below on quite what I mean by that). This is the introductory chapter in the series, where I describe why and how I embarked on this project.

Most academic disciplines love to dwell on their achievements. Economists will not hesitate to tell you that the welfare theorems, or the understanding of comparative advantage, were amazing achievements. (In Economics rules Dani Rodrik explicitly talks about the “crown jewels” of the discipline). Biology has the Nobel Prize to celebrate its prowess, and all textbooks duly genuflect to Watson and Crick and other heroes. Physics and Mathematics are so successful that they needn’t brag for their breakthroughs to be widely admired. Psychologists celebrate Kahneman, linguists Chomsky.

Philosophy, on the other hand, like a persecuted child that begins to internalise its bullies’ taunts, has developed an unfortunate inferiority complex. As if to pre-empt those of the ilk of Stephen Hawking, who infamously pronounced philosophy dead, philosophers are often the first to say that their discipline has made no progress in 3000 years. Russell himself said in The Problems of Philosophy:

Philosophy is to be studied not for the sake of any definite answers to its questions, since no definite answers can, as a rule, be known to be true, but rather for the sake of the questions themselves.

This view is very much alive today, as in Van Inwagen (2003):

Disagreement in philosophy is pervasive and irresoluble. There is almost no thesis in philosophy about which philosophers agree.

Among some writers, one even finds a sort of perverse pride that some topic is “one of philosophy’s oldest questions” and “has been discussed by great thinkers for 2000 years”, as if this were a point in its favour.

# The consequences of defeatism

This state of affairs would be of no great concern if the stakes were those of a mere academic pissing contest. But this defeatism about progress has real consequences about how the discipline is taught.

The first is history-worship. A well-educated teenager born this century would not commit the fallacies that litter the writings of the greats. The first sentence of Nicomachean Ethics is a basic quantificational fallacy. Kant’s response to the case of the inquiring murderer is an outrageous howler. Yet philosophy has a bizarre obsession with its past. In order to teach pre-modern texts with a straight face, philosophers are forced to stretch the principle of charity beyond recognition, and to retrofit newer arguments onto the fallacies of old. As Dustin Locke writes here, “The principle of charity has created the impression that there is no progress in philosophy by preserving what appear to be the arguments and theories of the great thinkers in history. However, what are being preserved are often clearly not the actual positions of those thinkers. Rather, they are mutated, anachronistic, and frankensteinian reconstructions of those positions.” Much time is wasted subjecting students to this sordid game, and many, I’m sure, turn their backs on philosophy as a result.

The second, related consequence is the absence of textbooks. No one would dream of teaching classical mechanics out of Principia or geometry out of Euclid’s Elements. Yet this is what philosophy departments do. Even Oxford’s Knowledge and Reality, which is comparatively forward-looking, has students read from original academic papers, some as old as the 1950s, as you can see here. It’s just silly to learn about counterfactuals and causation from Lewis 1973 (forty-four years ago!). Thankfully, there is the Stanford Encyclopedia, but it’s incomplete and often pitched at too high a level for beginners. And even if Stanford can be counted as a sort of textbook, why just one? There should be hundreds of textbooks, all competing for attention by the clarity and precision of their explanations. That’s what happens for any scientific topic taught at the undergraduate level.

# My approach

## Identifiable successes

In this series, I want to focus on success stories that are as atomic, clear-cut, and precise as possible. In the words of Russell:

Modern analytical empiricism […] differs from that of Locke, Berkeley, and Hume by its incorporation of mathematics and its development of a powerful logical technique. It is thus able, in regard to certain problems, to achieve definite answers, which have the quality of science rather than of philosophy. It has the advantage, in comparison with the philosophies of the system-builders, of being able to tackle its problems one at a time, instead of having to invent at one stroke a block theory of the whole universe. Its methods, in this respect, resemble those of science.

Some of the greatest philosophical developments of the modern era, both intellectually speaking and social-impact wise, were not of this clear-cut kind. Two examples seem particularly momentous:

• The triumph of naturalism, the defeat of theism, and the rise of science a.k.a “natural philosophy”.
• The expanding circle of moral consideration: to women, children, those of other races, and, to some extent, to non-human animals. (See Pinker for an extended discussion).

These changes are difficult to pin down to a specific success story. They are cases of society’s worldview shifting wholesale, over the course of centuries. With works such as Novum Organum or On the Subjection of Women, philosophising per se undoubtedly deserves a share of the credit. Yet the causality may also run the other way, from societal circumstances to ideas; technological and political developments surely had their role to play, too.

Instead I want to focus on smaller, but still significant, success stories, whose causal story should be easier to extricate.

## From confusion to consensus

The successes need to be actual successes of the discipline, not just theories I think are successful. For example, consequentialism or eliminativism about causation don’t count, since there is considerable debate about them still1. Philosophers being a contrarian bunch, I won’t require complete unanimity either, but rather a wide consensus, perhaps something like over 80% agreement among academics at analytic departments.

Relatedly, there needs to have been actual debate and/or confusion about the topic, previous to the success story. This is often the hardest desideratum to intuitively accept, since philosophical problems, once solved, tend to seem puzzlingly unproblematic. We think “How could people possibly have been confused by that?”, and we are hesitant to attribute basic misunderstandings to great thinkers of the past. I will therefore take pains to demonstrate, with detailed quotes, how each problem used to cause real confusion.

## No mere disproofs

In order to make the cases I present as strong as possible, I will adopt a narrow definition of success. Merely showing the fallacies of past thinkers does not count. Philosophy has often been able to conclusively restrict the space of possible answers by identifying certain positions as clearly wrong. For example, no-one accepts Mill’s “proof” of utilitarianism as stated, or Anselm’s ontological argument. And that is surely a kind of progress2, but I don’t want to rely on that here. When physics solved classical mechanics, it did not just point out that Aristotle had been wrong, rather it identified an extremely small area of possibility-space as the correct one. That is the level of success we want to be gunning for here. For the same reason, I also won’t count coming up with new problems, such as Goodman’s New Riddle of Induction, as progress for my purposes.

# Successes: my list so far

Here are the individual success stories, in no particular order:

1. Predicate logic: arguably launched analytic philosophy, clarified ambiguities that had held back logic for centuries
2. Computability: a rare example of an undisputed, non-trivial conceptual analysis
3. Modal logic and its possible world semantics: fully clarified the distinction between sense and reference, dissolved long-standing debates arising from modal fallacies.
4. The formalisation of probability: how should we reason about unsure things? Before the 1650s, everyone from Plato onwards got this wrong.
5. Bayesianism: the analysis of epistemic rationality and the solution to (most of) philosophy of science.
6. Compatibilism about free will (forthcoming)

It’s very important to see these stories as illustrations of what success looks like in philosophy. The list is not meant to be exhaustive. Nor are all the stories supposed to follow the same pattern of discovery; on the contrary, they are examples of different kinds of progress.

# Related posts

These posts don’t describe success stories, but are related:

1. Over the course of writing this series, I have frequently found to my consternation that topics I thought were prime candidates for success stories were in fact still being debated copiously. Perhaps one day I’ll publish a list of these, too. In case it wasn’t clear, by the way, this series should not be taken to mean that I am a huge fan of philosophy as an academic discipline. But I do think that, in some circles, the pendulum has swung too far towards dismissal of philosophy’s achievements.

2. In fact, there’s likely been far more of this kind of progress than you would guess from reading contemporary commentaries of philosophers of centuries past, as Dustin Locke argues here.

# Modesty and diversity: a concrete suggestion

November 8, 2017

In online discussions, the number of upvotes or likes a contribution receives is often highly correlated with the social status of the author within that community. This makes the community less epistemically diverse, and can contribute to feelings of groupthink or hero worship.

Yet both the author of a contribution and its degree of support contain Bayesian evidence about its value. If the author is a widely respected expert, the amount of evidence is arguably so large that it should overwhelm your own inside view.

We want each individual to invest the socially optimal amount of resources into critically evaluating other people’s writing (which is higher than the amount that would be optimal for individual epistemic rationality). Yet each of us also wants to give sufficient weight to authority in forming our all-things-considered views.

As Greg Lewis writes:

The distinction between ‘credence by my lights’ versus ‘credence all things considered’ allows the best of both worlds. One can say ‘by my lights, P’s credence is X’ yet at the same time ‘all things considered though, I take P’s credence to be Y’. One can form one’s own model of P, think the experts are wrong about P, and marshall evidence and arguments for why you are right and they are wrong; yet soberly realise that the chances are you are more likely mistaken; yet also think this effort is nonetheless valuable because even if one is most likely heading down a dead-end, the corporate efforts of people like you promises a good chance of someone finding a better path.

Full blinding to usernames and upvote counts is great for critical thinking. If all you see is the object level, you can’t be biased by anything else. The downside is you lose a lot of relevant information. A second downside is that anonymity reduces the selfish incentives to produce good content (we socially reward high-quality, civil discussion, and punish rudeness.)

I have a suggestion for capturing (some of) the best of both worlds:

• first, do all your reading, thinking, upvoting and commenting with full blinding
• once you have finished, un-blind yourself and use the new information to:
    • form your all-things-considered view of the topic at hand
    • update your opinion of the people involved in the discussion (for example, if someone was a jerk, you lower your opinion of them).

To enable this, there are now two user scripts which hide usernames and upvote counts on (1) the EA forum and (2) LessWrong 2.0. You’ll need to install the Stylish browser extension to use them.

November 8, 2017

# A tension between Bayesianism and intuition

When considering arguments from authority, there would appear to be a tension between widely shared intuitions about these arguments, and how Bayesianism treats them. Under the Bayesian definition of evidence, the opinion of experts, of people with good track records, even of individuals with a high IQ, is just another source of data. Provided the evidence is equally strong, there is nothing to distinguish it from other forms of inference such as carefully gathering data, conducting experiments, and checking proofs.

Yet we feel that there would be something wrong about someone who entirely gave up on learning and thinking, in favour of the far more efficient method of unquestioningly adopting all expert views. Personally, I still feel embarrassed when, in conversation, I am forced to say “I believe X because Very Smart Person Y said it”.

And it’s not just that we think it unvirtuous. We strongly associate arguments from authority with irrationality. Scholastic philosophy went down a blind alley by worshipping the authority of Aristotle. We think there is something epistemically superior about thinking for yourself, enough to justify the effort, at least sometimes.1

# Attempting to reconcile the tension

## Argument screens off authority

Eliezer Yudkowsky has an excellent post, “Argument screens off authority”, about this issue. You should read it to understand the rest of my post, which will be an extension of it.

I’ll give you the beginning of the post:

Scenario 1: Barry is a famous geologist. Charles is a fourteen-year-old juvenile delinquent with a long arrest record and occasional psychotic episodes. Barry flatly asserts to Arthur some counterintuitive statement about rocks, and Arthur judges it 90% probable. Then Charles makes an equally counterintuitive flat assertion about rocks, and Arthur judges it 10% probable. Clearly, Arthur is taking the speaker’s authority into account in deciding whether to believe the speaker’s assertions.

Scenario 2: David makes a counterintuitive statement about physics and gives Arthur a detailed explanation of the arguments, including references. Ernie makes an equally counterintuitive statement, but gives an unconvincing argument involving several leaps of faith. Both David and Ernie assert that this is the best explanation they can possibly give (to anyone, not just Arthur). Arthur assigns 90% probability to David’s statement after hearing his explanation, but assigns a 10% probability to Ernie’s statement. Read more

I think Yudkowsky’s post gets things conceptually right, but ignores the important pragmatic benefits of arguments from authority. At the end of the post, he writes:

In practice you can never completely eliminate reliance on authority. Good authorities are more likely to know about any counterevidence that exists and should be taken into account; a lesser authority is less likely to know this, which makes their arguments less reliable. This is not a factor you can eliminate merely by hearing the evidence they did take into account.

It’s also very hard to reduce arguments to pure math; and otherwise, judging the strength of an inferential step may rely on intuitions you can’t duplicate without the same thirty years of experience.

And elsewhere:

Just as you can’t always experiment today, you can’t always check the calculations today. Sometimes you don’t know enough background material, sometimes there’s private information, sometimes there just isn’t time. There’s a sadly large number of times when it’s worthwhile to judge the speaker’s rationality. You should always do it with a hollow feeling in your heart, though, a sense that something’s missing.

These two quotes, I think, overstate how often checking for yourself2 is a worthwhile option, and correspondingly underjustify the claim that you should have a “hollow feeling in your heart” when you rely on authority.

## Ain’t nobody got time for arguments

Suppose you were trying to decide which diet is best for your long-term health. The majority of experts believe that the Paleo diet is better than the Neo diet. To simplify, we can assume that either Paleo provides $V$ units more utility than Neo, or vice versa. The cost of research is $C$. If you conduct research, you act according to your conclusions; otherwise, you do what the experts recommend. We can calculate the expected value of research using this value-of-information diagram:

$EV(research)$ simplifies to $Vpq-Vkp+Vk-C$.

Suppose that:

• the probability that the experts are correct is $p = 0.75$
• conditional on the experts being correct, your probability of getting the right answer is $q = 0.9$
• conditional on the experts being incorrect, your probability of correctly overturning the expert view is $k = 0.5$

How long would it take to do this research? For a 50% chance of overturning the consensus, conditional on it being wrong, a realistic estimate might be several years to acquire PhD-level knowledge of the field. But let’s go with one month, as a lower bound. We can conservatively estimate that to be worth \$5000. Then you should do the research if and only if $V > 80,000$. That number is high. This suggests it would likely be instrumentally rational to just believe the experts.

Of course, this is just one toy example with very questionable numbers. (In a nascent field, such as wild animal suffering research, the “experts” may be people who know little more than you. Then $p$ could be low and $k$ could be higher.) I invite you to try your own parameter estimates.
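For readers who want to try their own numbers, here is a minimal Python sketch of the calculation. It rests on assumptions that the text does not spell out: choosing the better diet is worth $V$ and the worse one 0, so simply deferring to the experts has expected value $pV$, and research is worthwhile when $EV(research)$ exceeds that. The function names are hypothetical. Under this reading the break-even value for the parameters above comes out nearer 100,000 than the 80,000 quoted above, which only strengthens the conclusion; the gap may reflect payoff details in the diagram that the text does not state.

```python
def ev_research(V, C, p, q, k):
    """Expected value of researching, then acting on your own conclusion.

    Assumes the better diet is worth V and the worse one 0.
    Expands to V*p*q - V*k*p + V*k - C, the expression given in the text.
    """
    # You end up with the better diet if the experts are right and you
    # confirm them (p * q), or if they are wrong and you overturn them
    # ((1 - p) * k). Either way you pay the research cost C.
    return V * (p * q + (1 - p) * k) - C


def ev_defer(V, p):
    """Expected value of simply following the experts (assumed baseline)."""
    return V * p


def break_even_value(C, p, q, k):
    """Smallest V at which research beats deference, or None if it never does."""
    gain_per_unit_of_V = p * q + (1 - p) * k - p
    if gain_per_unit_of_V <= 0:
        return None  # research lowers your chance of choosing correctly
    return C / gain_per_unit_of_V


# Parameters from the toy example in the text.
print(break_even_value(C=5000, p=0.75, q=0.9, k=0.5))
```

Lowering $p$ or raising $k$, as in the nascent-field case, shrinks the break-even value quickly, because the denominator is the small difference between two probabilities of choosing correctly.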

There are also a number of complications not captured in this model:

• If the relevant belief is located in a dense part of your belief-network, where it is connected to many other beliefs, adopting the views of experts on individual questions might leave you with inconsistent beliefs. But this problem can be avoided by choosing belief-nodes that are relatively isolated, and by adopting entire world-views of experts, composed of many linked beliefs.
• In reality, you don’t just have a point probability for the parameters $p$, $q$, $k$, but a probability distribution. That distribution may be very non-robust or, in other words, “flat”. Doing a little bit of research could help you learn more about whether experts are likely to be correct, tightening the distribution.
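The second complication can be explored by replacing the point estimates with draws from a distribution. The sketch below assumes, purely for illustration, Beta distributions centred on the point estimates used earlier; the distribution shapes, the function names, and the payoff reading (better diet worth $V$, worse one 0, deference worth $pV$) are all assumptions rather than part of the original model.

```python
import random


def research_gain(p, q, k):
    """Increase in the probability of choosing correctly due to research."""
    return p * q + (1 - p) * k - p


def share_of_draws_favouring_research(V, C, n_draws=10_000, seed=0):
    """Fraction of sampled (p, q, k) triples for which research pays off."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    favourable = 0
    for _ in range(n_draws):
        # Illustrative Beta distributions with means 0.75, 0.9 and 0.5.
        p = rng.betavariate(7.5, 2.5)
        q = rng.betavariate(9, 1)
        k = rng.betavariate(5, 5)
        if V * research_gain(p, q, k) > C:
            favourable += 1
    return favourable / n_draws


print(share_of_draws_favouring_research(V=100_000, C=5000))
```

Flatter distributions (smaller Beta parameters with the same means) spread the draws more widely, which is one way to see how much a little preliminary research that tightens them could matter.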

Still, I would claim that the model is not sufficiently wrong to reverse my main conclusion.

At least given numbers I find intuitive, this model suggests it’s almost never worth thinking independently instead of acting on the views of the best authorities. Perhaps thinking critically should leave me with a hollow feeling in my heart, the feeling of goals ill-pursued? Argument may screen off authority, but in the real world, ain’t nobody got time for arguments. More work needs to be done if we want to salvage our anti-authority intuitions in a Bayesian framework.

## Free-riding on authority?

Here’s one attempt to do so. From a selfish individual’s point of view, $V$ is small. But not so for a group.

Assuming that others can see when you pay the cost to acquire evidence, they come to see you as an authority, to some degree. Every member of the group thus updates their beliefs slightly based on your research, in expectation moving towards the truth.

More importantly, the value of the four outcomes from the diagram above can differ drastically under this model. In particular, the value of correctly overturning the expert consensus can be tremendous. If you publish your reasoning, the experts who can understand it may update strongly towards the truth, leading the non-experts to update as well.

It is only if we consider the positive externalities of knowledge that eschewing authority becomes rational. For selfish individuals, it is rational to free-ride on expert opinion. This suggests that our aversion to arguments from authority can partially be explained as the epistemic analogue of our dislike for free-riders.

This analysis also suggests that most learning and thinking is not done to personally acquire more accurate beliefs. It may be out of altruism, for fun, to signal intelligence, or to receive status in a community that rewards discoveries, like academia.

Is the free-riding account of our anti-authority intuitions accurate? In a previous version of this essay, I used to think so. But David Moss commented:

Even in a situation where an individual is the only non-expert, say there are only five other people and they are all experts, I think the intuition against deferring to epistemic authority would remain strong. Indeed I expect it may be even stronger than it usually is. Conversely, in a situation where there are many billions of non-experts all deferring to only a couple of experts, I expect the intuition against deferring would remain, though likely be weaker. This seems to count against the intuition being significantly driven by positive epistemic externalities.

This was a great point, and convinced me that at the very least, the free-riding picture can’t fully explain our anti-authority intuitions. However, my intuitions about more complicated cases like David’s are quite unstable; and at this point my intuitions are heavily influenced by Bayesian theory as well. So it would be interesting to get more thoughtful people’s intuitions about such cases.

# What to do?

It looks like the common-sense intuitions against authority are hard to salvage. Yet this empirical conclusion does not imply that, normatively, we should entirely give up on learning and thinking.

Instead the cost-benefit analysis above offers a number of slightly different normative insights:

• The majority of the value of research is altruistic value, and is realised through changing the minds of others. This may lead you to: (i) choose questions that are action-guiding for many people, even if they are not for you (ii) present your conclusions in a particularly accessible format.
• Specialisation is beneficial. It is an efficient division of labour if each person acquires knowledge in one field, and everyone accepts the authority of the specialists over their magisterium.
• Reducing $C$ can have large benefits for an epistemic community by allowing far more people to cheaply verify arguments. This could be one reason formalisation is so useful, and has tended to propel formal disciplines towards fast progress. To an idealised solitary scientist, translating into formal language arguments he already knows with high confidence to be sound may seem like a waste of time. But the benefit of doing so is that it replaces intuitions others can’t duplicate without thirty years of experience with inferential steps that they can check mechanically with a “dumb” algorithm.

A few months after I wrote the first version of this piece, Greg Lewis wrote (my emphasis):

Modesty could be parasitic on a community level. If one is modest, one need never trouble oneself with any ‘object level’ considerations at all, and simply cultivate the appropriate weighting of consensuses to defer to. If everyone free-rode like that, no one would discover any new evidence, have any new ideas, and so collectively stagnate. Progress only happens if people get their hands dirty on the object-level matters of the world, try to build models, and make some guesses - sometimes the experts have gotten it wrong, and one won’t ever find that out by deferring to them based on the fact they usually get it right.

The distinction between ‘credence by my lights’ versus ‘credence all things considered’ allows the best of both worlds. One can say ‘by my lights, P’s credence is X’ yet at the same time ‘all things considered though, I take P’s credence to be Y’. One can form one’s own model of P, think the experts are wrong about P, and marshall evidence and arguments for why you are right and they are wrong; yet soberly realise that the chances are you are more likely mistaken; yet also think this effort is nonetheless valuable because even if one is most likely heading down a dead-end, the corporate efforts of people like you promises a good chance of someone finding a better path.

I probably agree with Greg here; and I believe that the bolded part was a crucial and somewhat overlooked part of his widely-discussed essay. While Greg believes we should form our credences entirely based on authority, he also believes it can be valuable to deeply explore object-level questions. The much more difficult question is how to navigate this trade-off, that is, how to decide when it’s worth investigating an issue.

1. This is importantly different from another concern about updating based on other people’s beliefs, that of double counting evidence or evidential overlap. Amanda Askell writes: “suppose that as I’m walking down the street I meet six people in a row who all tell me that a building four blocks away is on fire. I reasonably assume that some of these six people have seen the fire themselves or that they’ve heard that there’s a fire from different people who have seen it. I conclude that I’ve got good testimonial evidence that there’s a fire four blocks away. But suppose that none of them have seen the fire: they’ve all just left a meeting in which a charismatic person Bob told them that there is a fire four blocks away. If I knew that there wasn’t actually any more evidence for the fire claim than Bob’s testimony, I would not have been so confident that there’s a fire four blocks away.

In this case, the credence that I ended up with was based on the testimony of those six people, which I reasonably assumed represented a diverse body of evidence. This means that anyone asking me what makes me confident that there’s a fire will also receive misleading evidence that there’s a diverse body of evidence for the fire claim. This is a problem of evidential overlap: when several people independently tell me that they have some credence in P, I have a reasonable prior about how much overlap there is in their evidence. But in cases like the one above, that prior is incorrect.”

The problem of evidential overlap stems from reasonable-seeming but incorrect priors about the truth of a proposition, conditional on (the conjunction of) various testimonies. The situations I want to talk about concern agents with entirely correct priors, who update on testimony the adequate Bayesian amount. In my case the ideal Bayesian behaves counterintuitively; in Amanda’s example, Bayesianism and intuition agree since bad priors lead to bad beliefs.

2. In this post, I use “checking for yourself”, “thinking for yourself”, “thinking and learning”, etc., as a stand-in for anything that helps evaluate the truth-value of the “good argument” node in Yudkowsky’s diagram. This could include gathering empirical evidence, checking arguments and proofs, as well as acquiring the skills necessary to do this.