Philosophy success story V: Bayesianism

March 31, 2018

This is part of my series on success stories in philosophy. See this page for an explanation of the project and links to other items in the series.

Contents

  1. Bayesianism: the correct theory of rational inference
    1. Probabilism
    2. Conditionalisation
    3. Justifications for probabilism and conditionalisation
      1. Dutch book arguments
      2. Cox’s theorem
      3. Obviousness argument
  2. Science as a special case of rational inference
  3. Previous theories of science
    1. Hypothetico-deductivism
      1. Hypothetico-deductivism and the problem of irrelevant conjunction
    2. Instance confirmation
      1. Instance confirmation and the paradox of the ravens
      2. Bootstrapping and relevance relations
    3. Falsificationism
  4. The Quine-Duhem problem
  5. Uncertain judgements and value of information (resilience)
  6. Issues around Occam’s razor

Bayesianism: the correct theory of rational inference

Unless specified otherwise, by “Bayesianism” I mean normative claims constraining rational credences (degrees of belief), not any descriptive claim. Bayesianism so understood has, I claim, consensus support among philosophers. It has two core claims: probabilism and conditionalisation.

Probabilism

What is probabilism? (Teruji Thomas, Degrees of Belief, Part I: degrees of belief and their structure.)

Suppose that Clara has some confidence that P is true. Then, in so far as Clara is rational:

  1. We can quantify credences: we can represent Clara’s credence in P by a number, Cr(P). The higher the number, the more confident Clara is that P is true.
  2. More precisely, we can choose these numbers to fit together in a certain way: they satisfy the probability axioms, that is, they behave like probabilities do: (a) Cr(P) is always between 0 and 1. (b) Cr(¬P) = 1 − Cr(P). (c) Cr(P ∨ Q) = Cr(P) + Cr(Q) − Cr(P ∧ Q).
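As a rough illustration of what probabilism demands, here is a minimal sketch that checks a handful of credences against three constraints: each credence lies in [0, 1], the credence in not-P is one minus the credence in P, and the credence in P-or-Q equals Cr(P) + Cr(Q) − Cr(P and Q). The function name and the example numbers are hypothetical, chosen only for this sketch.

```python
def is_coherent(cr_p, cr_not_p, cr_q, cr_p_or_q, cr_p_and_q, tol=1e-9):
    """Return True if the five credences satisfy constraints (a)-(c)."""
    values = [cr_p, cr_not_p, cr_q, cr_p_or_q, cr_p_and_q]
    in_unit_interval = all(0.0 <= v <= 1.0 for v in values)             # (a)
    negation_ok = abs(cr_not_p - (1.0 - cr_p)) <= tol                    # (b)
    disjunction_ok = abs(cr_p_or_q - (cr_p + cr_q - cr_p_and_q)) <= tol  # (c)
    return in_unit_interval and negation_ok and disjunction_ok


# Hypothetical credences for Clara: coherent in the first case, not in the second
# (0.6 in P together with 0.6 in not-P violates the negation constraint).
coherent_clara = is_coherent(0.6, 0.4, 0.5, 0.8, 0.3)
incoherent_clara = is_coherent(0.6, 0.6, 0.5, 0.8, 0.3)
print(coherent_clara, incoherent_clara)
```

This only tests a few propositions at a time; a full coherence check would need the agent's credences over a whole algebra of propositions.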

Conditionalisation

Suppose you gain evidence E. Let Cr be your credences just before and Cr_NEW your credences just afterwards. Then, insofar as you are rational, for any proposition P: Cr_NEW(P) = Cr(P | E) = Cr(P ∧ E) / Cr(E). [1]
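To make the update rule concrete, here is a minimal sketch of conditionalisation over a finite set of possible worlds: the new credence in each world is the old credence divided by the old credence in the evidence, or zero if the world is ruled out. The worlds, the numbers and the function names are hypothetical and only serve the example.

```python
def credence(cr, proposition):
    """Credence in a proposition, modelled as a set of worlds."""
    return sum(p for world, p in cr.items() if world in proposition)


def conditionalise(cr, evidence):
    """Return new credences after learning `evidence` (a set of worlds)."""
    cr_e = credence(cr, evidence)
    if cr_e == 0:
        raise ValueError("cannot conditionalise on evidence with zero credence")
    return {w: (p / cr_e if w in evidence else 0.0) for w, p in cr.items()}


# Hypothetical prior credences over four worlds.
cr = {"rain_cold": 0.3, "rain_warm": 0.1, "dry_cold": 0.2, "dry_warm": 0.4}
rain = {"rain_cold", "rain_warm"}
cold = {"rain_cold", "dry_cold"}

cr_new = conditionalise(cr, cold)
print(credence(cr, rain), credence(cr_new, rain))
```

Learning that it is cold moves the credence in rain from 0.4 to 0.3 / 0.5 = 0.6 in this toy case.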

Justifications for probabilism and conditionalisation

Dutch book arguments

The basic idea: an agent failing to use probabilism or conditionalisation can be made to accept a series of bets that will lead to a sure loss (such a series of bets is called a Dutch book).

I won’t go into detail here, as this has been explained very well in many places. See for instance, Teruji Thomas, Degrees of Belief II or Earman, Bayes or Bust Chapter 2.
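For readers who want the flavour without the references, here is a minimal sketch of the simplest kind of Dutch book. It assumes an agent who treats a bet paying 1 if X is true as fairly priced at her credence in X, and whose credences in P and in not-P sum to more than 1. The function name and the numbers are hypothetical.

```python
def agent_net_payoff(price_p, price_not_p, p_is_true, stake=1.0):
    """Net payoff to an agent who buys one bet on P and one bet on not-P.

    Each bet pays `stake` if it wins and costs price * stake, a price the
    agent regards as fair when it equals her credence in the proposition.
    """
    cost = (price_p + price_not_p) * stake
    payout_p = stake if p_is_true else 0.0
    payout_not_p = 0.0 if p_is_true else stake
    return payout_p + payout_not_p - cost


# Incoherent credences: Cr(P) = 0.6 and Cr(not-P) = 0.6.
losses = [agent_net_payoff(0.6, 0.6, p_is_true=t) for t in (True, False)]
# Coherent credences for comparison: Cr(P) = 0.6 and Cr(not-P) = 0.4.
fair = [agent_net_payoff(0.6, 0.4, p_is_true=t) for t in (True, False)]
print(losses, fair)
```

Whatever happens, the incoherent agent is down about 0.2, while the coherent agent breaks even on this pair of bets.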

Cox’s theorem

Bayes or Bust, Chapter 2, p. 45: [image: Cox’s theorem]

Jaynes (2011, 1.7 p.17) thinks the axioms formalise “qualitative correspondence with common sense” — but his argument is sketchy and I rather agree with Earman that the assumptions of Cox’s theorem do not recommend themselves with overwhelming force.

Obviousness argument

Dutch books and Cox’s theorem aside, there’s something to be said for the sheer intuitive plausibility of probabilism and conditionalisation. If you want to express your beliefs as a number between 0 and 1, it just seems obvious that they should behave like probabilities. To me, accepting probabilism and conditionalisation outright feels more compelling than the premises of Cox’s theorem do. “Degrees of belief should behave like probabilities” seems near-tautological.

Science as a special case of rational inference

Philosophers have long realised that science was extremely successful: predicting the motions of the heavenly bodies, building aeroplanes, producing vaccines, and so on. There must be a core principle underlying the disparate activities of scientists (measuring, experimenting, writing equations, going to conferences, etc.). So they set about trying to find this core principle, in order to explain the success of science (the descriptive project) and to apply the core principle more accurately and more generally (the normative project). This was philosophy of science.

Scientists are prestigious people in universities. Science, lab coats and all, seems like a specific activity separate from normal life. So it seemed natural that there should be a philosophy of science. This turned out to be a blind alley. The solution to philosophy of science was to come from a far more general theory: the theory of rational inference. This would reveal science as merely a watered-down special case of rational inference.

We will now see how Bayesianism solves most of the problems philosophers of science were preoccupied with. As far as I can tell, this view has wide acceptance among philosophers.

Now let’s review how people were confused and how Bayesianism dissolved the confusion.

Previous theories of science

Hypothetico-deductivism

SEP:

In a seminal essay on induction, Jean Nicod (1924) offered the following important remark:

Consider the formula or the law: F entails G. How can a particular proposition, or more briefly, a fact affect its probability? If this fact consists of the presence of G in a case of F, it is favourable to the law […]; on the contrary, if it consists of the absence of G in a case of F, it is unfavourable to this law. (219, notation slightly adapted)

SEP:

The central idea of hypothetico-deductive (HD) confirmation can be roughly described as “deduction-in-reverse”: evidence is said to confirm a hypothesis in case the latter, while not entailed by the former, is able to entail it, with the help of suitable auxiliary hypotheses and assumptions. The basic version (sometimes labelled “naïve”) of the HD notion of confirmation can be spelled out thus:

For any h, e, k such that h ∧ k is consistent:

  • e HD-confirms h relative to k if and only if h ∧ k ⊨ e and k ⊭ e;

  • e HD-disconfirms h relative to k if and only if h ∧ k ⊨ ¬e, and k ⊭ ¬e;

  • e is HD-neutral for hypothesis h relative to k otherwise.
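The naive HD clauses can be tried out mechanically for propositional toy cases. The sketch below is only an illustration under my own assumptions: statements are Python functions from a truth assignment to a boolean, entailment is checked by brute-force truth tables, and the caller is responsible for making sure the hypothesis and the auxiliaries are jointly consistent. All names are hypothetical.

```python
from itertools import product


def entails(premise, conclusion, atoms):
    """Classical entailment, checked over every truth assignment to `atoms`."""
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if premise(v) and not conclusion(v):
            return False
    return True


def hd_status(h, e, k, atoms):
    """Classify evidence e for hypothesis h relative to auxiliaries k."""
    h_and_k = lambda v: h(v) and k(v)
    not_e = lambda v: not e(v)
    if entails(h_and_k, e, atoms) and not entails(k, e, atoms):
        return "confirms"
    if entails(h_and_k, not_e, atoms) and not entails(k, not_e, atoms):
        return "disconfirms"
    return "neutral"


atoms = ["h", "q", "e"]
h = lambda v: v["h"]
q = lambda v: v["q"]                      # an unrelated statement
e = lambda v: v["e"]
k = lambda v: (not v["h"]) or v["e"]      # auxiliary: if h then e
h_and_q = lambda v: v["h"] and v["q"]     # h with q tacked on

status_h = hd_status(h, e, k, atoms)
status_h_and_q = hd_status(h_and_q, e, k, atoms)
status_unrelated = hd_status(h, q, k, atoms)
print(status_h, status_h_and_q, status_unrelated)
```

Note that e comes out as confirming h with q tacked on just as it confirms h alone, which is a direct consequence of the monotonicity of entailment.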

Hypothetico-deductivism and the problem of irrelevant conjunction

SEP:

The irrelevant conjunction paradox. Suppose that e confirms h relative to (possibly empty) k. Let statement q be logically consistent with e ∧ h ∧ k, but otherwise entirely irrelevant for all of those conjuncts. Does e confirm h ∧ q (relative to k) as it does with h? One would want to say no, and this implication can be suitably reconstructed in Hempel’s theory. HD-confirmation, on the contrary, can not draw this distinction: it is easy to show that, on the conditions specified, if the HD clause for confirmation is satisfied for e and h (given k), so it is for e and h ∧ q (given k). (This is simply because, if h ∧ k ⊨ e, then h ∧ q ∧ k ⊨ e, too, by the monotonicity of classical logical entailment.)

The Bayesian solution:

In the statement below, indicating this result, the irrelevance of q for hypothesis h and evidence e (relative to k) is meant to amount to the probabilistic independence of q from h, e and their conjunction (given k), that is, to P(h ∧ q | k) = P(h | k) P(q | k), P(e ∧ q | k) = P(e | k) P(q | k), and P(h ∧ e ∧ q | k) = P(h ∧ e | k) P(q | k), respectively.

Confirmation upon irrelevant conjunction (ordinal solution) (CIC)
For any h, e, k, q and any probability function P, if e confirms h relative to k and q is irrelevant for h and e relative to k, then C_P(h, e | k) > C_P(h ∧ q, e | k).

So, even in case it is qualitatively preserved across the tacking of q onto h, the positive confirmation afforded by e is at least bound to quantitatively decrease thereby.
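A small numerical check may help here. The sketch below assumes one particular way of measuring confirmation, the difference between posterior and prior (my choice for illustration; the quoted result is stated for a confirmation measure without fixing it here), and it assumes an irrelevant statement q that is probabilistically independent of h and of the conjunction of h and e. All numbers are hypothetical.

```python
def tacking_demo(p_h, p_e_given_h, p_e_given_not_h, p_q):
    """Return (confirmation of h by e, confirmation of h-and-q by e).

    Confirmation is measured as posterior minus prior (an assumption).
    q is assumed independent of h and of h-and-e.
    """
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
    p_h_given_e = p_e_given_h * p_h / p_e
    # Independence: P(h & q) = P(h) P(q) and P(h & q | e) = P(h | e) P(q).
    p_hq = p_h * p_q
    p_hq_given_e = p_h_given_e * p_q
    return p_h_given_e - p_h, p_hq_given_e - p_hq


conf_h, conf_hq = tacking_demo(p_h=0.3, p_e_given_h=0.9, p_e_given_not_h=0.2, p_q=0.5)
print(conf_h, conf_hq)
```

On the difference measure the confirmation of the conjunction is exactly P(q) times the confirmation of h, so it stays positive but shrinks whenever P(q) is below 1.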

Instance confirmation

Bayes or Bust (p. 63):

When Carl Hempel published his seminal “Studies in the Logic of Confirmation” (1945), he saw his essay as a contribution to the logical empiricists’ program of creating an inductive logic that would parallel and complement deductive logic. The program, he thought, was best carried out in three stages: the first stage would provide an explication of the qualitative concept of confirmation (as in ‘E confirms H’); the second stage would tackle the comparative concept (as in ‘E confirms H more than E′ confirms H′’); and the final stage would concern the quantitative concept (as in ‘E confirms H to degree r’). In hindsight it seems clear (at least to Bayesians) that it is best to proceed the other way around: start with the quantitative concept and use it to analyze the comparative and qualitative notions. […]

Hempel’s basic idea for finding a definition of qualitative confirmation satisfying his adequacy conditions was that a hypothesis is confirmed by its positive instances. This seemingly simple and straightforward notion turns out to be notoriously difficult to pin down. Hempel’s own explication utilized the notion of the development of a hypothesis for a finite set I of individuals. Intuitively, dev_I(H) is what H asserts about a domain consisting of just the individuals in I. Formally, dev_I(H) for a quantified H is arrived at by peeling off universal quantifiers in favor of conjunctions over I and existential quantifiers in favor of disjunctions over I. Thus, for example, if I = {a, b} and H is (∀x)(∃y)Lxy (e.g., “Everybody loves somebody”), dev_I(H) is (Laa ∨ Lab) & (Lba ∨ Lbb). We are now in a position to state the main definition[] that constitute[s] Hempel’s account:

  • E directly Hempel-confirms H iff E ⊨ dev_I(H), where I is the class of individuals mentioned in E.
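The "development" of a hypothesis is easy to compute for simple cases. The sketch below uses a hypothetical tuple encoding of formulas that I made up for this example, handling only atoms and the two quantifiers; it expands universal quantifiers into conjunctions and existential quantifiers into disjunctions over a finite set of individuals.

```python
def develop(formula, individuals, env=None):
    """Return the development of `formula` for `individuals` as a string.

    Formulas are tuples: ("atom", predicate, variables),
    ("forall", variable, body) or ("exists", variable, body).
    """
    env = env or {}
    kind = formula[0]
    if kind == "atom":
        _, predicate, variables = formula
        return predicate + "".join(env.get(v, v) for v in variables)
    if kind in ("forall", "exists"):
        _, variable, body = formula
        parts = [develop(body, individuals, {**env, variable: i}) for i in individuals]
        joiner = " & " if kind == "forall" else " v "
        return "(" + joiner.join(parts) + ")"
    raise ValueError(f"unknown formula kind: {kind}")


# "Everybody loves somebody": for all x there exists y such that Lxy.
everybody_loves_somebody = ("forall", "x", ("exists", "y", ("atom", "L", ("x", "y"))))
development = develop(everybody_loves_somebody, ["a", "b"])
print(development)
```

For the two individuals a and b this yields a conjunction of two disjunctions, one per person doing the loving.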

It’s easy to check that Hempel’s instance confirmation, like Bayesianism, successfully avoids the paradox of irrelevant conjunction. But it’s famously vulnerable to the following problem case.

Instance confirmation and the paradox of the ravens

The ravens paradox (Hempel 1937, 1945). Consider the following statements:

  • h = ∀x(raven(x) → black(x)), i.e., all ravens are black;

  • e = raven(a) ∧ black(a), i.e., a is a black raven;

  • e* = ¬black(a*) ∧ ¬raven(a*), i.e., a* is a non-black non-raven (say, a green apple).

Is hypothesis h confirmed by e and e* alike? One would want to say no, but Hempel’s theory is unable to draw this distinction. Let’s see why.

As we know, e (directly) Hempel-confirms h, according to Hempel’s reconstruction of Nicod. By the same token, e* (directly) Hempel-confirms the hypothesis that all non-black objects are non-ravens, i.e., h* = ∀x(¬black(x) → ¬raven(x)). But h* ⊨ h (h and h* are just logically equivalent). So, e* (the observation report of a non-black non-raven), like e (black raven), does (indirectly) Hempel-confirm h (all ravens are black). Indeed, as ¬raven(a) entails raven(a) → black(a), it can be shown that h is (directly) Hempel-confirmed by the observation of any object that is not a raven (an apple, a cat, a shoe, or whatever), apparently disclosing puzzling “prospects for indoor ornithology” (Goodman 1955, 71).

Just as HD, Bayesian relevance confirmation directly implies that e = black(a) confirms h given k = raven(a) and e* = ¬raven(a) confirms h given k* = ¬black(a) (provided, as we know, that P(e | k) < 1 and P(e* | k*) < 1). That’s because h ∧ k ⊨ e and h ∧ k* ⊨ e*. But of course, to have h confirmed, sampling ravens and finding a black one is intuitively more significant than failing to find a raven while sampling the enormous set of the non-black objects. That is, it seems, because the latter is very likely to obtain anyway, whether or not h is true, so that P(e* | k*) is actually quite close to unity. Accordingly, (SP) implies that h is indeed more strongly confirmed by black(a) given raven(a) than it is by ¬raven(a) given ¬black(a) (that is, C_P(h, e | k) > C_P(h, e* | k*)) as long as the assumption P(h | k) = P(h | k*) applies.
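The quantitative point can be sketched with toy numbers. When the hypothesis together with the background entails the evidence, Bayes's theorem gives a posterior equal to the prior divided by the probability of the evidence, so evidence that was nearly certain anyway barely moves the posterior. The probabilities below are hypothetical, and the function name is mine.

```python
def posterior_when_entailed(prior_h, p_evidence):
    """P(h | e, k) when h and k together entail e: P(h | k) / P(e | k)."""
    if not (0.0 < prior_h <= p_evidence <= 1.0):
        raise ValueError("need 0 < P(h | k) <= P(e | k) <= 1")
    return prior_h / p_evidence


prior = 0.5  # same prior for h under both sampling schemes (an assumption)

# Sampling a raven and finding it black: assume P(black | raven) = 0.6.
post_black_raven = posterior_when_entailed(prior, 0.6)
# Sampling a non-black object and finding it is not a raven:
# assume P(non-raven | non-black) = 0.999, since almost nothing is a raven.
post_nonraven = posterior_when_entailed(prior, 0.999)

print(post_black_raven, post_nonraven)
```

Both observations raise the probability of the hypothesis, but the black raven raises it far more than the non-black non-raven does.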

Bootstrapping and relevance relations

In a pre-Bayesian attempt to solve the problem of the ravens, people developed some complicated and ultimately unconvincing theories.

SEP:

To overcome the latter difficulty, Clark Glymour (1980a) embedded a refined version of Hempelian confirmation by instances in his analysis of scientific reasoning. In Glymour’s revision, hypothesis h is confirmed by some evidence e even if appropriate auxiliary hypotheses and assumptions must be involved for e to entail the relevant instances of h. This important theoretical move turns confirmation into a three-place relation concerning the evidence, the target hypothesis, and (a conjunction of) auxiliaries. Originally, Glymour presented his sophisticated neo-Hempelian approach in stark contrast with the competing traditional view of so-called hypothetico-deductivism (HD). Despite his explicit intentions, however, several commentators have pointed out that, partly because of the due recognition of the role of auxiliary assumptions, Glymour’s proposal and HD end up being plagued by similar difficulties (see, e.g., Horwich 1983, Woodward 1983, and Worrall 1982).

Falsificationism

“statements or systems of statements, in order to be ranked as scientific, must be capable of conflicting with possible, or conceivable observations” (Popper 1962, 39).

SEP:

For Popper […] the important point was not whatever confirmation successful prediction offered to the hypotheses but rather the logical asymmetry between such confirmations, which require an inductive inference, versus falsification, which can be based on a deductive inference. […]

Popper stressed that, regardless of the amount of confirming evidence, we can never be certain that a hypothesis is true without committing the fallacy of affirming the consequent. Instead, Popper introduced the notion of corroboration as a measure for how well a theory or hypothesis has survived previous testing.

Popper was clearly onto something, as in his critique of psychoanalysis:

Neither Freud nor Adler excludes any particular person’s acting in any particular way, whatever the outward circumstances. Whether a man sacrificed his life to rescue a drowning child (a case of sublimation) or whether he murdered the child by drowning him (a case of repression) could not possibly be predicted or excluded by Freud’s theory; the theory was compatible with everything that could happen.

But his stark asymmetry between logically disproving a theory and “corroborating” it was actually a mistake. And it led to many problems.

First, successful science often did not involve rejecting a theory as disproven when it failed an empirical test. SEP:

Originally, Popper thought that this meant the introduction of ad hoc hypotheses only to save a theory should not be countenanced as good scientific method. These would undermine the falsifiability of a theory. However, Popper later came to recognize that the introduction of modifications (immunizations, he called them) was often an important part of scientific development. Responding to surprising or apparently falsifying observations often generated important new scientific insights. Popper’s own example was the observed motion of Uranus which originally did not agree with Newtonian predictions, but the ad hoc hypothesis of an outer planet explained the disagreement and led to further falsifiable predictions.

Second, Popper’s idea of corroboration was intolerably vague. A theory is supposed to be well-corroborated if it stuck its neck out by being falsifiable, and has resisted falsification for a long time. But how, for instance, do we compare how well-corroborated two theories are? And how are we supposed to act in the meantime, when there are still several contending theories? The intuition is that well-tested theories should have higher probability, but Popper’s “corroboration” idea is ill-equipped to account for this.

Bayesianism dissolves these problems, but captures the grain of truth in falsificationism. I’ll just quote from the Arbital page on the Bayesian view of scientific virtues, which, despite its silly style, is excellent, and should probably be read in full.

In a Bayesian sense, we can see a hypothesis’s falsifiability as a requirement for obtaining strong likelihood ratios in favor of the hypothesis, compared to, e.g., the alternative hypothesis “I don’t know.”

Suppose you’re a very early researcher on gravitation, named Grek. Your friend Thag is holding a rock in one hand, about to let it go. You need to predict whether the rock will move downward to the ground, fly upward into the sky, or do something else. That is, you must say how your theory assigns its probabilities over up, down, and other.

As it happens, your friend Thag has his own theory which says “Rocks do what they want to do.” If Thag sees the rock go down, he’ll explain this by saying the rock wanted to go down. If Thag sees the rock go up, he’ll say the rock wanted to go up. Thag thinks that the Thag Theory of Gravitation is a very good one because it can explain any possible thing the rock is observed to do. This makes it superior compared to a theory that could only explain, say, the rock falling down.

As a Bayesian, however, you realize that since up, down, and other are mutually exclusive and exhaustive possibilities, and something must happen when Thag lets go of the rock, the conditional probabilities P(· | Thag) must sum to P(up | Thag) + P(down | Thag) + P(other | Thag) = 1.

If Thag is “equally good at explaining” all three outcomes - if Thag’s theory is equally compatible with all three events and produces equally clever explanations for each of them - then we might as well call this 1/3 probability for each of P(up | Thag), P(down | Thag), and P(other | Thag). Note that Thag’s theory is isomorphic, in a probabilistic sense, to saying “I don’t know.”

But now suppose Grek make falsifiable prediction! Grek say, “Most things fall down!”

Then Grek not have all probability mass distributed equally! Grek put 95% of probability mass in P(down | Grek)! Only leave 5% probability divided equally over P(up | Grek) and P(other | Grek) in case rock behave like bird.

Thag say this bad idea. If rock go up, Grek Theory of Gravitation disconfirmed by false prediction! Compared to Thag Theory that predicts 1/3 chance of up, will be likelihood ratio of 2.5% : 33% ~ 1 : 13 against Grek Theory! Grek embarrassed!

Grek say, she is confident rock does go down. Things like bird are rare. So Grek willing to stick out neck and face potential embarrassment. Besides, is more important to learn about if Grek Theory is true than to save face.

Thag let go of rock. Rock fall down.

This evidence with likelihood ratio of 0.95 : 0.33 ~ 3 : 1 favoring Grek Theory over Thag Theory.

“How you get such big likelihood ratio?” Thag demand. “Thag never get big likelihood ratio!”

Grek explain is possible to obtain big likelihood ratio because Grek Theory stick out neck and take probability mass away from outcomes up and other, risking disconfirmation if that happen. This free up lots of probability mass that Grek can put in outcome down to make big likelihood ratio if down happen.

Grek Theory win because falsifiable and make correct prediction! If falsifiable and make wrong prediction, Grek Theory lose, but this okay because Grek Theory not Grek.
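The arithmetic in the quoted story is easy to reproduce. The sketch below takes the figures given in the passage (95% on the rock going down, the remaining 5% split equally, and one third on each outcome for the know-nothing theory) and computes the likelihood ratios; the dictionary and function names are my own.

```python
# Probabilities each theory assigns to the three outcomes, as in the story.
grek = {"up": 0.025, "down": 0.95, "other": 0.025}
thag = {"up": 1 / 3, "down": 1 / 3, "other": 1 / 3}


def likelihood_ratio(outcome, theory_a, theory_b):
    """How much more strongly theory_a predicted `outcome` than theory_b."""
    return theory_a[outcome] / theory_b[outcome]


ratio_if_down = likelihood_ratio("down", grek, thag)  # rock falls
ratio_if_up = likelihood_ratio("up", grek, thag)      # rock flies upward
print(ratio_if_down, 1 / ratio_if_up)
```

If the rock falls, the ratio is about 2.85 to 1 in favour of the risky theory (the "3 : 1" of the story); if it flies up, the ratio is about 13 to 1 against it.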

The Quine-Duhem problem

SEP:

Duhem (he himself a supporter of the HD view) pointed out that in mature sciences such as physics most hypotheses or theories of real interest can not be contradicted by any statement describing observable states of affairs. Taken in isolation, they simply do not logically imply, nor rule out, any observable fact, essentially because (unlike “all ravens are black”) they involve the mention of unobservable entities and processes. So, in effect, Duhem emphasized that, typically, scientific hypotheses or theories are logically consistent with any piece of checkable evidence. […]

Let us briefly consider a classical case, which Duhem himself thoroughly analyzed: the wave vs. particle theories of light in modern optics. Across the decades, wave theorists were able to deduce an impressive list of important empirical facts from their main hypothesis along with appropriate auxiliaries, diffraction phenomena being only one major example. But many particle theorists’ reaction was to retain their hypothesis nonetheless and to reshape other parts of the “theoretical maze” (i.e., k; the term is Popper’s, 1963, p. 330) to recover those observed facts as consequences of their own proposal.

Quine took this idea to its radical conclusion with his confirmation holism. Wikipedia:

Duhem’s idea was, roughly, that no theory of any type can be tested in isolation but only when embedded in a background of other hypotheses, e.g. hypotheses about initial conditions. Quine thought that this background involved not only such hypotheses but also our whole web-of-belief, which, among other things, includes our mathematical and logical theories and our scientific theories. This last claim is sometimes known as the Duhem–Quine thesis. A related claim made by Quine, though contested by some (see Adolf Grünbaum 1962), is that one can always protect one’s theory against refutation by attributing failure to some other part of our web-of-belief. In his own words, “Any statement can be held true come what may, if we make drastic enough adjustments elsewhere in the system.”

Bayes or Bust p 73:

It makes a nice sound when it rolls off the tongue to say that our claims about the physical world face the tribunal of experience not individually but only as a corporate body. But scientists, no less than business executives, do not typically act as if they are at a loss as to how to distribute praise through the corporate body when the tribunal says yea, or blame when the tribunal says nay. This is not to say that there is always a single correct way to make the distribution, but it is to say that in many cases there are firm intuitions.

Howson and Urbach 2006 (p 108):

We shall illustrate the argument through a historical example that Lakatos (1970, pp. 138-140; 1968, pp. 174-75) drew heavily upon. In the early nineteenth century, William Prout (1815, 1816), a medical practitioner and chemist, advanced the idea that the atomic weight of every element is a whole-number multiple of the atomic weight of hydrogen, the underlying assumption being that all matter is built up from different combinations of some basic element. Prout believed hydrogen to be that fundamental building block. Now many of the atomic weights recorded at the time were in fact more or less integral multiples of the atomic weight of hydrogen, but some deviated markedly from Prout’s expectations. Yet this did not shake the strong belief he had in his hypothesis, for in such cases he blamed the methods that had been used to measure those atomic weights. Indeed, he went so far as to adjust the atomic weight of the element chlorine, relative to that of hydrogen, from the value 35.83, obtained by experiment, to 36, the nearest whole number. […]

Prout’s hypothesis t, together with an appropriate assumption a, asserting the accuracy (within specified limits) of the measuring techniques, the purity of the chemicals employed, and so forth, implies that the ratio of the measured atomic weights of chlorine and hydrogen will approximate (to a specified degree) a whole number. In 1815 that ratio was reported as 35.83 (call this the evidence e), a value judged to be incompatible with the conjunction of t and a. The posterior and prior probabilities of t and of a are related by Bayes’s theorem, as follows:

P(t | e) = P(e | t) P(t) / P(e) and P(a | e) = P(e | a) P(a) / P(e).

[…] Consider first the prior probabilities of t and of a. J.S. Stas, a distinguished Belgian chemist whose careful atomic weight measurements were highly influential, gives us reason to think that chemists of the period were firmly disposed to believe in t. […] It is less easy to ascertain how confident Prout and his contemporaries were in the methods used to measure atomic weights, but their confidence was probably not great, in view of the many clear sources of error. […] On the other hand, the chemists of the time must have felt that their atomic weight measurements were more likely to be accurate than not, otherwise they would hardly have reported them. […] For these reasons, we conjecture that P(a) was in the neighbourhood of 0.6 and that P(t) was around 0.9, and these are the figures we shall work with. […]

We will follow Dorling in taking t and a to be independent, viz, P(a | t) = P(a) and hence, P(¬a | t) = P(¬a). As Dorling points out (1996), this independence assumption makes the calculations simpler but is not crucial to the argument. […]

Finally, Bayes’s theorem allows us to derive the posterior probabilities in which we are interested:

P(t | e) = 0.878 and P(a | e) = 0.073.

(Recall that P(t) = 0.9 and P(a) = 0.6.) We see then that the evidence provided by the measured atomic weight of chlorine affects Prout’s hypothesis and the set of auxiliary hypotheses very differently; for while the probability of the first is scarcely changed, that of the second is reduced to a point where it has lost all credibility.
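To see how blame can be distributed so unevenly, here is a minimal sketch of the calculation. The priors P(t) = 0.9 and P(a) = 0.6 and the independence of t and a come from the passage. The four likelihoods of the evidence are not given in the excerpt, so the values used below are assumptions for illustration only: zero when t and a are both true (they jointly rule the evidence out), and small values of the order of 0.01 to 0.02 otherwise.

```python
def posteriors(p_t, p_a, likelihood):
    """Posterior of theory t and auxiliary a after evidence e.

    likelihood[(t, a)] is P(e | t has truth value t, a has truth value a).
    t and a are assumed to be independent a priori.
    """
    joint = {
        (t, a): (p_t if t else 1 - p_t) * (p_a if a else 1 - p_a) * likelihood[(t, a)]
        for t in (True, False)
        for a in (True, False)
    }
    p_e = sum(joint.values())
    p_t_given_e = (joint[(True, True)] + joint[(True, False)]) / p_e
    p_a_given_e = (joint[(True, True)] + joint[(False, True)]) / p_e
    return p_t_given_e, p_a_given_e


# Assumed likelihoods (hypothetical, not taken from the excerpt).
assumed_likelihood = {
    (True, True): 0.0,    # t and a together are incompatible with e
    (True, False): 0.02,
    (False, True): 0.01,
    (False, False): 0.01,
}

p_t_given_e, p_a_given_e = posteriors(0.9, 0.6, assumed_likelihood)
print(round(p_t_given_e, 3), round(p_a_given_e, 3))
```

With these inputs the probability of the theory barely moves (from 0.9 to roughly 0.88), while the probability of the auxiliary assumption collapses (from 0.6 to roughly 0.07).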

Uncertain judgements and value of information (resilience)

Crash course in state spaces and events: There is a set of states Ω which represents the ways the world could be. Sometimes Ω is described as the set of “possible worlds” (SEP). An event E is a subset of Ω. There are many states of the world where Labour wins the next election. The event “Labour wins the next election” is the set of these states.

Here is the important point: a single numerical probability for event E is not just the probability you assign to one state of the world. It’s a sum over the probabilities assigned to states in E. We should think of ideal Bayesians as having probability distributions over the state space, not just scalar probabilities for events.
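In code, the point is simply that an event's probability is computed from a distribution over states rather than stored on its own. The states and numbers below are hypothetical and only illustrate the bookkeeping.

```python
# A hypothetical distribution over a tiny state space.
distribution = {
    "labour_majority": 0.25,
    "labour_coalition": 0.15,
    "tory_majority": 0.35,
    "tory_coalition": 0.15,
    "other": 0.10,
}


def probability(event, dist):
    """Probability of an event (a set of states): sum over its states."""
    return sum(dist[state] for state in event)


labour_wins = {"labour_majority", "labour_coalition"}
p_labour = probability(labour_wins, distribution)
print(p_labour)
```

Many different distributions over the states would give the same number for this event, which is why the single number is only a summary of the agent's full state of belief.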

This simple idea is enough to cut through many decades of confusion. SEP:

probability theory seems to impute much richer and more determinate attitudes than seems warranted. What should your rational degree of belief be that global mean surface temperature will have risen by more than four degrees by 2080? Perhaps it should be 0.75? Why not 0.75001? Why not 0.7497? Is that event more or less likely than getting at least one head on two tosses of a fair coin? It seems there are many events about which we can (or perhaps should) take less precise attitudes than orthodox probability requires. […] As far back as the mid-nineteenth century, we find George Boole saying:

It would be unphilosophical to affirm that the strength of that expectation, viewed as an emotion of the mind, is capable of being referred to any numerical standard. (Boole 1958 [1854]: 244)

People have long thought there is a distinction between risk (probabilities different from 0 or 1) and ambiguity (imprecise probabilities):

One classic example of this is the Ellsberg problem (Ellsberg 1961).

I have an urn that contains ninety marbles. Thirty marbles are red. The remainder are blue or yellow in some unknown proportion.

Consider the indicator gambles for various events in this scenario. Consider a choice between a bet that wins if the marble drawn is red (I), versus a bet that wins if the marble drawn is blue (II). You might prefer I to II since I involves risk while II involves ambiguity. A prospect is risky if its outcome is uncertain but its outcomes occur with known probability. A prospect is ambiguous if the outcomes occur with unknown or only partially known probabilities.

To deal with purported ambiguity, people developed models where the probability lies in some range. These probabilities were called “fuzzy” or “mushy”.

Evidence can be balanced because it is incomplete: there simply isn’t enough of it. Evidence can also be balanced if it is conflicted: different pieces of evidence favour different hypotheses. We can further ask whether evidence tells us something specific—like that the bias of a coin is 2/3 in favour of heads—or unspecific—like that the bias of a coin is between 2/3 and 1 in favour of heads.

Fuzzy probabilities gave rise to a number of problem cases, which, predictably, engendered a wide literature. The SEP article notes the problems of:

  1. Dilation (Imprecise probabilists violate the reflection principle)
  2. Belief inertia (How do we learn from an imprecise prior?)
  3. Decision making (How should an imprecise probabilist act? Can she avoid Dutch books?)

A PhilPapers search indicates that at least 65 papers have been published on these topics.

The Bayesian solution is simply: when you are less confident, you have a flatter probability distribution, though it may have the same mean. Flatter distributions move more in response to evidence. They are less resilient. See Skyrms (2011) or Leitgeb (2014). It’s not surprising that single probabilities don’t adequately describe your evidential state, since they are summary statistics over a distribution.
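Resilience can be illustrated with a standard conjugate model, which is my choice for the example rather than anything the cited authors commit to: two agents hold Beta distributions over the bias of a coin with the same mean of 0.5, one flat (Beta(2, 2)) and one sharply peaked (Beta(200, 200)). Both then see the same hypothetical data, 8 heads and 2 tails.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def update(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing coin flips."""
    return alpha + heads, beta + tails


flat_prior = (2, 2)        # little evidence behind the 0.5
peaked_prior = (200, 200)  # a lot of evidence behind the 0.5

flat_post = update(*flat_prior, heads=8, tails=2)
peaked_post = update(*peaked_prior, heads=8, tails=2)

flat_shift = beta_mean(*flat_post) - beta_mean(*flat_prior)
peaked_shift = beta_mean(*peaked_post) - beta_mean(*peaked_prior)
print(flat_shift, peaked_shift)
```

The flat agent's probability of heads jumps from 0.5 to about 0.71, while the peaked agent's moves only to about 0.51, even though both started with the same single-number credence.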

Issues around Occam’s razor

SEP distinguishes three questions about simplicity:

(i) How is simplicity to be defined? [Definition]

(ii) What is the role of simplicity principles in different areas of inquiry? [Usage]

(iii) Is there a rational justification for such simplicity principles? [Justification]

The Bayesian solution to (i) is to formalise Occam’s razor as a statement about which priors are better than others. Occam’s razor is not, as many philosophers have thought, a rule of inference, but a constraint on prior belief. One should have a prior that assigns higher probability to simpler worlds. SEP:

Jeffreys argued that “the simpler laws have the greater prior probability,” and went on to provide an operational measure of simplicity, according to which the prior probability of a law is 2^(-k), where k = order + degree + absolute values of the coefficients, when the law is expressed as a differential equation (Jeffreys 1961, p. 47).

Since then, the definition of simplicity has been further formalised using algorithmic information theory. The very informal gloss is that we formalise hypotheses as the shortest computer program that can fully describe them, and our prior weights each hypothesis by its simplicity (2^(-n), where n is the program length).
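A toy version of such a prior is easy to write down, with heavy caveats: the real construction ranges over all programs and is not computable, whereas the sketch below just takes a made-up table of program lengths for a finite set of hypotheses, weights each by 2 to the power of minus its length, and normalises. The hypothesis names and lengths are hypothetical.

```python
def simplicity_prior(program_lengths):
    """Normalised prior giving each hypothesis weight 2 ** -length.

    `program_lengths` maps hypothesis names to the length in bits of the
    shortest program describing them (assumed known for this toy example).
    """
    weights = {h: 2.0 ** -n for h, n in program_lengths.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


prior = simplicity_prior({"H_simple": 10, "H_complex": 15})
ratio = prior["H_simple"] / prior["H_complex"]
print(prior, ratio)
```

A hypothesis whose shortest description is five bits longer gets a prior 2^5 = 32 times smaller, whatever the other hypotheses in the table are.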

This algorithmic formalisation, finally, sheds light on the limits of this understanding of simplicity, and provides an illuminating new interpretation of Goodman’s new riddle of induction. The key idea is that we can only formalise simplicity relative to a programming language (or relative to a universal Turing machine).

Hutter and Rathmanner 2011, Section 5.9 “Andrey Kolmogorov”:

Natural Turing Machines. The final issue is the choice of Universal Turing machine to be used as the reference machine. The problem is that there is still subjectivity involved in this choice since what is simple on one Turing machine may not be on another. More formally, it can be shown that for any arbitrarily complex string x as measured against the UTM U there is another UTM machine U′ for which x has Kolmogorov complexity 1. This result seems to undermine the entire concept of a universal simplicity measure but it is more of a philosophical nuisance which only occurs in specifically designed pathological examples. The Turing machine U′ would have to be absurdly biased towards the string x which would require previous knowledge of x. The analogy here would be to hard-code some arbitrary long complex number into the hardware of a computer system which is clearly not a natural design. To deal with this case we make the soft assumption that the reference machine is natural in the sense that no such specific biases exist. Unfortunately there is no rigorous definition of natural but it is possible to argue for a reasonable and intuitive definition in this context.

Vallinder 2012, Section 4.1 “Language dependence”:

In section 2.4 we saw that Solomonoff’s prior is invariant under both reparametrization and regrouping, up to a multiplicative constant. But there is another form of language dependence, namely the choice of a universal Turing machine.

There are three principal responses to the threat of language dependence. First, one could accept it flat out, and admit that no language is better than any other. Second, one could admit that there is language dependence but argue that some languages are better than others. Third, one could deny language dependence, and try to show that there isn’t any.

For a defender of Solomonoff’s prior, I believe the second option is the most promising. If you accept language dependence flat out, why introduce universal Turing machines, incomputable functions, and other needlessly complicated things? And the third option is not available: there isn’t any way of getting around the fact that Solomonoff’s prior depends on the choice of universal Turing machine. Thus, we shall somehow try to limit the blow of the language dependence that is inherent to the framework. Williamson (2010) defends the use of a particular language by saying that an agent’s language gives her some information about the world she lives in. In the present framework, a similar response could go as follows. First, we identify binary strings with propositions or sensory observations in the way outlined in the previous section. Second, we pick a UTM so that the terms that exist in a particular agent’s language gets low Kolmogorov complexity.

If the above proposal is unconvincing, the damage may be limited somewhat by the following result. Let K_U(x) be the Kolmogorov complexity of x relative to universal Turing machine U, and let K_T(x) be the Kolmogorov complexity of x relative to Turing machine T (which needn’t be universal). We have that K_U(x) ≤ K_T(x) + C_TU. That is: the difference in Kolmogorov complexity relative to U and relative to T is bounded by a constant C_TU that depends only on these Turing machines, and not on x. (See Li and Vitanyi (1997, p. 104) for a proof.) This is somewhat reassuring. It means that no other Turing machine can outperform U infinitely often by more than a fixed constant. But we want to achieve more than that. If one picks a UTM that is biased enough to start with, strings that intuitively seem complex will get a very low Kolmogorov complexity. As we have seen, for any string x it is always possible to find a UTM U′ such that K_U′(x) = 1. If K_U′(x) = 1, the corresponding Solomonoff prior will be at least 0.5. So for any binary string, it is always possible to find a UTM such that we assign that string prior probability greater than or equal to 0.5. Thus some way of discriminating between universal Turing machines is called for.

  1. Technically, the diachronic language “just before”/”just after” is a mistake. It fails to model cases of forgetting, or loss of discriminating power of evidence. This was shown by Arntzenius (2003).

Philosophy success story IV: the formalisation of probability

March 31, 2018

Thus, joining the rigour of demonstrations in mathematics with the uncertainty of chance, and conciliating these apparently contradictory matters, it can, taking its name from both of them, with justice arrogate the stupefying name: The Mathematics of Chance (Aleae Geometria).

— Blaise Pascal, in an address to the Académie Parisienne de Mathématiques, 1654

Researchers in the field have wondered why the development of probability theory was so slow—especially why the apparently quite simple mathematical theory of dice throwing did not appear until the 1650s. The main part of the answer lies in appreciating just how difficult it is to make concepts precise.

— James Franklin, The Science of Conjecture

Wherefore in all great works are Clerkes so much desired? Wherefore are Auditors so richly fed? What causeth Geometricians so highly to be enhaunsed? Why are Astronomers so greatly advanced? Because that by number such things they finde, which else would farre excell mans minde.

— Robert Recorde, Arithmetic (1543)

This is part of my series on success stories in philosophy. See this page for an explanation of the project and links to other items in the series.

Contents

  1. How people were confused
    1. Degrees of belief
    2. Probability as a binary property
    3. Ordinal probability
    4. Stakes-sensitivity
    5. The problem of points
  2. Pascal and Fermat’s solution
  3. Extensions
    1. Handing over to mathematics
    2. Axiomatisation
  4. Counter-intuitive implications of probability theory
    1. The conjunction fallacy
    2. The Monty Hall problem
    3. The mammography problem

How people were confused

Degrees of belief

The first way to get uncertainty spectacularly wrong is given to us by Plato, who outright rejects non-certain reasoning (The Science of Conjecture: Evidence and Probability Before Pascal, James Franklin):

Plato has Socrates say to Theaetetus, “You are not offering any argument or proof, but relying on likelihood (eikoti). If Theodorus, or any other geometer, were prepared to rely on likelihood when doing geometry, he would be worth nothing. So you and Theodorus must consider whether, in matters as important as these, you are going to accept arguments from plausibility and likelihood (pithanologia te kai eikosi).”

Probability as a binary property

One step in the right direction would be to accept that statements can fail to be definite truths, yet in some sense be “more likely” than definite falsehoods. On this view, such statements have the property of being “probable”. SEP writes:

Pre-modern probability was not a number or ratio, but mainly a binary property which a proposition either had or did not have.

In this vein, Cicero wrote:

That is probable which for the most part usually comes to pass, or which is a part of the ordinary beliefs of mankind, or which contains in itself some resemblance to these qualities, whether such resemblance be true or false. (Cicero, De inventione, I.29.46)

The quote not only displays the error of thinking of probability as binary. It also shows that Cicero mixed the most promising notion of probability (that which “for the most part usually comes to pass”) with the completely different notions of ordinary belief and opinion, resulting in a general mess of confusion. According to SEP: “Until the thirteenth century, the definitions of “probable” by Cicero and Boethius very much shaped the medieval understanding of probability”.

Ordinal probability

Going further, one might realise that there are degrees of probability. With a solid helping of the principle of charity, Aristotle can be read as saying this:

Therefore it is not enough for the defendant to refute the accusation by proving that the charge is not bound to be true; he must do so by showing that it is not likely to be true. For this purpose his objection must state what is more usually true than the statement attacked.

Here is another quote:

Hence, in this proposal we have men and women, who at age 25 buy a life-long annuity for a price which they recover within eight years and although they can die within these eight years it is more probable that they live twice the time. In this way what happens more frequently and is more probable is to the advantage of the buyer. (Alexander of Alessandria, Tractatus de usuris, c. 72, Y f. 146r)

Aristotle did not realise that probabilities could be applied to chancy events, and neither did his medieval followers. According to A. Hald:

According to van Brakel (1976) and Schneider (1980), Aristotle classified events into three types: (1) certain events that happen necessarily; (2) probable events that happen in most cases; and (3) unpredictable or unknowable events that happen by pure chance. Furthermore, he considered the outcomes of games of chance to belong to the third category and therefore not accessible to scientific investigation, and he did not apply the term probability to games of chance.

The cardinal notion of probability did not emerge before the seventeenth century.

Stakes-sensitivity

One can find throughout history people grasping at the intuition that when the stakes are high, unlikely things can be important. In many cases, legal scholars were interested in what to do if no definite proof of innocence or guilt could be given. Unfortunately, they invariably got the details wrong. James Franklin writes:

In the Talmud itself, the demand for a high standard of evidence in criminal cases developed into a prohibition of any uncertainty in evidence:

Witnesses in capital charges were brought in and warned: perhaps what you say is based only on conjecture, or hearsay, or is evidence from the mouth of another witness, or even from the mouth of an untrustworthy person: perhaps you are unaware that ultimately we shall scrutinize your evidence by cross-examination and inquiry? Know then that capital cases are not like monetary cases. In civil suits, one can make restitution in money, and thereby make his atonement; but in capital cases one is held responsible for his blood and the blood of his descendants till the end of the world . . . whoever destroys a single soul of Israel, scripture imputes to him as though he had destroyed a whole world . . . Our Rabbis taught: What is meant by “based only on conjecture”?—He [the judge] says to them: Perhaps you saw him running after his fellow into a ruin, you pursued him, and found him sword in hand with blood dripping from it, whilst the murdered man was writhing. If this is what you saw, you saw nothing.

Thomas Aquinas wrote:

And yet the fact that in so many it is not possible to have certitude without fear of error is no reason why we should reject the certitude which can probably be had [quae probabiliter haberi potest] through two or three witnesses … (Thomas Aquinas, Summa theologiae, II-II, q. 70, 2, 1488)

James Franklin writes:

Further reflection on the kinds of evidence short of certainty led to a word that expressed the most significant and original idea of the Glossators for probabilistic argument: half-proof (semiplena probatio). In the 1190s, this word was invented for the class of items of evidence that were neither null nor full proof. The word expresses the natural thought that, if two witnesses are in theory full proof, then one witness must be half.

The problem of points

By the Renaissance, thinkers had sharpened these intuitions into a concrete problem. It took centuries of fallacies to arrive at the correct answer to this problem.

The problem of points concerns a game of chance with two players who have equal chances of winning each round. The players contribute equally to a prize pot, and agree in advance that the first player to have won a certain number of rounds will collect the entire prize. Now suppose that the game is interrupted by external circumstances before either player has achieved victory. Player 1 has won $s_1$ rounds and player 2 has won $s_2$ rounds. How does one then divide the pot fairly? (Wikipedia, The problem of points)

Before Pascal formalised the now-obvious concept of expected value, this problem was a matter of debate. The problem of points is especially clear-cut evidence that people were confused about probability, since they arrived at different numerical answers.

Anders Hald writes (Section 4.2, p. 35ff):

The division problem is presumably very old. It is first found in print by Pacioli (1494) for $s = 6$, $s_1 = 5$, and $s_2 = 2$. Pacioli considers it as a problem in proportion and proposes to divide the stakes as $s_1$ to $s_2$. […] The next attempt to solve the problem is by Cardano (1539). He shows by example that Pacioli’s proposal is ridiculous [in a game interrupted after only one round, Pacioli’s method would award the entire pot to the player with the single point, even though the outcome would be far from certain] and proceeds to give a deeper analysis of the problem. We shall return to this after a discussion of some other, more primitive, proposals. Tartaglia (1556) criticizes Pacioli and is sceptical of the possibility of finding a mathematical solution. He thinks that the problem is a juridical one. Nevertheless, he proposes that if $s_1$ is larger than $s_2$, A should have his own stake plus the fraction $(s_1 - s_2)/s$ of B’s stake. Assuming that the stakes are equal, the division will be as $s + s_1 - s_2$ to $s - s_1 + s_2$.

Forestani (1603) formulates the following rule: First A and B should each get a portion of the total stake determined by the number of games they have won in relation to the maximum duration of the play, i.e., the proportions $s_1/(2s - 1)$ and $s_2/(2s - 1)$, as also proposed by Pacioli. But then Forestani adds that the remainder should be divided equally between them, because Fortune in the next play may reverse the results. Hence the division will be as $2s - 1 + s_1 - s_2$ to $2s - 1 - s_1 + s_2$. Comparison with Tartaglia’s rule will show that $s$ has been replaced by $2s - 1$.

Cardano (1539) is the first to realize that the division rule should not depend on $(s, s_1, s_2)$ but only on the number of games each player lacks in winning, $a = s - s_1$ and $b = s - s_2$, say. He introduces a new play where A, starting from scratch, is the winner if he wins $a$ games before B wins $b$ games, and he asks what the stakes should be for the play to be fair. He then takes for a fair division rule in the stopped play the ratio of the stakes in this new play and concludes that the division should be as $b(b+1)$ to $a(a+1)$.
His reasons for this result are rather obscure. Considering an example for $a = 1$ and $b = 3$ he writes:

He who shall win 3 games stakes 2 crowns; how much should the other stake. I say that he should stake 12 crowns for the following reasons. If he shall win only one game it would suffice that he stakes 2 crowns; and if he shall win 2 games he should stake three times as much because by winning two games he would win 4 crowns but he has had the risk of losing the second game after having won the first and therefore he ought to have a threefold compensation. And if he shall win three games his compensation should be sixfold because the difficulty is doubled, hence he should stake 12 crowns.

It will be seen that Cardano uses an inductive argument. Setting B’s stake equal to 1, A’s stake becomes successively equal to $1$, $1 + 2 = 3$, and $1 + 2 + 3 = 6$. Cardano then concludes that in general A’s stake should be $1 + 2 + \cdots + b = b(b+1)/2$. He does not discuss how to go from the special case $(1, b)$ to the general case $(a, b)$, but presumably he has just used the symmetry between the players.1
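To see how far apart these proposals are, here is a minimal sketch that applies each rule to Pacioli's own example: a game to 6 rounds, interrupted with A on 5 and B on 2. The function names and the symbols `s`, `s1`, `s2`, `a`, `b` are labels chosen here, the formulas are my reading of the rules as Hald describes them (equal stakes assumed), and the last function is the modern expected-value answer for comparison.

```python
from fractions import Fraction
from math import comb

# Each function returns A's share of the pot.
# s = rounds needed to win, s1 and s2 = rounds already won by A and B.

def pacioli(s, s1, s2):
    """Divide in proportion to rounds already won, s1 : s2."""
    return Fraction(s1, s1 + s2)

def tartaglia(s, s1, s2):
    """A keeps his own stake plus (s1 - s2)/s of B's stake (equal stakes)."""
    return Fraction(s + s1 - s2, 2 * s)

def forestani(s, s1, s2):
    """Like Tartaglia's rule, with s replaced by the maximum duration 2s - 1."""
    return Fraction(2 * s - 1 + s1 - s2, 2 * (2 * s - 1))

def cardano(s, s1, s2):
    """Divide as b(b+1) : a(a+1), where a and b are the rounds still lacking."""
    a, b = s - s1, s - s2
    return Fraction(b * (b + 1), a * (a + 1) + b * (b + 1))

def pascal_fermat(s, s1, s2):
    """Chance that A wins at least a of the a + b - 1 remaining fair rounds."""
    a, b = s - s1, s - s2
    n = a + b - 1
    return Fraction(sum(comb(n, k) for k in range(a, n + 1)), 2 ** n)

rules = [pacioli, tartaglia, forestani, cardano, pascal_fermat]
shares = {rule.__name__: rule(6, 5, 2) for rule in rules}
for name, share in shares.items():
    print(f"{name:14s} A's share = {share}")
```

On this example the five rules give five different shares for A (5/7, 3/4, 7/11, 10/11 and 15/16). This is the sense in which the problem of points is clear-cut evidence of confusion.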

Note how different this type of disagreement is from mathematical disagreements. When people reach different solutions about a “toy” problem case, and muddle through with heuristics, they are not facing a recalcitrant mathematical puzzle. They are confused on a much deeper level. Newcomb’s problem might be a good analogy.

Anders Hald also has this interesting quote:

In view of the achievements of the Greeks in mathematics and science, it is surprising that they did not use the symmetry of games of chance or the stability of relative frequencies to create an axiomatic theory of probability analogous to their geometry. However, the symmetry and stability which is obvious to us may not have been noticed in ancient times because of the imperfections of the randomizers used. David (1955, 1962) has pointed out that instead of regular dice, astragali (heel bones of hooved animals) were normally used, and Samburski (1956) remarks that in a popular game with four astragali, a certain throw was valued higher than all the others despite the fact that other outcomes have smaller probabilities, which indicates that the Greeks had not noticed the magnitudes of the corresponding relative frequencies.

Pascal and Fermat’s solution

Pascal and Fermat’s story is well known. In a famous correspondence in 1654, they developed the basic notions of probability and expected value.

Keith Devlin (2008):

Before we take a look at their exchange and the methods it contains, let’s look at a present-day solution of the simple version of the problem. In this version, the players, Blaise and Pierre, place equal bets on who will win the best of five tosses of a fair coin. We’ll suppose that on each round, Blaise chooses heads, Pierre tails. Now suppose they have to abandon the game after three tosses, with Blaise ahead 2 to 1. How do they divide the pot? The idea is to look at all possible ways the game might have turned out had they played all five rounds. Since Blaise is ahead 2 to 1 after round three, the first three rounds must have yielded two heads and one tail. The remaining two throws can yield

HH HT TH TT

Each of these four is equally likely. In the first (H H), the final outcome is four heads and one tail, so Blaise wins; in the second and the third (H T and T H), the final outcome is three heads and two tails, so again Blaise wins; in the fourth (T T), the final outcome is two heads and three tails, so Pierre wins. This means that in three of the four possible ways the game could have ended, Blaise wins, and in only one possible play does Pierre win. Blaise has a 3-to-1 advantage over Pierre when they abandon the game; therefore, the pot should be divided 3/4 for Blaise and 1/4 for Pierre. Many people, on seeing this solution, object, saying that the first two possible endings (H H and H T) are in reality the same one. They argue that if the fourth throw gives a head, then at that point, Blaise has his three heads and has won, so there would be no fifth throw. Accordingly, they argue, the correct way to think about the end of the game is that there are actually only three possibilities, namely

H TH TT

in which case, Blaise has a 2-to-1 advantage and the pot should be divided 2/3 for Blaise and 1/3 for Pierre, not 3/4 and 1/4. This reasoning is incorrect, but it took Pascal and Fermat some time to resolve this issue. Their colleagues, whom they consulted as they wrestled with the matter, had differing opinions. So if you are one of those people who finds this alternative argument appealing (or even compelling), take heart; you are in good company (though still wrong).

The issue behind the dilemma here is complex and lies at the heart of probability theory. The question is, What is the right way to think about the future (more accurately, the range of possible futures) and model it mathematically?
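The counting argument in Devlin's example can be checked mechanically. The sketch below (variable names are mine) enumerates the four equally likely two-toss completions, then shows why the three "stopped" endings H, TH, TT cannot simply be counted: they are not equally likely, and weighting each by its probability recovers the same 3/4.

```python
from fractions import Fraction
from itertools import product

# Blaise (heads) leads 2-1 in a best-of-five, so two tosses remain.
completions = ["".join(t) for t in product("HT", repeat=2)]  # HH, HT, TH, TT

def blaise_wins(completion):
    """Blaise needs just one more head to reach three."""
    return "H" in completion

# Fermat's count: all four two-toss completions are equally likely.
fermat_share = Fraction(sum(blaise_wins(c) for c in completions), len(completions))

# The tempting alternative stops at the first head, giving endings H, TH, TT.
# An ending of n tosses has probability (1/2)**n, so these are NOT equally likely.
stopped = {"H": Fraction(1, 2), "TH": Fraction(1, 4), "TT": Fraction(1, 4)}
weighted_share = sum(p for ending, p in stopped.items() if "H" in ending)

# Naively counting the three endings as equally likely gives the wrong answer.
naive_share = Fraction(sum("H" in e for e in stopped), len(stopped))

print(fermat_share, weighted_share, naive_share)
```

The first two values agree at 3/4, while the naive count gives 2/3.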

The key insight was one that Cardano had already flailingly grasped at, but was difficult to understand even for Pascal:

As I observed earlier in this chapter, Cardano had already realized that the key was to look at the number of points each player would need in order to win, not the points they had already accumulated. In the second section of his letter to Fermat, Pascal acknowledged the tricky point we just encountered ourselves, that you have to look at all possible ways the game could have played out, ignoring the fact that the players would normally stop once one person had clearly won. But Pascal’s words make clear that he found this hard to grasp, and he accepted it only because the great Fermat had explained it in his previous letter.

Elsewhere, Keith Devlin writes:

Today, we would use the word probability to refer to the focus of Pascal and Fermat’s discussion, but that term was not introduced until nearly a century after the mathematicians’ deaths. Instead, they spoke of “hazards,” or number of chances. Much of their difficulty was that they did not yet have the notion of mathematical probability—because they were in the process of inventing it.

From our perspective, it is hard to understand just why they found it so difficult. But that reflects the massive change in human thinking that their work led to. Today, it is part of our very worldview that we see things in terms of probabilities.

Extensions

Handing over to mathematics

To solve a philosophical problem is to take it out of the realm of philosophy. Once the fundamental methodology is agreed upon, the question can be spun off into its own independent field.

The development of probability is often considered part of Pascal’s mathematical rather than philosophical work. But I think the mathematisation of probability is in an important sense philosophical. In another post, I write much more about why successful philosophy often looks like mathematics in retrospect.

After Pascal and Fermat’s breakthrough, things developed very fast, highlighting once again the specificity of that initial step.

Keith Devlin writes:

In 1654, Pascal had struggled hard to understand why Fermat counted endings of the unfinished game that would never have arisen in practice (“it is not a general method and it is good only in the case where it is necessary to play exactly a certain number of times”). Just fifteen years later, in 1669, Christiaan Huygens was using axiom-based abstract mathematics on top of statistically processed data tables to determine the probability that a sixteen-year-old young man would die before he reached thirty-six.

After the crucial first step of formalisation, probability was ripe to be handed over to mathematicians. SEP writes:

These early calculations [of Pascal, Fermat and Huygens] were considerably refined in the eighteenth century by the Bernoullis, Montmort, De Moivre, Laplace, Bayes, and others (Daston 1988; Hacking 2006; Hald 2003).

For example, the crucial idea of conditional probability was developed. According to MathOverflow, in the 1738 second edition of The Doctrine of Chances, de Moivre writes,

The Probability of the happening of two Events dependent, is the product of the Probability of the happening of one of them, by the Probability which the other will have of happening, when the first shall be consider’d as having happened; and the same Rule will extend to the happening of as many Events as may be assigned.
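In modern notation (the symbols $A$, $B$, $A_i$ are mine, not de Moivre's), the rule he states is the multiplication rule for conditional probabilities, together with its extension to any number of events:

```latex
P(A \cap B) = P(A)\, P(B \mid A),
\qquad
P(A_1 \cap \dots \cap A_n) = \prod_{i=1}^{n} P\bigl(A_i \mid A_1 \cap \dots \cap A_{i-1}\bigr).
```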

People began to get it, philosophically speaking. We begin to see quotes that, unlike those of Cicero, sound decidedly modern. In his book Ars conjectandi (The Art of Conjecture, 1713), Jakob Bernoulli wrote:

To conjecture about something is to measure its probability. The Art of Conjecturing or the Stochastic Art is therefore defined as the art of measuring as exactly as possible the probabilities of things so that in our judgments and actions we can always choose or follow that which seems to be better, more satisfactory, safer and more considered.

Keith Devlin writes:

Within a hundred years of Pascal’s letter, life-expectancy tables formed the basis for the sale of life annuities in England, and London was the center of a flourishing marine insurance business, without which sea transportation would have remained a domain only for those who could afford to assume the enormous risks it entailed.

Axiomatisation

Much later, probability theory was put on an unshakeable footing, with Kolmogorov’s axioms.
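For reference, the axioms in their usual modern statement are given below, for a probability measure $P$ on the events of a sample space $\Omega$. The notation is the standard textbook one rather than anything taken from this post.

```latex
\begin{align*}
&\text{(1) Non-negativity:} && P(E) \ge 0 \ \text{for every event } E,\\
&\text{(2) Normalisation:} && P(\Omega) = 1,\\
&\text{(3) Countable additivity:} && P\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} P(E_i)
  \ \text{for pairwise disjoint } E_1, E_2, \dots
\end{align*}
```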

Counter-intuitive implications of probability theory

I’ve given many examples of how people used to be confused about probability. In case you find it hard to empathise with these past thinkers, I should remind you that even today probability theory can be hard to grasp intuitively.

The conjunction fallacy

Wikipedia:

The most often-cited example of this fallacy originated with Amos Tversky and Daniel Kahneman. Although the description and person depicted are fictitious, Amos Tversky’s secretary at Stanford was named Linda Covington, and he named the famous character in the puzzle after her.

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

  1. Linda is a bank teller.
  2. Linda is a bank teller and is active in the feminist movement.

The majority of those asked chose option 2. However, the probability of two events occurring together (in “conjunction”) is always less than or equal to the probability of either one occurring alone.
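The inequality follows in one line from additivity and non-negativity. Writing $T$ for "Linda is a bank teller" and $F$ for "Linda is active in the feminist movement" (labels chosen here), the event $T$ splits into two disjoint parts:

```latex
P(T) \;=\; P(T \cap F) + P(T \cap \neg F) \;\ge\; P(T \cap F).
```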

The Monty Hall problem

Wikipedia:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

Vos Savant’s response was that the contestant should switch to the other door (vos Savant 1990a). Under the standard assumptions, contestants who switch have 2/3 chance of winning the car, while contestants who stick to their initial choice have only a 1/3 chance.

The given probabilities depend on specific assumptions about how the host and contestant choose their doors. A key insight is that, under these standard conditions, there is more information about doors 2 and 3 that was not available at the beginning of the game, when the door 1 was chosen by the player: the host’s deliberate action adds value to the door he did not choose to eliminate, but not to the one chosen by the contestant originally.
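The 1/3 versus 2/3 split can be verified by exact enumeration. The sketch below spells out the standard assumptions: the car is equally likely to be behind each door, the host never opens the contestant's door or the car's door, and he chooses uniformly when he has two goat doors to pick from. Function and variable names are mine.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def win_probabilities(pick=1):
    """Exact chances of winning by sticking and by switching, given the first pick."""
    stick = switch = Fraction(0)
    for car in DOORS:
        p_car = Fraction(1, len(DOORS))  # car equally likely behind each door
        # The host may open any door that is neither the pick nor the car.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            p = p_car / len(openable)    # host chooses uniformly among those doors
            remaining = next(d for d in DOORS if d not in (pick, opened))
            stick += p * (pick == car)
            switch += p * (remaining == car)
    return stick, switch

stick, switch = win_probabilities()
print(stick, switch)
```

Changing the host's behaviour (for example, letting him open a door at random, car included) changes the enumeration and the answer, which is why the quoted passage stresses the assumptions.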

The mammography problem

Yudkowsky:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

What do you think the answer is? If you haven’t encountered this kind of problem before, please take a moment to come up with your own answer before continuing.

Next, suppose I told you that most doctors get the same wrong answer on this problem - usually, only around 15% of doctors get it right. (“Really? 15%? Is that a real number, or an urban legend based on an Internet poll?” It’s a real number. See Casscells, Schoenberger, and Grayboys 1978; Eddy 1982; Gigerenzer and Hoffrage 1995; and many other studies. It’s a surprising result which is easy to replicate, so it’s been extensively replicated.)

Most doctors estimate the probability to be between 70% and 80%. The correct answer is 7.8%.
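The 7.8% figure is a direct application of Bayes' theorem to the three numbers in the problem. A minimal sketch, with variable names chosen here:

```python
prior = 0.01                 # P(cancer) among women screened at forty
sensitivity = 0.80           # P(positive | cancer)
false_positive_rate = 0.096  # P(positive | no cancer)

# Bayes' theorem: P(cancer | positive) = P(positive and cancer) / P(positive)
true_positives = prior * sensitivity
false_positives = (1 - prior) * false_positive_rate
posterior = true_positives / (true_positives + false_positives)

print(f"{posterior:.1%}")
```

Out of 10,000 women, about 80 have cancer and test positive, while about 950 test positive without cancer. Positives are therefore dominated by the healthy majority.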

  1. More on Cardano, in Section 4.3 of Hald:

    [Cardano’s] De Ludo Aleae is a treatise on the moral, practical, and theoretical aspects of gambling, written in colorful language and containing some anecdotes on Cardano’s own experiences. Most of the theory in the book is given in the form of examples from which general principles are or may be inferred. In some cases Cardano arrives at the solution of a problem through trial and error, and the book contains both the false and the correct solutions. He also tackles some problems that he cannot solve and then tries to give approximate solutions. […] In Chap. 14, he defines the concept of a fair game in the following terms:

    So there is one general rule, namely, that we should consider the whole circuit [the total number of equally possible cases], and the number of those casts which represents in how many ways the favorable result can occur, and compare that number to the remainder of the circuit, and according to that proportion should the mutual wagers be laid so that one may contend on equal terms.

Philosophy success story III: possible worlds semantics

March 31, 2018

This is part of my series on success stories in philosophy. See this page for an explanation of the project and links to other items in the series.

Contents

  1. Intensions rescued from darkness
  2. Applications
    1. Future contingents
    2. Modality de dicto vs modality de re

Intensions rescued from darkness

Grant that “animals with a kidney” and “animals with a heart” designate the same set. They have the same extension. Yet their meaning is clearly different.1 In On Sense and Reference (“Über Sinn und Bedeutung”, 1892), Frege had already noticed this.

Classical predicate logic’s achievement was to give a precise and universal account of how the designation of a sentence depends on the designation of its parts. It was a powerful tool for both deduction and clarification, revealing the ambiguity of ordinary language. I discuss this in detail in the first success story.

Classical logic was developed to model the reasoning needed in mathematics, where the difference between meaning and designation is unimportant. Outside of mathematics, where meaning and designation can come apart, classical logic was inadequate. A formal account of meaning was lacking. Frege called it sense (“Sinn”). According to Sam Cumming, “Frege left his notion of sense somewhat obscure”. Frege appeared to endorse the criterion of difference for senses:

Two sentences S and S* differ in sense if and only if some rational agent who understood both could, on reflection, judge that S is true without judging that S* is true.

This is not adequately formal. Letting meaning depend on the conclusions of some “rational agent” leaves it at the level of intuition. The criterion does not even attempt to give a formal model of meaning; it simply gives a condition for meanings to differ.

Meaning began to seem metaphysically suspect, like a ghostly “extra” property tacked on to every predicate. SEP tells us:

Intensional entities have of course featured prominently in the history of philosophy since Plato and, in particular, have played natural explanatory roles in the analysis of intentional attitudes like belief and mental content. For all their prominence and importance, however, the nature of these entities has often been obscure and controversial and, indeed, as a consequence, they were easily dismissed as ill-understood and metaphysically suspect “creatures of darkness”2 (Quine 1956, 180) by the naturalistically oriented philosophers of the early- to mid-20th century.

The contribution of possible worlds semantics was to give a precise formal description of these “creatures of darkness”, bringing them into the realm of respectability.

Simply: intensions are extensions across possible worlds.

Sider (Logic for Philosophy p.290) writes:

we relativize the interpretation of predicates to possible worlds. The interpretation of a two-place predicate, for example, was in nonmodal predicate logic a set of ordered pairs of members of the domain; now it is a set of ordered triples, two members of which are in the domain, and one member of which is a possible world. When $\langle u_1, u_2, w \rangle$ is in the interpretation of a two-place predicate $R$, that represents $R$’s applying to $u_1$ and $u_2$ in possible world $w$. This relativization makes intuitive sense: a predicate can apply to some objects in one possible world but fail to apply to those same objects in some other possible world. These predicate-interpretations are known as “intensions”. The name emphasizes the analogy with extensions, which are the interpretations of predicates in nonmodal predicate logic. The analogy is this: the intension of a predicate can be thought of as determining an extension within each possible world.
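The idea that an intension is an extension in each possible world can be made concrete with a toy model. Everything in the sketch below is hypothetical and chosen for illustration: the two worlds, the tiny domain of animals, and the particular predicates. It shows two predicates that agree in the actual world yet differ in intension, and a two-place predicate represented as a set of triples, as in Sider's description.

```python
# Hypothetical toy model: two possible worlds and a small domain of animals.
WORLDS = ("actual", "w1")

# A one-place intension maps each world to an extension (a set of individuals).
has_kidney = {"actual": {"cat", "dog"}, "w1": {"cat"}}
has_heart = {"actual": {"cat", "dog"}, "w1": {"cat", "dog"}}

def extension(intension, world):
    """The set of things the predicate applies to in the given world."""
    return intension[world]

def same_extension(p, q, world):
    return extension(p, world) == extension(q, world)

def same_intension(p, q):
    """Two predicates share an intension only if they agree in every world."""
    return all(same_extension(p, q, w) for w in WORLDS)

# A two-place intension as a set of (u1, u2, world) triples.
chases = {("dog", "cat", "actual"), ("cat", "dog", "w1")}

def applies(relation, u1, u2, world):
    return (u1, u2, world) in relation

print(same_extension(has_kidney, has_heart, "actual"))  # same designation
print(same_intension(has_kidney, has_heart))            # different meaning
```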

Applications

Future contingents

Aristotle famously used the case of a sea-battle to (seemingly) argue against the law of the excluded middle:

Let me illustrate. A sea-fight must either take place to-morrow or not, but it is not necessary that it should take place to-morrow, neither is it necessary that it should not take place, yet it is necessary that it either should or should not take place to-morrow. Since propositions correspond with facts, it is evident that when in future events there is a real alternative, and a potentiality in contrary directions, the corresponding affirmation and denial have the same character.

This is the case with regard to that which is not always existent or not always nonexistent. One of the two propositions in such instances must be true and the other false, but we cannot say determinately that this or that is false, but must leave the alternative undecided. One may indeed be more likely to be true than the other, but it cannot be either actually true or actually false. It is therefore plain that it is not necessary that of an affirmation and a denial one should be true and the other false. For in the case of that which exists potentially, but not actually, the rule which applies to that which exists actually does not hold good. The case is rather as we have indicated.

People appear to have been confused about this for many centuries. It doesn’t help that Aristotle wrote very ambiguously. Colin Strang (1960) tells us:

VERY briefly, what Aristotle is saying in De Interpretatione, chapter ix is this: if of two contradictory propositions it is necessary that one should be true and the other false, then it follows that everything happens of necessity; but in fact not everything happens of necessity; therefore it is not the case that of two contradictory propositions it is necessary that one should be true and the other false; the propositions for which this does not hold are certain particular propositions about the future.

The reader is warned that what Aristotle is saying is ambiguous (cf. Miss Anscombe, loc. cit. p. 1).

SEP tells us:

The interpretative problems regarding Aristotle’s logical problem about the sea-battle tomorrow are by no means simple. Over the centuries, many philosophers and logicians have formulated their interpretations of the Aristotelian text (see Øhrstrøm and Hasle 1995, p. 10 ff.).

The SEP article is very long, and features Leibniz and some pretty funky-looking graphs. I recommend it if you want to experience some confusion.

Aristotle could be taken to reason thus:

  1. If Battle, then it cannot be that No Battle
  2. If it cannot be that No Battle, then necessarily Battle
  3. If Battle, then Necessarily Battle

But this is an obvious modal fallacy, drawing on the ambiguity of (1) between

  • The true statement which implies
  • The false statement

Philosophy is littered with variations on this confusion between necessity of the consequence and necessity of the consequent.
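In symbols, the two readings of premise (1) can be set side by side. The letters are my own labels: $B$ for "there is a battle tomorrow" and $N$ for "there is no battle tomorrow". The first formula is the necessity of the consequence, and is true because $B$ and $N$ are contradictories. The second is the necessity of the consequent, which is what the argument needs but which does not follow from the first.

```latex
\begin{align*}
&\text{Necessity of the consequence (true):} && \Box\,(B \rightarrow \neg N)\\
&\text{Necessity of the consequent (not entailed):} && B \rightarrow \Box\,\neg N
\end{align*}
```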

Modality de dicto vs modality de re

As the SEP page on Medieval theories of modality will amply demonstrate, confusion reigned long after Aristotle’s day. Quine (Word and Object) was baffled by talk of a difference between necessary and contingent attributes of an object, but used some quite fallacious arguments in attacking that difference:

Perhaps I can evoke the appropriate sense of bewilderment as follows. Mathematicians may conceivably be said to be necessarily rational and not necessarily two-legged; and cyclists necessarily two-legged and not necessarily rational. But what of an individual who counts among his eccentricities both mathematics and cycling? Is this concrete individual necessarily rational and contingently two-legged or vice versa? Just insofar as we are talking referentially of the object, with no special bias towards a background grouping of mathematicians as against cyclists or vice versa, there is no semblance of sense in rating some of his attributes as necessary and others as contingent. Some of his attributes count as important and others as unimportant, yes, some as enduring and others as fleeting; but none as necessary or contingent.

SEP writes: “Most philosophers are now convinced, however, that Quine’s “mathematical cyclist” argument has been adequately answered by Saul Kripke (1972), Alvin Plantinga (1974) and various other defenders of modality de re.”

And elsewhere:

(15) Algol is a dog essentially:

Sentences like (15) in which properties are ascribed to a specific individual in a modal context are said to exhibit modality de re (modality of the thing). Modal sentences that do not, like

Necessarily, all dogs are mammals: are said to exhibit modality de dicto (roughly, modality of the proposition).

As Plantinga writes, Quine has us confused:

The essentialist, Quine thinks, will presumably accept (35) Mathematicians are necessarily rational but not necessarily bipedal and (36) Cyclists are necessarily bipedal but not necessarily rational.

But now suppose that (37) Paul J. Swiers is both a cyclist and a mathematician. From these we may infer both (38) Swiers is necessarily rational but not necessarily bipedal and (39) Swiers is necessarily bipedal but not necessarily rational

which appear to contradict each other twice over. This argument is unsuccessful as a refutation of the essentialist. For clearly enough the inference of (39) from (36) and (37) is sound only if (36) is read de re; but, read de re, there is not so much as a ghost of a reason for thinking that the essentialist will accept it.

But possible worlds semantics also illuminates the intuition that was likely behind Quine’s dismissal of de re modality. SEP:

Possible world semantics provides an illuminating analysis of the key difference between [modality de re and modality de dicto]: The truth conditions for both modalities involve a commitment to possible worlds; however, the truth conditions for sentences exhibiting modality de re involve in addition a commitment to the meaningfulness of transworld identity, the thesis that, necessarily, every individual (typically, at any rate) exists and exemplifies (often very different) properties in many different possible worlds.

Beautiful.
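Plantinga's reply to the mathematical cyclist can be sketched with a toy possible-worlds model. Everything in it is an assumption made for illustration: the three worlds, the property labels, and the helper names `de_dicto` and `de_re`. Whether there really are worlds like `w_math` and `w_cycle` is exactly what the essentialist and her opponent may dispute; the sketch only shows that the inference is not valid as a matter of form.

```python
# Each world assigns properties to the same individual (transworld identity).
WORLDS = {
    "actual": {"swiers": {"cyclist", "mathematician", "bipedal", "rational"}},
    "w_math": {"swiers": {"mathematician", "rational"}},  # not a cyclist here
    "w_cycle": {"swiers": {"cyclist", "bipedal"}},        # not a mathematician here
}


def de_dicto(kind, attribute):
    """Necessarily, everything of this kind has the attribute."""
    return all(
        attribute in props
        for world in WORLDS.values()
        for props in world.values()
        if kind in props
    )


def de_re(individual, attribute):
    """This individual has the attribute in every world."""
    return all(attribute in world[individual] for world in WORLDS.values())


print(de_dicto("mathematician", "rational"))  # True: (35) read de dicto
print(de_dicto("cyclist", "bipedal"))         # True: (36) read de dicto
print(de_re("swiers", "rational"))            # False
print(de_re("swiers", "bipedal"))             # False: (39) does not follow
```

Both general claims hold de dicto in this model, Swiers is actually both a cyclist and a mathematician, and yet neither de re claim about him holds.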

  1. Ordinary-language predicates can be ambiguous between sense and reference. Ordinary-language names can also be ambiguous in the same way, as with “Hesperus = Phosphorus”. But Kripke himself (!) didn’t appear to see this, and it took the development of two-dimensional semantics (Stanford, see also Sider’s Logic for Philosophy, chapter 10, and Chalmers) to make it clear. I don’t count this as a success story because 2D semantics has yet to gain consensus approval.

  2. In Quantifiers and Propositional Attitudes (1956) Quine wrote: “Intensions are creatures of darkness, and I shall rejoice with the reader when they are exorcised, but first I want to make certain points with help of them.” My understanding is that Quine had a pre-possible worlds understanding of “intensions”, equivalent to Frege’s senses and hence still informal. So in today’s usage the quote would be rendered as “Meanings are creatures of darkness”. Quine was writing in 1956. Kripke published Semantical Considerations on Modal Logic in 1963. 

What success in philosophy sometimes looks like

March 30, 2018

Many success stories in philosophy can usefully be viewed as disambiguations or formalisations.

Disambiguation

Wittgenstein wrote that “philosophy is a battle against the bewitchment of our intelligence by means of language”. Ordinary language developed to work in ordinary contexts. When we deal with philosophically tricky issues, however, ordinary language rarely coincides with the underlying concepts in a one-to-one mapping. Sometimes ordinary language will use two different words for the same concept. This case rarely leads to problems. But when instead ordinary terms are ambiguous between two or more meanings, this is fertile ground for confusion. A lot of good philosophy disambiguates between these meanings to dissolve apparent paradoxes.

Phrases that have been disambiguated include:

Formalisation

Sometimes people find my purported success stories mathematical rather than philosophical. I’ve even been accused of lumping the whole of mathematics into philosophy. I see why this intuition is compelling. Logic, the analysis of computability, Bayesianism and so on just look mathsy. It seems natural to cluster them with maths rather than philosophy. And that definitely makes sense in some contexts.

Here, I’m trying to understand how philosophy works, and what it can do for us when it’s successful. In that context, I claim, these stories should be clustered with philosophy. We should look beyond superficial patterns, like what the work looks like on the printed page, and instead ask: what kind of cognitive work is being done?

Now is a good time to ask: what do we call mathematics? In primary school, you might get away with defining mathematics as that which deals with quantity or number. But modern mathematics goes far beyond that. Wikipedia tells us: “Starting in the 19th century, when the study of mathematics increased in rigour and began to address abstract topics such as group theory and projective geometry, which have no clear-cut relation to quantity and measurement, mathematicians and philosophers began to propose a variety of new definitions. Some of these definitions emphasize the deductive character of much of mathematics, some emphasize its abstractness, some emphasize certain topics within mathematics”.

I want to emphasise that whenever something is sufficiently formal, we tend to call it mathematical. Mathematics uses the form of strings to manipulate them according to perfectly precise rules. (I hope this is uncontroversial. I take no view on whether mathematics is only formalism).

Before we knew how to reason about the trajectories of medium-sized objects, we speculated and used vague verbiage. Since classical mechanics was solved, we use coordinates and derivatives. Object trajectories have been mathematised. But nothing about the subject matter of trajectories has changed, or (I claim) was distinctive in the first place. Formalisation is just what it looks like to fully solve a conceptual problem. Once we fully understood trajectories, they “became part of mathematics”.

Here’s another example. Logic has nothing to do with quantity or number, but is often called mathematical, and ‘’ and ‘’ are said to be mathematical symbols. Sider (Logic for Philosophy) writes: “Modern logic is called “mathematical” or “symbolic” logic, because its method is the mathematical study of formal languages. Modern logicians use the tools of mathematics (especially, the tools of very abstract mathematics, such as set theory) to treat sentences and other parts of language as mathematical objects.” But logic is just the culmination of a long-standing project: to distinguish good from bad arguments. Formal logic means we have succeeded fully. We have wholly clarified certain kinds of deductive reasoning.

I don’t mean to claim that all of mathematics should be clustered with philosophy. I just mean the initial mathematisation of a previously informal area of study. Once the formal cornerstones have been laid, philosophy really does hand off to mathematics. My rough picture of intellectual progress is the following:

  1. Confusion reigns. People get lost in vague verbiage, and there is no standard way to adjudicate disagreements.
  2. Much work is done in the service of clarification. Ultimately, maximal clarification is achieved through formalisation.
  3. With a formal system at hand, people go to town with it, proving things left and right, extending the system, and so on.
  4. We begin to view this area of study as mathematical or even part of mathematics.

Stage (1) is what most people think philosophy looks like. I say: it’s philosophy when it’s still failing. Stage (2) is successful philosophy (or at least one kind of it). But the philosophical nature of the contribution in (2) is often forgotten in the subsequent wave of mathematical enthusiasm for steps (3) and (4).

I hope I’ve now built the intuition enough to move on to the success stories that people have found most counter-intuitive.

With the analysis of computability, the philosophical work of clarification was to formalise the notion of effective calculability with a Turing machine. This allowed mathematical work to be done with the formal notion. In this case, Turing did step (3) immediately: in the same paper, he went on to prove many results about Turing machines. So Turing’s paper is, in some sense, first some philosophy, then some mathematics. Wikipedia tells us that Hilbert’s problems ranged greatly in precision. Some of them are propounded precisely enough to enable a clear affirmative or negative answer, while others had to be substantially clarified. The Entscheidungsproblem was more philosophical because it involved significant work of clarification. And it’s a particularly cool story, because the precisification proposed by Turing turned out to (i) gain virtually universal approval and (ii) have wide philosophical significance and applicability.

In the case of the development of probability theory, it’s emphatically not the case that, pre-Pascal, people were disagreeing on a point of mathematics. They were much more deeply confused. They just had no appropriate notion of probability or expected value, and were trying to cobble together solutions to particular problems using ad-hoc intuitions. Because Pascal launched probability theory, it seems only natural to view his first step as part of probability theory. But in an important sense the first step is very different. It’s much more philosophical.

The successes and failures of conceptual analysis

March 30, 2018

Source: SMBC

Contents

  1. Introduction
  2. Examples
    1. Knowledge
    2. Belief
    3. Species
    4. Temperature
    5. Speed and acceleration
    6. The epsilon-delta definition of a limit
    7. Effective calculability
    8. Causation
  3. How the most successful conceptual analyses become definitions

Introduction

The point has been made often and well (Wittgenstein, Ramsey, Muehlhauser, Yudkowsky) that conceptual analysis is doomed, because it rests on falsified assumptions about human cognition and on a mistaken view of the nature of empirical categories.

A first problem is with necessary and sufficient conditions:

Category-membership for concepts in the human brain is not a yes/no affair, as the “necessary and sufficient conditions” approach of the classical view assumes. Instead, category membership is fuzzy. (Muehlhauser)

This first problem could be solved with a new type of conceptual analysis, one admitting of degree. However, a deeper problem arises from the requirement that an analysis admit of no intuitive counterexamples:

[…] most of our empirical concepts are not delimited in all possible directions. Suppose I come across a being that looks like a man, speaks like a man, behaves like a man, and is only one span tall – shall I say it is a man? Or what about the case of a person who is so old as to remember King Darius? Would you say he is an immortal? Is there anything like an exhaustive definition that finally and once for all sets our mind at rest? ‘But are there not exact definitions at least in science?’ Let’s see. The notion of gold seems to be defined with absolute precision, say by the spectrum of gold with its characteristic lines. Now what would you say if a substance was discovered that looked like gold, satisfied all the chemical tests for gold, whilst it emitted a new sort of radiation? ‘But such things do not happen.’ Quite so; but they might happen, and that is enough to show that we can never exclude altogether the possibility of some unforeseen situation arising in which we shall have to modify our definition. (Waismann)

Waismann called this feature of our language open texture.

Clearly these two requirements must be abandoned (there go entire literatures…).

Is all conceptual analysis therefore useless? The view has some appeal. If all we want is to dissolve philosophical confusions through clarification of ambiguities, this can be achieved by stipulating definitions that allow us to be as precise as we want, after which we can abandon other verbiage. Hence SEP tells us:

Another view, held at least in part by Gottlob Frege and Wilhelm Leibniz, is that because natural languages are fraught with vagueness and ambiguity, they should be replaced by formal languages. A similar view, held by W. V. O. Quine (e.g., [1960], [1986]), is that a natural language should be regimented, cleaned up for serious scientific and metaphysical work.

My view is the following: abandoning ambiguous terms in favour of more precise, stipulatively defined ones, i.e. regimentation, is always a legitimate philosophical move. Pragmatically, however, there are costs to doing so. Technical texts with a lot of jargon are difficult to read for a reason. It takes time to communicate the definitions of one’s terms to others. And it takes longer still until our audience gains intuitive familiarity with the new terminology, and can manipulate it with speed and accuracy.

When deciding which words to use, we face a trade-off between precision on the one hand, and agreement with intuitive terminology on the other.

Programming languages are an example of the fully regimented extreme. There is no ambiguity, but coding must be learnt the hard way. The language of small children or pre-scientific civilisations (“a whale is heavier than a bowling ball”), on the other hand, is completely intuitive.

The old view of conceptual analysis, requiring necessary and sufficient conditions, and admitting of no counter-examples, was an attempt to achieve both complete precision and complete intuitiveness. But from its failure it does not follow that all our old words should be regimented away. In some cases that may be the best we can do; some unsalvageable concepts, like what it means for a storm-cloud to be angry, are to be consigned to the dustbin of language. But for other terms, like “causation”, it’s not a foregone conclusion that the optimal way to navigate the trade-off is to abandon the word. We may do better to keep the word, along with its “good enough” definition. Conceptual analysis, on a more modest and fruitful view, is a tool that can help us to find such opportunities.

In general, therefore, I don’t find conceptual analysis particularly exciting. If the use of regimented language dissolves a controversy of analysis, it’s clear that nothing of “philosophical” importance was hanging in the balance in the first place. However, conceptual analyses can be pragmatically useful, and indeed there have been a number of examples I enjoyed. In what follows I list some intellectual phenomena I consider examples of conceptual analysis, and comment on them.

Examples

Knowledge

The “analysis of knowledge merry-go-round” (Weatherson 2003) has rightly been much derided.

Here’s a nice quote by Scott Sturgeon (who used to be my tutor!):

Thirty years ago this journal published the most influential paper of modern analytic epistemology - Edmund Gettier’s ‘Is Justified True Belief Knowledge?’. In it Gettier refuted a classic theory of propositional knowledge by constructing thought experiments to test the theory. A cottage industry was born. Each response to Gettier was quickly met by a new Gettier-style case. In turn there would be a response to the case, a further Gettier scenario, and a reiteration of the process. The industry’s output was staggering. Its literature became so complicated, its thought experiments so baroque, that commonsense was stretched beyond limit.

This is a clear example where regimentation is appropriate. Our epistemic state can be fully described by our beliefs and our evidence. What about “knowledge”? Commit it then to the flames!

Belief

Quoting from an essay I wrote in 2017:

We want a theory of when it is rational to have an outright belief. It seems like we might easily get this from our theory of when it is rational to have a graded belief. Simply say, “it is rational to believe something simpliciter iff it is rational to believe it with a probability p>y.” Let’s call this the threshold view. We won’t be able to put an exact number on y. This merely points to the fact that outright belief-language is somewhat vague. Similarly, in “a person is bald iff they have fewer than z hairs on their head”, z is imprecisely specified, but we still understand what it means to be bald, and we know that .

But the cases of preface and lottery appear to show that the threshold view is false.

Consider the lottery: “Let the threshold y required for belief be any real number less than 1. For example, let y = 0.99. Now imagine a lottery with 100 tickets, and suppose it is rational for you to believe with full confidence that the lottery is fair and that as such there will be only one winning ticket. […] So, it is rational for you to have 0.99 confidence that ticket #1 will not win, 0.99 confidence that ticket #2 will not win, and so on for each of the other tickets. According to the [threshold view], it is rational for you to believe each of these propositions, since it is rational for you to have a degree of confidence in each that is sufficient for belief. But given that rational belief is closed under conjunction, it is also rational for you to believe that (ticket #1 will not win and ticket #2 will not win . . . and ticket #100 will not win)” (Foley 1992). However, this is a contradiction with your belief that the lottery is fair, i.e., that exactly one ticket will win the lottery. Thus y cannot be 0.99. The same conclusion can be reached for any probability y<1: simply create a lottery with 1/(1-y) tickets, and argue as before. Thus the threshold cannot be any less than 1. This clearly will not do, as it violates our intuitions about everyday uses of ‘believe’, as in “I believe it will rain tomorrow”.

Similarly, consider now the preface: “You write a book, say a history book. In it you make many claims, each of which you can adequately defend. In particular, suppose it is rational for you to have a degree of confidence x in each of these propositions, where x is sufficient for belief but less than 1. Even so, you admit in the preface that you are not so naive as to think that your book contains no mistakes. You understand that any book as ambitious as yours is likely to contain at least a few errors. So, it is highly likely that at least one of the propositions you assert in the book, you know not which, is false. Indeed, if you were to add appendices with propositions whose truth is independent of those you have defended previously, the chances of there being an error somewhere in your book becomes greater and greater. Nevertheless, given that rational belief is closed under conjunction, it cannot be rational for you to believe that your book contains any errors” (Foley 1992). Thus, if it is rational to believe each of the propositions that make up your book, then it is also rational to believe their conjunction, despite your having a low degree of confidence in that conjunction. Indeed, as before, your degree of confidence in the conjunction can be made arbitrarily low by adding more chapters to the book.

“After all, what reasons do we have to be interested in an [invariantist] theory of rational belief [simpliciter] if we have an adequate [invariantist] theory of rational degrees of belief? Does the former tell us anything useful above and beyond the latter? Is it really needed for anything? It doesn’t seem to be needed for the theory of rational decision making.” (Foley 1992). The fact that our ordinary-language usage of ‘belief’ cannot fully account for the laws of probability is simply a kink of ordinary language. Ordinarily, we do not speak about things like very long conjunctions concerning lottery tickets. The shorthand word ‘belief’ deals well with most cases we do ordinarily encounter. In other cases, we can simply retreat to using the language of degrees of belief.

Yeah, we don’t need to conceptually analyse ‘belief’. It’s probably outright harmful to keep using that word.
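The arithmetic behind the lottery and the preface is easy to check. Below is a minimal sketch; the function names `lottery_size` and `claims_until_doubt` are mine, the 0.5 cut-off for “low confidence in the conjunction” is an arbitrary choice, and the preface calculation assumes the claims are probabilistically independent (as in Foley’s appendices).

```python
from fractions import Fraction
from math import ceil


def lottery_size(threshold):
    """Tickets needed, 1/(1-y) rounded up, for 'ticket i loses' to reach threshold y."""
    return ceil(1 / (1 - threshold))


def claims_until_doubt(confidence, floor=Fraction(1, 2)):
    """Smallest number of independent claims whose conjunction falls below `floor`.

    Assumes confidence < 1, otherwise the conjunction never drops.
    """
    n, conjunction = 0, Fraction(1)
    while conjunction >= floor:
        conjunction *= confidence
        n += 1
    return n


y = Fraction(99, 100)
n = lottery_size(y)
each_ticket_loses = 1 - Fraction(1, n)

# Lottery: every single "ticket i loses" clears the threshold, yet the
# conjunction "every ticket loses" has probability 0 in a fair lottery.
print(n, each_ticket_loses >= y)  # 100 True

# Preface: number of claims held at 0.99 before their conjunction drops below 0.5.
print(claims_until_doubt(y))      # 69
```

Exact fractions are used rather than floats so that 1/(1-y) comes out as exactly 100.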

Species

Humans have long understood that animals come in relatively sharply delineated clusters. By using a word for, say, “pig” and another for “dog”, we are making use of these categories. More recently, modern biology has developed the concept of “species”. Wikipedia explains that “a species is often defined as the largest group of organisms in which two individuals can produce fertile offspring, typically by sexual reproduction”.

This definition can be viewed as a proposed conceptual analysis of our pre-scientific, or folk-biological, concept of “type of animal”.

This analysis does really well, on hundreds of folk biological categories! We are so used now to the concept of species that this remarkable fact may appear obvious. There are some problem cases, too: elephants are three species, while a caterpillar and a butterfly can be the same species.

What is more, even the more regimented concept of species is too imprecise for some use cases. Wikipedia says: “For example, with hybridisation, in a species complex of hundreds of similar microspecies, or in a ring species, the boundaries between closely related species become unclear.”
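The ring-species problem can be made concrete with a toy model. Here I assume four hypothetical populations A, B, C, D arranged in a ring, where only neighbouring populations can interbreed; I also treat whole populations rather than individuals as the units, which is a simplification of the definition quoted above.

```python
from itertools import combinations

POPULATIONS = "ABCD"
# Assumption: only neighbouring populations in the ring can interbreed.
CAN_INTERBREED = {frozenset("AB"), frozenset("BC"), frozenset("CD")}


def is_interbreeding_group(group):
    """True if every pair of populations in the group can interbreed."""
    return all(frozenset(pair) in CAN_INTERBREED for pair in combinations(group, 2))


def largest_groups():
    """All interbreeding groups that are not contained in a bigger one."""
    groups = [
        set(candidate)
        for size in range(1, len(POPULATIONS) + 1)
        for candidate in combinations(POPULATIONS, size)
        if is_interbreeding_group(candidate)
    ]
    maximal = [g for g in groups if not any(g < other for other in groups)]
    return sorted("".join(sorted(g)) for g in maximal)


print(largest_groups())  # ['AB', 'BC', 'CD']
```

The “largest groups” overlap: B belongs to two of them. Because interbreeding is not transitive here, the definition does not carve the populations into disjoint species.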

Temperature

The definition of temperature as mean molecular kinetic energy can be viewed as a conceptual analysis. Wikipedia says that temperature is “a physical quantity expressing the subjective perceptions of hot and cold”. And by and large it does excellently. However, it fails with spicy (“hot”) food.

Does this exception mean we need to regiment away common-sense notions of hot and cold? No! This illustrates how analyses that admit of exceptions can still be useful.

Speed and acceleration

Heisenberg:

The concepts of classical physics are just a refinement of the concepts of daily life and are an essential part of the language which forms the basis of all natural science.

Speed is the first derivative of location with respect to time, and acceleration is the second derivative.

This is a conceptual analysis so successful that the definition resulting from the analysis has replaced, in most adult speakers, the intuitive notion. (Something we have already seen to some extent with species). For this reason, it’s hard to see that it was, in fact, a conceptual analysis.

Where can we find remnants of the pre-scientific, ur-intuitive notion of speed? The theories of Aristotle and small children seem like a good place to start in search of this pristine naiveté.

Per Macagno 1991, Aristotle had piecemeal correct intuitions about speed, but he did not see the more general point:

Although Aristotle discusses in detail when a motion is faster than another by considering the space traversed and the corresponding time, he never arrived at what is so elementary for us: . He considers several cases; for instance, in the case , velocity is larger than if . To divide a distance by a time was not an acceptable operation, if it was considered at all […]

Similarly, 11-year olds get time, distance, and speed right in most but not all cases (Siegler and Richards 1979):

Children were shown two parallel train tracks with a locomotive on each of them. The two locomotives could start from the same or different points, could stop at the same or different points, and could go the same or different distances. They could start at the same or different times, could stop at the same or different times, and could travel for the same or different total time. Finally, they could go at the same or different speeds. […]

On the time concept, the state before full mastery seemed to be one in which time and distance were only partially differentiated. This was evident in the use of the distance rule to judge time by a large number of 11-year-olds.

Similarly, I would expect (although citation needed) that many children who have a good grasp of the difference between position and speed (i.e. they would not say that whichever train ended farther ahead on the tracks travelled for the longer time, or the faster speed, or the greater distance), still do not clearly distinguish speed from acceleration. For instance, they might say: “whoa, the car went so fast just then - I was really pressed up against my seat.”

Once we have conceptually analysed speed and acceleration as the first and second derivatives of position with respect to time, we have a powerful new formal tool. We can use this tool to create new concepts which did not exist in natural language. For example, the third time-derivative of position is jerk. Understanding jerk has many uses, for instance to build quadcopters and other drones.
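As a small illustration of the formal tool, here is a sketch that recovers speed, acceleration and jerk from sampled positions by repeated finite differences. The trajectory x(t) = t³ and the helper name `differences` are chosen by me for the example; finite differences only approximate derivatives in general, though for this cubic the third difference happens to be exact.

```python
def differences(samples, dt):
    """Forward finite differences: the average rate of change over each time step."""
    return [(later - earlier) / dt for earlier, later in zip(samples, samples[1:])]


dt = 1.0
positions = [t ** 3 for t in range(6)]     # x(t) = t^3 sampled at t = 0..5

speed = differences(positions, dt)         # first derivative (approximate)
acceleration = differences(speed, dt)      # second derivative (approximate)
jerk = differences(acceleration, dt)       # third derivative

print(speed)         # [1.0, 7.0, 19.0, 37.0, 61.0]
print(acceleration)  # [6.0, 12.0, 18.0, 24.0]
print(jerk)          # [6.0, 6.0, 6.0]
```

The passenger pressed against the seat is feeling the second list, not the first.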

Finally, here’s an analogy from the opposite direction, taken from Mori, Kojima and Tadang (1976).

Children are likely to judge the speed from temporal precedence and say, “It went faster because it arrived earlier.” In the Japanese language, the two words meaning fast in speed and early in temporal precedence, respectively, have the same pronunciation, i.e., hayai. On the other hand, in the Thai language, these two words are differentiated in their pronunciation as well as in meaning; the one that means high speed is re0 and the other one that means temporal precedence is khon. […]

The Japanese and Thai children were shown the same visual displays of moving objects and asked to compare the speed of those moving objects. The results significantly indicate that Thai children’s concept of speed is further advanced than that of Japanese children.

The Japanese children are to the Thai children like Aristotle is to a modern student armed with the formal notion of acceleration. It’s possible to go beyond ordinary English with the formal language of physics, but it’s also possible to lag behind ordinary English with (children’s understanding of) Japanese. Similarly, “the Pirahã language and culture seem to lack not only the words but also the concepts for numbers, using instead less precise terms like “small size”, “large size” and “collection”.”

The epsilon-delta definition of a limit

See my other post on the success story of predicate logic.

Effective calculability

See my other post on the success story of computability.

Causation

The conceptual analysis of causation fills many a textbook. Here I’ll focus on just the counterfactual analyses, that is, analyses of causal claims in terms of counterfactual conditionals.

A first attempt might be:

Where c and e are two distinct actual events, c causes e if and only if, if c were not to occur e would not occur.

But cases of Preemption offer a counter-example (SEP):

Suppose that two crack marksmen conspire to assassinate a hated dictator, agreeing that one or other will shoot the dictator on a public occasion. Acting side-by-side, assassins A and B find a good vantage point, and, when the dictator appears, both take aim. A pulls his trigger and fires a shot that hits its mark, but B desists from firing when he sees A pull his trigger. Here assassin A’s actions are the actual cause of the dictator’s death, while B’s actions are a preempted potential cause.

To deal with Preemption, we can move to the following account:

[Lewis’s] truth condition for causal dependence becomes:

(3) Where c and e are two distinct actual events, e causally depends on c if and only if, if c were not to occur e would not occur.

He defines a causal chain as a finite sequence of actual events c, d, e,… where d causally depends on c, e on d, and so on throughout the sequence. Then causation is finally defined in these terms:

(5) c is a cause of e if and only if there exists a causal chain leading from c to e.
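To check how the chain-based account handles the two marksmen, here is a minimal sketch. It swaps in a small structural-equation model with interventions in place of Lewis’s own possible-worlds semantics for counterfactuals, so it is an approximation of his account rather than a rendering of it. The variable names, including the intermediate event `a_bullet_flies`, are my assumptions.

```python
# Events are listed in causal order; each equation reads earlier values.
ORDER = ["a_fires", "b_fires", "a_bullet_flies", "death"]
EQUATIONS = {
    "a_fires": lambda v: True,
    "b_fires": lambda v: not v["a_fires"],       # B desists if A fires
    "a_bullet_flies": lambda v: v["a_fires"],
    "death": lambda v: v["a_bullet_flies"] or v["b_fires"],
}


def evaluate(interventions=None):
    """Compute every event, overriding any event that is intervened on."""
    interventions = interventions or {}
    values = {}
    for name in ORDER:
        values[name] = interventions.get(name, EQUATIONS[name](values))
    return values


def depends(cause, effect):
    """Both events actually occur, and without the cause the effect would not occur."""
    actual = evaluate()
    if not (actual[cause] and actual[effect]):
        return False
    return not evaluate({cause: False})[effect]


def chain_exists(cause, effect):
    """Search for a chain of stepwise dependences from cause to effect."""
    frontier, seen = [cause], {cause}
    while frontier:
        current = frontier.pop()
        for nxt in ORDER:
            if nxt not in seen and depends(current, nxt):
                if nxt == effect:
                    return True
                seen.add(nxt)
                frontier.append(nxt)
    return False


print(depends("a_fires", "death"))       # False: the simple analysis fails
print(chain_exists("a_fires", "death"))  # True: a_fires -> a_bullet_flies -> death
```

The death does not depend directly on A’s firing, because B would have fired instead. But once A’s bullet is in flight B has already desisted, so the death depends on the bullet, and the bullet on A’s firing.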

But take the following case:

A person is walking along a mountain trail, when a boulder high above is dislodged and comes careering down the mountain slopes. The walker notices the boulder and ducks at the appropriate time. The careering boulder causes the walker to duck and this, in turn, causes his continued stride. (This second causal link involves double prevention: the duck prevents the collision between walker and boulder which, had it occurred, would have prevented the walker’s continued stride.) However, the careering boulder is the sort of thing that would prevent the walker’s continued stride and so it seems counterintuitive to say that it causes the stride.

Hence:

Some defenders of transitivity have replied that our intuitions about the intransitivity of causation in these examples are misleading. For instance, Lewis (2004a) points out that the counterexamples to transitivity typically involve a structure in which a c-type event generally prevents an e-type but in the particular case the c-event actually causes another event that counters the threat and causes the e-event. If we mix up questions of what is generally conducive to what, with questions about what caused what in this particular case, he says, we may think that it is reasonable to deny that c causes e. But if we keep the focus sharply on the particular case, we must insist that c does in fact cause e.

Aha, but we simply need to modify the marksman case to get a case of late preemption:

Billy and Suzy throw rocks at a bottle. Suzy throws first so that her rock arrives first and shatters the glass. Without Suzy’s throw, Billy’s throw would have shattered the bottle. However, Suzy’s throw is the actual cause of the shattered bottle, while Billy’s throw is merely a preempted potential cause. This is a case of late preemption because the alternative process (Billy’s throw) is cut short after the main process (Suzy’s throw) has actually brought about the effect.

Lewis’s theory cannot explain the judgement that Suzy’s throw was the actual cause of the shattering of the bottle. For there is no causal dependence between Suzy’s throw and the shattering, since even if Suzy had not thrown her rock, the bottle would have shattered due to Billy’s throw. Nor is there a chain of stepwise dependences running cause to effect, because there is no event intermediate between Suzy’s throw and the shattering that links them up into a chain of dependences. Take, for instance, Suzy’s rock in mid-trajectory. Certainly, this event depends on Suzy’s initial throw, but the problem is that the shattering of the bottle does not depend on it, because even without it the bottle would still have shattered because of Billy’s throw.

To be sure, the bottle shattering that would have occurred without Suzy’s throw would be different from the bottle shattering that actually occurred with Suzy’s throw. For a start, it would have occurred later. This observation suggests that one solution to the problem of late preemption might be to insist that the events involved should be construed as fragile events. Accordingly, it will be true rather than false that if Suzy had not thrown her rock, then the actual bottle shattering, taken as a fragile event with an essential time and manner of occurrence, would not have occurred. Lewis himself does not endorse this response on the grounds that a uniform policy of construing events as fragile would go against our usual practices, and would generate many spurious causal dependences. For example, suppose that a poison kills its victim more slowly and painfully when taken on a full stomach. Then, the victim’s eating dinner before he drinks the poison would count as a cause of his death since the time and manner of the death depend on the eating of the dinner.

Lewis then further modifies his theory:

The central notion of the new theory is that of influence.

(7) Where c and e are distinct events, c influences e if and only if there is a substantial range of c1, c2, … of different not-too-distant alterations of c (including the actual alteration of c) and there is a range of e1, e2, … of alterations of e, at least some of which differ, such that if c1 had occurred, e1 would have occurred, and if c2 had occurred, e2 would have occurred, and so on.

Where one event influences another, there is a pattern of counterfactual dependence of whether, when, and how upon whether, when, and how. As before, causation is defined as an ancestral relation.

(8) c causes e if and only if there is a chain of stepwise influence from c to e.

One of the points Lewis advances in favour of this new theory is that it handles cases of late as well as early pre-emption. (The theory is restricted to deterministic causation and so does not address the example of probabilistic preemption described in section 3.4.) Reconsider, for instance, the example of late preemption involving Billy and Suzy throwing rocks at a bottle. The theory is supposed to explain why Suzy’s throw, and not Billy’s throw, is the cause of the shattering of the bottle. If we take an alteration in which Suzy’s throw is slightly different (the rock is lighter, or she throws sooner), while holding fixed Billy’s throw, we find that the shattering is different too. But if we make similar alterations to Billy’s throw while holding Suzy’s throw fixed, we find that the shattering is unchanged.
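The asymmetry in that last step can be made concrete with a toy model. The following is only a sketch under assumptions of my own, not anything from Lewis: every rock takes the same `FLIGHT_TIME` to arrive, a shattering is fully described by its time and the mass of the rock that hits, and the names `Throw`, `shattering`, `distinct_shatterings` and `nearby` are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple

FLIGHT_TIME = 1.0  # assumption: every rock takes one time unit to reach the bottle


class Throw(NamedTuple):
    time: float  # moment the rock is released
    mass: float  # mass of the rock


def shattering(suzy: Throw, billy: Throw) -> Tuple[float, float]:
    """Describe the shattering as (time of impact, mass of the rock that hits).

    The first rock to arrive breaks the bottle; the later rock finds only shards.
    """
    first = min(suzy, billy, key=lambda throw: throw.time)
    return (first.time + FLIGHT_TIME, first.mass)


def distinct_shatterings(
    alterations: Iterable[Throw], other: Throw, altering_suzy: bool
) -> int:
    """Count the different shatterings obtained by altering one throw
    while holding the other throw fixed."""
    outcomes = set()
    for altered in alterations:
        if altering_suzy:
            outcomes.add(shattering(altered, other))
        else:
            outcomes.add(shattering(other, altered))
    return len(outcomes)


def nearby(throw: Throw) -> List[Throw]:
    """A few not-too-distant alterations, including the actual throw."""
    return [
        throw,
        throw._replace(mass=0.8),               # lighter rock
        throw._replace(time=throw.time - 0.1),  # thrown sooner
        throw._replace(time=throw.time + 0.1),  # thrown later
    ]


SUZY = Throw(time=0.0, mass=1.0)   # Suzy throws first
BILLY = Throw(time=0.5, mass=1.0)  # Billy throws a little later

suzy_variation = distinct_shatterings(nearby(SUZY), BILLY, altering_suzy=True)
billy_variation = distinct_shatterings(nearby(BILLY), SUZY, altering_suzy=False)
print(suzy_variation, billy_variation)
```

In this toy setup, altering Suzy's throw produces several different shatterings, while altering Billy's throw leaves the shattering unchanged, which is the pattern of influence the quoted passage describes.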

At this point, I’m hearing distinct echoes of the knowledge merry-go-round. After over forty years of analyses of causation, it’s a good time to ask ourselves: what would be the value of success in this enterprise? What would be the use of a conceptual analysis that captured all these strange edge cases?

I think the value would be very limited. We are able to fully describe any situation without making use of the word “causation” (see below). Why then spend all this time considering baroque thought experiments? In the case of Suzy and Billy’s bottle, I honestly haven’t got that strong an intuition of what was the cause of the shattering. I think it’s playing games with open texture.

How is it that we can eliminate[1] causation from our language? To describe what actually happens in the world, including in the above cases, we only need to describe each counterfactual situation. Brian Tomasik describes one way of doing so:

But if we had a complete physical model of the multiverse (e.g., a giant computer program that specified how the multiverse evolved), [we could] change the program to remove X in some way and see if Y still happens.

Alternatively, you could specify your model using a causal graph. Once the causal graph is fully specified, it’s an empty question what truly caused the bottle to shatter.
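To make this concrete, here is a minimal sketch of the bottle case as a tiny program that we can rerun with a throw removed. It is a toy under my own assumptions (both rocks take the same `FLIGHT_TIME`, and the hypothetical function `shatter_time` stands in for the "complete physical model"); it is not Tomasik's code.

```python
from typing import Optional

FLIGHT_TIME = 1.0  # assumption: both rocks take the same time to reach the bottle


def shatter_time(
    suzy_throw: Optional[float], billy_throw: Optional[float]
) -> Optional[float]:
    """Return the time at which the bottle shatters, or None if it never does.

    Each argument is the time of the throw; None means the throw is removed.
    """
    arrivals = [
        throw + FLIGHT_TIME
        for throw in (suzy_throw, billy_throw)
        if throw is not None
    ]
    # The bottle shatters when the first rock arrives, if any rock arrives.
    return min(arrivals) if arrivals else None


# The four counterfactual situations: each throw either happens or is removed.
actual = shatter_time(suzy_throw=0.0, billy_throw=0.5)
without_suzy = shatter_time(suzy_throw=None, billy_throw=0.5)
without_billy = shatter_time(suzy_throw=0.0, billy_throw=None)
without_both = shatter_time(suzy_throw=None, billy_throw=None)

for label, result in [
    ("actual", actual),
    ("without Suzy's throw", without_suzy),
    ("without Billy's throw", without_billy),
    ("without either throw", without_both),
]:
    print(f"{label}: bottle shatters at {result}")
```

The four results say everything there is to say about this toy world: the bottle shatters whenever at least one rock is thrown, and removing Suzy's throw only delays the shattering. Nothing in the table requires deciding which throw was "the" cause.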

How the most successful conceptual analyses become definitions

The analysis of the limit has become a universally accepted definition. The same thing is in the (largely completed) process of happening for the analysis of computability. Soare 1996 draws the analogy beautifully:

In the early 1800’s mathematicians were trying to make precise the intuitive notion of a continuous function, namely one with no breaks. What we might call the “Cauchy-Weierstrass Thesis” asserts that a function is intuitively continuous iff it satisfies the usual formal epsilon-delta definition found in elementary calculus books.

Similarly, what we might call the “Curve Thesis” asserts that the intuitive notion of the length of a continuous curve in 2-space is captured by the usual definition as the limit of sums of approximating line segments. [Kline 1972: “Up to about 1650 no one believed that the length of a curve could equal exactly the length of a line. In fact, in the second book of La Geometrie, Descartes says the relation between curved lines and straight lines is not nor ever can be known.”]

The “Area Thesis” asserts that the area of an appropriate continuous surface in 3-space is that given by the usual definition of the limit of the sum of the areas of appropriate approximating rectangles.

These are no longer called theses, rather they are simply taken as definitions of the underlying intuitive concept.
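For reference, the two definitions that Soare alludes to can be written out as follows. This is the standard textbook formulation rather than a quotation from Soare; the symbols $f$, $a$, $\gamma$ and $t_i$ are my own notation, and the arc length is stated as a supremum over partitions, which is the usual way of making "limit of sums of approximating line segments" precise.

```latex
% Continuity of f at a point a (the "Cauchy-Weierstrass Thesis"):
f \text{ is continuous at } a
\iff
\forall \varepsilon > 0 \;\, \exists \delta > 0 \;\, \forall x :\;
|x - a| < \delta \implies |f(x) - f(a)| < \varepsilon .

% Length of a continuous curve \gamma : [a, b] \to \mathbb{R}^2 (the "Curve Thesis"),
% taken over all partitions a = t_0 < t_1 < \dots < t_n = b:
L(\gamma) = \sup_{a = t_0 < t_1 < \dots < t_n = b} \;
\sum_{i=1}^{n} \bigl\lVert \gamma(t_i) - \gamma(t_{i-1}) \bigr\rVert .
```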

  1. This idea has a good pedigree: in the words of Russell: “The law of causation, […] is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm. […] In the motions of mutually gravitating bodies, there is nothing that can be called a cause, and nothing that can be called an effect; there is merely a formula.” For more discussion see Stanford and Judea Pearl