Why scientific fraud is hard to catch

August 2, 2019

It’s nearly impossible to catch a scientific fraudster if they’re halfway competent.

Uri Simonsohn has become a minor nerd celeb by exposing fraudulent academic scientists who used fabricated data to get published. The Atlantic called him “the data vigilante”. I’ll describe two simple statistical techniques he has used – and why I’m pessimistic about the impact of such techniques.

If a parameter is measured with many significant digits, the last digit should be distributed roughly uniformly over 0-9. In a study of an intervention to increase factory workers’ use of hand sanitizer, sanitizer use was measured with a scale sensitive to the hundredth of a gram. But the data had an unusual prevalence of 6s, 7s and 9s in the last digit. Uri Simonsohn and colleagues conducted a chi-square test and rejected the hypothesis that the digits follow a uniform distribution, p=0.00000000000000001 [1].
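To make the last-digit check concrete, here is a minimal sketch in Python. It is not Simonsohn’s code, and the data below are made up for illustration; the function name `last_digit_chi2` and its `decimals` parameter are my own inventions. It computes the chi-square statistic for the last recorded digit and compares it with 16.919, the standard 5% critical value for 9 degrees of freedom.

```python
from collections import Counter

# Standard table value: chi-square critical value, 9 degrees of freedom, 5% level.
CHI2_CRIT_DF9_05 = 16.919


def last_digit_chi2(values, decimals=2):
    """Chi-square statistic for uniformity of the last recorded digit.

    `decimals` is the number of decimal places the scale reports
    (2 for a scale sensitive to the hundredth of a gram).
    """
    digits = [round(v * 10**decimals) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10  # uniform: each digit equally likely
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Hypothetical data, NOT from the study.
# Every last digit appears equally often:
clean = [i / 100 for i in range(1000, 1200)]
# Only 6s, 7s and 9s in the last digit:
suspicious = [10 + (10 * i + d) / 100 for i in range(20) for d in (6, 7, 9)]

for name, data in [("clean", clean), ("suspicious", suspicious)]:
    stat = last_digit_chi2(data)
    verdict = "reject uniformity" if stat > CHI2_CRIT_DF9_05 else "no evidence against uniformity"
    print(f"{name}: chi2 = {stat:.1f} -> {verdict}")
```

A real analysis would report a p-value rather than compare against a single cutoff, but the statistic is the same.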

A second sign of fraudulent data is if the baseline means are too similar between treatment groups. In one of the hand sanitizer studies, there were 40 participants, 20 in the control condition and 20 in the treatment condition. Simonsohn used a “bootstrapping” technique – randomly shuffling the 40 observations into two groups of 20, and repeating this millions of times, in order to estimate how often we would see such similar means if the data were truly drawn randomly (less than once in 100,000) [2].
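The shuffling procedure is easy to sketch. Shuffling without replacement like this is usually called a permutation test. The code below is my own illustration under made-up data, not the analysis from the Data Colada post; the function name `share_at_least_as_similar` and the example numbers are hypothetical. It estimates how often a random split into two groups produces means at least as close as the ones observed.

```python
import random


def share_at_least_as_similar(control, treatment, n_shuffles=10_000, seed=0):
    """Share of random re-splits whose group means are at least as close
    together as the observed group means."""
    rng = random.Random(seed)
    n = len(control)
    m = len(treatment)
    observed_gap = abs(sum(control) / n - sum(treatment) / m)
    pooled = list(control) + list(treatment)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random reassignment to the two groups
        gap = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / m)
        if gap <= observed_gap:
            hits += 1
    return hits / n_shuffles


# Hypothetical data: the numbers 1..40 split so that both groups have
# exactly the same mean, despite a wide spread within each group.
control = [4 * k + j for k in range(10) for j in (1, 4)]
treatment = [4 * k + j for k in range(10) for j in (2, 3)]

print(share_at_least_as_similar(control, treatment))
```

A small share means that groups this well matched would rarely arise from honest random assignment. With only 10,000 shuffles the estimate is coarse; resolving probabilities like one in 100,000 takes millions of shuffles, as in the original analysis.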

There are other, more mathematically intense techniques for forensic data analysis [3], but the common theme among them is to detect fraudsters creating suspiciously non-random data.

I want to tell these hand sanitizer people: come on, how hard can it be to use a random number generator? We know people are bad at producing randomness. In poker, it’s often optimal to play a mixed strategy, which requires randomising your play. But we have a strong natural tendency to play non-randomly, so poker players have developed ad hoc randomisation devices, like looking at your watch and playing call if you’re in the first half of the minute and fold if you’re in the second half. A similar incapacity to produce enough randomness seems to have befallen these amateurish scientific fakers. In order to produce data that violates the last-digit-uniformity law, you have to literally be writing the fake numbers by hand into a computer!

Savvier baddies would not shoot themselves in the foot in this way. It’s very easy to just draw some random numbers from a pre-specified distribution.

I can imagine that as you run more complex experiments, with multiple treatment arms and many potentially correlated parameters, it becomes difficult to create realistic fake data, even if you randomly draw it from a distribution. Some inconsistency could always escape your notice, and a sufficiently determined data sleuth might catch you.

But there’s a much easier solution: just run a legitimate experiment, and then add a constant of your choice to all observations in the treatment group. This data would look exactly like the real thing – the only lie would be that the “treatment” was you logging on to the computer in the middle of the night and changing the numbers. I can’t think of any way this misconduct could be detected statistically. And it has the additional benefit that you’re running an experiment, so people in your department won’t be wondering where you’re getting all that data from.
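Here is a small sketch of why this trick leaves so few statistical fingerprints. Everything in it is hypothetical: the simulated measurements, the size of the added constant, and the helper names. It works in integer hundredths of a gram so that the comparisons are exact. Adding a constant shifts the mean but leaves the spread untouched, and the last-digit counts are merely relabelled, so the chi-square statistic for last-digit uniformity is unchanged.

```python
import random
from collections import Counter


def last_digit_chi2(hundredths):
    """Chi-square statistic for last-digit uniformity, on integer hundredths.

    Algebraically equal to sum((count - n/10)**2 / (n/10)), but written with
    integers so the result does not depend on floating-point summation order.
    """
    n = len(hundredths)
    counts = Counter(x % 10 for x in hundredths)
    return sum((10 * counts.get(d, 0) - n) ** 2 for d in range(10)) / (10 * n)


def scaled_variance(xs):
    """n**2 times the population variance, computed exactly with integers."""
    n = len(xs)
    return n * sum(x * x for x in xs) - sum(xs) ** 2


rng = random.Random(1)
# Hypothetical honest measurements, in hundredths of a gram.
real = [round(rng.gauss(5000, 800)) for _ in range(200)]

SHIFT = 1237  # made-up "treatment effect" of 12.37 g, added after the fact
faked = [x + SHIFT for x in real]

print("same spread:", scaled_variance(real) == scaled_variance(faked))
print("same last-digit statistic:", last_digit_chi2(real) == last_digit_chi2(faked))
print("mean shift (hundredths):", (sum(faked) - sum(real)) / len(real))
```

This only covers the two checks described earlier; it doesn’t show that no test at all could detect the manipulation.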

Statistical sleuthing is fun, but I suspect it’s powerless against the majority of fraud.

My broader hope is that we’ll see a rise in the norm of having multiple independent replications of a study. This single tide should wash away many of the problems with current science. If a study fails to replicate multiple times, the result will lose credibility – even if we never find out whether it was due to outright fraud or merely flawed science.

  1. http://datacolada.org/74, Figure 2 

  2. http://datacolada.org/74, Problem 4 

  3. See the “fake data” category of Simonsohn’s blog Data Colada, which by the way is excellent on many topics besides fraud.

A shift in arguments for AI risk

May 25, 2019

Different arguments have been made for prioritising AI. In Superintelligence, we find a detailed argument with three features: (i) the alignment problem as the source of AI risk, (ii) the assumption that there will be a sharp, discontinuous jump in AI capabilities, and (iii) the resulting conclusion that an existential catastrophe is likely. Arguments that abandon some of these features have recently become prominent. Christiano and Grace drop the discontinuity assumption, but keep the focus on alignment. Even under more gradual scenarios, they argue, misaligned AI could cause human values to lose control of the future. Moreover, others have proposed AI risks that are unrelated to the alignment problem: for example, the risk that AI might be misused or could make war between great powers more likely. It would be beneficial to clarify which arguments actually motivate people who prioritise AI.

Long summary

Many people now work on ensuring that advanced AI has beneficial consequences. But members of this community have made several quite different arguments for prioritising AI.

Early arguments, and in particular Superintelligence, identified the “alignment problem” as the key source of AI risk. In addition, the book relies on the assumption that superintelligent AI is likely to emerge through a discontinuous jump in the capabilities of an AI system, rather than through gradual progress. This assumption is crucial to the argument that a single AI system could gain a “decisive strategic advantage”, that the alignment problem cannot be solved through trial and error, and that there is likely to be a “treacherous turn”. Hence, the discontinuity assumption underlies the book’s conclusion that existential catastrophe is a likely outcome.

The argument in Superintelligence combines three features: (i) a focus on the alignment problem, (ii) the discontinuity assumption, and (iii) the resulting conclusion that an existential catastrophe is likely.

Arguments that abandon some of these features have recently become prominent. They also generally tend to have been made in less detail than the early arguments.

One line of argument, promoted by Paul Christiano and Katja Grace, drops the discontinuity assumption, but continues to view the alignment problem as the source of AI risk. Even under more gradual scenarios, they argue that, unless we solve the alignment problem before advanced AIs are widely deployed in the economy, these AIs will cause human values to eventually fade from prominence. They appear to be agnostic about whether these harms would warrant the label “existential risk”.

Moreover, others have proposed AI risks that are unrelated to the alignment problem. I discuss three of these: (i) the risk that AI might be misused, (ii) that it could make war between great powers more likely, and (iii) that it might lead to value erosion from competition. These arguments don’t crucially rely on a discontinuity, and the risks are rarely existential in scale.

It’s not always clear which of the arguments actually motivates members of the beneficial AI community. It would be useful to clarify which of these arguments (or yet other arguments) are crucial for which people. This could help with evaluating the strength of the case for prioritising AI, deciding which strategies to pursue within AI, and avoiding costly misunderstanding with sympathetic outsiders or sceptics.

Note: This post was written in February 2019 while I was at the Governance of AI Programme, within the Future of Humanity Institute. I’m publishing it as it stood in February, despite significant flaws, since I’m starting a new job and anticipate I won’t have time to update it. I thank Markus Anderljung, Max Daniel, Jeffrey Ding, Eric Drexler, Carrick Flynn, Richard Ngo, Cullen O’Keefe, Stefan Schubert, Rohin Shah, Toby Shevlane, Matt van der Merwe and Remco Zwetsloot for help with previous versions of this document. Ben Garfinkel was especially generous with his time and many of the ideas in this document were originally his.

Contents

  1. Long summary
  2. Early arguments: the alignment problem and discontinuity assumptions
    1. Concerns about AI before Superintelligence
    2. Bostrom’s Superintelligence
      1. How a single AI system could obtain a decisive strategic advantage
      2. The impossibility of alignment by trial and error
      3. The treacherous turn
  3. The alignment problem without a discontinuity
    1. The basic picture
    2. The importance of competitive pressures
    3. Questions about this argument
  4. Arguments unrelated to the alignment problem
    1. Misuse risks
      1. The basic argument
        1. Questions about this argument
      2. Robust totalitarianism
        1. Questions about this argument
    2. Increased likelihood of great-power war
      1. Questions about this argument
    3. Value erosion from competition
      1. Questions about this argument
  5. People who prioritise AI risk should clarify which arguments are causing them to do so
    1. How crucial is the alignment problem?
    2. What is the attitude towards discontinuity assumptions?
    3. Benefits of clarification
  6. Appendix: What I mean by “discontinuities”
    1. Discontinuities aren’t defined by absolute speed
    2. Discontinuities could happen before “human-level”

Early arguments: the alignment problem and discontinuity assumptions

Concerns about AI before Superintelligence

Since the early days of the field of AI, people have expressed scattered concerns that AI might have a large-scale negative impact. In a 1959 lecture, Speculations on Perceptrons and other Automata, I.J. Good wrote that

whether [an intelligence explosion1] will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.

Around the turn of the millennium, related concerns were being gestured at in Ray Kurzweil’s The Age of Spiritual Machines (1999) and in a popular essay by Bill Joy, Why the Future Doesn’t Need Us (2000). These concerns did not directly draw on I.J. Good’s concept of an intelligence explosion, but did suggest that progress in artificial intelligence could ultimately lead to human extinction. Joy’s essay emphasizes the idea that AI systems “would compete vigorously among themselves for matter, energy, and space,” suggesting that this may cause their prices to rise “beyond human reach” and therefore cause biological humans to be “squeezed out of existence.”

As early as 1997, in How long before superintelligence?, Nick Bostrom highlighted the need to suitably “arrange the motivation systems of […] superintelligences”. In 2000, Eliezer Yudkowsky co-founded the Machine Intelligence Research Institute (MIRI), then named the Singularity Institute, with the goal of “sparking the Singularity” by creating a “transhuman AI.” From its inception, MIRI emphasized the importance of ensuring that advanced AI systems are “Friendly,” in the sense of being “beneficial to humans and humanity.” Over the following decade, MIRI’s aims shifted away from building the first superintelligent AI system and toward ensuring that the first such system – no matter who it is built by – will be beneficial to humanity. In a series of essays, Yudkowsky produced the first extensive body of writing describing what is now known as the alignment problem: the problem of building powerful AI systems which reliably try to do what their operators want them to do. He argued that superintelligent AI is likely to come very suddenly, in a single event that leaves humans powerless; if we haven’t already solved the alignment problem by that time, the AI will cause an existential catastrophe.

In Facing the Intelligence Explosion (2013), Luke Muehlhauser, a former executive director of MIRI, gave a succinct account of this concern:

AI leads to intelligence explosion, and, because we don’t know how to give an AI benevolent goals, by default an intelligence explosion will optimize the world for accidentally disastrous ends. A controlled intelligence explosion, on the other hand, could optimize the world for good.

The intelligence explosion, where an AI rapidly and recursively self-improves to become superintelligent, features prominently in this picture. For this essay, I find the broader notion of a discontinuity in AI capabilities more useful. I’ll define a discontinuity as an improvement in the capabilities of powerful AI that happens much more quickly than what would be expected based on extrapolating past progress. (I further disambiguate this term in the appendix.) An intelligence explosion is clearly sufficient, but isn’t necessary for there to be a discontinuity.

In Yudkowsky’s Artificial Intelligence as a Positive and Negative Factor in Global Risk (2008), he expands on the importance of discontinuities to his argument:

From the standpoint of existential risk, one of the most critical points about Artificial Intelligence is that an Artificial Intelligence might increase in intelligence extremely fast. […]

The possibility of sharp jumps in intelligence […] implies a higher standard for Friendly AI techniques. The technique cannot assume the programmers’ ability to monitor the AI against its will, rewrite the AI against its will, bring to bear the threat of superior military force; nor may the algorithm assume that the programmers control a “reward button” which a smarter AI could wrest from the programmers; et cetera.2

Bostrom’s Superintelligence

Superintelligence remains by far the most detailed treatment of the issue, and came to be viewed by many as the canonical statement of the case for prioritising AI. It retains some of the key features of the earlier writing by Bostrom, Yudkowsky, and Muehlhauser.

In particular, in the book we find:

  • the alignment problem as the key source of AI risk
  • discontinuities in AI trajectories as a premise for the argument that:
    • 1) a single AI system could gain a decisive strategic advantage3
    • 2) we cannot use trial and error to ensure that this AI is aligned
    • 3) the treacherous turn will make it much more difficult to react
  • the resulting conclusion4 that an existential catastrophe is likely

If a decisive strategic advantage were gained by an AI that is not aligned with human values, the result would likely be human extinction:

Taken together, these three points [decisive strategic advantage, the orthogonality thesis, and instrumental convergence] thus indicate that the first superintelligence may shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition. If we now reflect that human beings consist of useful resources (such as conveniently located atoms) and that we depend for our survival and flourishing on many more local resources, we can see that the outcome could easily be one in which humanity quickly becomes extinct. (Chapter 8).

Let us now turn to the three ways in which discontinuity assumptions are crucial to the argument.

How a single AI system could obtain a decisive strategic advantage

It is the discontinuity assumption that enables Bostrom to argue that a single AI system will gain a decisive strategic advantage over humans and other AI systems.

If there is no discontinuity, the AI frontrunner is unlikely to obtain far more powerful capabilities than its competitors. The first system that could be deemed superintelligent will emerge in a world populated by only slightly less powerful systems. On the other hand, if an AI system does make discontinuous progress, this progress would put it head and shoulders above the competition, and it could even gain a decisive strategic advantage.

Bostrom’s analysis of AI trajectories focuses on “takeoff”, the time between “human-level general intelligence” and “radical superintelligence”. A “fast take-off” is one that occurs over minutes, hours, or days. Bostrom argues that “if and when a takeoff occurs, it will likely be explosive.”

Notice that my definition of a discontinuity in AI capabilities does not exactly coincide with that of a “fast take-off”. This difference, which I explain in more detail in the appendix, is sometimes important. In Chapter 5, Bostrom writes that the frontrunner could “attain a decisive strategic advantage even if the takeoff is not fast”. However, he justifies this with reference to a scenario that involves a strong discontinuity5.

The impossibility of alignment by trial and error

The discontinuity removes the option of using trial and error to solve the alignment problem. The technical problem of aligning an AI with human interests remains regardless of the speed of AI development6. But if AI systems are developed more slowly, one might expect these problems to be solved by trial and error as the AI gains in capability and begins to cause real-world accidents. In a continuous scenario, AI remains at the same level of capability long enough for us to gain experience with deployed systems of that level, witness small accidents, and fix any misalignment. The slower the scenario, the easier it is to do this. In a moderately discontinuous scenario, there could be accidents that kill thousands of people. But it seems to me that a very strong discontinuity would be needed to get a single moment in which the AI causes an existential catastrophe.

The treacherous turn

A key concept in Bostrom’s argument is that of the treacherous turn:

The treacherous turn—While weak, an AI behaves cooperatively (increasingly so, as it gets smarter). When the AI gets sufficiently strong—without warning or provocation—it strikes, forms a singleton7, and begins directly to optimize the world according to the criteria implied by its final values.

The treacherous turn implies that:

  • the AI might gain a decisive strategic advantage without anyone noticing
  • the AI might hide the fact that it is misaligned

Bostrom explains that:

[A]n unfriendly AI of sufficient intelligence realizes that its unfriendly final goals will be best realized if it behaves in a friendly manner initially, so that it will be let out of the box. […] At some point, an unfriendly AI may become smart enough to realize that it is better off concealing some of its capability gains. It may underreport on its progress and deliberately flunk some of the harder tests, in order to avoid causing alarm before it has grown strong enough to attain a decisive strategic advantage. The programmers may try to guard against this possibility by secretly monitoring the AI’s source code and the internal workings of its mind; but a smart-enough AI would realize that it might be under surveillance and adjust its thinking accordingly.

In these scenarios, Bostrom is imagining an AI with the ability for very sophisticated deception. Crucially, the AI goes from being genuinely innocuous to being a cunning deceiver without passing through any intermediate steps: there are no small-scale accidents that could reveal the AI’s misaligned goals, nor does the AI ever make a botched attempt at deception that other actors can discover. This relies on the assumption of a very strong discontinuity in the AI’s abilities. The more continuous the scenario, the more experience people are likely to have with deployed systems of intermediate sophistication, the lower the risk of a treacherous turn.

The alignment problem without a discontinuity

More recently, Paul Christiano and Katja Grace have argued that, even if there is no discontinuity, AI misalignment still poses a risk of negatively affecting the long-term trajectory8 of earth-originating intelligent life. According to this argument, once AIs do nearly all productive work, humans are likely to lose control of this trajectory to the AIs. Christiano and Grace argue that (i) solving the alignment problem and (ii) reducing competitive pressures to deploy AI would help ensure that human values continue to shape the future.

In terms of our three properties: Christiano and Grace drop the discontinuity assumption, but continue to view the alignment problem as the source of AI risk. It’s unclear whether the risks they have in mind would qualify as existential.

The arguments in this section and the next section (“arguments unrelated to the alignment problem”) have been made much more briefly than the early arguments. As a result, they leave a number of open questions which I’ll discuss for each argument in turn.

The basic picture

The argument appears to be essentially the following. When AIs become more capable than humans at economically useful tasks, they will be given increasingly more control over what happens. The goals programmed into AIs, rather than human values, will become the primary thing shaping the future. Once AIs make most of the decisions, it will become difficult to remove them or change the goals we have given them. So, unless we solve the alignment problem, we will lose (a large chunk of) the value of the future.

This story is most clearly articulated in the writings of Paul Christiano, a prominent member of the AI safety community who works in the safety team at OpenAI. In a 2014 blog post, Three Impacts of Machine Intelligence, he writes:

it becomes increasingly difficult for humans to directly control what happens in a world where nearly all productive work, including management, investment, and the design of new machines, is being done by machines. […] I think human management becomes increasingly implausible as the size of the world grows (imagine a minority of 7 billion humans trying to manage the equivalent of 7 trillion knowledge workers; then imagine 70 trillion), and as machines’ abilities to plan and decide outstrip humans’ by a widening margin. In this world, the AI’s that are left to do their own thing outnumber and outperform those which remain under close management of humans.

As a result, AI values, rather than human values, will become the primary thing shaping the future. The worry is that we might therefore get “a future where our descendants maximiz[e] some uninteresting values we happened to give them because they were easily specified and instrumentally useful at the time.”

In his interview on the 80,000 Hours podcast, Christiano explains that he sees two very natural categories of things that affect the long run trajectory of civilisation: extinction, which is sticky because we can never come back from it, and changes in the distribution of values among agents, which “can be sticky in the sense that if you create entities that are optimizing something, those entities can entrench themselves and be hard to remove”. The most likely way the distribution of values will change, according to him, is that as we develop AI, we’ll “pass the torch from humans, who want one set of things, to AI systems, that potentially want a different set of things.”

Katja Grace, the founder of AI Impacts, explicitly addresses the point about development trajectories (also on the 80,000 Hours podcast): “even if things happen very slowly, I expect the same problem to happen in the long run: AI being very powerful and not having human values.” She gives an example of this slow-moving scenario:

suppose you’re a company mining coal, and you make an AI that cares about mining coal. Maybe it knows enough about human values to not do anything terrible in the next ten years. But it’s a bunch of agents who are smarter than humans and better than humans in every way, and they just care a lot about mining coal. In the long run, the agents accrue resources and gain control over things, and make us move toward mining a lot of coal, and not doing anything that humans would have cared about.9

The importance of competitive pressures

There is likely to be a trade-off, when building an AI, between making it maximally competent at some instrumentally useful goal, and aligning it with human values.10

In the 80,000 Hours interview, Christiano said: “I think the competitive pressure to develop AI, in some sense, is the only reason there’s a problem”, because it takes away the option of slowing down AI development until we have a good solution to the alignment problem.

According to Christiano, there are therefore two ways to make a bad outcome less likely: coordinating to overcome the competitive pressure, or making technical progress to alleviate the trade-off.

Questions about this argument

This argument for prioritising AI has so far only been sketched out in a few podcast interviews and blog posts. It has also been made at a high level of abstraction, as opposed to relying on a concrete story of how things might go wrong. Some key steps in the argument have not yet been spelled out in detail. For example:

  • There isn’t really a very detailed explanation of why misalignment at an early stage (e.g. of a coal-mining AI) couldn’t be reversed as the AI begins to do undesirable things. If AIs only gradually gain the upper hand on humanity, one might think there would be many opportunities to update the AIs’ values if they cease to be instrumentally useful.
  • In particular, competitive pressures explain why we would deploy AI faster than is prudent, but they don’t explain why relatively early misalignment should quickly become irreversible. If my AI system is accidentally messing up my country, and your AI system is accidentally messing up your country, we both still have strong incentives to figure out how to correct the problem in our own AI system.

Arguments unrelated to the alignment problem

Recently, people have given several new arguments for prioritising AI, including: (i) risks that AI might be misused by bad actors, (ii) that it might make great-power war more likely and (iii) value erosion from competition. These risks are unrelated to the alignment problem. Like those in the previous section, these new arguments have mostly been made briefly.

Misuse risks

The basic argument

The Open Philanthropy Project (OpenPhil) is a major funder in AI safety and governance. In OpenPhil’s main blog post on potential risks from advanced AI, their CEO Holden Karnofsky writes:

One of the main ways in which AI could be transformative is by enabling/accelerating the development of one or more enormously powerful technologies. In the wrong hands, this could make for an enormously powerful tool of authoritarians, terrorists, or other power-seeking individuals or institutions. I think the potential damage in such a scenario is nearly limitless (if transformative AI causes enough acceleration of a powerful enough technology), and could include long-lasting or even permanent effects on the world as a whole.11

Karnofsky’s argument (which does not crucially rely on discontinuities) seems to be the following:

  • AI will be a powerful tool
  • If AI will be a powerful tool, then AI presents severe bad-actor risks
  • The damage from bad-actor AI risks could be long-lasting or permanent

For a more detailed description of particular misuse risks, we might turn to the report titled The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation (2018). However, this report focuses on negative impacts that are below the level of a global catastrophic risk, for example: cyberattacks, adversarial examples and data poisoning, autonomous weapons, causing autonomous vehicles to crash, and similar.

Questions about this argument

  • Overall, the argument from the misuse risks discussed above seems to have only been briefly sketched out.
  • Karnofsky’s argument is very general, and doesn’t fully explain the focus on AI as opposed to other technologies
  • A similar argument to Karnofsky’s could be made for any potentially transformative technology (e.g. nanotechnology). Why focus on the misuse of AI? There are many potential reasons, for example:
    • AI is far more transformative than other technologies, and therefore far more dangerous in the wrong hands.
    • We are in a particularly good position to prevent misuse of AI, compared to misuse of other technologies.
    • The blog post does not say which reasons are the crucial drivers of Karnofsky’s view that AI misuse risks are particularly deserving of attention.
  • The inference “If AI will be a powerful tool, then AI presents severe bad-actor risks” hasn’t been explained in detail.
    • A technology can be powerful without increasing bad actor risks. Whether a given technology increases bad actor risks seems to hinge on complicated questions around the relative efficacy of offensive vs. defensive applications, and the way in which capabilities will be distributed between different actors.
    • Even nuclear weapons have arguably decreased the risk of “bad actor” states initiating invasions or wars.
  • No-one has yet made a detailed case for why we should expect the risks discussed in this section to rise to the level of global catastrophic risks

Robust totalitarianism

One type of misuse risk that has been described in slightly more detail is that of totalitarian regimes using AI to entrench their power, possibly for the very long run. One of the four sources of catastrophic risk on the research agenda of the Center for the Governance of AI (GovAI) is “robust totalitarianism […] enabled by advanced lie detection, social manipulation, autonomous weapons, and ubiquitous physical sensors and digital footprints.” The research agenda states that “power and control could radically shift away from publics, towards elites and especially leaders, making democratic regimes vulnerable to totalitarian backsliding, capture, and consolidation.” The argument from totalitarianism does not crucially depend on discontinuity assumptions.12

According to this argument, AI technology has some specific properties, such that AI will shift the balance of power towards leaders, and facilitate totalitarian control.

Questions about this argument

  • No detailed case yet regarding the effects of AI on totalitarianism
    • It seems plausible that the technologies mentioned (“advanced lie detection, social manipulation, autonomous weapons, and ubiquitous physical sensors and digital footprints”) would be useful to totalitarians. But some applications of them surely push in the other direction. For example, lie detection could be applied to leaders to screen for people likely to abuse their power or turn away from democratic institutions.
    • In addition, it is conceivable that other AI-enabled technologies might push against totalitarianism.
    • As of yet, in the public literature, there has been no systematic examination of the overall effect of AI on the probability of totalitarianism.
  • Long-term significance has not been much argued for yet
    • Suppose that AI-facilitated totalitarianism is plausible. From a long-termist point of view, the important question is whether this state of affairs is both (i) relatively avoidable and (ii) stable for the very long term.13 Such points of leverage, where something could go one way or the other, but then “sticks” in a foreseeably good or bad way, are probably rare.
    • The only academic discussion of the topic I could find is Caplan 2008, “The Totalitarian Threat”. The article discusses risk factors for stable totalitarianism, including technological ones, but takes the view that improved surveillance technology is unlikely to make totalitarianism last longer.14

Increased likelihood of great-power war

The GovAI research agenda presents four sources of catastrophic risk from AI. One of these is the risk of “preventive, inadvertent, or unmanageable great-power (nuclear) war.” The research agenda explains that:

Advanced AI could give rise to extreme first-strike advantages, power shifts, or novel destructive capabilities, each of which could tempt a great power to initiate a preventive war. Advanced AI could make crisis dynamics more complex and unpredictable, and enable faster escalation than humans could manage, increasing the risk of inadvertent war.15

Breaking this down, we have two risks, and for each risk, some reasons AI could heighten it:

  1. Preventive war
    1. First-strike advantages
    2. Power shifts
    3. Novel destructive capabilities
  2. Inadvertent war
    1. More complex and unpredictable crisis dynamics
    2. Faster escalation than humans can manage

This publication from the RAND Corporation summarises the conclusions from a series of workshops that brought together experts in AI and nuclear security to explore how AI might affect the risk of nuclear war by 2040. The authors discuss several illustrative cases, for example the possibility that AI might undermine second-strike capability by allowing better targeting and tracking of mobile missile launchers.16

Questions about this argument

  • Specificity to AI is still unclear
    • With the exception of point 2.2 (AIs enabling faster escalation than humans can manage), these arguments don’t seem very specific to AI.
    • Many technologies could lead to more complex crisis dynamics, or give rise to first-strike advantages, power shifts, or novel destructive capabilities.
    • It could still be legitimate to prioritise the AI-caused risks most highly. But it would require additional argument, which I haven’t seen made yet.
  • What is the long-termist significance of a great-power war?
    • Great-power nuclear war would lead to a nuclear winter, in which the burning of cities sends smoke into the upper atmosphere.
    • There is significant uncertainty about whether a nuclear winter would cause an existential catastrophe. My impression is that most people in the existential risk community believe that even if there were an all-out nuclear war, civilisation would eventually recover, but I haven’t carefully checked this claim17.
    • According to a blog post by Nick Beckstead, many long-termists believe that a catastrophic risk reduction strategy should be almost exclusively focused on reducing risks that would kill 100% of the world’s population, but Beckstead believes that sub-extinction catastrophic risks should also receive attention in a long-termist portfolio.
    • It has been suggested that great-power war could accelerate the development of new and potentially very dangerous technologies.
  • What are the practical implications of the argument? If great-power nuclear war were one of the main risks from AI, this might lead us to work directly on improving relations between great powers or reducing risks of nuclear war rather than prioritising AI.

Value erosion from competition

According to the GovAI research agenda, another source of catastrophic risk from AI is

systematic value erosion from competition, in which each actor repeatedly confronts a steep trade-off between pursuing their final values or pursuing the instrumental goal of adapting to the competition so as to have more power and wealth.

As stated, this is an extremely abstract concern. Loss of value due to competition rather than cooperation is ubiquitous, from geopolitics to advertising. Scott Alexander vividly describes the value that is destroyed in millions of suboptimal Nash equilibria throughout society.

Why might AI increase the risk of such value erosion to a catastrophic level?

In the publicly available literature, this risk has not been described in detail. But some works are suggestive of this kind of risk:

  • In The Age of Em, Robin Hanson speculates about a future in which AI is first achieved through emulations (“ems”) of human minds. He imagines this as a hyper-competitive economy in which, despite fantastic wealth from an economy that doubles every month or so, wages fall close to Malthusian levels and ems spend most of their existence working. However, they “need not suffer physical hunger, exhaustion, pain, sickness, grime, hard labor, or sudden unexpected death.” There is also a section in Superintelligence asking, “would maximally efficient work be fun?”
  • In Artificial Intelligence and Its Implications for Income Distribution and Unemployment (Section 6) Korinek and Stiglitz imagine an economy in which humans compete with much more productive AIs. AIs bid up the price of some scarce resource (such as land or energy) which is necessary to produce human consumption goods. Humans “lose the malthusian race” as growing numbers of them decide that given the prices they face, they prefer not to have offspring.18

Questions about this argument

This argument is highly abstract, and has not yet been written up in detail. I’m not sure I’ve given an accurate rendition of the intended argument. So far I see one key open question:

  • Collective action problems which we currently face typically erode some, but not all value. Why do we expect more of the value to be eroded once powerful AI is present?

People who prioritise AI risk should clarify which arguments are causing them to do so

How crucial is the alignment problem?

The early case for prioritising AI centered on the alignment problem. Now we are seeing arguments that focus on other features of AI; for example, AI’s possible facilitation of totalitarianism, or even just the fact that AI is likely to be a transformative technology. Different members of the broad beneficial AI community might view the alignment problem as more or less central.

What is the attitude towards discontinuity assumptions?

For long-termists, I see three plausible attitudes19:

  • They prioritise AI because of arguments that rely on a discontinuity, and they think a discontinuous scenario is probable. The likelihood of a discontinuity is a genuine crux of their decision to prioritise AI.
  • They prioritise AI for reasons that do not rely on a discontinuity.
  • They prioritise AI because of the possibility of a discontinuity, but its likelihood is not a genuine crux, because they see no other plausible ways of affecting the long-term future.

Of course, these are three stylised attitudes. It’s likely that many people have an intermediate view that attaches some credence to each of these stories. Even if most people are somewhere in the middle, identifying these three extreme points on the spectrum can be a helpful starting point.

The third of these attitudes is really exclusive to long-termists. For more conventional ways of prioritising, there are many plausible contenders for the top priority, and the likelihood of a risk scenario should be crucial to the decision of whether to prioritise mitigating that risk. Non-long-termists could take either of the other two attitudes towards discontinuities.

Benefits of clarification

My view that people should clarify why they prioritise AI is mostly based on a heuristic that confusion is bad, and we should know why we make important decisions. I can also try to give some more specific reasons:

  • The motivating scenario should have strong implications about which activities to prioritise within AI. To take the most obvious example, technical work on the alignment problem is critical for the scenarios that center around misalignment, and unimportant otherwise. Preparing for a single important ‘deployment’ event only makes sense under discontinuous scenarios.20
  • Hopefully, the arguments that motivate people are better than the other arguments. So focusing on these should facilitate the process of evaluating the strength of the case for AI, and hence the optimal size of the investment in AI risk reduction.
  • Superintelligence remains the only highly detailed argument for prioritising AI. Other justifications have been brief or informal. Suppose we learned that one of the latter group of arguments is what actually motivates people. We would realise that the entire publicly available case for prioritising AI consists of a few blog posts and interviews.
  • Costly misunderstandings could be avoided, both with people who are sceptical of AI risk and with sympathetic people who are considering entering this space.
    • Many people are sceptical of AI risk. It may not currently be clear to everyone involved in the debate why some people prioritise AI risk. I would expect this to lead to unproductive or even conflictual conversations, which could be avoided with more clarification.
    • People who are considering entering this space might be confused by the diversity of arguments, and might be led to the wrong conclusion about whether their skills can be usefully applied.
  • If arguments which assume discontinuities are the true motivators, then the likelihood of discontinuities is plausibly a crux of the decision to prioritise AI. This would suggest that there is very high value of information in forecasting the likelihood of discontinuities.

Appendix: What I mean by “discontinuities”

By discontinuity I mean an improvement in the capabilities of powerful AI that happens much more quickly than what would be expected based on extrapolating past progress. This is obviously a matter of degree. In this document I apply the label “discontinuity” only to very large divergences from trend, roughly those that could plausibly lend themselves to a single party gaining a decisive strategic advantage.

If there is a discontinuity, then the first AI system to undergo this discontinuous progress will become much more capable than other parties. The sharper the discontinuity, the less likely it is that many different actors will experience the discontinuity at the same time and remain at comparable levels of capability.

Below I detail two ways in which this notion of discontinuity differs from Bostrom’s “fast take-off”.

Discontinuities aren’t defined by absolute speed

Bostrom defines a “fast take-off” as one that occurs over minutes, hours, or days.

The strategically relevant feature of the discontinuous scenarios is that a single AI system increases in capabilities much faster than other actors. (These actors could be other AIs, humans, or humans aided by AI tools). No actor can react quickly enough to ensure that the AI system is aligned; and no actor can prevent the AI system from gaining a decisive strategic advantage.

By defining a “fast take-off” with the absolute numerical values “minutes, hours, or days”, Bostrom is essentially making the prediction that such a “take-off” would indeed be fast in a strategically relevant sense. But this could turn out to be false. For example, Paul Christiano predicts that “in the worlds where AI radically increases the pace of technological progress […] everything is getting done by a complex ecology of interacting machines at unprecedented speed.”

The notion of discontinuities is about the shape of the “curve of AI progress” – specifically, how discontinuous or kinked it is – and is agnostic about absolute numerical values. In this way, I think it better tracks the strategically relevant feature.

Discontinuities could happen before “human-level”

Bostrom’s analysis of AI trajectories is focused on the “take-off” period, which he defines as the period of time that lies between the development of the first machine with “human-level general intelligence” and the development of the first machine that is “radically superintelligent”. There is little analysis of trajectories before “human-level general intelligence” is achieved.

One approach is to define a machine as having “human-level general intelligence” if it is at least as good as the average human at performing (or perhaps quickly learning) nearly any given cognitive task. But then it seems that many risky events could occur before human-level general intelligence. For example, one could imagine an AI system that is capable of running most of a country’s R&D efforts, but lacks the ability to engage in subtle forms of human interaction such as telling jokes.

The notion of discontinuity is not restricted in this way. A discontinuity could occur at any point during the development of powerful AI systems, even before “human-level”.

  1. In an intelligence explosion, an AI rapidly and recursively self-improves to become superintelligent. 

  2. Yudkowsky does not explicitly say whether discontinuity assumptions are a crux of his interest in AI risk. He merely remarks: “I tend to assume arbitrarily large potential jumps for intelligence because (a) this is the conservative assumption; (b) it discourages proposals based on building AI without really understanding it; and (c) large potential jumps strike me as probable-in-the-real-world.” In a 2016 Facebook post, reprinted by Bryan Caplan, Yudkowsky describes “rapid capability gain” as one of his three premises for viewing AI as a critical problem to be solved. If discontinuities imply “a higher standard for Friendly AI techniques”, this suggests that AI safety work would still be needed in more continuous scenarios, but would only need to meet a lower standard. But we are not told how low this standard would be, and if it would still, in Yudkowsky’s view, justify prioritising AI. Regardless, Yudkowsky has not given any detailed argument for viewing AI as a catastrophic risk (let alone an existential one) if there are no discontinuities. 

  3. Defined by Bostrom as “a level of technological and other advantages sufficient to enable […] complete world domination”. 

  4. Bostrom also discusses multipolar scenarios that could result from more continuous trajectories, and some of these scenarios could arguably be sufficiently bad to warrant the label “existential risk” – but these scenarios are not the focus of the book and nor, in my view, did they seem to shape the priorities inspired by the book. 

  5. Bostrom writes: “Consider the following medium takeoff scenario. Suppose it takes a project one year to increase its AI’s capability from the human baseline to a strong superintelligence, and that one project enters this takeoff phase with a six-month lead over the next most advanced project. The two projects will be undergoing a takeoff concurrently. It might seem, then, that neither project gets a decisive strategic advantage. But that need not be so. Suppose it takes nine months to advance from the human baseline to the crossover point, and another three months from there to strong superintelligence. The frontrunner then attains strong superintelligence three months before the following project even reaches the crossover point. This would give the leading project a decisive strategic advantage […]. Since there is an especially strong prospect of explosive growth just after the crossover point, when the strong positive feedback loop of optimization power kicks in, a scenario of this kind is a serious possibility, and it increases the chances that the leading project will attain a decisive strategic advantage even if the takeoff is not fast.” In this scenario, what enables the frontrunner to obtain a decisive strategic advantage is the existence of a crossover point just after which there is explosive growth. But that is precisely a discontinuity. 

  6. The paper Concrete Problems in AI Safety describes five sources of AI accidents. They stand on their own, separate from discontinuity considerations. 

  7. A singleton is “a world order in which there is at the global level a single decision-making agency”. 

  8. Here and in the rest of the document, I mean “long-term” in the sense of potentially many millions of years. Beckstead (2013), On the overwhelming importance of shaping the far future, articulated “long-termism”, the view that we should focus on the trajectory of civilisation over such very long time-scales. See here for a short introduction to long-termism. 

  9. This quote is lightly edited for clarity. 

  10. If we don’t know anything about alignment, the trade-off is maximally steep: we can either have unaligned AI or no AI. Technical progress on the alignment problem would partially alleviate the trade-off. In the limit of a perfect solution to the alignment problem, there would be no trade-off at all. 

  11. To be clear, in addition to misuse risks, OpenPhil is also interested in globally catastrophic accidents from AI. 

  12. Of course, AI trajectories might have some bearing on the argument. One might believe that civil society will be slow to push back against new AI-enabled totalitarian threats, while states and leaders will be quick to exploit AI for totalitarian purposes. If this is true, very fast AI development might slightly increase the risk of totalitarianism. 

  13. If it were the nearly unavoidable consequence of AI being developed, there would be no point trying to oppose it. If the totalitarian regime would eventually collapse (i.e. fail to be robust for the very long run), then, although an immeasurable tragedy from a normal perspective, its significance would be small from the long-termist point of view. 

  14. Caplan writes: “Orwell’s 1984 described how new technologies would advance the cause of totalitarianism. The most vivid was the “telescreen,” a two-way television set. Anyone watching the screen was automatically subject to observation by the Thought Police. Protagonist Winston Smith was only able to keep his diary of thought crimes because his telescreen was in an unusual position which allowed him to write without being spied upon. Improved surveillance technology like the telescreen would clearly make it easier to root out dissent, but is unlikely to make totalitarianism last longer. Even without telescreens, totalitarian regimes were extremely stable as long as their leaders remained committed totalitarians. Indeed, one of the main lessons of the post-Stalin era was that a nation can be kept in fear by jailing a few thousand dissidents per year.” 

  15. It’s worth noting that this set of risks is distinct from misuse risks. Misuse involves the intentional use of AI for bad purposes, whereas here, the argument is that AI might make war more likely, regardless of whether any party uses an AI system to directly harm an adversary. See this essay for an explanation of how some risks from AI arise neither from misuse nor from accidents. 

  16. Mobile missile launchers move regularly via road or rail. Many states use them because they are difficult to track and target, and therefore constitute a credible second-strike capability. The RAND publication states that “AI could make critical contributions to intelligence, surveillance, and reconnaissance (ISR) and analysis systems, upending these assumptions and making mobile missile launchers vulnerable to preemption.” 

  17. This post by Carl Shulman is relevant. 

  18. The details of the model are in Korinek (2017), a working paper called Humanity, Artificial Intelligence, and the Return of Malthus, which is not publicly available online. Here are slides from a talk about the working paper. 

  19. There are some other conceivable attitudes too. One could, for example, find a discontinuity probable, but still not focus on those scenarios, because one finds that we’re certainly doomed under such a scenario. 

  20. These are just some quick examples. I would be interested in a more systematic investigation of what chunks of the problem people should break off depending on what they believe the most important sources of risk are. 

How to use a Pebble smartwatch as a pomodoro timer

December 28, 2018

When using the pomodoro technique, I’ve found it useful to have access to the pomodoro timer in the lowest-friction way possible. In particular, I like having the timer on my wrist. I implemented this with a Pebble watch, which I bought used for £40.

With the timer on your wrist, you can walk away from your computer during breaks, without having to go back and check how much break time you have left. During long breaks (usually 20 minutes every 4 pomos), you can enjoy a walk outside the office (pictured below). The watch will also vibrate at the beginning and end of every pomo.

The benefits of having the timer on your wrist instead of your phone may seem trivial. In practice, I have found that even a little bit of friction can cause me to fall off the pomodoro wagon.

The Pebble company is now defunct, but the watches can easily be found on eBay. The Pebble app store has also been shut down, but I still have the .pbw files of two pomodoro apps: Solanum and Simple Pomodoro. You can sideload these apps onto your Pebble through the Pebble smartphone app. The Pebble smartphone app still exists on the Google and Apple app stores, and could itself be sideloaded if it ever disappears.

I prefer Simple Pomodoro, because it supports long breaks. Solanum is more beautiful.

It’s possible to use a Pebble pomo timer in tandem with any other timer. I like having a timer on my computer screen. Just start the two timers at the same time and they’ll keep going in sync.

Should we prioritise cash transfers to the urban poor?

June 18, 2018

Note: I owe these ideas to Rossa O’Keeffe-O’Donovan and Natalie Quinn. However, the views expressed here are solely mine.

It’s well known that cash transfers have effects on prices. For instance, Cunha, De Giorgi and Jayachandran (2017) write:

[According to standard models], cash transfers increase the demand for normal goods, which will lead to price increases. This prediction holds either with perfect competition and marginal costs that are increasing in quantity, or with imperfect competition even if marginal costs are constant or decreasing […].

It’s helpful to think in terms of real goods instead of thinking in terms of money. Suppose there was a village in complete autarky: trade only occurs within the village. Say we gave everyone in the village $1,000. Nothing on the ground directly changes: they don’t get better tools, or more fertile land, just because we give them fiat money. What happens to consumption depends on the income distribution prior to the transfer. If all villagers have the same income, they will keep producing the same goods as before, and exchanging them in the same way; the prices will simply be higher1. If they had different incomes, the cash transfer will make them more equal in purchasing power, while also increasing prices (imagine if the transfer was a billion dollars). This amounts to a modest transfer of real goods from richer to poorer villagers2.

Now suppose we only gave $1,000 to a randomly selected half of the villagers. Nothing immediately happens to the productive capacity of the village. Prices rise, which means purchasing power is transferred from non-recipients to recipients. The exact effect on consumption is a bit complicated. In an extremely simple economy where everyone consumes as much as possible of the staple good, recipients consume more, non-recipients consume less, and that’s the end of it. More realistically, the recipients create some new demand for luxury goods. Regardless, the welfare effect of the purchasing power transfer seems likely to be negative. What certainly doesn’t happen is that real goods are transferred from the person who funded the cash transfer programme (say, a GiveDirectly donor) to anyone in the village. This is impossible since the village is in autarky.
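The two thought experiments above can be sketched in a few lines of Python. This is only a toy model under assumptions of my own choosing, not anything estimated from data: there is a single good, the village's real output is fixed, and the price level is simply total money divided by output, so each villager's real consumption is proportional to their share of the money. The four incomes and the output of 100 units are made up for illustration.

```python
def real_consumption(money, output=100.0):
    """Split a fixed real output in proportion to money holdings.

    Assumptions (hypothetical, for illustration): one good, fixed output,
    and a price level equal to total money divided by output.
    """
    price = sum(money) / output
    return [m / price for m in money]


# Hypothetical village of four people with unequal money incomes.
before = [100.0, 200.0, 300.0, 400.0]

# Case 1: every villager receives $1,000.
everyone = [m + 1000.0 for m in before]

# Case 2: only the first and third villagers receive $1,000.
half = [m + 1000.0 if i % 2 == 0 else m for i, m in enumerate(before)]

c_before = real_consumption(before)
c_everyone = real_consumption(everyone)
c_half = real_consumption(half)

for label, c in [("before", c_before), ("everyone", c_everyone), ("half", c_half)]:
    # Total real consumption is identical in every case: no goods enter the village.
    print(label, [round(x, 2) for x in c], "total:", round(sum(c), 2))
```

In this sketch the total never changes. Giving to everyone moves consumption from the richest villager to the poorest, and giving to half the village moves it from non-recipients to recipients.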

Even imperceptible price effects can change everything

If the village were trading with other villages, the price increases would be more spread out, and the effect would be a transfer of purchasing power to the recipients from everyone else in the trade market. More generally, a cash transfer to recipient A is a real good transfer to A, from everyone who trades with A, directly or indirectly. An interesting empirical point is the following. If the market is large, the price effects could be spread across tens of thousands of people, thus they would be very small for each person. Then extremely high statistical power is required to distinguish these price effects from zero. But imperceptible effects on large numbers of people can still be very morally important (see Parfit’s Five Mistakes in Moral Mathematics). In this case, the imperceptible price effects reflect the fact that the cash transfer can’t increase real consumption within the market. So they are everything that matters!
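To see how imperceptible per-person effects can add up to the whole transfer, here is a worked sketch under strong simplifying assumptions that are mine and not taken from any of the cited papers: a single good, a fixed real output $Y$ in the trading area, money holdings $m_i$ summing to $M$, and a price level $P = M / Y$, so that person $i$ consumes $c_i = Y m_i / M$. Suppose recipient $A$ receives a cash transfer $T$ from outside, and write $c'$ for consumption after the transfer.

```latex
\begin{align*}
c_A' - c_A &= Y\left(\frac{m_A + T}{M + T} - \frac{m_A}{M}\right)
            = \frac{Y\,T\,(M - m_A)}{M\,(M + T)}, \\
c_i - c_i' &= Y\left(\frac{m_i}{M} - \frac{m_i}{M + T}\right)
            = \frac{Y\,T\,m_i}{M\,(M + T)} \qquad (i \neq A), \\
\sum_{i \neq A} \left(c_i - c_i'\right) &= \frac{Y\,T\,(M - m_A)}{M\,(M + T)} = c_A' - c_A .
\end{align*}
```

Each individual loss is proportional to $m_i / M$, which shrinks as the market grows, yet in this model the losses always sum to exactly what the recipient gains.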

In the case of GiveDirectly, things aren’t quite as bad as in my toy examples above. GiveDirectly targets the poorest inhabitants in a village, so even if the village were in complete autarky, GiveDirectly would cause a real goods transfer from richer inhabitants of the village to the poorer inhabitants (and a much bigger one than if the transfer was given to everyone). If all of western Kenya was in autarky, but there was costless trading within, the transfer would be from everyone in western Kenya (weighted by how much they consume) to the recipients. Notice how different this is from what we might have expected before taking into account price effects: a real transfer from the GiveDirectly donor to the recipient.

Cunha and colleagues’ empirical findings corroborate this:

In the more economically developed villages in the sample, households’ purchasing power is only modestly affected by these price effects. In the less developed villages, the price effects are much larger in magnitude, which we show is partly due to these villages being less tied to the outside economy and partly due to their having less competition among local suppliers.

Now let’s take another case: you give money to your neighbour (who, like you, has perfect access to the world economy). There is no price effect. You both would have spent the money on the world market. There is a real goods transfer from you to your neighbour. This is the only case where reality perfectly matches intuition. The same would happen if your neighbour were a Kenyan who has access to world markets3.

The upshot

Now, to me there is an obvious policy implication here: you should transfer cash not to the poorest, but to those who score highest on some combination of low income and access to the world economy. Income being equal, you should transfer to the most globalised recipients you can find. The importance of this can be enormous. If you give to someone in an autarkic village, you transfer goods from poor Kenyans to very poor Kenyans. If you give to a globalised city-dweller, the real transfer is from yourself to the recipient.

I haven’t been able to find any discussion of this policy implication in the literature. This is quite surprising for something that naturally follows from supply and demand theory.

There could be practical problems with this policy. Most obviously, people who live in globalised cities are typically rich, and those who live in isolated villages are typically poor. So the ideal recipient, say someone who lives in Mumbai but is as poor as a rural day labourer, may not exist. (It could also be harder to identify the poor in the cities. But the operation costs of a cash transfer programme could perhaps be lower.)

There is some empirical data on this, for example from the World Bank report Geographic Dimensions of Well-Being in Kenya: Where are the Poor? (henceforth GdWBK). GdWBK chapter 2 tells us that the “poverty line is determined based on the expenditure required to purchase a food basket that allows minimum nutritional requirements to be met (set at 2,250 calories per adult equivalent (AE) per day) in addition to the costs of meeting basic non-food needs”. The rural poverty line (Ksh 1,239) is less than half the urban poverty line (Ksh 2,648). I find it surprising that basic goods really are twice as expensive in urban areas, but let’s take these estimates at face value. GdWBK doesn’t directly give information about the income distribution, but it tells us about the poverty gap index in different areas.

Wikipedia defines the poverty gap index as $\mathrm{PGI} = \frac{1}{N} \sum_{j=1}^{q} \frac{z - y_j}{z}$, where $N$ is the total population, $q$ is the total population of poor who are living at or below the poverty threshold, $z$ is the poverty line, and $y_j$ is the income of the poor individual $j$. In this calculation, individuals whose income is above the poverty line have a gap of zero. Multiplying the poverty gap index by $\frac{N}{q}$, we get $\frac{1}{q} \sum_{j=1}^{q} \frac{z - y_j}{z}$, the average poverty gap among the poor.

From GdWBK chapter 5, I compute an average poverty gap among the poor of roughly 0.49 for the rural areas of the Western District of Kenya and roughly 0.33 for the district of Nairobi, implying that the Nairobian poor live on an average of 67% of the urban poverty line and the poor in rural parts of the Western District live on an average of 51% of the rural poverty line.
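For readers who want to check this kind of calculation, here is a minimal Python sketch of the poverty gap index and the rescaling used above. The function names are my own, and the incomes and poverty line are made up purely for illustration; they are not GdWBK data.

```python
def poverty_gap_index(incomes, poverty_line):
    """Mean shortfall below the line, as a fraction of the line, over everyone.

    People above the poverty line contribute a gap of zero.
    """
    gaps = [max(poverty_line - y, 0.0) / poverty_line for y in incomes]
    return sum(gaps) / len(incomes)


def average_gap_among_poor(incomes, poverty_line):
    """Poverty gap index rescaled by N/q, where q counts people at or below the line."""
    n = len(incomes)
    q = sum(1 for y in incomes if y <= poverty_line)
    if q == 0:
        return 0.0  # nobody is poor, so there is no gap to average
    return poverty_gap_index(incomes, poverty_line) * n / q


# Made-up incomes and poverty line, for illustration only.
incomes = [500.0, 750.0, 1000.0, 2000.0, 3000.0]
line = 1000.0

pgi = poverty_gap_index(incomes, line)
avg_gap = average_gap_among_poor(incomes, line)

# One minus the average gap is the share of the poverty line that the
# poor live on, on average.
mean_income_share = 1.0 - avg_gap
print(pgi, avg_gap, mean_income_share)
```

The last step is the one used in the paragraph above: an average gap of $g$ among the poor means they live on an average of $1 - g$ of the relevant poverty line.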

  1. This is known as the classical dichotomy in economics. 

  2. The more equal village might also shift its production and consumption towards more staples and fewer luxuries. 

  3. Except for the fact that Kenya uses a different currency, so you have to take into account exchange rates, which might not be free-floating. I don’t understand macroeconomics, so let’s ignore that. 

Meta-ethical theories

June 17, 2018

Here’s a diagram of meta-ethical theories, loosely based on Figure 1.9 in Miller’s Introduction to contemporary metaethics.1

  1. The javascript embedding might break one day. Backups are in .png and .xml