Fighting the evil influence of Facebook (but keeping the good bits): a manifesto and how-to guide

October 13, 2017

From SMBC-comics.com

Note: In a fit of pure indulgence, I began this essay with the manifesto bit: 1,400 words in which I rage against certain shades of red and criticise Norwegian studies. You might justifiably prefer to skip straight to the how-to guide.

I.

Every day, 10,000 versions of Facebook are being A/B tested, with one goal: making you spend more time on the website, and less on life. The shade of red of the notification button, you can be sure, is the result of a ruthless optimisation process in which less addictive hues were terminated. Now, whenever that guy you met at a party three years ago likes one of your comments, the red badge sparks to life and you get to feel, for one second, like a caged rat receiving a joyless orgasm through a wire. This state of affairs has been scientifically determined to be optimal from the point of view of the shareholders of Facebook, Inc.

When Facebook started out, the best way to make users spend more time on the website was to make it more useful to them. Hence for instance photo-sharing, events, or messaging. And Facebook is still providing that value, and more. It facilitates what is good about social interaction as well as what is bad. When all these features have exhausted their potential, however, what remains to create marginal increases in engagement is to make the product more addictive.

Facebook has embraced this strategy with tremendous success. When outcomes can be easily measured and experiments are cheap to run, optimisation can proceed very quickly and does not require an understanding of the underlying mechanisms. If you used every nudge the disciplines of psychology and economics have uncovered to make an addictive website, you wouldn’t come close to this level of optimality.

II.

Okay, I’ve been theorising a lot. This is the part where I planned to say: “How about some empirical evidence?”, hoping to settle the case with any naysayers while showcasing my epistemic virtue. Turns out the studies didn’t really agree with me, but instead of changing my mind I doubted the studies.

405 Norwegian university students were asked how often during the last year they (i) “Spent a lot of time thinking about Facebook or planned use of Facebook”, (ii) “Felt an urge to use Facebook more and more”, or (iii) “Tried to cut down on the use of Facebook without success”. On average they replied that they did so “rarely” or “very rarely”.1 Among the big five personality traits, low conscientiousness predicted higher Facebook usage. Turkish researchers found similar results. The time-honoured method of eyeballing standard deviations suggests that maybe 15% of the students report engaging in addictive behaviours “sometimes” or more often.
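(For the curious, here is a minimal back-of-the-envelope sketch of that eyeballing. The mean and standard deviation are illustrative guesses of mine, since the paper only tells us the averages were below 2, and a normal approximation to a discrete 1-5 scale is crude; the point is just to show where a figure like 15% could come from.)

```python
# Back-of-the-envelope check of the "maybe 15%" figure.
# The mean and standard deviation below are assumed for illustration,
# not taken from the Norwegian study; the normal approximation to a
# discrete 1-5 response scale is also rough.
from statistics import NormalDist

mean, sd = 1.8, 0.7              # assumed values on the 1-5 scale
threshold = 2.5                  # responses of 3 ("Sometimes") or above
share = 1 - NormalDist(mean, sd).cdf(threshold)
print(f"Share answering 'Sometimes' or more often: {share:.0%}")  # ~16%
```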

I found these studies a bit underwhelming. Is it really true that failing to cut down on Facebook usage is a “rare” or “very rare” event? That doesn’t match what I observe in myself and those around me. Among my peers, checking one’s notifications right before and right after a class seems like average behaviour. As for lectures and meals, if people don’t bother to check before and after, it’s often because they’re on their phones during the event itself.

Cal Newport’s Deep Work2, and the research he discusses, rang far truer to me than the Norwegian and Turkish studies. The finding that students find Facebook only moderately addictive is also in tension with the fact that people with low conscientiousness spend more time on it. (Or maybe everyone out there just has great conscientiousness, thank you very much?) For what it’s worth, I’m giving about 50% weight to these peer-reviewed, N>400, 100+-citation studies, and 50% to my intuitive guesses here.

Okay, I’d better get back to pontificating without the nuisance of empirical data.

III.

I haven’t yet talked about the second of Facebook’s harms: social competition. Do you ever feel pain when you see pictures of your friends’ amazingly successful and carefree life, socially, romantically, and professionally speaking? I do. Social competition is a zero-sum game, and Facebook is giving us a way to shovel even more of our resources into this pit of destruction. Not only that, but Facebook is unique among dominance-hierarchy tools in giving each user near-complete control over the aspects of their life they choose to show. Everyone spends more time making themselves look good, and also feels like a social failure when comparing their actual life to the airbrushed version of other people’s. It’s like a massive prisoners’ dilemma where 2 billion people constantly defect on each other. Everyone loses, except, of course, Facebook, which collects a smidgen of revenue every time a user posts a photo and makes someone else sad.

This time peer review has valiantly come to my support. One study using experience sampling reports: “The more people used Facebook at one time point, the worse they felt the next time we text-messaged them; the more they used Facebook over two-weeks, the more their life satisfaction levels declined over time.” A study on panel data which used objective measures of Facebook use, pulled directly from participants’ Facebook accounts, found that “a 1-standard-deviation increase in likes, links clicked, or status updates was associated with a decrease of 5%-8% of a standard deviation in self-reported mental health”.

Okay, but we should be sceptical of difference-in-differences designs. How about an RCT? I found a few experiments3, and they all show that Facebook usage makes people sadder. This was especially the case for passive use as opposed to active use (posting, messaging, commenting), and envy4 was thought to be a major explanation. I especially liked this Danish study, which randomly assigned half of 1,095 people to quit Facebook for a week (participants claimed 87% compliance!). It found that the quitters were happier, p < 0.001. The effects were strongest for subjects who reported feeling high levels of envy when browsing Facebook. Sadly, there were strong selection effects, since participants were recruited by voluntary sign-up through a link posted on Facebook.

IV.

To sum up: the raw material of Facebook is information about every player’s manoeuvres in your local social constellation. Who’s up, who’s down, who’s making allies or enemies with whom, and crucially, who’s fucking whom. No wonder you crave this stuff: the very reason our brains are so big is that they evolved in an arms race to move up in the tribe’s hierarchy of sexual success, at least according to one leading theory. This juicy raw material is then presented in maximally addictive packaging.

If Facebook is so bad for people, why don’t they stop using it? A first level of explanation is addictiveness. Just like the human in the comic above, it’s perfectly possible for organisms to repeatedly make choices that create more pain in their lives overall, if they receive carefully timed hits of pleasure and relief. At a second level, evolutionary biology tells us that winning the social rat race will be favoured by your genes even if it makes you miserable.

Third are network effects. I have tried to convince you that Facebook causes large harms, but I think it also creates a lot of value. So I would guess that quitting Facebook as an individual, while better than you probably think, could end up not being worth it for many people. But why then don’t they switch to a different social network, with all of the benefits and none of the costs? (Perhaps one that makes money from subscription fees, aligning its incentives with yours.) Because of network effects. It’s very hard to start a movement to a new network when everyone is already on Facebook.

This brings me to my proposed solution, which fights the evil in Facebook while retaining some of the good. I’ve been very happy with this trade-off; aggressively pushing back against Facebook use has improved my life. Here is what I’ve done.

V.

Facebook has billions of dollars and hundreds of shade-of-red-optimising engineers in this fight. That means you need to bring the big guns too.

The main theme here is: keep valuable sources of information, relentlessly cull everything else.

  • Install Delayed Gratification in your browser. The key thing here is that the 15-30 second delay gives you a chance to reconsider and close the tab, but since it’s only a delay you’re not tempted to circumvent the tool.
  • Install News Feed Eradicator on your browser. This one is great if you manage to make it to the point where you don’t regularly circumvent it. That took me a while, and during that time it wasn’t so useful, but now I haven’t used the news feed in over a year and it’s been the biggest improvement for me.
  • Install Cold Turkey (SelfControl on Mac). This is deep-level blocking: you can’t circumvent it short of reinstalling the operating system, I think. So start with short experiments. Eventually, move on to giving yourself some daily windows to use Facebook, and block it at all other times. I’ve got it set up so Facebook is accessible between noon and 2pm, and between 6pm and 10pm. Once you’ve found a schedule you like, the next step is to lock the schedule5 for a month or more.
  • Install Stylish and then get the following styles for it:
  • If you just want to post a status update, do it through Buffer instead of going to Facebook and giving it another opportunity to suck you in.
  • Delete the Facebook app from your phone6. This one should be obvious. (With Buffer you can still post from your phone.)
  • Set the messenger app notifications on your phone to always be silent. (How to do this on Android).
  • On the messenger app, whenever someone posts to “my day”, long-press their name and tap “hide”. Ruthlessness is key here.

I think that’s all I have. Good luck! ⚔

  1. Averages were less than 2 on the following scale: 1: Very rarely, 2: Rarely, 3: Sometimes, 4: Often, 5: Very often. 

  2. Typical quote: “every moment of potential boredom in your life—say, having to wait five minutes in line or sit alone in a restaurant until a friend arrives—is relieved with a quick glance at your smartphone.” 

  3. Tromholt, Morten. “The Facebook experiment: quitting facebook leads to higher levels of well-being.” Cyberpsychology, Behavior, and Social Networking 19.11 (2016): 661-666.

    Sagioglou, Christina, and Tobias Greitemeyer. “Facebook’s emotional consequences: Why Facebook causes a decrease in mood and why people still use it.” Computers in Human Behavior 35 (2014): 359-363.

    Verduyn, Philippe, et al. “Passive Facebook usage undermines affective well-being: Experimental and longitudinal evidence.” Journal of Experimental Psychology: General 144.2 (2015): 480. 

  4. For much more on envy, see Krasnova, Hanna, et al. “Envy on Facebook: A hidden threat to users’ life satisfaction?” (2013): 1477-1491. 

  5. In Cold Turkey, it’s Settings -> Lock schedule 

  6. If you find yourself using the mobile web interface, log out, set it to never remember your username or password, and pick a really long and annoying password. 

Oxford Prioritisation Project Review

October 12, 2017

By Jacob Lagerros and Tom Sittler

To discuss this document, please go to the effective altruism forum.

Short summary

The Oxford Prioritisation Project was a research group that ran from January to May 2017. The team conducted research to allocate £10,000 in the most impactful way, and published all of this work on our blog. Tom Sittler was the Project’s director and Jacob Lagerros was its secretary, closely supporting Tom. This document is our (Jacob and Tom’s) in-depth review and impact evaluation of the Project. Our main conclusions are that the Project was an exciting and ambitious experiment with a new form of EA volunteer work. Although its impact fell short of our expectations in many areas, we learned an enormous amount and produced useful quantitative models.

Contents

  1. Short summary
  2. Executive summary
  3. The impact of the Project
    1. What were the goals? How did we perform on them?
      1. Publish online documents detailing concrete prioritisation reasoning (1)
      2. Prioritisation researchers (2)
      3. Training for earn-to-givers (3)
      4. Give local groups something to do (4)
      5. Local group epistemics (5)
      6. The value of information of the Oxford Prioritisation Project (6)
    2. Learning value for Jacob and Tom
    3. What were the costs of the project?
      1. Student time
      2. CEA money and time
  4. Main challenges
    1. Different levels of previous experience
      1. Heterogenous starting points
      2. Continued heterogeneity
    2. Team breakdown
  5. Should there be more similar projects? Lessons for replication
    1. Did the Project achieve positive impact?
      1. Costs and benefits
      2. Tom’s feelings
    2. Things we would advise changing if the project were replicated
      1. Less ambition
      2. Shorter duration
      3. Use a smaller grant if it seems easier
      4. Focus on quantitative models from the beginning
      5. More homogenous team
      6. Smaller team
  6. More general updates about epistemics, teams, and community
    1. The epistemic atmosphere of a group will be more truth-seeking when a large donation is conditional on its performance.
    2. A major risk to the project is that people hold on too strongly to their pre-Project views
    3. A large majority of team applicants would be people we know personally.

Executive summary

A number of paths for impact motivated this project, falling roughly into two categories: producing valuable research (both to inform and to inspire) and empowering people (by making them more knowledgeable, by improving the local community…).

We feel that the Project’s impact fell short of our expectations in many areas, especially in empowering people but also in producing research. Yet we are proud of the Project, which was an exciting and ambitious experiment with a new form of EA volunteer work. By launching into this unexplored space, we have provided significant value of information for ourselves and the EA community.

We believe that we increased the prioritisation skill of team members only to a small extent (and concentrated on one or two people), much less than we hoped. We encountered severe challenges with a heterogeneous team, and an eventual team breakdown that threatened the existence of the Project.

On the other hand, we feel confident that we learned an enormous amount through the Project, including some things we couldn’t have learned any other way. This goes from team management under strong time pressure and leadership in the face of uncertainty, to group epistemics and quantitative-model-building skills.

Research-wise, we are happy with our quantitative models, which we see as a moderately useful contribution. We are less excited about the rest of our output, which consumed a lot of time yet feels less relevant.

We’d like to thank everyone on the team for making the Project possible, as well as Owen Cotton-Barratt and Max Dalton for their valuable support.

The impact of the Project

What were the goals? How did we perform on them?

In a document I wrote in January 2017, before the project started, I identified the following goals for the project:

  1. Publish online documents detailing concrete prioritisation reasoning
    This has direct benefits for people who would learn from reading it, and indirect benefits by encouraging others to publish their reasoning too. Surprisingly few people in the EA community currently write blog posts explaining their donation decisions in detail.

  2. Produce prioritisation researchers
    Outstanding participants of the Oxford Prioritisation Project may be made more likely to become future CEA, OpenPhil, or GiveWell hires.

  3. Training for earn-to-givers
    It’s not really useful for the average member of a local group to become an expert on donation decisions. Most people should probably defer to a charity evaluator. However, for people who earn to give and donate larger sums, it’s often worth spending more time on the decision. So the Oxford Prioritisation Project could be ideal training for people who are considering earning to give in the future.

  4. Give local groups something to do (see also Scott Alexander on “pushing vs pulling goals”)
    Altruistic societies or groups may often volunteer, organise protests, write a policy paper, fundraise, etc., even if the impact on the world is actually negligible. These societies might do these things just to give their members something to do, create a group they can feel part of, and give the society leaders status. But within the effective altruism movement, many of these low-impact activities would appear hypocritical. People in movement building have been thinking about this problem. The Centre for Effective Altruism and other organisations have full-time staff working on local group outreach, but they have not to my knowledge proposed new “things to actually do”. The Project is a thing to do that is not outreach.

  5. Heighten the intellectual level of local groups
    Currently most of the EA community is intellectually passive. Many of us have only a superficial understanding of prioritisation; we mostly use heuristics and arguments from authority. By having more people in the community who actually do prioritisation (e.g. who actually understand GiveWell’s spreadsheets), we increase the quality of the average conversation.

In addition to these five object-level goals, there was a sixth:

  6. The value of information of the Oxford Prioritisation Project
    Much of the expected impact of the Project comes from discovering whether this kind of project can work, and whether it can be replicated in local groups around the world in order to get the object-level impacts many times over.

Publish online documents detailing concrete prioritisation reasoning (1)

Quantity-wise, this goal was achieved. We published 38 blog posts, including posts in which individuals described their current views, minor technical contributions to Bayesian probability theory, discussion transcripts, and, most importantly, quantitative models.

However, the extent to which our content engaged with substantial prioritisation questions, and was intellectually useful to the wider EA community, was far less than we expected. Overall, we feel that our most substantial intellectual contribution was our quantitative models. Yet these were extremely speculative and developed in the last few weeks of the Project, while most of the preceding work was far less useful.

Regarding “direct benefits for people who would learn from reading” our research: this is very difficult to evaluate, but our tentative feeling was that this was lower than we expected. We received less direct engagement with our research on the EA forum than we expected, and we believe few people read our models. Indirectly, the models were referenced in some newsletters (for example MIRI’s). However, since our writings will remain online, there may be a small but long-lasting trickle of benefits into the future, from people coming across our models.

Though we did not expect to break major new conceptual ground in prioritisation research, we believed that the EA community provides too many ‘considerations’-type and too few ‘weighing’-type1 arguments. Making an actual granting decision would hopefully force us to generate ‘weighing’-type arguments, and this was a major impetus for starting the Project. So, we reasoned, even though we might not go beyond the frontier of prioritisation research, we could nonetheless be useful to people with the most advanced EA knowledge, by producing work that helps aggregate existing research into an actionable ranking. We think we were moderately successful in this respect, thanks to our quantitative models.

Prioritisation researchers (2)

This is technically too early to evaluate, but we are pessimistic about it: we do not think the project caused any member who otherwise would not have considered it to now consider prioritisation research as a career2. This is based on impressions of, and conversations with, members.

This goal was a major factor in our decisions of which applicants to admit to the project. We selected several people who had less experience with EA topics, but who were interested and talented, in order to increase our chance of achieving this sub-goal. In retrospect, this was clearly a mistake, since getting these people up to speed proved far more difficult than we expected, and we still don’t think we had a counterfactual impact on their careers. Looking back, we recognise that there was some evidence for this that we interpreted incorrectly at the time, so we made a mistake in expectation, but not an obvious one.

As is often the case, we suspect the impact in this category was extremely skewed across individuals. While we think we had no impact on most members, we think there is a small (<5%)3 chance that we have counterfactually changed the interests and abilities of one team member, such that this person will in the future work in global priorities research.

Training for earn-to-givers (3)

This was not achieved, for two reasons. At the outset, we believed that about three team members might consider earning to give in the future, but by the end we thought only one of them had a >50% chance of choosing that career path. So even though we provided an opportunity to practise prioritisation thinking, and especially quantitative modelling, we don’t think we had an impact by improving the decisions of future earn-to-givers. Regardless, we believe that this practice failed to increase the prioritisation skill of our team (see previous sections), so we wouldn’t have had an impact here anyway.

Give local groups something to do (4)

This goal was achieved. We designed and implemented a new form of object-level group engagement that could theoretically be replicated in other locations. However, it’s debatable whether the cost:benefit ratio of such replications is sufficiently high. See the section: “Should there be more similar projects? Lessons for replication”

Local group epistemics (5)

This goal was not achieved.

One impulse for starting the project was a frustration about the lack of in-depth, object-level intellectual activity in the local student EA community, which we (Jacob and Tom) are both part of. Current activities look like:

  • Attending and organising introductory events
  • Social events, where conversations focus on:
    • Discussing new developments in EA organisations
    • Philosophy, especially ethics
    • ‘Considerations’-type arguments, with a special focus on controversial, or extreme arguments. Much repetition of well-known arguments.
  • Fundraising

We wanted to see more of:

  • Discussion of ‘weighing’-type arguments, with a focus on specific, quantifiable claims
  • Instead of repetition of known considerations, discussion of individual people’s actual beliefs on core EA questions, and what would change their minds. Conversations at the frontier of people’s knowledge.
  • Knowing when people change their minds
  • Individuals conducting shallow (3-20 hour), empirical or theoretical research projects, and publishing them online

We did not believe that the Project alone could have achieved any of these changes. But we were optimistic that it would help push in that direction. We thought members of the local group would become excited about the Project, discuss its technical details, and give feedback. We also thought that team members would socialise with local group members and discuss their work, and hoped that the Project would serve as a model inspiring other more intellectually focused activities. None of these happened.

The local community was largely indifferent to the Project, as evidenced by an attendance of no more than 10 people at our final decision announcement. Throughout the Project, there was little interaction between the community and team members. In retrospect, we think we could have done more to facilitate and encourage such interaction. But we were already very busy as things were, so this would have needed to trade off against another of our activities.

Overall we clearly didn’t achieve this goal.

The value of information of the Oxford Prioritisation Project (6)

This goal was arguably achieved, in the sense that the Project produced several unexpected results which carry important implications for future projects. The information gained is discussed in the section “More general updates about epistemics, teams, and community” below.

Learning value for Jacob and Tom

The Project was a huge learning experience for us both, and especially strongly for Tom. This was the first time Tom led a team. Running a group of prioritisation researchers was a very different task from academic projects or internships we had been involved with in the past.

Our guess is that between 25% and 75% of the value created by the Project was through our becoming wiser and more experienced. This admittedly subjective conclusion relies on a number of difficult-to-verbalise intuitions, to the effect that we came out of the project knowing more about our own strengths and weaknesses, and how people and groups work. Since we both plan to give a substantial weight to altruistic considerations in our career decisions, this could be impactful.

Throughout the Project, Tom kept a journal of specific learning points, mostly for his own benefit but also for others who would potentially be interested in replicating the Project. He originally planned to turn these notes into a well-structured and detailed retrospective, but completing this work now looks as though it would not be worth the time cost. Instead he is publishing his notes with minimal editing here. These files reflect Tom’s views at the time of writing (indicated on each document); he may not endorse them in full anymore. They cover the following topics, in alphabetical order:

What were the costs of the project?

Student time

Tom tracked 308 focused pomodoros (~ 150 hours) on this project, and estimates that the true number of focused hours was closer to 500. Tom also estimates he dedicated at least another 200 hours of less focused time to the Project.

Jacob estimates he spent 100 hours on the Project.

CEA money and time

We would guess that the real costs of the £10,000 grant were low. At the outset, the probability was quite high that the money would eventually be granted to a high-impact organisation, with a cost-effectiveness not several times smaller than CEA’s counterfactual use of the money4. In fact, the grant was given to 80,000 Hours.

The costs of snacks and drinks for our meetings, and logistics for the final event were about £500, covered by CEA’s local group budget.

We very tentatively estimate that Owen Cotton-Barratt spent less than 5 hours, and Max Dalton about 15 hours, helping us with the Project over the six months in which it ran. We are very grateful to both for their valuable support.

Main challenges

We faced a number of challenges; we’ll describe only the biggest ones, taking them in rough chronological order.

Different levels of previous experience

Heterogenous starting points

Some team members were experienced with advanced EA topics, while others were beginners with an interest in cost-effective charity. This was in part because we explicitly aimed to include some less experienced team members at the recruitment stage (see above, “2: Prioritisation researchers”). But an equally important factor was that, before we met them in person, we overestimated some team members’ understanding of prioritisation research.

We selected the team exclusively with an online application form. Once the project started, and we began talking to them in person, it quickly became clear that we had overestimated many team members’ familiarity with the basic arguments, concepts, and stylised facts that constitute the groundwork of prioritisation work. Possible explanations for our mistake include:

  • Typical mind fallacy, or insufficient empathy with applicants. Because we knew much more, we unconsciously filled gaps in people’s applications. For example, if someone was vaguely gesturing at a concept, we would immediately understand not only the argument they were thinking of, but also many variations and nuances of this argument. This could turn into believing that the applicant had made the nuanced argument.
  • Wishful thinking. We were excited by the idea of building a knowledgeable team, so we may have been motivated to ignore countervailing evidence.
  • Underestimating applicants’ desire and ability to show themselves in the best light. We neglected to account for the fact that applicants could carefully craft their text to emphasise their strengths, and mistakenly treated their applications more as if they were transcripts of an informal conversation.
  • Insufficiently discriminative application questions. Tom put significant effort into designing a short but informative application. Applicants were asked to provide a CV, a Fermi estimate of the total length of waterslides in the US, and a particular research question they expected to encounter during the project, along with their approach for answering it. After the fact, we not only think that these specific questions were suboptimal5, but also see clear ways the application process as a whole could have been done very differently and much better (see section “Smaller team” below). We struggle to think of evidence for this that we interpreted incorrectly at the time, so this may still have been the correct decision in expectation.

Continued heterogeneity

A heterogenous team alone would not have been a major problem if we hadn’t also dramatically overestimated our ability to bring the less experienced members up to speed.

We had planned to spend several weeks at the beginning of the project working especially proactively with these team members to fill the remaining gaps in their knowledge. We prepared a list of “prioritisation research concepts”, held a rotating series of short presentations on them, and gave specific team members relevant reading material. We expected that team members would learn quickly from each other and “learn by doing”, by trying their hand at prioritisation.

In fact, we made barely any progress. For all except one team member, we feel that we failed to bring them substantially closer to being able to meaningfully contribute to prioritisation research: everyone remained largely at their previous levels, some high, some low6.

This makes us substantially more pessimistic about the possibility of fostering EA research talent through proactive schemes rather than letting individuals learn organically. (EA Berkeley seemed more positive about their student-led EA class, calling it “very successful”, but we believe it was many times less ambitious.) We feel more confident that there is a basic global prioritisation mindset, which is extremely rare, difficult to instil through certain kinds of outside intervention, and essential for EA researchers.

Team breakdown

We struggled to create a cohesive team in which everyone was able to contribute to the shared goal of optimally allocating the £10,000, and was motivated to do so. Meanwhile, some team members became less engaged, perhaps as a result of the lack of visible signs of progress. Meeting attendance began to decline, and the problem worsened until the end of the Project, by which point four out of nine team members had dropped out. After the Project, only 3 out of 7 team members took the post-project survey; its results have informed our estimates throughout this evaluation.

While we understand that the dropout rate for volunteer projects is typically high, we still perceived this as a frustrating failure. An unexpected number of team members encountered health or family problems, while others simply lost motivation. Starting around halfway through the Project, the majority of our efforts were focused on averting a complete dissolution of the team, which would have ended the Project.

As a result, we decided to severely curtail the ambition of the Project by choosing our four shortlisted charities ourselves, without team input, and according to different criteria than those we had originally envisioned7. We had been planning to shortlist the organisations with the highest expected impact, as a team, in a principled way. Instead we (Jacob and Tom) took into account our hunches about expected impact as well as the intellectual value of producing a quantitative model of a particular organisation, in order to arrive at a highly subjective and under-justified judgement call.

We are satisfied with this decision; we believe that it allowed us to create most of the value that could still be captured at that stage, given the circumstances. With a smaller team and a more focused goal, we produced the four quantitative models which led to our final decision.

Should there be more similar projects? Lessons for replication

Did the Project achieve positive impact?

Costs and benefits

See above, “What were the costs of the project?”.

Tom’s feelings

It’s important to make a distinction between the impacts of the Project from a purely impartial perspective, and the impacts according to my values, which give a much larger place to my own and my friends’ well-being.

Given that the object-level impacts (see above, “The impact of the Project”) were, in my view, low, effects on Jacob’s and my personal trajectories (our academic performance, well-being, skill-building) could be important, even from an impartial point of view.

Against a counterfactual of “no Oxford Prioritisation Project” (say, if the idea had not been suggested to me, or if we had not received funding), I would guess with low confidence that the Project had negative (impartial) impact. Without the Project, I would have spent these 6+ months happier and less stressed, with more time to spend on my studies. I plan to give significant weight to altruistic considerations in my career decisions, so this alone could have made the project net-negative. In addition, I believe I would have spent significant time thinking about object-level prioritisation questions on my own, and published my thoughts in some form. On the other hand, I learned a lot about team management and my own strengths and weaknesses through the Project. All things considered, I suspect that the Project was a little bit less good than this counterfactual.

When it comes to my own values, I’m slightly more confident that the Project was negative against this counterfactual. If offered the chance to go back in time and re-experience the same events, I would probably decline.

Both impartially and personally speaking, there are some nearby counterfactuals against which I am slightly more confident that the Project was negative. These mostly take the form of developing quantitative models with two or three close friends, in an informal setting, and iterating on them rapidly, with or without money to grant. However, these counterfactuals are unlikely; at the time I didn’t have the information to realise how good they would be.

Going back now to the impartial perspective: despite my weakly held view described above, there are several scenarios for positive impact from the Project which I find quite plausible. For example, I would consider the Project to have paid for itself relative to reasonable counterfactuals if:

  • what I learned from the Project helps me improve a major career decision
  • the team member mentioned above ends up pursuing global priorities research
  • we inspire another group to launch a project inspired by our model, and they achieve radically better outcomes

Things we would advise changing if the project were replicated

Less ambition

Global prioritisation is very challenging for two reasons. First, the search space contains a large number of possible interventions and organisations. Second, the search space spans multiple very different focus areas, such as global health and existential risk reduction.

The Project aimed to tackle both of these challenges. This high level of ambition was a conscious decision; we were excited by the lack of artificial restrictions on the search space. Though we had no unrealistic hopes of finding the truly best intervention, or of breaking significant new ground in prioritisation research, we still felt that an unrestricted search space made the task more valuable and made it feel more real, less like a student’s exercise. We implicitly predicted that other team members would also be more motivated by the ambitious nature of the Project, but this turned out not to be the case. If anything, motivation increased after we shifted to less ambitious goals.

Given that our initial goal proved too difficult, even given the talent pool available in Oxford, we would recommend that potential replications restrict the search space to eliminate one of the two challenges. This gives two options:

  • Prioritisation among a pre-established shortlist of organisations working in different focus areas. (This is the option we chose towards the end of the Project).
  • Prioritisation in a (small) focus area, such as mental health or biosecurity.

We would weakly recommend the former rather than the latter, because we already tried it with moderate success, and because it allows starting immediately with quantitative models of the shortlisted organisations (see below, “Focus on quantitative models from the beginning”).

Shorter duration

Given the circumstances, we believe the Project was too long. A shorter project means less is lost if the Project fails, and the closer deadline could be motivating.

We would recommend one of two models:

  • 1-month project with meetings and work sessions at intervals
  • 1 week retreat, working on the project full-time

Our most important work, building the actual quantitative models and deciding on their inputs, was done in about this amount of time. The long early part of the project, centred on learning and searching for candidate organisations, was not very useful at the margin (see e.g. the section “Continued heterogeneity” above).

Use a smaller grant if it seems easier

We initially believed that the relatively large size of the grant (£10,000) would motivate team members not only to work hard, but, more importantly, to be epistemically virtuous – that is, to focus their efforts on actions to improve the final allocation rather than ones that felt fun or socially normative. We now believe that this effect is small, and does not depend much on the size of the grant. For more information, see section “More general updates about epistemics, teams and community”.

Where the £10,000 figure may have helped is through getting us more and better applicants by signalling serious intent. But we are very uncertain about this consideration and would give it low weight.

Overall, our view is that the benefits of a larger grant size are quite small, relative to other strategic decisions, and that a £1000-2000 grant might have achieved nearly all the same benefits8. On the other hand, the true monetary cost of the grant is low (see above, “CEA money and time”). Therefore our tentative recommendation is: a £10,000 grant may still be worth getting, but don’t worry too much about it. Be upfront to the funder about the small effect of the money, and consider going ahead anyway if they give you less.

Focus on quantitative models from the beginning

One of the surprising updates from the Project was that we made much more progress, including the less experienced team members, once we began working on explicit quantitative models of specific organisations. (See section “More general updates about epistemics, teams and community”.) So we would recommend starting with quantitative models from the first day, even when they are very simple. This may sound surprising, but we urge you to try it9.

More homogenous team

We severely overestimated the degree to which we could bring less experienced members of a heterogeneous team up to speed (see above, “Different levels of previous experience”). So we would recommend a homogenous team, with all members meeting a high threshold of experience.

Smaller team

We started with a model where, very roughly, people are either good, conscientious team members, or they get demotivated and drop out of the team. Under this model, dropouts are not very costly: you lose a team member, but you also lose the overhead of dealing with them. So that is a reason to start with a bigger team. However, what we actually observed is that demotivated people don’t like working, but the one thing they dislike more is dropping out. Without speculating about the underlying reasons, a common strategy while demotivated (one we ourselves have also been guilty of in the past) is to do the minimal amount of work required to avoid dropping out. Hence, future projects should start with a small team of fully dedicated members rather than a larger team intended to provide a “buffer” should a team member drop out.

Having a smaller team also means expending more effort selecting team members; doing so should also help with the problem raised in the above subsection. Although it seemed to us that we had already put a lot of resources into finding the right team, we now see that much more could have been done here, for example:

  • a fully-fledged trial period, consisting of a big, top-down managed team from which the most promising few members are then selected to go on (although we worry this could introduce anti-cooperative norms and an unpleasant atmosphere).
  • making the application process the following: candidates build a quantitative model, receive feedback, and go back to submit a second version; they are then evaluated on the quality of their models

More general updates about epistemics, teams, and community

We had several hypotheses about how a project like this would affect the members involved, as well as the larger effective altruism community in which it took place. Here are some of our updates. Of course, the Project is only a single data point. Nonetheless, we think it still carries evidence in the same sense that, say, an elaborate field-study might be important without being an RCT.

The epistemic atmosphere of a group will be more truth-seeking when a large donation is conditional on its performance.

Update: substantially lower credence

There are at least two reasons why human intellectual interaction is often not truth-seeking. First, there are conflicting incentives. These can take the form of internal cognitive biases or external structural incentives.

To some extent the money helped realign incentives. The looming £10,000 decision provided a Schelling point that could be used to end less useful discussions or sub-projects without the violation of social norms that this often entails otherwise.

Nonetheless, the atmosphere also suffered from many common cognitive biases. These include things like deferring too much to perceived authorities, and discussing topics that one likes, feels comfortable with, or knows many interesting facts about. It is possible that the kind of disciplining “skin in the game” effect we were hoping for failed to occur because the grant was altruistic, and of little relevance to team members personally. In response to this, team members also pledged a secret amount of their own money to the eventual recipient (with pledges ranging from £0 to £200)10. It is difficult to disentangle the consequences of this personal decision from those of the donation at large, but it might still have suffered from the same problem. Given how insensitive people are to astronomical differences in charity impact in general, the choice of which top charity one’s donations go to might not make a sufficient psychological difference to offset other incentives.

Second, truth-seeking interaction is difficult partly for reasons unrelated to incentives: it requires certain mental skills that have to be deliberately trained (see also “Different levels of previous experience”). Finding the truth, in general, is hard. For example, we strongly encouraged team members to centre discussions around cruxes, but most often the cruxes members gave were not actually things that would change their minds about X; instead they were generic evidence regarding X, or evidence that would clearly falsify X but that they never expected to materialise. This was true for basically all members of the Project, often including ourselves.

Instead of the looming, large donation, the epistemic atmosphere seems to have been positively affected by things like framing disagreements in terms of which quantitative model input they would change, and working within a strict time limit (e.g. a fixed meeting end time). For more on this, see the section “Focus on quantitative models from the beginning”.

A major risk to the project is that people hold on too strongly to their pre-Project views

Update: lower credence

We nicknamed this the “pet charities” problem: participants start the project with some views about which grantees are most cost-effective, and see themselves as having to defend that view. They engage with contrary evidence, but only to argue against it, or to find some reason their original grantee is still superior.

This was hardly a problem, but something in the vicinity was. While people didn’t strongly defend a view, this was mostly because they didn’t feel comfortable engaging with competing views at all. Instead, participants strongly preferred to continue researching the area they already knew and cared most about, even as other participants were doing the same thing with a different area. Participants’ differing choices of area implied conflicting premises, but they proved extremely reluctant to attempt to resolve these disagreements. We might call this the “pet areas” problem, or the problem of “lower bound propagation” (because participants may informally be using the heuristic “consider only interventions better than X”, with very different Xs).

Another problem that proved bigger than pet charities was over-updating on authority opinion (such as Tom’s current ranking of grantees). We see this as linked with the lack of comfort or confidence mentioned above.

A large majority of team applicants would be people we know personally.

Update: false

We’re both socially close to the EA community in Oxford. We expected to more or less know all applicants personally: if someone was interested enough in EA to apply, we would have come across them somehow.

Instead, a large number of applications were from people we didn’t know at all, a few of whom ended up being selected for the team. We update that, at least in Oxford, there are many “lurkers”: people who are interested in EA, but find the current offerings of the local group uninspiring, so they don’t get involved at all. There appear to be many talented people who are only prepared to work on an EA project if it stands out to them as particularly interesting. Although we would generally advise caution, this could be one reason to be more optimistic about replications of the Project.

  1. A useful distinction is between ‘considerations’-type arguments and ‘weighing’-type arguments. Considerations-type arguments contain new facts or reasoning that should shift our views, other things being equal, in a certain direction. Sometimes, in addition to the direction of the shift, these arguments give an intuitive idea of its magnitude. Weighing-type arguments, on the other hand, take existing considerations and use them to arrive at an all-things-considered view. The magnitude of different effects is explicitly weighed. Considerations-type arguments involve fewer questionable judgement calls and more conceptual novelty, which is one reason we believe they are oversupplied relative to weighing-type arguments. While Tom believed this sufficiently strongly to contribute to motivating him to launch the Project, we both agree that this is something reasonable people can disagree about. On a draft of this piece, Max Dalton wrote: “They also tend to produce shifts in view that are less significant, both in the sense of less revolutionary, and in the sense of the changes tending to have less impact. This is partly because weighing-type arguments are more commonly used in cases where you’re picking between two good options. Because I think weighing-type arguments tend to be lower-impact, I’m not sure I agree with your conclusion. My view here is pretty low-resilience.” 

  2. To be clear, however, there are team members who are seriously considering that career path. 

  3. Tom’s guess: 15% chance this person goes into prioritisation research. Conditional on him or her doing so, a ~30% chance we caused it. 

  4. We also had a safeguard in place to avoid the money being granted to an obviously poor organisation, in case the project went dangerously off the rails: Owen Cotton-Barratt had veto power on the final grant (although he says he would have been reluctant to use it). 

  5. The Fermi question was too easy, in that it didn’t help discriminate between top applicants, and the research proposal question was too vague and should have required more specifics. 

  6. Jacob adds: “It should be emphasized that we are disregarding any general intellectual progress here. It is plausible that several team members learned new concepts and practiced critical thinking, and as a result grew intellectually from the project – just not in a direction and extent that would help with global prioritisation work in particular.” 

  7. More goal flexibility, earlier on, would have been good. We had ambitious goals for the Project, which we described publicly online, and in conversations with funders and others we respect. In attempting to achieve these goals, we believe we were quite flexible and creative, trying many different approaches. But we were too rigid about the (ultimately instrumental) Project goals. Partly, we felt that changing them would be an embarrassment; we avoided doing so because it would have been painful in the short run. But it seems clear now that we could have better achieved our terminal goals by modifying the Project’s goals. 

  8. There is significant disagreement about this among people we’ve discussed it with, on and off the team. We note that being more heavily involved in the Project seems to correlate with believing that a low grant would have achieved most of the benefits. Outsiders tend to believe that more money is good, while we who led the Project believe the effects are small. (A middle ground of £5000 has tended to produce some more agreement.) People we respect disagree with us, which you should take into account when forming your own view. 

  9. Jacob adds: “One of our most productive and insightful sessions was when we spent about six hours deciding the final inputs into the models. It is plausible that this single session was equally intellectually productive and decision-guiding as the first few weeks of the project combined.” 

  10. A team member comments that “skin in the game”-effects may encourage avoiding bad outcomes more than working extra hard for good outcomes: “Subjectively, I felt it as a constant ‘safety net’ to know that we’d most likely give to a good charity that ends up being in a certain range of uncertainty that the experts concede, and that it was almost impossible for us to blow 10,000GBP on something that would be anywhere near low impact”. 

Extreme poverty: the surprising power laws of effective giving

August 27, 2017

Inspired by a post by Jeff Kaufman

I first discovered effective altruism through Peter Singer’s advocacy of a strategic approach to fighting poverty.

If I had to sum up the force of his arguments in a single image, it would be the juxtaposition of these two graphs.

1

[Figure: the global distribution of income, with the vertical axis scale hidden]
Source: Doing Good Better1

The first graph shows the global distribution of income. On the horizontal axis, the poorest are on the left and the richest on the right. The distribution is so unequal that the richest are extremely rich while the majority of the world’s population is, by comparison, very poor. Distributions this skewed are called power laws2. They stand in contrast to less extreme distributions, such as the normal distribution. Human height follows a normal distribution: the tallest humans are at most 60% taller than the average. But the richest humans are hundreds of times richer than the average. The richest 10% of the planet therefore have an enormous capacity to help the poorest.

But who is this global elite of oligarchs? You are probably part of it. I have deliberately erased the scale of the vertical axis. Try to guess which centile you fall into. Once you have written down your answer, look at the full graph. Without cheating, would you have guessed which part of the curve you belong to?3

This result is counter-intuitive: you don’t feel wealthy, yet you are among the richest people in the world, which gives you an opportunity to help the poorest enormously. This is the first of the surprising power laws of effective altruism.

2

[Figure: cost-effectiveness of 108 health interventions in developing countries]
Source: DCP2.

The second graph shows the cost-effectiveness of 108 health interventions in developing countries. The data come from the DCP2 database, which records the cost of each intervention and its benefit in terms of quality-adjusted life years (QALYs). The QALY is a tool for comparing different health interventions. A year in full health is worth 1 QALY. A year lived with HIV is worth 0.5 QALYs, and a year lived with deafness 0.78 QALYs4. An intervention that cures the deafness of an otherwise perfectly healthy person for 10 years is therefore worth (1 - 0.78) × 10 = 2.2 QALYs5.
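To make the arithmetic explicit, here is a minimal sketch of the QALY calculation above; the function name is mine, purely for illustration.

```python
# QALY gain = (quality weight after - quality weight before) * number of years.
def qaly_gain(weight_before: float, weight_after: float, years: float) -> float:
    return (weight_after - weight_before) * years

# Curing deafness (weight 0.78 -> 1.0) for 10 years, as in the example above.
print(round(qaly_gain(0.78, 1.0, 10), 2))  # 2.2 QALYs
```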

These QALY calculations serve above all as an illustrative example. They highlight the importance of quantification and the usefulness of a standardised measure of impact. Effective altruism is by no means limited to QALY calculations, however: as soon as we want to compare interventions outside medicine or public health, other methods are required, and they are frequently used.6

As the graph shows, the most effective health interventions are not 30% more effective than the average, nor even 3 times more effective, but tens of times more effective. The most effective intervention in the DCP2 database produces 15,000 times more benefit than the least effective, and 60 times more than the median intervention. Moreover, you should picture the graph as if it were cut off on the right, with far taller bars that are not shown here. Indeed, beyond the DCP2 database, the most effective health interventions are truly exceptional: the eradication of smallpox in 1979 prevented more than 100 million deaths, at a cost of 400 million dollars7.
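As a rough illustration of what these ratios imply, using the rounded figures quoted above (indicative numbers only, not a cost-effectiveness estimate):

```python
# Smallpox eradication: about $400 million for more than 100 million deaths prevented.
cost_usd = 400e6
deaths_prevented = 100e6
print(cost_usd / deaths_prevented)  # at most ~4 dollars per death averted

# DCP2 spread: the best intervention is ~60x the median, so funding a merely
# median intervention forgoes roughly 1 - 1/60 of the achievable benefit.
print(f"{1 - 1/60:.0%}")  # ~98%, i.e. well over 90% of the potential value
```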

This too is counter-intuitive. The various NGOs working in health all look alike, and can seem interchangeable. In reality, it is crucial to choose the most effective one. If you choose an NGO that competently implements a good but unexceptional intervention, you risk losing more than 90% of the potential value of your donation.

These two graphs8 thus sum up the importance of effective altruism. The movement is founded on the idea that the numbers are not decorative: when we observe ratios as extreme as these, it is a call to action.

  1. Doing Good Better, William MacAskill. The data used by the author to produce this graph come from several sources. Between the 1st and 21st centiles of the richest, the data come from household surveys provided by Branko Milanovic (see e.g. Milanovic 2012). For the poorest 73%, the data come from the World Bank’s PovcalNet initiative. For the richest 0.1%, the figure comes from The Haves and the Have-Nots: A Brief and Idiosyncratic History of Global Inequality, Branko Milanovic. 

  2. See Wikipedia, Power law

  3. The Giving What We Can app can tell you your exact percentile.

  4. World Health Organization

  5. The weights express the average of the preferences expressed by patients. There are several methods for measuring them, but the most common consists in asking patients to choose whether they would prefer to stay alive with a given disease for a certain period, or to live for a shorter time but in perfect health (Torrance, George E. (1986). “Measurement of health state utilities for economic appraisal: A review”. Journal of Health Economics. 5: 1–30). For instance, if patients are on average indifferent between 10 years lived with deafness and 7.8 years in perfect health, deafness receives a weight of 0.78. This method has its drawbacks, but health systems have to rank diseases somehow in order to make the best use of their limited budgets, and the QALY is for now the most widely used tool.

    Among the drawbacks, there may for example be biases in the expressed preferences. If we ask people who do not have the disease, they might overestimate its impact, because asking the question gives the disease psychological prominence. Thinking about it at the moment the question is asked creates the impression that the disease will determine one's quality of life, when in reality quality of life is determined by many components. The opposite could also happen, if participants do not realise how painful a disease is before having suffered from it. Asking patients who have the disease could likewise lead to biases in both directions. Asking the question reminds patients that they live with the disease and asks them to imagine a life in good health, which could lead them to overestimate its influence on their quality of life. Conversely, having an incurable disease could push a patient to put a positive spin on their situation in order not to lose hope, whereas a healthy person would be more clear-eyed. Beyond these questions of cognitive bias, some philosophers consider that it is hedonic experience, and not preferences (even perfectly de-biased ones), that matters morally. Finally, the QALY generally does not allow one to say that it would be better to end a life, even when suffering is extreme, because negative QALYs are rarely used. For a critical discussion, see:

    Prieto, Luis; Sacristán, José A (2003). “Problems and solutions in calculating quality-adjusted life years (QALYs)”. Health and Quality of Life Outcomes. 1: 80. (archive)

    Broome, John (1993). “QALYs”. Journal of Public Economics. 50 (2): 149–167.

    Mortimer, D.; Segal, L. (2007). “Comparing the Incomparable? A Systematic Review of Competing Techniques for Converting Descriptive Measures of Health Status into QALY-Weights”. Medical Decision Making. 28 (1): 66–89. 

  6. See for example the Oxford Prioritisation Project, 80,000 Hours’ comparison of causes, or this post by Michael Dickens.

  7. Toby Ord, The moral imperative towards cost-effectiveness (archive)

  8. One might wonder why we encounter such distributions. Why aren't health interventions normally distributed? Probably because the effectiveness of an intervention is the result of multiplying (rather than summing) a large number of small independent factors. See Wikipedia, Log-normal distribution

On the experience of confusion

August 6, 2017

I recently discovered something about myself: I have a particularly strong aversion to the experience of confusion. For example, yesterday I was looking into the relationship between common knowledge of rationality and Nash equilibrium in game theory. I had planned to spend just an hour on this, leisurely dipping into the material and perhaps coming out with a clarified understanding. Instead, something else happened. I became monomaniacally focused on this task. I found some theorems, but there was still this feeling that things were just slightly off, that my understanding was not quite right. I intensely desired to track down the origin of the feeling. And to destroy the feeling. I grew restless, especially because I was making some progress: I wasn't completely stuck, so it felt like I must be on the cusp of clarity. The first symptom of this restlessness was skipping my pomodoro breaks, usually a sure sign that I am losing self-control and will soon collapse into an afternoon nap. The second symptom was developing an unhelpful impatience, opening ever more new tabs to search for the answer elsewhere, abandoning trains of thought earlier and earlier. In the end I didn't have time to do any of the other work I had planned that day!

This happens to me about once a week.

I don’t know if this description was at all effective at communicating my experience. It’s something far more specific than simple curiosity. I’m fine with not knowing things. I’m even happy to have big gaping holes in my knowledge, like a black rectangle on an otherwise-full map of the city. Provided the rectangle has clear boundaries and I know that, as a matter of principle, I could go explore that part of the city, and if I made no mistakes, I could draw the correct map.

Here’s another way of putting this. I’m not at all bothered if a tutor tells me: “The proof of this theorem, in appendix 12.B., relies on complicated maths. You may never understand it. But you have a good grasp of what the theorem states.” I have a picture in my head like:

I am infuriated if a tutor tells me: “When there are sticky prices, equation A looks like this.” What do we mean by sticky prices? And how does the equation follow? Tutor: “Here’s the mathematical statement of sticky prices. It involves completely different objects than equation A. Also, here’s a vague, hand-wavey intuition why the two are related.”

The problem here is not that there's an empirical fact that I don't know, or a proof step I don't understand. I don't even have a label to put on my confusion. It's not that I don't see how the conclusion follows, it's that I don't see how it could follow. It's not that the map has dark patches. I don't even know if I'm holding the map right side up or upside down, and the map is written in Cyrillic.

In school, I used to make myself unpopular by pursuing these lines of inquiry as far as they would let me, leading to long back-and-forths with my teachers. These conversations were often unproductive. Sometimes the implication was that I should just learn the words of the vague hand-wavey intuition as a kind of password. Naturally, I resented this. Both possibilities were enraging: either the educators themselves believed that the words could pass for real understanding, or they just expected me to shut up and learn the password. Sometimes I was gently chided (or complimented?) for my curiosity, my apparent desire to know EVERYTHING, not to rest until the whole map was filled in. This too felt wrong: I'm not complaining about a small corner of the map left unfilled. The entire eastern part of the map is in Cyrillic!

Although I hope that some people reading this might relate to my experiences, I suspect that I am out of the ordinary in the strength of my aversion to confusion. I have long thought that any success I've had in my academic pursuits was due not to intelligence, but to my refusal to accept explanations that felt unsatisfying in some subtle way. I say this not to humble-brag: I have good evidence that I am less intelligent than many of my peers. In school everyone used to participate in this maths competition every year. The questions required clever problem-solving; I consider them pretty close to an IQ test. They were completely different from our maths exams, which prized definitional clarity and rewarded practice. I was around the class median at the competition, but among the best at the exams. As another piece of evidence, I am seriously terrible at mental arithmetic: I routinely get simple sums wrong at the bakery, and not for lack of trying!

So I had long been aware that there was something different about how I asked questions, but only recently did I acquire the language to describe it accurately. I used to think it was “intellectual curiosity”, but as we have seen, “visceral aversion to even slight confusion” would be a more accurate label. I loathe contradiction and dissonance, not ignorance or uncertainty.

I have already talked a bit about how I think I've benefitted from this habit of thought. I think it may be one thing that people who get really into analytic philosophy have in common. It also comes with costs, mostly in the form of getting sucked into a productivity-wrecking hole of confusion, as with the game theory example. It would be much more rational to remain calm and composed, let the confusion go for a day or two, and then decide whether it makes sense to allocate more time to it. Part of why I get sucked in so much, I suspect, is that I fear that if I stop, I will let the confusion slip by. I find that thought distressing. Perhaps it's because I don't want to forget I was confused, later remember the password, and adopt the confused knowledge that comes with it.

One way to help solve this is to keep a list of everything I am confused about. Then I can set a time limit on my intellectual escapades, and if I’m still confused by the end, I can write it down. Even if I never return to it, it feels much more satisfying to have a degree of meta-clarity (clarity about what I’m confused about) than to let confusion slip into a dark corner of my mind.

How much donation splitting is there, and should you split?

August 3, 2017

Cross-posted to the effective altruism forum.

Table of contents

  1. Table of contents
  2. Summary
  3. Many of us split
    1. Individual examples
    2. EA funds data
      1. Naive approach: distribution of allocation percentages
      2. Less naive approach: weighted distribution of allocation percentages
      3. Best approach: user totals
    3. Other data
  4. Arguments for splitting
    1. Empirical uncertainty combined with risk aversion
    2. Moral uncertainty
    3. Diminishing returns
    4. Achieving a community-wide split
      1. Cooperation with other donors
      2. Lack of information
    5. Remaining open-minded or avoiding confirmation bias
    6. Memetic effects
  5. Recommendation
  6. Appendix: R code

Summary

Many aspiring effective altruists report splitting their donations between two or more charities. I analyse EA funds data to estimate the extent of splitting. Expected utility reasoning suggests that for small donations, one should never split, and always donate all the money to the organisation with the highest expected cost-effectiveness. So prima facie we should not split. Are there any convincing reasons to split? I review 6 arguments in favour of splitting. I end with my recommendation.

Many of us split

Individual examples

For example, in CEA Staff’s Donation Decisions for 2016, out of 14 staff members who disclosed significant donations, I count 10 who report splitting. (When only small amounts are donated to secondary charities, it is sometimes ambiguous what counts as splitting.) In 2016, Peter Hurford gave about 2/3 to Rethink Charity, and 1/3 to other recipients. Jeff Kaufman and Julia Wise gave about equal amounts to AMF and the EA Giving Group Fund.

EA funds data

I wanted to study EAs’ splitting behaviour more systematically, so I looked at anonymised data from the EA funds, with permission from CEA.

In the following sections, I describe various possible analyses of the data. You can skip to “best approach: user totals” if you just want the bottom line. The R code I used is in the appendix.

I was given access to a list of every EA funds donation between 2017-03-23 and 2017-06-19. Data on allocation percentages was included. For example, if a donor went to the EA funds website and gave $1000, setting the split to 50% “global health and development” and 50% “long-term future”, there would be two entries, each for $500 and with an allocation percentage of 50%. In the following, I call these two entries EA funds donations, and the $1000 an EA funds allocation.
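To make the terminology concrete, here is a minimal sketch in R of how that example would appear in the data. The column names follow those used in the appendix code; the values and the user ID are invented for illustration.

# One EA funds *allocation* of $1000, split 50/50, appears as two *donation* rows
toy <- data.frame(
  UserID                = c("u1", "u1"),
  Fund.Name             = c("Global Health and Development", "Long-Term Future"),
  Allocation.percentage = c(50, 50),   # share of the allocation going to each fund
  Donation.amount       = c(500, 500), # dollar amount of each donation row
  Currency              = c("USD", "USD")
)
toy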

Naive approach: distribution of allocation percentages

The simplest analysis is to look at a histogram of the “allocation percentage” variable. The result looks like this1:

naive

Here, most of the probability mass is on the left, because most donations are strongly split. But what we really care about is how much of the money being donated is split. For that we need to weight by donation size.

Less naive approach: weighted distribution of allocation percentages

I compute a histogram of allocation percentages weighted by donation size. In other words, I ask: “if I pick a random dollar flowing through EA funds, what is its probability of being part of an EA funds donation which itself represents X% of an EA funds allocation?”, and then plot this for 20 buckets of Xs2.

lessnaive

Here, much more of the probability mass is on the right hand side. This means larger donors split less, and are much more likely to set the allocation percentage to 100%.

But this approach might still be problematic, because it is not invariant to how donors decide to spread their donations across allocations. For instance, suppose we have the following:

| Allocation ID | Name  | Fund   | Allocation % | Donation amount |
|---------------|-------|--------|--------------|-----------------|
| 2             | Alice | Future | 100%         | $1000           |
| 1             | Alice | Health | 100%         | $1000           |
| 3             | Bob   | Health | 50%          | $1000           |
| 3             | Bob   | Future | 50%          | $1000           |

Here, Alice and Bob both split their $2000 donations equally between two funds. They merely used the website interface differently: Alice by creating two separate 100% allocations (perhaps the next month), and Bob by creating just one allocation but setting the sliders for each of the funds to 50%.

However, if we used this approach, we would count Alice as not splitting at all.

It’s an open question how much time should elapse between two donations to different charities until it is no longer considered splitting, but rather changing one’s mind. In the individual examples I gave above, I took one month, which seems like a clear case of splitting. Up to a year seems reasonable to me. Since we have less than a year of EA funds data, it’s plausible to consider any donations made to more than one fund as splitting. This is the approach I take in the next section.

Best approach: user totals

For each user, I compute:

  • Their fund sum, i.e. for each fund they donated to, the sum of their donations to that fund
  • Their user total, i.e. the sum of all their donations to EA Funds

This allows me to create a histogram of the fraction of a user total represented by each fund sum, weighted by the fund sum3.

best

This is reasonably similar to the weighted distribution of allocation percentages, but with a bit more splitting.
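To illustrate the computation, here is a minimal sketch on toy data modelled on the Alice/Bob table above, using the same plyr approach as the appendix (the dollar amounts are the invented ones from that table):

library(plyr)

toy <- data.frame(
  UserID    = c("Alice", "Alice", "Bob", "Bob"),
  Fund.Name = c("Future", "Health", "Health", "Future"),
  dusd      = c(1000, 1000, 1000, 1000)
)

# user total: sum of all of a user's donations
toy <- ddply(toy, .(UserID), transform, usersum = sum(dusd))
# fund sum: sum of a user's donations to each fund
toy <- ddply(toy, .(UserID, Fund.Name), transform, usersum_fund = sum(dusd))
# fraction of the user total represented by each fund sum
toy$fundfrac <- toy$usersum_fund / toy$usersum
toy
# Both Alice and Bob get fundfrac = 0.5 for each fund, i.e. both count as splitting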

Other data

One could also look at the Donations recorded for Vipul Naik database, or Giving What We Can’s data, and conduct similar analyses. The additional value of this over the EA funds analysis seemed limited, so I didn’t do it.

Arguments for splitting

Empirical uncertainty combined with risk aversion

Sometimes being (very) uncertain about which donation opportunity is best is presented as an argument for splitting. For example, the EA funds FAQ says that “there are a number of circumstances where choosing to allocate your donation to multiple Funds might make more sense” such as “if you are more uncertain about which ways of doing good will actually be most effective (you think that the Long-Term Future is most important, but you think that it’s going to be really difficult to make progress in that area)”.

High uncertainty is only a reason to split or diversify if one is risk averse. Is it sensible to be risk averse about one's altruistic decisions? No. As Carl Shulman writes:

What am I going to do with my tenth vaccine? Vaccinate another kid!

While Sam’s 10th pair of shoes does him little additional good, a tenth donation can vaccinate a tenth child, or pay for the work of a tenth scientist doing high impact research such as vaccine development. So long as Sam’s donations don’t become huge relative to the cause he is working on (using up the most efficient donation opportunities) he can often treat a charitable donation of $1,000 as just as worthwhile as a 1 in 10 chance of a $10,000 donation.
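A minimal worked version of this comparison, using Shulman's illustrative numbers: if impact is roughly linear in donation size at these scales, the expected donation (and hence the expected number of vaccinations funded) is the same, so a risk-neutral donor is indifferent.

$$\mathbb{E}[\text{certain}] = 1.0 \times \$1{,}000 = \$1{,}000 \qquad \mathbb{E}[\text{gamble}] = 0.1 \times \$10{,}000 = \$1{,}000$$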

Moral uncertainty

The EA funds FAQ says that another reason for splitting could be “If you are more uncertain about your values (for example, you think that Animal Welfare and the Long-Term Future are equally important causes)”.

Does it make any difference if the uncertainty posited is about morality or our values rather than the facts? In other words, is it reasonable for a risk-neutral donor facing moral uncertainty to split?

This depends on our general theory for dealing with cases of moral uncertainty. (Will MacAskill has written his thesis on this.) We can start by distinguishing moral theories which value acts cardinally (like utilitarianism) from moral theories which only value acts ordinally. The latter category includes theories which admit of only two possible ranks, permissible and impermissible (like some deontological theories), as well as theories with finer-grained rankings.

If the only theories in which you have non-zero credence are cardinal theories, we can simply treat our normative uncertainty like empirical uncertainty, by computing the expected value. (MacAskill argues persuasively against competing proposals like ‘my favourite theory’, see Chapter 1).

What if you also hold some credence in merely ordinal theories? In that case, according to MacAskill, you should treat the situation as a voting problem. Each theory can “vote” by ranking your possible actions, and gets a number of votes that is proportional to your credence in that theory.

The question is which voting rule to use. Different voting rules have different properties. A simple property might be:

Unanimity: if all voters agree that X>Y, then the output of the voting rule must have X>Y.

Let’s say we are comparing the following acts:

  1. Donate $1000 to charity A
  2. Donate $500 to charity A and $500 to charity B.

Unanimity implies that if all the first-order theories in which you have credence favour (1), then your decision after accounting for moral uncertainty will also favour (1). So provided our voting rule satisfies unanimity, moral uncertainty provides no additional reason to split. (In fact, a much weaker version of unanimity will usually do, if you have sufficiently low credence in pro-splitting moral theories.)
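As a minimal sketch of the voting framing, here is one simple rule, a credence-weighted Borda count, applied to the two acts above. The specific credences and rankings are invented; the point is just that when every theory ranks (1) above (2), any rule satisfying unanimity, including this one, will too.

# Acts: (1) donate $1000 to charity A; (2) split $500/$500 between A and B
acts <- c("all_to_A", "split")

# Invented credences in three first-order moral theories (sum to 1)
credence <- c(theory1 = 0.5, theory2 = 0.3, theory3 = 0.2)

# Each theory ranks the acts (1 = best); here all three prefer "all_to_A"
ranks <- rbind(
  theory1 = c(all_to_A = 1, split = 2),
  theory2 = c(all_to_A = 1, split = 2),
  theory3 = c(all_to_A = 1, split = 2)
)

# Borda-style score: each theory gives an act (number of acts - rank) points,
# weighted by the credence in that theory
borda <- colSums(credence * (length(acts) - ranks))
borda  # all_to_A wins whenever every theory ranks it first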

Diminishing returns

A good reason to split would be if you face diminishing returns. At what margins do we begin to see returns diminish sufficiently to justify splitting? This depends on how much donation opportunities differ in cost-effectiveness.

Suppose there are two charities, whose impact functions $f(x)$ and $g(x)$ are monotone increasing with monotone decreasing first derivatives, and suppose charity f is the more cost-effective at the current margin, i.e. $f'(0) > g'(0)$. Then you should start splitting at the donation size $x^*$ such that $f'(x^*) = g'(0)$.

If you have $y > x^*$, you should donate $x$ to charity f and $y - x$ to charity g, such that $f'(x) = g'(y - x)$.

It's generally thought that for small (sub-six-figure) donors, $y < x^*$, that is, returns aren't diminishing noticeably compared to the difference in cost-effectiveness between charities.
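To illustrate the condition above, here is a small sketch under invented assumptions: take logarithmic impact functions $f(x) = 10\log(1+x)$ and $g(x) = \log(1+x)$, with $x$ in millions of dollars, so that charity f starts out ten times more cost-effective at the margin.

# Marginal impact (first derivatives) of the two hypothetical charities
fprime <- function(x) 10 / (1 + x)  # f(x) = 10*log(1+x)
gprime <- function(x) 1 / (1 + x)   # g(x) = log(1+x)

# Splitting only starts at x* where f'(x*) = g'(0)
xstar <- uniroot(function(x) fprime(x) - gprime(0), c(0, 1e6))$root
xstar  # 9: only beyond $9m does the next dollar to f do less good than the first dollar to g

# With a budget y > x*, split so that marginal returns are equalised: f'(x) = g'(y - x)
y <- 20
x <- uniroot(function(x) fprime(x) - gprime(y - x), c(0, y))$root
c(to_f = x, to_g = y - x)  # roughly 19 and 1

Under these made-up curves a donor would need a budget of over $9 million before splitting becomes optimal, which is the sense in which sub-six-figure donors should not split.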

However, many people believe that at the level of the EA community, there should be splitting. What does this imply in the above model?

Let’s assume that the EA community moves on the order of $100 million per year (including Good Ventures). Some people take the view that the best focus area is more than an order of magnitude more cost-effective than others (although it’s not always clear which margin this claim applies to). Under some such view, marginal returns would need to diminish by more than 10 times over the 0-100M range in order to get a significant amount of splitting. To me, this seems intuitively unlikely. (Of course, some areas may have much faster diminishing returns than others4.) Michael Dickens writes:

The US government spends about $6 billion annually on biosecurity. According to a Future of Humanity Institute survey, the median respondent believed that superintelligent AI was more than twice as likely to cause complete extinction as pandemics, which suggests that, assuming AI safety isn’t a much simpler problem than biosecurity, it would be appropriate for both fields to receive a similar amount of funding. (Sam Altman, head of Y Combinator, said in a Business Insider interview, “If I were Barack Obama, I would commit maybe $100 billion to R&D of AI safety initiatives.”) Currently, less than $10 million a year goes into AI safety research.

Open Phil can afford to spend something like $200 million/year. Biosecurity and AI safety, Open Phil’s top two cause areas within global catastrophic risk, could likely absorb this much funding without experiencing much diminishing marginal utility of money. (AI safety might see diminishing marginal utility since it’s such a small field right now, but if it were receiving something like $1 billion/year, that would presumably make marginal dollars in AI safety “only” as useful as marginal dollars in biosecurity.)

To take another approach, let’s look at animal advocacy. Extrapolating from Open Phil’s estimates, its grants on cage-free campaigns are probably about ten thousand times more cost-effective than GiveDirectly (if you don’t heavily discount non-human animals, which you shouldn’t) (more on this later), and perhaps a hundred times better after adjusting for robustness. Since grants on criminal justice reform are not significantly more robust than grants on cage-free campaigns, the robustness adjustments look similar for each, so it’s fair to compare their cost-effectiveness estimates rather than their posteriors.

Open Phil’s estimate for PSPP suggests that cage-free campaigns are a thousand times more effective. If we poured way more money into animal advocacy, we’d see diminishing returns as the top interventions became more crowded, and then less strong interventions became more crowded. But for animal advocacy grants to look worse than grants in criminal justice, marginal utility would have to diminish by a factor of 1000. I don’t know what the marginal utility curve looks like, but it’s implausible that we would hit that level of diminished returns before increasing funding in the entire field of farm animal advocacy by a factor of 10 at least. If I’m right about that, that means we should be putting $100 million a year into animal advocacy before we start making grants on criminal justice reform.

I find this line of argument moderately convincing. Therefore, my guess is that people who believe that their preferred focus area is orders of magnitude better than others should generally also believe that the whole EA community should donate only to that focus area.

Achieving a community-wide split

Suppose you do think, for reasons like those described in the previous section, that because of diminishing returns, the community-wide split between two causes A and B should be 70%/30%. (There may be other reasons to believe this, for instance if the impact of different causes is multiplicative rather than additive.)

There are two ways that this could lead you to prefer splitting your individual donation: cooperation with other donors, and lack of information.

Cooperation with other donors

Suppose that at time t, before you donate, the community's split gives less than 30% to B. You are trying to move the final allocation towards 70%/30%, so you should donate everything to B (assuming your donation is small relative to the community). If the community's allocation instead gave more than 30% to B, you should donate everything to A. We can call this view the single-player perspective.

From this perspective, it’s very important to find out what the community’s current allocation is, since this completely changes how you should act.

But now suppose that there are other donors, who also use the single-player perspective. For the sake of simplicity we can assume they also believe the correct community-wide split is 70%/30%5. The following problem occurs:

Everyone is encouraged to spend a lot of time looking into current margins, to work out what the best cause is. Worse, if the community as a whole is being close to efficient in allocation, in fact what is best at the margin changes a whole lot as things scale up, and probably isn’t much better than second- or third-best thing. This means that it’s potentially lots of work to form a stable view on where to give, and it doesn’t even matter that much.6

Imagine the donors could all agree to donate in the same proportional split as the optimal community allocation (call this the cooperative perspective). They would obtain the same end result of a 70%/30% split, while saving a lot of effort. When everyone uses the single-player perspective, the group is burning a lot of resources on a zero-sum game.
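Here is a minimal simulation sketch of that claim (all numbers invented): a sequence of small donors who each check the running totals and give everything to whichever cause is below its 70%/30% target share ends up at essentially the same allocation as donors who simply give 70%/30% each, but only after everyone has done the checking.

target <- c(A = 0.7, B = 0.3)
n_donors <- 1000
gift <- 100

# Single-player perspective: each donor looks up the current allocation
# and gives everything to the cause furthest below its target share
totals <- c(A = 0, B = 0)
for (i in 1:n_donors) {
  shares <- if (sum(totals) == 0) target else totals / sum(totals)
  underfunded <- names(which.max(target - shares))
  totals[underfunded] <- totals[underfunded] + gift
}
totals / sum(totals)  # ~ 0.70 / 0.30, after 1000 lookups of the running totals

# Cooperative perspective: everyone just gives in the target proportions
coop <- n_donors * gift * target
coop / sum(coop)      # exactly 0.70 / 0.30, with no lookups at all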

From a rule-consequentialist perspective, you should cooperate in prisoner's dilemmas, that is, you should use the cooperative perspective, even if, to the best of your knowledge, this will lead to less impact.

Even if we find rule-consequentialism unconvincing, act-consequentialism would still recommend investing resources to make it more likely that the community as a whole cooperates. This could include publicly advocating for the cooperative perspective, or getting a group of high-profile EA donors to promise to cooperate amongst themselves.

Lack of information

Suppose information about the community's split was impossible or prohibitively expensive to come by. Then someone using the single-player perspective would have to rely on their priors. One reasonable-sounding prior would be one that is symmetrical on either side of the 70%/30% target, or that otherwise has 70%/30% as its expected value. This prior assumes that given no information about where others are donating, they are equally likely to collectively undershoot as to overshoot their preferred community-wide split.

On this prior, the best thing you can do is to donate 70% to A and 30% to B. So given some priors, and when there is no information about others' donations, the single-player perspective converges with the cooperative perspective.

Remaining open-minded or avoiding confirmation bias

Because of confirmation bias and consistency effects, donating 100% to one charity may bias us in the direction of believing that this charity is more cost-effective. For example, one GiveWell staff member writes7:

I believe that it is important to keep an open mind about how to give to help others as much as possible. Since I spend a huge portion of my time thinking within the GiveWell framework, I want to set aside some of my time and money for exploring opportunities that I might be missing. I am not yet sure where I’ll give these funds, but I’m currently leaning toward giving to a charity focused on improving farm animal welfare.

I tend to find this type of argument from bias less convincing than other members of the EA community8. I suspect that the biases involved are insensitive to the scope of the donations, that is, it's sufficient to donate a nominal amount to other causes in order to reduce or eliminate the bias. Then such considerations would offer no reason for significant splitting. It's also questionable whether such self-deception is even likely to work. Claire Zabel's post “How we can make it easier to change your mind about cause areas” also offers five techniques for reducing this bias. Applying these techniques seems like a less costly approach than sacrificing significant expected impact by splitting.

Memetic effects

Sometimes people justify splitting like so: “splitting will reduce my direct impact, but it will allow me to have more indirect impact by affecting how others view me”.

For example:

In the past we’ve split our donations approximately 50% to GiveWell-recommended charities and 50% to other promising opportunities, mostly in EA movement building. […] GiveWell charities are easier to talk about and arguably allow us to send a less ambiguous signal to outsiders.

And also:

I’ll probably also give a nominal amount to a range of different causes within EA (likely AMF, GFI and MIRI), in order to keep up to date with the research across the established cause areas, and signal that I think that other cause areas are worthwhile.

The soundness of these reasons depends very much on each donor’s personal social circumstances, so it’s hard to evaluate any specific instance. A few general points to keep in mind are:

  • There may be memetic costs as well as benefits to splitting. For example, donating to only one charity reinforces the important message that EAs try to maximise expected value.
  • From a rule-consequentialist perspective, it may be better to always be fully transparent, and not to make donations decisions based on how they will affect what others think of us.
  • There could be cheaper ways of achieving the same benefits. For example, saying “This year I donated 100% to X, but in the past I’ve donated to Z and Y” or “Many of my friends in the community donate to Z and Y” could send some of the intended signals without requiring any actual splitting.

Recommendation

I'm not convinced by most of the reasons people give for splitting. Cooperation with other donors appears to me to be the best proposed reason for splitting.

To some degree, we may be using splitting to satisfy our urge to purchase “fuzzies”. I say this without negative judgement; I agree with Claire Zabel that we should “de-stigmatize talking about emotional attachment to causes”. I think we should satisfy our various desires, like emotional satisfaction or positive impact, in the most efficient way possible. It may not be psychologically realistic to plan to stop splitting altogether. Instead, one could give as much as possible to the recipient with the highest expected value, while satisfying the desire to split with the small remaining part. Personally, I donate 90% to the Far Future EA fund and 10% to the Animal Welfare fund for this reason.

Appendix: R code

library(readr)
library(plotrix)
library(plyr)

f <- data.frame(read_csv("~/split/Anonomized EA Funds donation spreadsheet - Amount given by fund.csv"))

exchr <- 1.27

# convert everything to usd
f$dusd <- ifelse(f$Currency=="GBP",exchr*f$Donation.amount,f$Donation.amount)

#naive histogram of allocation percentages
bseq=seq(0,100,5)
n <- hist(f$Allocation.percentage, breaks=bseq, freq=FALSE, xlab="Allocation Percentage", main="")
n_n <- data.frame(bucket=n$breaks[2:length(n$breaks)],prob=(n$counts/sum(n$counts)))


# weighted histogram
bseq=seq(0,100,5)
w <- weighted.hist(f$Allocation.percentage,f$dusd,breaks=bseq,freq = FALSE, xlab="Allocation Percentage", ylab = "Density, weighted by donation size")
w_n <- data.frame(bucket=w$breaks[2:length(w$breaks)],prob=(w$counts/sum(w$counts)))

#user totals
f <- ddply(f,.(UserID),transform,usersum=sum(dusd))

#user totals by fund
f <- ddply(f,.(UserID, Fund.Name),transform,usersum_fund=sum(dusd))

# fundfrac
f$fundfrac <- f$usersum_fund/f$usersum

# remove appropriate duplicates
f$isdupl <- duplicated(f[,c(2,5)])
f2 <- subset(f, isdupl == FALSE)

# weighted histogram of fundfrac
z <- weighted.hist(f2$fundfrac,f2$usersum_fund,breaks=bseq/100,freq = FALSE, xlab="Fund fraction (per user)", ylab = "Density, weighted by fund sum (per user)")
z_n <- data.frame(bucket=z$breaks[2:length(z$breaks)],prob=(z$counts/sum(z$counts)))
  1. The underlying data are:

    allocation percentage bucket probability
    5% 0.088
    10% 0.137
    15% 0.105
    20% 0.166
    25% 0.098
    30% 0.050
    35% 0.037
    40% 0.039
    45% 0.067
    50% 0.061
    55% 0.011
    60% 0.016
    65% 0.005
    70% 0.011
    75% 0.006
    80% 0.019
    85% 0.004
    90% 0.007
    95% 0.002
    100% 0.069

  2. The data:

    allocation percentage bucket probability
    5% 0.009
    10% 0.011
    15% 0.025
    20% 0.020
    25% 0.043
    30% 0.060
    35% 0.032
    40% 0.066
    45% 0.015
    50% 0.037
    55% 0.083
    60% 0.004
    65% 0.015
    70% 0.002
    75% 0.004
    80% 0.007
    85% 0.022
    90% 0.003
    95% 0.075
    100% 0.467

  3. The data are:

    fraction of user total bucket probability
    5% 0.013
    10% 0.019
    15% 0.021
    20% 0.024
    25% 0.044
    30% 0.052
    35% 0.074
    40% 0.065
    45% 0.025
    50% 0.033
    55% 0.097
    60% 0.074
    65% 0.017
    70% 0.003
    75% 0.004
    80% 0.008
    85% 0.025
    90% 0.003
    95% 0.002
    100% 0.395

  4. One extreme example would be a disease eradication programme, where returns stay high until they go to zero after eradication has been successful, vs. cash transfers where returns diminish very slowly.

  5. The extension to the general case would go like this: everyone truthfully states their preferred split and donation amount, and a weighted average is used to compute the resulting community-preferred split. See also “Donor coordination under simplifying assumptions”

  6. Adapted from Owen Cotton-Barratt, personal communication. 

  7. In addition, this OpenPhil post on worldview diversification, and this comment give reasons a large funder may want to make diversified donations in order to retain the ability to pivot to a better area. Some of them may transfer to the individual donor case. 

  8. The points in this paragraph apply similarly to other “arguments from bias”, such as donating for learning value or to motivate oneself to do research in the future (both of which I have seen made).