
Newcomb’s Problem is a trust problem

There’s a classic thought experiment called Newcomb’s Problem.  It goes as follows:

Newcomb’s problem: You face two boxes: a transparent box, containing a thousand dollars, and an opaque box, which contains either a million dollars, or nothing. You can take (a) only the opaque box (one-boxing), or (b) both boxes (two-boxing). Yesterday, Omega — a superintelligent AI — put a million dollars in the opaque box if she predicted you’d one-box, and nothing if she predicted you’d two-box. Omega’s predictions are almost always right.

If you haven’t heard about this yet, you might as well take a moment to consider what you’d do.

While you do, here’s Robert Nozick, writing in his 1969 analysis of it:

“To almost everyone, it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.”

One argument is: what you do right now can’t control what’s already in the box, therefore obviously you should two-box and get more money.

The other argument is: if you think like that, you will not get very much money at all, because there will be no money in the opaque box.

But doesn’t that second argument imply that the choice you make now can in some sense control the past? “Yes”, answers Joe Carlsmith in his Betteridge’s-Law-of-Headlines-violating essay Can you control the past?, “and this is a wild and disorienting fact”.

Building trust in Omega

The whole essay is worth reading if you’re at all interested in the topic. My aim right now is to analyze this one beautiful hypothetical:

Imagine doing “tryout runs” of Newcomb’s problem, using monopoly money, as many times as you’d like, before facing the real case (h/t Drescher (2006) again). You try different patterns of one-boxing and two-boxing, over and over. Every time you one-box, the opaque box is full. Each time you two-box, it’s empty. 

You find yourself thinking: “wow, this Omega character is no joke.” But you try getting fancier. You fake left, then go right — reaching for the one box, then lunging for the second box too at the last moment. You try increasingly complex chains of reasoning. Before choosing, you try deceiving yourself, bonking yourself on the head, taking heavy doses of hallucinogens. But to no avail. You can’t pull a fast one on ol’ Omega. Omega is right every time.

Indeed, pretty quickly, it starts to feel like you can basically just decide what the opaque box will contain. “Shazam!” you say, waving your arms over the boxes: “I hereby make it the case that Omega put a million dollars into the box.” And thus, as you one box, it is so. “Shazam!” you say again, waving your arms over a new set of boxes: “I hereby make it the case that Omega left the box empty.” And thus, as you two-box, it is so. With Omega’s help, you feel like you have become a magician. With Omega’s help, you feel like you can choose the past. 

Now, finally, you face the true test, the real boxes, the legal tender. What will you choose? Here, I expect some feeling like: “I know this one; I’ve played this game before.” That is, I expect to have learned, in my gut, what one-boxing, or two-boxing, will lead to — to feel viscerally that there are really only two available outcomes here: I get a million dollars, by one boxing, or I get a thousand, by two-boxing. The choice seems clear.

Makes sense, right? Like if you had those experiences, with the monopoly money, that’s how it would feel.

And I would describe this as “you have gained trust in Omega’s prediction abilities”.  You can tell for yourself that Omega can predict you effectively perfectly.  Your sense of things includes this perfect predictor.  Let’s contrast that with the scenario as presented:

Newcomb’s problem: You face two boxes: a transparent box, containing a thousand dollars, and an opaque box, which contains either a million dollars, or nothing. You can take (a) only the opaque box (one-boxing), or (b) both boxes (two-boxing). Yesterday, Omega — a superintelligent AI — put a million dollars in the opaque box if she predicted you’d one-box, and nothing if she predicted you’d two-box. Omega’s predictions are almost always right.

“Omega’s predictions are almost always right.” — says who?

When we act in the world, we don’t generally do so on the basis of sensational claims we have not verified.  And here we are being asked to stake our skin in the game on a claim about the predictive capabilities of a very strange sort of being with whom we are unfamiliar.  How does Omega know what we’re like?  Where did that information come from?  And why should I buy the claim that Omega is almost always right?

(Maybe two-boxers are like the children who eat the marshmallow immediately not because they have no self-control but because, given their life experiences, they have no trust that if they don’t eat it, it won’t get taken from them, let alone that they’ll be given a second one after 15 minutes!)

Of course, the act of stepping into a thought experiment involves buying such claims, which is why I generally recommend not pressing the button. But suppose one does. At least this one raises more interesting questions than “why am I standing next to this lever, and how did I come to have perfect knowledge of how trolleys work while somehow still being me?”

Being someone

This “while somehow still being me?” question is central. The aforementioned trolley problem is famously framed as a problem of ethics, and while I’m an ethical realist in the sense that I think there are objective truths about ethics, the most important of those truths as I see it is that ethics is necessarily contextual.  So it doesn’t work to reason about it out of context.

Anyway, Jacob Falkovich writes:

Newcomb’s problem is the *most* interesting thought experiment because it gets to the heart of the problem with this entire approach to philosophy: are YOU *somebody* or are you *in a thought experiment*?

Simulating & trusting

Newcomblike situations show up all the time where people are simulating each other—this is what happens when people show up on time only if they think everybody else will show up on time.  The basketball player throws the ball to the place on the court where her teammate is headed, not because she can see him but because she knows he’d be heading there.  His movements are part of her sense of things.

“Simulating each other” is too clinical and third-person a way to put it.  Your sense of reality simply includes what you expect to happen, including the results of others’ intentions, skills, and integrities.  It’s so immediate.  It’s simply what is going on.  It can be wrong—if you’ve ever arrived where your car or bike should be, only to encounter NOTHING because it was stolen or towed, you know how gut-turning this can be.  It’s visceral.  It’s reality, as you know it.

What’s weird about the original Newcomb’s problem is the lack of any opportunity for the requisite trust—or distrust—to be built.  The lack of entanglement, of any basis for simulation. It’s also odd for a situation to be so perfectly one-sided, where one side can perfectly simulate the other but not vice versa. Where did Omega’s sense of you come from? Where is your trust in Omega’s sense of you supposed to come from? These are central questions for how we actually navigate such situations, and they’re left utterly unanswered.

Meanwhile, the monopoly money scenario, while it would allow trust to be built if you somehow got into it, requires even more suspension of disbelief to enter—magically, Omega not merely predicts the sort of thing you’d do when faced with your real, skin-in-the-game choice, once…  but manages each time to simulate you so perfectly that she knows which feint you’ll make, or at least which choice you’ll eventually land on after the feint.

Staying with that scenario longer…  you could imagine that if you got thousands of opportunities to try this with the monopoly money, you might eventually find holes in Omega’s simulation process.  Maybe she can simulate your natural behavior fine, or a coinflip, but not a random number generator, and so if you choose based on that, she actually does no better than chance—she simulates you well enough to know you’re going to decide that way, but can’t simulate the actual outcome.  Or maybe ordinary algorithmic randomness is easy for her but quantum randomness is impossible.  Then you’d actually start to build not just a sense that you control the past via your choices, but a sense of how.  Not merely a sense that Omega can predict you, but your own sense of how Omega can predict you.  You start being mutually entangled—still not symmetrically, but no longer one-way with no loop.
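To make that concrete, here’s a minimal sketch in Python. Everything in it (the names, and the toy model of Omega as “she runs a copy of your decision procedure”) is my own illustrative assumption, not something the thought experiment specifies. What it shows: if Omega predicts by simulating you, then a choice keyed to randomness her simulation can’t access drops her accuracy to chance.

```python
import random

# Toy model (an assumption for illustration): Omega predicts by running
# a copy of your decision procedure. Her copy is perfect, except that it
# can't reproduce your "quantum coin": her copy flips its own coin,
# uncorrelated with yours.

def my_choice(use_quantum_coin: bool) -> str:
    if use_quantum_coin:
        # Randomness Omega's simulation can't access.
        return random.choice(["one-box", "two-box"])
    # A deterministic policy: Omega's copy reproduces it exactly.
    return "one-box"

def omega_prediction(use_quantum_coin: bool) -> str:
    # Omega runs her copy of you: same code, but an independent coin.
    return my_choice(use_quantum_coin)

def accuracy(use_quantum_coin: bool, trials: int = 100_000) -> float:
    hits = sum(
        omega_prediction(use_quantum_coin) == my_choice(use_quantum_coin)
        for _ in range(trials)
    )
    return hits / trials

print(f"deterministic policy: {accuracy(False):.3f}")  # ~1.000: right every time
print(f"quantum-coin policy:  {accuracy(True):.3f}")   # ~0.500: chance level
```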

And this changes how you relate to Omega, which changes how Omega relates to you.

Collective integrity and identity

Meanwhile… there’s a much more realistic example given in Joe’s post:

I find that my two-boxing intuition strengthens if Omega is your great grandfather, long dead (h/t Amanda Askell for suggesting this framing to me years ago), and if we specify that he’s merely a “pretty good” predictor; one who is right, say, 80% of the time (EDT still says to one-box, in this case). Suppose that he left the boxes in the attic of your family estate, for you to open on your 18th birthday. At the appointed time, you climb the dusty staircase; you brush the cobwebs off the antique boxes; you see the thousand through the glass. Are you really supposed to just leave it there, sitting in the attic? What sort of rationality is that?
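As a quick check on that parenthetical (just a sketch of the arithmetic, using the standard $1,000 and $1,000,000 payoffs from the problem statement above): at 80% accuracy, the evidential expected values still favor one-boxing by a wide margin.

```python
p = 0.80  # the predictor's accuracy

# If you one-box, the opaque box is full with probability p.
ev_one_box = p * 1_000_000                  # $800,000

# If you two-box, the opaque box is full only if he mispredicted you,
# though you pocket the visible $1,000 either way.
ev_two_box = (1 - p) * 1_000_000 + 1_000    # $201,000

print(ev_one_box, ev_two_box)
# On this evidential accounting, one-boxing wins for any p above ~0.5005.
```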

Leaving aside the “leave it there”—obviously there’s a separate question of whether you’d donate it to charity vs take it for yourself. And let’s leave aside the 80%.

There is a kind of family lineage that would absolutely pride itself on its ability to have a kind of integrity in the form of being able to make choices like this—and predictions like this—and part of how it might do so is via rituals like this. And in such a scenario, the ritual would be awesome and meaningful, and you would be prepared for it by your trust in the kind of person your great-grandfather was and the kind of person he would know you to be, by the entanglement via your family, even if you never met him. You’d spend your whole childhood being trained to pass this particular intergenerational marshmallow test, though perhaps without knowing the specifics.

Meanwhile, if such a lineage was intact generations ago but has since frayed—if you get to age 18 and see your great-grandfather as an old kook or whatever—you’re not likely to respect the meaning of enacting the desired entanglement.

And in a scenario where there’s no lineage, well, you’re choosing just based on trying to think about what he thinks you would do, but the whole thing is very weak and sparse, and your simulation of his simulation of you, conditioned on your simulation of him… breaks down.

(And in contexts not that dissimilar to this one, there’ll sometimes be the idea that you somehow should trust him to have put it in, because only if you make the leap and trust him, will you be the sort of person who will have earned the reward. But of course, only if he’s trustworthy should you trust him. This is a reflexive system, with no right answer, so trust cannot answer the question. This is a matter of faith—more to come about that in future posts.)

Some religious or cultural or family traditions are seemingly unnecessarily costly (“irrational”) and maintaining them is part of how collective identity is maintained over space and time… and the benefits of that coalition can vastly outweigh the costs. Newcomb’s problem is a weird half-slice of such a thing, that has no wholeness. More on this lens on Newcomb’s problem in The Intelligent Social Web.

Closing thoughts

What I want to say here is that thought experiments in general tend to involve some sort of confusion or violation around the fact that your trust in what’s going on is always the basis for action.  They ask you to take someone’s word for something you often cannot make sense of, and then to make an important decision.  Even worse, they then attempt to generalize about how you’d act in real situations, on the basis of contrived ones.

In some sense, this is how the mutual simulation process works, but…  it’s also utterly not.  It’s an absurdly crude and confused approximation, one that would not enable you to be a good friend, let alone to play Omega.

When we come to know and trust each other, we are being who we are in the actual kinds of situations that we’re in, together, and our very sense of the world and ourselves becomes imbued with our sense of each other. Actual entanglement is shockingly intimate.




1 Comment

Malcolm » 26 Nov 2025

Adding that there’s something interesting about forgiveness and how to credibly pull yourself out of distrust holes. And that Benjamin Ross Hoffman has a great post on this, Calvinism as a Theory of Recovered High-Trust Agency, which continues to be the only time I’ve ever seen anybody say anything nice about Calvinism.
