Shaping is not Coercion

Sometimes reinforcement training gets confused with coercion—I myself had this confusion for quite some time, first implicitly then more explicitly from having read Perceptual Control Theory (PCT) books.

Then I read Don’t Shoot the Dog, a book on reinforcement training, and realized that I was approximately confusing threat and punishment and reward…  with creating a good learning gradient for someone.

We are what we repeatedly do.  Excellence, then, is not an act, but a habit.
– Will Durant, summarizing Aristotle

What causes us to repeatedly do things? They work.  What causes us to do different things? They seem to work better. This is how learning works.

So if you want an organism—yourself, a spouse, a pet, a kid, a coworker—to learn to do something different, then you need to give them some sort of gradient along which they try new things and those things seem to work better. You need to meet them in the adjacent possible, and encourage them along it.

Video games are excellent at this—they start easy, then get progressively harder as you go.  At each stage, you need to figure out what works to get you to the next stage. If you got thrown into level 48 at the start, you’d be totally overwhelmed.

This is the idea with shaping: using some reinforcer and a sequence of behaviors that work at each step.

Reinforcement is not reward.  Arguably it’s a subtype of reward, but unlike the typical connotation of “reward”, here you need dozens or hundreds of little instances (or in the case of training LLMs, trillions), and more critically it needs to come immediately.  Seconds is too late. So it’s a very different vibe.

Shaping a dog to lie down

When my wife and I read Don’t Shoot the Dog, we were in Australia with her family for Christmas, and her folks have a dog named Eddie.  Using lentil-sized pieces of meat, we spent a couple days shaping him to lie down.

This was pretty easy, in part because we could use the presence of the meat in our hand—which got his attention and tended to get his nose to follow—to guide him from a seated position to a lying down position.  Then the trick was to figure out how to get him to do that while we backed up and moved the “down” gesture so that it wasn’t happening right in front of his nose, but rather from us at standing height.

But all told we taught the old dog a new trick in a couple days, despite ourselves being totally new to the training method.

Teaching my 1yo baby a useful skill

One of the skills they don’t tell you you’ll need when you become a parent is improvisationally automating and delegating things.  Maybe some parents don’t do this, but I did a lot.  I don’t like being a laptop employed as a nightlight.  So in the first few months of parenting, when I found myself with the job of “simply hold the bottle while the baby drinks”, I figured out ways to better prop the bottle up, like giving it more balance points by attaching a small cheap carpenter’s clamp to it.  Soon my intrepid daughter was holding it herself, merely reflexively. This is her at 3.5 months, a good few weeks before she learned even to just see an object and reach for it.

Something they do tell you about parenting is that there will be regressions: the newborn who sleeps so easily becomes a 4mo who sleeps very erratically until you give them really clear, consistent sleep associations and help them learn to bridge from one sleep cycle to the next.  But it had not occurred to me that my baby, having learned at around 7mo to hold the bottle herself without the clamp, would some day soon become unable to feed herself with the bottle.

That day came at around 10 months, when she learned to sit up.  See…  she hadn’t learned about gravity.  When she was lying on her back, merely holding the bottle in her mouth at all (usually at about the angle in the above photo) would result in the milk flowing.  But once she could sit up, well, gravity wasn’t on her side.  Suddenly she was frustrated, but she refused to drink lying down. Fair enough, but it meant I was once again employed below my pay grade.

It took another 2 months, but one night it occurred to me to try shaping. At worst, I would entertain myself more than by just sitting there holding the bottle up.  The regime was simple:

  1. an obvious reinforcer: the milk is flowing
  2. the following conditions, in order:
    1. the milk only flows if her hand is near the bottle
    2. the milk only flows if her hand touches the bottle
    3. the milk only flows if she’s holding the bottle
    4. the milk only flows if she holds the bottle up (at this point I was basically watching!)

She only has one bottle a day, at bedtime.  One night, I went through that process and it took the whole 20 minutes or so of the feed to get any moments of the last step.  I would back up if it seemed too hard.  The next night, within 5 minutes she was feeding herself.  The following night I just handed her the bottle and she grabbed it and tilted it up. And every night since, although sometimes someone enjoys holding and feeding her anyway.
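For the programmers reading, here is a rough sketch of that regime as a loop: reinforce right away when the current criterion is met, raise the bar after a few successes in a row, and back up after a run of misses. Everything named here is my own illustration and not from the book: `shape`, `meets_criterion`, and the thresholds `advance_after` and `back_up_after` are assumptions, and the learner is just a stand-in function, not a model of an actual baby.

```python
# Criteria from the list above, easiest first (wording paraphrased).
CRITERIA = [
    "hand near the bottle",
    "hand touching the bottle",
    "holding the bottle",
    "holding the bottle up",
]


def shape(meets_criterion, trials=200, advance_after=3, back_up_after=5):
    """Run one shaping session and return the level reached (index into CRITERIA).

    meets_criterion(level) -> bool is a hypothetical stand-in for the learner:
    it returns True when the current attempt satisfies CRITERIA[level].
    The thresholds are illustrative guesses, not measured values.
    """
    level = 0
    hits = 0    # consecutive attempts that met the current criterion
    misses = 0  # consecutive attempts that did not
    for _ in range(trials):
        if meets_criterion(level):
            # Reinforce immediately: the milk flows.
            hits += 1
            misses = 0
            if hits >= advance_after and level < len(CRITERIA) - 1:
                level += 1  # it works reliably, so raise the bar one step
                hits = 0
        else:
            # No reinforcer this time: the milk does not flow.
            misses += 1
            hits = 0
            if misses >= back_up_after and level > 0:
                level -= 1  # seems too hard, so back up one step
                misses = 0
    return level


# A learner who always meets the current criterion climbs to the last step.
print(CRITERIA[shape(lambda level: True)])  # prints "holding the bottle up"
```

The part that matters is how small each step is. If misses pile up, the bar drops instead of the reinforcer being withheld for longer.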

I don’t know if it would have worked earlier, but my guess is yes, given how fast it worked when I did it. It might’ve taken longer at 10mo, and I’m not 100% sure she was strong enough at that age, but she was climbing a lot, so probably.  Endurance might have been the limiting factor, though.

So I taught my baby a skill.

How is this kind of training different from coercion?

I’m mostly writing for my past self here. I legit used to think that all such training regimes were coercive.

I want to suggest that the two issues with coercion are:

  1. it doesn’t engage with the learning process.  it merely says “do/stop this or else” or “if you do/stop this, there’ll be a giant payoff”.  it doesn’t engage in the intimate play of figuring out how to actually help the organism solve that challenge. the mere presence of the reward, possibly some substantial distance in the future, is supposed to generate sustained motivation and creativity.
  2. it doesn’t respect some other serious need/goal the organism has.  suppose that someone comes up with a shaping regime to try to get me to stop talking.  they find a way of reinforcing every time I’m quiet in a conversation, and perhaps punishing every time I talk.  this might work up to a point (and frankly I might appreciate it up to that point!) but after that point it would require a certain kind of force because I have reasons for talking, and failing to solve those while continuing to seriously aim to reduce my talking would create inner conflict in me, which is very stressful.  The PCT folks highlight this in exquisite detail.

It’s maybe already obvious how the above situations don’t have these issues, but let’s spell it out:

Conditional reinforcing is not coercing

I used to think that the act of giving the reinforcer conditionally was somehow coercive.

I would now mostly pay attention to whether the organism seems to be engaging in the process with interest or frustration. As long as you’re in the interest zone, i.e. the zone of proximal development, the conditionality feels like a fun puzzle.  Moreover, if the learning conditions are somehow demeaning, rather than an attuned loving challenge, then even if the task is in the adjacent possible, well, something bad is happening.  Is it coercion? It might be some other problem. Importantly, for shaping to work, each conditional step needs to be so small that within seconds of them trying different stuff, something they do is further along the path and so they get reinforced.  So there’s no experience of it being withheld.

Creating the situation required for reinforcing is not coercing

A more subtle confusion, likely held only by those who have engaged with Perceptual Control Theory and its beef against behaviorism, is that the situations required in order to run a training regime are themselves coercive or otherwise require total captive control of an organism’s environment, because that’s the only way that you can keep them continually in a situation where you have control over something that they need (the reinforcer) so that you can use it to reinforce them.  This one seems silly to me from the above examples.  Like yes, I control the meat & milk, and the dog & baby can’t get more of them than I (or other caregivers) allow.  But:

  1. the dog does not NEED the meat, it merely wants and enjoys it.  I don’t need to keep the dog hungry.  It doesn’t get increasingly frustrated if it goes days without getting any meat, trying and failing different strategies to cajole or trick or subvert me so it can get some.  it just goes about its business.
  2. the baby is GOING TO get the milk! I am not withholding milk from the baby for any meaningful length of time. If the training failed, on any scale, the baby would get the milk anyway within seconds.  I don’t think it occurred to my daughter, who by that time could understand things working on the scale of 10+ seconds, that this mild frustration she was having on this particular evening when the milk wasn’t flowing consistently meant that she would go hungry.

In fact, in both cases, the learning is going to go way better if the dog or baby is not hungry, because they’ll have more patience.

My impression is that B.F. Skinner used to keep his pigeons somewhat hungry. Maybe this is necessary for some animals and not others; I haven’t investigated.  I’m not trying to say reinforcing/shaping is never coercion, just that it isn’t inherently coercion.

But animal trainers also often work with a secondary reinforcer (a simple click sound), for which the animal is clearly not operating in a state of deprivation. And the animal gets excited when it’s training time! I get to learn! What fun!

“That worked! [in terms of something I care about]”

One reframe I found helpful for this is to think of the reinforcer as a signal of “that worked!”

It’s a more intuitive, first-person description of what a reinforcer is. It reinforces because it tells the organism what worked. It tells them what worked because it got them what they wanted, or it got them a sign that they would get what they wanted later, or because it got some signal that they were learning something worth learning. And sometimes the mere act of having some success with something you’re trying to do gives you that “that worked!” sign. But for forming superorganisms, we need to be able to tell each other what worked.

I have a video on this:

I also have another essay defining coercion in terms of PCT, which may be of interest.

2024: Nascent Intelligence

I was worried that, due to time constraints and also because of the current zeitgeist, I was going to end up writing a short outline of my year and then getting Claude or some other LLM to expand it for me into a full post. But I currently don’t do that with any of my writing, and a yearly review post feels like almost the worst thing to do it with because part of the whole point is it’s just an expression of what’s going on for me, and the AI is not gonna be able to fill in the details accurately (unlike if it can interpolate some model or explanation) so I might as well just publish the outline.

Instead, however, I find myself dictating large chunks of this post using wisprflow transcription (which can keep up with me at >200wpm with background music!) plus a foot pedal keyboard with three buttons: [tab, dictate, and enter] while feeding my baby daughter. And that feels like a great place to start in terms of what the year has been like.  My year has been a year characterized by coming into contact with nascent intelligence, notably:

  • LLM systems and other AIs
  • my baby daughter who was born in August.

The fact that Jess was pregnant was a detail omitted from last year’s yearly review, since we hadn’t told more than a few family and friends at that point. The previous year, I omitted the fact that we’d gotten engaged, for the same reason!

Anyway, the year thus began for the Ocean family with a sense of the water slowwwwly pulling back to create a massive wave that we knew would crash down and completely change our lives sometime in the summer.


The Parable of the Canoe Sandwich 🛶🥪

Suppose you and I are out having a canoe trip. We’re spending the day out, and won’t be back for hours. Suppose there’s a surprise wave or gust of wind and… you drop your sandwich in the water. Now we only have one sandwich between us, and no other food.

If we were in this situation, I’d want you to have half of my sandwich.

an AI-generated painting depicting the scene just described

That wouldn’t be a favour to you, or an obligation, or a compromise. I’d be happy to give you half my sandwich. It would be what I want. It would be what I want, under the circumstances. Neither of us wanted the circumstances of you having dropped your sandwich, but given that that happened, we’d want you to have half of mine.

Yes—this is more accurate: we would want you to have half of my sandwich.

However, this requires us having a We that’s capable of wanting things.

To explore this, let’s flip the roles—suppose it’s me who dropped my sandwich. I’m assuming that you feel the sense in which of course you’d want me to have some of yours. If you need to tweak the story in order to make that true, go for it. Eg maybe you wouldn’t if “I” dropped my sandwich but you would if say an animal ran off with it—not a version though where you lost my sandwich and you’re trying to make it up to me! That’s a very different thing.

So suppose my sandwich has been lost and your initial response is like “of course I’d want you to have half of mine”.

However… suppose that in response to this event, I’m kind of aggressive & entitled about the whole thing and I’m demanding some of your sandwich (or all of it, for that matter). My guess is that this would dramatically reduce the sense in which you would want to give some to me. You might anyway, from fear or obligation or conflict-avoidance or “wanting to be a good friend” or whatever, but it would no longer directly feel like “oh yeah of course I’d want that.” Part of why, is the breakdown of the sense of We that is implied by my demand—my demand enacts a world where what you want and what I want are at odds, which didn’t seem to be the case back when you felt that sharing the sandwich would be what you wanted. I seem to only care about my needs, not yours, thus I’m not caring about our needs, so it seems like you might get exploited or overdrawn if you try to open yourself towards my needs. (And by “seems”, I don’t at all mean to imply that this isn’t what’s happening—maybe it is! “If you give them an inch they’ll take a mile” is a real interpersonal pattern.)
