Originally published in slightly different form on September 16, 2010 at PsychologyToday.com.
I’ve spent a good deal of
time here emphasizing the differences between the two most common forms of pet
dog training—the pack leader and behavioral science models—and contrasting them
with the approach I use, which is more closely allied with the way working dogs
are trained (primarily through stimulating and then satisfying their prey
drive). With that in mind, I’m now proposing a “unified theory of dog
training,” which will hopefully show how all three models are related, why some
methods work better in some training situations than in others, and why each
model sometimes fails.
First stop on our journey:
understanding pattern recognition.
There’s a tendency among
+R trainers to believe that their method is based on “the science of how
animals learn,” when in fact, there are still many gaps in our knowledge about
how learning actually takes place. For instance, the idea that dogs learn
through making associations between a behavior (“I sit”) and its consequences
(“I get a treat”) may not actually be the case; there’s a growing body of
clinical research, particularly in the area of neuroscience, which strongly
suggests that the learning process may be the result of a very different set of
rules than what we’ve previously been taught.
Dr. Ian Dunbar,1 one of the main figureheads in the +R movement,
wrote on his blog recently that, “The first gift that we can give to all animal
owners, parents and teachers is to simplify the ridiculously ambiguous and
unnecessarily complicated and confusing [behavioral science] terminology. Second,
let’s simplify the underlying theory by going back to Thorndike’s original
premise—that behavior is influenced by [its] consequences.”
As I’ve pointed out
before, this idea of how pleasant or unpleasant outcomes shape behavior can be
traced directly back to Freud’s “pleasure principle”—we tend to be attracted to
things that increase pleasure (or decrease internal tension), and we tend
to avoid things that do the opposite. However, new research suggests that both
Dunbar and I may be wrong, that behavior is not learned via its consequences.
I think one of the biggest
misunderstandings about positive reinforcement is the idea that animals learn
new behaviors primarily because a neurotransmitter called dopamine creates a
feeling of well-being in connection with an external reward, and that even the
anticipation of a reward releases dopamine.
Here’s what Wikipedia has
to say: “Dopamine is commonly associated with the reward system of the brain,
providing feelings of enjoyment and reinforcement to motivate a person to
perform certain activities.”
That sounds about right,
doesn’t it?
Yes, but here’s the
problem. In testing this idea directly on the brains of certain animals (mainly
rats, mice, and monkeys), some researchers have found an interesting set of
anomalies. For instance, in his paper “Dopamine and Reward: Comment on Hernandez et al. (2006),” neuroscientist Randy Gallistel of Rutgers writes, “In the monkey, dopamine neurons do not
fire in response to an expected reward, only in response to an unexpected or
uncertain one, and, most distressingly of all, to the omission of an expected
one.”
So missing out on a reward
is pleasurable? How could that be?
In another article, “Deconstructing the Law of Effect,” Gallistel poses the problem of learning from an
information theory perspective, contrasting Edward Thorndike’s model, which
operates as a feedback system, with a feedforward model based on Claude
Shannon’s information theory.
It’s well-known that
shaping animal behavior via operant or classical conditioning requires a
certain amount of time and repetition. But in the feedforward model, learning
can take place instantly, in real time.
Why the difference? And
is it important?
I think so. Which is more
adaptive, being able to learn a new behavior on the fly, in the heat of the
moment, or waiting for more and more repetitions of the exact same experience
to set a new behavior in place?
In Thorndike’s model, the
main focus is on targeting which events in a stream seem to create changes in
behavior. But according to information theory, the intervals between events,
when nothing is happening, also carry information, sometimes even more than is
carried during the unconditioned stimulus. This would explain why the
monkeys’ brains were producing dopamine when they detected a big change
in the pattern of reward, i.e., no reward at all!
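To put a rough number on that idea, here is a minimal Python sketch of Shannon's measure of information (surprisal). It is only an illustration, not anything taken from Gallistel's papers: the 95 percent figure and the function name surprisal_bits are assumptions made up for the example. It shows that when a reward is almost always delivered, its arrival carries very little information, while its absence carries a lot.

```python
import math

def surprisal_bits(probability: float) -> float:
    """Shannon information content, in bits, of an event with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1.0 / probability)

# Hypothetical number: the animal has come to expect a reward on 95% of trials.
p_reward = 0.95

info_reward = surprisal_bits(p_reward)          # the expected reward arrives
info_omission = surprisal_bits(1.0 - p_reward)  # the expected reward is missing

print(f"reward delivered: {info_reward:.2f} bits")
print(f"reward omitted:   {info_omission:.2f} bits")
```

Under these made-up numbers the omitted reward is by far the more informative event, which is at least consistent with dopamine neurons responding to it.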
We’re now discovering that
the real purpose of dopamine is to help motivate us to gather new information
about the outside world quickly and efficiently. In fact, dopamine is released
during negative experiences as well as positive ones. (The puppy who gets his
nose scratched by the cat doesn’t need further lessons to reinforce the
“no-chasing-the-cat” rule; he learns that instantaneously, with a single swipe
of the cat’s paw.)
This adds further
importance to the idea that learning is not as much about pairing behaviors
with their consequences as it is about paying close attention to salient
changes in our environment: the bigger the changes, the more dopamine is released,
and, therefore, the deeper the learning.
Randy Gallistel again:
“...behavior is not the result of a learning process that selects behaviors on
the basis of their consequences ... both the appearance of ‘conditioned’
responses and their relative strengths may depend simply on perceived patterns
of reward without regard to the behavior that produced those rewards.” (“The Rat Approximates an Ideal Detector of Changes in Rates of Reward: Implications for the Law of Effect,” Journal of Experimental Psychology: Animal Behavior Processes, 2001, 27, 354-372.)
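For readers who like to see the idea in concrete form, here is a toy Python sketch of what "detecting a change in the rate of reward" could look like. It is not Gallistel's model or his analysis. It is a simple likelihood comparison chosen for illustration, and the function names and trial data are hypothetical. The detector only looks at the pattern of rewards (1 for a reward, 0 for none) and never at the behavior that produced them.

```python
import math

def bernoulli_log_likelihood(rewards):
    """Log-likelihood of a 0/1 sequence under its own best-fitting constant reward rate."""
    n = len(rewards)
    k = sum(rewards)
    if k == 0 or k == n:
        return 0.0  # a rate of 0 or 1 predicts the sequence perfectly
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def most_likely_change(rewards):
    """Return (index, evidence) for the split that best separates two reward rates.

    evidence is the log-likelihood gain, in nats, of "two rates" over "one rate".
    Returns (None, 0.0) when no split explains the data better than a single rate.
    """
    baseline = bernoulli_log_likelihood(rewards)
    best_index, best_evidence = None, 0.0
    for i in range(1, len(rewards)):
        split = bernoulli_log_likelihood(rewards[:i]) + bernoulli_log_likelihood(rewards[i:])
        evidence = split - baseline
        if evidence > best_evidence:
            best_index, best_evidence = i, evidence
    return best_index, best_evidence

# Made-up data: a reward on every trial, then the reward stops coming.
trials = [1] * 10 + [0] * 5
index, evidence = most_likely_change(trials)
print(f"change detected at trial {index}, evidence {evidence:.2f} nats")
```

In this sketch, the more abrupt the change in the reward pattern, the larger the evidence value, which loosely mirrors the claim that bigger changes are the more salient ones.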
Temple Grandin, the
subject of a recent award-winning HBO film starring Claire Danes, always
provides us with keen insights into animal behavior, and more particularly,
their thought processes. I think she hits the nail on the head when she says
that animal minds are geared toward perceiving vivid sensory details about
their environments while the human brain tends to gather these details into conceptual
chunks. In general terms: the animal mind is, in most cases, a difference
detector, while the human mind is a similarity detector. (Dogs seem to fall
somewhere in between.)
So if learning takes place
through recognizing changes in the environment—an instantaneous process that
releases dopamine—and not through the slow, random, trial-and-error recognition
of connections between behaviors and their consequences—which sometimes does
and sometimes doesn’t release dopamine—this would indicate that while Ian
Dunbar’s model of learning may have flaws, perhaps so does mine!2
See? This is why we need a
“unified dog theory!”3
“Life Is an Adventure—Where Will Your Dog Take You?”
Footnotes:
1.) While I was putting
the finishing touches on this article I learned that Cesar Millan has invited
Ian Dunbar to contribute a chapter to his next book, and Dunbar agreed.
So there may already be some movement toward a “unified dog theory” taking
place, having nothing to do with me.
2.) However, using my
model of tension and release, any reduction of tension or stress would
hypothetically end up acting as a double reward.
3.) Any unified dog theory
has to include the one model of dog training that, more than any other, relies
on teaching behaviors through pattern recognition, and that’s the model used to
train working dogs: sheepdogs, cattle dogs, detection dogs, police dogs, etc.
This may be one reason these are among the best-trained dogs on the planet.