Leonard: "Sheldon, you can't train my girlfriend like a lab rat."
Sheldon: "Actually, it turns out that I can."
No matter how much you love someone, there is always something they do that drives you crazy. A little personality quirk can be endearing or nerve-wracking depending on the circumstances. It is tempting to try to eradicate this quirk, but everyone knows that you cannot get someone to change who they are. Or can you?
For Sheldon Cooper, the annoyance in his life is not a personality quirk in his roommate but the entire personality of his roommate's girlfriend, Penny. The first offense comes on a Monday morning, when Sheldon wakes up to find Penny dancing in the kitchen and making French toast for breakfast. The problem is, Monday is oatmeal day; Sheldon refuses to deviate from his schedule, so the mistake is unforgivable in his eyes. Leonard, Sheldon's roommate, urges Sheldon to "find a better way to deal with Penny."
Leonard: "Okay. I know what you're doing."
Leonard: "Yes, you're using chocolates as positive reinforcement for what you consider correct behavior!"
Sheldon: "Very good! Chocolate?"
How does one go about changing someone's behavior? Sheldon's solution is to reward Penny with a chocolate every time she does something of which he approves. The first offering arouses Leonard's suspicion. Offering chocolate may fall under the definition of nice, but as he tells Sheldon, "it does, but in my experience, you don't."
Later on, after more offered chocolates, Leonard figures out Sheldon's goals. By using positive reinforcement, Sheldon hopes to subconsciously guide Penny into acting in a more pleasing manner. If positive reinforcement were to fail or progress too slowly, he has the option of adding aversive consequences: a squirt of water from a water bottle or a small electric shock (which he assures Leonard would cause no long-term damage). Strictly speaking, these would be punishment rather than negative reinforcement, since they are meant to suppress a behavior rather than strengthen one.
Over the course of the evening, Sheldon succeeds in getting Penny to vacate his seat on the couch, leave the room to take a phone call, and lower the register of her voice to "a more pleasing baritone." Penny and Sheldon's relationship remains as tempestuous as ever, but this is more likely due to Leonard forbidding Sheldon to continue the reinforcement than to any failure of the training itself.
Thorndike and Human Learning
Any history of behaviorism must begin with Edward Thorndike and the law of effect. Thorndike developed problem boxes, small boxes that had only one physical solution that would allow escape. He would place a cat in the box and wait for the cat to find the solution, which would give the cat freedom and a food treat. Thorndike would measure the latency period, the time between enclosure and escape, and noticed that the latency would decrease with successive trials.
By graphing the change in latency, Thorndike ended up with learning curves. One would expect that after discovering the single behavior that resulted in escape, the cats would repeat this behavior unfailingly, as it was the only successful behavior. This would result in a sudden drop in latency periods. Instead, Thorndike observed curves that declined gradually. This indicated that learning was gradual rather than sudden.
As Douglas Mook explains in Classic Experiments in Psychology, “it was as if the correct response were gradually being strengthened by the reward of escape from confinement (and fish). Unrewarded responses gradually dropped out” (128).
These observations led to Thorndike’s law of effect (or the reinforcement principle), which takes into account only the effect of an action rather than knowledge of that effect (which is what would explain gradual learning). A successful action forms a connection between the stimulus situation (in this case the Thorndike problem box) and the response (performing the successful action).
The gradual learning is further explained with the law of exercise, which dictates that subsequent learning is explained by the strengthening of the connection between stimulus situation and response through repeated experience.
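The gradual strengthening described above can be sketched as a toy model. This is only an illustration, not Thorndike's data or method: the learning rate, the latency bounds, and the function name `simulate_latencies` are all assumptions chosen to make the curve easy to see.

```python
def simulate_latencies(trials=10, learning_rate=0.3,
                       max_latency=160.0, min_latency=10.0):
    """Return a hypothetical escape latency (seconds) for each trial.

    Assumption: the stimulus-response connection has a strength between
    0 and 1, and each rewarded escape closes a fixed fraction of the
    remaining gap to 1 (law of effect plus law of exercise).
    """
    strength = 0.0
    latencies = []
    for _ in range(trials):
        # A weak connection means a long search before the cat escapes.
        latency = min_latency + (max_latency - min_latency) * (1.0 - strength)
        latencies.append(latency)
        # The reward strengthens the connection a little, never all at once.
        strength += learning_rate * (1.0 - strength)
    return latencies


for trial, seconds in enumerate(simulate_latencies(), start=1):
    print(f"trial {trial:2d}: {seconds:6.1f} s")
```

Under these assumptions the latencies fall a little on every trial instead of collapsing after the first success, which is the shape of curve the gradual-learning account predicts.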
Pavlov and Conditioned Responses
It is difficult to talk about operant conditioning without at least mentioning its partner in crime. Classical conditioning is perhaps the most famous of behavior studies. Physiologist Ivan Pavlov wished to study the rate of saliva production in dogs at the introduction of food. He surgically implanted a collection tube in the jaw of his dogs and measured the resulting saliva. But there was one problem with his experiment. As Mook puts it:
“After a while, the dog would begin to salivate as the assistant approached, before any food was placed in its mouth. Clearly that was a problem. Pavlov wanted to study salivation as a response to food in the mouth. He could hardly do that if salivation begins when there is no food in the mouth. Pavlov decided, however, that what was happening here was even more interesting than the original question” (132)
Why did the dog start to salivate when no food was present? Pavlov hypothesized that a connection had been formed that associated the approach of the assistant with the food. In essence, the approach of the assistant replaced the food stimulus in the creation of the salivation response. Pavlov decided to create a new connection in the dog's mind by taking the assistant out of the equation and attempting to form a new connection between a metronome or bell and the food delivery.
This famous experiment is quite simple. You take one hungry dog and present it with food, and it salivates. In this set-up (1), the food is the unconditioned stimulus (or UCS, the natural cause) and the salivation is the unconditioned response (or UCR, the natural response). In subsequent trials (2), you introduce your experimental or conditioned stimulus (CS) prior to the introduction of the food. A connection is formed in the mind between the CS and the UCR such that, eventually (3), you can replace the UCS with the CS completely and still get the desired reaction, or conditioned response (CR). If the CS is then presented repeatedly without the food, however, experimental extinction occurs: the CR becomes less and less frequent.
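The acquisition and extinction phases can be sketched numerically. This is not Pavlov's own analysis; it borrows a simple linear update in the spirit of the later Rescorla-Wagner model, and the rate of 0.3, the trial counts, and the name `run_trials` are assumptions for illustration.

```python
def run_trials(strength, n_trials, food_present, rate=0.3):
    """Update a hypothetical CS-food association over n_trials.

    Assumption: the association moves a fixed fraction of the way toward
    1.0 when food follows the CS, and toward 0.0 when it does not.
    """
    target = 1.0 if food_present else 0.0
    history = []
    for _ in range(n_trials):
        strength += rate * (target - strength)
        history.append(strength)
    return history


# Phase 2: the bell (CS) is paired with food on every trial.
acquisition = run_trials(0.0, 10, food_present=True)
# After phase 3: the bell keeps sounding, but the food never arrives.
extinction = run_trials(acquisition[-1], 10, food_present=False)

print("acquisition:", [round(s, 2) for s in acquisition])
print("extinction: ", [round(s, 2) for s in extinction])
```

If the strength of the association is read as the likelihood of salivating to the bell alone, the first list climbs toward 1 and the second decays back toward 0.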
Skinner and Reinforcement
Pavlov's almost accidental discovery dealt with learning predictive cues that came before the behavior was exhibited. B.F. Skinner, on the other hand, like Thorndike, focused on the effect of reward and punishment after the behavior occurred. While Thorndike studied rates of learning, Skinner focused on the control of behavior, or operant conditioning. As George Stanley Reynolds explains in A Primer On Operant Conditioning, operant conditioning “refers to a process in which the frequency of occurrence of a bit of behavior is modified by the consequences of the behavior” (1).
Like Thorndike, Skinner used special boxes and animals in his experiments. Unlike Thorndike's boxes, however, Skinner's were not meant to be escaped and they contained pigeons rather than cats. Pigeons are extremely food motivated (have you ever been to Piazza San Marco in Venice? Offer the pigeons some food and you are bombarded), which made them great subjects for Skinner's experiments. Instead of the offer of escape as the reward of the desired behavior, Skinner set up a food delivery system in the boxes. If the pigeon pecked in the correct location (they are very good at pecking), food would be delivered. Desired behavior leads to reward, again similar to Thorndike's experiments.
Skinner went further, however, and studied behavior discrimination by varying the rates of reward based on secondary stimuli (for example, pressing a lever brings food, but more often when a light is on than when it is not). He also varied the rate of reinforcement by using four different reinforcement schedules:
fixed ratio (pigeon is rewarded on every nth response) results in a high rate of response, with a brief pause after each reward
variable ratio (pigeon is rewarded after a random number of responses) results in a high and steady rate of response
fixed interval (pigeon is rewarded for the first response after a fixed period of time has passed) results in a high rate of response as the end of the interval approaches and a slower rate at its beginning
variable interval (pigeon is rewarded for the first response after a random period of time has passed) results in a slow and steady rate of response
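The four schedules can be sketched as small decision rules that answer one question: does this peck earn food? This is a minimal sketch, not Skinner's apparatus. It assumes the standard textbook reading in which an interval schedule rewards the first response made after the interval has elapsed, and every name and number below (`fixed_ratio`, `count_rewards`, one peck per second, and so on) is hypothetical.

```python
import random


def fixed_ratio(n):
    """Reward every n-th response; the time of the peck is ignored."""
    count = 0

    def respond(now):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """Reward after a random number of responses averaging mean_n."""
    remaining = rng.randint(1, 2 * mean_n - 1)

    def respond(now):
        nonlocal remaining
        remaining -= 1
        if remaining == 0:
            remaining = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond


def fixed_interval(seconds):
    """Reward the first response made at least `seconds` after the last reward."""
    last_reward = 0.0

    def respond(now):
        nonlocal last_reward
        if now - last_reward >= seconds:
            last_reward = now
            return True
        return False
    return respond


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the wait is redrawn at random after each reward."""
    wait = rng.uniform(0, 2 * mean_seconds)
    last_reward = 0.0

    def respond(now):
        nonlocal wait, last_reward
        if now - last_reward >= wait:
            last_reward = now
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False
    return respond


def count_rewards(schedule, pecks=60):
    """Assume one peck per second and count how many pecks are rewarded."""
    return sum(schedule(float(t)) for t in range(1, pecks + 1))


print(count_rewards(fixed_ratio(5)))      # 12: every 5th of 60 pecks
print(count_rewards(fixed_interval(10)))  # 6: first peck after each 10 s
```

The sketch only decides when food is delivered. The response patterns listed above describe how real animals behave under each rule, and the code makes no attempt to model that.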
The way Sheldon goes about modifying Penny’s behavior is not by pushing her into certain activities he deems appropriate, but by reinforcing naturally occurring reactions so as to see her perform them more often. As Reynolds explains, “the frequency of occurrence of an operant is greatly influenced by the consequences of the operant” (8). In this case, the operant is the desirable behavior and the consequence of that behavior is the offer of delicious chocolate.
What we see Sheldon do, however, would only be part of the process. In order to properly train Penny to his specifications, he would need to set up conditions that enable Penny to easily repeat his desired behavior. Behaviorism does not consider cognition to be a part of the conditioning process beyond the formation of the connection between behavior and reward. If this is the case, Penny could not generalize that she is being rewarded for "nice" behavior. She would not recognize that offering to take dishes and leaving the room to take a call are related. She would only make the connection between taking Sheldon's dishes and getting chocolate, and between leaving the room to take a call and getting chocolate. In order to strengthen these connections, she would need to repeat these specific actions and receive chocolate each time.
Similarly, if Sheldon were to actually train Penny to lower her voice consistently, he would want to use a reinforcement pattern in which he rewards Penny each time her voice lowers further, with each reward coming only once her voice drops below the point at which he previously rewarded her. This moving criterion is known as shaping, or the reinforcement of successive approximations, rather than a variable ratio schedule.
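The moving criterion described above can be sketched in a few lines. The pitch values, the starting criterion, and the function name `shape` are invented for illustration and are not measurements of anyone's voice.

```python
def shape(pitches_hz, start_criterion_hz):
    """Return the utterances that would earn a chocolate.

    Assumption: a reward is given only when the pitch falls below the
    lowest pitch rewarded so far, so the bar drops with every success.
    """
    criterion = start_criterion_hz
    rewarded = []
    for pitch in pitches_hz:
        if pitch < criterion:
            rewarded.append(pitch)
            criterion = pitch  # the next reward requires a still lower voice
    return rewarded


# Hypothetical pitches (Hz) across six of Penny's sentences.
print(shape([220, 210, 215, 200, 205, 190], start_criterion_hz=225))
# [220, 210, 200, 190]
```

Sentences at 215 Hz and 205 Hz go unrewarded because the bar had already dropped below them.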
Of course, setting up an experiment to the specifications required would not lend itself well to comedy due to the high level of repetition needed. Luckily, the writers on The Big Bang Theory are intelligent enough to realize this. Even so, they highlight the goals of operant conditioning effectively and humorously. As Reynolds puts it, “operant conditioning attempts to understand behavior by gaining knowledge of the factors that modify behavior” (1-2). In the end, Sheldon is able to observe a highly motivating factor with respect to Leonard's behavior: the offer of sex.
Megan Kathleen (author) from Los Gatos, CA on September 05, 2011:
Thanks for reading this one too, d.william! The Big Bang Theory is a great show for fans of the comedy genre, but anyone with a background in science will especially appreciate the science references liberally sprinkled throughout. Let me know what you think if you ever get to it.
d.william from Somewhere in the south on September 05, 2011:
Well written and interesting. I have never seen this show, so now i will have to start watching it.
kentuckyslone on July 12, 2011:
Excellent! Very well laid out and explained.