Continuous Reinforcement Occurs When a Response Is Reinforced Every Time It Occurs
Methods in Behavioral Pharmacology
Frans Van Haaren , in Techniques in the Behavioral and Neural Sciences, 1993
4.1.1 Response-based schedules
A continuous reinforcement (CRF) schedule is the most straightforward response-based schedule of reinforcement to which a subject can be exposed during an experimental session. For example, each and every response (e.g., key peck, lever press) emitted by a food-deprived organism (usually a pigeon, rat, or monkey) in a standard, sound-attenuating operant chamber (Ator, 1991) is followed by the presentation of food (e.g., 3–4 s access to mixed pigeon grain, or presentation of a 45-mg (rat) or 90-mg (monkey) food pellet). Experimental sessions usually last until a fixed number of reinforcers has been obtained or until a fixed period of time has elapsed. The latter criterion is important in behavioral pharmacology experiments, as drug administration may sometimes greatly reduce or even eliminate responding for a prolonged period of time.
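To make the session-termination logic concrete, here is a minimal sketch of a CRF session loop in Python. The chamber interface (`wait_for_response`, `deliver_food`) and all parameter values are hypothetical stand-ins for whatever control software actually drives the operant chamber.

```python
import time

MAX_REINFORCERS = 60   # session ends after a fixed number of reinforcers...
MAX_SESSION_S = 3600   # ...or after a fixed time, whichever comes first

def run_crf_session(chamber):
    start = time.monotonic()
    reinforcers = 0
    while reinforcers < MAX_REINFORCERS:
        if time.monotonic() - start >= MAX_SESSION_S:
            break  # the time limit matters when a drug suppresses responding
        if chamber.wait_for_response(timeout_s=1.0):  # e.g., a lever press
            chamber.deliver_food(duration_s=3.0)      # every response reinforced
            reinforcers += 1
    return reinforcers
```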
Other response-based schedules of reinforcement require the subject to emit a fixed or variable number of responses before reinforcement is presented. A fixed number of responses must be completed on a fixed-ratio (FR) schedule, while a variable number of responses is required on a variable-ratio (VR) schedule. The different response requirements that make up a VR schedule can be adapted from Fleshler and Hoffman's (1962) equation, otherwise used to generate constant-probability variable-interval schedules. When behavior is maintained by a random-ratio (RR) schedule, each response activates a probability gate which determines whether or not that specific response will be followed by reinforcement presentation (Schoenfeld and Cole, 1972). Responding is said to be maintained on a progressive-ratio (PR) schedule when the ratio requirement is systematically increased by a fixed (or variable) number of responses after every reinforcer (Hodos, 1961; van Hest et al., 1988b). The schedule designations FR 30, VR 30, and RR 30 indicate that reinforcement will be presented following the completion of the thirtieth response on the FR schedule, and following a variable number of responses averaging 30 on the VR and RR schedules. FR, VR, and RR schedules can all generate 'ratio strain' when the ratios become too large to maintain consistent responding.
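The following sketch shows one way the next response requirement might be generated under each ratio schedule. The helper names and values are mine, and the VR helper uses a simple uniform draw rather than the Fleshler and Hoffman (1962) series the text cites.

```python
import random

def fixed_ratio(n=30):
    return n                       # FR 30: always exactly 30 responses

def variable_ratio(mean=30, spread=20):
    # Requirements varying uniformly around a mean of 30 (illustrative;
    # a constant-probability series would follow Fleshler & Hoffman, 1962).
    return random.randint(mean - spread, mean + spread)

def random_ratio_gate(p=1 / 30):
    # RR 30: each response passes a probability gate, so reinforcement
    # follows any given response with constant probability p.
    return random.random() < p

def progressive_ratio(step=5):
    # PR: the requirement grows by a fixed step after every reinforcer.
    n = step
    while True:
        yield n
        n += step
```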
Response rate (defined as the number of responses per unit time, excluding the time involved in presentation of the reinforcer) is the most frequently reported dependent variable in the experimental analysis of behavior and behavioral pharmacology. In the absence of drug administration, VR and RR schedules maintain relatively high response rates with very little pausing. Response rates decrease as the average ratio requirement is increased (Schoenfeld and Cole, 1972; Blakely and Schlinger, 1988). The behavior maintained by FR schedules is characterized by a pause after the presentation of the reinforcer (the post-reinforcement pause) followed by the next ratio run (a break-and-run pattern; Felton and Lyon, 1966). Changes in the post-reinforcement pause (the time elapsed between the end of reinforcement presentation and the next response) and in running rate (the number of responses per unit time, calculated over session time with the post-reinforcement pause and reinforcer presentation time excluded) are two dependent variables frequently analyzed in behavioral pharmacology after drug administration on schedules with fixed response requirements.
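A short sketch of how these three dependent variables could be computed from timestamped session records; the data layout (response times in seconds, reinforcer onset/offset pairs) is an assumption made for illustration.

```python
def response_rate(responses, reinforcers, session_s):
    # Responses per unit time, excluding reinforcer presentation time.
    handling = sum(end - start for start, end in reinforcers)
    return len(responses) / (session_s - handling)

def post_reinforcement_pauses(responses, reinforcers):
    # Time from the end of each reinforcer to the next response.
    pauses = []
    for _, end in reinforcers:
        later = [t for t in responses if t > end]
        if later:
            pauses.append(min(later) - end)
    return pauses

def running_rate(responses, reinforcers, session_s):
    # Responses per unit time with reinforcer time AND pauses excluded.
    handling = sum(end - start for start, end in reinforcers)
    prp = sum(post_reinforcement_pauses(responses, reinforcers))
    return len(responses) / (session_s - handling - prp)
```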
The major dependent variable of interest on the PR schedule is the final ratio completed before the subject ceases to respond for a predetermined period of time (the break-point). It has been suggested that the break-point may reflect motivational variables, as it varies systematically with increases in food deprivation and with the volume and concentration of a liquid reinforcer (Hodos, 1961; Hodos and Kalman, 1963). In behavioral pharmacology, the analysis of break-points on PR schedules has been used to assess the reinforcing value of self-administered substances such as cocaine (Loh and Roberts, 1990).
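Break-point extraction might look like the following sketch, assuming the session log described above and an illustrative 10-min cessation criterion.

```python
def break_point(completed_ratios, ratio_end_times, next_response_times,
                limit_s=600):
    # completed_ratios[i] was finished at ratio_end_times[i];
    # next_response_times[i] is the first response after reinforcer i
    # (None if the subject never responded again).
    for ratio, end, nxt in zip(completed_ratios, ratio_end_times,
                               next_response_times):
        if nxt is None or nxt - end > limit_s:
            return ratio  # responding ceased: break-point reached
    # Session ended before the cessation criterion was ever met.
    return completed_ratios[-1] if completed_ratios else 0
```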
URL: https://www.sciencedirect.com/science/article/pii/B9780444814449500092
When Representation Becomes Reality: Interactive Digital Media and Symbolic Development
Georgene L. Troseth , ... Zachary D. Stuckelman , in Advances in Child Development and Behavior, 2019
4.1.3 Conjugate reinforcement
The third aspect of responsiveness, conjugate reinforcement, involves a match between the timing and magnitude of triggering behaviors and responses, offering continuous reinforcement that sustains the behavior (Rovee & Rovee, 1969). For instance, if a mailbox in a Mercer Mayer touchscreen book pops open every time a child touches it on the screen, that pairing is conjugate; the volume of touches directly relates to the volume of reinforcing responses.
These three characteristics of responsiveness from people and screens are important components of operant conditioning. Both temporal contiguity and probabilistic contingency affect whether or not humans of all ages will learn an association between a self-produced action and some outcome, and conjugate reinforcement motivates repetition of the rewarded behavior.
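As an illustration only, a conjugate contingency in a touchscreen app reduces to an immediate, one-to-one mapping from behavior to consequence; the `animate` callback and event object below are hypothetical.

```python
def on_touch(event, animate):
    # Every touch produces an immediately contingent response, so the
    # amount of reinforcement is yoked one-to-one to the amount of
    # touching: same timing, same magnitude, every time.
    animate(at=event.position)
```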
URL: https://www.sciencedirect.com/science/article/pii/S0065240718300363
Concepts and Principles
Lisa N. Britton , Matthew J. Cicoria , in Remote Fieldwork Supervision for BCBA® Trainees, 2019
Schedules of Reinforcement
Schedules of reinforcement are very challenging for many trainees to grasp. Your instruction in this area should include the following topics:
- Continuous reinforcement
- Intermittent schedules of reinforcement
  - Fixed ratio schedule
  - Variable ratio schedule
  - Fixed interval schedule
  - Variable interval schedule
- Compound schedules
  - Concurrent schedule
  - Multiple schedule
  - Chained schedule
  - Mixed schedule
  - Tandem schedule
  - Alternative schedule
  - Conjunctive schedule (Cooper et al., 2007, pp. 305–320)
Rehearsal and Performance Feedback
Provide examples of various schedules and have the trainees determine which schedule is in effect with each of the examples. Provide feedback regarding which answer is correct and why. Give the trainees individual feedback on their scores. Continue providing instruction, assigning readings, and presenting more examples until the trainees achieve the previously established criterion with this activity. Appendix E includes examples for you to use within your instruction.
Ethics Related to Schedules of Reinforcement
Emphasize with your trainees that we have an ethical obligation to thin schedules of reinforcement to the natural reinforcers available in the environment. It is also critical for us to refrain from using reinforcers that may be harmful to our clients, even when they are effective (Bailey & Burch, 2016, p. 135).
URL: https://www.sciencedirect.com/science/article/pii/B9780128159149000043
Treatment Components
Jonathan Tarbox , Taira Lanagan Bermudez , in Treating Feeding Challenges in Autism, 2017
4.1.2 Reinforcement Schedules
The feeding intervention should clearly specify when and how much reinforcement should be provided, referred to as the "schedule of reinforcement." Continuous reinforcement schedules provide reinforcement following every instance of the target behavior. For example, the client receives one bite of preferred food following each bite of nonpreferred food he accepts. Continuous reinforcement is also referred to as a Fixed Ratio 1 schedule of reinforcement. In contrast, intermittent schedules of reinforcement specify that only some responses will result in a reinforcer. Intermittent reinforcement can be delivered after the client eats a fixed number of bites (e.g., the client receives reinforcement after every 5 bites he accepts) or on a variable schedule, after a varying number of responses (e.g., the client receives reinforcement after approximately 5 bites, ranging from 3 to 7).
To maximize effectiveness at the beginning of the treatment, use a denser schedule of reinforcement, so that varied and flexible eating is richly reinforced. After the client's eating has reliably improved, consider changing to intermittent reinforcement. Gradually decreasing the frequency of reinforcement is referred to as "schedule thinning," described in Chapter 6, Common Treatment Packages and Chapter 8, Caregiver Training and Follow-Up.
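One way schedule thinning could be operationalized is sketched below; the FR steps and the 80% mastery criterion are illustrative assumptions, not values from this chapter.

```python
THINNING_STEPS = [1, 2, 3, 5]   # FR1 -> FR2 -> FR3 -> FR5 (assumed steps)

def next_schedule(current_fr, percent_bites_accepted):
    # Thin the schedule only after eating is reliably improved.
    if percent_bites_accepted >= 80:            # mastery criterion (assumed)
        i = THINNING_STEPS.index(current_fr)
        if i + 1 < len(THINNING_STEPS):
            return THINNING_STEPS[i + 1]        # move to the leaner ratio
    return current_fr                           # otherwise stay put
```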
URL: https://www.sciencedirect.com/science/article/pii/B9780128135631000045
Clinical Applications of Principle 2
Warren W. Tryon , in Cognitive Neuroscience and Psychotherapy, 2014
Reinforcement Schedules
There are many ways to present, that is, to schedule, reinforcers. One way is to use counters. Reinforcing every behavior that occurs is called continuous reinforcement and is equivalent to setting the counter to 1. Reinforcement could instead be delivered after 2, 3, or more instances of the target behavior. Fixing the counter at some such value results in a fixed ratio schedule. Letting the counter value vary around a specified mean and standard deviation results in a variable ratio schedule. Reinforcement can also be scheduled with timers. Reinforcing the first instance of the target behavior after, say, a 60-second delay constitutes a fixed interval 60-s schedule. Letting the mean, and possibly the standard deviation, of the intervals vary constitutes a variable interval schedule. Finally, reinforcement can be scheduled with speedometers; these schedules are called differential reinforcement of low rate or of high rate. The same or different schedules can pertain to two or more available responses, as in the concurrent operant case. It might not be intuitively obvious that each of these reinforcement schedules generates a different response pattern, but the experimental analysis of behavior has revealed otherwise. Ferster and Skinner (1957) published an entire textbook on the effects of reinforcement schedules. The matching law shows that subjects match their behavior to the probability of receiving reward.
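The counter and timer metaphors translate almost directly into code. The sketch below is one possible rendering; the class and method names are mine, and each object simply answers "should this response be reinforced?"

```python
import random
import time

class FixedRatio:
    """Counter set to a fixed value: reinforce every nth response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio(FixedRatio):
    """Counter whose value varies around a specified mean."""
    def __init__(self, mean, spread):
        super().__init__(random.randint(mean - spread, mean + spread))
        self.mean, self.spread = mean, spread
    def respond(self):
        if super().respond():
            self.n = random.randint(self.mean - self.spread,
                                    self.mean + self.spread)
            return True
        return False

class FixedInterval:
    """Timer: the first response after t seconds is reinforced."""
    def __init__(self, t):
        self.t, self.last = t, time.monotonic()
    def respond(self):
        now = time.monotonic()
        if now - self.last >= self.t:
            self.last = now
            return True
        return False
```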
URL: https://www.sciencedirect.com/science/article/pii/B9780124200715000107
Conditioning
R.L. Port , T.L. Finamore , in Encyclopedia of Gerontology (Second Edition), 2007
Studies in Aging Animals
The effects of age on the acquisition of operant conditioned behavior have been delineated and are strikingly similar to the effects seen in classical conditioning studies. Using a continuous reinforcement schedule, we have found that aged rats are markedly impaired in the initial acquisition of appetitively reinforced bar-press responses. The magnitude of impairment is similar as well, in that aged groups took three or four times as many training days as the younger group to reach criterion. The impairment appears to reflect an associative learning deficit, in that initial activity levels were equivalent between groups (precluding the possibility that younger animals were more active behaviorally and thus produced spontaneous reinforced behaviors earlier in training than the aged subjects). Further, once an association had been formed (meeting a criterion of more than 60 responses in a 20-min training session), response rates were equivalent between groups, precluding the possibility that a lower 'ceiling effect' or maximal response rate in aged subjects influenced acquisition performance.
Analysis of the extinction of operant responses in aged rats has revealed no significant difference in comparison to younger subjects. Older subjects were equally resistant to initial extinction via non-reinforced training sessions. Thus, once a behavior was acquired, older animals were unimpaired in maintaining the association. When animals were retrained in the same task, older animals reacquired the response as expediently as the younger animals did. Minimal retraining was required to reintroduce the behavior in both aged and young groups. Cycles of reacquisition and extinction training revealed the expected increased resistance to extinction in both groups as a consequence of prior learning experience. Consequently, the effects of aging on simple appetitive operant conditioning appear to be selectively restricted to the initial learning of the association, and no significant differences are apparent in extinction or relearning of the task.
A differential reinforcement of low rate (DRL) schedule is a more complex appetitive conditioning task wherein subjects must withhold responding for a period of time for a response to be reinforced. Commonly, animals are required to inhibit the response for a period of 6 consecutive seconds before the reward is again available. Aged animals show a marked impairment in initial acquisition that is similar to that seen in simple appetitive conditioning. Older subjects experienced significant difficulty in inhibiting responses during the interval. As with many other tasks, once the task was learned, the performance of aged animals was equivalent to that of younger subjects, and aged subjects were not impaired in the retention of behaviors learned on a DRL schedule.
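The DRL 6-s contingency described here amounts to a check on inter-response times, as in this sketch (the data layout, a list of response times in seconds, is assumed for illustration):

```python
def drl_reinforced(response_times, irt_s=6.0):
    # A response is reinforced only if at least irt_s seconds have passed
    # since the previous response; any premature response resets the clock.
    reinforced = []
    last = None
    for t in response_times:
        if last is None or t - last >= irt_s:
            reinforced.append(t)   # response was withheld long enough
        last = t                   # every response restarts the interval
    return reinforced
```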
Aversive conditioning involves the application of punishment to alter the probability of a behavior recurring in the future. In active avoidance conditioning, the subject receives a mild electric shock if they fail to perform the targeted behavior (for example, move to a different location). In passive avoidance, animals are trained to inhibit their natural tendency to explore the environment by punishment for failure to remain at one location in the training chamber. Studies of both active and passive avoidance training revealed a rather marked impairment in aged animals in the initial acquisition phase of training. Once an association is well established, aged animals typically perform as well as younger subjects.
URL: https://www.sciencedirect.com/science/article/pii/B0123708702000391
Skill Acquisition
Jonathan Tarbox , Courtney Tarbox , in Training Manual for Behavior Technicians Working with Individuals with Autism, 2017
5.3.3 Continuous and Intermittent Schedules of Reinforcement
The timing and frequency with which you give reinforcement is called the schedule of reinforcement. There are two basic schedules of reinforcement: continuous and intermittent. With continuous reinforcement, a particular behavior results in a particular reinforcer every time the behavior occurs. Intermittent reinforcement schedules are schedules in which a particular behavior produces a particular consequence, but not every time the behavior occurs. It is commonly believed that intermittent schedules of reinforcement lead to strong behavior maintenance. This point is important to consider when approaching desired as well as undesired behavior: we want to be sure we continue to reinforce desired behavior on a thinned schedule, but we want to be careful not to intermittently reinforce undesired behavior.
Continuous Reinforcement: "Every time I do this, I get what I am after!"
There are four types of intermittent schedules of reinforcement: fixed ratio, variable ratio, fixed interval, and variable interval. The schedule determines which occurrences of the target response will be followed by reinforcement. Ratio schedules of reinforcement specify how many occurrences of a target response are required before reinforcement is delivered. On a fixed ratio (FR) schedule of reinforcement, a reinforcer is delivered after a set number of target responses. For example, after every five correct responses during DTT, the learner earns a 1-minute break (FR5). On a variable ratio (VR) schedule of reinforcement, a reinforcer is delivered after an average number of occurrences of the target response. For example, after an average of five correct responses in DTT, the learner earns a 1-minute break (VR5). Sometimes the learner may only need to complete three correct responses, sometimes four, sometimes six, and sometimes seven, but the overall average number required to earn a break is five.
Intermittent Reinforcement: "Sometimes this works to get me what I want, I'll give it a try!"
Interval schedules are schedules of reinforcement that specify how much time must pass since the last reinforcer was given before the behavior can be reinforced again. On a fixed interval (FI) schedule of reinforcement, a behavior is reinforced after an established, "fixed" amount of time since the last reinforcer was given. For example, a child may need to wait at least 5 minutes after her last break before she can ask for a break again (FI5). If she asks for a break earlier than that, there is no penalty; she simply does not get the break. In other words, the first time she asks for a break after 5 minutes have elapsed since her last break, she gets a break again. On a variable interval (VI) schedule of reinforcement, the first occurrence of a target response is reinforced after an average amount of time. For example, the number of minutes that the learner would have to wait since her last break might change each time, ranging from 3 to 7 minutes but averaging 5 minutes overall (VI5).
Example of Schedules of Reinforcement in Action
Dinnertime
- Fixed ratio
  - Continuous (FR1): every bite of broccoli results in a bite of French fry
  - FR2: every 2 bites of broccoli result in a bite of French fry
- Variable ratio
  - VR3: on average, every 3 bites of broccoli result in a bite of French fry
Independent Play
- Fixed interval
  - FI 5 minutes: after playing independently for at least 5 minutes, mommy will come play with Sally if Sally asks her
- Variable interval
  - VI 5 minutes: after playing independently for between 3 and 7 minutes (5 minutes on average), mommy will come play with Sally if Sally asks her
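As a rough illustration, the dinnertime examples above can be simulated in a few lines; the helper names and the random seed are arbitrary choices of mine.

```python
import random

def fr(n):
    # FR n: every nth bite of broccoli earns a bite of French fry.
    state = {"count": 0}
    def bite():
        state["count"] += 1
        if state["count"] >= n:
            state["count"] = 0
            return True
        return False
    return bite

def vr(low, high, rng=random.Random(0)):
    # VR: requirement redrawn between low and high (2-4 averages ~3).
    state = {"count": 0, "target": rng.randint(low, high)}
    def bite():
        state["count"] += 1
        if state["count"] >= state["target"]:
            state["count"] = 0
            state["target"] = rng.randint(low, high)
            return True
        return False
    return bite

fr1, fr2, vr3 = fr(1), fr(2), vr(2, 4)
for i in range(1, 7):
    print(f"bite {i}: FR1={fr1()} FR2={fr2()} VR3={vr3()}")
```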
URL: https://www.sciencedirect.com/science/article/pii/B9780128094082000052
Experimental Analysis of Behavior, Part 2
Iver H. Iversen , in Techniques in the Behavioral and Neural Sciences, 1991
5.4.2 Response sequences
Response sequences have been analyzed in many different procedures. An impressive early work analyzed sequences of lever pressing and approach to the food tray in rats responding on a continuous reinforcement schedule (Frick and Miller, 1951). Using automated recording of each response, clear patterns of sequential dependencies emerged in the data. Frick and Miller stated that "taking sequential dependencies into account materially decreased uncertainty regarding the individual animal's behavior" (p. 24). Sequential analyses reveal some degree of pattern or order in the recorded behavior that enables predictions of behavior that surpass those based on response rate alone. A more recent example will illustrate a typical multi-behavioral analysis method. Hann and Roberts (1984) studied lever press shock avoidance in rats (see Part 1, Ch. 5). Rats were videotaped and tapes were analyzed by different observers. Fig. 17 shows the sequential pattern of responses between consecutive lever presses for one rat. The calculation of transition probabilities is basically simple. If 100 lever presses occur and 40 are followed by jumping and 60 by turning, then the transition probabilities from lever pressing to these two responses would be 0.4 and 0.6, respectively. The rat chosen for display was avoiding well, with a lever pressing rate of 10.3 resp/min and a shock rate of 0.21 shocks/min. The most typical sequence between successive lever presses was prone position/upward movement/turning/jumping (P-U-T-J). Even though prone position was most likely to be followed by upward movement (0.98), prone position was nonetheless the only response that reliably preceded shock. This observation questions the notion that lever pressing occurs because it is the only response not followed by shock. In fact, prone position was the response most likely to follow a lever press. Different compositions were obtained for the remaining 5 rats, but all rats developed reliable sequences rather than random patterns. Hann and Roberts suggested that avoidance responding should be viewed within a context of sequences of activities or response chains. Analyses of response chains may be critical for an understanding of why some rats are poor at avoidance tasks. In addition, how drugs and toxins affect avoidance behavior may be elucidated further through response-chain analyses.
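The transition-probability arithmetic described above (40 of 100 lever presses followed by jumping gives 0.4) can be reproduced with a short sketch; the single-letter coding of the observed behaviors is an assumption for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    # Count each consecutive pair, then normalize per preceding behavior.
    counts = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

# e.g., a repeating P-U-T-J chain between lever presses (L):
print(transition_probabilities("LPUTJLPUTJLPUTJ"))
```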
URL: https://www.sciencedirect.com/science/article/pii/B9780444812513500138
Methods in Behavioral Pharmacology
Michael J. Lewis , in Techniques in the Behavioral and Neural Sciences, 1993
5.1 Two-lever autotitration methods
One of the first methods to measure brain stimulation reward threshold was proposed by Stein and Ray (1960). The method employs two levers, one upon which the animal self-stimulates on a CRF schedule. Current intensities are initially above threshold; however, after a set number of responses, the intensity decreases. Eventually, with continued responding on this lever, stimulation declines to the point that it is no longer reinforcing. The animal may then reset the stimulation to the initial intensity by pressing the second lever. The mean current at which the animal presses the second lever is defined as the reward threshold. This is essentially a rate-free measure that was used by Stein (1962) to investigate the effects of amphetamine, which lowered the threshold, and chlorpromazine, which increased it. More recently, others (Schaefer and Holtzman, 1979; Nazzaro et al., 1981; Seeger et al., 1981; Neil et al., 1982) have employed the autotitration method with minor modifications.
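A sketch of the autotitration logic, with all parameter values (starting intensity, step size, presses per step) chosen purely for illustration:

```python
def autotitration_threshold(events, start_ua=200, step_ua=10,
                            presses_per_step=5):
    # events is a sequence of 'stim' (self-stimulation lever) and
    # 'reset' (second lever) presses.
    intensity, presses, resets = start_ua, 0, []
    for lever in events:
        if lever == "stim":
            presses += 1
            if presses % presses_per_step == 0:
                intensity -= step_ua   # current steps down after each block
        else:
            resets.append(intensity)   # intensity at which the animal reset
            intensity, presses = start_ua, 0
    # Mean current at reset is taken as the reward threshold.
    return sum(resets) / len(resets) if resets else None
```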
Also, a modification of the method that varies the frequency of stimulation rather than its intensity has been used to investigate a wide variety of drugs (Zarevic and Setler, 1979). The procedure has considerable advantages over simple response rate measures and has been shown to produce stable and reliable threshold measurements. Another advantage is that performance at the highest current intensities provides response rate measures that may be used for comparison with other experiments using simple response rates.
Autotitration methods have been criticized, however, on several theoretical and practical grounds. One criticism is that a descending series of stimulation magnitudes may cause a propensity to continue to self-stimulate at values below threshold. Moreover, the serial order of presentation of stimulation may produce anticipation of reward decrement and thus create the possibility that an animal will learn to reset the stimulation to a 'preferred' level rather than at the actual threshold. Another problem is that, when stimulation is increased at the reset point to a higher level than the original reset intensity, the reward threshold increases rather than remaining constant (Fouriezos and Nawiesniak, 1982). Modification of the procedure to a one-lever procedure with an automatic program reset after a long interval of time (during which responding has usually ceased) seems to have eliminated the problem of choice with this procedure (Fouriezos and Nawiesniak, 1982); however, further exploration of these problems and validation of the resultant modifications is necessary.
URL: https://www.sciencedirect.com/science/article/pii/B9780444814449500201
Psychological Theories that have Contributed to the Development of Occupational Therapy Practice
Moses N. Ikiugu PhD, OTR/L , in Psychosocial Conceptual Practice Models in Occupational Therapy, 2007
Schedules of Reinforcement
Very soon, as Skinner continued experimenting, he discovered that certain variations in how reinforcement was presented were more effective than others in ensuring that the rat repeated the desired behavior. He called these variations schedules of reinforcement. In continuous reinforcement, every time the rat pressed the pedal, food was released into the box. However, when food was no longer made available, the pedal-pressing behavior went into extinction very quickly.
In the fixed ratio schedule, food was released after a certain number of pedal presses (e.g., every second, third, or fourth press). Very soon the rat learned to press the pedal the required number of times in order to obtain food. However, as with the continuous schedule, when food stopped being released at the scheduled occurrence of the behavior, the pedal-pressing behavior disappeared after a while.
In the variable schedule, Skinner varied the number of times that the rat needed to press the pedal before food was released. First the food would be released after the rat pressed the pedal two times, another time after five times, another time after three times, and so on. He found that when food was withdrawn, the pedal-pressing behavior persisted longest when this type of schedule had been used. He concluded that the variable schedule of reinforcement was the most effective when the goal was to make learned behavior permanent. This is the mechanism of addiction to gambling. People continue to gamble because they do not know when they might win: it may be the next hand in a game of poker, or the next spin of a roulette wheel.
URL: https://www.sciencedirect.com/science/article/pii/B9780323041829500076
Source: https://www.sciencedirect.com/topics/psychology/continuous-reinforcement