
Research & Rationale

Effect Size Reveals the Impact of Kagan Structures and Cooperative Learning

Special Article

Dr. Spencer Kagan

To cite this article: Kagan, S. Effect Size Reveals the Impact of Kagan Structures and Cooperative Learning. San Clemente, CA: Kagan Publishing. Kagan Online Magazine, Winter 2014. www.KaganOnline.com

A powerful yardstick for evaluating the effectiveness of educational innovations is effect size. Effect size measures the magnitude of the gains produced by a treatment, and it can be translated into percentile gains: it tells us where a student in the middle of the class distribution would score had they been taught with an educational innovation rather than with traditional methods. Effect size is thus an important tool for educators as they evaluate educational innovations.
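The article uses effect size without defining it. The most common definition is the standardized mean difference (Cohen's d): the difference between treatment and control means divided by a pooled standard deviation. Here is a minimal sketch under that assumption; the function name is mine, and the studies cited may use related variants (such as Hedges' g):

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardized mean difference between two lists of scores,
    using the pooled sample standard deviation (Cohen's d)."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    # Pooled SD weights each group's variance by its degrees of freedom.
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd
```

An effect size of 1.0 thus means the treatment group's mean sits one pooled standard deviation above the control group's mean.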

Because effect sizes can be averaged across studies, meta-analysis has become the tool of choice for evaluating the average magnitude of change across many research studies. Any one experiment may not represent the body of research in a field. Meta-analysis reduces the chance of unrepresentative conclusions by averaging the effect sizes of many experiments and reporting an overall average effect size.

Here we: 1) Show what effect sizes and meta-analyses tell us; 2) Analyze results of meta-analyses of cooperative learning on achievement; 3) Present results of effect size studies of Kagan Structures; and 4) Survey the many ways cooperative learning and Kagan Structures boost achievement. Finally, we point to some directions for future research.

What do Effect Sizes and Meta-Analyses Tell Us?

Effect size helps us evaluate the size of the impact of an educational innovation, and it can be translated directly into percentile gains, a concept familiar to educators. For example, an effect size of .10 means a student in the middle of the distribution of scores in the control condition (scoring at the 50th percentile) would be at the 54th percentile had they been in the treatment condition. From an educator’s point of view, this is not a very important gain. In contrast, an effect size of .80 means a student in the control condition scoring at the 50th percentile would be at the 79th percentile had they been in the treatment condition, an extraordinary educational accomplishment: a 29 percentile gain! Compared to being ahead of only 49% of the class, the student would be ahead of 78% of the class. Clearly, effect size is an important yardstick for us as educators.
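The conversion from effect size to percentile assumes scores are normally distributed: the treated student's new percentile is the standard normal cumulative distribution function evaluated at the effect size. A minimal sketch of that convention in Python (the function name is my own, for illustration):

```python
from statistics import NormalDist

def percentile_after_treatment(effect_size):
    """Percentile of a student who started at the 50th percentile of
    the control distribution, assuming normally distributed scores
    (the usual convention behind effect-size-to-percentile tables)."""
    return round(NormalDist().cdf(effect_size) * 100)
```

This reproduces the article's examples: an effect size of .10 yields the 54th percentile, and an effect size of .80 yields the 79th.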

Meta-Analyses of Cooperative Learning on Achievement

A meta-analysis reports the average effect size across a group of studies. Researchers have conducted a number of meta-analyses of the effects of cooperative learning on student academic achievement. See Table: Effect Size of Cooperative Learning on Achievement.

Effect Size of Cooperative Learning on Achievement

| Comparison | Number of Effect Sizes | Effect Size | Percentile Gain |
|---|---|---|---|
| 1. Cooperative Learning vs. Traditional1 |  |  |  |
| 2. Cooperative Learning vs. Traditional2 |  |  |  |
| 3. Cooperative Learning vs. Traditional3 |  |  |  |
| 4. Cooperative Learning vs. Traditional4 |  |  |  |
| 5. Cooperative Learning vs. Individual Competition5 |  |  |  |
| 6. Cooperative Learning vs. Heterogeneous Classes6 |  |  |  |
| 7. Cooperative Learning vs. Individualistic Learning7 |  |  |  |
| 8. Cooperative Learning vs. Competitive Learning8 |  |  |  |
| 9. Cooperative Learning vs. Traditional9 |  |  |  |
| 10. Cooperative Learning vs. Traditional10 |  |  |  |


Overall the effect size of cooperative learning on academic achievement is very substantial. In every case, cooperative learning produces greater gains than comparison methods. The average effect size across the hundreds of effect sizes is .62 for an average percentile gain of 23. That is, on average a student scoring at the 50th percentile in a traditional classroom would be scoring at the 73rd percentile had they been taught via cooperative learning! Any teacher or administrator would be quite pleased to see their students jump 23 percentiles.

Listing the string of 10 meta-analyses is somewhat misleading. Across the meta-analyses there are over 3,000 effect sizes. The meta-analyses, however, are not all independent: some studies appear in more than one meta-analysis. Nevertheless, the overall positive impact of cooperative learning is overwhelming. The sheer number of studies and the consistency of positive effect sizes establish cooperative learning as one of the most well-researched and positive educational innovations of all time.

The 10 Meta-Analyses Underestimate the Power of Strong Cooperative Learning. The ten meta-analyses listed in the table underestimate the power of well-designed cooperative learning in two ways. First, by lumping weak and strong forms of cooperative learning together, the meta-analyses underestimate the potential of cooperative learning when well implemented. Second, by lumping strong and weak experimental designs together, the meta-analyses underestimate the positive results obtained when cooperative learning is tested properly.

Lumping Weak and Strong Forms of Cooperative Learning. By including very weak forms of cooperative learning, some of the meta-analyses underestimate the power of cooperative learning. For example, Meta-Analysis 9 included a wide range of cooperative, collaborative, and small-group methods used in colleges and universities to teach math, science, and engineering-related content. The “cooperative learning” in those studies varied tremendously: “Studies incorporated small-group work inside or outside of the classroom. Small-group work refers to cooperative or collaborative learning among two to ten students.”11 Those of us who work in the field of cooperative learning would hardly call groups as large as ten students working unsupervised outside of class true cooperative learning. It is no wonder that Meta-Analysis 9 had an effect size at the low end of the range. Similarly, Meta-Analysis 10, another study with a low average effect size, included only 20 effect sizes, eighteen of which came from studies conducted outside the United States. That meta-analysis was based on a wide range of “cooperative learning” methods and included some unusual measures of achievement, such as content knowledge of Italy’s economy.

Lumping Weak and Strong Experimental Designs. By including studies with very weak experimental designs, some of the meta-analyses underestimate the power of cooperative learning. High-quality meta-analyses include only studies whose experimental designs feature: 1) Random assignment of subjects to conditions; 2) Clear definitions of control groups; 3) Control for experimenter and teacher effects; and 4) Verification of treatment implementation. Categorizing meta-analyses on cooperative learning into high, medium, and low quality based on how many of those criteria are met reveals that higher quality studies show higher effect sizes: High Quality = .86; Medium Quality = .56; Low Quality = .49.12 By including low-quality experimental designs, some of the meta-analyses further underestimate the power of cooperative learning.
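Under the same normal-distribution assumption used throughout, the quality-tier effect sizes quoted above convert to percentile gains as follows (an illustrative sketch; the variable names are mine):

```python
from statistics import NormalDist

# Effect sizes by study quality, as quoted above.
tiers = {"High Quality": 0.86, "Medium Quality": 0.56, "Low Quality": 0.49}

# Percentile gain for a student who started at the 50th percentile,
# assuming normally distributed scores.
gains = {name: round(NormalDist().cdf(es) * 100) - 50
         for name, es in tiers.items()}
```

High quality works out to roughly a 31 percentile gain, medium to 21, and low to 19, so even the lowest tier represents a substantial effect.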

Explaining the Wide Variance in Effect Sizes. The effect sizes reported across the ten meta-analyses range from .41 to .78. That a single meta-analysis lumps together a very wide range of cooperative and collaborative learning methods, and includes both low-quality and high-quality experimental designs, probably explains the high variance across the meta-analyses. It should be noted, however, that even the meta-analyses with the lowest effect sizes show very substantial effects, a testimonial to the power of cooperative learning. For example, an effect size of .41 translates to a 16 percentile gain!

Meta-Analysis Masks Important Differences. This difference among definitions of cooperative learning across studies within a single meta-analysis raises very important questions regarding the usefulness of broad meta-analyses as decision-making tools for administrators. Even well-established cooperative learning methods differ tremendously. Lumping them all under the label of “cooperative learning” and then reporting an average effect size across these very different methodologies is analogous to finding the average weight of a piece of fruit in a basket that contains grapes, oranges, and grapefruits. Although we can compute the average weight of a fruit in that basket, it tells us nothing about grapes, oranges, or grapefruits. Meta-analysis paints with a very broad brush; to really understand cooperative learning we need to be far more analytic.

This problem of lumping very different methods under the title of “cooperative learning” is best understood by example. Let’s contrast two well-established cooperative learning methods: TGT and RallyCoach. Both methods are designed to foster mastery of content. Let’s imagine further that we test their effectiveness for students working to master a new math algorithm, say long division.

TGT. In Teams-Games-Tournaments, following direct instruction, students work together in teams of four to master the new skill. They are motivated to do well because each will go off to a tournament to attempt to bring back points for their team. Tournament groups consist of three students of similar ability level. The winner (the one who solves the most problems correctly) brings back 6 points for their team, the loser brings back 2 points, and the third student brings back 4 points. Individual points are summed to produce team points, which are posted, and winning teams are celebrated. The premise is that the points and the between-team competition motivate students to work together in their teams to master the content. Notice, however, that there is no instruction on how to work together in teams; it is assumed that motivated students will cooperate to make sure everyone masters the skill. The ability of peers to teach one another is assumed, as is the engagement of all students in the team practice.
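The TGT scoring rule described above (6 points for first place in a tournament triad, 4 for second, 2 for third, summed into a team total) can be sketched as follows; the function name and example placements are hypothetical illustrations:

```python
# Points earned by tournament finishing place, per the TGT description:
# winner = 6, middle = 4, loser = 2.
TOURNAMENT_POINTS = {1: 6, 2: 4, 3: 2}

def team_score(placements):
    """Sum tournament points for a team, given each teammate's
    finishing place (1 = winner, 3 = loser) in their own tournament."""
    return sum(TOURNAMENT_POINTS[place] for place in placements)
```

For example, a team of four whose members place 1st, 2nd, 2nd, and 3rd in their respective tournaments earns 6 + 4 + 4 + 2 = 16 team points.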

RallyCoach. In contrast to TGT, RallyCoach places a very heavy emphasis on how to work together, carefully structuring the student interaction and teaching students how to coach each other and give each other supportive feedback. Students work in pairs rather than in teams of four: one student solves the first problem while the other praises and, if necessary, coaches. By having students work in pairs, RallyCoach doubles the overt active participation (in the same amount of time, each student solves twice as many problems working in pairs as in teams of four). The structure ensures that every student gets feedback after every problem they solve. In contrast, in TGT there is no structure ensuring that each student solves problems and gets feedback. Because in RallyCoach one student is solving a problem while the other is coaching, all students are on task and actively engaged at all times; in TGT, some students can allow their minds to wander while others take over. In RallyCoach, students are taught how to offer supportive coaching and praise, whereas in TGT there is no instruction on how to support teammates. In RallyCoach, there are no points and no between-team competition. The premise is that students are motivated by success, by peer encouragement, and by frequent, immediate individual feedback.

I am emphasizing the differences between TGT and RallyCoach to point out that when we lump them together, we are lumping apples and oranges. The cooperative learning meta-analyses do not tell us which approach is better, let alone which component of each package causes the gains each produces. TGT and RallyCoach differ in many ways. If we do a meta-analysis that includes a basket of studies covering TGT, RallyCoach, and other cooperative learning methods, we can calculate an average effect size, but unless it includes a separate internal analysis reporting effect sizes for each methodology, it tells us nothing about which cooperative learning approach to use. It also tells us nothing about whether the reward structure emphasized in TGT or the task structure emphasized in RallyCoach is a more powerful predictor of achievement. A broad meta-analysis that lumps many different cooperative learning methodologies tells us it is beneficial to enter the cooperative learning ballpark, but it does not tell us how best to play the game.

Effect Size of Kagan Structures

Rather than lumping all forms of cooperative learning into one meta-analysis and calculating the average effect size across studies, we can be more discriminating: we can look at the average effect size across studies that use the same approach to cooperative learning. The work of a research team at the State University of New York at Fredonia (SUNY-Fredonia) allows us to examine the average effect size across experiments using Kagan Structures.

The SUNY-Fredonia research team published a series of four tightly controlled, independent, peer-reviewed research studies on Kagan Structures.13 The experiments examined the effectiveness of Kagan Structures at different grade-levels (3rd through 8th); with different content (science, language arts, social studies); with different student populations (high achieving, low achieving, students with disabilities); and with different instructional strategies (Numbered Heads Together; Numbered Heads Together + I; and Show Me, a structure using Response Cards).

The studies tested the effects of two Kagan Structures and a variation on one:
Numbered Heads Together. Numbered Heads Together (NHT) has been described in detail. The form of NHT tested has students work in teams of four, each student with a number, from one to four. The teacher asks a question, students work as a team to formulate their best answer (heads together); the teacher calls a number, and then calls on one of the students with that number to give the answer.
Numbered Heads Together + I. Numbered Heads Together + I (NHT+I) adds an incentive package to the basic NHT methodology. That is, teams were awarded points for providing and agreeing with correct responses and earned “Super Team,” “Great Team,” or “Good Team” certificates if their quiz scores averaged above predetermined percentages. Teammates signed the certificates, which were posted publicly within the classroom.
Response Cards or Show Me. Response cards (RC) take several forms; most often, student dry-erase boards are used. Show Me (SM) is a five-step Kagan Structure: 1) Teacher asks a question; 2) Teacher provides think time; 3) Teacher signals for students to write (or, in the case of pre-made response cards, to select a response); 4) Teacher calls “Show Me”; and 5) Students simultaneously display their responses using a response board or display cards.14 Show Me is focused on increasing active engagement; it does not involve cooperative learning, as students work alone. The use of response cards does result in increased active participation and student achievement.15

All four studies found higher achievement using NHT, NHT+I, and SM with RCs compared to traditional Whole Class Question & Answer (WCQ&A). The results of the four studies are reported in the table, Effect of Kagan Structures on Achievement.

Effect of Kagan Structures on Achievement

| Comparison | Effect Size | Percentile Gain |
|---|---|---|
| 1. Numbered Heads vs. Whole Class Question & Answer16 |  |  |
| 2. Numbered Heads + I vs. Whole Class Question & Answer17 |  |  |
| 3. Numbered Heads vs. Whole Class Question & Answer18 |  |  |
| 4. Numbered Heads + I vs. Whole Class Question & Answer19 |  |  |
| 5. Response Cards vs. Whole Class Question & Answer20 |  |  |
| 6. Numbered Heads vs. Whole Class Question & Answer21 |  |  |
| 7. Numbered Heads vs. Whole Class Question & Answer22 |  |  |


Across the four SUNY studies, the average positive effect size for Kagan Structures was .92, an average gain from the 50th to the 82nd percentile. This effect size is consistent with the average found in high-quality cooperative learning studies, and it is higher than the effect sizes in meta-analyses that include a basket of cooperative learning methods. This is logical: the average effect size of more effective methods will be higher than an average that mixes more and less effective methods, because weaker methods bring down the average when all effect sizes are bundled together.

The higher scores for students instructed via Kagan Structures are reflected in the percent of students demonstrating mastery of the content. For example, one study23 examined test performance of students using WCQ&A vs. NHT. Tests covered understanding of physical, chemical, and biological properties of substances and organisms. The content was new to students; the pretest class average was 18.6%. With WCQ&A, 22% of the students scored 90% or above on ten-item daily quizzes. When NHT was used, the percent of students scoring 90% or above (40%) almost doubled!

Additional support for the power of Kagan Structures comes from an effect size calculated for time on task.24 Students using the Kagan Structures were off task substantially less than those instructed with the traditional WCQ&A (effect size = .75).

Pupils overwhelmingly preferred NHT and NHT+I to WCQ&A. In a class that experienced all three structures, students were given a hypothetical dollar to allocate according to their satisfaction with each instructional strategy. Not every student spent their full dollar, but as a class students spent a total of $.79 on WCQ&A, $5.89 on NHT, and $12.82 on NHT+I. Ninety-one percent of the students preferred NHT or NHT+I to WCQ&A.