ANOVA - Correlated

for

1 Independent Variable



Copyright 2000, Tom Malloy. All rights reserved.


Let's start with the Experimental Situation which is appropriate for this ANOVA.

Experimental Situation

The research contexts where you would want to use a one-way analysis of variance for correlated measures are the same as those contexts where you might use a t-test for correlated means. We have already discussed and worked with the t for correlated means so this should be a straightforward generalization.

One of the advantages of ANOVA for correlated means over a t-test for correlated means is that a researcher can have more than two experimental conditions. The t-test for correlated means can only analyze two conditions.

One-to-one. As we discussed with the t for correlated means lecture, there are several research procedures which would produce correlations or one-to-one correspondences between the scores in different conditions.

Repeated Measures. One procedure that gives us a one-to-one correspondence between numbers in different columns of the data matrix is to have repeated measures on each research participant so that there's just one group of participants and you measure them at several points in time. Within reason you can measure them however many times that you want, so that every participant is measured multiple times. This sort of design is called a repeated measures design.

Matched Groups. Another way to get a one-to-one correspondence or correlation between columns of scores is to have matched groups. For instance, in the experiment we talked about in the t correlated lecture, researchers were interested in teaching reading to first graders in Salt Lake City. Suppose there are three new reading programs developed in three different regions of the U.S. These three reading programs are based on differing regional beliefs about how best to teach reading. The researchers run a four-group study. Group 1 is a control group based on the current first grade curriculum that's already established in Salt Lake City schools. Group 2 is a special West Coast enriched reading program. Group 3 is an East Coast enriched reading program. The fourth and final group is a Midwest training program. So the researchers have four different treatments that they want to evaluate.

Next the researchers do an experiment. The four programs are going to be taught to different groups of children so that the research team can determine which would be the best program to implement next year in their local school district. One way of evaluating the results would be to use a one-way ANOVA for independent groups, in which case they would just randomly assign kids to the different groups. There is a problem with this strategy, however. Reading scores in the first grade tend to be extremely variable: some kids come in as quite accomplished readers and other kids haven't even started learning to read. There's a vast difference between children in reading achievement going into the first grade. A way to deal with the problem of a lot of initial variability among the participants is to match the participants in every group on their initial reading ability.

The researchers could give some kind of evaluation of reading comprehension prior to the beginning of the study. Based on these reading comprehension scores, they would list everyone who is going to be in the study from the best reader down to the least accomplished reader. Next, since the design calls for four groups, the researchers would take the students with the top four scores and randomly assign each of them to one of the four groups. So the top four readers would be randomly divided among the four treatment groups.

Then the researchers would take the next four scores and randomly assign the students with those scores to the four groups. And so on... This kind of procedure helps to ensure that these four groups are carefully matched with each other. Each group has one of the top four readers, each group has one of the next four readers, and so on down to the least accomplished readers. In short, these groups are carefully matched with one another.
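The matching procedure just described can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual procedure: the function name `matched_assignment`, the child IDs, and the pretest scores are all invented for the example.

```python
import random

def matched_assignment(pretest_scores, n_groups=4, seed=None):
    """Assign participants to groups in matched blocks.

    pretest_scores: dict mapping participant id -> pretest score.
    Returns a dict mapping group index (0..n_groups-1) -> list of ids.
    """
    rng = random.Random(seed)
    # Rank everyone from the best reader down to the least accomplished.
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    groups = {g: [] for g in range(n_groups)}
    # Walk down the ranking one block of n_groups participants at a time.
    for start in range(0, len(ranked), n_groups):
        block = ranked[start:start + n_groups]
        rng.shuffle(block)  # random assignment within the block
        for g, participant in enumerate(block):
            groups[g].append(participant)
    return groups

# Hypothetical pretest scores for 12 children (invented for illustration).
scores = {f"child{i:02d}": s for i, s in enumerate(
    [95, 91, 88, 84, 70, 66, 63, 60, 35, 30, 22, 15])}
groups = matched_assignment(scores, n_groups=4, seed=1)
for g, members in groups.items():
    print(g, members)
```

Each group ends up with exactly one child from every block of four, which is what makes the groups matched.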

Matching is a common research procedure. It creates a correlation between the scores in the various groups.

We have now generalized the example in the t-correlated lecture to four groups from two groups.

Your IV is Type of Program and its levels are Old, West Coast, East Coast, and Midwest. Your DV is reading performance on a standardized reading test. The four programs are going to be taught to different groups of children so that the research team can determine which would be the best program to implement in their local school district.

CONTRAST: Independent Groups? Let's contrast what we just did with an independent groups design. One way of conducting the research would have been to assign volunteer children to the 4 different groups randomly without matching them. Then the researchers would evaluate the results by using a one-way ANOVA for independent groups. That would be a fine way to do the study. But there is a problem with that strategy. Reading scores in the first grade tend to be extremely variable: some kids come in as quite accomplished readers and other kids haven't even started learning to read. A common research strategy for addressing this problem of high variability in the DV is to match the participants in the groups.

 


Therapy Study

Here is another example of a study with correlated scores. The independent variable is the Time Course of a new kind of psychotherapy called "Therapia Nueva" or simply TN. The volunteers receive this new kind of psychotherapy (TN) and are measured for mental health at various different times.

IV and DV. The IV is the Time Course of the effects of Therapia Nueva. The researchers are interested in how the effects of psychotherapy hold up over time. Does the effect of psychotherapy dissipate quickly or does it last a long time? The participants are given a particular kind of psychotherapy. Their Mental Health (DV) is measured at various time intervals. We are going to look at how mental health changes with the passage of time.

In this example, the researchers are not interested in comparing the therapy with any other kind of treatment. Rather, they want to know how the effects of the therapy on mental health change over time. To keep the example simple, we'll assume that there are only n = 5 participants. (Normally you would want a lot more participants).

Every participant is measured four times. Prior to therapy each is given a mental health pre-test. You can see the pre-test scores of the 5 participants, varying from 30 down to 5, on the first column in the graphic.

The big red bar (right after psychotherapy) on the graphic indicates when in time the treatment (psychotherapy) was given.

Immediately after psychotherapy the researchers post-test every participant. Notice that their mental health scores go up from pre-test to post-test. Now their scores vary between 90 and 50.

The researchers also want to know how permanent the effect of psychotherapy is. Is this improvement in mental health just something that happens for a week or so, or does it last longer than that? So the research team will do a follow-up measurement on all of the participants at six months. Notice from the data that the participants appear to have lost a little bit of the therapeutic effect, but their mental health is still better than it was at pretest.

The research team does the final follow-up measurement at two years. At two years (just eyeballing the data), it appears that participants have perhaps lost a little bit more of the therapeutic effect, but they still have fairly substantial gains over the pretest.

 

Correlated Scores

This psychotherapy example is one in which the researchers made repeated measurements on each participant. Every participant is measured four different times: at pretest, at post-test, at six months, and then finally at two years. So each participant has four scores on the dependent variable.

Because people are consistent across time, we would expect a correlation between pre-test and post-test scores. Notice that the people who have the highest scores at pre-test tend to have the highest scores at post-test, and vice versa. To generalize, we would expect to find a correlation between scores in any two of the four columns (pre-test, post-test, 6 months, and 2 years). The way we have conducted our research has created correlations among the columns of our DV measures. Thus, we have to run a data analysis that takes this correlation into account. We need an ANOVA for correlated DV measures.
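To see what a correlation between columns looks like in numbers, here is a small sketch that computes the Pearson correlation between a pre-test column and a post-test column. The scores below are invented for illustration. Only their ranges (30 down to 5 at pre-test, 90 down to 50 at post-test) come from the lecture; the individual values are assumptions, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented scores for n = 5 participants (same participant order in both lists).
pre_test = [30, 25, 15, 10, 5]
post_test = [90, 80, 70, 60, 50]

r = pearson_r(pre_test, post_test)
print(round(r, 3))
```

With these made-up numbers the participant who scores highest at pre-test also scores highest at post-test, so the correlation comes out strongly positive.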


A Rose by any other Name

Sometimes repeated measurements designs are called within-subjects designs because the levels of the independent variable occur within the participants (subjects). The time course of psychotherapy is something that occurs within the life-experience of every participant. We measure the effects of the time course of psychotherapy within each participant.

The terms within-subjects and repeated measures are synonyms; which one you see changes from stat book to stat book. Different people tend to use different names for this sort of research design.

In fact, there are a lot of different names for this kind of analysis of variance: repeated measures ANOVA, within-subjects ANOVA. Some people call it treatment by subjects or t by s ANOVA, and other people call it dependent groups ANOVA. You'll see all of those terms in the research literature. I mention all these synonyms so that if you need to use this research design or talk about it in some kind of conversation, you will know that you can generalize from the terminology in this lecture.

 

NOTE: In StatCenter's Virtual Lab, there is a switch on the dependent variable tool. In the bottom left corner of the tool's window it says either independent means or correlated means. So read the story problems carefully; some of them require you to use a design with correlated measurements and other story problems require you to use a design with independent groups.

You may have to flip the correlated versus independent means switch depending on how the Virtual Lab story problem asks you to design your study. The switch defaults to independent. So if you want correlated groups like the ones we are discussing here, then you must flip that switch and indicate that your groups of scores are correlated.


Statistical Hypotheses

In any study there will be more than one, usually several, groups. The index for groups is j. In our reading program example there are 4 groups so j can vary from 1 to 4.

The statistical hypotheses are the same as the ones we used for ANOVA independent. If we are skeptical that the IV will have an effect on the DV, we write the null hypothesis. H0 assumes that there is no treatment effect for any level of the independent variable. Alpha-j = 0 for every value of j.

In contrast, if we're the scientist and writing the alternative hypothesis, we expect that there are some treatment effects somewhere in this study. Alpha-j is not equal to 0, for at least some values of j.
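In symbols, the two hypotheses for the four-group reading example can be written as follows. This is only a restatement of the two paragraphs above in standard notation; the range of j from 1 to 4 is taken from the reading program example.

```latex
H_0:\ \alpha_j = 0 \quad \text{for every } j = 1, \dots, 4
\qquad\qquad
H_1:\ \alpha_j \neq 0 \quad \text{for at least one } j
```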

So, returning to the matched groups example of teaching reading to first graders, with which we started this lecture, the skeptic says, "None of these reading programs work. Whatever program is currently in your curriculum in Salt Lake City is going to be just the same as the West Coast program, and that'll be the same as the East Coast program, and that'll be the same as the Midwest program. There won't be any treatment effects." On the other hand, the people who make up these programs will say "Oh no, mine is really good. Mine will have an effect." They all would believe that somewhere in the four groups, one of these programs, and maybe all of them, will have some effect.

Statistical Conclusion Validity

Let's discuss statistical conclusion validity. In other words, let's look at the plausibility of chance as a way of explaining the data. If the data pattern fit with the scientific hypothesis, is it plausible to argue that the data pattern occurred by chance alone?

 

No Calculations. You are not going to calculate an ANOVA for correlated means in this course. So we won't present formulas for you to learn nor to use. You can use StatTool to calculate the Mean Squares and the F's.

The focus in this lecture is twofold. The first focus we've already completed: Learning what kind of scientific context is appropriate for this ANOVA. The second focus we will now take up: How to understand the results of an ANOVA for correlated groups for purposes of statistical conclusion validity. You need to be able to look at the output of a computer program like StatTool on StatCenter or SPSS, or any other statistical package, and know what the data analysis means.

Let's look at the summary table and its degrees of freedom. Rather than calculate the F for correlated means yourself, what you do need is to be able to read the output of a statistical program. The outputs of such programs generally are organized around the ANOVA summary table.


ANOVA Summary Table

I am going to present two different ways of setting up a summary table for this kind of ANOVA design.

Source of Variance. The first column is called "Source of Variance" and it divides up the Total Variance into subjects, treatment, and treatment by subjects. Often, the treatment by subjects variance is also called error or residual. The synonyms for these terms aren't my fault. I'm not making them up to torture you. They just happen to be around, and so I want you to be able to generalize from whatever we say here to a different conversation.

 

Degrees of Freedom

Let's look at the degrees of freedom. This is the second column in the summary table. In the time course of psychotherapy example there were five participants and there were four measurements on each participant. So in that example j is four and n is 5.

The total degrees of freedom is j times n, minus 1, written as (jn - 1), where j is the number of measurements on each participant and n is the number of participants.

Four times 5, minus 1, is 19 (total degrees of freedom).

STUDENT QUESTION: "Can you increase the degrees of freedom by having more measures on your participants?" The answer is yes. You are really starting to think like a researcher.

The degrees of freedom for subjects is n minus 1 or (n-1).

And the degrees of freedom for treatments is j minus 1 or (j-1). The error term, or treatment by subjects term, is j minus 1 times n minus 1 or [(j-1)(n-1)].
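As a quick check on the arithmetic, the degrees of freedom for the psychotherapy example (j = 4 measurements, n = 5 participants) can be worked out with a short sketch like the one below. The function name `correlated_anova_df` is made up for this illustration. Notice that the subjects, treatments, and error degrees of freedom add up to the total.

```python
def correlated_anova_df(j, n):
    """Degrees of freedom for a one-way ANOVA for correlated measures.

    j: number of measurements (treatment levels) per participant
    n: number of participants
    """
    return {
        "total": j * n - 1,
        "subjects": n - 1,
        "treatments": j - 1,
        "error": (j - 1) * (n - 1),  # also called treatment by subjects
    }

df = correlated_anova_df(j=4, n=5)
print(df)
# The three component sources partition the total.
print(df["subjects"] + df["treatments"] + df["error"] == df["total"])
```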

Notice on the graphic, I've highlighted in brighter yellow the degrees of freedom that you actually need to look up the critical value of F. In looking up the critical value in the F table, you use treatment degrees of freedom (j minus 1) across the top of the table. You will use the error degrees of freedom to move down the table. "Error" is also called "residual" or "T by S."

In terms of our experience with a One-way ANOVA for independent groups, the Treatment term here is equivalent to the between-groups term there, and the Error term here is equivalent to the within-groups term there.

Sum of Squares column. The next column across the top of the ANOVA summary table is the sum of squares. In that column of a computer printout you will find the values of the SS calculated for this ANOVA.

Mean Squares column. The next column of a computer printout will report the mean squares.

F. The next (and sometimes final) column in a printout will be the calculated value of the F ratio.

p value. Many computer programs calculate a significance level and place it in the final column. StatCenter's StatTool does not calculate a significance level; you have to look it up in the tables using the degrees of freedom.
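You are not asked to do these calculations in this course, but if you are curious how a program might fill in the columns of the summary table, here is a rough sketch. It is not StatTool's actual code. The function name `correlated_anova` is hypothetical, and the scores are invented for illustration (only the pre-test and post-test ranges echo the psychotherapy example).

```python
def correlated_anova(data):
    """Build a one-way correlated-measures ANOVA summary table.

    data: list of rows, one row per participant, one score per condition.
    Returns a dict keyed by source of variance.
    """
    n = len(data)       # number of participants
    j = len(data[0])    # number of measurements per participant
    grand_total = sum(sum(row) for row in data)
    correction = grand_total ** 2 / (n * j)

    ss_total = sum(x * x for row in data for x in row) - correction
    ss_subjects = sum(sum(row) ** 2 for row in data) / j - correction
    ss_treatment = sum(
        sum(row[k] for row in data) ** 2 for k in range(j)
    ) / n - correction
    # Whatever is left over is treatment by subjects (error, residual).
    ss_error = ss_total - ss_subjects - ss_treatment

    df_treatment, df_error = j - 1, (j - 1) * (n - 1)
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "subjects": {"df": n - 1, "SS": ss_subjects},
        "treatment": {"df": df_treatment, "SS": ss_treatment,
                      "MS": ms_treatment, "F": ms_treatment / ms_error},
        "error": {"df": df_error, "SS": ss_error, "MS": ms_error},
        "total": {"df": n * j - 1, "SS": ss_total},
    }

# Invented scores: rows are participants; columns are
# pre-test, post-test, 6 months, 2 years.
scores = [
    [30, 90, 85, 80],
    [25, 80, 75, 70],
    [15, 70, 60, 55],
    [10, 60, 55, 50],
    [5, 50, 45, 40],
]
table = correlated_anova(scores)
for source, row in table.items():
    print(source, row)
```

The F ratio is the treatment mean square divided by the error mean square, which is why those two rows carry the degrees of freedom you use in the F table.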

There are different conventions for writing out the summary table for ANOVA correlated. Depending on what computer program you're using, the output may look a little different. Now I'm going to show you another summary table that's a bit different from the previous one.

Another Version of the ANOVA Summary Table

Here the total variance is broken into two major categories - between subjects and within subjects.

The between subjects category has just one subcategory, called subjects.

The within category has two subcategories - treatment and treatment by subjects, or error. This version is really exactly the same as the first one, except that this one analyzes the variance into intermediate categories (Between and Within) and then analyzes each of those intermediate categories so that we end up with the same sources of variance and degrees of freedom as the previous summary table had.


Degrees of Freedom

Notice that the degrees of freedom in this version of the ANOVA summary table are the same as in the previous summary table. Only the form of the Summary Table is different.

 
When you use the StatTool in StatCenter, you'll see the first of these two versions of the Summary Table as your printout. Of course, StatTool will also print out the actual sum of squares, the mean square, and the F value. StatTool doesn't give you a p value (probability value). You have to look that up yourself.

Rejection Regions

The rejection region logic is the same as before when we went over ANOVA for independent groups. There's nothing really new here.

Shown on the graphic is a representation of the sampling distribution of F. The expected value of F if H0 is true is somewhere in the neighborhood of one.

As usual, you need to look up the F critical based on two degrees of freedom: the degrees of freedom for treatments and the degrees of freedom for treatment by subjects. Next you examine the calculated F to see whether the calculated F value falls into the rejection region or not.
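The decision rule itself is simple enough to write down as a sketch. The function name `reject_h0` is invented here, and the calculated F values in the usage lines are made up. The critical value 3.49 is what a standard F table gives for alpha = .05 with 3 treatment and 12 error degrees of freedom, which are the degrees of freedom from the psychotherapy example.

```python
def reject_h0(f_calculated, f_critical):
    """Return True when the calculated F falls in the rejection region."""
    return f_calculated >= f_critical

# Critical F from a standard table: alpha = .05, df = 3 (treatments), 12 (error).
F_CRITICAL = 3.49

# Two made-up calculated F values, one on each side of the critical value.
print(reject_h0(5.20, F_CRITICAL))  # falls in the rejection region
print(reject_h0(1.10, F_CRITICAL))  # does not fall in the rejection region
```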

 


Sampling Distribution of F

By now you should be familiar with the basic steps of this process of statistical hypothesis testing. The first step is to assume the population we are studying is a normal population. Next, you take a random sample with some number of measurements on the dependent variable. Next you use the statistical formulas to arrive at a calculated F value. Then you determine where the calculated F value falls on the sampling distribution of F in relation to the critical value of F. Finally you decide whether or not to reject H0.
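If you would like to see a sampling distribution of F take shape, the steps above can be imitated with a small simulation. This is only a sketch under stated assumptions: H0 is true (no treatment effects), scores are drawn from normal populations, and the particular mean and standard deviations (50, 15, and 5) are invented. The names `f_ratio` and `simulate_null` are hypothetical, and 3.49 is the tabled critical F for alpha = .05 with 3 and 12 degrees of freedom.

```python
import random

def f_ratio(data):
    """Calculated F for a one-way correlated-measures ANOVA.

    data: list of rows, one row per participant, one score per condition.
    """
    n, j = len(data), len(data[0])
    grand_total = sum(sum(row) for row in data)
    correction = grand_total ** 2 / (n * j)
    ss_total = sum(x * x for row in data for x in row) - correction
    ss_subjects = sum(sum(row) ** 2 for row in data) / j - correction
    ss_treatment = sum(
        sum(row[k] for row in data) ** 2 for k in range(j)
    ) / n - correction
    ss_error = ss_total - ss_subjects - ss_treatment
    return (ss_treatment / (j - 1)) / (ss_error / ((j - 1) * (n - 1)))

def simulate_null(n=5, j=4, reps=2000, seed=0):
    """Draw many samples with no treatment effect and collect their F values."""
    rng = random.Random(seed)
    f_values = []
    for _ in range(reps):
        data = []
        for _ in range(n):
            person = rng.gauss(50, 15)  # stable individual difference
            # Same population mean at every measurement: H0 is true.
            data.append([person + rng.gauss(0, 5) for _ in range(j)])
        f_values.append(f_ratio(data))
    return f_values

f_values = simulate_null()
mean_f = sum(f_values) / len(f_values)
share_in_rejection_region = sum(f >= 3.49 for f in f_values) / len(f_values)
print(mean_f, share_in_rejection_region)
```

When H0 is true, the average calculated F should land in the neighborhood of one, and only about 5% of the calculated F values should fall at or beyond the critical value.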
