©Copyright 1997, 2000 Tom Malloy

 


 

Press the "Context for use" button


Use the "Next," "Back," and "Continue" buttons to navigate through the material presented on the screens.


Definition of Context. What we want to do now is define just when you would use the "t test for correlated means" ("dependent means t") in contrast to the "t for independent means," which we’ve already learned.

As we have learned, the t for independent means is used when there is no one-to-one correspondence between scores in one group and scores in the other group. In contrast, the t for correlated means is used when the two groups of scores have been constructed in a way that naturally leads a particular score in one group to "go with" a particular score in the other group.

Let’s look at some common ways to establish this correspondence between scores in two groups.


One way to get a one-to-one correspondence between scores in two groups is to make "Repeated Measures" of the Dependent Variable on the same subjects. For example, say a scientist wants to evaluate the effectiveness of a Diet. The scientific hypothesis is that after following the Diet (the IV), a person’s weight (the DV) will be less than before the Diet. So we weigh study participants before the Diet and then again after the Diet.

What we have done is measure each subject’s weight twice (before and after the diet). This is what we mean by making repeated measures on the same subjects. Notice that the weight of a subject before the diet "goes with" the weight of that same subject after the diet. There is a natural correspondence between scores (weights) in the before and after columns.


Whenever there are repeated measurements of the DV on the same subjects (e.g., before and after the IV) the two sets of measurements are correlated and it is necessary to use the t test for correlated means.


Another way to create one-to-one correspondences between scores in two groups is to carefully match each subject in one group to a subject in the second group.

For example, suppose a group of scientists have developed an educational program for teaching reading. They hypothesize that second graders taught by this New Program will read better than second graders taught by the Old Program. The IV is Type of Program (Old vs New). The DV is standardized reading scores.


First, measure all the students on reading ability. Then put them into pairs: the two top readers form one pair, the next two form another pair, and so on.


Because second graders’ reading scores vary so much from child to child, the scientists want to reduce this variance. One way to do this is to take the top two readers in the whole sample and randomly assign one of them to one group and the other to the other group. Then take the next two best readers and randomly assign them to the two groups, and so on. When you have done this, the two groups will be very similar.
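The pairing procedure just described is easy to sketch in code. The following is a minimal Python illustration, not the researchers’ actual method: the function name `matched_pair_assignment` and the pretest scores are hypothetical, and the sketch assumes an even number of students with no tied scores needing special handling.

```python
import random

def matched_pair_assignment(pretest_scores, seed=None):
    """Rank students by pretest score, pair adjacent ranks, and randomly
    split each pair between the Old and New programs (hypothetical helper).

    pretest_scores: dict mapping a student name to a pretest reading score.
    Returns (old_group, new_group, pairs).
    """
    rng = random.Random(seed)
    # Order students from best to worst pretest reader.
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    if len(ranked) % 2:
        raise ValueError("need an even number of students to form pairs")
    old_group, new_group, pairs = [], [], []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)                  # random choice of who gets which program
        old_group.append(pair[0])
        new_group.append(pair[1])
        pairs.append((pair[0], pair[1]))   # (Old-program member, New-program member)
    return old_group, new_group, pairs

# Hypothetical pretest scores, for illustration only.
scores = {"Ann": 92, "Ben": 88, "Cal": 75, "Dee": 71, "Eve": 60, "Fay": 58}
old, new, pairs = matched_pair_assignment(scores, seed=1)
for o, n in pairs:
    print(o, "(Old) <->", n, "(New)")
```

Each printed line is one matched pair; the later t test would compare the two members of each pair rather than treating the groups as unrelated.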

The screen shows the subjects moving from the ordered list, each randomly assigned to one group or the other, along with their reading scores in the study.


Each student in one group will be carefully matched to a corresponding student in the other group.


Repeated measures and matched groups are only two of many research procedures that generate one-to-one correspondences between groups. Any time there is a natural relationship between subjects in the two groups (such as husbands and wives, or twins in twin studies), you will need to use the t for correlated means.


Since there is no formula or rule that tells you when to use the dependent t, you will have to stay alert for research designs that create correlations between groups.


Now on the Navigation Panel press (twice) the "up to menu" arrow button and go back up to the "CORRELATED t MENU."


Press the "EXAMPLE: 1-TAILED" button

Use the "Next," "Back," and "Continue" buttons to navigate through the material presented on the screens.


In the example, we have a salesperson from the Jazzy Ergonomic Keyboard company who approaches a buyer at Consolidated Markets Corporation (CMC). The Jer-Key salesperson says, "I’ve got a super, new kind of keyboard, designed to take advantage of all the latest ergonomic discoveries. I’d like you to try out a few free sample keyboards with some of your staff. You’ll see that the new keyboard is going to increase productivity. People will be able to type faster on this than they did on your old keyboards."


To evaluate the Jer-Key salesperson’s claim, we’re going to do a little experiment. The independent variable will be the kind of keyboard, either the old keyboard or the new keyboard. The dependent variable will be people’s typing speed, measured in words per minute.

Research Design: Because typing speed varies so drastically among people, we decide to use each participant in the study as his or her own control. The participants will type on one of the keyboards for some amount of time while we measure their typing speed; then they type on the other keyboard for the same amount of time while we measure their typing speed again. Which keyboard a person types on first and which second will be determined by the flip of a coin (randomly). Because we measure each person’s typing speed twice (once on each keyboard), this is a repeated measures design; that is why we’re going to need a t for correlated means (dependent t).

The actual dependent variable which we will use is going to be the difference between a person’s typing speed on the old keyboard and the new keyboard. We are going to take people’s words per minute on the old keyboard and subtract their words per minute on the new keyboard.


Let’s make some scientific hypotheses. The Jer-Key salesperson promotes the scientific hypothesis that the new keyboard will cause higher typing speeds. That’s a directional scientific hypothesis. So the salesperson is expecting negative difference scores because we are subtracting new typing speeds (which should be higher) from old typing speeds (which should be lower).

We can imagine a different scientific hypothesis, one made by the CMC accountant who will decide whether or not to buy these keyboards. Perhaps the accountant might say, "I don’t know, sometimes when you change technologies bad things happen, other times good things happen. If we change technologies surely there’s going to be an effect, but I think it could either improve productivity or make it worse." So the CMC accountant has a different point of view on buying these keyboards: the new keyboard might cause either an increase or a decrease in typing speed. That’s a non-directional scientific hypothesis. The differences between typing speeds (Old minus New) might be negative or positive.

Finally, of course, the skeptic would say that the Independent Variable (IV) has no effect on the Dependent Variable (DV); that is, there’s no difference in typing speed between the two keyboards. If you do get a difference, the skeptic is going to propose the PCH of chance: any difference found between the typing speeds on the two keyboards is due solely to chance. The differences between typing speeds (Old minus New) will vary from person to person but will generally be around zero.


Let’s translate the three scientific hypotheses into statistical hypotheses. Starting with the skeptical hypothesis (no effect of the IV, that is, these new keyboards don’t affect typing speed one way or the other), we generate the Null Hypothesis, H0. H0 says that we "expect" the average difference in typing speeds (Old minus New) to be zero.

The salesperson, of course, expects that the mean difference between typing speeds is going to be less than zero (negative) because we are subtracting a larger number (new typing speed) from a smaller number (old typing speed). So the salesperson’s H1 is that we expect the average difference in typing speeds to be less than zero.

The accountant, on the other hand, has a non-directional scientific hypothesis. So the accountant’s H1 states that the mean difference score is expected not to equal zero. The difference may be positive or negative, but it is not going to be equal to zero.

Since we are developing a one-tailed example, we will choose the salesperson’s scientific and statistical hypotheses to evaluate in this example.


Student Question: Why did you subtract the New typing scores from the Old? Wouldn’t it be less complicated to subtract the other way so we don’t get negative numbers?

She’s right. There’s no compelling reason why I subtracted one way rather than the other. Normally the convention (as she proposed) is to subtract in the direction that makes your scientific hypothesis generate positive numbers. Following that convention, the salesperson would subtract Old typing scores (lower numbers under the hypothesis) from New typing scores so as to get positive differences. Notice that if we did that, the salesperson’s alternative statistical hypothesis (H1) would change from what is on the screen: it would become H1: E(MD) > 0. The "less than" sign would change to a "greater than" sign.

You are free to subtract in whichever direction you want. But once you choose, you have to make sure that your statement of H1 follows logically from your choice. H1 changes depending on how you subtract.

I could have subtracted the other way; it’s just the choice I made. I chose to subtract New from Old (and so generate negative differences if the salesperson’s hypothesis is correct) precisely to bring up this issue and to work an example with negative numbers, so you would know what to do in that case.

The main thing to remember is that you have a free choice of the direction in which you subtract one measure of the DV from the other. And once you make that choice, you have to keep track of the logic, stating alternative hypotheses and setting up rejection regions (which we’ll do in a minute) in ways that are consistent with how you subtracted.


This next screen gets a bit ahead of things, because we haven’t yet looked at the formula for t. We’ll see the full t formula shortly. Suffice it to say for now that the t formula divides the average difference score (MD) by a term built from the standard deviation and a square root. To make the point we are making right now, we don’t need to complicate things with that term.

The point is that H0 expects the mean difference score to be 0, because H0 expects there to be no difference between the typing scores. If MD DOES happen to come out to be exactly zero, then t would be zero. This is because the top of the t formula would be zero (if MD = 0), and zero divided by any positive number is zero. The important conceptual point is that H0 expects t to be equal to 0. We can express this as

E(t given H0) = 0.
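One way to see that H0 expects t to be zero is to simulate it. The sketch below, under stated assumptions, repeatedly draws ten difference scores from a population whose mean difference really is zero (normal with SD 5, an arbitrary choice), computes t with the formula given later in this example, and averages the results. The helper name `correlated_t` and the simulation settings are hypothetical.

```python
import math
import random
import statistics

def correlated_t(d):
    """t = MD / (SD / sqrt(N - 1)), where SD = sqrt(sum(d^2)/N - MD^2)."""
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum(x * x for x in d) / n - md ** 2)
    return md / (sd / math.sqrt(n - 1))

rng = random.Random(2000)   # fixed seed so the run is repeatable
# Assumption: under H0 each person's Old-minus-New difference is drawn from a
# normal population with mean 0 and SD 5 (the SD value is arbitrary).
ts = [correlated_t([rng.gauss(0, 5) for _ in range(10)]) for _ in range(20000)]

print(round(statistics.mean(ts), 3))   # close to 0
```

Individual t values scatter widely on both sides of zero, but their average settles near 0, which is what E(t given H0) = 0 means.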

The fact that H0 expects t to be zero will be important to understanding the logic of statistical conclusion validity later.


Next we are going to complete this example using the salesperson’s scientific hypothesis. We have a directional scientific hypothesis leading to a one-tailed alternative. "Directional" and "One-Tailed" are essentially synonymous, except that one term belongs to the realm of science and the other to statistics.

This screen summarizes the research design and shows the layout of the data. The first column is the subject #. Next to each subject # we see the typing speed on the Old keyboard and then on the New keyboard. As you press "Continue" the New scores are subtracted from the Old to give you a difference score (in blue). So for the first subject, typing speed on the Old keyboard is 55 words per minute, and typing speed on the New keyboard is 61 words per minute. When we subtract those, we get minus 6 as the difference score. (Note: We would get plus 6 if we subtracted the other way; that’s just the free choice.) Subject #2 has a difference score of minus 9 (47 minus 56). As you keep pressing "Continue" the difference score column will fill out.

Keep pressing "Continue" and the final column (d², the squared difference scores) will fill itself out. We will need the squared difference scores to do our calculations.
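For the first two subjects on the screen, the difference and squared-difference columns can be reproduced as follows. The other eight subjects appear only on screen, so they are not included here. Note that `zip` keeps each Old score paired with the same subject’s New score; that pairing is exactly what makes these data correlated.

```python
# Typing speeds (words per minute) for subjects 1 and 2 on the data screen.
old = [55, 47]
new = [61, 56]

# zip keeps each subject's Old and New scores together.
d = [o - n for o, n in zip(old, new)]    # Old minus New
d2 = [x * x for x in d]                  # the d-squared column

print(d)    # [-6, -9]
print(d2)   # [36, 81]
```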

For the calculation of t we won’t need the raw scores. The t formula will only use difference scores. The raw scores are just used in the beginning to create the difference scores. (That is why they are dimmed in the final screens.)

Next we will start calculating t, starting with the Mean and Standard Deviation of the difference scores.


We can see on the next screen that the sum of the difference scores is minus 38. That means the mean difference score is minus 38 over 10, which equals -3.8. Now, if you remember the hypotheses, the skeptic (corresponding to H0) said that we should expect the mean difference score to be zero. The salesperson said that we should expect the mean difference score to be below zero. MD is below zero, so the data pattern fits the scientific hypothesis. That’s always an important first step. But the skeptic is going to say, "Well, -3.8, that’s not very many words per minute. I think that’s just chance." Therefore, we now have to use a t test to evaluate the PCH of chance.


To calculate t we need a standard deviation, so let’s get on with that. First, the sum of the squared difference scores is 216. As you press the "Continue" button, the formula for the standard deviation of difference scores will appear, followed by the calculations of the standard deviation. This is the same old standard deviation formula you’ve always used, except that we’ve substituted the symbol d (for difference score) for all the X’s. In words, the formula is the square root of [the sum of d squared divided by the number of difference scores, minus the average difference score squared], that is, √(Σd²/N − MD²).

Go ahead and substitute into the formula before you press "Continue." Doing that will tell you whether you understand the formula. In the substitution on the screen, the standard deviation of difference scores is the square root of [216 over 10, minus 3.8 squared].

The next screens go through the arithmetic: 216 over 10 is 21.6, and 3.8 squared is 14.44, so the standard deviation is the square root of 21.6 − 14.44 = 7.16, which equals 2.6758.
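The standard deviation arithmetic can be checked with a short Python sketch, assuming only the summary values given in this example (N = 10, Σd = -38, Σd² = 216). The function name `sd_of_differences` is hypothetical. The final check, on a made-up list of difference scores, illustrates that this computational formula gives the same result as the standard library’s divide-by-N standard deviation, `statistics.pstdev`.

```python
import math
import statistics

def sd_of_differences(sum_d, sum_d2, n):
    """SD of difference scores: sqrt(sum of d^2 / N - MD^2)."""
    md = sum_d / n
    return math.sqrt(sum_d2 / n - md ** 2)

# Summary values from this example.
sd = sd_of_differences(sum_d=-38, sum_d2=216, n=10)   # sqrt(21.6 - 14.44)
print(round(sd, 4))   # 2.6758

# Hypothetical difference scores: the computational formula agrees with the
# ordinary divide-by-N standard deviation.
d = [-2, 3, -5, 0, -1]
assert math.isclose(
    sd_of_differences(sum(d), sum(x * x for x in d), len(d)),
    statistics.pstdev(d),
)
```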


The degrees of freedom (df) for this t test are the number of difference scores minus one. Don’t count raw scores, because that will give you twice as many. Count the number of subjects (or, equivalently, the number of difference scores). Counting the raw scores (20 in this example) is a mistake students sometimes make on exams.

In this case the number of difference scores minus one is 10 - 1 = 9, which is the degrees of freedom.


Now let’s get the formula for t. I’m presenting the formula for the t test after we’ve seen and worked with the data, instead of before, because to understand the formula you have to understand thoroughly what we mean by the mean difference score and the standard deviation of difference scores.

The formula: t equals the mean difference score (MD) divided by [the standard deviation of difference scores divided by the square root of the number of difference scores minus one], that is, t = MD / (standard deviation / √(N − 1)), where N is the number of difference scores. Go ahead and substitute into the formula to make sure that you know how.

The correct substitution is t equals minus 3.8 divided by 2.6758, itself divided by the square root of 10 minus 1. That should be what you wrote down when you substituted into the formula. The next screens do the arithmetic. You can write down in your notes as much or as little of the arithmetic as you like.
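Here is the same arithmetic as a Python sketch, using MD = -3.8, standard deviation 2.6758, and N = 10 from this example. It also illustrates a point you may meet elsewhere: many textbooks and statistics packages compute the standard deviation with N − 1 in the denominator and then divide by √N instead of √(N − 1). The two routes are algebraically identical, so they give the same t.

```python
import math

MD, SD, N = -3.8, 2.6758, 10   # values computed earlier in this example

# This tutorial's form: t = MD / (SD / sqrt(N - 1)), where SD divides by N.
standard_error = SD / math.sqrt(N - 1)          # 2.6758 / 3
t = MD / standard_error
print(round(standard_error, 4), round(t, 2))    # 0.8919 -4.26

# Alternative form: sample SD s (dividing by N - 1) over sqrt(N).
s = SD * math.sqrt(N / (N - 1))
assert math.isclose(standard_error, s / math.sqrt(N))

df = N - 1   # degrees of freedom: difference scores minus one
print(df)    # 9
```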


The final result is that t equals -4.26. That’s the calculated t. As you know, what we have to do next is get the critical t, set up the rejection region(s), and determine which region the calculated t falls into, so we can make a decision about the null hypothesis (H0).


Next we will find the critical value of t. First choose a significance (alpha) level. This is a free choice on your part so long as alpha is not greater than .05. I’ll choose alpha equal to .05.

Second, you must determine whether this is a one- or two-tailed test. Since we chose to work with the salesperson’s scientific hypothesis (which is directional), this is a one-tailed test.


Student question: "Would the accountant’s scientific hypothesis give us a two-tailed test?" Yes, the accountant’s hypothesis would lead to a two-tailed test. That’s correct, and we’ll return to creating a test based on that later.

Third, and finally, we need to know the df to use the tables to get the critical t. In this case df = 9. Using our tables (which aren’t shown on the screen), the critical value is 1.833.
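If you don’t have the tables handy, the critical values can be reproduced numerically. The sketch below uses only the Python standard library: it integrates the Student t density with Simpson’s rule and then searches (by bisection) for the point that leaves alpha in the upper tail, or alpha/2 for a two-tailed test. This is an illustration, not how the tables were built; the function names are hypothetical. If SciPy is available, `scipy.stats.t.ppf(0.95, 9)` gives the one-tailed value directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df, tails):
    """Positive table value leaving alpha (1 tail) or alpha/2 (2 tails) above it."""
    upper_tail = alpha if tails == 1 else alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):                # bisection: upper-tail area shrinks as t grows
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 9, tails=1), 3))   # 1.833
print(round(t_critical(0.05, 9, tails=2), 3))   # 2.262
```

The second line gives the two-tailed value used later in the accountant’s version of this example. Note that the function returns only the size of the critical value; choosing its sign is the next step.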

But there’s a question on the screen which asks, "Should this be a plus or a minus critical value?" Before pressing "Continue," answer that for yourself.


If you got the answer right and understand why, you can skip past this material. If you got the answer wrong, or if you don’t fully understand why you chose the right answer, follow along with this discussion. Let’s examine the logic behind choosing plus or minus. Recall that H0 expects t = 0. But what does H1 expect t to be? To review, if you take into account both the salesperson’s scientific hypothesis and the direction we subtracted, you expect the mean difference score to be below zero (because Old minus New would be negative). Now think about the t formula: t is the mean difference score (MD) divided by a term built from the standard deviation and a square root. Both the standard deviation and the square root are positive. So if a negative MD is divided by a positive number, the result (t) must be negative. Therefore H1 expects t to be negative.

E(t given H1) < 0

Now let’s get back to the question whether the critical value of t should be plus or minus. Since the null hypothesis (H0) expects t = 0 and the alternative hypothesis (H1) expects t to be negative, it makes sense to place our rejection region on the negative side of 0. So our t critical should be negative. In short, the critical value should be minus because we are predicting a negative average difference score, and therefore a negative t.

The critical value is, therefore, -1.833.

If that discussion is a little shaky for you, that’s OK. Next we will go through the whole process of setting up rejection regions step by step, with lots of visual support.


Let’s set up our number line. The full range of t goes from negative infinity to positive infinity, and in the center of that gigantic range is zero. The null hypothesis expects t to be 0. So we would not want to reject H0 if our calculated t was close to zero. But what defines what is close to zero versus what is far from zero? The critical value does. As we said before, the critical value of t divides the range of t into regions that are close to 0 (the expectation of H0) versus far from zero.

The table tells us that our critical value is 1.833; and because the alternative hypothesis (H1) tells us t should be negative, we place our rejection region below zero. So the critical value is -1.833.


Let’s put our critical t (-1.833) on the number line and then draw the "Reject H0" and the "Do not Reject H0" regions. Of course the "Reject H0" region is the one farthest from 0, and the "Do not Reject H0" region is the one that includes 0.

Next, let’s put our calculated t (-4.26) on the number line.


We see that calculated t falls in the rejection region. Consequently we reject the null hypothesis.
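The logic of the last few screens (read the positive table value, give it the sign that H1 predicts, then check which side of the critical value the calculated t falls on) can be sketched as a small, hypothetical helper function. Treating a t exactly equal to the critical value as falling in the rejection region is an assumption of this sketch.

```python
def one_tailed_decision(t_calc, t_table, h1_direction):
    """Decide about H0 in a one-tailed test (hypothetical helper).

    t_table: positive value read from the t table.
    h1_direction: "below" if H1 expects t < 0, "above" if H1 expects t > 0.
    The rejection region goes on the side of 0 that H1 predicts.
    """
    if h1_direction == "below":
        t_crit = -t_table
        reject = t_calc <= t_crit
    elif h1_direction == "above":
        t_crit = t_table
        reject = t_calc >= t_crit
    else:
        raise ValueError("h1_direction must be 'below' or 'above'")
    return t_crit, ("Reject H0" if reject else "Do not reject H0")

print(one_tailed_decision(-4.26, 1.833, "below"))   # (-1.833, 'Reject H0')
print(one_tailed_decision(-4.26, 1.833, "above"))   # (1.833, 'Do not reject H0')
```

The second call shows why the sign matters: with the rejection region placed on the wrong side of zero, the same calculated t would not lead to rejecting H0.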


What we’re saying when we reject H0 is this: if H0 is true, then the chance of getting a calculated value of t out here in the rejection region is only .05, or 1 in 20. That is pretty improbable; improbable enough to be considered implausible. In other words, if the data are determined solely by chance, the probability of getting a calculated t value in the rejection region is just .05. By general agreement, scientists consider a probability this small (1 in 20) low enough to say that the once-plausible competing hypothesis, that chance alone is determining the data, is no longer plausible.
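A simulation makes the "1 in 20" concrete. Under the assumptions below (chance alone, with normally distributed difference scores whose population mean is zero and SD is 5, both arbitrary choices), the sketch counts how often a sample of ten subjects produces a calculated t at or below -1.833. The proportion should come out close to .05. The helper `correlated_t` and the simulation settings are hypothetical.

```python
import math
import random

def correlated_t(d):
    """t = MD / (SD / sqrt(N - 1)), where SD = sqrt(sum(d^2)/N - MD^2)."""
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum(x * x for x in d) / n - md ** 2)
    return md / (sd / math.sqrt(n - 1))

rng = random.Random(1997)   # fixed seed so the run is repeatable
reps = 20000
critical = -1.833           # one-tailed critical value, alpha = .05, df = 9

# Chance alone: every sample of ten differences comes from a population whose
# mean difference is 0 (normal with SD 5, an arbitrary assumption).
hits = sum(
    correlated_t([rng.gauss(0, 5) for _ in range(10)]) <= critical
    for _ in range(reps)
)
rate = hits / reps
print(rate)   # close to .05
```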

In contrast, notice that H1 (which comes from the scientific hypothesis) predicts a value of t below 0. So the calculated t = -4.26 is consistent with H1. Therefore H1, and with it the salesperson’s scientific hypothesis, remains plausible.

All we’ve accomplished with all these statistical machinations is simply to eliminate "chance" as a plausible explanation of the data pattern, while leaving the scientific hypothesis plausible. This is what we mean by having a valid statistical conclusion (or Statistical Conclusion Validity).

We have not proven, or even supported, the scientific hypothesis in any deep sense. Whether people come to believe the scientific hypothesis depends on many important issues in the research design. These issues, often grouped as Internal Validity, External Validity, and Theoretical Construct Validity, are addressed in the study of research methods.


Student question: "If it were a two-tailed test, would there be two rejection regions?" Yes. If it were two-tailed, there would be a critical value below zero with a negative sign and the same critical value above zero with a positive sign. BUT neither of them would be 1.833, because you would use a different column in the tables for a two-tailed test. That’s a good question. In fact, you can go up to the Correlated t Menu and redo this example from the accountant’s point of view, which gives a non-directional scientific hypothesis and a two-tailed rejection region.


Now on the Navigation Panel press (twice) the "up to menu" arrow button and go back up to the "CORRELATED t MENU."

Press the "EXAMPLE 2-TAILED" button

Use the "Next," "Back," and "Continue" Buttons to navigate through the material presented on the screens.


We will use the same Jer-Key salesperson example, but this time we will use the accountant’s scientific hypothesis. The first set of screens indicates that the research design is the same, the data are the same, and the results are the same: t = -4.26, with df = 9. In other words, irrespective of what the scientific hypothesis might be, the data are the data and the results are the results.

Let’s look at the scientific and statistical hypotheses. The accountant is saying, "Yeah, I admit that technology has an impact. I’m just not sure if it’s good or bad. The New keyboard might produce faster or slower typing speeds." So this is a non-directional scientific hypothesis.


The accountant’s non-directional scientific hypothesis generates a two-tailed H1. Since the new keyboard might make typing speeds faster or slower, the MD might be either above zero or below. In other words the MD will not be equal to zero.

The alternative hypothesis (H1) expects MD to NOT equal zero. This is a two-tailed alternative.


We next use the tables (not shown) to find the t critical, which depends on three things: alpha, df, and whether or not this is a one- or two-tailed test.

Let’s leave alpha at .05; the degrees of freedom remain 9, and we have a two-tailed test. Given those three things, the tables tell us that the critical value of t is 2.262.

But there is a question on the screen. Should 2.262 have a plus or minus in front of it? Before you press "Continue" answer that question for yourself.


The answer is both. Because H1 expects MD to be either above or below zero we want critical values that allow us to reject H0 either above or below zero. So we have two critical values: +2.262 and -2.262.


Let’s set up our number line. The full range of t goes from negative infinity to positive infinity, with zero in the center. The null hypothesis expects t to be 0. So we would not want to reject H0 if our calculated t was close to zero, and "close" is defined by the critical values. As we said before, the critical value of t divides the range of t into regions that are close to 0 (the expectation of H0) versus far from zero.

We place the critical values of t on the number line to define the rejection regions. In a two-tailed test we have two rejection regions, one on each side of zero.

Next, let’s put our calculated t (-4.26) on the number line. We see that calculated t falls in the rejection region. Consequently we reject the null hypothesis.
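The two-tailed decision rule can be sketched the same way: "close to zero" means strictly between the two critical values, and anything outside them falls in a rejection region. The function name is hypothetical, and so is the second calculated t used for contrast.

```python
def two_tailed_decision(t_calc, t_table):
    """Reject H0 unless t_calc lies strictly between -t_table and +t_table."""
    close_to_zero = -t_table < t_calc < t_table
    return "Do not reject H0" if close_to_zero else "Reject H0"

print(two_tailed_decision(-4.26, 2.262))   # Reject H0
# A hypothetical calculated t of -2.0, for contrast:
print(two_tailed_decision(-2.0, 2.262))    # Do not reject H0
```

A calculated t of -2.0 would fall in the one-tailed rejection region (below -1.833) but not in either two-tailed region, because the two-tailed critical values (±2.262) sit farther from zero.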


Statistical Conclusion Validity. OPTIONAL: You can continue with the next series of screens, which complete the ideas about statistical conclusion validity. For commentary that matches these screens exactly, go back to the text for the 1-tailed example; the screens and the text are identical.

What I’ll do here is restate those ideas in different words, without reference to the screens, in case you want another discussion of the issues.

Since H0 expects the calculated t to be zero, if we get any t close to zero, there’s no reason to reject H0. The question is, what does "close to zero" mean? "Close" is always defined by the critical values, in this case -2.262 and +2.262. We will reject above +2.262 and below -2.262. That’s why this is called two-tailed: we reject in either tail, the tail that goes to negative infinity or the tail that goes to positive infinity. We do not reject H0 in the middle, near zero, because H0 predicts a t of zero.

"Close to zero" is defined as "between the critical values" and "far from zero" is defined as "outside the critical values."

 

This completes the material on the t test for correlated means. You can go up to the "Correlated t Menu" and select "Formula" any time that you want to quickly review the formula.