t-Independent Lecture


Copyright 2000, Tom Malloy.

This is the text of the in-class lecture which accompanied the Authorware visual graphics on this topic. You may print this text out and use it as a textbook. Or you may read it online. In either case it is coordinated with the online Authorware graphics.





Background

Inferential Statistics. In an earlier overview on statistics, we described the two basic types of statistics which you will encounter in this course, descriptive and inferential statistics. All the different statistics we have discussed so far--mean, median, variance, standard deviation, correlation, and regression--are types of descriptive statistics. They are used to describe samples and populations.

In the lectures on sampling distributions, estimating parameters, and hypothesis testing (statistical conclusion validity), we laid the theoretical foundations for comprehending how inferential statistics work.

We are now about to start on one of the most exciting topics in this course--applied inferential statistics. These are the statistics that are used to test hypotheses or make inferences about hypotheses and populations. These will include t-tests, chi-square tests, and F-tests. But we are beginning with t-tests.

The t-test was the first applied inferential statistic developed. It is important because of the power it has given to researchers to evaluate the results of experimental research. It is probably the most used statistic in social and medical research because of its wide applicability. Learning how to use it gives you the same power to conduct research and evaluate statistical hypotheses as researchers have.

Historical Note. The t-test was originally developed by a statistician named W.S. Gosset who published his research under the pseudonym of "Student" very early in the 20th century. "Student" was the most important predecessor of Fisher who was described in the Difference to Inference lecture. The theory behind the t-distribution was a breakthrough. Up to that point, use of a probability distribution required that you knew values of the population parameters such as mu and sigma. The t-distribution permits us to use the sample estimates in calculating probabilities.

The story, as I heard it, was this. The Guinness brewery in Dublin around the turn of the century was confronted with a research puzzle. They thought (hypothesized) that their new brewing process would produce beer that was better. They had measurements of beer quality, but the data were ambiguous and not clear cut. Did the data really show that the new brewing process was better than the old way of doing things? If so, they were willing to invest in the new brewing process. But they did not want to spend all that money if the new process didn't really produce better beer. So they hired a young mathematician, W. S. Gosset, to address that puzzle. Gosset invented the now famous "Student's t" in this context. He, of course, wanted to publish his new discovery. But his employers, realizing the power of this new idea, didn't want other breweries to have this same advantage. So Gosset published his theory in mathematical journals as "Student." His employers felt that esoteric math journals were not the sort of reading that breweries did on any regular basis, nor would they understand what to do with the theory even if someone brought it to their attention. In this and the succeeding lectures, we will try to give you the basis for understanding a tool so powerful that at least one business sought to hide it from their competition.

Some people take that story to mean inferential statistics is the work of the devil. Others take it as a demonstration of how alcohol leads to the refinement of civilization. Either way, "Student's t" has been imposed on students for many generations.

There are many different applications of the t-tests. t-tests are used for comparing independent means, for comparing correlated means, for testing hypotheses about single means, for correlations, for regression statistics, and many more. This lecture will address the t for two independent means.

Degrees of Freedom. One new concept you will encounter with the t-test (and all other inferential statistics) is degrees of freedom. Generally, degrees of freedom will be the number of observations in the sample minus the number of means. For a one group study, where there is only one mean, df = n - 1. For a two group study, where there are two means (a mean for each group), df = n1 + n2 - 2, where n1 is the number of scores in group 1 and n2 is the number of scores in group 2. But we are getting ahead, so don't worry about that for now.

Formulas for calculating the degrees of freedom give you a value which determines the shape of the t-distribution. We will talk more about this later in this t lecture and describe how degrees of freedom are used in the t-test. A later lecture will develop the theory behind degrees of freedom.
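
As a small illustration of the two df rules just stated, here is a minimal Python sketch. The function names and the sample sizes in the usage lines are made up for illustration; they are not part of the lecture.

```python
def df_one_group(n):
    """Degrees of freedom for a one group study: n minus one mean."""
    return n - 1


def df_two_groups(n1, n2):
    """Degrees of freedom for a two group study: n1 + n2 minus two means."""
    return n1 + n2 - 2


# Hypothetical sample sizes, chosen only to show the arithmetic.
print(df_one_group(10))       # one group of 10 scores
print(df_two_groups(12, 15))  # groups of 12 and 15 scores
```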

Let's begin with a simple, applied example: Elite Skiers.

 

 

Elite Skier Example

Suppose a sports psychologist wants to study the effects of using an imagery technique on the performance of elite skiers.

DEPENDENT VARIABLE. She decides the dependent variable will be the elapsed time down some famous course. (Elapsed time is how long it takes skiers to ski down under racing conditions.) The psychologist assumes that elapsed time data can be modeled by a normal distribution.

INDEPENDENT VARIABLE. The independent variable for this study is using (or not using) an imagery technique (IT). The imagery technique is based on imagining in vivid detail that one is completing a perfect run down the race course.

TWO GROUP STUDY. She plans to do a two group study. She will randomly divide a pool of world-class, elite skiers into two groups. The treatment group will receive her special imagery technique (IT). The control group will receive no special treatment.


Control Population

Some of the study's participants will get imagery training and some will not get any special training. The group that does not get any imagery training is called the control group. It is assumed to be sampled from the control population.

The control population is the population of world-class elite skiers. For this example, let's assume that the mean (mu) elapsed time down the course for elite skiers is 122.76 seconds. The control population is expected to have a normal distribution, shown here as a green normal curve.

Imagery Technique Population

The researcher expects that the imagery technique will improve the performance of elite skiers. Her expectation is that those elite skiers in the group which receives the imagery technique (IT) training will have lower elapsed times down the course than those without the IT training. In other words, she thinks that there are actually two populations--the Imagery (Treatment) Population and the untreated Control Population.

The IT population is expected to have a normal distribution, shown here as a red normal curve. For the moment, let's assume that the imagery technique (IT) works. Let's suppose that the mu of the treatment population is 120.47 seconds.

HOMOGENEITY OF VARIANCE. As we pointed out in the Detect Difference and Double Sample lecture, statisticians commonly assume that the two populations (control population and treatment population) have the same variability. That is, the sigma's for the two populations are equal.

ONLY MU's CAN DIFFER. In general the Normal Distribution has only two parameters (mu and sigma). Therefore, because we have assumed the sigma's are equal, the only potential distinction between the treatment and control populations comes from their mu's.

Two Populations

In summary, the researcher hypothesizes that the IT treatment is effective in improving performance. In probability theory, this scientific hypothesis is modeled in terms of there being two populations--the IT Treatment Population and the untreated Control Population.

The Treatment and Control populations are assumed to have the same sigma.

In the graphic, the red distribution is the model of the elapsed times of the IT Treatment population and the green distribution is the model of the elapsed times of the Control or regular population that did not receive imagery training.

The participants in the two groups are similar; both the Control Group and the Imagery Group are elite, world class skiers; they have comparable race times. The researcher will compare the performance of skiers sampled from the Control Population to the performance of those sampled from the Imagery Population in order to determine whether the imagery technique affects performance.

Decision Making

Even though researchers make hypotheses about populations, they hardly ever are able to test an entire population. Instead they collect samples from relevant populations. Then the researcher examines the sample data and makes an inference about the populations. This scientific thinking strategy is very similar to your experience with the Detect Difference Game. So think about your experience with that game as we continue on in this discussion.

For instance, this researcher wants to know if the IT technique will help elite skiers' performance. She can't test all elite skiers in the world so she takes a random sample of elite skiers and gives half of them the IT training and the other half no special training. Then she looks at the performance of these two groups in order to make an inference about the effectiveness of the imagery technique.

Another way of thinking about the researcher's question is "Do the sample groups come from two different populations or from one population?" If the IT treatment has an effect on skiers' elapsed time on the course then the two groups come from different populations. If the treatment has no effect then the two groups come from the same population.

Ho versus H1. We will discuss this in greater detail further along in this lecture, but for now we can point out that the null hypothesis is equivalent to saying that there is only one population and the alternative hypothesis is equivalent to saying that there are two populations.

 

Theoretical Treatment Effect

Here are the two populations again. For this example we have made up that the IT population has a mean (mu) elapsed time of 120.47 seconds and the control group or regular population has a mean (mu) elapsed time of 122.76 seconds.

The size of the theoretical treatment effect is easily calculated. It is the difference between the mu of the treatment population and the mu of the control population. In our example the treatment effect size is 120.47 - 122.76 = -2.29 seconds.
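
The subtraction can be checked with a few lines of Python. This is only a sketch using the two mu values quoted in this section; the variable names are made up for illustration.

```python
# Population means (mu) quoted in this section, in seconds.
mu_treatment = 120.47  # Imagery Technique (IT) population
mu_control = 122.76    # control population

# Theoretical treatment effect: treatment mu minus control mu.
# A negative value means the treatment population is faster down the course.
treatment_effect = mu_treatment - mu_control
print(round(treatment_effect, 2))
```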

We have worked with this idea of treatment effect in the Double Sample tool and the Detect Difference lecture so that you should have lots of experience with it.

Larger treatment effects are easier to detect than smaller ones so if the mu's are far apart it is easier to determine that there are two populations rather than just one. When the distributions get closer and there is more overlap between the curves, then it is more likely that you will need a statistical test like a t-test to determine whether there are two populations or only one. The levels of difficulty on the Detect Difference Game are based on treatment effect size.

 
Science vs. Skepticism

The scientist proposes that the treatment is effective and so there are two separate populations - the treatment and the control populations. In other words the IV has an effect upon the DV and therefore there really are two populations -- those who have been trained and can get down the hill faster, and those who haven't been trained who get down the hill slightly more slowly.

The skeptic, of course, does not believe in this imagery stuff. So to the skeptic, it doesn't really matter if someone receives training or not, it's all the same. In other words, the skeptic believes that the treatment is not effective and therefore both groups are coming from the same population. That is, there is only one population.

Homogeneity of Variance Assumption

Just to repeat: Statisticians assume that the two populations have the same sigma. That's a standard assumption for t tests and all of these kinds of statistical tests. The variances are assumed to be equal, and so the only possible difference between the two populations is their mu. The real question is, given that the sigma's are the same, "Is there a difference between the mu's?"

 

Statistical Hypotheses

NOTATION: We need general symbols that go beyond this example. So, for this example, let's call the treatment population Population 1. Population 1 has as its center, Mu 1. Let's also call the control population Population 2, with Mu 2 as its center.

NULL HYPOTHESIS: Ho, the null hypothesis, comes from the skeptical hypothesis. One general form of Ho states that "Mu 1 minus Mu 2 is equal to zero." If the two population mu's are equal then their difference is zero. [Another way to say this is that if the two mu's are the same, then there's just one population.]

ALTERNATIVE HYPOTHESIS. On the other hand, the alternative hypothesis, H1, states that "Mu 1 minus Mu 2 is less than zero." Remember we are measuring elapsed times so smaller values on the DV indicate better skiing performance. In other words, the treatment population will have lower scores than the control population. Therefore, there are probably two distributions. This scientific hypothesis is directional.

The statement "Mu 1 minus Mu 2 is less than zero" implies there are two populations. The logic is that if the two mu's are different, then there have to be two populations.
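
Written in symbols, the two statistical hypotheses for the skier example look like the following. This is a sketch in LaTeX notation, with \mu_1 standing for the treatment (imagery) population mean and \mu_2 for the control population mean, as defined in the notation above.

```latex
H_0:\ \mu_1 - \mu_2 = 0
\qquad\qquad
H_1:\ \mu_1 - \mu_2 < 0
```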

ANSWER TO STUDENT QUESTION: Yes, H1 always corresponds to the scientific hypothesis, and Ho always corresponds to the skeptical hypothesis. Ho particularly corresponds to the assertion that the data pattern is due to chance. So, Ho goes with skepticism and the PCH of chance.

How do we decide if our two groups come from different populations or from the same population? We decide in the same way you decided when you played the Detect Difference Game.

REVIEW: The dependent variable is elapsed time; the independent variable is whether or not the skiers receive the imagery treatment. There are two groups of skiers; one group was trained and the other was not. The two groups both race down the same course. The researcher finds that, on the average, the elapsed time for the imagery group is less than the control group. The scientist predicted this result.

The scientist wonders whether she is sampling from two populations (as H1 suggests) or from only one population (as Ho indicates).

By now you have enough experience with our probability tools to know that if Ho is correct and if you take two samples from the same population, you're going to get chance differences between them. The whole PCH of chance idea is that these data, the imagery group having lower elapsed times, could be due to chance alone. Another way of saying this is that the skeptic thinks these data could result from two samples from the same population.
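
The skeptic's point can be tried out with a short simulation. The sketch below repeatedly draws two samples from one and the same normal population and records the difference between the two sample means. The value of sigma, the group size, and the number of repetitions are assumptions made up for illustration; the lecture does not specify them. The mu is the control value used in the Theoretical Treatment Effect section.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU = 122.76      # control population mu from the example (seconds)
SIGMA = 2.0      # ASSUMED sigma; the lecture does not give one
GROUP_SIZE = 10  # ASSUMED number of skiers per group
REPETITIONS = 1000


def draw_sample(n):
    """Draw n elapsed times from the single (control) population."""
    return [random.gauss(MU, SIGMA) for _ in range(n)]


differences = []
for _ in range(REPETITIONS):
    group1 = draw_sample(GROUP_SIZE)
    group2 = draw_sample(GROUP_SIZE)
    differences.append(statistics.mean(group1) - statistics.mean(group2))

# Individual differences are never exactly zero, even though both
# groups come from the same population; on average they are near zero.
print("average difference:", statistics.mean(differences))
print("smallest difference:", min(differences))
print("largest difference:", max(differences))
```

Any single pair of samples shows some difference between the means, sometimes favoring one group and sometimes the other. That is the kind of chance difference Ho is talking about.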

 
 

Deciding: 1 or 2?

Just like in the Detect Difference Game, the problem is that the two populations are not available to the researcher for examination. In reality, scientists don't know the populations, all they know is the data. We've got to decide from the data whether or not there is one population like Ho says or two populations like H1 says.

 

This is the whole point of all the experience you had playing the Detect Difference Game. The idea is to put you in the formal situation of the scientist, and in the Detect Difference game, you don't see the populations. You have to use the data to guess whether there are one or two populations. After you make your guess the Detect Difference game gives you feedback by showing you the populations.

Scientists never get that kind of feedback because these populations are only models. They don't really exist. It may be worth just going back and playing the Detect Difference Game again to help solidify these ideas.

SUMMARY: We have two samples of data. Did the data come from two populations, a red and a green one, or are they just two samples from the single green population that differ by chance? We can't see the two populations, all we see is the data.

That is the conceptual background for the t-test. Let's review another research study example.


Psychotherapy Example

In our next example, we're moving from elite skiers where we want to reduce the value of the dependent variable, to mental health, where we want to increase the value of the dependent variable. People want more mental health and skiers want to take less time down the hill. So it all depends on the context of the study which direction we would want our results to go.

 

Design of Psychotherapy Study. Let's assume that a group of scientists is interested in determining whether a new psychotherapy they have developed is effective. As a first step in this process, they do the following experiment. They randomly divide people who have consented to participate in the study into two groups: A group which receives the new psychotherapy and a group which is told that no psychotherapist is currently available. This group is told it is on a waiting list to receive psychotherapy later.

After the study, both groups are asked to come into the counseling center where they are measured on the dependent variable (DV). The DV is some set of measurement operations which give us a number for each participant. This may be a paper and pencil test of depression or a structured interview or behavioral observation. But whatever it is it must generate a number for each participant.

NOTE: In the Interface to Science lecture, we discussed how some measurement operations are elegant and valid while others are very poor and lack construct validity. These validity issues are crucial because all that statistical procedures can do is work with numbers. If these numbers are invalid or are conceptually shaky, then the statistical conclusions will be invalid or shaky. Creating valid measurement operations is an important scientific issue that provides a foundation for statistical procedures.

Let's assume (and it's a big assumption) that we have a valid measure of mental health. The scientists manipulate the IV by giving the psychotherapy group the new psychotherapy while withholding the new psychotherapy from the waiting list control group. At the end of the study both groups are measured for mental health (DV).

Of course, at this point, the waiting list control group is given the new psychotherapy for humane and ethical reasons, but this is not part of the study.

The Hypotheses

The scientific hypothesis expects that (1) the IV, psychotherapy, will have an effect and (2) this effect will be beneficial to mental health (DV).

This experimental design implies that the psychotherapy group (experimental group) will have higher mental health scores than does the control group. The screen shows data that fit the scientific hypothesis. The experimental group shows higher mental health than the control group. The data pattern fits the scientific hypothesis. It is crucial to look at the pattern of results before rushing off into inferential statistics such as a t-test. There would be no point in further analysis if the therapy group had lower mental health scores than the no therapy group.

Ask yourself, "How does the data pattern relate to the predictions of the scientific hypothesis?" Your answer to this question combined with a bit of common sense will guide you on decisions about what sort of statistics you want to do. The scientists are, no doubt, happy to find a fit between their predictions and the data pattern.

 

The Skeptic replies...

The skeptic will not be deterred by the fact that the data pattern seems to fit the scientific hypothesis. As a first critique, the skeptic will claim that the pattern of results happened by chance alone. After all, one of the two groups had to have higher scores, so the scientists were just lucky that the group they predicted would be higher came out above the control.

The Skeptic's Logic

For example, suppose we divide all the students in any classroom into two groups randomly or, at least, irregularly. Suppose we put all those sitting in the odd numbered rows into one group and students sitting in the even numbered rows in another group. Then suppose we measure them on anything (GPA, height, the number of calories they ate yesterday).

We would expect that the means of the two groups by chance alone would differ. The means would not be equal right down to the fifth decimal point. The mean of one group would be higher than the mean of the other group.

For example, if we measure caloric intake, no one would claim (without great mental gyrations) that people who sit in odd numbered rows eat more calories than people who sit in even numbered rows just because the mean calories eaten yesterday by the odd numbered group came out higher than the mean of the even numbered group. The point is, in a very general way, it is plausible to argue that the results of any research study occurred by chance alone. This is the Plausible Competing Hypothesis (PCH) of chance. It is the first issue you must address in a discussion with a skeptic. Formally, evaluating the PCH of chance is called "Statistical Conclusion Validity." Inferential statistics such as t-tests evaluate the statistical conclusion validity of research.

A t-test for independent means will evaluate the claim that the group means differ only by chance. It will do nothing else. It will not help with the criticism that psychotherapy ought to have been evaluated against an "attention placebo" group. Issues around the appropriateness of the control groups used are included in "Internal Validity" and are not formally a part of statistics. Internal Validity is prior to and more important than statistical procedures.

Science to Statistics Interface Section

Skeptical Hypothesis

The skeptic hypothesizes that the IV (psychotherapy treatment) is ineffective, meaning it neither improves nor worsens mental health. Because the IV is ineffective there will be only chance differences between the two groups in the study - the control group and the treatment group. The skeptic assumes that the differences between the groups are only chance differences, and that the two groups are from the same population with one mu.

The skeptic would look at the graph of the results and say, "Yes there are differences, of course, but they are only chance differences."

 
Scientific Hypothesis

The scientist hypothesizes that the IV (psychotherapy treatment) is effective, meaning it improves mental health. Because the IV is effective there will be a significant difference between the two groups in the study - the control group and the treatment group. The scientist assumes that the difference between the groups is strong evidence that the two groups are from different populations with different mu's, as opposed to two samples from the same population that differ only by chance.

The scientist would look at the graph of the results and say, "Yes there are differences, which are due to the effect of the treatment."

 

From Scientific and Skeptical Hypotheses to Statistical Hypotheses

Next we need to translate our scientific hypotheses into statistical hypotheses. The skeptic's plausible competing hypothesis that any differences in our measurements of mental health (DV) between the two groups occurred only due to chance will be expressed in statistics as the null hypothesis (Ho). The scientific hypothesis that differences in mental health measurements between the two groups are caused by psychotherapy will be expressed in statistics as the alternative hypothesis (H1).

Ways of Expressing the Skeptical Hypothesis

In terms of this example, the skeptic expects that the difference between the two population mu's will be zero. Another way of expressing this is to state that it is EXPECTED that the difference between the two sample means will be zero.

You'll see both these forms in different books and I just want you to realize that those forms are exactly the same, logically speaking. That's what the skeptical hypothesis says in essence. Remember that the skeptical hypothesis goes with Ho. So Ho will be stated in a parallel way.

 

Ho in Symbols

TWO COMMON FORMS: As you can see from the graphic, Ho can be expressed in terms of the population means (top formula) or in terms of sample means (bottom formula).

Note: In the bottom formula, the large E is technically the "Expectation Operator" in probability theory. But at the intuitive level at which we are approaching the material in this class, you can assume E means pretty much what the word "expects" means in ordinary English. Ho says that the skeptic "expects" (E) [Mean 1 minus Mean 2] to equal zero. Of course, the actual sample means vary a great deal by chance alone, but the skeptic EXPECTS the difference between the means to be zero.
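
For readers of the text without the graphic, the two forms of Ho described above can be written as follows. This is a sketch in LaTeX notation, with \mu_1 and \mu_2 for the population means and M_1 and M_2 for the sample means; the layout of the original graphic may differ.

```latex
H_0:\ \mu_1 - \mu_2 = 0
\qquad \text{(population form)}

H_0:\ E(M_1 - M_2) = 0
\qquad \text{(sample-mean form)}
```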

 
Ways of expressing the Scientific Hypothesis

The scientific hypothesis says that there are treatment effects of psychotherapy. Consequently the treatment population mu is higher than the control population mu.

This can also be expressed in terms of the data means. The scientist expects Mean 1 (the mean of the Experimental Group which was given psychotherapy) to be greater than Mean 2 (the mean of the control group which was not given psychotherapy). Is the scientific hypothesis directional or non-directional? Answer this question for yourself before going on.

Symbols for H1

In symbols H1 can be expressed in terms of treatment populations as in the top formula.

Or H1 can be expressed using the "Expectation Operator." According to H1 we EXPECT the difference between the data means to be greater than zero. This is because M1 comes from the psychotherapy group which should have higher mental health than does the control group.
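
For readers of the text without the graphic, the two forms of H1 for the psychotherapy example can be written as follows. This is a sketch in LaTeX notation, with group 1 as the psychotherapy group and group 2 as the control group; the layout of the original graphic may differ.

```latex
H_1:\ \mu_1 - \mu_2 > 0
\qquad \text{(population form)}

H_1:\ E(M_1 - M_2) > 0
\qquad \text{(sample-mean form)}
```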

 

 

 
One-Tailed Alternative Hypothesis

Is the scientific hypothesis directional or non-directional?

The scientific hypothesis is directional because the scientist is expecting the mean of the psychotherapy group to show higher mental health scores than the mean of the control group. That is, we are predicting a direction.

We're NOT just saying that the two means will be different somehow. We are NOT saying that it doesn't matter which way the means differ. We ARE specifying that the psychotherapy group should do better than the control group. That is why the scientific hypothesis is directional.

As we've said, a directional scientific hypothesis translates into a one-tailed alternative hypothesis. H1, the alternative hypothesis, says that we expect Mean 1 to be greater than Mean 2. We call this a one-tailed alternative. That is, we will reject Ho in one tail of the distribution.

We've been over this a little bit in the lecture on Hypothesis Testing. But we haven't developed the idea of one-tailed versus two-tailed tests fully yet. So we'll come back to this idea a little further along in the lecture.

STUDENT QUESTION: If the scientific hypothesis were non-directional, does that mean we would have a two-tailed test?

ANSWER: In short: Yes. It's a good question, and we'll circle back to it eventually, but as long as it's asked right now, let's give an answer. To do this we need to alter the example a little. What if there's another scientist who disagrees deeply with your approach to therapy? This scientist may think that you have some effective therapy components but due to the way you are putting them together, the scientist can't tell what will happen. In her opinion, your therapy might either make people worse or make them better. She doesn't know what it will do. So her hypothesis is a non-directional scientific hypothesis. Therapy might have an effect but she doesn't know in which direction. When you have a non-directional scientific hypothesis it will lead to a two-tailed H1. On the current screen there is no two-tailed H1 written down. We'll get to looking at an example a little later. But a two-tailed alternative would look like this: H1: E(M1 - M2) not = 0. That is, we "Expect Mean 1 minus Mean 2 to not equal zero."

That's the set up for statistical conclusion validity. Remember, Statistical Conclusion Validity only evaluates one simple Plausible Competing Hypothesis: The PCH of Chance. It evaluates the plausible claim that (even if the data pattern is consistent with the scientific hypothesis) the data pattern occurred only due to chance. Any results might have happened, and the scientist was lucky: the data pattern turned out as expected. Statistical Conclusion Validity does NOT evaluate deeper scientific issues such as whether or not the design of the study makes sense or whether or not the scientific hypothesis has gained support through the data. Statistical Conclusion Validity simply gives a probability statement regarding the conjecture that the results only happened by chance.



Let's go ahead now and look at the formula for the independent means t-test.


Formula

Here are two forms of the formula. Write them down and then we will explain them. The two forms of the formula are exactly the same, except for the top line. In the first formula the top line ends with E(M1 - M2 given Ho). In the second formula, the top line ends with (mu1 - mu2 given Ho).
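
The formula graphic is not reproduced in this text. The following is a reconstruction in LaTeX notation, not a copy of the original graphic. It assumes the standard pooled-variance form of the independent means t, written for variances that use n as the divisor (the "true variance" formula used in this class), with df = n1 + n2 - 2.

```latex
% First form: top line ends with E(M1 - M2 | Ho)
t = \frac{(M_1 - M_2) - E(M_1 - M_2 \mid H_0)}
         {\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)
                \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}

% Second form: top line ends with (mu1 - mu2 | Ho)
t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2 \mid H_0)}
         {\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)
                \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}

% Degrees of freedom
df = n_1 + n_2 - 2
```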

 

Now let's make sure that all the parts of the formula are clear to you. M1 is the mean of whichever experimental condition you're calling the first group. In our example, M1 is the psychotherapy mean. M2 is the mean of the other experimental condition. In our example, it is the mean of the control group on posttest scores.

THEORETICAL TERM. The next term, E(M1 - M2 given Ho), is theoretical. As I mentioned when defining the statistical hypotheses, the E can be understood as the verb "to expect." And the thick vertical line is a symbol for "given" or "given that (whatever follows it) is true." So you can read this term in ordinary English as "the expected value of the difference between Mean 1 and Mean 2 given Ho is true." In general, the expected value of Mean 1 minus Mean 2 given Ho is true is zero.

Another way to think about this term (see second form of the formula) is that it represents treatment effect size according to Ho. Usually Ho says that the treatment effect size (mu1 minus mu2) is zero. So, again, this theoretical term is usually 0.

So in our example we would simply put a "0" into the formula on the top line where the theoretical term is. It's just a big, long symbol that almost always equals zero. It so often has a value of "0" that many textbooks don't even put this term into the formula because it's just clutter. In this class we will only use examples in which Ho expects the difference between means to be zero. (But there are cases where Ho might not expect the difference between the means to be zero. I'll give an example, below, under supplementary information.)

Parts of the Formula

Next we have n1; this is the number of subjects in group one. Correspondingly, n2 is the number of subjects in group two. The number of subjects in groups one and two can be different. Next we have the variances of the two groups. The variance of group one is s1 squared. The variance of group two is s2 squared. [Remember we are using the true variance formula, which is divided by n. There is another variance formula used to estimate population variances; that formula has (n - 1) as a divisor. For people who are using that formula, t would have to have slight modifications.] An idea which we've not discussed before is the degrees of freedom (df). Right now, just note what the "df" symbol means in English. We will work with it more later, but as a start you can note that in this case df = n1 + n2 - 2.

That finishes the definitions of the various parts of the formula.
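
The formula graphic itself is not reproduced in this text. As a sketch only, assuming the usual pooled-variance form of the t for independent means and the divide-by-n variances described above, the first form of the formula can be written as shown here. The second form is the same except that the theoretical term on the top line is (mu1 - mu2 given Ho).

```latex
t = \frac{(M_1 - M_2) - E(M_1 - M_2 \mid H_o)}
         {\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)
                \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad df = n_1 + n_2 - 2
```

If your textbook uses the (n - 1) variance formula instead, the top of the pooled term becomes (n1 - 1) times s1 squared plus (n2 - 1) times s2 squared; the two versions give the same t.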


 

Supplemental Information: [You will not need this information in this class. So read this section only if it is of interest to you. It is provided as a response to a frequently asked student question.] STUDENT QUESTION: "Does the expected value of the difference between the means always have to equal zero if Ho is true?"

ANSWER: It is logically possible that you would have a case and a hypothesis in which Ho is actually predicting some known difference between groups. Let's think of an example. Okay, let's say there's a known difference in reaction time between men and women; women have faster reaction times than men. Let's say that difference is known to be 10 milliseconds. So you could do some intervention with men, perhaps you could give a group of men six cups of Café Gourmet's strongest brew. A comparable group of women would be given six cups of decaf. Suppose the scientific hypothesis is that caffeine will decrease reaction time (so the men should be responding as fast or faster than the women). The skeptic says that caffeine has no effect on reaction time.

If the skeptic is right, then your group of men should still be responding 10 milliseconds slower than women. So Ho would be that we still expect the difference between the means to be 10 milliseconds. Therefore, in this example for the E( M1 - M2 given Ho) term you would put 10 instead of 0. This is because 10 milliseconds actually represents no change as a result of your coffee intervention. So, theoretically, this term could have a value other than zero. But in this class we won't use those kinds of examples.

Back To Menu Locator Map


This section begins with a series of screens that we've already discussed. They serve as reminders of what the scientific example is. Go over them quickly and then move on to the screen shown below.
Back To Menu Locator Map

Calculating t

This screen presents the data from our hypothetical experiment. The numbers are a result of our dependent variable measurement operations and indicate mental health, with higher numbers corresponding to better mental health.

The formula for the t test for independent means is presented so we can use it in the context of the example. BEFORE you go on to the next screen, the most important activity for you at this point is to substitute the data into the t formula, writing the substitution in your class note outline. The next screen will show the correct substitution, but you will learn the most by being active and substituting into the formula on your own.

When you have made your substitution, go on to the next screen and carefully check to see if you understand how to use the formula. A series of screens will follow, showing the steps of the arithmetic, including the calculated t value and the degrees of freedom. When you are comfortable with how to use the formula, go on to the next section.
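
If you would like to check your hand arithmetic, here is a minimal Python sketch of the substitution. It assumes the pooled-variance form of the formula with divide-by-n variances. The function names (`variance_n`, `independent_t`), the `expected_diff` parameter, and the scores are made up for illustration; they are NOT the data shown on the screen.

```python
from math import sqrt


def variance_n(scores):
    """Variance with n as the divisor (the 'true variance' used in this lecture)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)


def independent_t(group1, group2, expected_diff=0.0):
    """Return (t, df) for two independent groups.

    expected_diff stands in for the theoretical term E(M1 - M2 given Ho),
    which is 0 in every example used in this class.
    """
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    df = n1 + n2 - 2
    # Pool the two divide-by-n variances, weighting each by its group size.
    pooled = (n1 * variance_n(group1) + n2 * variance_n(group2)) / df
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return ((m1 - m2) - expected_diff) / standard_error, df


# Hypothetical mental health scores (higher = better), five and four subjects.
therapy = [8, 7, 9, 6, 10]
control = [4, 6, 5, 5]

t_value, df = independent_t(therapy, control)
print(round(t_value, 3), df)
```

With these made-up scores the means are 8 and 5, the variances are 2 and 0.5, and t comes out at about 3.42 on 7 degrees of freedom.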

 

 

 

Back To Menu Locator Map


Critical Values of t

Now that we have a calculated t, we have to find a critical t. Using this critical value, we will divide the range of our test statistic (in this case t) into two regions: the "reject Ho region" and the "do not reject Ho region."

Back To Menu Locator Map

One or Two Tailed? Where to Reject?

All t tests are either one-tailed or two-tailed. The one-tailed tests have either an upper or a lower rejection region. This refers to whether you are rejecting in only one tail of the distribution (the right long tail, or the left long tail) or whether you can reject in either of the two tails of the distribution (see the top distribution to the left).

To set up rejection regions, one has to decide two things. (1) Is it one- or two-tailed? (2) If it is one-tailed, does it have an upper rejection region or a lower rejection region?

The way to determine this, either in science, on an exam, or in the homework, is to read the scientific hypothesis. If the hypothesis is non-directional, then it goes with a two-tailed t test. This means that the researcher can reject Ho if the effect is far enough from zero either on the negative side or the positive side of the distribution.

Usually scientists are making a specific directional hypothesis. In the case of the psychotherapy example we've developed, we're expecting the treatment group to do better than the control group. That is, psychotherapy improves mental health. So this is directional; it only uses one tail. Obviously we're looking for mean one minus mean two (M1 - M2) to be a positive value or above zero. Since Ho is predicting zero, we'd want to reject for t values that are larger than zero. Ho would be rejected if the calculated t fell in the upper tail of the distribution.

For the skiing example, the researcher predicted that the treatment group would have lower scores than the control group. Therefore, the mean of the treatment minus the mean of the control would be a negative number. The researcher, therefore, expected t values below zero. In this case, we would reject Ho for values of t less than zero (i.e., in the lower tail of the distribution).

Ho is always predicting zero and then H1 is either predicting above zero or below zero or both directions (either above or below). By reading the problem, and determining the nature of the scientific hypothesis, you should be able to decide what kind of test you need, whether there are two rejection regions or just one, and if it is one, in which direction.

For a one-tailed test, the alternative hypothesis (H1) statement will always have either a greater than ( > ) or a less than ( < ) sign. For a two-tailed test, H1 will have a not equal sign ( ≠ ). The null hypothesis is always that the difference between the means is equal to zero. [NOTE: This last statement is not technically correct, but for this level class it is conceptually accurate.]

Back To Menu Locator Map

Expected Values of t

Let's discuss what the expected values of t will be if Ho is true or if H1 is true.

Back To Menu Locator Map
Expected Value if Ho is True

The PCH of chance in science goes with Ho in statistics. The skeptic expects the difference between the means to be equal to zero. There are two ways for Ho to express this. One is that mu one minus mu two is equal to zero. The other is that we expect mean one minus mean two to be equal to zero.

It is important to realize that if Ho is true one would expect t, on the average, to be centered at zero.

To understand why Ho predicts a t of zero, keep track of three points: (1) The structure of t is basically to divide a square root into the difference between the means. (2) If Ho is exactly right, the difference between the means would be zero. (3) When you divide any nonzero number into zero, you always get zero. If Ho is exactly right in its prediction, expect t to be zero. Ho is predicting a zero for a t value. That will be important in our logic later on.

Expected Value if H1 is True

In the psychotherapy example, the scientific hypothesis is directional.

If H1 is true, we expect mean one to be greater than mean two. So H1 is expecting some positive number on top of the t. Therefore when you divide whatever value is in the square root into the positive number, you're going to get, in this case, a positive number. So H1, in this particular case, is predicting large values of t, above zero at least. So the expected value of t, given H1 is true, is greater than zero.

Is H1 one- or two-tailed?

The answer is that it is one-tailed because we will reject Ho in only one tail of the t probability distribution. We haven't examined that logic in detail, but it will develop as we go along. It will become apparent that there is a flow of logic from the structure of the scientific logic in the H1 statement through the way the rejection regions fall in one or two tails and, when the rejection region is in only one tail, which one.

Back To Menu Locator Map



One Tailed Upper Test

 

In the Psychotherapy example, the scientist was expecting that the group which received the psychotherapy treatment would have improved mental health as evidenced by higher mental health scores. Therefore the scientist expected M1 (the mean of the treatment group) to be higher than M2 (the mean of the control group).

In general, if the Treatment group will have larger values on the DV than the control group in a study then the result of M1-M2 will always be a value greater than zero. The null or skeptical hypothesis expects the value of t to be at or near zero. Logically if we receive a t value that is much larger than zero then we have found strong support for the scientific hypothesis and can be fairly sure (usually 95% sure) that we can correctly reject the null hypothesis.

 

There are three things required to look up a critical value for t in a statistical table. First, calculate the degrees of freedom. In the Psychotherapy example there are five subjects in group one and four subjects in group two. Following the formula, the degrees of freedom would be five plus four minus two, which is seven. You'll need those degrees of freedom to look up the critical value in a t table.

One or Two Tailed?

The second requirement in using the table of critical values of t is that you must know whether you have a one- or a two-tailed alternative hypothesis. In our Psychotherapy example our H1 is one-tailed because the scientific hypothesis is directional: We expect the group given psychotherapy to have higher mental health scores.

Alpha Level

The final requirement is that we must specify an alpha (α) level. This is a choice that you have. The tables will list several alpha levels, including .05, .025, .01 and others. You choose the level. By convention, in the social sciences you ought not choose an alpha level larger than .05. Any of the smaller alpha levels, such as .01 or .001, is acceptable. For this example, let's choose alpha = .05. [Remember that alpha is the probability of rejecting Ho when Ho is actually true. By choosing alpha, you are choosing the probability of making that mistake.] We have calculated the df = 7, we know we have a one-tailed test, and we have chosen alpha = .05. Now we are ready to look up the critical value in the statistical tables.

Reading the Critical Values Table

Your table will not look very much like the one on the screen because the screen just shows a schematic. There are only a few simple points to using the table.

First, the critical values are the numbers in the body of the table. You just have to go down some number of rows and across so many columns and read the number you find at the intersection of the row and column you choose. It's about like looking up the mileage from Las Vegas to Salt Lake City on a road map while sitting in a coffee shop. Use the schematic on the screen to follow along with the instructions below.

The degrees of freedom run down the side of the table. Our degrees of freedom (five plus four minus two) equal seven; so we go down to the row with "7" in front of it. Our critical value will be somewhere on this row. Next we go across the top of the table based on alpha.

But there's one complication. Alpha is different if you have a one- versus a two-tailed test. So the table has a distinction between one- and two-tailed tests. A common symbol for this is Q (one-tailed) versus 2Q (two-tailed). Since we have a one-tailed test, go across the top of the table using the alphas in the Q row.

When you get to .05 you will be above the correct column. Then you'll come down to where you meet the seven degrees of freedom row. There, at the intersection of the row and column, will be the critical value, 1.895. This is the critical value of t that we need for our particular example.
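
The row-and-column lookup can be sketched in Python with a small excerpt of a standard t table. The names `T_TABLE` and `critical_t` are invented for this sketch, and only three rows are included; your printed table has many more. The sketch relies on the fact that a two-tailed alpha (the 2Q row) shares a column with half that alpha in the one-tailed (Q) row.

```python
# Excerpt of a standard t table. Outer keys are degrees of freedom;
# inner keys are one-tailed alpha levels (the Q row of the table).
T_TABLE = {
    6: {0.05: 1.943, 0.025: 2.447, 0.01: 3.143},
    7: {0.05: 1.895, 0.025: 2.365, 0.01: 2.998},
    8: {0.05: 1.860, 0.025: 2.306, 0.01: 2.896},
}


def critical_t(df, alpha, tails=1):
    """Return the absolute critical value for the given df, alpha and tails."""
    # A two-tailed test splits alpha evenly between the two tails.
    q = alpha if tails == 1 else alpha / 2
    return T_TABLE[df][q]


# Psychotherapy example: df = 5 + 4 - 2 = 7, one-tailed, alpha = .05
print(critical_t(7, 0.05, tails=1))
```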

Critical Value Divides the Distribution

Now we know the exact value of our critical value. So we can put critical t = 1.895 right below the line we've drawn to divide the range of t into two regions. We are going to talk in more detail about what this picture means in the section on statistical conclusion validity.

Back To Menu Locator Map

One Tailed Lower Test

Back To Menu Locator Map

We have already discussed a one-tailed upper test using the psychotherapy research study as an example. We are going to change the example in order to discuss one-tailed lower tests.

Imagine that a researcher is doing the same psychotherapy study but instead of using mental health scores as the dependent variable (DV) the researcher is going to use the number of symptoms a client has as the DV. When the scientist used mental health scores as the DV, higher scores meant better mental health. However, if the scientist uses number of symptoms as the DV then LOWER scores are indicative of better mental health. The people who have fewer symptoms are the ones who are healthier. Therefore the scientist now expects M1 (the mean of the treatment group) to be lower than M2 (the mean of the control group).

 

Reject Ho in a
One-Tailed Lower Test

If the Treatment group will have smaller values on the DV than the control group then the result of M1-M2 will be a value less than zero. The null or skeptical hypothesis expects the value of t to be at or near zero. Logically if we obtain a t value that is much smaller than zero (in this particular case) then we have found strong support for the scientific hypothesis and can be fairly sure (usually 95% sure) that we can correctly reject the null hypothesis.

Now we are going to go through the same process we discussed for a one-tailed upper test. First we need to determine the critical value of t. To do this we need to know the degrees of freedom associated with the test. We also have to determine whether it is a one- or two- tailed test and what the alpha level is. These decisions are made in the same way that was presented for the one-tailed upper test. If you are unclear about how to do this, please review that example above.

Just to review, here is how to find the critical value on the table. There are only a few simple points to using the table.

First, the critical values are the numbers in the body of the table. You just have to go down some number of rows and across so many columns and read the number you find there.

The degrees of freedom run down the side of the table. Our degrees of freedom (five plus four minus two) equal seven; so we go down to the row with "7" in front of it. Our critical value will be somewhere on this row. Next we go across the top of the table based on alpha. But there's one complication. Alpha is different if you have a one- versus a two-tailed test. So the table has a distinction between one- and two-tailed tests. A common symbol for this is Q (one-tailed) versus 2Q (two-tailed). Since we have a one-tailed test, go across the top of the table using the alphas in the Q row. When you get to .05 you will be above the right column.

Then you'll come down to where you meet the seven degrees of freedom row. There, at the intersection of the row and column, will be the critical value, 1.895.

This is the critical value of t that we need for our particular example. However in this example you need to remember that the value in the table is an absolute value. For a one-tailed upper test you should consider the table's value to be equal to + 1.895 but for a one-tailed lower test, you should consider the value to be - 1.895 (a negative value less than zero).
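
Here is one way to sketch that sign rule in Python. The function name `signed_critical_values` and the direction labels ("upper", "lower", "two-tailed") are choices made for this illustration only.

```python
def signed_critical_values(table_value, direction):
    """Turn the table's absolute value into the signed cutoff(s) for the test."""
    if direction == "upper":
        return (+table_value,)
    if direction == "lower":
        return (-table_value,)
    if direction == "two-tailed":
        return (-table_value, +table_value)
    raise ValueError("direction must be 'upper', 'lower' or 'two-tailed'")


# Number-of-symptoms example: one-tailed lower test, df = 7, alpha = .05
print(signed_critical_values(1.895, "lower"))
```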

Back To Menu Locator Map


Two-Tailed Test

To discuss a two-tailed test we have to change our example again. Now our group of psychotherapy researchers is testing a new, controversial, and powerful therapy technique, but they are unsure if this treatment will be beneficial or detrimental to the client. The research team does not know which of the two means will be greater; they think it could go either way. So the alternative or scientific hypothesis is just that M1 (the mean of the treatment group) could be either higher or lower than M2 (the mean of the control group). The scientists are not predicting the direction anymore; what they are predicting is that the two means will be very different from one another.

Since the test is no longer directional, the researchers are not predicting that the t value will be higher than 0 or less than 0. All they are predicting is the t value will be different from zero to such a magnitude that the researchers are fairly confident that they can reject Ho.

Non-Directional

and Two-Tailed

A two-tailed test is, by definition, a non-directional test. In other words we are going to reject Ho if the calculated t is far enough away from zero. It could be far enough away and below zero (and hence a negative value) or it could be far enough away and above zero (and hence a positive value).

Critical Values for Two-Tailed Tests

Again we are going to use the same basic process to determine whether the obtained or calculated t value is of a sufficient magnitude that the researcher can reject the null hypothesis. First, one determines the degrees of freedom associated with the test. Second, one determines whether it is a one- or two- tailed test. In this particular example we have determined that a two-tailed test is most appropriate to the research question. Finally one selects the alpha level for the test.

Just to review, for this example the researcher has selected a two-tailed and nondirectional test.

Finding the Critical Values for Two-Tailed Tests in the Table

Just as before, the critical values are the numbers in the body of the table. You just have to go down some number of rows and across so many columns and read the number you find there. Use the schematic on the screen to follow along with the instructions below.

Just as before, the degrees of freedom run down the side of the table. Our degrees of freedom (five plus four minus two) equal seven; so we go down to the row with "7" in front of it. Our critical value will be somewhere on this row. Next we go across the top of the table based on alpha. But there's one complication. Alpha is different if you have a one- versus a two-tailed test. So the table has a distinction between one- and two-tailed tests.

Since we now have a two-tailed test, go across the top of the table using the alphas in the two-tailed row. When you get to .05 you will be above the correct column. It is essential that you make sure you are in the 2Q row since the Q row will give you the value associated with a one-tailed test and the critical value will be lower.

Then you'll come down to where you meet the seven degrees of freedom row. There, at the intersection of the row and column, will be the critical value, 2.365. This is the critical value of t that we need for this particular example. In this example you need to remember that the value in the table is an absolute value. For a two-tailed test you should consider the table's value to be equal to +2.365 OR -2.365. If the researcher gets a calculated t value of less than -2.365 or greater than +2.365, she can reject Ho with some assurance that she is correctly doing so.

 

Divide the Range of T

Here is a picture that shows how the rejection regions fall on either side of zero. If the researcher gets a calculated t value of less than -2.365 or greater than +2.365, she can reject Ho.
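
The decision rule for all three kinds of test can be sketched in a few lines of Python. The function name `reject_ho`, the direction labels, and the calculated t values fed to it below are hypothetical; they are there only to show which side of the critical value each one falls on.

```python
def reject_ho(t_calc, table_value, direction):
    """Return True if the calculated t falls in the 'reject Ho' region.

    table_value is the absolute critical value read from the t table.
    """
    if direction == "upper":
        return t_calc > table_value
    if direction == "lower":
        return t_calc < -table_value
    if direction == "two-tailed":
        return abs(t_calc) > table_value
    raise ValueError("direction must be 'upper', 'lower' or 'two-tailed'")


# Two-tailed test, df = 7, alpha = .05, critical value 2.365
print(reject_ho(-2.80, 2.365, "two-tailed"))  # far enough below zero
print(reject_ho(1.20, 2.365, "two-tailed"))   # not far enough from zero
```

Notice that the two-tailed rule only looks at how far t is from zero, not at its sign.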

Back To Menu Locator Map


 


Now let's use all the pieces we've developed in the above lectures to evaluate the PCH of chance. That's what Statistical Conclusion Validity refers to. If we can validly argue that chance alone did not generate our research results, then we have good statistical conclusion validity. If we can validly argue that chance alone generated our research results, then we have poor statistical conclusion validity.

Another way to say this is that based on our statistics we make a conclusion. That conclusion validly argues against chance or it does not validly argue against chance.

Let's see how all this works.


Let's begin by reviewing the 4-step process from population to sampling distribution. First, we assume that Ho is true and that the DV scores in the Therapy and No Therapy groups are just two samples from a single normal distribution. Second, we do the research; that is, we collect the two samples. Third, we calculate our test statistic (t for independent means) on the data from our samples. Finally, we build our Reject Ho logic on the sampling distribution of t.

Now let's look at that logic more closely.

 

As we've argued before, if Ho is true, then the expected value of t should equal 0. So the sampling distribution of t should have 0 as its center.

We've also argued that the psychotherapy example is a one-tailed test in the upper tail of the t distribution. And we've found the critical value of t to be 1.895.

So we put the critical value (red line on the graphic) above zero in the upper tail.

Remember, we chose alpha to be .05. That means the critical value divides the probability under the t curve in such a way that (if Ho is true) the probability of falling in the "Do Not Reject Ho" region is 19 out of 20 or .95.

By the same logic, the probability of falling in the "Reject Ho Region" is 1 in 20 or .05.

Remember what this all means. If Ho is true (that is, the means differ from each other by chance alone) then the probability of falling in the rejection region and incorrectly rejecting Ho is only .05.

That is, if Ho is true and chance alone is acting in the data, the probability of getting a calculated t value above 1.895 is .05.
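
You can see this .05 for yourself with a small simulation. The Python sketch below draws both groups from one normal population many times and counts how often t lands above 1.895. The population mean of 50 and standard deviation of 10, the number of trials, the seed, and the function names are all arbitrary choices for this illustration, and the t function assumes the pooled-variance form of the formula.

```python
import random
from math import sqrt


def independent_t(group1, group2):
    """t for independent means with Ho expecting a difference of zero."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)  # equals n1 times the divide-by-n variance
    ss2 = sum((x - m2) ** 2 for x in group2)
    pooled = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))


def simulate_upper_tail(critical=1.895, n1=5, n2=4, trials=20000, seed=1):
    """Proportion of t values above the critical value when Ho is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Ho is true: both samples come from the same normal population.
        group1 = [rng.gauss(50, 10) for _ in range(n1)]
        group2 = [rng.gauss(50, 10) for _ in range(n2)]
        if independent_t(group1, group2) > critical:
            hits += 1
    return hits / trials


print(simulate_upper_tail())
```

The proportion printed should come out close to .05, which is what the critical value of 1.895 was chosen to deliver when chance alone is acting.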

 

Let's see where the calculated value falls. When we calculated the actual t value, we found it to be +3.52.

As the graphic shows, a calculated value of 3.52 falls in the rejection region because it is greater than 1.895.

So we reject Ho.

 

Let's go back and put that into our 4-step schema and review the big picture.

We assume Ho is true and our data are just two samples from the same normal population. Therefore the data, including the two means, differ only by chance.

[Here, you may want to recall your experience with the Double Sample Tool and Detect Difference Game, or even go back and play with those again because your experience with them is important for understanding this whole argument.]

Next we calculate the t for independent means. We find it to be 3.52. Then calculate the degrees of freedom and choose alpha = .05. Then find a critical value that divides the values of t into two regions. When we look at the calculated value of t, we decide that it is larger than the critical value of t. So the calculated value of t falls in the "Reject Ho" region.

Back To Menu Locator Map


 


REVIEW: In the last section we found that if Ho is true, by chance alone there is a .95 probability that a calculated t will fall into the "Do not reject Ho" region.

Conversely, if Ho is true, then by chance alone there is only a .05 probability that a calculated t will fall in the "Reject Ho" region.

 

Our calculated t fell in the "Reject Ho" region so we rejected Ho.

Remember that Ho assumed that there was only one normal population and that the two samples (and their means) would differ only by chance.

So we are rejecting the idea that the two samples are from only one distribution.

 

On the other hand, H1 was based on the idea that the two samples (and their means) came from two different normal populations. On the graphic the Psychotherapy data were sampled from the red population where mental health scores are higher. In contrast, the No Psychotherapy data were sampled from the green population which has lower mental health scores.

So we expect that the Psychotherapy mean (M1 from the red population) will be higher than the No Psychotherapy mean (M2 from the green population).

Therefore, under H1, we expect t to be greater than zero. Our calculated value of t = 3.52, which is much higher than zero, is what we expected under H1. So H1 remains plausible after we have collected the data and calculated the t-test.

In the realm of statistics, Ho has been shown to be improbable. H1 remains plausible.

The statistical conclusion that we can make, therefore, in the realm of science is that Chance, which once was thought to be plausible, is no longer plausible.

In contrast the way the two means came out (Psychotherapy higher than No Psychotherapy) is consistent with the scientific hypothesis.

 

So Chance is no longer a plausible hypothesis while the scientific hypothesis still is plausible.

 

 

 

 

 

What we have done is eliminate one important hypothesis (Chance) competing with the scientific hypothesis.

But there may be other competing hypotheses. For example, critics and skeptics may point out that our control group (No Psychotherapy) is a poor one. The volunteers in the No Psychotherapy group received no attention at all. They were told that they were on a waiting list and would be called back. The Psychotherapy group received not only the psychotherapy but also a lot of human attention. The data pattern (Mean 1 greater than Mean 2) is consistent with the plausible competing hypothesis that people who receive human attention get better. It's not the psychotherapy, it's the attention.

The researchers need an "Attention Placebo" control group. These kinds of issues are a matter of common sense, critical thinking and a good understanding of the research literature in the journals. They are usually addressed at great length in a research methods class.

Statistical conclusions only evaluate Chance as a competing hypothesis.

It is very important to realize there are many other PCHs which need to be addressed by other means.

 

 


Back To Menu Locator Map