t-Independent Lecture
Copyright 2000, Tom Malloy.
This
is the text of the in-class lecture which accompanied the Authorware
visual graphics on this topic. You may print this text out and
use it as a textbook. Or you may read it online. In either case
it is coordinated with the online Authorware graphics.
PRINT:
You may print this web page on your local printer if you wish.
Then you can read the hard copy of the lecture text as you look
at the Authorware graphics.
Background
Inferential Statistics. In an earlier overview on statistics,
we described the two basic types of statistics which you will encounter
in this course, descriptive and inferential statistics. All the
different statistics we have discussed so far--mean, median, variance,
standard deviation, correlation, and regression--are types of descriptive
statistics. They are used to describe samples and populations.
In the lectures on sampling distributions, estimating parameters,
and hypothesis testing (statistical conclusion validity), we laid
the theoretical foundations for comprehending how inferential statistics work.
We are now about to start on one of the most exciting topics in
this course--applied inferential statistics. These are the statistics
that are used to test hypotheses or make inferences about hypotheses
and populations. These will include t-tests, chi-square tests, and
F-tests. But we are beginning with t-tests.
The t-test was one of the first applied inferential statistics developed.
It is important because of the power it has given to researchers
to evaluate the results of experimental research. It is probably
the most used statistic in social and medical research because of
its wide applicability. Learning how to use it gives you the same
power to conduct research and evaluate statistical hypotheses as
researchers have.
Historical Note. The t-test was originally developed by
a statistician named W.S. Gosset who published his research under
the pseudonym of "Student" very early in the 20th century.
"Student" was the most important predecessor of Fisher
who was described in the Difference to Inference lecture. The theory
behind the t-distribution was a breakthrough. Up to that point,
use of a probability distribution required that you knew values
of the population parameters such as mu and sigma. The t-distribution
permits us to use the sample estimates in calculating probabilities.
The story, as I heard it, was this. A brewery (Guinness, in Dublin) around
the turn of the century was confronted with a research puzzle. They
thought (hypothesized) that their new brewing process would produce
beer that was better. They had measurements of beer quality, but
the data were ambiguous and not clear cut. Did the data really show
that the new brewing process was better than the old way of doing
things? If so, they were willing to invest in the new brewing process.
But they did not want to spend all that money if the new process
didn't really produce better beer. So they hired a young mathematician,
W. S. Gosset, to address that puzzle. Gosset invented the now famous
"student's t" in this context. He, of course, wanted to
publish his new discovery. But his employers, realizing the power
of this new idea, didn't want other breweries to have this same
advantage. So Gosset published his theory in mathematical journals
as "Student." His employers felt that esoteric math journals
were not the sort of reading that breweries did on any regular basis,
nor would they understand what to do with the theory even if someone
brought it to their attention. In this and the succeeding lectures,
we will try to give you the basis for understanding a tool so powerful
that at least one business sought to hide it from its competition.
Some people take that story to mean inferential statistics is the
work of the devil. Others take it as a demonstration of how alcohol
leads to the refinement of civilization. Either way, "Student's
t" has been imposed on students for many generations.
There are many different applications of the t-tests. t-tests are
used for comparing independent means, for comparing correlated means,
for testing hypotheses about single means, for correlations, for
regression statistics, and many more. This lecture will address
the t for two independent means.
Degrees of Freedom. One new concept you will encounter with
the t-test (and all other inferential statistics) is degrees
of freedom. Generally, degrees of freedom will be the number
of observations in the sample minus the number of means. For a one
group study, where there is only one mean then df = n-1. For a two
group study, where there are two means (a mean for each group),
the df = n1 + n2 -2, where n1 is the number of scores in group 1
and n2 is the number of scores in group 2. But we are getting ahead,
so don't worry about that for now.
Formulas for calculating the degrees of freedom give you a value
which determines the shape of the t-distribution. We will talk more
about this later in this t lecture and describe how degrees of freedom
are used in the t-test. A later lecture will develop the theory
behind degrees of freedom.
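The rule of thumb above (number of observations minus the number of means) can be sketched in a few lines of Python. The function name and the group sizes of 12 and 15 are made up for illustration; they do not come from the lecture.

```python
def degrees_of_freedom(*group_sizes):
    """Total number of observations minus the number of group means."""
    if not group_sizes or any(n < 1 for n in group_sizes):
        raise ValueError("each group needs at least one observation")
    return sum(group_sizes) - len(group_sizes)


# One-group study: df = n - 1
one_group_df = degrees_of_freedom(12)

# Two-group study: df = n1 + n2 - 2
two_group_df = degrees_of_freedom(12, 15)

print(one_group_df, two_group_df)
```

Each group contributes one mean, so each group costs one degree of freedom.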
Let's begin with a simple, applied example: Elite Skiers.

Elite
Skier Example
Suppose a sports psychologist
wants to study the effects of using an imagery technique on the
performance of elite skiers.
DEPENDENT VARIABLE. She decides the dependent
variable will be the elapsed time down some famous course. (Elapsed
time is how long it takes skiers to ski down under racing conditions.)
The psychologist assumes that elapsed time data can be modeled by
a normal distribution.
INDEPENDENT VARIABLE. The independent variable
for this study is using (or not using) an imagery technique (IT).
The imagery technique is based on imagining in vivid detail that
one is completing a perfect run down the race course.
TWO GROUP STUDY. She plans to do a two group
study. She will randomly divide a pool of world-class, elite skiers
into two groups. The treatment group will receive her special imagery
technique (IT). The control group will receive no special treatment.
Control
Population
Some of the study's participants will get imagery
training and some will not get any special training. The group that
does not get any imagery training is called the control group. It
is assumed to be sampled from the control population.
The control population is the population of world-class
elite skiers. For this example, let's assume that the mean (mu)
elapsed time down the course for elite skiers is 122.76 seconds.
The control population is expected to have a normal distribution,
shown here as a green normal curve.
Imagery
Technique Population
The researcher expects that the imagery technique
will improve the performance of elite skiers. Her expectation is
that those elite skiers in the group which receives the imagery
technique (IT) training will have lower elapsed times down the course
than those without the IT training. In other words, she thinks that
there are actually two populations--the Imagery (Treatment) Population
and the untreated Control Population.
The IT population is expected to have a normal
distribution, shown here as a red normal curve. For the moment,
let's assume that the imagery technique (IT) works. Let's suppose
that the mu of the treatment population is 120.47 seconds.
HOMOGENEITY OF VARIANCE. As we pointed out in
the Detect Difference and Double Sample lecture, statisticians commonly
assume that the two populations (control population and treatment
population) have the same variability. That is, the sigma's for
the two populations are equal.
ONLY MU's CAN DIFFER. In general the Normal Distribution
has only two parameters (mu and sigma). Therefore, because we have
assumed the sigma's are equal, the only potential distinction between
the treatment and control populations comes from their mu's.
Two
Populations
In summary, the researcher hypothesizes
that the IT treatment is effective in improving performance. In
probability theory, this scientific hypothesis is modeled in terms
of there being two populations--the IT Treatment Population and
the untreated Control Population.
The Treatment and Control populations are
assumed to have the same sigma.
In the graphic, the red distribution is
the model of the elapsed times of the IT Treatment population and
the green distribution is the model of the elapsed times of the
Control or regular population that did not receive imagery training.
The participants in the two groups are similar;
both the Control Group and the Imagery Group are elite, world class
skiers; they have comparable race times. The researcher will compare
the performance of skiers sampled from the Control Population to
the performance of those sampled from the Imagery Population in order to
determine whether the imagery technique affects performance.
Decision
Making
Even though researchers make hypotheses
about populations, they hardly ever are able to test an entire population.
Instead they collect samples from relevant populations. Then the
researcher examines the sample data and makes an inference about
the populations. This scientific thinking strategy is very similar
to your experience with the Detect Difference Game. So think about
your experience with that game as we continue on in this discussion.
For instance, this researcher wants to know
if the imagery technique (IT) will help elite skiers' performance. She can't
test all elite skiers in the world so she takes a random sample
of elite skiers and gives half of them the IT training and the other
half no special training. Then she looks at the performance of these
two groups in order to make an inference about the effectiveness
of the imagery technique.
Another way of thinking about the researcher's
question is "Do the sample groups come from two different populations
or from one population?" If the IT treatment has an effect
on skiers' elapsed time on the course then the two groups come from
different populations. If the treatment has no effect then the two
groups come from the same population.
Ho versus H1. We will discuss this in greater
detail further along in this lecture, but for now we can point out
that the null hypothesis is equivalent to saying that there is only
one population and the alternative hypothesis is equivalent to saying
that there are two populations.
Theoretical
Treatment Effect
Here are the two populations again. For
this example we have made up that the IT population has a mean (mu)
elapsed time of 120.47 seconds and the control group or regular
population has a mean (mu) elapsed time of 122.76 seconds.
The size of the theoretical treatment effect
is easily calculated. It is the difference between the mu of the
treatment population and the mu of the control population. In our
example the treatment effect size is 120.47 - 122.76 = -2.29 seconds.
We have worked with this idea of treatment
effect in the Double Sample tool and the Detect Difference lecture
so that you should have lots of experience with it.
Larger treatment effects are easier to detect
than smaller ones so if the mu's are far apart it is easier to determine
that there are two populations rather than just one. When the distributions
get closer and there is more overlap between the curves, then it
is more likely that you will need a statistical test like a t-test
to determine whether there are two populations or only one. The
levels of difficulty on the Detect Difference Game are based on
treatment effect size.
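As a small sketch of the subtraction described in this section, the theoretical treatment effect can be computed from the two population mu's quoted above. The constant and function names are hypothetical, chosen only for this illustration.

```python
MU_TREATMENT = 120.47  # mu of the imagery (IT) population, in seconds
MU_CONTROL = 122.76    # mu of the control population, in seconds


def treatment_effect(mu_treatment, mu_control):
    """Theoretical treatment effect: treatment mu minus control mu."""
    return mu_treatment - mu_control


effect = treatment_effect(MU_TREATMENT, MU_CONTROL)

# A negative effect means shorter elapsed times, i.e. faster skiing.
print(f"theoretical treatment effect: {effect:.2f} seconds")
```

The sign matters here: because the DV is elapsed time, an effective treatment shows up as a negative difference.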
Science
vs. Skepticism
The scientist proposes that the treatment
is effective and so there are two separate populations - the treatment
and the control populations. In other words the IV has an effect
upon the DV and therefore there really are two populations -- those
who have been trained and can get down the hill faster, and those
who haven't been trained who get down the hill slightly more slowly.
The skeptic, of course, does not believe
in this imagery stuff. So to the skeptic, it doesn't really matter
if someone receives training or not, it's all the same. In other
words, the skeptic believes that the treatment is not effective
and therefore both groups are coming from the same population. That
is, there is only one population.
Homogeneity
of Variance Assumption
Just to repeat: Statisticians assume that the
two populations have the same sigma. That's a standard assumption
for t tests and all of these kinds of statistical tests. The variances
are assumed to be equal, and so the only possible difference between
the two populations is their mu. The real question is, given that
the sigma's are the same, "Is there a difference between the
mu's?"
Statistical
Hypotheses
NOTATION: We need general symbols that go
beyond this example. So, for this example, let's call the treatment
population Population 1. Population 1 has as its center, Mu 1. Let's
also call the control population Population 2, with Mu 2 as its
center.
NULL HYPOTHESIS: Ho, the null hypothesis,
comes from the skeptical hypothesis. One general form of Ho states
that "Mu 1 minus Mu 2 is equal to zero." If the two population
mu's are equal then their difference is zero. [Another way to say
this is that if the two mu's are the same, then there's just one
population.]
ALTERNATIVE HYPOTHESIS. On the other hand,
the alternative hypothesis, H1, states that "Mu 1 minus Mu
2 is less than zero." Remember we are measuring elapsed times
so smaller values on the DV indicate better skiing performance.
In other words, the treatment population will have lower scores
than the control population. Therefore, there are probably two distributions.
This scientific hypothesis is directional.
The statement "Mu 1 minus Mu 2 is less
than zero" implies there are two populations. The logic is
that if the two mu's are different, then there have to be two populations.
ANSWER TO STUDENT QUESTION: Yes, H1
always corresponds to the scientific hypothesis, and Ho always corresponds
to the skeptical hypothesis. Ho particularly corresponds to the
assertion that the data pattern is due to chance. So, Ho goes with
skepticism and the PCH of chance.
How
do we decide if our two groups come from different populations or
from the same population? We decide in the same way you decided
when you played the Detect Difference Game.
REVIEW: The dependent
variable is elapsed time; the independent variable is whether or
not the skiers receive the imagery treatment. There are two groups
of skiers; one group was trained and the other was not. The two
groups both race down the same course. The researcher finds that,
on the average, the elapsed time for the imagery group is less than
the control group. The scientist predicted this result.
The scientist wonders if she is sampling from
two populations (as H1 suggests) or from only one population (as
Ho indicates).
By now you have enough experience with our probability
tools to know that if Ho is correct and if you take two samples
from the same population, you're going to get chance differences
between them. The whole PCH of chance idea is that these data, the
imagery group having lower elapsed times, could be due to chance
alone. Another way of saying this is that the skeptic thinks these
data could result from two samples from the same population.
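The skeptic's point can be sketched with a short simulation: draw two samples from one and the same normal population, many times over, and look at how much the two sample means differ by chance alone. This is only an illustration in the spirit of the Detect Difference Game. The sigma of 2.0 seconds, the group size of 10, and the 1000 repetitions are assumptions made for the sketch; the lecture does not give these values.

```python
import random
import statistics


def chance_differences(mu, sigma, n_per_group, n_studies, seed=0):
    """Differences between two sample means when Ho is true.

    Both groups are drawn from the SAME normal population, so every
    difference returned here is due to chance alone.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_studies):
        group_1 = [rng.gauss(mu, sigma) for _ in range(n_per_group)]
        group_2 = [rng.gauss(mu, sigma) for _ in range(n_per_group)]
        diffs.append(statistics.mean(group_1) - statistics.mean(group_2))
    return diffs


# mu is the control population mean from the skier example;
# sigma, n_per_group and n_studies are assumed values.
diffs = chance_differences(mu=122.76, sigma=2.0, n_per_group=10, n_studies=1000)

print("average difference over all studies:", round(statistics.mean(diffs), 3))
print("largest chance difference:", round(max(abs(d) for d in diffs), 3))
```

Averaged over many studies the difference is close to zero, yet individual studies can show differences of a second or more even though there is only one population.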
Deciding:
1 or 2?
Just like in the Detect Difference Game, the problem is that the
two populations are not available to the researcher for examination.
In reality, scientists don't know the populations, all they know
is the data. We've got to decide from the data whether or not there
is one population like Ho says or two populations like H1 says.
This is the whole point of all the experience
you had playing the Detect Difference Game. The idea is to put you
in the formal situation of the scientist, and in the Detect Difference
game, you don't see the populations. You have to use the data to
guess whether there are one or two populations. After you make your
guess the Detect Difference game gives you feedback by showing you
the populations.
Scientists never get that kind of feedback
because these populations are only models. They don't really exist.
It may be worth just going back and playing the Detect Difference
Game again to help solidify these ideas.
SUMMARY: We have two samples of data. Did
the data come from two populations, a red and a green one, or are
they just two samples from the single green population that differ
by chance? We can't see the two populations, all we see is the data.
That is the conceptual background for the
t-test. Let's review another research
study example.

Psychotherapy Example
In our next example, we're moving from elite
skiers where we want to reduce the value of the dependent variable,
to mental health, where we want to increase the value of the dependent
variable. People want more mental health and skiers want to take
less time down the hill. So it all depends on the context of the
study which direction we would want our results to go.
Design
of Psychotherapy Study. Let's assume that a group of scientists
is interested in determining whether a new psychotherapy they have
developed is effective. As a first step in this process, they do
the following experiment. They randomly divide people who have consented
to participate in the study into two groups: A group which receives
the new psychotherapy and a group which is told that no psychotherapist
is currently available. This group is told it is on a waiting list
to receive psychotherapy later.
After the study, both groups are asked to come into the counseling
center where they are measured on the dependent variable (DV). The
DV is some set of measurement operations which give us a number
for each participant. This may be a paper and pencil test of depression
or a structured interview or behavioral observation. But whatever
it is it must generate a number for each participant.
NOTE: In the Interface to Science
lecture, we discussed how some measurement operations are elegant
and valid while others are very poor and lack construct validity.
These validity issues are crucial because all that statistical procedures
can do is work with numbers. If these numbers are invalid or are
conceptually shaky, then the statistical conclusions will be invalid
or shaky. Creating valid measurement operations is an important
scientific issue that provides a foundation for statistical procedures.
Let's assume (and it's a big assumption) that we have a valid measure
of mental health. The scientists manipulate the IV by giving the
psychotherapy group the new psychotherapy while withholding the
new psychotherapy from the waiting list control group. At the end
of the study both groups are measured for mental health (DV).
Of course, at this point, the waiting list control group is given
the new psychotherapy for humane and ethical reasons, but this is
not part of the study.
The
Hypotheses
The scientific hypothesis expects (1) that the
IV, psychotherapy, will have an effect and (2) that this effect
will be beneficial to mental health (DV).
This experimental design implies that
the psychotherapy group (experimental group) will have higher
mental health scores than does the control group. The screen shows
data that fit the scientific hypothesis. The experimental group
shows higher mental health than the control group. The data pattern
fits the scientific hypothesis. It is crucial to look at the pattern
of results before rushing off into inferential statistics such
as a t-test. There would be no point in further analysis if the
therapy group had lower mental health scores than the no therapy
group.

Ask yourself, "How does the data pattern relate to the predictions
of the scientific hypothesis?" Your answer to this question
combined with a bit of common sense will guide you on decisions
about what sort of statistics you want to do. The scientists are,
no doubt, happy to find a fit between their predictions and the
data pattern.
The
Skeptic replies...
The skeptic will not be deterred by the fact that
the data pattern seems to fit the scientific hypothesis. As a first
critique, the skeptic will claim that the pattern of results happened
by chance alone. After all, one of the two groups had to have higher
scores, so the scientists were just lucky that the group they predicted
would be higher came out above the control.
The
Skeptic's Logic
For example, suppose we divide all the
students in any classroom into two groups randomly or, at least,
irregularly. Suppose we put all those sitting in the odd numbered
rows into one group and students sitting in the even numbered rows
in another group. Then suppose we measure them on anything (GPA,
height, the number of calories they ate yesterday).
We would expect that the means of the two
groups by chance alone would differ. The means would not be equal
right down to the fifth decimal point. The mean of one group would
be higher than the mean of the other group.
For example, if we measure caloric intake
no one would claim (without great mental gyrations) that, for example,
people who sit in odd numbered rows eat more calories than people
who sit in even numbered rows just because the mean calories eaten
yesterday by the odd numbered group came out higher than the mean of
the even numbered group. The point is, in a very general way, it
is plausible to argue that the results of any research study occurred
by chance alone. This is the Plausible Competing Hypothesis (PCH)
of chance. It is the first issue you must address in a discussion
with a skeptic. Formally, evaluating the PCH of chance is called
"Statistical Conclusion Validity." Inferential statistics
such as t-tests evaluate the statistical conclusion validity of
research.
A t-test for independent means will evaluate the claim that the
group means differ only by chance. It will do nothing else. It will
not help with the criticism that psychotherapy ought to have been
evaluated against an "attention placebo" group. Issues
around the appropriateness of the control groups used are included
in "Internal Validity" and are not formally a part of
statistics. Internal Validity is prior to and more important than
statistical procedures.
Science to Statistics
Interface Section
Skeptical
Hypothesis
The skeptic hypothesizes that the IV (psychotherapy
treatment) is ineffective, meaning it neither improves nor worsens
mental health. Because the IV is ineffective there will be only
chance differences between the two groups in the study - the control
group and the treatment group. The skeptic assumes
that the differences between the groups are only chance differences, which is
what we would see if the two groups were from the same population with one mu.
The skeptic would look at the graph of the results
and say, "Yes there are differences, of course, but they are
only chance differences."
Scientific
Hypothesis
The scientist hypothesizes that the IV (psychotherapy
treatment) is effective, meaning it improves mental health. Because
the IV is effective there will be a significant difference between
the two groups in the study - the control group and the treatment
group. The scientist assumes that the difference between the groups
is strong evidence that the two groups come from different populations
with different mu's, as opposed to two samples
that come from the same population and differ only by chance.
The scientist would look at the graph of
the results and say, "Yes there are differences, which are
due to the effect of the treatment."
From
Scientific and Skeptical Hypotheses to Statistical Hypotheses
Next we need to translate our scientific hypotheses
into statistical hypotheses. The skeptic's plausible competing hypothesis
that any differences in our measurements of mental health (DV) between
the two groups occurred only due to chance will be expressed in
statistics as the null hypothesis (Ho). The scientific hypothesis
that differences in mental health measurements between the two groups
are caused by psychotherapy will be expressed in statistics as the
alternative hypothesis (H1).
Ways of Expressing
the Skeptical Hypothesis
In terms of this example, the skeptic expects
that the difference between the two population mu's will be zero.
Another way of expressing this is to state that it is EXPECTED that
the difference between the two sample means will be zero.
You'll see both these forms in different
books and I just want you to realize that those forms are exactly
the same, logically speaking. That's what the skeptical hypothesis
says in essence. Remember that the skeptical hypothesis goes with
Ho. So Ho will be stated in a parallel way.
Ho
in Symbols
TWO COMMON FORMS: As you can see from the
graphic, Ho can be expressed in terms of the population means (top
formula) or in terms of sample means (bottom formula).
Note: In the
bottom formula, the large E is technically the "Expectation
Operator" in probability theory. But for the intuitive
level at which we are approaching the material in this class, you can assume
E means pretty much what the word "expects" means in ordinary
English. Ho says that the skeptic "expects" (E)
[Mean 1 minus Mean 2] to equal zero. Of course, the actual sample
means vary a great deal by chance alone, but the skeptic EXPECTS the
difference between the means to be zero.
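Since the graphic is not reproduced in this text, the two forms of Ho it describes can be written out as follows. This is a transcription of the verbal description above (population form first, expectation form second), not a copy of the original graphic.

```latex
H_0:\ \mu_1 - \mu_2 = 0
\qquad\text{or, in terms of sample means,}\qquad
H_0:\ E(M_1 - M_2) = 0
```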
Ways
of expressing the Scientific Hypothesis
The scientific hypothesis says that there are
treatment effects of psychotherapy. Consequently the treatment population
mu is higher than the control population mu.
This can also be expressed in terms of
the data means. The scientist expects Mean 1 (the mean of the Experimental
Group which was given psychotherapy) to be greater than Mean 2 (the
mean of the control group which was not given psychotherapy). Is
the scientific hypothesis directional or non-directional? Answer
this question for yourself before going on.
Symbols
for H1
In
symbols H1 can be expressed in terms of treatment populations as
in the top formula.
Or H1 can be expressed using the "Expectation
Operator." According to H1 we EXPECT the difference between
the data means to be greater than zero. This is because M1 comes
from the psychotherapy group which should have higher mental health
than does the control group.
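Written out from the description above, with group 1 taken to be the psychotherapy group and group 2 the control group, the two forms of this H1 would look roughly like this. Again, this is a transcription of the text, not the original graphic.

```latex
H_1:\ \mu_1 - \mu_2 > 0
\qquad\text{or, in terms of sample means,}\qquad
H_1:\ E(M_1 - M_2) > 0
```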
One-Tailed
Alternative Hypothesis
Is the scientific hypothesis
directional or non-directional?
The scientific hypothesis
is directional because the scientist is expecting the mean of the
psychotherapy group to show higher mental health scores than the
mean of the control group. That is, we are predicting a direction.
We're NOT just saying that
the two means will be different somehow. We are NOT saying that
it doesn't matter which way the means differ. We ARE specifying
that the psychotherapy group should do better than the control group.
That is why the scientific hypothesis is directional.
As we've said, a directional scientific hypothesis
translates into a one-tailed alternative hypothesis. H1, the alternative
hypothesis, says that we expect Mean 1 to be greater than Mean 2.
We call this a one-tailed alternative. That is, we will reject Ho
in one tail of the distribution.
We've been over this a little bit in the lecture
on Hypothesis Testing. But we haven't developed the idea of one-tailed
versus two-tailed tests fully yet. So we'll come back to this idea
a little further along in the lecture.
STUDENT QUESTION: If the scientific
hypothesis were non-directional, does that mean we would have a
two-tailed test?
ANSWER: In short: Yes. It's a good
question, and we'll circle back to it eventually, but as long as
it's asked right now, let's give an answer. To do this we need to
alter the example a little. What if there's another scientist who
disagrees deeply with your approach to therapy. This scientist may
think that you have some effective therapy components but due to
the way you are putting them together, the scientist can't tell
what will happen. In her opinion, your therapy might either make
people worse or make them better. She doesn't know what it will
do. So her hypothesis is a non-directional scientific hypothesis.
Therapy might have an effect, but she doesn't know in which direction. When you
have a non-directional scientific hypothesis it will lead to a two-tailed
H1. On the current screen there is no two-tailed H1 written down.
We'll get to looking at an example a little later. But a two-tailed
alternative would look like this: H1: E(M1 - M2) not = 0.
That is, we "expect Mean 1 minus Mean 2 to not equal zero."
That's the set up for statistical conclusion validity. Remember,
Statistical Conclusion Validity only evaluates one simple Plausible
Competing Hypothesis: The PCH of Chance. It evaluates the plausible
claim that (even if the data pattern is consistent with the scientific
hypothesis) the data pattern occurred only due to chance. Any results
might have happened, and the scientist was lucky: the data pattern
turned out as expected. Statistical Conclusion Validity does NOT
evaluate deeper scientific issues such as whether or not the design
of the study makes sense or whether or not the scientific hypothesis
has gained support through the data. Statistical Conclusion Validity
simply gives a probability statement regarding the conjecture that
the results only happened by chance.

Let's go ahead now and look at the formula
for the independent means t-test.
Formula
Here are two forms of the formula.
Write them down and then we will explain them. The two forms of
the formula are exactly the same, except for the top line. In the
first formula the top line ends with E(M1 - M2 given Ho). In the
second formula, the top line ends with (mu1 - mu2 given Ho).
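The formula graphic is not reproduced in this text. As a stand-in, here is a standard pooled-variance form of the independent means t that agrees with the definitions given in this lecture: variances computed with n as the divisor, and df = n1 + n2 - 2. Treat it as a reconstruction under those assumptions; the original graphic may have arranged the bottom line differently.

```latex
t = \frac{(M_1 - M_2) - E(M_1 - M_2 \mid H_0)}
         {\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)
                \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
\qquad
t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2 \mid H_0)}
         {\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)
                \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
```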

Now let's make sure that all the parts of the formula are clear
to you. M1 is the mean of whichever experimental condition you're
calling the first group. In our example, M1 is the psychotherapy
mean. M2 is the mean of the other experimental condition. In our
example, it is the control group's mean on posttest scores.
THEORETICAL TERM. The next term, E(M1 - M2 given Ho), is theoretical.
As I mentioned when defining the statistical hypotheses, the E
can be understood as the verb "to expect." And the thick
vertical line is a symbol for "given" or "given
that (whatever follows it) is true." So you can read this
term in ordinary English as "the expected value of the difference
between Mean 1 and Mean 2 given Ho is true." In general,
the expected value of Mean 1 minus Mean 2 given Ho is true is zero.
Another way to think about this term (see second form of the formula)
is that it represents treatment effect size according to Ho. Usually
Ho says that the treatment effect size (mu1 minus mu2) is zero.
So, again, this theoretical term is usually 0.
So in our example we would simply put a "0" into the formula
on the top line where the theoretical term is. It's just a big,
long symbol that almost always equals zero. It so often has a value
of "0" that many textbooks don't even put this term into the
formula because it's just clutter. In this class we will only use
examples in which Ho expects the difference between means to be
zero. (But there are cases where Ho might not expect the difference
between the means to be zero. I'll give an example, below, under
supplementary information.)
Parts
of the Formula
Next we have n1, this is the number of subjects
in group one. Correspondingly, n2 is the number of subjects in group
two. The number of subjects in groups one and two can be different;
you can have a different number of subjects in the two groups. Next
we have the variances of the two groups. The variance of group one
is s1 squared. The variance of group two
is s2 squared. [Remember we are
using the true variance formula, which divides by n. There is
another variance formula, used to estimate population variances,
that has (n - 1) as a divisor. For people who use that
formula, t would need slight modifications.] An idea which
we've not discussed before is the degrees of freedom (df). Right
now, just note what the "df" symbol means in English.
We will work with it more later, but as a start you can note that
in this case df = n1 + n2 - 2.
That finishes the definitions of the various
parts of the formula.
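If it helps to see the parts working together, here is a minimal sketch in Python. It assumes the pooled form of the t formula built from variances that divide by n, which is consistent with the note above but is not spelled out in this text. The function name and the scores are hypothetical; they are not the data shown on the Authorware screens.

```python
from math import sqrt
from statistics import mean, pvariance


def t_independent(group1, group2, expected_diff=0.0):
    """Return (t, df) for two independent groups.

    ASSUMPTION: pooled-variance form of t, using variances that divide
    by n (statistics.pvariance), as the note in the lecture requires.
    expected_diff is the theoretical term E(M1 - M2 given Ho), usually 0.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    s1_sq, s2_sq = pvariance(group1), pvariance(group2)  # divisor is n
    df = n1 + n2 - 2
    pooled = (n1 * s1_sq + n2 * s2_sq) / df
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    t = ((m1 - m2) - expected_diff) / standard_error
    return t, df


# Hypothetical scores for illustration only.
therapy = [8.0, 6.0, 7.0, 9.0, 5.0]   # group one, n1 = 5
control = [4.0, 3.0, 5.0, 4.0]        # group two, n2 = 4
t, df = t_independent(therapy, control)
print(round(t, 3), df)
```

With these made-up scores the means are 7 and 4, the variances are 2.0 and 0.5, and t comes out a little above 3.4 with df = 7. Notice that the two groups do not need the same number of subjects.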
Supplemental Information:
[You will not need this information in this class. So read this
section only if it is of interest to you. It is provided as a response
to a frequently asked student question.] STUDENT
QUESTION: "Does the expected value of the difference
between the means always have to equal zero if Ho is true?"
ANSWER: It
is logically possible that you would have a case and a hypothesis
in which Ho is actually predicting some known difference between
groups. Let's think of an example. Okay, let's say there's a known
difference in reaction time between men and women; women have faster
reaction times than men. Let's say that difference is known to be
10 milliseconds. So you could do some intervention with men, perhaps
you could give a group of men six cups of Café Gourmet's strongest
brew. A comparable group of women would be given six cups of decaf.
Suppose the scientific hypothesis is that caffeine will decrease
reaction time (so the men should be responding as fast or faster
than the women). The skeptic says that caffeine has no effect on
reaction time.
If the skeptic is right, then your group
of men should still be responding 10 milliseconds slower than women.
So Ho would be that we still expect the difference between the means
to be 10 milliseconds. Therefore, in this example for the E( M1
- M2 given Ho) term you would put 10 instead of 0. This is because
10 milliseconds actually represents no change as a result of your
coffee intervention. So, theoretically, this term could have a value
other than zero. But in this class we won't use those kinds of examples.
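A tiny sketch of how a nonzero theoretical term changes the top line of t. The reaction times are hypothetical numbers chosen only to illustrate the arithmetic, and the function name is made up for this example.

```python
def t_numerator(mean1, mean2, expected_diff=0.0):
    """Top line of t: the observed difference minus what Ho expects."""
    return (mean1 - mean2) - expected_diff


# Hypothetical mean reaction times in milliseconds (illustration only).
men_with_caffeine = 252.0   # M1
women_with_decaf = 250.0    # M2

# Ho: the men are still 10 ms slower, so E(M1 - M2 given Ho) = 10.
numerator = t_numerator(men_with_caffeine, women_with_decaf, expected_diff=10.0)
print(numerator)  # -8.0
```

Here the men are only 2 ms slower than the women, which is 8 ms less than the 10 ms the skeptic expects, so the top line is negative even though M1 is larger than M2.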

This section begins
with a series of screens that we've already discussed. They serve
as reminders of what the scientific example is. Go over them quickly
and then move on to the screen shown below.
Calculating
t
This screen presents the data from our hypothetical
experiment. The numbers are a result of our dependent variable measurement
operations and indicate mental health, with higher numbers corresponding
to better mental health.
The formula for the t test for independent means
is presented so we can use it in the context of the example. BEFORE
you go onto the next screen, the most important activity for you
at this point is to use the data to substitute into the t formula, writing
the substitution in your class note outline. The next screen will
show the correct substitution but you will learn the most by being
active and substituting into the formula on your own.
When you have made your substitution go onto the
next screen and carefully check to see if you understand how to
use the formula. A series of screens will follow, showing the steps
of the arithmetic, including the calculated t value and the degrees
of freedom. When you are comfortable with how to use the formula,
go on to the next section.

Critical
Values of t

Now that we have a calculated t,
we have to find a critical t. Using this critical value, we will
divide the range of our test statistic (in this case t) into two
regions: the "reject Ho region" and the "do not reject
Ho region."
One
or Two Tailed? Where to Reject?
All t tests are either one-tailed or two-tailed.
The one-tailed tests have either an upper or a lower rejection region.
This refers to whether you are rejecting in only one tail of the
distribution (the right long tail, or the left long tail) or whether
you can reject in either of the two tails of the distribution (see
the top distribution to the left).
To set up rejection regions, one has to
decide two things. (1) Is it one- or two-tailed? (2) If it is one-tailed,
does it have an upper rejection region or a lower rejection region?
The way to determine this, either in science,
on an exam, or in the homework, is to read the scientific hypothesis.
If the hypothesis is non-directional, then it goes with a two-tailed
t test. This means that the researcher can reject Ho if the effect
is far enough from zero either on the negative side or the positive
side of the distribution.
Usually scientists are making a specific directional hypothesis.
In the case of the psychotherapy example we've developed, we're
expecting the treatment group to do better than the control group.
That is, psychotherapy improves mental health. So this is directional;
it only uses one tail. Obviously we're looking for mean one minus
mean two (M1 - M2) to be a positive value or above zero. Since Ho
is predicting zero, we'd want to reject for t values that are larger
than zero. Ho would be rejected if the calculated t fell in the
upper tail of the distribution.
For the skiing example, the researcher predicted that the treatment
group would have lower scores than the control group. Therefore,
the mean of the treatment minus the mean of the control would be
a negative number. The researcher, therefore, expected t values
below zero. In this case, we would reject Ho for values of t less
than zero (i.e., in the lower tail of the distribution).
Ho is always predicting zero and then H1 is either predicting
above zero or below zero or both directions (either above or
below). By reading the problem, and determining the nature
of the scientific hypothesis, you should be able to decide what
kind of test you need, whether there are two rejection regions or
just one, and if it is one, in which direction.
For a one-tailed test, the alternative hypothesis (H1) statement
will always have either a greater than ( > ) or a less than ( < )
sign. For a two-tailed test, H1 will have a not equal sign (≠).
The null hypothesis is always that the difference between the means
is equal to zero. [NOTE: This last
statement is not technically correct, but for this level class it
is conceptually accurate.]
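The rule for reading H1 can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and "!=" stands in for the not equal sign.

```python
def choose_test(h1_sign):
    """Map the sign in H1 to the kind of t test and where Ho is rejected."""
    regions = {
        ">": ("one-tailed", "upper tail"),
        "<": ("one-tailed", "lower tail"),
        "!=": ("two-tailed", "both tails"),
    }
    if h1_sign not in regions:
        raise ValueError("H1 must use '>', '<', or '!='")
    return regions[h1_sign]


print(choose_test(">"))   # psychotherapy example: treatment mean expected higher
print(choose_test("<"))   # skiing example: treatment mean expected lower
print(choose_test("!="))  # non-directional scientific hypothesis
```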

Expected
Values of t
Let's discuss what the expected values of
t will be if Ho is true or if H1 is true.
Expected
Value if Ho is True
The PCH of chance in science goes with Ho
in statistics. The skeptic expects the difference between the means
to be equal to zero. There are two ways for Ho to express this. One
is that mu one minus mu two is equal to zero. The other is that
we expect mean one minus mean two to be equal to zero.
It is important to realize that if Ho is true one would expect
t, on the average, to be centered at zero.
To understand why Ho predicts that t is zero, keep track of three points:
(1) The structure of t is basically
to divide a square root into the difference between the means. (2)
If Ho is exactly right, the difference between the means would be
zero. (3) When you divide any nonzero number
into zero, you always get zero. If Ho is exactly right in its prediction,
expect t to be zero. Ho is predicting a zero for a t value. That
will be important in our logic later on.
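One way to convince yourself that t centers on zero when Ho is true is to let a computer play skeptic. The sketch below draws both groups from the same normal population many times and averages the resulting t values. Everything specific here is an assumption made for illustration: the population mean of 50, the standard deviation of 10, the group sizes of 5 and 4, the seed, and the pooled form of t with variances that divide by n.

```python
import random
from math import sqrt
from statistics import mean, pvariance


def t_independent(g1, g2):
    """t for independent means (assumed pooled form, variances divide by n)."""
    n1, n2 = len(g1), len(g2)
    pooled = (n1 * pvariance(g1) + n2 * pvariance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / sqrt(pooled * (1 / n1 + 1 / n2))


rng = random.Random(2000)  # fixed seed so the run is repeatable
t_values = []
for _ in range(5000):
    # Ho is true: both groups come from ONE normal population.
    group1 = [rng.gauss(50, 10) for _ in range(5)]
    group2 = [rng.gauss(50, 10) for _ in range(4)]
    t_values.append(t_independent(group1, group2))

mean_t = mean(t_values)
print(round(mean_t, 2))  # close to zero
```

Any single t value will miss zero by chance, sometimes above and sometimes below, but the values pile up around zero.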
Expected
Value if H1 is True
In the psychotherapy example, the scientific
hypothesis is directional.
If H1 is true, we expect mean one to be greater
than mean two. So H1 is expecting some positive number on top of
the t. Therefore when you divide whatever value is in the square
root into the positive number, you're going to get, in this case
a positive number. So H1, in this particular case, is predicting
large values of t, above zero at least. So the expected value of
t, given H1 is true, is greater than zero.
Is H1 one- or
two-tailed?
The answer is that it is one-tailed because we will reject Ho in
only one tail of the t probability distribution. We haven't examined
that logic in detail, but it will develop as we go along. It will
become apparent that there is a flow of logic from the structure
of the scientific logic in the H1 statement through the way the
rejection regions fall in one or two tails and, when the rejection
region is only in one tail, which tail that is.
One Tailed Upper
Test
In the Psychotherapy example, the scientist was expecting that
the group which received the psychotherapy treatment would have
improved mental health as evidenced by higher mental health
scores. Therefore the scientist expected M1 (the mean of the treatment
group) to be higher than M2 (the mean of the control group).
In general, if the Treatment group will have larger values on the
DV than the control group in a study then the result of M1-M2 will
always be a value greater than zero. The null or skeptical hypothesis
expects the value of t to be at or near zero. Logically if we receive
a t value that is much larger than zero then we have found strong
support for the scientific hypothesis and can be fairly sure (usually
95% sure) that we can correctly reject the null hypothesis.
There are three things required to look up a critical
value for t in a statistical table. First, calculate the degrees
of freedom. In the Psychotherapy example there are five subjects
in group one and four subjects in group two. Following the formula,
the degrees of freedom would be five plus four minus two, which
is seven. You'll need those degrees of freedom to look up the critical
value in a t table.
One
or Two Tailed?
The second requirement in using the table of critical
values of t is that you must know whether you have a one- or a two-tailed
alternative hypothesis. In our Psychotherapy example our H1 is one-tailed
because the scientific hypothesis is directional: We expect the
group given psychotherapy to have higher mental health scores.
Alpha
Level
The final requirement is that we must specify
an alpha (α) level. This is a choice
that you have. The tables will list several alpha levels, including
.05, .025, .01 and others. You choose the level. By convention,
in the social sciences you ought not choose an alpha level larger than
.05. Any of the smaller alpha levels, such as .01 or .001, is acceptable.
For this example, let's choose alpha = .05. [Remember that alpha
is the probability of rejecting Ho when Ho is actually true. By choosing
alpha, you are choosing the probability of making that error.] We have
calculated the df = 7, we know we have a one-tailed test, and we
have chosen alpha = .05. Now we are ready to look up the critical
value in the statistical tables.
Reading
the Critical Values Table
Your table will not look very much like
the one on the screen because the screen just shows a schematic.
There are only a few simple points to using the table.
First, the critical values are the numbers
in the body of the table. You just have to go down some number of
rows and across so many columns and read the number you find at
the intersection of the row and column you choose. It's about like
looking up the mileage from Las Vegas to Salt Lake City on a road
map while sitting in a coffee shop. Use the schematic on the screen
to follow along with the instructions below.
The degrees of freedom run down the side of the table. Our degrees
of freedom (five plus four minus two) equal seven; so we go down
to the row with "7" in front of it. Our critical value will
be somewhere on this row. Next we go across the top of the table
based on alpha.
But there's one complication. Alpha is different if you have a
one- versus a two-tailed test. So the table has a distinction between
one- and two-tailed tests. A common symbol for this is Q (one-tailed)
versus 2Q (two-tailed). Since we have a one-tailed test, go across
the top of the table using the alphas in the Q row.
When you get to .05 you will be above the correct column. Then
you'll come down to where you meet the seven degrees of freedom
row. There, at the intersection of the row and column, will be the
critical value, 1.895. This is the critical value of t that we need
for our particular example.
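The three things you need (degrees of freedom, one- or two-tailed, alpha) work like the keys of a lookup. Here is a minimal sketch with a fragment of the 7 df row. The dictionary and function names are hypothetical; the one-tailed entry is the value just read from the table, and the two-tailed entry is the value used in the two-tailed section of this lecture. A real table has many more rows and columns.

```python
# Fragment of a t table. Keys are (df, tails, alpha).
CRITICAL_T = {
    (7, 1, 0.05): 1.895,  # Q row (one-tailed)
    (7, 2, 0.05): 2.365,  # 2Q row (two-tailed)
}


def critical_t(df, tails, alpha):
    """Look up the table's (absolute) critical value; fail loudly if missing."""
    try:
        return CRITICAL_T[(df, tails, alpha)]
    except KeyError:
        raise KeyError(
            f"no table entry for df={df}, tails={tails}, alpha={alpha}"
        ) from None


n1, n2 = 5, 4
df = n1 + n2 - 2
print(df, critical_t(df, tails=1, alpha=0.05))  # 7 1.895
```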
Critical
Value Divides the Distribution
Now we know the exact value of our critical value.
So we can put critical t = 1.895 right below the line we've drawn
to divide the range of t into two regions. We are going to talk
in more detail about what this picture means in the section on statistical
conclusion validity.
One
Tailed Lower Test

We have already discussed a one-tailed upper test using the psychotherapy
research study as an example. We are going to change the example
in order to discuss one-tailed lower tests.
Imagine that a researcher is doing the same psychotherapy study
but instead of using mental health scores as the dependent variable
(DV) the researcher is going to use the number of symptoms a client
has as the DV. When the scientist used mental health scores as
the DV, higher scores meant better mental health. However, if
the scientist uses number of symptoms as the DV then LOWER
scores are indicative of better mental health. The people who
have fewer symptoms are the ones
who are healthier. Therefore the scientist now expects M1 (the
mean of the treatment group) to be lower
than M2 (the mean of the control group).
Reject
Ho in a
One-Tailed Lower Test
If the Treatment group will have smaller
values on the DV than the control group then the result of M1-M2
will be a value less than zero. The null or skeptical hypothesis
expects the value of t to be at or near zero. Logically if we obtain
a t value that is much smaller than
zero (in this particular case) then we have found strong support
for the scientific hypothesis and can be fairly sure (usually 95%
sure) that we can correctly reject the null hypothesis.
Now we are going to go through the same process we
discussed for a one-tailed upper test. First we need to determine
the critical value of t. To do this we need to know the degrees
of freedom associated with the test. We also have to determine whether
it is a one- or two- tailed test and what the alpha level is. These
decisions are made in the same way that was presented for the one-tailed
upper test. If you are unclear about how to do this, please review
that example above.
 
Just to review, here is how to find the
critical value on the table. There are only a few simple points to
using the table.
First, the critical values are the numbers
in the body of the table. You just have to go down some number of
rows and across so many columns and read the number you find there.
The degrees of freedom run down the side
of the table. Our degrees of freedom (five plus four minus two)
equal seven; so we go down to the row with "7" in front of
it. Our critical value will be somewhere on this row. Next we go
across the top of the table based on alpha. But there's one complication.
Alpha is different if you have a one- versus a two-tailed test.
So the table has a distinction between one- and two-tailed tests.
A common symbol for this is Q (one-tailed) versus 2Q (two-tailed).
Since we have a one-tailed test, go across the top of the table
using the alphas in the Q row. When you get to .05 you will be above
the right column.
Then
you'll come down to where you meet the seven degrees of freedom
row. There, at the intersection of the row and column, will be the
critical value, 1.895.
This
is the critical value of t that we need for our particular example.
However in this example you need to remember
that the value in the table is an absolute value. For
a one-tailed upper test you should consider the table's value to
be equal to + 1.895 but for a one-tailed lower test, you should
consider the value to be - 1.895 (a negative value less than zero).
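The sign rule for one-tailed tests can be sketched as follows. The function name and the calculated t values are hypothetical; only the table value 1.895 comes from the lecture.

```python
def reject_one_tailed(t_calculated, table_value, tail):
    """Put the right sign on the table's absolute value, then compare."""
    if tail == "upper":
        return t_calculated > +table_value
    if tail == "lower":
        return t_calculated < -table_value
    raise ValueError("tail must be 'upper' or 'lower'")


# Hypothetical calculated t values for the symptom-count version of the study.
print(reject_one_tailed(-2.40, 1.895, "lower"))  # True: below -1.895
print(reject_one_tailed(-1.20, 1.895, "lower"))  # False: not far enough below zero
print(reject_one_tailed(+2.40, 1.895, "lower"))  # False: large, but in the wrong tail
```

The last line is worth a second look: in a one-tailed lower test, a large positive t does not let you reject Ho, no matter how large it is.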
Two-Tailed
Test

To discuss a two-tailed test we have to
change our example again. Now our group of psychotherapy researchers
is testing a new, controversial, and powerful therapy technique,
but they are unsure whether this treatment will be beneficial or detrimental
to the client. That is, the research team cannot say which
of the two means will be greater; they
think it could go either way. So the alternative or scientific
hypothesis is just that M1 (the mean of the treatment group) could
be either higher or lower than M2 (the
mean of the control group). The scientists are not predicting the
direction anymore; what they are predicting is that the two means
will be very different from one another.
Since the test is no longer directional,
the researchers are not predicting that the t value will be higher
than 0 or less than 0. All they are predicting is the t value will
be different from zero to such a magnitude that the researchers
are fairly confident that they can reject Ho.

Non-Directional
and Two-Tailed
A two-tailed test is, by
definition, a non-directional test. In other words we are going
to reject Ho if the calculated t is far enough away from zero.
It could be far enough away and below zero (and hence a negative
value) or it could be far enough away and above zero (and hence
a positive value).
Critical
Values for Two-Tailed Tests
Again we are going to use
the same basic process to determine whether the obtained or calculated
t value is of a sufficient magnitude that the researcher can reject
the null hypothesis. First, one determines the degrees of freedom
associated with the test. Second, one determines whether it is
a one- or two- tailed test. In this particular example we have
determined that a two-tailed test is most appropriate to the research
question. Finally one selects the alpha level for the test.
Finding
the Critical Values for Two-Tailed Tests in the Table
Just as before, the critical
values are the numbers in the body of the table. You just have
to go down some number of rows and across so many columns and
read the number you find there. Use the schematic on the screen
to follow along with the instructions below.
Just as before, the degrees
of freedom run down the side of the table. Our degrees of freedom
(five plus four minus two) equal seven; so we go down to the row
with "7" in front of it. Our critical value will be somewhere
on this row. Next we go across the top of the table based on alpha.
But there's one complication. Alpha is different if you have a
one- versus a two-tailed test. So the table has a distinction
between one- and two-tailed tests.
Since we now have a two-tailed
test, go across the top of the table using the alphas
in the two-tailed (2Q) row. When you get
to .05 you will be above the correct column. It
is essential that you make sure you are in the 2Q row, since the
Q row will give you the value associated with a one-tailed test,
and that critical value will be too low.
Then you'll come down to
where you meet the seven degrees of freedom row. There, at the
intersection of the row and column, will be the critical value,
2.365. This is the critical value of t that we need for this particular
example. In this example you need to remember
that the value in the table is an absolute value. For
a two-tailed test you should consider the table's value to be
equal to +2.365 OR -2.365.
If the researcher gets a calculated t value less than -2.365
or greater than +2.365, she can reject Ho with some assurance that she is
correctly doing so.
Divide
the Range of T
Here is a picture that shows
how the rejection regions fall on either side of zero. If
the researcher gets a calculated t value less than -2.365
or greater than +2.365, she can reject Ho.
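For a two-tailed test the decision depends only on how far t is from zero, which a sketch can express with an absolute value. The function name and the calculated t values below are hypothetical; 2.365 and 1.895 are the table values from this lecture.

```python
def reject_two_tailed(t_calculated, table_value):
    """Reject Ho when t is farther from zero than the critical value."""
    return abs(t_calculated) > table_value


# Hypothetical calculated t values, df = 7, alpha = .05.
for t in (-3.10, -1.00, 2.00, 2.90):
    print(t, reject_two_tailed(t, 2.365))

# Why the 2Q row matters: t = 2.00 would pass the one-tailed value (1.895)
# but does not pass the two-tailed value (2.365).
print(2.00 > 1.895, reject_two_tailed(2.00, 2.365))  # True False
```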
Now let's use all the pieces we've developed
in the above lectures to evaluate the PCH of chance. That's what
Statistical Conclusion Validity refers to. If we can validly argue
that chance alone did not generate our research results, then we
have good statistical conclusion validity. If we can validly argue
that chance alone generated our research results, then we have poor
statistical conclusion validity.
Another way to say this is that based on
our statistics we make a conclusion. That conclusion validly argues
against chance or it does not validly argue against chance.
Let's see how all this works.
Let's
begin by reviewing the 4-step process from population to sampling
distribution. First, we assume that Ho is true and that the DV
scores in the Therapy and No Therapy groups are just two samples
from a single normal distribution. Second, we do the research; that
is, we collect the two samples. Third, we calculate our test statistic
(t for independent means) on the data from our samples. Finally,
we build our Reject Ho logic on the sampling distribution of t.
Now let's look at that logic more closely.
As
we've argued before, if Ho is true, then the expected value of t
should equal 0. So the sampling distribution of t should have 0
as its center.
We've also argued that the psychotherapy
example is a one-tailed test in the upper tail of the t distribution.
And we've found the critical value of t to be 1.895.
So we put the critical value (red line on
the graphic) above zero in the upper tail.
Remember, we chose alpha to be .05. That
means the critical value divides the probability under the t curve
in such a way that (if Ho is true) the probability of falling in
the "Do Not Reject Ho" region is 19 out of 20 or .95.
By
the same logic, the probability of falling in the "Reject Ho
Region" is 1 in 20 or .05.
Remember what this all means. If Ho is true
(that is, the means differ from each other by chance alone) then
the probability of falling in the rejection region and incorrectly
rejecting Ho is only .05.
That is, if Ho is true and chance alone
is acting in the data, the probability of getting a calculated t
value above 1.895 is .05.
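You can check the .05 and .95 split for yourself by measuring the area under the t curve. The sketch below uses the standard density of Student's t and a simple numerical integration (Simpson's rule). That method is a choice made for this illustration, not something from the lecture, and the function names are hypothetical.

```python
from math import gamma, pi, sqrt


def t_density(x, df):
    """Height of Student's t curve at x for the given degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_area(cutoff, df, steps=2000):
    """P(t > cutoff) for cutoff >= 0, using Simpson's rule on [0, cutoff].

    steps must be even for Simpson's rule.
    """
    h = cutoff / steps
    total = t_density(0.0, df) + t_density(cutoff, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_cutoff = total * h / 3
    return 0.5 - area_zero_to_cutoff  # half of the curve lies above zero


reject_area = upper_tail_area(1.895, df=7)
print(round(reject_area, 3))      # 0.05
print(round(1 - reject_area, 3))  # 0.95
```

The same function shows why a two-tailed test at alpha = .05 uses 2.365: the area above 2.365 is about .025, and the matching area below -2.365 supplies the other .025.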
Let's
see where the calculated value falls. When we calculated the actual
t value, we found it to be +3.52.
As the graphic shows, a calculated value
of 3.52 falls in the rejection region because it is greater than
1.895.
So we reject Ho.
Let's
go back and put that into our 4-step schema and review the big picture.
We assume Ho is true and our data are just
two samples from the same normal population. Therefore the data,
including the two means, differ only by chance.
[Here, you may want to recall your experience
with the Double Sample Tool and Detect Difference Game, or even
go back and play with those again because your experience with them
is important for understanding this whole argument.]
Next we calculate the t for independent
means. We find it to be 3.52. Then calculate the degrees of freedom
and choose alpha = .05. Then find a critical value that divides
the values of t into two regions. When we look at the calculated
value of t, we decide that it is larger than the critical value
of t. So the calculated value of t falls in the "Reject Ho"
region.
REVIEW:
In the last section we found that if Ho is true, by chance alone
there is a .95 probability that a calculated t will fall into the
"Do not reject Ho" region.
Conversely, if Ho is true, then by chance
alone there is only a .05 probability that a calculated t will fall
in the "Reject Ho" region.
Our
calculated t fell in the "Reject Ho" region so we rejected
Ho.
Remember that Ho assumed that there was
only one normal population and that the two samples (and their means)
would differ only by chance.
So we are rejecting the idea that the two
samples are from only one distribution.
On
the other hand, H1 was based on the idea that the two samples (and
their means) came from two different normal populations. On the graphic
the Psychotherapy data were sampled from the red population where
mental health scores are higher. In contrast, the No Psychotherapy
data were sampled from the green population which has lower mental
health scores.
So we expect the Psychotherapy mean (M1
from the red population) to be higher than the No Psychotherapy
mean (M2 from the green population).
Therefore, under H1, we expect t to be greater
than zero. Our calculated value of t = 3.52, which is much higher
than zero, is what we expected under H1. So H1 remains plausible
after we have collected the data and calculated the t-test.
In
the realm of statistics, Ho has been shown to be improbable.
H1 remains plausible.
The statistical conclusion that we can make,
therefore, in the realm of science is that Chance, which once was
thought to be plausible, is no longer plausible.
In contrast the way the two means came out
(Psychotherapy higher than No Psychotherapy) is consistent with
the scientific hypothesis.
So
Chance is no longer a plausible hypothesis while the scientific
hypothesis still is plausible.
What
we have done is eliminate one important hypothesis (Chance) competing
with the scientific hypothesis.
But there may be other competing hypotheses.
For example, critics and skeptics may point out that our control
group (No Psychotherapy) is a poor one. The volunteers in the No
Psychotherapy group received no attention at all. They were told
that they were on a waiting list and would be called back. The Psychotherapy
group received not only the psychotherapy but also a lot of human
attention. The data pattern (Mean 1 greater than Mean 2) is consistent
with the plausible competing hypothesis that people who receive
human attention get better. It's not the psychotherapy, it's the
attention.
The researchers need an "Attention
Placebo" control group. These kinds of issues are a matter
of common sense, critical thinking, and a good understanding of the
research literature in the journals. They are usually addressed
at great length in a research methods class.
Statistical
conclusions only evaluate Chance as a competing hypothesis.
It is very important
to realize there are many other PCHs which need to be addressed
by other means.