Double Sample Tool Instructions
© Copyright 2000 Tom Malloy
Note:
These instructions are abstracted from and can be supplemented
by the full web lecture on the Double Sample Tool and Detect
Difference Game available through another link on this page.
Double Sample Tool
The Double Sample Tool allows you to take samples from two Normal
Populations simultaneously. You can begin to learn how the
two samples are affected when you change the characteristics
of the two populations.
The Double Sample Tool also allows you to simulate data collection
in a research project with two groups.
A menu like
the one in the graphic will pop up. Select Double Sample.
Instructions. Look at the Quick Reference Instruction page and then
click the blue Start button to
go past the instructions to the actual Tool.
Lavender, White and Yellow Areas. As you can see on the graphic
(and even better on the actual Tool), the lavender area
contains two distributions: a red and a green distribution.
Each distribution has a mu and sigma. So we have a red mu
and a red sigma as well as a green mu and a green sigma.
Below the distributions
and on the left is a white area with 4 little boxes.
These boxes allow you to see (and to change by typing in
new values) the 4 parameters (red mu, green mu, red sigma,
and green sigma).
Below the distributions
and on the right is a pale yellow area with a large Get
Data button. When you press Get Data you will get two
samples of data (one from the red and one from the green
distribution).
Changing mu's. The red and green triangles are pointers which
you can click and drag to move the red and green Normal
Distributions. Drag the two distributions back and forth
(on the actual Tool, not on the lecture graphic). Notice
that, as you move the populations, the values of mu change
(in the two little boxes below the populations). Dragging
a population with its pointer changes where its center is,
so dragging changes the value of mu.
You can also
change the mu of either population by typing in an exact
value in one of the boxes. (But you have to click Get Data
before the populations change.)
Changing sigma's. The two sigma's (red and green) can only be
changed by typing in values in the little boxes below the
distributions. Change the sigma on one of the distributions
to see what happens (you have to click Get Data to see the change).
Notice how the shape of a Normal Distribution changes
with changing values of sigma.
Lock: When you press the Lock button it turns yellow and locks
the two population sigma's to the same value. The lock exists because
many well-known statistics which we will study later assume
that the two population sigma's are equal. This assumption
is called Homogeneity of Variance. For now just know that
pressing the Lock button
ensures that the data you get conform to an important assumption
in statistics. (Don't worry about understanding that assumption
at present.) Of course, you can press the Lock button
a second time to unlock the sigma's.
Get Data:
When you press the Get Data button two columns of data appear,
one from the red distribution, one from the green distribution.
Observe how the data changes as you change the parameters
of the populations.
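If you would like to see what Get Data does conceptually, here is a minimal Python sketch. It is not the Tool's actual code: the function name `get_data`, the sample size of 10, and the example parameter values are all assumptions made for illustration.

```python
import random

def get_data(red_mu, red_sigma, green_mu, green_sigma, n=10, seed=None):
    """Draw one random sample of size n from each of two Normal populations."""
    rng = random.Random(seed)  # passing a seed makes the draw repeatable
    red = [rng.gauss(red_mu, red_sigma) for _ in range(n)]
    green = [rng.gauss(green_mu, green_sigma) for _ in range(n)]
    return red, green

# Hypothetical settings: the populations differ in mu but share one sigma.
red, green = get_data(red_mu=100, red_sigma=10, green_mu=110, green_sigma=10, seed=1)

# Print the two samples side by side, like the Tool's two data columns.
print("   red   green")
for r, g in zip(red, green):
    print(f"{r:6.1f}  {g:6.1f}")
```

Change the mu's or sigma's in the call and rerun it to mimic typing new values into the four boxes and pressing Get Data again.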
Some statistics.
Below the data columns are a bunch of statistics. You've
learned about the mean (M) and the standard deviation. In
lecture we used the symbol S for standard deviation. On
Double Sample Tool we use the symbol SD for standard deviation.
(This is not uncommon. I suppose there is a little irony
in the fact that there is a lot of variability in the symbols
for standard deviation in statistics.) You haven't learned
anything in this course yet about the two other statistics
given by Double Sample Tool (SEM and t). Even without knowing
a thing about them you can still notice their behavior as
you change the population parameters and get data. I'm
going to ask you to notice the behavior of t, even
though you don't know about it yet. This may seem a little
backwards, but actually knowing something about the behavior
of a statistic can make understanding it much easier later.
(As we go along, we'll make explicit what we mean by the
"behavior" of a statistic, but basically it just
means what value did it take on in a certain circumstance
and how does that value change when you change the circumstances.)
For now, watch
the behavior of the statistics (i.e., what values they take
on). Notice any patterns in their relationship to the populations.
The mean (M) is particularly useful. You should also
notice the behavior of t (the bottom statistic).
How does it change as you change the distance between the
two populations? (FYI: The rule governing t is that
it depends on how far apart the two distributions are relative
to how spread out they are. So t depends on the sigma's
as well as the mu's. It's an interesting and useful statistic
that we will study later.)
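For the curious, the four statistics can be sketched in a few lines of Python. The Tool's exact formulas are not given in these instructions, so this sketch assumes the usual textbook versions: SD with an n - 1 denominator, SEM = SD divided by the square root of n, and the pooled-variance t for two independent samples. The function names `describe` and `pooled_t` are made up for this example.

```python
import math
import statistics

def describe(sample):
    """Return (M, SD, SEM) for one sample, assuming the n - 1 form of SD."""
    m = statistics.mean(sample)
    sd = statistics.stdev(sample)
    sem = sd / math.sqrt(len(sample))
    return m, sd, sem

def pooled_t(a, b):
    """Two-sample t assuming equal population sigma's (pooled variance)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se_diff

near = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])        # means 2 apart
far = pooled_t([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])    # means 10 apart
wide = pooled_t([-1, 1, 3, 5, 7], [1, 3, 5, 7, 9])       # 2 apart, doubled spread
print(f"near: {near:.2f}  far: {far:.2f}  wide: {wide:.2f}")
# prints: near: -2.00  far: -10.00  wide: -1.00
```

Moving the samples farther apart makes t larger in size; spreading them out makes it smaller. That is the rule described above: distance between the distributions relative to their spread.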
For more about
the learning philosophy behind Double Sample Tool, read
on. Otherwise have fun with the tool.
Playing with Data Simulations
Interactive Simulation: A New Way to Learn
You've already learned to use the Normal Sample Tool to take samples
from a Normal Distribution. The Normal Sample Tool simulates
with just the push of a button the weeks, months or years
of work it might take to collect data. The Normal Sample Tool
is interactive--it changes its output depending on what
we input for mu, sigma, and sample size.
Simulation
allows us to take an explicit model (such as the Normal
Probability Distribution) and learn what should happen in
the world if the model is true. We can simulate what the
sample data should be and how it should behave if the Normal
Distribution is the correct model.
Simulation has
a tremendous advantage--it allows us to take many samples
quickly and to learn what happens to the sample data when
we change the population parameters (mu and sigma). We quickly
learn how the population and sample data go together-- how
sample data are related to what is happening in the population,
and how what is happening in the population is related to
the sample data.
Interactivity
is crucial. Our ability to interact with computer programs
allows us to set variables and parameters (like mu and sigma)
in our model so that we can simulate a wide variety of practical
and theoretical situations. For example we can simulate
a small research project in which the IQ's of 7 participants
are measured. In theory IQ is distributed normally with
mu = 100. Different IQ tests have different sigma's, so
say that the particular test we are using has a sigma of
10. All we have to do is set the parameters in Normal Sample
Tool (mu = 100, sigma = 10, n = 7) and press the Sample
button. We will get a random sample of 7 IQ scores. This
simulates doing the actual research. Moreover, we can easily
press the sample button many times and look for important
meta-patterns in the data, things like how the sample mean
corresponds to the population mu. That's the direction we
are going in this lecture. We are going to gain extensive
experience with interactive simulations to learn important
relationships between sample data and population parameters
assumed to be true by our statistical model.
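The IQ example can be imitated outside the Tool with a short Python sketch. The values mu = 100, sigma = 10 and n = 7 come from the example above; the function name `press_sample`, the fixed seed, and the choice of 1000 repeated presses are assumptions for illustration only.

```python
import random
import statistics

MU, SIGMA, N = 100, 10, 7     # parameters from the IQ example
rng = random.Random(2000)     # fixed seed so reruns give the same numbers

def press_sample():
    """One press of the Sample button: N random IQ scores."""
    return [rng.gauss(MU, SIGMA) for _ in range(N)]

# A single simulated research project.
one_sample = press_sample()
print([round(x) for x in one_sample], "M =", round(statistics.mean(one_sample), 1))

# Pressing the button many times and keeping only each sample's mean.
sample_means = [statistics.mean(press_sample()) for _ in range(1000)]
print("average of the sample means:", round(statistics.mean(sample_means), 1))
print("SD of the sample means:", round(statistics.stdev(sample_means), 1))
```

The meta-pattern to look for is that the sample means cluster around mu, and that they vary less from sample to sample than individual IQ scores do.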
New avenue
of learning. Learning about these kinds of relationships
between populations and samples is one of the most difficult
and subtle parts of learning statistics. That's because
in the past this kind of learning only occurred through
mathematical proofs or years of experience collecting data.
Generally, introductory statistics students did not have
the mathematical sophistication to gain insight by mathematical
proof, nor did they have the research background to gain
insight through experience. But interactive computer simulation
opens up a new avenue of learning. The series of Tools and
Games you are going to use in this class are designed to
open up for you this new way to learn.
Now we are going
to build on your prior knowledge about sampling from the
Normal Distribution.
Read the web
lecture for Detect Difference and Double Sample Tool, to
get a guided tour of data simulation. Or just wander along
the new avenue of interactive learning. Just play with the
Double Sample Tool and figure out relationships between
how you set the population parameters and what kind
of data you get.
Detect Difference Game Instructions

Click on "Detect Difference."
Simulating a Scientific Puzzle
Detect Difference Game
The Detect Difference Game works very much like the Double Sample
Tool. One crucial difference is that when you play, your
score will be recorded and will count toward your grade.
Conceptually, the biggest difference between the two is that you
no longer see the Normal Distribution(s). A grey screen has
come up from the bottom like the door of an industrial elevator. You
can see black and yellow stripes along the leading edge
of the firmly shut door. It hides the populations. This,
of course, simulates the fact that scientists don't get
to see the populations, just the data. Imagine a two-group
study, just like in the Double Sample Tool. As a scientist,
your job is to decide if the Treatment had an effect or
not. When we studied the simulation model with the Double
Sample Tool, we found that if the Treatment has no effect
the data you see are just two samples from one distribution.
But if the treatment is effective, then the data you see
are two samples, one from each of two distributions.
If the Treatment is effective, you are sampling from two distributions.
If the treatment is ineffective, you are sampling from one
distribution. So the fundamental scientific decision can
be boiled down to this: Am I sampling from one or two distributions?
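The one-or-two-distributions idea can be written as a small Python sketch. This is only a model of the situation, not the game's code: the function name `run_study`, the parameter values, and the 50/50 coin flip that decides whether the treatment works are all assumptions.

```python
import random

def run_study(treatment_effective, rng, mu=100.0, sigma=10.0, effect=15.0, n=10):
    """Return (control, treatment) samples for a simulated two-group study."""
    # Ineffective treatment: both groups come from the same distribution.
    # Effective treatment: the treatment group's mu is shifted by `effect`.
    treatment_mu = mu + effect if treatment_effective else mu
    control = [rng.gauss(mu, sigma) for _ in range(n)]
    treatment = [rng.gauss(treatment_mu, sigma) for _ in range(n)]
    return control, treatment

rng = random.Random(7)
truth = rng.random() < 0.5    # hidden from the scientist, like the shut door
control, treatment = run_study(truth, rng)

print("control:  ", [round(x) for x in control])
print("treatment:", [round(x) for x in treatment])
print("two distributions?", truth)   # the feedback you get after deciding
```

The scientist sees only the two printed samples; `truth` plays the part of what is behind the door.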
Instructions.
Open the Detect Difference Game if you have not done so.
1) Begin by choosing your level of play. Look for a white box
(circled in red on the graphic) that says "Please Select."
Click on the arrow next to the box and choose Easy, Medium,
or Hard from the drop-down menu.
2) Next you must click "Get Data." You will get two samples
of data. You must decide if these data came from two different
distributions (the treatment is effective) or from one
distribution (the treatment is ineffective).
3) Make your decision. Press either the "One Distribution"
or "Two Distributions" button (circled in blue).
That's it for game play.
Score.
In the green box on the graphic, you can see where your
score will be shown as a percentage. To submit your score
to the database for a grade, press the Submit button (just
to the left of the green box on the graphic). You cannot submit
a score until you have made at least 10 decisions. The "Submit"
button will send your percentage for the last 10 decisions
you made. You can make as many decisions as you like, but
when you submit, your score will be the
percentage correct out of the last 10 decisions you made.
In other words, your score is percent correct of the last
10.
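The scoring rule (percent correct of the last 10 decisions, with no submission before 10 decisions) can be sketched in Python as follows. The class name `ScoreKeeper` and its methods are hypothetical; the game's real bookkeeping is not described in these instructions.

```python
from collections import deque

class ScoreKeeper:
    """Track decisions and score only the most recent `window` of them."""

    def __init__(self, window=10):
        # A deque with maxlen drops the oldest decision automatically.
        self.recent = deque(maxlen=window)

    def record(self, correct):
        self.recent.append(bool(correct))

    def can_submit(self):
        return len(self.recent) == self.recent.maxlen

    def submit(self):
        if not self.can_submit():
            raise ValueError(f"make at least {self.recent.maxlen} decisions first")
        return 100.0 * sum(self.recent) / len(self.recent)

keeper = ScoreKeeper()
# Twelve decisions: two early misses, then eight hits and two misses.
for correct in [False, False] + [True] * 8 + [False, False]:
    keeper.record(correct)
print(keeper.submit())   # prints 80.0: the two earliest misses no longer count
```

Because only the last 10 decisions count, early mistakes drop out of your score as you keep playing.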
You can log off, log back on, and return to Detect Difference
to improve your score as often as you like.
Statistical
Resources. On the graphic, above the green box and below
the blue box, you will see 4 buttons (t=, SD, M, and SEM).
Click on whichever button you want to see the corresponding
statistic calculated on your current two samples. You can
use what you have learned about the behavior of various
statistics to help you make decisions. You can open a second
copy of Netscape and use it to open Double Sample Tool simultaneously,
so that you can play with it as you make your decisions.
The learning goal is to discover how statistical models
work while you play the game.
Feedback.
After you make your decision (by pressing the one or two
distribution button), you'll get immediate feedback. The
grey door with yellow and black stripes will disappear and
you'll see where the data came from (either one or two distributions).
You'll also get verbal confirmation that you are right
or wrong. And your score will be updated (but not submitted;
only you can submit your score).
Philosophical
Aside. This game creates the pretense that the population(s)
is (are) covered up. In fact, we must remember that the
populations are only a probability model. The model is something
that humans have made up. The normal distribution model
has been built carefully by mathematicians and scientists
over decades. It's not really "out there." There
is only the mysterious universe. There are only our attempts
to reduce the universe to numbers through our measurement
operations. There is only the data we collect. In contrast,
the model we made up for this data is only a fantasy; a
well-constructed fantasy, but it is just something we have
thought up to make sense of the universe. So, it's not that
the populations are covered up; it's that there really are
no populations to see when we are doing "real world"
science.
What
we have in science is the data. And, as we look at the data,
we can use our fantasies (models) to give meaning to the
data and to make decisions. That will be your job in this
game. Look at the two data samples. Use your own experience
with the normal distribution model to make decisions about
what the data mean.
That's
the game.
Levels
of difficulty. The database will keep a separate score
for you for the Easy, Medium, and Hard levels of the game.
So you have to play all three levels.
In terms of
the Normal Distribution Model:
Easy = large effect size
Medium = moderate effect size
Hard = small effect size
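To get a feel for what these effect sizes mean, here is a rough Python sketch. It treats effect size as the distance between the two mu's divided by sigma, and uses 0.8, 0.5 and 0.2 (Cohen's conventional benchmarks for large, medium and small) as stand-in values. The game's real settings are not stated here, so those numbers, the sample size of 10, and the function name `right_direction_rate` are assumptions.

```python
import random
import statistics

# Illustrative standardized effect sizes (mu gap divided by sigma).
# These are conventional benchmarks, NOT the game's actual settings.
LEVELS = {"Easy": 0.8, "Medium": 0.5, "Hard": 0.2}

def right_direction_rate(effect_size, n=10, trials=2000, seed=0):
    """Fraction of simulated studies where the treated sample mean is higher."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        if statistics.mean(treated) > statistics.mean(control):
            hits += 1
    return hits / trials

rates = {level: right_direction_rate(d) for level, d in LEVELS.items()}
for level, rate in rates.items():
    print(f"{level:6s} (effect size {LEVELS[level]:.1f}): "
          f"treated mean higher in {rate:.0%} of studies")
```

Even when the treatment really works, a small effect often produces samples whose means point the wrong way, which is why the smaller effect sizes are harder to detect.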
The
hard level of difficulty is rather, well, hard. Even statistical
aids may not help you to get a good score. Think about your
experience in the Double Sample Tool Blood Pressure example,
where the effect size was very small. It is next to impossible
to detect a small effect size in a single sample.
But
how do I get a decent grade at the hard level? Good
question. First, I strongly suggest that you play the Hard
Level just as scientists have to play it. Get one data set
and do your best to make a decision. Do this 10 or so times
to see what kind of score you get.
Replication.
Then use the "Get more data" button (below the
Get Data button). Watch what happens to the statistics,
particularly M and t, as you get many samples. Scientists
rarely have this option, but you do. Based on your knowledge
of how the model works in Double Sample Tool, figure out
the puzzle through multiple replications. You should be
able to get a good grade; but you'll have to think about
the patterns you are learning (which is the point).
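Here is a minimal Python sketch of why replication helps. It is an illustration under assumed values (sigma = 10, a gap of 2 between the mu's, 10 scores per group, 1000 replications), not a description of how the game generates its data; the function names are made up.

```python
import random
import statistics

def mean_difference(gap, rng, sigma=10.0, n=10):
    """One replication: sample both groups, return M(treated) - M(control)."""
    control = [rng.gauss(100.0, sigma) for _ in range(n)]
    treated = [rng.gauss(100.0 + gap, sigma) for _ in range(n)]
    return statistics.mean(treated) - statistics.mean(control)

def replicate(gap, times=1000, seed=0):
    """Press "Get more data" many times and keep every mean difference."""
    rng = random.Random(seed)
    return [mean_difference(gap, rng) for _ in range(times)]

one_dist = replicate(gap=0.0, seed=1)   # treatment ineffective
two_dist = replicate(gap=2.0, seed=2)   # small but real effect

avg_one = statistics.mean(one_dist)
avg_two = statistics.mean(two_dist)
print("single replications bounce around by about",
      round(statistics.stdev(two_dist), 1))
print("average difference, one distribution: ", round(avg_one, 2))
print("average difference, two distributions:", round(avg_two, 2))
```

Any single mean difference is swamped by sampling noise, but the average over many replications settles near the true gap between the mu's. That is the pattern to watch for as you get more data.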