DETECT DIFFERENCE &
DOUBLE NORMAL TOOLS

Instructions for using Double Normal Tool

Instructions for using Detect Difference

Detect Difference & Double Sample: An online, printable lecture.


Double Sample Tool Instructions


©Copyright, 2000 Tom Malloy

Note: These instructions are abstracted from and can be supplemented by the full web lecture on the Double Sample Tool and Detect Difference Game available through another link on this page.


Double Sample Tool

Double Sample Tool allows you to take samples from two Normal Populations simultaneously. You can begin to learn how the two samples are affected when you change the characteristics of the two populations.
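Here is a rough sketch of what one press of Get Data does: the Python below draws one sample from each of two Normal populations. The helper name `get_data`, the sample size of 10 per group, and the parameter values are hypothetical choices for illustration. This lecture does not describe the Tool's own code or defaults.

```python
import random

def get_data(red_mu, red_sigma, green_mu, green_sigma, n=10, rng=random):
    """Hypothetical helper: draw n scores from each of two Normal populations."""
    red = [rng.gauss(red_mu, red_sigma) for _ in range(n)]
    green = [rng.gauss(green_mu, green_sigma) for _ in range(n)]
    return red, green

# Example: populations 5 points apart with equal spread (assumed values).
rng = random.Random(1)
red, green = get_data(50, 10, 55, 10, n=10, rng=rng)
print([round(x, 1) for x in red])
print([round(x, 1) for x in green])
```

Passing a seeded `random.Random` makes a run repeatable, which is handy when comparing settings.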

The Double Sample Tool also allows you to simulate data collection in a research project with two groups.

A menu like the one in the graphic will pop up. Select Double Sample.

Instructions. Look at the Quick Reference Instruction page and then click the blue Start button to go past the instructions to the actual Tool.

Lavender, White and Yellow Areas. As you can see on the graphic (and even better on the actual Tool), the lavender area contains two distributions: a red one and a green one. Each distribution has a mu and a sigma, so we have a red mu and a red sigma as well as a green mu and a green sigma.

Below the distributions and on the left is a white area with 4 little boxes. These boxes let you see (and change, by typing in new values) the 4 parameters (red mu, green mu, red sigma, and green sigma).

Below the distributions and on the right is a pale yellow area with a large Get Data button. When you press Get Data you will get two samples of data (one from the red and one from the green distribution).

Changing mu's. The red and green triangles are pointers which you can click and drag to move the red and green Normal Distributions. Drag the two distributions back and forth (on the actual Tool, not on the lecture graphic). Notice that, as you move the populations, the values of mu change (in the two little boxes below the populations). Dragging a population with its pointer moves its center, so dragging changes the value of mu.

You can also change the mu of either population by typing an exact value into one of the boxes. (But you have to click Get Data before the populations change.)

Changing sigma's. The two sigma's (red and green) can only be changed by typing values into the little boxes below the distributions. Change the sigma of one of the distributions to see what happens (you have to click Get Data to see the change). Notice how the shape of a Normal Distribution changes as sigma changes: a larger sigma makes the curve wider and flatter.
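Python's standard `statistics.NormalDist` class gives a numerical check on how sigma changes a curve's shape. The mu and sigma values below are arbitrary choices for illustration, not values taken from the Tool.

```python
from statistics import NormalDist

# Two populations with the same mu but different sigmas (assumed values).
narrow = NormalDist(mu=100, sigma=5)
wide = NormalDist(mu=100, sigma=10)

# Height of each curve at its center: doubling sigma halves the peak.
print(round(narrow.pdf(100), 4), round(wide.pdf(100), 4))

# About 68% of each population lies within one sigma of mu, so the wide
# curve spreads that same share of scores over an interval twice as long.
for dist in (narrow, wide):
    share = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)
    print(dist.stdev, round(share, 3))
```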

Lock: When you press the Lock button it turns yellow and locks the two population sigma's to the same value. This is because many well-known statistics that we will study later assume that the two population sigma's are equal. This assumption is called Homogeneity of Variance. For now, just know that pressing the Lock button ensures that the data you get conform to an important assumption in statistics. (Don't worry about understanding that assumption yet.) Of course, you can press the Lock button a second time to unlock the sigma's.

Get Data: When you press the Get Data button two columns of data appear, one from the red distribution, one from the green distribution. Observe how the data changes as you change the parameters of the populations.

Some statistics. Below the data columns are a bunch of statistics. You've learned about the mean (M) and the standard deviation. In lecture we used the symbol S for standard deviation; in the Double Sample Tool we use the symbol SD. (This is not uncommon. I suppose there is a little irony in the fact that there is a lot of variability in the symbols for standard deviation in statistics.) You haven't learned anything in this course yet about the two other statistics given by the Double Sample Tool (SEM and t). Even without knowing a thing about them, you can still notice their behavior as you change the population parameters and get data. I'm going to ask you to notice the behavior of t, even though you don't know about it yet. This may seem a little backwards, but knowing something about the behavior of a statistic can actually make understanding it much easier later. (As we go along, we'll make explicit what we mean by the "behavior" of a statistic, but basically it just means what value it takes on in a certain circumstance and how that value changes when you change the circumstances.)
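For readers who want to see the arithmetic, here is one way the four statistics could be computed from two samples. This is a sketch under stated assumptions. SD is taken as the sample standard deviation with an n - 1 denominator, and SEM as SD divided by the square root of n. For t, the sketch uses the standard independent-samples formula with a pooled variance. This lecture does not give the Tool's exact formulas, so they may differ.

```python
import math
from statistics import mean, stdev

def describe(sample):
    """Return M, SD (n - 1 denominator) and SEM = SD / sqrt(n) for one sample."""
    m = mean(sample)
    sd = stdev(sample)
    sem = sd / math.sqrt(len(sample))
    return m, sd, sem

def pooled_t(a, b):
    """Independent-samples t using a pooled estimate of the common variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se_diff

# Made-up scores for illustration.
red = [52.0, 48.5, 55.1, 50.3, 47.9]
green = [58.2, 61.0, 56.4, 59.9, 60.5]
for name, sample in (("red", red), ("green", green)):
    m, sd, sem = describe(sample)
    print(f"{name}: M={m:.2f} SD={sd:.2f} SEM={sem:.2f}")
# Negative here because red's mean is below green's.
print(f"t={pooled_t(red, green):.2f}")
```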

For now, watch the behavior of the statistics (i.e., what values they take on). Notice any patterns in their relationship to the populations. The mean (M) is particularly useful. You should also notice the behavior of t (the bottom statistic). How does it change as you change the distance between the two populations? (FYI: The rule governing t is that it depends on how far apart the two distributions are relative to how spread out they are. So t depends on the sigma's as well as the mu's. It's an interesting and useful statistic that we will study later.)
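The FYI rule can also be written as a formula. Assuming the Tool reports the standard independent-samples t with a pooled variance (the lecture does not say which version it computes), t is the distance between the two sample means divided by an estimate of how much that distance varies:

```latex
t = \frac{M_1 - M_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}
```

The numerator grows as the two mu's move apart. The denominator grows with the sigma's (through the SDs) and shrinks as the samples get larger.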

For more about the learning philosophy behind the Double Sample Tool, read on. Otherwise, have fun with the Tool.

Playing with Data Simulations

Interactive Simulation: A New Way to Learn

You've already learned to use the Normal Sample Tool to take samples from a Normal Distribution. With the push of a button, the Normal Sample Tool simulates the weeks, months, or years of work it might take to collect data. The Normal Sample Tool is interactive: it changes its output depending on the values we enter for mu, sigma, and sample size.

Simulation allows us to take an explicit model (such as the Normal Probability Distribution) and learn what should happen in the world if the model is true. We can simulate what the sample data should be and how it should behave if the Normal Distribution is the correct model.

Simulation has a tremendous advantage: it allows us to take many samples quickly and to learn what happens to the sample data when we change the population parameters (mu and sigma). We quickly learn how the population and the sample data go together: how sample data are related to what is happening in the population, and how what is happening in the population is related to the sample data.

Interactivity is crucial. Our ability to interact with computer programs allows us to set variables and parameters (like mu and sigma) in our model so that we can simulate a wide variety of practical and theoretical situations. For example, we can simulate a small research project in which the IQ's of 7 participants are measured. In theory IQ is distributed normally with mu = 100. Different IQ tests have different sigma's, so say that the particular test we are using has a sigma of 10. All we have to do is set the parameters in the Normal Sample Tool (mu = 100, sigma = 10, n = 7) and press the Sample button. We will get a random sample of 7 IQ scores. This simulates doing the actual research. Moreover, we can easily press the Sample button many times and look for important meta-patterns in the data, such as how the sample mean corresponds to the population mu. That's the direction we are going in this lecture. We are going to gain extensive experience with interactive simulations to learn important relationships between sample data and the population parameters that our statistical model assumes to be true.
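The IQ example translates directly into a few lines of Python. This sketch uses the values from the text (mu = 100, sigma = 10, n = 7). The function name `simulate_study` and the choice of 1,000 repetitions are illustrative assumptions and are not part of the Normal Sample Tool.

```python
import random
from statistics import mean

def simulate_study(mu=100, sigma=10, n=7, rng=random):
    """One simulated research project: n IQ scores from a Normal population."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(42)

# One "press of the Sample button": a single study of 7 participants.
one_study = simulate_study(rng=rng)
print("sample mean of one study:", round(mean(one_study), 1))

# Many presses: look for the meta-pattern in how sample means relate to mu.
sample_means = [mean(simulate_study(rng=rng)) for _ in range(1000)]
print("average of 1000 sample means:", round(mean(sample_means), 2))
print("smallest and largest sample mean:",
      round(min(sample_means), 1), round(max(sample_means), 1))
```

Any single study of 7 people can miss mu by several points, but the sample means cluster around 100, which is the kind of meta-pattern the text describes.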

New avenue of learning. Learning about these kinds of relationships between populations and samples is one of the most difficult and subtle parts of learning statistics. That's because in the past this kind of learning only occurred through mathematical proofs or years of experience collecting data. Generally, introductory statistics students did not have the mathematical sophistication to gain insight by mathematical proof, nor did they have the research background to gain it from experience. But interactive computer simulation opens up a new avenue of learning. The series of Tools and Games you are going to use in this class is designed to open up this new way of learning for you.

Now we are going to build on your prior knowledge about sampling from the Normal Distribution.

Read the web lecture for Detect Difference and Double Sample Tool to get a guided tour of data simulation. Or just wander along the new avenue of interactive learning: play with the Double Sample Tool and figure out the relationships between how you set the population parameters and what kind of data you get.


Detect Difference Game Instructions


Click on "Detect Difference."


Simulating a Scientific Puzzle

Detect Difference Game

The Detect Difference Game works very much like the Double Sample Tool. One crucial difference is that when you play, your score will be recorded and will count toward your grade.

Conceptually, the biggest difference between the two is that you no longer see the Normal Distribution(s). A grey screen has come up from the bottom like the door of an industrial elevator. You can see black and yellow stripes along the leading edge of the firmly shut door. It hides the populations. This, of course, simulates the fact that scientists don't get to see the populations, just the data. Imagine a two-group study, just like in the Double Sample Tool. As a scientist, your job is to decide whether the Treatment had an effect. When we studied the simulation model with the Double Sample Tool, we found that if the Treatment has no effect, the data you see are just two samples from one distribution. But if the Treatment is effective, then the data you see are two samples, one from each of two distributions.

If the Treatment is effective, you are sampling from two distributions. If the treatment is ineffective, you are sampling from one distribution. So the fundamental scientific decision can be boiled down to this: Am I sampling from one or two distributions?
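The hidden state behind the grey door can be sketched as a coin flip between the two scenarios. Every specific here is an assumption for illustration, including the 50/50 chance of an effective treatment, the sample size, and the population values. The game's actual settings are not described in these instructions, and `play_round` is a hypothetical name.

```python
import random

def play_round(effect, sigma=10, n=10, base_mu=50, rng=random):
    """Hypothetical game round: returns (two_distributions, group_a, group_b).

    If the treatment is effective, group_b comes from a second distribution
    shifted by `effect`; otherwise both groups come from the same distribution.
    """
    two_distributions = rng.random() < 0.5          # assumed 50/50 chance
    treated_mu = base_mu + effect if two_distributions else base_mu
    group_a = [rng.gauss(base_mu, sigma) for _ in range(n)]
    group_b = [rng.gauss(treated_mu, sigma) for _ in range(n)]
    return two_distributions, group_a, group_b

rng = random.Random(7)
truth, a, b = play_round(effect=8, rng=rng)
print("hidden answer:", "Two Distributions" if truth else "One Distribution")
```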

Instructions. Open the Detect Difference Game if you have not done so. 1) Begin by choosing your level of play. Look for a white box (circled in red on the graphic) that says "Please Select." Click on the arrow next to the box and choose Easy, Medium, or Hard from the drop-down menu. 2) Next, click "Get Data." You will get two samples of data. You must decide whether these data came from two different distributions (the treatment is effective) or from one distribution (the treatment is ineffective). 3) Make your decision. Press either the "One Distribution" or the "Two Distributions" button (circled in blue). That's it for game play.

Score. In the green box on the graphic, you can see where your score will be shown as a percentage. To submit your score to the database for a grade, press the Submit button (just to the left of the green box on the graphic). You cannot submit a score until you have made at least 10 decisions. You can make as many decisions as you like, but when you submit, your score will be the percentage correct out of the last 10 decisions you made.
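The scoring rule (percent correct of your last 10 decisions, with submission possible only after at least 10 decisions) behaves like a sliding window. Below is a minimal sketch using Python's `collections.deque`. The class name `ScoreKeeper` is hypothetical, and this is not the game's actual code.

```python
from collections import deque

class ScoreKeeper:
    """Hypothetical sketch: keeps only the 10 most recent decisions."""

    def __init__(self, window=10):
        self.window = window
        self.recent = deque(maxlen=window)   # older decisions drop off automatically

    def record(self, correct):
        self.recent.append(bool(correct))

    def can_submit(self):
        return len(self.recent) == self.window

    def percent(self):
        if not self.recent:
            return 0.0
        return 100.0 * sum(self.recent) / len(self.recent)

keeper = ScoreKeeper()
# 12 decisions: the first two (both wrong) fall out of the window.
for correct in [False, False] + [True] * 7 + [False, True, True]:
    keeper.record(correct)
print(keeper.can_submit(), keeper.percent())   # prints: True 90.0
```

A bounded deque keeps the last-10 logic in one place. Early mistakes stop counting once 10 newer decisions have been made.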

You can log off, log back on, and return to Detect Difference to improve your score as often as you like.

Statistical Resources. On the graphic, above the green box and below the blue box, you will see 4 buttons (t=, SD, M, and SEM). Click a button to see the corresponding statistic calculated on your current two samples. You can use what you have learned about the behavior of various statistics to help you make decisions. You can open a second copy of Netscape and use it to run the Double Sample Tool at the same time, so that you can play with it as you make your decisions. The learning goal is to discover how statistical models work while you play the game.
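One way to turn the t button into a decision is a simple threshold rule: call "Two Distributions" when t is far from zero. The cutoff of 2 is an assumed rule of thumb, not a value given in the lecture, and `decide` is a hypothetical helper. The t shown assumes a pooled-variance formula. Part of the learning goal is discovering for yourself how large t tends to be in each situation.

```python
import math
from statistics import mean, stdev

def pooled_t(a, b):
    """Independent-samples t with a pooled variance estimate (assumed formula)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def decide(a, b, cutoff=2.0):
    """Hypothetical rule: a large |t| suggests two distributions."""
    return "Two Distributions" if abs(pooled_t(a, b)) > cutoff else "One Distribution"

# Made-up samples: nearly identical means, then clearly separated means.
print(decide([50, 52, 49, 51, 48], [50, 53, 47, 52, 49]))
print(decide([50, 52, 49, 51, 48], [60, 63, 58, 61, 59]))
```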

Feedback. After you make your decision (by pressing the One or Two Distributions button), you'll get immediate feedback. The grey door with yellow and black stripes will disappear and you'll see where the data came from (either one or two distributions). You'll also get verbal confirmation that you are right or wrong. And your score will be updated (but not submitted; only you can submit your score).

Philosophical Aside. This game creates the pretense that the population(s) is (are) covered up. In fact, we must remember that the populations are only a probability model. The model is something that humans have made up. The normal distribution model has been built carefully by mathematicians and scientists over decades. It's not really "out there." There is only the mysterious universe. There are only our attempts to reduce the universe to numbers through our measurement operations. There is only the data we collect. In contrast, the model we made up for this data is only a fantasy: a well-constructed fantasy, but it is just something we have thought up to make sense of the universe. So, it's not that the populations are covered up; it's that there really are no populations to see when we are doing "real world" science.

What we have in science is the data. And, as we look at the data, we can use our fantasies (models) to give meaning to the data and to make decisions. That will be your job in this game. Look at the two data samples. Use your own experience with the normal distribution model to make decisions about what the data mean.

That's the game.

Levels of difficulty. The database will keep a separate score for you for the Easy, Medium, and Hard levels of the game. So you have to play all three levels.

In terms of the Normal Distribution Model:
Easy = large effect size
Medium = moderate effect size
Hard = small effect size
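A common way to measure effect size is the distance between the two mu's in sigma units, d = (mu2 - mu1) / sigma. The sketch below estimates how often a simple rule would be right at three effect sizes. The rule calls "Two Distributions" whenever |t| > 2. The values 0.8, 0.5 and 0.2 are Cohen's conventional benchmarks for large, medium and small effects, used here as assumed stand-ins. The game's actual effect sizes, the sample size of 10, the 50/50 chance of two distributions, and the cutoff are all assumptions.

```python
import math
import random
from statistics import mean, stdev

def pooled_t(a, b):
    """Independent-samples t with a pooled variance estimate (assumed formula)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def accuracy(d, n=10, rounds=2000, cutoff=2.0, seed=0):
    """Share of simulated rounds a |t| > cutoff rule gets right at effect size d."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(rounds):
        two = rng.random() < 0.5                      # assumed 50/50 chance
        a = [rng.gauss(0, 1) for _ in range(n)]       # sigma = 1, so mu shift = d
        b = [rng.gauss(d if two else 0, 1) for _ in range(n)]
        said_two = abs(pooled_t(a, b)) > cutoff
        correct += (said_two == two)
    return correct / rounds

for label, d in (("Easy", 0.8), ("Medium", 0.5), ("Hard", 0.2)):
    print(label, d, round(accuracy(d), 3))
```

Under these assumed settings, accuracy at the small effect size stays close to the 50% you would get by guessing, which matches the warning about the Hard level.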

The hard level of difficulty is rather, well, hard. Even statistical aids may not help you get a good score. Think about your experience in the Double Sample Tool Blood Pressure example, where the effect size was very small. With a single sample, it is next to impossible to detect a small effect size.

But how do I get a decent grade at the hard level? Good question. First, I strongly suggest that you play the Hard Level just as scientists have to play it. Get one data set and do your best to make a decision. Do this 10 or so times to see what kind of score you get.

Replication. Then use the "Get more data" button (below the Get Data button). Watch what happens to the statistics, particularly M and t, as you get many samples. Scientists rarely have this option, but you do. Based on your knowledge of how the model works in the Double Sample Tool, figure out the puzzle through multiple replications. You should be able to get a good grade, but you'll have to think about the patterns you are learning (which is the point).
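The idea behind replication can be sketched by averaging t over many independent samples drawn from the same hidden setup. The sketch makes several assumptions: each replication is a fresh pair of samples of 10, sigma is 1, and a small effect is a mu shift of 0.2. The `replicate` helper is hypothetical. These instructions do not say exactly what the "Get more data" button computes, so treat this as an illustration of the idea rather than of the game's behavior.

```python
import math
import random
from statistics import mean, stdev

def pooled_t(a, b):
    """Independent-samples t with a pooled variance estimate (assumed formula)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def replicate(shift, replications=50, n=10, sigma=1.0, seed=0):
    """Average t over many independent replications from the same hidden setup.

    shift = 0 means one distribution; shift != 0 means two distributions.
    """
    rng = random.Random(seed)
    ts = []
    for _ in range(replications):
        a = [rng.gauss(0, sigma) for _ in range(n)]
        b = [rng.gauss(shift, sigma) for _ in range(n)]
        ts.append(pooled_t(b, a))       # positive when b's mean is higher
    return mean(ts), ts

# The same seed is reused, so the only difference between runs is the shift.
avg_null, _ = replicate(shift=0.0)
avg_small, _ = replicate(shift=0.2)
print("one distribution, average t:", round(avg_null, 2))
print("small effect, average t:", round(avg_small, 2))
```

A single t from a small effect is often indistinguishable from zero. Averaged over 50 replications, it tends to sit clearly above the one-distribution case.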

©Copyright 1997, 2000 Tom Malloy and Gary "Jake" Jensen