Difference to Inference: A Game for learning to think logically about Theoretical Inferences
This is the text of the in-class lecture explaining the Difference to Inference Game. You may print this text out and use it as a textbook. Or you may read it online.
To PRINT this web page: Click on the "Print" button at the top of your browser.
Red Green. More than other web lectures, this lecture depends on color, specifically the difference between red and green. Fans of PBS's Red Green show may note that we chose red and green as the contrasting colors in these tools and games partly because of the show; we also chose red and green because they are very distinct (think about traffic lights). But in black and white printings, red and green are sometimes hard to distinguish. It is still worthwhile to print this lecture, even if only in black and white, but you may have to refer to the online text and graphics at times while you read the black and white printout. If you have a color printer and can afford the ink, this lecture may be worth printing in color. It is between 18 and 22 pages of printed text.
Integrated Whole
The current graphic (left) emphasizes that the Normal Sample Tool, Double Sample Tool, Detect Difference Game and Difference to Inference Game are all carefully integrated to give you a holistic experience of statistical theory starting from probability distributions and finishing with making high level inferences about competing theories.
Review
Normal Sample Tool. The take-home message from your experience with Normal Sample Tool is that common statistical models assume that collecting research data is like sampling from a normal probability distribution.
Double Sample Tool. The take-home message from your experience with the Double Sample Tool is that, if you have a study with two groups, the data you get in the two groups is like sampling from two (possibly different) normal distributions. Let's examine this take-home message in more detail using an example.
Double Sample--Fertilizer Example. Imagine a study in which you want to evaluate whether a fertilizer actually increases the crop productivity of a plot of ground. You have several plots of ground. You randomly select half of them to fertilize; the other half you leave unfertilized as a control. At the end of the growing season you measure the yield for each group of plots (fertilized plots versus unfertilized plots). So you have two groups of scores, the fertilizer group and the control group. Does the fertilizer have an effect? If it has an effect, what's its size? Is it big or small?
In our statistical model, let's say that the crop yields for the fertilized plots are sampled from the red normal distribution in the Double Sample Tool. Similarly, let's suppose that the crop yields for the unfertilized plots are sampled from the green normal distribution. Let's also suppose, as is normally the case, that the sigma's for the two distributions are equal. (This is known as the Homogeneity of Variance Assumption).
Identical Normal Distributions. In the Double Sample Tool you can set the two normal distributions (red and green) to be identical--in which case there is really only one normal distribution. Consequently, the two groups of scores are two samples from the same distribution.
Notice in the graphic that the red and green normal distributions have been set to have the same mu and same sigma. So the graphic shows the two distributions completely overlapping, i.e., they are the same distribution.
No Effect. Suppose the fertilizer has no effect on crop yield, suppose it is completely worthless. Then, of course, having the red and green distributions be identical is a good model. The crop yield data for fertilized and unfertilized plots is sampled from the same distribution.
Effect Size. In the statistical model, the size of the fertilizer's effect is calculated by subtracting the mu's of the two distributions. In this case the effect size = 145 - 145 = 0. An effect size of 0 is, of course, logically equivalent to the fertilizer having no effect.
Two Normal Distributions--moderately far apart. In the Double Sample Tool the two normal distributions may be different (i.e., have different mu's, different sigma's, or both). Since we are assuming that the two sigma's are the same, the only way left for the distributions to differ is in their mu's.
Moderate Effect. Suppose the fertilizer has a moderate effect on crop yield. Then, of course, having the red distribution with a higher mu (average crop yield) than the green distribution is a good model. Notice in the graphic that the green mu has been set to 120 while the red mu has been set to 150. (The sigma's are both set to 21.) Now we can clearly see two distributions. When the two distributions are different, then the data in the fertilized and unfertilized groups comes from two different distributions.
Effect Size. The distance between two distributions is called "effect size" in statistical jargon. In the case where we had a moderate effect size of the fertilizer, the effect size = 150 - 120 = 30. This is a pretty good effect size (it is larger than one sigma which equals 21).
Two Normal Distributions--very far apart. In the Double Sample Tool the two normal distributions may be set to be a long way apart (i.e., have very different mu's).
Large Effect. Suppose the fertilizer has a very large effect on crop yield. Then, of course, having the red distribution with a much higher mu than the green distribution is a good model. Notice in the graphic that the green mu has been set to 100 while the red mu has been set to 180. When the two distributions are very different, then the data in the fertilized and unfertilized groups come from two very different distributions.
Effect Size. In this graphical example the effect size = 180 - 100 = 80. This is a large effect size (it is almost four sigma's).
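The three effect sizes worked out above can be reproduced in a few lines. This is only a sketch: the function names are made up for illustration, and sigma = 21 is stated in the lecture only for the moderate example, so using it for the other two cases is an assumption.

```python
def effect_size(mu_red, mu_green):
    """Raw effect size: the distance between the two distribution means."""
    return mu_red - mu_green


def effect_in_sigmas(mu_red, mu_green, sigma):
    """The same effect size expressed in units of the common sigma."""
    return effect_size(mu_red, mu_green) / sigma


SIGMA = 21  # stated for the moderate example; assumed for the other two

examples = [("none", 145, 145), ("moderate", 150, 120), ("large", 180, 100)]
for label, mu_red, mu_green in examples:
    raw = effect_size(mu_red, mu_green)
    scaled = effect_in_sigmas(mu_red, mu_green, SIGMA)
    print(f"{label:8s} raw = {raw:3d}   in sigmas = {scaled:.2f}")
```

Expressing the effect in sigma units is what lets you say that 30 is "larger than one sigma" and that 80 is "almost four sigma's."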
Sample Data. Scientists, of course, don't see the normal distributions--those are only models. Scientists collect data. Below are three different data sets. One of them was sampled from a model in which there was no effect of fertilizer (the red and green distributions are identical). One was sampled from a model where there was a moderate effect of fertilizer. The third was sampled from a model where there was a large effect of fertilizer. Put yourself in the shoes of scientists and discern which is which. [You haven't learned about t-tests yet, but it doesn't hurt to look at the t values shown below as you puzzle out the answer to the scientific question.]
[Graphics: three sample data sets, each shown with its t value.]
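If you want to generate practice data sets of your own, the sampling model described above can be sketched with Python's standard library. The group size of 10 and the seed are assumptions (the lecture does not say how many scores each group has), and the function names are hypothetical.

```python
import random


def sample_group(mu, sigma, n, rng):
    """Draw n scores from a normal distribution with mean mu and SD sigma."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def two_group_study(mu_red, mu_green, sigma, n, seed):
    """Simulate one study: a fertilized (red) and a control (green) group."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    red = sample_group(mu_red, sigma, n, rng)
    green = sample_group(mu_green, sigma, n, rng)
    return red, green


SIGMA = 21        # assumed equal in both groups (homogeneity of variance)
N_PER_GROUP = 10  # assumption: the lecture does not state the group size

models = {"no effect": (145, 145), "moderate": (150, 120), "large": (180, 100)}
for name, (mu_red, mu_green) in models.items():
    red, green = two_group_study(mu_red, mu_green, SIGMA, N_PER_GROUP, seed=1)
    diff = sum(red) / len(red) - sum(green) / len(green)
    print(f"{name:10s} observed mean difference = {diff:6.1f}")
```

With small groups the observed difference can land some distance from the true effect size, which is exactly why telling the three data sets apart takes some thought.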
Detect Difference Game. When you played Detect Difference, it built on your experience with the Double Sample Tool.
One or Two Distributions?
In the Detect Difference Game a screen kept you from seeing the distributions. You had to look at the data in the two groups and decide whether they came from one or two normal distributions. The example graphic was taken from an "Easy" level of the game where the treatment effect was large.
Feedback
After you made your guess in the Detect Difference Game, the screen was removed and you saw the statistical model. That is, you saw whether the two groups of data came from one distribution or from two distributions.
Difference to Inference Game
The General Idea
Do NOT open the Difference to Inference Game itself yet.
The graphics in the next section do not correspond to the game.
As we said at the beginning of this lecture, the graphic below shows the progression of ideas across tools and games. We are now going to work with a game that integrates all these ideas and combines them with inductive, deductive and inferential logic. And, some people, at least, think it's fun.
7 by 7 Grid behind a screen
The bottom image on the current graphic (left) shows a grey screen about to cover up a grid with 49 cells (7 x 7).
Each cell on the grid contains either a red distribution or a green distribution.
You won't ever see these distributions; but you will be able to collect data from any pair of cells. Based on the data you collect, you must decide (by inductive logic) whether two hidden distributions are the same (both red, both green) or different (one red, the other green).
[Note: For the following discussion, if you are looking at a black and white printout, the red distributions are darker and the green distributions are paler. While this lecture is worth printing out, even if only in black and white, you may want to refer to the full color online version as you read your printout.]
The Main idea: Which pattern?
In the Difference to Inference game, there will be a 7 by 7 grid that has 49 cells. Each cell has a normal distribution which can generate data. Some of these cells have red distributions; others have green distributions. The current graphic is misleading in that you will never see the distributions; they will remain hidden. You will only see the data that comes from a particular cell.
A Pattern. The red distributions will form a coherent pattern in the middle of the 7x7 grid (see example graphic). Your job will be to discover the shape of this pattern using only data you collect.
Research Projects: Collect Data from two adjacent cells. You will be able to select any two cells that are next to each other on the grid. They can be horizontally next to each other or vertically next to each other. We will tell you how to select two adjacent cells later. When you select two adjacent cells, you will get two sets of data. We will refer to a "Research Project" as selecting two cells and collecting data from them.
Detect the Edge of the Pattern. The way to discover the shape of a pattern is to find its edges. When you select two adjacent cells, their two distributions will give you two sets of data. When you get these two sets of data you will have to decide for yourself whether the two cells both have the same kind of distribution (both red or both green) or whether they have different distributions (one red, the other green). If the cells have different distributions, then you have found the edge of the pattern.
Another way to put this is that you will be playing the Detect Difference Game each time you collect data from two adjacent cells. Detecting Difference is your initial goal because difference (red versus green) occurs only at the edge of the pattern.
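The idea that difference occurs only at the edge can be made concrete with a small sketch. The red pattern below is a made-up 3 by 3 block, not one of the game's patterns, and the function names are hypothetical.

```python
SIZE = 7  # the grid is 7 by 7, so 49 cells


def adjacent_pairs(size=SIZE):
    """All horizontally or vertically adjacent pairs of (row, col) cells."""
    pairs = []
    for row in range(size):
        for col in range(size):
            if col + 1 < size:
                pairs.append(((row, col), (row, col + 1)))  # side by side
            if row + 1 < size:
                pairs.append(((row, col), (row + 1, col)))  # one above the other
    return pairs


def on_edge(red_cells, cell_a, cell_b):
    """True when exactly one of the two cells is red (the pair straddles the edge)."""
    return (cell_a in red_cells) != (cell_b in red_cells)


# Hypothetical hidden pattern: a 3 by 3 block of red cells in the middle.
red_cells = {(r, c) for r in range(2, 5) for c in range(2, 5)}

edge_pairs = [pair for pair in adjacent_pairs() if on_edge(red_cells, *pair)]
print(len(adjacent_pairs()), "possible research projects")
print(len(edge_pairs), "of them straddle the edge of this pattern")
```

Most pairs of adjacent cells lie entirely inside the pattern or entirely in the background, so a randomly chosen research project usually finds no difference.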
Which Pattern? Along the right side of the graphic (see above) you will see five candidate patterns, one above another. The pattern of red distributions in the center of the grid is described best by one of these five candidate patterns. In other words, there are five theories (candidate patterns) for describing the hidden pattern of red distributions. You are to determine which theory best fits the data you get. It will require several research projects and a lot of puzzle solving on your part to do so.
Let's move on from generalities to a specific example.
An Example from the History of Statistics
Open the Difference to Inference Game now.
Graphics in the remainder of the lecture should correspond to the game.
Two Stories
We have made up two stories to wrap around the Difference to Inference Game. One has to do with deforestation and hurricane damage. The other (Fertilize the Fields) is based on the history of statistical thought. The stories are logically equivalent. You can choose either one to understand and play the game. To play, you must click on one of the stories.
For this lecture, we will assume you are reading "Fertilize the Fields."
Click on Fertilize the Fields.
Then run your mouse over "Historical Note."
History
R. A. Fisher was one of the most creative statisticians of the twentieth century. As you can read in the Historical Note, he invented many fundamental statistical procedures, and the F test is named after him. During one part of his career he was working on the problem of whether fertilizers actually had an effect on crop yield. In the 1920's fertilizers were somewhat more natural than they are now, and the statistical procedures he invented in that context have come to be called (especially by critics of statistical methodology) "manure pile statistics." In fact manure is one of the more polite words applied to these methods. In any event, whatever you may think of the use of statistical methods philosophically, Fisher's mathematics were brilliant and his intuitive leaps were breathtaking. His procedures have changed the history of science.
Run your mouse over "A Historically Based Puzzle."
Which Map best fits the Territory?
Please read the text on the graphic first and then continue.
When he wrote, "A map is not the territory it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness," A. Korzybski made an important distinction for science and epistemology.
Clearly, the piece of paper you buy in a gas station that is called a map of Salt Lake City is not the same as Salt Lake City itself. The map is not the territory.
The map is not the territory, but a map can be very useful. Scientific theories are, in this metaphor, like maps describing a territory. Theories guide us in operating in the world (e.g., building computers, or doing psychotherapy, or, even, designing computer games to teach people the principles of statistics). But often we have available to us many theories that disagree with each other about what to do or where to go. Which map should we choose? Which is most useful? This is a very general puzzle encompassing the foundations of scientific activity. The Difference to Inference game is designed to give you repeated experience of how statistical procedures are integrated into the scientific adventure in a way that helps decide which theory is most useful. Which theory best fits the available data?
Run your mouse over "Difference to Inference."
An integrated strategy
Read the text of the current graphic and then continue. It verbally summarizes the main points of a way of thinking strategically about science. The game will naturally give you experience with this strategy, so its verbalization is not essential.
As we describe playing the game below, we will point out various elements of this strategy.
Click on "Start Game."
The Game Interface
We will now describe how the game interface works. It would be a good idea for you to open up the Difference to Inference Game and fiddle with the actual game interface while we are describing how it works. NOTE: The game randomly creates five new theories (explained below) each time it is played. So your game interface will not look exactly like the graphics shown below.
Maps versus Territory
You are in a field (the territory). Five maps (theories) are candidates for describing the fertilizer pattern in this field. You must collect data on crop yield and on the basis of the data decide which theory is most useful for describing the fertilizer pattern.
On the right, circled in red, are the five theories (candidate patterns), one of which describes the hidden red pattern.
In the center, circled in green, is a white 7 by 7 grid with 49 cells. You can choose any two adjacent cells and collect data from them.
Choose Level of Difficulty
Choose an effect size (see green circle on graphic). The game is much easier with large effect sizes than with small effect sizes. This is because it is easier to detect large differences than it is to detect small differences.
Begin playing with large effect sizes (EASIEST).
Horizontal Tool
On the left side of the game interface you will find a large grey button with an "H" on it. Click on the H button to activate the Horizontal Tool.
Select two cells. The Horizontal Tool allows you to click anywhere on the white 7 by 7 grid and select two horizontally adjacent cells. The two cells you select will be highlighted in grey (see green circle on graphic). You can select any two other cells just by clicking somewhere else.
Get Data. When you select two cells a large grey "Get Data" button will appear. When you press "Get Data" two columns of numbers (the data) will appear in place of the Get Data button (see blue circle on the graphic). The leftmost column of numbers corresponds to the distribution in the left cell; the other column corresponds to the distribution in the right cell.
Color the two cells. Circled in yellow on the graphic are four buttons. Clicking on one of these buttons will color in the two cells you selected. Look at the data and decide which color(s) you think the two cells should be: red-red, green-green, red-green, or green-red.
Vertical Tool
On the left side of the game interface you will find a large grey button with a "V" on it. Click on the V button to activate the Vertical Tool.
The Vertical Tool works just like the Horizontal Tool, only vertically. When the Get Data button is pushed, the two columns of numbers will correspond to the upper and lower distributions. And the four coloring buttons will automatically rearrange themselves to color any two vertically adjacent cells.
An Aside: Five New Theories. This lecture was written over a period of time. Consequently, the author quit playing the game and later came back and started a new game to finish the lecture. Every time you start a new game, the five candidate theories are randomly reconstructed. You never play the same game twice. As a result, the five theories stacked up along the right side of the game interface are different in the example graphics below this point in the lecture than they were in the graphics above. This is simply to emphasize that every time you play a new game (and you will be required to play many games) the five "theories" change. Compare the five theories in the illustration above with the five theories in the illustration below.
Now let's get back to learning about the game interface.
PIC buttons
To the right of each of the 5 candidate patterns is a small grey button labeled "PIC". PIC is short for Picture (or outline). Clicking on a PIC button will turn on (or off) an outline of its corresponding pattern.
For example in the graphic, the top PIC button has been pressed and an outline of the top pattern has appeared over the white 7 by 7 grid.
The PIC buttons are toggles; that is, push them once to turn an outline on; push them again to turn an outline off.
Another PIC button
In this graphic, the second PIC button has been pushed and so the outline of the second theory appears on the white grid.
The PIC buttons are VERY useful in choosing which two cells to select and in interpreting your results.
Check Boxes. To the left of each candidate theory is a little white box. It is a toggle. Click on it once and a check mark appears; click on it again and the check mark disappears.
The check boxes are there for your convenience. They allow you to put checks next to or remove checks from certain theories. Players usually use the check boxes in one of two ways. First, they might check off theories that they have eliminated on the basis of the data. Second, they might put a check on all the theories and remove the check for each theory that is eliminated. It doesn't matter how you use the checks, or even if you use them at all. But they are a convenience in keeping track of which theories are eliminated and which aren't.
Making a conclusion. To choose one of the five candidate theories, just double click on the one you think is the best description of the hidden pattern. That is, on the right side of the screen choose which of the five patterns you like and double click it. A small window will pop up asking whether you're sure you want to choose that theory. Click "Yes" if you are sure.
Playing the Science Game
Grant Money. Doing research costs money. As a player you are given 500 grant bucks to begin your research. Each time you collect data on two cells it will cost you 15 grant bucks. It also costs money to statistically analyze the results of your research project. So you spend money to do research.
But if you do your research well you can earn more grant bucks. Every time you correctly identify the most useful available theory, you get a lot more grant bucks. If you do research logically and efficiently, you should be able to build up larger and larger amounts of grant bucks.
Grade. The course syllabus will tell you how many grant bucks you have to build up. If the syllabus doesn't set an amount of grant bucks you need to acquire, assume that for each level of Difference to Inference (and there are five levels) you will be required to build up 2500 grant bucks. Your grade for each level will be the percentage that your earned grant bucks are of 2500. So if you got only 1000 grant bucks on a certain level of difficulty, your grade would be 40%. Any grant bucks above 2500 do not give you extra credit. [NOTE: A PARTICULAR TEACHER MAY CHANGE YOUR GOAL FROM 2500 TO SOME OTHER VALUE, SO BE ALERT FOR ANNOUNCEMENTS CHANGING THE LEVEL OF GRANT BUCKS REQUIRED.]
The most you can get is 100%, even if you earn 4500 grant bucks. So why would you bother to earn more than 2500?
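The grading rule just described amounts to a capped percentage. A minimal sketch follows, assuming the default goal of 2500; the function name is hypothetical, and the lecture does not say what happens if your balance goes negative, so that case is not handled here.

```python
def level_grade(grant_bucks, target=2500):
    """Percentage grade for one level, capped at 100."""
    return min(100.0, 100.0 * grant_bucks / target)


print(level_grade(1000))  # 40.0
print(level_grade(4500))  # 100.0
```

If your teacher announces a different goal, pass it as `target`.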

Prestige. Science isn't only about money. It's certainly about truth and beauty. It's also about fame and prestige. At the very top of the game interface is a little blue link (Check High Scores) that will take you to a page where you can check the Top 10 high scores for each level of difficulty by login name. The legendary Flatcat once earned 21,605 grant bucks on the hardest level of Difference to Inference. Flatcat was awarded the prestigious Nobell prize for this achievement.

Below are tables listing the costs of doing research and the payoffs for making discoveries. These are only for your information, and should not be studied too closely, because successful game play depends on good logic, not on the costs and payoffs. The best way to build up grant bucks is to use a good research strategy. An excellent strategy is laid out in this lecture. The first table makes it clear that it costs to do research and analyze data. So be strategic. Make your discoveries with as few research projects as possible.
COST TABLE

| Research Activity | Cost |
| --- | --- |
| Collect Data on two cells | 15 grant bucks |
| Stat Analysis: A Mean for each group | 5 grant bucks |
| Stat Analysis: A Standard Deviation for each group | 5 grant bucks |
| Stat Analysis: Estimated SEM for each group | 10 grant bucks |
| Stat Analysis: a t-test | 15 grant bucks |
All statistical analyses have been free up to now in the Double Sample Tool and the Detect Difference Game. In Difference to Inference you have to pay. You haven't even learned what a t-test is and they're charging you for it. That's ok, a lot of researchers find themselves in the same or very similar situations. You don't have to buy it if you don't want.
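To see how quickly the charges in the cost table add up, here is a rough sketch of the cost of one research project. It assumes each analysis is charged once per project and covers both groups; the table does not say whether the charge is per group, and the names below are made up for illustration.

```python
COSTS = {
    "collect_data": 15,
    "mean": 5,
    "standard_deviation": 5,
    "estimated_sem": 10,
    "t_test": 15,
}


def project_cost(analyses=()):
    """Cost of one research project: data collection plus any analyses bought."""
    return COSTS["collect_data"] + sum(COSTS[name] for name in analyses)


print(project_cost())                    # data only
print(project_cost(["mean", "t_test"]))  # data, means, and a t-test
```

Under that assumption, buying every analysis on every project more than triples the cost of the project.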
The second table makes it clear that the rewards are greater for cleverly deducing the solution in the fewest moves. So move strategically and choose your research projects carefully.
PAYOFF TABLE

| Number of Research Projects required to make Conclusion | New grant funds received |
| --- | --- |
| Solve the puzzle after 0 research projects | 0 |
| Solve the puzzle after 1 research project | 400 |
| Solve the puzzle after 2 research projects | 400 |
| Solve the puzzle after 3 research projects | 350 |
| Solve the puzzle after 4 research projects | 300 |
| Solve the puzzle after 5 research projects | 285 |
| Solve the puzzle after 6 research projects | 270 |
| Solve the puzzle after 7 research projects | 255 |
| And so on (15 less bucks for each research project required) | |
| Publicly commit yourself to a less useful theory | -100 grant bucks |
The payoff table should make clear that just guessing about theories without any empirical research will not be rewarded. But it's clearly better to make your conclusion with as few research projects as possible. Not only does the research cost, but as you do more and more studies, the payoffs get less and less. Science likes elegance and economy of thought.
The payoff table also makes clear that choosing a less useful theory is costly. You lose 100 grant bucks for making the wrong conclusion. It's hard to get grant funds if you have a reputation for making conclusions that don't stand up well to further research.
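The payoff table can be written as a small function, which makes it easy to see the net result of a puzzle. This is a sketch under two assumptions: the "and so on" row is read as 15 fewer bucks for every project beyond the seventh with no lower limit, and only the data collection charge (15 per project) is counted as a cost. The names are hypothetical.

```python
WRONG_THEORY_PENALTY = -100


def payoff(projects):
    """New grant funds for a correct conclusion after the given number of projects."""
    table = {0: 0, 1: 400, 2: 400, 3: 350, 4: 300, 5: 285, 6: 270, 7: 255}
    if projects in table:
        return table[projects]
    # "And so on": 15 fewer bucks for each project beyond the seventh.
    return 255 - 15 * (projects - 7)


def net_change(projects, correct, cost_per_project=15):
    """Change in balance for one puzzle, counting data collection only."""
    reward = payoff(projects) if correct else WRONG_THEORY_PENALTY
    return reward - cost_per_project * projects


print(net_change(2, correct=True))   # 400 payoff minus 30 in costs
print(net_change(2, correct=False))  # 100 penalty plus 30 in costs
```

Every extra project past the second is paid for twice: once when you collect the data and again in the smaller payoff.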
Game Strategy
Differences that Make a Difference
But what's a good game strategy? Seek the edge. The edge is where difference is. Find those differences which make a difference.
Seek the edge. This is pretty obvious but we'll be explicit about it. In studying how humans perceive visual form, Fred Attneave found that humans seek the edge of patterns. In short, if you want to know the shape of a pattern you gain no information by wandering around the background behind the pattern, never encountering the pattern. Neither do you gain information by wandering around the homogeneous interior of the pattern, never encountering the edge where the pattern ends and the background begins. You must find the edge between the pattern and what is not the pattern.
So you want to put your horizontal or vertical tool where it crosses over the edge of candidate patterns. There is little to be found out by investigating places on the 7 by 7 grid that all theories agree are green or all theories agree are red. You want to be doing research and collecting data on the edge of a theory. The edge is where the information is.
Detect Difference. The edge boils down to being any place you can detect difference. When things are the same, completely homogeneous in every way, there is no difference and so there is no information. Difference occurs where things change. Where there is a difference there is an edge between pattern and background (or between two patterns).
So scientists naturally design research to probe for difference. When we do a two-group research project, we are naturally looking for conditions where we expect that the two groups of data come from two different populations.
Let's say we are using the Horizontal Tool. Where would you put it to find a difference that makes a difference?
Find an Edge (Difference)
The current graphic shows that the third of the five candidate theories has its shape projected onto the white screen using the PIC button. Anywhere along the edge of the outline is a place where you might expect to detect difference.
I've used the horizontal tool to select two cells that straddle the edge of the candidate pattern. If that pattern (map) is a good description of the territory behind the hidden screen, then I should find a difference between the two groups of data from the two cells.
Find a Difference that Makes a Difference
Compare this graphic with the one above. On this graphic the PIC button has been pressed for the top candidate pattern. You can see that the top theory (this graphic) predicts "no difference" in the two selected cells, while the third theory (above graphic) predicts there should be a difference between the cells.
This is what we mean by a difference that makes a difference. Look at which two cells have been selected (shaded grey). Of the five candidate theories three of them (1, 2, and 4, from the top) predict that the two cells should show no difference while two of them (3 and 5, from the top) predict there should be a difference. No matter what the data come out to be, some theories will be eliminated. We want to find differences that make a difference between theories. When scientists run an experiment they're often attempting to distinguish among several theories. They are trying to discover which theories are still candidates because they are consistent with the data and which theories are eliminated because they are inconsistent with the data.
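The reasoning above, splitting the theories by what each one predicts for a chosen pair of cells, can be sketched as follows. The five patterns here are invented for illustration (they are not the ones in the graphic), but they are arranged so that theories 1, 2, and 4 predict no difference and theories 3 and 5 predict a difference, as in the example. All names are hypothetical.

```python
def predicts_difference(red_cells, cell_a, cell_b):
    """A theory predicts a difference when exactly one of the two cells is red."""
    return (cell_a in red_cells) != (cell_b in red_cells)


def split_theories(theories, cell_a, cell_b):
    """Partition theory names by what each predicts for this pair of cells."""
    differ = [name for name, cells in theories.items()
              if predicts_difference(cells, cell_a, cell_b)]
    same = [name for name in theories if name not in differ]
    return differ, same


def guaranteed_eliminations(theories, cell_a, cell_b):
    """Number of theories ruled out in the worst case, whatever the data show."""
    differ, same = split_theories(theories, cell_a, cell_b)
    return min(len(differ), len(same))


def block(rows, cols):
    """Helper: the set of (row, col) cells in a rectangle."""
    return {(r, c) for r in rows for c in cols}


# Five hypothetical candidate patterns, numbered from the top.
theories = {
    1: block(range(2, 5), range(2, 4)),
    2: block(range(2, 5), range(2, 4)) | {(5, 2)},
    3: block(range(2, 5), range(2, 5)),
    4: block(range(3, 5), range(2, 4)),
    5: block(range(2, 5), range(2, 5)) | {(1, 3)},
}

pair = ((3, 4), (3, 5))  # two horizontally adjacent cells
print(split_theories(theories, *pair))
print(guaranteed_eliminations(theories, *pair))
```

A pair for which `guaranteed_eliminations` is zero is one where all the theories agree, so collecting data there cannot distinguish among them.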
In the example below a new game will be played. So the five candidate theories have changed once again.
Choosing Research Projects that Find Differences which make a Difference between Theories
The current graphic shows an example of a puzzle which was solved in two moves (two research projects). It could have been solved by one research project if I had gambled and been lucky. But as it was, I took a more conservative strategy and it took two research projects.
My first move was to compare two vertical cells (now both colored green) and labeled "1st." As you can see, the second research project was also a vertical comparison. My reasoning (looking at the five candidate patterns) is that three of them (the bottom three) predict a difference between the two cells selected for the first project. The other two theories (the top two) predict no difference. So, whatever the results of the study, I would eliminate either 2 or 3 of the theories.
Based on the data, I concluded that there was no difference between the data sets from the two cells. (This is just like playing the Detect Difference Game.) So I colored the two cells green. This was a lack of difference which made a difference--it eliminated three theories.
At that point only the top two theories were in contention. There are a couple of places where they make different predictions. I selected the two cells indicated by "2nd" on the graphic. As you can see, I concluded that the data from the two cells came from two different distributions, so I colored the bottom one red and the top one green.
(NOTE: It would make no sense to color the top one red and the bottom green because none of the theories makes that prediction. This is a bit of a scientific shortcut that I can use only because I know there are only five theories. But in a more open universe, where there are many yet unspecified theories, it would be better to calibrate by running a couple of baseline studies--one somewhere out in the green area and one somewhere safely in the middle of the red area--to get a sense of what values I'm expecting from the green and red areas. That would give me a more solid feel for what color to assign to what. In fact, in the hard levels of Difference to Inference I still have to do that. But one clue for you is that we've set the program up so that the green areas always have a lower mu than the red areas.)
Consistent with the known Data
The current graphic shows that the second from the top theory is consistent with the known data. The PIC button has been pressed for the second theory. You can see that the second theory predicts no difference in the right spot (1st project). It also predicts difference in the right spot (2nd project).
The second theory is consistent with the known data. That does not mean that no one will ever run some other study that results in inconsistencies with the second theory. But up to this point, the second theory is consistent with the known data. That's an important criterion for the acceptability and health of theories--that they be (reasonably) consistent with known data.
Inconsistent with the Data
The current graphic shows that the top theory (its PIC button has been pushed) is not consistent with the known data. It is consistent with the data from the 1st study but not with data from the 2nd.
Note that the top theory predicts "no difference" for the results of the 2nd study while the study shows a difference.
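Checking each theory against the findings so far is mechanical enough to sketch in code. The five patterns and the two findings below are invented to mirror the worked example (the first project eliminates the bottom three theories and the second leaves only the second theory); they are not the patterns in the graphics. The sketch records only whether a difference was found, not which cell was red, and all names are hypothetical.

```python
def predicts_difference(red_cells, cell_a, cell_b):
    """A theory predicts a difference when exactly one of the two cells is red."""
    return (cell_a in red_cells) != (cell_b in red_cells)


def is_consistent(red_cells, findings):
    """True when the theory agrees with every finding so far."""
    return all(predicts_difference(red_cells, a, b) == found_difference
               for (a, b), found_difference in findings)


def surviving(theories, findings):
    """Names of the theories not yet eliminated by the data."""
    return [name for name, cells in theories.items()
            if is_consistent(cells, findings)]


def block(rows, cols):
    """Helper: the set of (row, col) cells in a rectangle."""
    return {(r, c) for r in rows for c in cols}


# Five hypothetical candidate patterns, numbered from the top.
theories = {
    1: block(range(3, 6), range(2, 5)),
    2: block(range(2, 6), range(2, 5)),
    3: block(range(1, 5), range(2, 5)),
    4: block(range(1, 4), range(2, 5)),
    5: block(range(1, 6), range(3, 5)),
}

# Each finding: (pair of vertically adjacent cells, was a difference detected?)
findings = [
    (((0, 3), (1, 3)), False),  # 1st project: no difference
    (((1, 3), (2, 3)), True),   # 2nd project: difference
]

print(surviving(theories, findings[:1]))  # after the 1st project
print(surviving(theories, findings))      # after the 2nd project
```

This is the same bookkeeping the check boxes help you do by hand.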
Notice also the check marks in the small white boxes. I like to check a theory off when it is eliminated. Other people like to leave only those theories in contention with check marks. Either way, the checks help to keep track of the logic.
Feedback and Funds
Double click right on the picture of the theory you think is "right." In the computer game, somewhat artificially, the computer does have in mind a "right" theory. In science the thinking is more sophisticated. A good theory should be a useful description of the territory, perhaps it will even be the best available description of the territory. As a minimum, a good theory should be consistent with the known data (or as much or more consistent than other theories).
My selection of theory number two was correct in the computer's mind, so I received a pat on the back and some more grant funds. Not too much different than science.
That's an overview of how to play the game. You should be ready to go and play.