Sampling Distributions Web Page
This is the text of the in-class lecture which accompanied the Authorware visual graphics on this topic. You may print this text out and use it as a textbook. Or you may read it online. In either case it is coordinated with the online Authorware graphics.



Sampling Distributions. Now we're going to move on to what's generally considered the iceberg for the Titanic of statistics classes. The topic of Sampling Distributions is very often the most difficult and confusing concept for beginning students to learn. We've worked hard to create graphical and interactive presentations that will make this idea clear. We've also created interactive tools which will give you extensive experience with sampling distributions. All this experience is carefully designed to make your learning easier. Still, this is a difficult topic the first time you come across it.


Abduction. The first thing I want to do is review the abduction process. I want to emphasize how these concepts connect to science. There are complex processes in nature which a scientist wishes to study. The scientist invents measurement operations which we call in our jargon "dependent variables" or DV's. DV's turn these natural processes into numbers. That is, scientists like to measure the world. Once we have our numbers we then model the numbers in terms of mathematical ideas such as probability distributions. When we model our DV as a probability distribution we sometimes call it a Random Variable.

Example of Abduction. Remember the baseball player we studied in basic probability. She's about to take a swing at the ball. She's the infinite process we want to study, and to do so we're going to take measurements on her. We're going to measure the number of times she's been at bat and the number of hits she's made, and from these we get the relative frequency of her hits, which is called her batting average. Then we can model that batting average in terms of some probability distribution, as is shown on the graphic. So we have one human woman, and for cultural reasons we decide to summarize her as a single number (her batting average), which we print in the newspaper next to her name. She's been reduced to this single number. But this number is of considerable importance to us in this culture. Some people become millionaires based on this number. As of 1999 no women that I know of have become millionaires that way, but people can become millionaires because of the performance that leads to that number.

Abduction: Rolling a single die. As a review, the next screen shows the roll of a single die. We've considered this in detail before as an abductive process. We pointed out that we could measure the roll of a die many ways because it is a complex, multi-dimensional process. We could measure the number of times it hits the ground, the amount of time it takes to come to rest, the rhythmic pattern of the sounds it makes as it rolls, and so on. We choose to measure it by counting the number of dots facing up when it comes to rest. Then we model those numbers as an equiprobable distribution. So, again, we take a complex process in nature, reduce it to numbers via measurement operations (DV's) and finally model the numbers in terms of probability distributions.


Sampling distribution overview. The visual schema on this screen will be used over and over to conceptualize an important process in the inferential statistics (such as t, F, and chi square) that we will be developing. This schema gives the major steps in our statistical model.
1) Population. We start the process by assuming a population. By population we mean a process in nature that has been reduced to numbers by measurement operations and then modeled as a probability distribution. So this is like the batting averages or the roll of a die which we just reviewed. The population in statistics is just a probability distribution. But it is also something which is connected back to the context of science.
2) Sample. The next thing we do, from the scientific point of view, is go to the lab or some other place and do a piece of research. We do an experiment. From a statistical point of view we think of doing an experiment as taking a random sample from the population (which is a probability distribution). This statistical model corresponds to the homework you did with the Sample from Normal Tool (which is found in Difference to Inference on the StatCenter main menu). Recall your experience with sampling from the normal distribution. The homework asked you to define normal distributions and then take samples. In the statistical model, doing research is like sampling numbers from a well-defined probability distribution.
An important relationship: The population determines the probability that a single score in the sample will take on a certain value or fall in a certain region. Recall your work with the normal distribution. You were able to find, for example, the probability that one northern European male had a height between 165 and 175 cm. The important point is that the population relates to single scores in the sample.
3) Define a Statistic. Statistics are just formulas which apply to the data in the sample. In steps 1 and 2, we have defined a probability distribution and sampled a bunch of numbers from it. In step 3 we choose a statistical formula which we can apply to those numbers. Some statistics which we are already familiar with in this class include the mean, the standard deviation, and little r. Up through step three, we're on familiar ground. We've worked with probability distributions; we've taken samples; and we've applied statistics to the numbers in a sample.
4) Find the Sampling Distribution of the Statistic. Step 4 is new territory; it will take some time to develop and understand these ideas. In short, what we've got to do next is find the probability distribution (sampling distribution) of the statistical formula. I don't necessarily expect those words to mean anything much right now; they are more a guide to what we will be learning.
A sampling distribution lets us find the probability that the sample statistic takes on a certain value or falls in a certain interval. So a sampling distribution does for the sample statistic what the population does for a single score in the sample. We'll work with that idea a lot so that it takes on some depth.


Binomial Sampling Distribution. Let's start by developing a sampling distribution based on the binomial probability distribution.

Abduction from nature to science to statistics. Once again, we're going to start with this abductive process. We will take an infinite process in nature, a baby, and reduce it to gender, which is a massive reduction. For data collection and statistical reasons, we prefer to have numbers. So we will use a zero for a boy and a one for a girl. Turning a child into a 0 or a 1 is a tremendous loss of information, since there is so much more to know about a child. But measurement has profound advantages for science which can make this loss worthwhile. Still, it is important to remember that measurement is always a massive reduction of some infinite process.
Next we are going to model our data (0's and 1's) as a Bernoulli process, which is a simple probability model we studied earlier. A Bernoulli process can have only two outcomes (in our case a girl or a boy). Each outcome has a known probability of occurring. In our case P(Boy) = P(Girl) = 0.5.

Four steps in getting the Sampling Distribution. Now we are going to go through each step carefully.
1. Assume Population. The first step is to assume a probability distribution. We have just argued in the previous section that gender at birth can be reasonably modeled as a Bernoulli probability process. So we will assume that the population of human births is a Bernoulli process. When a child is born two things can happen, girl or boy, and each has a 0.5 probability of occurring.
Recall that in a Bernoulli process one outcome is called a success and the other a failure. As we mentioned, these terms are not evaluative in this context. Suppose for some reason we want to know how many girls are being born. In our research project we want to count the number of girls. Using the "success" and "failure" jargon, it makes sense then to call the birth of a girl a "success" and the birth of a boy a "failure."
2. Construct a Sample. The next step is to take a random sample from the population of human births. We've had practice using StatCenter Probability Tools to take samples from populations so this is something we are familiar with.
Science. In this example, the scientists are engaged in a research project to count the number of girls being born, say in Salt Lake County. They examine county birth records, taking a random sample of 10 births during some period of time. They give male births a 0 and female births a 1. Their data then will be n = 10 scores. Each score will be a 0 or a 1. In step 2 of the graphic you can see that the first birth is a girl (X1 = 1), the second birth is a boy (X2 = 0), and so on down to the last birth which is a girl (Xn = 1).
Statistics. From the point of view of the statistical model, the whole research project is conceived of as simply randomly sampling n = 10 times from a Bernoulli population. This random sample yields the same n = 10 scores that the scientists got through all their hard work (see step 2 of the graphic).
The point is simple. Scientists have to do a lot of work in a research project to collect data. All this work is summarized in the statistical model simply as "sampling from a population."
3. Define Statistical Formula. The third step of the process is to define some statistic. By statistic we mean any of the formulas you are familiar with such as the Mean or little r. Or we might have in mind some statistic you have yet to study, such as t or F or Chi Square. Don't worry about these for now; we'll get to them later.
In our current example, we are going to make up a very simple statistic. Since we're counting the number of girls in our sample, our statistic will be called G, for number of girls. The formula for our statistic will be very simple: it is just the sum of the X's. Each X is either a 1 or a 0, so if you sum all the 0's and 1's, the 0's won't count for anything and the 1's will. If our ten scores were 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, then the sum of X would be 6, because there are six 1's. Each 1 indicates a girl, so G = 6 means there were 6 girls in the sample. X is called an indicator variable because it indicates a particular event (in this case a girl).
In our research we want to count the number of girls. The statistic we have defined, G = sum of X, counts the number of girls in the sample. So the statistic is a good one in the sense that it accomplishes our goal.
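A minimal Python sketch of the statistic G, applied to the ten coded births from the example. The function name `count_girls` is hypothetical and was chosen only to illustrate the idea. G is nothing more than the sum of the indicator variable.

```python
# The ten coded births from the example: 1 = girl ("success"), 0 = boy ("failure")
births = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]

def count_girls(scores):
    """The statistic G: sum of the indicator variable X over the sample.

    Each 0 adds nothing and each 1 adds one, so the sum counts the girls.
    """
    return sum(scores)

G = count_girls(births)
print(G)  # 6
```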
4. Find the Sampling Distribution of the Statistic. The last step is to figure out what the probability distribution of the statistic is. Since our research question is leading us to count the number of girls in our sample of 10 births, we are going to want to know answers to questions like what is the probability of exactly 7 girls in 10 births. Or we might want to know what is the probability of between 3 and 7 girls in 10 births. Notice that the statistic we defined in step three counted the number of girl births. If we find the probability distribution of our statistic, we can answer those kinds of questions.
The probability distribution of a sample statistic is called its sampling distribution.
Let's start figuring out how to find the sampling distribution of our statistic.

Argument that the sampling distribution should be the Binomial. If you think about it, the sampling distribution of the statistic should be a binomial distribution. This is because our population is a Bernoulli Trial, so our random sample of ten births is 10 independent Bernoulli Trials. Our DV is X, where X = 0 for a boy and X = 1 for a girl, so X is a Bernoulli Trial. If a girl is a success, then p = .5. We have 10 births, therefore N = 10.
This is all a perfect setup for using the Binomial Distribution, which gives the probability of r successes in N independent Bernoulli Trials for any value of p. Our sample statistic (G = Sum of X) gives r, the number of girls (or successes). So G is distributed as the Binomial Distribution with N = 10 and p = .5.
In short, the sampling distribution of G is the Binomial Distribution.
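With the binomial identified as the sampling distribution of G, the two questions raised in step 4 can be answered directly. Those questions are the probability of exactly 7 girls in 10 births and the probability of between 3 and 7 girls. This is a small sketch that uses the standard binomial formula, P(r) = C(N, r) p^r (1 - p)^(N - r), and Python's `math.comb`. The helper name `binomial_pmf` is our own.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(G = r): probability of exactly r successes in n independent Bernoulli trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.5  # ten births, P(girl) = 0.5

p_exactly_7 = binomial_pmf(7, n, p)
p_3_to_7 = sum(binomial_pmf(r, n, p) for r in range(3, 8))  # 3, 4, 5, 6, or 7 girls

print(round(p_exactly_7, 4))  # 0.1172
print(round(p_3_to_7, 4))     # 0.8906
```

"Between 3 and 7" is read here as including both endpoints. Use `range(4, 7)` if you want it to exclude them.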

So now we see the full four steps of the sampling distribution process. 1) We take some process in nature, measure it, and model it as a population. In this case the population is a Bernoulli Trial. 2) We take a random sample of size n from the population. This generates a number of sample data points or scores. We generally give these scores some symbol, like X. 3) We define a statistic on the sample data. In this case the sample statistic is G = Sum of X. 4) We discover via some logical-mathematical argument what the sampling distribution of the sample statistic is. In this case, the sampling distribution of G was the Binomial.
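The four steps can also be acted out as a simulation. This sketch assumes a Bernoulli population with p = .5 and samples of n = 10 births. It repeats steps 2 and 3 many times and tabulates G, then compares the result with the binomial probabilities. The simulation does not replace the logical argument for step 4. It only approximates the sampling distribution by brute repetition. The names `draw_sample`, `statistic_g`, and the repetition count are illustrative choices, not part of StatCenter.

```python
import random
from collections import Counter
from math import comb

random.seed(1)  # fixed seed so the run is reproducible

P_GIRL = 0.5         # Step 1: the population is a Bernoulli process with p = 0.5
N_BIRTHS = 10        # Step 2: each research sample has n = 10 births
N_STUDIES = 100_000  # how many times we imagine repeating the whole study

def draw_sample(n, p):
    """Step 2: sample n scores from the Bernoulli population (1 = girl, 0 = boy)."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def statistic_g(scores):
    """Step 3: the statistic G = sum of X."""
    return sum(scores)

# Step 4 (approximated): repeat steps 2 and 3 many times and tabulate G.
counts = Counter(statistic_g(draw_sample(N_BIRTHS, P_GIRL)) for _ in range(N_STUDIES))

for g in range(N_BIRTHS + 1):
    simulated = counts[g] / N_STUDIES
    exact = comb(N_BIRTHS, g) * P_GIRL**g * (1 - P_GIRL)**(N_BIRTHS - g)
    print(f"G = {g:2d}  simulated {simulated:.4f}  binomial {exact:.4f}")
```

The two columns should agree closely. They will not agree exactly, because the simulated column is itself empirical data.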
Finding the Sampling Distribution is often difficult. Finding the sampling distribution of various statistics is, in general, non-trivial. It is one of the most mathematically sophisticated parts of statistics. In our simple example, we were able to make a rigorous argument that the sampling distribution of the statistic G is the Binomial, even assuming very minimal math as a background. But finding the sampling distribution for most applied statistics is very difficult. So now that we have a simple example to form a basis for our understanding, we will skip the logic of going from step 3 to step 4, and simply tell you what the sampling distribution of various statistics are. What you need to understand is this overall, 4-step schema.
The homework will give you lots of practice using and assimilating this schema.
Population versus Sampling Distribution. One crucial thing to notice is that the population and the sampling distribution are different probability distributions. The Population gives you probabilities that an individual score will take on certain values. The Sampling Distribution gives you the probability that a statistic (which is a function of many individual scores) will take on certain values.

We've looked at how the Binomial Distribution can be used as a Sampling Distribution when the Population is a Bernoulli process. Let's move on now to examining a case involving the Normal Probability Distribution.
When the sample statistic is the mean, and the Population is a Normal Distribution, what is the Sampling Distribution of the Mean?

Spatial Ability Example. Suppose a research group develops a test of Spatial Ability. They will call the score a person gets on their test a Spatial Ability Quotient (SAQ). The next screen shows examples of the kinds of procedures (operations) that might be included on a test of Spatial Ability.

A typical question on a spatial ability test is shown on the screen. Which figure below (a or b) is the upper figure rotated by 180 degrees? The test would have a long string of these kinds of questions. At the end each person gets a number, depending on how many they got right. The answer, by the way, to the sample question is "a."

Let's say SAQ scores can be modeled as a normal distribution with mu equal to 150 and sigma equal to 30. This is another example of how people might be measured in psychology and how the measurement operations might be modeled as a normal distribution.

Abduction Again. Once again we have a summary of the process by which a person (or other process in nature) can come to be modeled as a probability distribution. In this case the operational definition of how the person is measured consists of the SAQ test. Next the SAQ test scores are modeled as a normal probability distribution. This normal distribution is what we will call our population.
Statistically, when a person takes the SAQ test, it is as if we've randomly sampled from a normal probability distribution with mu equal to 150 and standard deviation equal to 30.
Student Question: How does the researcher know that SAQ scores can be modeled as N(150,30)? A first answer is that people who make up standardized tests collect data on very large samples of people so that they can get good information about this question. A second answer is that in general in statistics we make the assumption that our DV's can be modeled as normal distributions. We may not know mu and sigma, but we generally assume the population is normal. A third answer is that for this example, I just made up the parameters (mu and sigma) so we would have a clear example to work with.
Summary. We have a person; from the picture he appears to be a man. He used to be happy, but then he took this test. He had to sit there, stare at the picture, and decide whether a or b is the top figure rotated 180 degrees. This poor guy has to answer a whole bunch of similar questions, and if he misses any he's going to lose points. When he completes the test he gets a number, a test score. So the scientific measurement operations reduce the human being to a number. Finally we model these numbers as a normal probability distribution in which the average score is 150 and the standard deviation is 30.
This is how we come up with the population which we will use in the next section dealing with the sampling distribution of the mean.


Find the Sampling Distribution of the Mean (SDM). Okay, in the previous section we have developed the argument for Step 1. That is, we have assumed the probability distribution of spatial ability quotient (SAQ) is normal with a mean of 150, standard deviation of 30. In short, SAQ is modeled as N(150,30).
For Step 2, we've sampled 25 people, measured each of them with our test and got 25 spatial ability quotient scores. In other words, we've constructed our sample.
For Step 3, we find the mean of the sample which is simply the sum of all the scores over n.
Now comes the tricky part. The new concept that we're working on is finding the sampling distribution. In this case we have to find the sampling distribution of the mean (SDM). The SDM is a probability distribution which gives the probability that a sample mean will take on a certain value. In contrast, the original population gives the probability that an individual score will take on a certain value.
Step 4. Finding the SDM. If we want to find the SDM, there's a series of things we need to know.
The first thing we need to know is that it's been proven mathematically that if the population is normal then the sampling distribution of the mean must be normal. This is simple and elegant and important. If we assume that the population is normal then the SDM is normal.
That solves most of our problems in finding the SDM. We know it is a normal probability distribution. Now all we have to do is find its mu and sigma. (Recall that a normal distribution is completely specified by its mean and standard deviation.)
So the sampling distribution of the mean is normal if the original population is normal. Next, let's find its sigma (its standard deviation).

Let's introduce some new jargon and notation. We're going to call the standard deviation of the SDM the Standard Error of the Mean or, for short, SEM.
In math notation we will symbolize the SEM as "sigma sub M." Look on the graphic and you'll see a sigma with an M as a subscript. That's our symbol for the standard deviation of the sampling distribution of the mean.
In contrast the standard deviation of the population will be notated simply as sigma (like it is for any normal distribution).
Okay so how do we calculate the SEM?

There's an extremely simple formula for finding the standard error of the mean. SEM, or sigma sub M, is equal to the population standard deviation (plain old sigma) divided by the square root of n, where n is the sample size. In other words, the standard error of the mean is the population standard deviation divided by the square root of the sample size.
In terms of the spatial ability example we've been developing, the population of individual SAQ scores was a normal distribution with a mean of 150 and standard deviation of 30. That is, the population is N(150, 30). We gave the test to 25 people, so our sample size is 25. Therefore, we have 30 over the square root of 25, which is 6. So the standard error of the mean is 6 in the example we're working with.
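The SEM formula is simple enough to check in a couple of lines. This is a minimal sketch. The function name `standard_error_of_mean` is hypothetical, and it assumes the population sigma is known, as it is in the SAQ example.

```python
from math import sqrt

def standard_error_of_mean(sigma, n):
    """sigma_M = sigma / sqrt(n): the standard deviation of the SDM."""
    return sigma / sqrt(n)

# SAQ example: population sigma = 30, sample of n = 25 people
print(standard_error_of_mean(30, 25))  # 6.0
```

The function also shows why larger samples help. Quadrupling n halves the SEM, because the square root of 4 is 2.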

It turns out that it's even easier to find the mean of the sampling distribution of the mean, because the mean of the SDM happens to be equal to the mean of the original population. So we don't even have to do a calculation.
I know this sounds like word salad the first time you hear it, but the mean of the sampling distribution of the mean is the same as the mean of the original population. Consequently we don't even have a separate symbol for the mean of the SDM.
In terms of our example, all you have to do is recall that the mean of population of spatial ability quotients was 150. Therefore the mean of the sampling distribution of the mean will be 150.
So the mu of the SDM is just equal to the mu of the population. In this case it is equal to 150.
What we have done is go through several steps that give us three pieces of information. First, if the population is normal, the sampling distribution of the mean is normal. Second, the standard error of the mean is given by a simple formula (population sigma over the square root of n). Third, the mu of the population and the mu of the SDM are the same.
So we have fully specified the SDM. It's N(150, 6). In contrast the population is N(150, 30).
Let's look back at our overview again. Now we have all four steps fully developed.

This screen illustrates the four major ideas we have been talking about: 1) a population; 2) a sample drawn from the population; 3) a sample statistic (the mean); and 4) the sampling distribution of the mean.

Summary. To find the sampling distribution of the mean we need to know a few things. First, if the population is normal then the SDM is normal. Second, the mu of the SDM is the same as the mu of the population. Third, the standard deviation of the SDM is simply the population sigma divided by the square root of n.

Here's another little aspect that's interesting. The population is N(150, 30) and the SDM is N(150, 6). Both are normal distributions with the same mean, and the only difference is that the SDM has a smaller standard deviation. Therefore the SDM is taller and less spread out than the population. The screen we are looking at puts them side by side. As you know from your previous experience with the normal distribution, as sigma gets smaller, the distribution gets thinner and taller. As sigma gets larger the distribution gets shorter and wider.
The sampling distribution of the mean is going to be more compact, more of the probability will be closer to the center, 150 in this case, than is true of the population. The sampling distribution of the mean is less variable than the population. That will turn out to be a good and interesting characteristic. Sample means are less variable than individual sample scores.
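One way to see that sample means are less variable than individual scores is to simulate it. This sketch assumes the SAQ population N(150, 30) and samples of 25. It draws many samples, keeps each sample's mean, and compares the spread of those means with the spread of individual scores. The number of repetitions is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(2)

MU, SIGMA, N = 150, 30, 25  # SAQ population N(150, 30); samples of 25 people
N_SAMPLES = 20_000          # how many research samples we simulate

# Draw many samples from the population and keep each sample's mean.
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(N_SAMPLES)
]

# The same number of single individual scores, for comparison.
individual_scores = [random.gauss(MU, SIGMA) for _ in range(N_SAMPLES)]

print(f"individual scores: mean {statistics.fmean(individual_scores):.2f}, "
      f"sd {statistics.stdev(individual_scores):.2f}")
print(f"sample means:      mean {statistics.fmean(sample_means):.2f}, "
      f"sd {statistics.stdev(sample_means):.2f}")
```

Both centers should land near 150. The standard deviation of the individual scores should be near 30, and that of the means near the SEM of 6, up to simulation noise.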

So let's go back and summarize. You start with some process in nature and you measure it via some dependent variable measurement operations; then you model your dependent variable as a population. Next you go to the lab and do the research. That is, you collect a sample from the population. Third, when you're done with the research, you analyze the data, and one of the simplest things you can do to analyze the data is find the average or the mean. Finally, you find the probability distribution of sample means.
Questions? SEM is short for the standard error of the mean (which itself is just a short way of saying the standard deviation of the sampling distribution of the mean).
PRACTICE PROBLEM. Let's use what we've learned to solve some probability problems.
Question 1: What is the probability that a single individual's SAQ score falls between 140 and 160? You've already practiced problems like this. If you want to know the probability that a single SAQ score falls between 140 and 160, you use the population.
Question 2: What is the probability that the mean of a sample of 25 individuals falls between 140 and 160? This is a different question. It is about the mean, not about an individual score. If you want to know the probability that a sample mean falls between 140 and 160, you use the SDM to answer the question.
Let's solve those two questions. Open up StatCenter's "Normal Tool." Suppose you want to answer the first question, which requires that you find the probability that one score falls between 140 and 160. To answer this, set the Normal Tool's mu to 150 and sigma to 30, because the population from which that individual is sampled is N(150, 30). Then enter 140 as the lower score and 160 as the upper score. The probability of getting an individual score between 140 and 160 is about .2611. The z scores are ±0.333, and reading a z table at ±0.33 gives the slightly lower value .2586.
The second question asks you to find the probability that the mean of a sample of 25 individuals will fall between 140 and 160. On the Normal Tool, keep mu the same (150) but change sigma to 6, because the SDM is N(150, 6). Enter 140 and 160 as the lower and upper scores. Now the z scores are ±1.667, and the probability that a mean will fall between 140 and 160 is about .9044. Reading a z table at ±1.66 gives .9031.
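If the StatCenter Normal Tool isn't handy, Python's standard-library `statistics.NormalDist` (Python 3.8+) can stand in for it. This sketch is not the StatCenter tool. It computes exact normal-curve areas, which can differ in the third or fourth decimal from values read off a z table with rounded z scores. The helper `prob_between` is a hypothetical name.

```python
from statistics import NormalDist

population = NormalDist(mu=150, sigma=30)  # individual SAQ scores
sdm = NormalDist(mu=150, sigma=6)          # means of samples of n = 25

def prob_between(dist, lower, upper):
    """P(lower < value < upper) for a normal distribution."""
    return dist.cdf(upper) - dist.cdf(lower)

print(f"{prob_between(population, 140, 160):.4f}")  # 0.2611
print(f"{prob_between(sdm, 140, 160):.4f}")         # 0.9044
```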
Look back at one of the illustrations showing both the population and the SDM. The SDM is the less variable of the two; it is taller and thinner. Since the SDM is less variable than the population, can you figure out why the probability of the mean falling between 140 and 160 is greater than the probability of a single score falling between the same two points?


Two ways to think of sampling a mean. There are two ways to think about sampling means. The first way is that we assume there's a population, we sample it, and we get a mean. If we did that we might get a specific mean, let's say 148.76, in a particular sample. In other words, I've constructed a specific sample of 25 people. The graphic shows that the first person's score was 156, the second person's score was 137, and so on down to the last person's score, which was 148. I haven't shown all of them, but let's say we have 25 scores. I calculate the mean and I get 148.76. That's the whole process. That's one way to think about how a mean comes into existence.

A second way to think about how a mean comes into existence is to suppose that we already have defined the sampling distribution of the mean. This option can be seen under the number 2 on the screen.
If the SDM is defined, we can think of sampling one mean from it. In our particular case, we sample once from the SDM and get a mean = 148.76.
Those are two rather distinct ways to think about how a mean comes into existence. It can be sampled directly from the SDM or you can think of sampling a whole large research sample from the original population, and then calculating the mean.
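The two routes to a mean can be written side by side. This is a sketch under the SAQ assumptions: a population of N(150, 30), n = 25, and therefore an SDM of N(150, 6). Route 1 draws 25 individual scores and averages them. Route 2 draws a single value straight from the SDM. The random draws are illustrative, so neither route will reproduce the 148.76 from the graphic.

```python
import random
import statistics

random.seed(3)
MU, SIGMA, N = 150, 30, 25

# Way 1: sample 25 individual scores from the population N(150, 30), then average them.
scores = [random.gauss(MU, SIGMA) for _ in range(N)]
mean_way_1 = statistics.fmean(scores)

# Way 2: skip the individual scores and draw one mean directly from the SDM N(150, 6).
sem = SIGMA / N ** 0.5
mean_way_2 = random.gauss(MU, sem)

print(f"way 1: {mean_way_1:.2f}   way 2: {mean_way_2:.2f}")
```

Any single pair of values will differ. Across many repetitions, though, the two routes produce means with the same distribution.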
Just to practice, here's another example. Suppose I ask, what is the probability of sampling a mean between 144 and 156? Notice that these two numbers are one SEM below and one SEM above 150 (150 - 6 = 144 and 150 + 6 = 156). By now you might even have memorized that the probability of falling between -1 and +1 standard deviations from mu is .6827. So we know that the probability of getting a mean between 144 and 156 is .6827.
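A quick check of the ±1 SEM result, again using the standard-library `statistics.NormalDist` as a stand-in for the Normal Tool. The assumption is that the SDM is N(150, 6).

```python
from statistics import NormalDist

sdm = NormalDist(mu=150, sigma=6)
p = sdm.cdf(156) - sdm.cdf(144)  # from one SEM below mu to one SEM above mu
print(f"{p:.4f}")  # 0.6827
```

The same .6827 comes out for any normal distribution. That is because 144 and 156 are exactly z = -1 and z = +1 on this one.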
Review: Parameters versus Statistics. You need to make a strong distinction between population parameters, like mu and sigma, and sample statistics like the mean and the sample standard deviation. In our SAQ example, the mu of the population is the same as the mu of the sampling distribution of the mean; they are both 150. And they are both population parameters.
In contrast, the mean of the sample is the average of a bunch of scores. You have to calculate the sample mean from the data. In our example, we didn't list all 25 scores in the sample. But if we had them all we could calculate the sample mean. The answer we gave for the sample mean was 148.76.
In general, when you draw a sample, you usually don't get a sample mean that is exactly the same as the population mean.
It is important to make this distinction between population parameters and sample statistics.
StatCenter Tools
StatCenter's SDM Tool is designed to let you experience first hand all the processes involved in the Sampling Distribution of the Mean. From these experiences you will naturally learn about the relationship between a Population, a Sample, a Mean of a Sample, and the Sampling Distribution of the Mean. As a way of focusing on these learning experiences, you will be given homework using the SDM Tool.

SDM Tool. Let's look at the StatCenter sampling distribution of the mean tool. From Ducks in a Row, simply click on the "Use Sampling Distribution of the Mean Tool" link. From your Virtual Desk, just click on your "Interactive Tool" icon and select "SD of Mean Tool" from the list of interactive tools.
A "StatCenter" web page will pop up. If you scroll down on it, it has a practice problem or two.
A menu for the Sampling Distribution of the Mean tool will also pop up. It has two buttons. Click the lower button which says, "SDM Tool."

Upper Right Hand Panel. You'll find two distributions in the upper right hand panel of the SDM Tool. The black distribution is the population. The red distribution is the sampling distribution of the mean. They have the same mu, so the two distributions are centered at the same value. The population and SDM differ only in that they have different standard deviations. And so the population (which has a larger sigma) is lower and wider, and the SDM (which has a smaller sigma) is thinner and taller.
Upper Left Hand Panel. Looking at the upper left hand panel, you see an interface where you can enter information. There is a place where you can set population mu depending on what's given in the homework or test problem. You can also set the population standard deviation. Finally, you can set the sample size, n.
Sample size, n, along with population mu and sigma, are the three really important pieces of information you need to get from a word problem. You need to set all three of them to use the SDM Tool. As an example to work with, set population mu = 100, sigma = 5, and n = 10. Then press UPDATE.
The SDM tool will immediately and automatically give you the mean and standard deviation of the SDM. (Remember that the standard deviation of the SDM is also called the standard error of the mean or SEM.)
Press the "Sample" button in the lower right hand corner panel.

Lower Left Hand Panel. Sample scores will appear in the lower left hand panel when you press the "Sample" button. (The "Sample" button is in the lower right panel.)
The sample scores are called empirical data since they correspond to the data collected in a research project. The SDM Tool calculates the mean of the scores automatically for you. This mean is the empirical mean that a scientist would calculate in a research project. It is always important to distinguish between the theoretical population mean, mu, and the empirical research mean.
New sets of empirical data will appear every time you press the "Sample" button. Notice that (empirical) individual scores are written in black. That is because they are sampled from the (theoretical) population which is black. In contrast, the (empirical) sample mean is written in red. That is because it is sampled from the (theoretical) SDM which is red.
Click the "Sample" button many times. Notice how the sample data and the sample mean change with each new sample you take. The theoretical populations are constant and unchanging. The empirical data change with every sample. Just stare at the data as you click or just stare at the sample mean. Notice how their values change.
Lower Right Panel. As you click "Sample" many times, also notice the lower right panel. Every time you click "Sample," a small red hatch mark appears representing that sample's mean. Each empirical mean is a different value, so the hatch marks are placed at different places along the number line. If two means have values very close to each other, their hatch marks pile on top of one another. Click "Sample" quickly many times to get a sense of this.
If you click quickly, over and over, these hatch marks will eventually stack up and begin to take the shape of the Sampling Distribution of the Mean. This shows you that across huge numbers of samples there is an empirical frequency distribution of sample means which looks roughly like the theoretical SDM (shown above it in the upper right hand panel).
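The pile-up of hatch marks can be mimicked with a text histogram. This sketch assumes a normal population with the example values and bins the means into 1-point-wide intervals; the helper `sample_means` is hypothetical, and the tool itself draws hatch marks rather than bars.

```python
import math
import random
import statistics
from collections import Counter

def sample_means(mu, sigma, n, num_samples, rng):
    """Return num_samples sample means; each one is a single 'hatch mark'."""
    return [statistics.mean([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(num_samples)]

rng = random.Random(2)  # fixed seed so the run is repeatable
means = sample_means(mu=100, sigma=5, n=10, num_samples=2_000, rng=rng)

# Pile the means into 1-point-wide bins, like hatch marks stacking on a number line.
bins = Counter(math.floor(m) for m in means)
for lower in sorted(bins):
    bar = "|" * (bins[lower] // 10)   # one "|" per 10 sample means
    print(f"{lower:4d}-{lower + 1:<4d}{bar}")

print(f"Mean of the means: {statistics.mean(means):.2f} (theory: 100)")
print(f"SD of the means:   {statistics.stdev(means):.2f} (theory: {5 / math.sqrt(10):.2f})")
```

The bars are tallest near 100 and thin out on both sides, and the spread of the means comes out close to the SEM rather than to the population sigma of 5.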

1 to 10,000 Samples per click. There's a pop-up menu in the lower right panel, just below where the red hatch marks appear. It looks like a small white window with an arrow next to it. Click on the arrow and a menu will drop down. This menu lets you choose how many samples to draw with a single click: 1, 5, 10, 50, 100, 500, 1,000, or 10,000. This allows you to easily see how the shape of the frequency distribution of sample means evolves.
Play with taking large numbers of samples with a single click. You'll notice that the shape of the empirical distribution of sample means quickly conforms to the normal shape of the theoretical SDM.
How many samples are necessary before you think the frequency distribution of empirical means closely takes on the shape of the theoretical SDM? Do you need hundreds of samples? Thousands? Tens of thousands?
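There is no single right answer, because "closely" depends on what you check. One rough, hypothetical check of shape: for a normal SDM, about 68.3% of sample means should fall within 1 SEM of mu and about 95.4% within 2 SEM. The sketch below (assuming a normal population and the example values; all names are illustrative) prints those proportions for 10, 100, 1,000, and 10,000 samples.

```python
import math
import random
import statistics

MU, SIGMA, N = 100, 5, 10          # the example values entered in the SDM Tool
SEM = SIGMA / math.sqrt(N)

def sample_means(num_samples, rng):
    """Means of num_samples independent samples of N normal scores."""
    return [statistics.mean([rng.gauss(MU, SIGMA) for _ in range(N)])
            for _ in range(num_samples)]

def share_within(means, k):
    """Fraction of sample means within k standard errors of mu."""
    return sum(abs(m - MU) <= k * SEM for m in means) / len(means)

rng = random.Random(3)
results = {}
for num_samples in (10, 100, 1_000, 10_000):
    means = sample_means(num_samples, rng)
    results[num_samples] = (share_within(means, 1), share_within(means, 2))
    print(f"{num_samples:>6} samples: within 1 SEM = {results[num_samples][0]:.3f}, "
          f"within 2 SEM = {results[num_samples][1]:.3f}")

# Reference values for a normal SDM: about 0.683 within 1 SEM, 0.954 within 2 SEM.
```

With only 10 samples the proportions can land far from the reference values; they settle down as the number of samples grows.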
A central idea in using this tool is to compare the theoretical SDM (upper right panel) with the distribution of sample means which pile up empirically (lower right panel).
[Note: The empirical data and mean in the lower left panel only change when the number of samples per click is set to 1. This is because when more than one sample is collected with a single click, the computer samples means directly from the SDM rather than sampling n different scores from the population and then computing the sample mean. This allows it to get 1,000 or even 10,000 means very quickly.]
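The two routes described in the note can be compared side by side. This is an illustration, not the tool's actual code: it assumes a normal population, for which the mean of n scores is exactly normal with standard deviation sigma divided by the square root of n, so drawing one value from that distribution is equivalent to drawing n scores and averaging them. The function names are hypothetical.

```python
import math
import random
import statistics

MU, SIGMA, N = 100, 5, 10
SEM = SIGMA / math.sqrt(N)
rng = random.Random(4)  # fixed seed so the run is repeatable

def mean_via_scores():
    """Slow route: draw N scores from the population, then average them."""
    return statistics.mean([rng.gauss(MU, SIGMA) for _ in range(N)])

def mean_via_sdm():
    """Fast route: draw one value directly from the SDM, Normal(MU, SEM)."""
    return rng.gauss(MU, SEM)

slow = [mean_via_scores() for _ in range(10_000)]
fast = [mean_via_sdm() for _ in range(10_000)]
for label, means in (("via scores", slow), ("via SDM", fast)):
    print(f"{label:>10}: mean = {statistics.mean(means):.2f}, "
          f"SD = {statistics.stdev(means):.2f} (SEM = {SEM:.2f})")
```

Both lists have about the same center and spread, but the fast route makes one random draw per mean instead of N.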
Summary of Sampling Distributions. We will wind up this topic with a review. It's important to realize that a scientist is filled with curiosity about the mystery of nature. What interests us most is that somewhere out there is stuff that's vastly beyond all of our theories. As Shakespeare put it, "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy." It's these undreamt-of, undiscovered things that fascinate us so. In psychology we're fascinated with human beings, who are always elusively beyond the reach of any theory, no matter how good that theory is. Other scientists might be fascinated with plant ecology in a rain forest. This fascination leads scientists to study the world.
Abduction. In studying the universe, one thing scientists do is to measure things, to reduce them to dependent variable measurement operations. This reduction of what we are interested in to numbers is what leads to statistics. In statistics we model the numbers we get as random variables or, in different words, as probability distributions. These probability distributions we call populations. This process of moving sideways across ideas and models from nature to measurement to probability distribution is called abduction.
By way of contrast with abduction, induction is to infer upward from information to a higher order principle. Deduction is to infer downward from a principle to lower order consequences. Abduction is neither upward nor downward. It is the sideways movement from one way of conceptualizing to another way of conceptualizing on the same level. For statistics we take the scientific idea of measurement and model it as a probability distribution.
Four Step Schema. Step one is (through abduction) to model our dependent variable as a population. The population gives us the probability that a single score will take on certain values. Second, we draw a sample of many scores. This is not a simple process. It requires that we do research projects that often take years to complete. In statistics we summarize years of thought and effort with the simple phrase "draw a sample." Third, we analyze the data in the sample with some statistic, such as the mean. Finally, we find the sampling distribution of the statistic.
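The four steps can be sketched as one small Python function. This is a hypothetical illustration: it assumes a normal population with mu = 100 and sigma = 5, and instead of finding the sampling distribution mathematically (as this lecture does for the mean) it approximates it by repeating steps two and three many times.

```python
import random
import statistics

def sampling_distribution(draw_score, n, statistic, num_samples, rng):
    """Approximate the sampling distribution of `statistic` by simulation.

    draw_score  -- step 1: the population, as a function returning one score
    n           -- step 2: number of scores in each sample
    statistic   -- step 3: summary computed on each sample (e.g. the mean)
    Returns the list of statistic values, an empirical stand-in for step 4.
    """
    return [statistic([draw_score(rng) for _ in range(n)])
            for _ in range(num_samples)]

def normal_population(rng):
    """Step 1 (assumed for illustration): a normal population, mu = 100, sigma = 5."""
    return rng.gauss(100, 5)

rng = random.Random(5)  # fixed seed so the run is repeatable
means = sampling_distribution(normal_population, n=10, statistic=statistics.mean,
                              num_samples=5_000, rng=rng)
print(f"{len(means)} sample means; their average is {statistics.mean(means):.2f}")
```

Passing a different `statistic`, such as `statistics.median`, runs the same schema for a different statistic.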

Word Salad. This four step process often leads us to speak in confusing phrases that sound like word salad. For example, we need to find the mean of the sampling distribution of the mean, which might be shortened to "the mean of the mean." Of course, we just found out that the mean of the SDM is the mean (mu) of the population. But there we go; until you are used to these ideas, the previous sentence might sound like word salad.
We also might talk about the standard deviation of the mean (which is sometimes called the standard error of the mean). In this lecture we found out that the standard deviation of the mean is equal to the population standard deviation divided by the square root of the sample size (which is more word salad until you work with these ideas for a while).
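In symbols, the two word-salad sentences become a pair of short formulas. The notation (X-bar for the sample mean, with subscripts on mu and sigma for the SDM) is a common textbook convention rather than one introduced in this lecture; the last line plugs in the SDM Tool example values mu = 100, sigma = 5, and n = 10.

```latex
\mu_{\bar{X}} = \mu
\qquad\qquad
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}

\mu_{\bar{X}} = 100
\qquad\qquad
\sigma_{\bar{X}} = \frac{5}{\sqrt{10}} \approx 1.58
```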
The point is that it's OK to feel confused at this point. These are abstract and tricky concepts, and you need to work with them extensively using StatCenter tools and doing homework. You should be aiming for a time when these word salad phrases take on clear meaning.