Central Tendency Web Page
This is the text of the in-class lecture which accompanied the Authorware visual graphics on this topic. You may print this text out and use it as a textbook. Or you may read it online. In either case it is coordinated with the online Authorware graphics.


Overview of Statistics. Now we're going to go into statistics proper. We're going to divide the world up into descriptive and inferential statistics.
We'll define Descriptive and Inferential statistics, and I suggest you write the definitions down in the note outline you printed out from the web. But the definitions will take on real meaning only as we go through the course and gain experience with various statistics in the two categories. Descriptive statistics are mathematical formulas and functions which help us describe, summarize and communicate the main characteristics of large amounts of data. In contrast, Inferential statistics are mathematical formulas and functions which help us make guesses and inferences about the characteristics of whole populations. Right now just get some sense that there's a division in statistics between two branches, descriptive and inferential.
Basically descriptive stats are exactly what the word "descriptive" says. They are ways of describing characteristics of large sets of numbers. Similarly, inferential statistics are procedures for making inferences.

Map of Statistics. This map of statistics is also in your printed-out Notes, and on the Authorware program. You can read it more easily in those two sources. Notice on the map that the first big division is between descriptive and inferential statistics. Later we will consider the inferential branch, so I won't talk about it here.
Descriptive Statistics. You can see that descriptive statistics is broken into three branches: Central Tendency, Variability, and Measures of Linear Relationship. Within each of those are several further subdivisions.
This lecture will address central tendency. You can see that the three branches of central tendency are the mean, the median, and the mode. After we finish central tendency, later lectures will cover variability and measures of relationship.
We will study lots of statistics and the relationships among them get complicated. You can refer back to this map as we go through the course if you ever wonder how it all goes together. It allows you to look back and see how what we're studying at any time relates to the overall structure of statistics.
What is central tendency? Suppose that we have a large group of numbers. Typically these numbers will be dependent variable measurements (e.g., reaction time, blood pressure, Beck Depression Inventory Score, hours spent in exercise per day) on the individuals who participate in our research. Central tendency refers to our intuition that there is a center around which all these scores vary. Or it is the sense we have that there is some number that typifies all the numbers. In short it's the idea that there is a center to all the numbers in our group. We might have ten numbers or a hundred or a thousand or ten thousand numbers. For large data sets all those numbers are quite a jumble of information to process. So one thing we'd like to do is find a single number that typifies all those numbers, that indicates what their center is.
Measures of Central Tendency. We are going to mention three formulas that follow from three different approaches to the intuitive idea that a set of numbers has a center. These formulas are the Mean, the Median, and the Mode. Each of these is a detailed way to specify the intuitive idea of central tendency.

The Mean. For the kind of statistics that we will be learning, the most useful measure of central tendency is the mean. So we'll start with the mean.
Formula. Let's look at the formula for the mean which you can see on the screen. We'll be symbolizing the mean by a capital M. But other people and books might symbolize the mean by a capital X with a line over it. That's more trouble for me to type, so I'll use M. If you look at the formula for the mean, the Greek capital sigma is a "sum sign" which indicates that we should add up or sum whatever comes after it. What comes after the sum sign is Xi which is our symbol for one score. The small subscript, i, attached to the X indicates we are talking about the score of one individual. But we have many individuals in our study. So the sum sign has an i =1 below it and an n above it. This means we should start adding with the score of the first person (i =1) and keep adding scores until we get to the last person (n). The symbol "n" is used throughout statistics to indicate the number (n) of scores in a group. So the top of the formula for the mean simply tells you to sum up all of the scores that you've got from the first to the last (n), whether there's ten scores or ten thousand scores. The formula then indicates that, once you sum up all the scores, divide by n. Divide the sum by the number of scores which contribute to the sum. You're probably familiar with this formula intuitively already. We use it so frequently in our culture that it hardly has to be taught, though it is more commonly called the average than it is called the mean. [PRESS CONTINUE]
Example. Here's a simple example; there are four scores so n = 4. If the Xi's (or individual scores) are 4, 8, 8, and 8, then the sum of the Xi's (adding them up from the first to the last) is 28. Plugging 28 into the formula for the mean (which is the sum of the Xi's over n) we put 28 over 4. We get M = 28/4 = 7. There is nothing particularly difficult about that. [PRESS CONTINUE]
Example. Here's another example, one with a bit more data. If the numbers are 12, 15, 11, 20, 13, 10, and 17, then adding them up gives us 98. There are n = 7 scores. So 98 divided by 7 is 14. Our mean (which is the sum of the Xi's over n) is 14. Next we will go on and examine some interesting characteristics of the mean.
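The formula for the mean can also be written as a few lines of code. This is only a minimal sketch in Python; the function name `mean` and the variable names are my own choices for illustration, not something from the Authorware program. It simply sums the scores and divides by n, using the two examples above.

```python
def mean(scores):
    """Return M: the sum of the scores divided by n, the number of scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("the mean of an empty set of scores is undefined")
    return sum(scores) / n


first_example = [4, 8, 8, 8]                   # sum is 28, n = 4
second_example = [12, 15, 11, 20, 13, 10, 17]  # sum is 98, n = 7

print(mean(first_example))   # 28 / 4
print(mean(second_example))  # 98 / 7
```

The two printed values are the means worked out by hand above, 7 and 14.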
[PRESS UP TO MENU BUTTON]
[PRESS "SUM OF DEVIATIONS = 0"]
Sum of Deviations = 0. One of the interesting characteristics of the mean is that the sum of the deviations around the mean is equal to zero. Let's see what that means using a very simple example. [PRESS "SIMPLE EXAMPLE"]
What is a deviation from the mean? An individual score's deviation from the mean is that score minus the mean. A deviation = Xi - M where Xi represents any single score. [PRESS CONTINUE]
Let's look at the simple example we've been using where the Xi's are 4, 8, 8, 8. The Mean is 7. Next to the scores we set up a second column where we create all the deviations. The deviation of the first score is the first score, 4, minus the mean, 7. As you can see 4 - 7 gives a deviation of minus 3. The second score, 8, gives us a deviation of 8 minus 7, which is plus 1. In that way we can create a deviation for each of the scores.
You'll notice that when we add up these deviations the sum of Xi minus M is equal to zero. And that's not just a coincidence peculiar to this example; that's an algebraically true statement. The sum of Xi minus the Mean is always zero. That was kind of a simple case so let's look at another one that's a little more complicated. [PRESS NEXT]
This is another example we've been using, where the Xi's are 12, 15, 11, 20, 13, 10, and 17. For each of the individual scores we can create a deviation, 12 minus 14 is -2, and so forth. The last score, 17 minus 14, has a deviation of +3. Again when we add up all of those deviations we get a sum of zero. And that will always be the case, other than possibly rounding error. You may be off by some hundredths or thousandths, due to carrying decimals, but algebraically the sum of the deviations around the mean is always zero.
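If you want to check the deviations column for yourself, here is a small sketch in Python. The function name `deviations` is just an illustrative choice. It builds Xi - M for each score and then adds the deviations up; the comparison uses a small tolerance because, as noted above, rounding can leave you off by a tiny amount with messier data.

```python
import math


def deviations(scores):
    """Return the list of deviations Xi - M, one for each score."""
    m = sum(scores) / len(scores)
    return [x - m for x in scores]


for scores in ([4, 8, 8, 8], [12, 15, 11, 20, 13, 10, 17]):
    devs = deviations(scores)
    total = sum(devs)
    print(scores, devs, total)
    # Algebraically the total is zero; allow for tiny rounding error.
    assert math.isclose(total, 0.0, abs_tol=1e-9)
```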
[PRESS "UP TO MENU" BUTTON until you are at the Mean Menu]
[PRESS "BALANCE BEAM"]
Balance Beam. Let's look at the mean from the point of view of a beam balanced on a pivot point. In everyday life this is like a balance scale or a teeter-totter. [PRESS "PIVOT AT MEAN"]
We'll use the simple example which we've worked with already where the individual scores are 4, 8, 8, 8, and the mean of those four scores is 7. Now we will represent the scores as blocks which all weigh exactly the same. Put all the scores (represented as equal blocks) on a teeter-totter or a balance beam. This teeter-totter runs along a number line so you put one block at 4 because it represents the score which has a value of 4. And you put three blocks stacked on top of each other at 8, because there are three 8's. Then if you put the pivot point right at the mean (7), the system will balance perfectly. The mean is the balance point around which all of the weight on one side is equal to all of the weight on the other side. And that's really the same thing as saying what we discussed previously--that the sum of the deviations around the mean is equal to zero.
So in terms of physics the mean is the center of gravity, or balance point.
Notice that there is a way to move the pivot point up or down. You can play with moving the pivot point above or below the mean. For this discussion, we'll assume you move the pivot point up above the mean to a higher value on the number line. [PRESS "NEXT" ARROW which moves the pivot point up]

Were we to move our pivot point up above the mean, then the whole teeter-totter would tilt down on the left side and up on the right side. [PRESS THE "PREVIOUS" ARROW to move the pivot back to the mean. PRESS AGAIN to move the pivot below the mean]
You'll notice that when we move the pivot below the mean, then the teeter-totter tilts the other way. On the graphics, I've made the movements rather large, but even if you moved the pivot slightly to one side of the mean or the other, the whole teeter-totter would tilt.
The mean is the unique point at which the weights of all the scores will balance on a teeter-totter or on a balance beam. So that's another interpretation of the mean as a measure of central tendency--it's the center of a group of numbers in the sense of center of gravity or balance point.
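The balance beam can be imitated with a little arithmetic. What follows is a rough sketch in Python, assuming every block weighs the same and treating each block's turning effect as its distance from the pivot (positive to the right, negative to the left). The names `net_turning_effect` and `tilt` are made up for this illustration.

```python
import math


def net_turning_effect(scores, pivot):
    """Sum of (score - pivot) over equal-weight blocks.

    Positive means the right side is heavier; negative means the left side is.
    """
    return sum(x - pivot for x in scores)


def tilt(scores, pivot):
    """Describe which way the beam tips for a given pivot position."""
    effect = net_turning_effect(scores, pivot)
    if math.isclose(effect, 0.0, abs_tol=1e-9):
        return "balanced"
    return "right side down" if effect > 0 else "left side down"


blocks = [4, 8, 8, 8]
print(tilt(blocks, 7))    # pivot at the mean
print(tilt(blocks, 7.5))  # pivot moved above the mean
print(tilt(blocks, 6.5))  # pivot moved below the mean
```

With the pivot at 7 the turning effects cancel, which is just the sum of deviations being zero. Moving the pivot above the mean tips the left side down, and moving it below the mean tips the beam the other way.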
[PRESS "UP TO MENU" until you get to the main Central Tendency Menu]
[PRESS MEDIAN]
Median. Let's go ahead and look at the median. In this particular class, we won't go into much depth about the median. We will only define the basic concept of the median and apply the concept to very simple examples. We will also talk about the advantages and disadvantages of the median relative to other measures of central tendency.
The median is the halfway point. The median is the point above which half the scores reside, and below which half the scores reside. To look at that in terms of an example, change to the next screen.

Example. Here's an example. We have seven numbers. The first thing we have to do is line them up along a number line.

As we line up the numbers in order, we find that there's two 10's; so we just stack them on top of each other. Now what you want to find is the point above which and below which there are an equal number of numbers. Try to do that before going to the next screen.

On the next screen we can see that the median is 9 because there are three scores below 9 and three scores above 9. Conceptually, the median is the point above which and below which there are an equal number of scores.
Now I've chosen the easiest case for calculating the median. There are all kinds of difficult cases, but in this class, as far as the things that we're going to do, we're not going to offer any more difficult cases. I want you just to have the general idea that the median is the center point and there are just as many scores below it as there are above it.
There's more to know about the median. If you have an even number of scores, then of course you're going to have to have a way to find the median as a point between the two middle scores. Generally, people just average those two middle scores, that is, add them up and divide by two and put the median at the point that's halfway between the two middle scores. There are even more complicated cases than that. But, as I said, we won't do anything complicated with the median. Being able to do a simple example like we've done and being able to understand the median as the point above and below which there are an equal number of scores is sufficient. [PRESS THE "UP TO MENU" BUTTON until you get to the main central tendency menu]
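The two simple cases described here, an odd number of scores and an even number of scores, can be sketched in Python as follows. This is only an illustration of the common convention of averaging the two middle scores; it does not handle the more complicated cases mentioned above. The scores are the seven from the earlier mean example, and the six-score set is simply that example with the last score left off, used here as an illustration of the even case.

```python
def median(scores):
    """Return the halfway point of the scores (simple cases only)."""
    ordered = sorted(scores)  # line the scores up in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set of scores is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the middle score has equally many scores on each side.
        return ordered[middle]
    # Even n: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_case = [12, 15, 11, 20, 13, 10, 17]   # seven scores
even_case = [12, 15, 11, 20, 13, 10]      # six scores (illustration only)

print(median(odd_case))
print(median(even_case))
```

For the seven scores the median is 13, with three scores below it and three above. For the six scores the two middle scores are 12 and 13, so the median is placed halfway between them at 12.5.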

[PRESS "MODE"]
Definition of Mode. The third measure of central tendency, which is the simplest of all, is the mode. It is the most frequent of all scores.
Example. Here we have the same set of scores we've been using. Let's put them in order, just like we did with the median.
You'll notice that there's only one of each score, but there are two 10's. That makes 10 the mode because it occurs twice and everything else occurs once. Ten is the most frequent of all the scores. So the mode is the measure of central tendency in the sense of being the score that occurs most often. That's certainly one way to describe what's typical in a set of numbers. For the simple cases we will do, there are no calculations to do to get the mode.
Now the mode does not have to be unique. You can have a bi-modal or tri-modal set of numbers. That is, you can have two or three different numbers that have the same or almost the same frequency. [PRESS "UP TO MENU" BUTTON until you get to the main central tendency menu]
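Finding the mode is just counting. Here is a minimal sketch in Python; the function name `modes` is an illustrative choice, and it returns a list so that a bi-modal or tri-modal set can report every mode. Note one simplification: it treats scores as modes only when their frequencies tie exactly, not when they are "almost the same". The bi-modal scores below are made up purely for illustration.

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs with the highest frequency, in order."""
    if not scores:
        raise ValueError("the mode of an empty set of scores is undefined")
    counts = Counter(scores)            # score -> how many times it occurs
    highest = max(counts.values())      # the largest frequency
    return sorted(score for score, count in counts.items() if count == highest)


one_mode = [4, 8, 8, 8]        # the simple example used for the mean
two_modes = [2, 2, 5, 7, 7]    # hypothetical bi-modal scores (assumption)

print(modes(one_mode))
print(modes(two_modes))
```

The first set has the single mode 8. In the second set both 2 and 7 occur twice, so both are reported.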

[PRESS "MEDIAN VS MEAN"]
Mean versus Median. Now let's compare the mean and the median. The mean is the measure of central tendency that we're going to use most in this class because it is the basis of most of the advanced statistics that we will be developing. For that reason the mean is important. But the median acts a little differently from the mean; the two measures of central tendency have different operating characteristics. So it's important to have a basic idea of when it is appropriate to use the median versus the mean.
On the screen you can see two cases, case one and case two. You'll notice that case two has what's called an extreme score or outlier. An outlier is a score that does not seem to fit with the other scores. Notice that in case two we have 69. All the other numbers are between 1 and 10, so 69 is much larger than any of the others. In case one we don't have this extreme score; we have numbers between 1 and 17, and they all seem to be in the same range.
One interesting property of the median is that it is not affected by extreme scores. Notice that the median, shown in blue, is the same for both cases. Extreme scores don't make any difference in the value of the median. The median is the point (in this case 9) at which there are an equal number of numbers above and below. So it doesn't matter that we change 17 to 69. Making 17 into 69, creating an extreme score, doesn't affect our calculation of the median. It is still 9. Sixty-nine is just one more score above the median. So the median is not affected by extreme scores.
The mean on the other hand is highly affected by extreme scores. Notice that the value of the mean, shown in red, is affected by the extreme score. In case one the mean is 8 and in case two the mean is 15.4. We've almost doubled the mean by changing one score. So the mean is extremely sensitive to extreme scores.
It may be a good thing or it may be a bad thing in a particular conceptual context to have your measure of central tendency be sensitive or insensitive to extreme scores. It just depends. What you should know is that the median is not sensitive to extreme scores while the mean is. They operate differently when there are extreme scores. [PRESS "UP TO MENU" BUTTON until you get to the main central tendency menu]
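You can see this difference in a few lines of Python using the standard `statistics` module. Be aware that the exact scores on the screen are not reproduced in this text, so the two lists below are an assumption: they are hypothetical scores chosen only to agree with what is stated here (seven scores running from 1 to 17, two 10's, a median of 9, a mean of 8, and 17 replaced by 69 in case two).

```python
from statistics import mean, median

# ASSUMPTION: hypothetical scores, chosen to match the stated mean, median,
# and range; the three lowest values are not given in the lecture text.
case_one = [1, 3, 6, 9, 10, 10, 17]
case_two = [1, 3, 6, 9, 10, 10, 69]   # same scores, with 17 changed to 69

for label, scores in (("case one", case_one), ("case two", case_two)):
    print(f"{label}: mean = {mean(scores):.1f}, median = {median(scores)}")
```

Changing the one score moves the mean from 8 to about 15.4, while the median stays at 9 in both cases.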

[PRESS "INCOME-$"]
Family Income. Let's look at family income in dollars. I took these numbers from the local newspaper in about 1996. You can see a frequency distribution of household income in dollars in the United States. On the graph, the vertical axis (Frequency) shows the number of families which earn various levels of income.
You can see the number of families who earn a given amount of money. There is a very low number (frequency) of families who earn very low incomes, near zero. Then the frequency increases up to a peak, the most frequent score, at $22,000. So the mode is $22,000. The median is at $33,000 and the mean is $45,000. So depending on how you want to represent central tendency you can say the typical household in the United States earns $22,000 or you could say the typical household earns $33,000 or you could say the typical household earns $45,000. Now the reason the mean and the median are so different is because of extremely rich people, billionaires and multi-millionaires. These extreme scores will pull up the mean household income. But the median is just the point above and below which there are an equal number of household incomes. It doesn't matter to the median that there are some billionaires and millionaires out in the extreme pulling the average up. A billionaire is just one more household above the median. So the median is not sensitive to the extreme scores of high wage earners while the mean is.
And so you might notice, if you're reading data in the newspaper, whether they are reporting the median or the mean. It makes a difference in your interpretation whether someone's reporting the mean or the median. The thing is, people often use phrases like "average income" without specifying whether they mean the mean, the median, or the mode (although the word "average" most often refers to the mean). You need to know what measure of central tendency people are using to make sense of what their data means.
And of course if you want to know what kind of family income the largest number of Americans experience, then you would want to know the mode. Modal income in the United States was around $22,000 when these data were collected. That's a very different impression of central tendency than saying household income in the United States is $45,000.