Central Tendency
Web Page
©Copyright 1997, 2000 Tom
Malloy
This is the text of the
in-class lecture which accompanied the Authorware visual graphics
on this topic. You may print this text out and use it as a textbook.
Or you may read it online. In either case it is coordinated with
the online Authorware graphics.
Overview of Statistics.
Now we're going to go into statistics proper. We're going to divide
the world up into descriptive and inferential statistics.
We'll define Descriptive and Inferential statistics,
and I suggest you write the definitions down in the note outline you
printed out from the web. But the definitions will take on real
meaning only as we go through the course and gain experience with various
statistics in the two categories. Descriptive statistics
are mathematical formulas and functions which help us describe,
summarize and communicate the main characteristics of large amounts
of data. In contrast, Inferential statistics are mathematical
formulas and functions which help us make guesses and inferences
about the characteristics of whole populations. Right now just get
some sense that there's a division in statistics between two branches,
descriptive and inferential.
Basically descriptive stats are exactly what the word
"descriptive" says. They are ways of describing characteristics
of large sets of numbers. Similarly, inferential statistics are
procedures for making inferences.

Map of Statistics.
This map of statistics is also in your printed-out Notes, and on
the Authorware program. You can read it more easily in those two
sources. Notice on the map that the first big division is between
descriptive and inferential statistics. Later we will consider the
inferential branch, so I won't talk about it here.
Descriptive Statistics.
You can see that descriptive statistics is broken into three branches:
Central Tendency, Variability, and Measures of Linear Relationship.
Within each of those are several further subdivisions.
This lecture will address central tendency.
You can see that the three branches of central tendency are the
mean, the median, and the mode. After we finish central tendency,
later lectures will cover variability and measures of relationship.
We will study lots of statistics and the relationships
among them get complicated. You can refer back to this map as we
go through the course if you ever wonder how it all goes together.
It allows you to look back and see how what we're studying at any
time relates to the overall structure of statistics.
What is central
tendency? Suppose that we have a large group of numbers.
Typically these numbers will be dependent variable measurements
(e.g., reaction time, blood pressure, Beck Depression Inventory
Score, hours spent in exercise per day) on the individuals who participate
in our research. Central tendency refers to our intuition that there
is a center around which all these scores vary. Or it is the sense
we have that there is some number that typifies all the numbers.
In short it's the idea that there is a center to all the numbers
in our group. We might have ten numbers or a hundred or a thousand
or ten thousand numbers. For large data sets all those numbers are
quite a jumble of information to process. So one thing we'd like
to do is find a single number that typifies all those numbers, that
indicates what their center is.
Measures of Central Tendency.
We are going to mention three formulas that follow from three different
approaches to the intuitive idea that a set of numbers has a center.
These formulas are the Mean, the Median, and the Mode. Each of these
is a detailed way to specify the intuitive idea of central tendency.

The Mean. For the kind of statistics
that we will be learning, the most useful measure of central tendency
is the mean. So we'll start with the mean.
Formula.
Let's look at the formula for the mean which you can see on the
screen. We'll be symbolizing the mean by a capital M. But other
people and books might symbolize the mean by a capital X with a
line over it. That's more trouble for me to type, so I'll use M.
If you look at the formula for the mean, the Greek capital sigma
is a "sum sign" which indicates that we should add up
or sum whatever comes after it. What comes after the sum sign is
Xi which is our symbol for one score. The small subscript, i, attached
to the X indicates we are talking about the score of one individual.
But we have many individuals in our study. So the sum sign has an
i =1 below it and an n above it. This means we should start adding
with the score of the first person (i =1) and keep adding scores
until we get to the last person (n). The symbol "n" is
used throughout statistics to indicate the number (n) of scores
in a group. So the top of the formula for the mean simply tells
you to sum up all of the scores that you've got from the first to
the last (n), whether there are ten scores or ten thousand scores.
The formula then indicates that, once you sum up all the scores,
divide by n. Divide the sum by the number of scores which contribute
to the sum. You're probably familiar with this formula intuitively
already. We use it so frequently in our culture that it hardly has
to be taught, though it is more commonly called the average than
it is called the mean. [PRESS CONTINUE]
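Written out in symbols, the formula just described would look like the following. This is a reconstruction from the verbal description above, using M for the mean as in this lecture.

```latex
M = \frac{\sum_{i=1}^{n} X_i}{n}
```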
Example.
Here's a simple example; there are four scores so n = 4. If the
Xi's (or individual scores) are 4, 8, 8, and 8, then the sum of
the Xi's (adding them up from the first to the last) is 28. Plugging
28 into the formula for the mean (which is the sum of the Xi's over
n) we put 28 over 4. We get M = 28/4 = 7. There is nothing particularly
difficult about that. [PRESS CONTINUE]
Example.
Here's another example, one with a bit more data. If the numbers
are 12, 15, 11, 20, 13, 10, and 17, then adding them up gives us
98. There are n = 7 scores. So 98 divided by 7 is 14. Our mean (which
is the sum of Xi over n) is 14. Next we will go on and examine some
interesting characteristics of the mean.
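If you want to check these two examples on a computer, here is a minimal Python sketch. The function name mean and the variable names are my own choices for illustration; they are not part of the Authorware program.

```python
def mean(scores):
    """Return M: the sum of the scores divided by n, the number of scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("the mean of an empty set of scores is undefined")
    return sum(scores) / n

example_1 = [4, 8, 8, 8]
example_2 = [12, 15, 11, 20, 13, 10, 17]

# Print the sum, n, and the mean for each example.
print(sum(example_1), len(example_1), mean(example_1))  # 28 4 7.0
print(sum(example_2), len(example_2), mean(example_2))  # 98 7 14.0
```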
[PRESS UP TO MENU BUTTON]

[PRESS "SUM OF DEVIATIONS = 0"]
Sum of Deviations = 0. One
of the interesting characteristics of the mean is that the sum of
the deviations around the mean is equal to zero. Let's see what
that means using a very simple example. [PRESS "SIMPLE EXAMPLE"]
What is a deviation from the mean?
An individual score's deviation from the mean is that score minus
the mean. A deviation = Xi - M where Xi represents any single score.
[PRESS CONTINUE]
Let's
look at the simple example we've been using where the Xi's are 4,
8, 8, 8. The Mean is 7. Next to the scores we set up a second column
where we create all the deviations. The deviation of the first score
is the first score, 4, minus the mean, 7. As you can see 4 - 7 gives
a deviation of minus 3. The second score, 8, gives us a deviation
of 8 minus 7, which is plus 1. In that way we can create a deviation
for each of the scores.
You'll notice that when we add up these deviations the sum of Xi
minus M is equal to zero. And that's not just a coincidence peculiar
to this example; that's an algebraically true statement. The sum
of Xi minus the Mean is always zero. That was kind of a simple case
so let's look at another one that's a little more complicated. [PRESS
NEXT]
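One way to see why this must always hold uses only the formula for the mean. Since M is the sum of the scores over n, the sum of the scores equals n times M, and subtracting M from each of the n scores removes n times M in total.

```latex
\sum_{i=1}^{n} (X_i - M)
  = \sum_{i=1}^{n} X_i - nM
  = nM - nM
  = 0
```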
This
is another example we've been using, where the Xi's are 12, 15,
11, 20, 13, 10, and 17. For each of the individual scores we can
create a deviation, 12 minus 14 is -2, and so forth. The last score,
17 minus 14, has a deviation of +3. Again when we add up all of
those deviations we get a sum of zero. And that will always be the
case, other than possibly rounding error. You may be off by some
hundredths or thousandths, due to carrying decimals, but algebraically
the sum of the deviations around the mean is always zero.
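The two deviation tables can be reproduced with a short Python sketch. The function name deviations is my own label for illustration.

```python
def deviations(scores):
    """Return each score's deviation from the mean, Xi - M."""
    m = sum(scores) / len(scores)
    return [x - m for x in scores]

for scores in ([4, 8, 8, 8], [12, 15, 11, 20, 13, 10, 17]):
    devs = deviations(scores)
    # Show the column of deviations and their sum.
    print(devs, "sum =", sum(devs))
```

For both examples the printed sum is zero. With scores whose mean is not a tidy number, a computer can show a tiny nonzero sum for the same rounding reason mentioned above.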
[PRESS "UP TO MENU" BUTTON until you are at the Mean
Menu]

[PRESS "BALANCE BEAM"]
Balance Beam. Let's look at
the mean from the point of view of a beam balanced on a pivot point.
In everyday life this is like a balance scale or a teeter-totter.
[PRESS "PIVOT AT MEAN"]
We'll
use the simple example which we've worked with already where the
individual scores are 4, 8, 8, 8, and the mean of those four scores
is 7. Now we will represent the scores as blocks which all weigh
exactly the same. Put all the scores (represented as equal blocks)
on a teeter-totter or a balance beam. This teeter-totter runs along
a number line so you put one block at 4 because it represents the
score which has a value of 4. And you put three blocks stacked on
top of each other at 8, because there's three 8's. Then if you put
the pivot point right at the mean (7), the system will balance perfectly.
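The mean is the balance point at which the pull of the blocks on
one side exactly offsets the pull of the blocks on the other side, where
each block's pull depends on its distance from the pivot. Here one block
sits 3 units below the pivot and three blocks each sit 1 unit above it,
so each side totals 3. And that's
really the same thing as what we discussed previously--that
the sum of the deviations around the mean is equal to zero.
So in terms of physics the mean is the center of gravity, or balance
point.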
Notice that there is a way to move the pivot point up or down.
You can play with moving the pivot point above or below the mean.
For this discussion, we'll assume you move the pivot point up above
the mean to a higher value on the number line. [PRESS "NEXT"
ARROW which moves the pivot point up]

Were we to move our pivot point up above the mean, then the whole
teeter-totter would tilt down on the left side and up on the right
side. [PRESS THE "PREVIOUS" ARROW to move the pivot back
to the mean. PRESS AGAIN to move the pivot below the mean]
You'll
notice that when we move the pivot below the mean, then the teeter-totter
tilts the other way. On the graphics, I've made the movements rather
large, but even if you moved the pivot slightly to one side of the
mean or the other, the whole teeter-totter would tilt.
The mean is the unique point at which the weights of all the scores
will balance on a teeter-totter or on a balance beam. So that's
another interpretation of the mean as a measure of central tendency--it's
the center of a group of numbers in the sense of center of gravity
or balance point.
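The balance-beam idea can be sketched in Python by adding up each block's signed distance from the pivot. The function name net_turning_effect and the trial pivot positions 6 and 8 are assumptions for illustration; the graphics do not say exactly how far the pivot moves.

```python
def net_turning_effect(scores, pivot):
    """Sum of signed distances from the pivot, one equal-weight block per score.

    Positive: the right side outweighs the left and tilts down.
    Negative: the left side outweighs the right and tilts down.
    Zero: the beam balances.
    """
    return sum(x - pivot for x in scores)

scores = [4, 8, 8, 8]
for pivot in (6, 7, 8):  # below the mean, at the mean, above the mean
    print(pivot, net_turning_effect(scores, pivot))
# prints:
# 6 4
# 7 0
# 8 -4
```

Only the pivot at 7, the mean, gives zero. With the pivot above the mean the result is negative, which matches the beam tilting down on the left.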
[PRESS "UP TO MENU" until you get to the main Central
Tendency Menu]
[PRESS MEDIAN]
Median. Let's go ahead and
look at the median. In this particular class, we won't go into much
depth about the median. We will only define the basic concept of
the median and apply the concept to very simple examples. We will
also talk about the advantages and disadvantages of the median relative
to other measures of central tendency.
The median is the halfway point. The median is the point above
which half the scores reside, and below which half the scores reside.
To look at that in terms of an example, change to the next screen.

Example. Here's an example.
We have seven numbers. The first thing we have to do is line them
up along a number line.

As we line up the numbers in order, we find that there's two 10's;
so we just stack them on top of each other. Now what you want to
find is the point above which and below which there are an equal
number of numbers. Try to do that before going to the next screen.

On the next screen we can see that the median is 9 because there
are three scores below 9 and three scores above 9. Conceptually,
the median is the point above which and below which there are an
equal number of scores.
Now I've chosen the easiest case for calculating the median. There
are all kinds of difficult cases, but in this class, as far as the things
that we're going to do, we're not going to offer any more difficult
cases. I want you just to have the general idea that the median
is the center point and there are just as many scores below it as
there are above it.
There's more to know about the median. If you have an even number
of scores, then of course you're going to have to have a way to
find the median as a point between the two middle scores. Generally,
people just average those two middle scores, that is, add them up
and divide by two and put the median at the point that's half way
between the two middle scores. There are even more complicated cases
than that. But, as I said, we won't do anything complicated with
the median. Being able to do a simple example like we've done and
being able to understand the median as the point above and below
which there are an equal number of scores is sufficient. [PRESS
THE "UP TO MENU" BUTTON until you get to the main
central tendency menu]
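Here is a minimal Python sketch of the simple median procedure, covering the odd case and the averaging rule for an even number of scores. The seven scores on the median screen are not listed in this text, so the sketch borrows the two score sets from the mean examples instead; the function name median is my own.

```python
def median(scores):
    """Middle score of the ordered list; if n is even, average the two middle scores."""
    ordered = sorted(scores)  # line the scores up in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set of scores is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([12, 15, 11, 20, 13, 10, 17]))  # 13 (three scores below, three above)
print(median([4, 8, 8, 8]))                  # 8.0 (average of the two middle 8's)
```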
[PRESS "MODE"]
Definition of Mode. The third
measure of central tendency, which is the simplest of all, is the
mode. It is the most frequent of all scores.
Example.
Here we have the same set of scores we've been using. Let's put
them in order, just like we did with the median.
You'll
notice that there's only one of each of the other scores, but there are two 10's.
That makes 10 the mode because it occurs twice and everything else
occurs once. Ten is the most frequent of all the scores. So the
mode is the measure of central tendency in the sense of being the
score that occurs most often. That's certainly one way to describe
what's typical in a set of numbers. For the simple cases we will do,
there are no calculations to do to get the mode.
Now the mode does not have to be unique, you can have a bi-modal
or tri-modal set of numbers. That is, you can have two or three
different numbers that have the same or almost the same frequency.
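Counting frequencies is easy to sketch in Python. The function name modes is my own, and the second set of scores is a made-up bi-modal example that does not come from the Authorware screens. The sketch reports only exact ties for the highest frequency, not "almost the same" frequencies.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs with the highest frequency, in order."""
    counts = Counter(scores)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(score for score, count in counts.items() if count == top)

print(modes([4, 8, 8, 8]))        # [8]
print(modes([1, 2, 2, 5, 5, 9]))  # [2, 5]  (hypothetical bi-modal set)
```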
[PRESS "UP TO MENU" BUTTON until you get to the main central
tendency menu]
[PRESS "MEDIAN VS MEAN"]
Mean versus Median. Now let's
compare the mean and the median. The mean is the measure of central
tendency that we're going to use most in this class because it is
the basis of most of the advanced statistics that we will be developing.
For that reason the mean is important. But the median acts a
little differently than the mean; the two measures of central tendency
have different operating characteristics. So it's important to have
a basic idea of when it is appropriate to use the median versus the
mean.
On the screen you can see two cases, case one and case two. You'll
notice that case two has what's called an extreme score or outlier.
An outlier is a score that does not seem to fit with the other scores.
Notice that in case two we have 69. All other numbers are between 1 and
10, so 69 is just much larger than any of the others. In case one
we don't have this extreme score, we have numbers between 1 and
17, and they all seem to be in the same range.
One interesting property of the median is that if you do have extreme
scores the median is not affected by extreme scores. Notice that
the median, shown in blue, is the same for both cases. Extreme scores
don't make any difference in the value of the median. The median
is the point (in this case 9) at which there are an equal number
of numbers above and below. So it doesn't matter that we change
17 to 69. Making 17 into 69, creating an extreme score, doesn't affect
our calculation of the median. It is still 9. Sixty-nine is just
one more score above the median. So the median is not affected by
extreme scores.
The mean on the other hand is highly affected by extreme scores.
Notice that the value of the mean, shown in red, is affected by
the extreme score. In case one the mean is 8 and in case two the
mean is 15.4. We've almost doubled the mean by changing one score.
So the mean is extremely sensitive to extreme scores.
It may be a good thing or it may be a bad thing in a particular
conceptual context to have your measure of central tendency be sensitive
or insensitive to extreme scores. It just depends. What you should
know is that the median is not sensitive to extreme scores while
the mean is. They operate differently when there are extreme scores.
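The comparison can be reproduced with a short Python sketch. The exact seven scores on the screen are not listed in this text, so the values below are assumed: they were picked only to agree with what is reported (seven scores between 1 and 17, two 10's, a median of 9, and means of 8 and 15.4).

```python
from statistics import median

# Assumed scores, chosen to match the reported median and means.
case_one = [1, 3, 6, 9, 10, 10, 17]
case_two = [1, 3, 6, 9, 10, 10, 69]  # same scores with 17 changed to 69

for label, scores in (("case one", case_one), ("case two", case_two)):
    m = sum(scores) / len(scores)
    print(label, "median =", median(scores), "mean =", round(m, 1))
# prints:
# case one median = 9 mean = 8.0
# case two median = 9 mean = 15.4
```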
[PRESS "UP TO MENU" BUTTON until you get to the main central
tendency menu]
[PRESS "INCOME-$"]
Family Income. Let's look at
family income in dollars. And I took numbers from the local newspaper
in about 1996. You can see a frequency distribution of household
income in dollars in the United States. On the graph, the vertical
axis (Frequency) shows the number of families which earn various
levels of income.
You can see the number of families who earn a given amount of money.
There is a very low number (frequency) of families who earn very
low incomes, near zero. Then the frequency increases up to a peak,
the most frequent score, at 22,000 dollars. So the mode is $22,000.
The median is at $33,000 and the mean is $45,000. So depending on
how you want to represent central tendency you can say the typical
household in the United States earns $22,000 or you could say the
typical household earns $33,000 or you could say the typical household
earns $45,000. Now the reason the mean and the median are so different
is because of extremely rich people, billionaires and multi-millionaires.
These extreme scores will pull up the mean household income. But
the median is just the point above and below which there are an
equal number of household incomes. It doesn't matter to the median
that there are some billionaires and millionaires out in the extreme
pulling the average up. A billionaire is just one more household
above the median. So the median is not sensitive to the extreme
scores of high wage earners while the mean is.
And so you might notice if you're reading data in the newspaper,
whether they are reporting the median or the mean. It makes a difference
in your interpretation whether someone's reporting the mean or the
median. The thing is people often use phrases like "average
income" without specifying whether they mean the mean, the
median, or the mode (although, the word "average" most
often refers to the mean). You need to know what measure of central
tendency people are using to make sense of what their data means.
And of course if you want to know what kind of family income the
largest number of Americans experience, then you would want to know
the mode. Modal income in the United States was around $22,000 when
these data were collected. That's a very different impression of
central tendency than saying household income in the United States
is $45,000.
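To see how a few very large incomes can pull the three measures apart, here is a toy Python sketch. The newspaper's underlying data are not reproduced in this text, so the nine incomes below (in thousands of dollars) are invented values, chosen only so that their mode, median and mean come out at 22, 33 and 45.

```python
from collections import Counter
from statistics import median

# Invented toy incomes in thousands of dollars; not real data.
incomes_k = [15, 22, 22, 22, 33, 40, 45, 60, 146]

mode = Counter(incomes_k).most_common(1)[0][0]  # most frequent income
mean = sum(incomes_k) / len(incomes_k)

print(mode, median(incomes_k), mean)  # 22 33 45.0
```

The single income of 146 sits far above the rest. It raises the mean well above the median, while the median and the mode ignore how extreme it is.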