Interface to Science Web Page
©Copyright 1997, 2000 Tom Malloy
This is the text of the
in-class lecture which accompanied the Authorware visual graphics
on this topic. You may print this text out and use it as a textbook.
Or you may read it online. In either case it is coordinated with
the online Authorware teaching program.
Topic Locator Map

This map allows you to:
- Jump directly to a topic which interests you.
- Co-ordinate the dynamic visual Authorware presentations with the
  corresponding text available on this web page.
1. To find a topic which interests you: Look at the map of menus
above. Choose a menu that interests you. Notice that the menu buttons
have topics printed on them. Click on any button (topic) on the
menu; you will jump directly to the text that corresponds to the
topic printed on the button.
2. To coordinate this web page with Authorware presentations: The
corresponding Authorware program should already be open. Go to the
menu of your choice in the Authorware program and click any
button which interests you. Then on the topic locator map above
click on the same button on the same menu; you will
jump to the text that corresponds to the Authorware presentation.
End of Topic Locator Map
In this lecture we are
going to introduce the relationship between science and statistics.
We will show how scientific research generates the numbers which
will then be used in statistical analysis. This means we will have
to discuss important scientific ideas that are often the focus of
research methods courses. We will also introduce scientific jargon
which we will use throughout the remainder of the course.


Back to Menu Locator
Map
In research anything
which varies is called a variable. As you can see on the list, psychologists
might be interested in memory, aggression, happiness, and mental
health, among many other variables. An economist might be interested
in profit, a physiologist in heart rate, an engineer in highway
safety. Using memory as an example, memory varies within a person
depending on health, age, and the type of material being remembered.
It also varies from one person to another. So memory is a variable.


Abstractions versus
Operations. When we talk about variables conversationally we
do so abstractly. For example if we say "Highway safety improved
recently," we have not defined what we mean by "highway
safety." Highway safety is left as an abstraction, such as
truth, justice, and the American way. Without detailed definitions,
highway safety could mean almost anything. Moreover, what do we
mean by improve and by recent?

An operational definition
of a variable is a set of detailed and public actions which specify
a variable. In the current graphic, the abstract variable "highway
safety" is given two different operational definitions so that
the statement "Highway safety improved recently" begins
to take on specific scientific meaning.
Operational Definition
A. In operational definition A, highway safety is defined as
the total number of crashes in a calendar year as reported in the
Utah Crash Summary (1995) as prepared by Utah Department of Public
Safety. Notice that the highway safety variable is no longer an
abstraction; it is highly specific. What is more important than
the specificity is that we now have given a set of procedures (or
operations) which any other scientist can repeat to verify and to
replicate our findings. Scientists simply have to contact the Utah
Department of Public Safety and get a copy of the crash report,
look at the table for total crashes in a year and find the data
for those years they are interested in. That's what we mean by an
operational definition.
So our operational definition
has done more than be specific. It has defined a procedure whereby
anyone else can also get data.
Operational Definition
B. In the second operational definition we have defined highway
safety as "the percent of alcohol-related deaths as recorded
in the Utah Crash Summary." Notice that this is a very different
operational definition of highway safety, one that focuses on the
issue of driving under the influence of alcohol.
So the abstraction "highway
safety" can mean as many things as there are operations for
defining it. It is essential for scientists to know the operational
definitions of abstractions in order to evaluate the quality of
research and to be able to reproduce the research.
To complete our operational
definition of the statement "Highway safety improved recently"
we can operationalize "improve" and "recent."
We do this on the next series of graphics.



So the statement "The
total number of crashes in Utah dropped from 59,272 in 1994 to 57,644
in 1995" operationalizes the abstract statement "Highway
safety improved recently."
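The comparison in that statement is just a subtraction and a percent change. Here is a minimal Python sketch of it, using only the two totals quoted above. The variable names are illustrative, and "improved" is taken to mean simply that the total went down.

```python
# Total crashes in Utah, as quoted in the lecture text.
crashes_1994 = 59_272
crashes_1995 = 57_644

# "Improve" is operationalized as a drop in total crashes
# from one calendar year to the next.
change = crashes_1995 - crashes_1994
percent_change = 100 * change / crashes_1994

improved = change < 0
print(f"Change: {change} crashes ({percent_change:.2f}%)")
print("Highway safety improved (definition A):", improved)
```

A different operational definition of "improve" (say, a drop of at least some fixed percent) would only change the line that sets `improved`.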
Summary. We have
introduced the distinction between speaking of variables abstractly
and speaking of them operationally. In normal conversation we typically
speak abstractly. But variables like memory, highway safety, and
aggression must be specified by operational definitions to allow
the evaluation and replication of research.
Let's go on to the next
topic.

Now we're going to move
on to scientific hypotheses, independent variables and
dependent variables because these will be important when
we talk about statistics.
Scientific hypotheses.
A scientific hypothesis simply proposes a relationship among or
between variables. We will be interested in three general kinds
of relationships among variables: Causal, Predictive, and Correlational.

Causal Scientific
Hypotheses. The first general kind of scientific hypothesis
which interests us is one in which one variable (Independent Variable)
is proposed to cause changes in a second variable (Dependent Variable).
We often shorten Independent Variable to IV and Dependent Variable
to DV. So the general form of the causal scientific hypothesis is
that the IV causes changes in the DV. [In terms of the discussion
of operational definitions, these DV's are our measurement operations.]
Example. As a
simple example, suppose that a scientist proposes that a pill she
has recently developed will decrease blood pressure. The independent
variable (IV) is the pill (or lack of it). The dependent variable
(DV) is blood pressure. The pill is hypothesized to cause decreases
in blood pressure. The independent variable is defined as the causal
agent, in this case the pill. The dependent variable is the variable
affected by the causal agent; in this case the DV is blood pressure.
A second perspective
on IV and DV. Suppose that she runs a simple study. She randomly
assigns volunteers to two groups. One group gets the pill. The other
group gets a placebo (pill without the active ingredient). A second
way to define the IV is that it's the thing that the experimenter
actively manipulates in the experiment. In this case she actively
manipulates for each volunteer whether he or she gets the real pill
or the placebo. Notice that the IV does indeed vary: receiving pill
or placebo varies from volunteer to volunteer. A second way to define
the DV is that it is what we measure about the participants in the
research. In this example what we measure is blood pressure.
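Random assignment to the two groups can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure of any actual study. The volunteer IDs, the function name `assign_groups`, and the even split are all assumptions made for the example.

```python
import random

def assign_groups(volunteers, seed=None):
    """Randomly split volunteers into a pill group and a placebo group."""
    rng = random.Random(seed)   # a seed makes the shuffle repeatable
    shuffled = list(volunteers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The IV is which group a volunteer lands in: pill or placebo.
    return {"pill": shuffled[:half], "placebo": shuffled[half:]}

# Hypothetical volunteer IDs, invented for this sketch.
volunteers = [f"volunteer_{i}" for i in range(1, 11)]
groups = assign_groups(volunteers, seed=1)

for condition, members in groups.items():
    print(condition, members)
```

The experimenter manipulates the IV by deciding, through the shuffle, who gets which treatment. The DV (blood pressure) would then be measured on every volunteer in both groups.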

Predictive Scientific
Hypotheses. The second general category of scientific hypotheses
important to us in statistics are predictions. For example, we can
predict the heights of fathers if we know the heights of sons. Or
we can predict the number of health problems people have later in
life from the amount of smoking they do in early life.
One variable is called
the predictor variable, the other is called the criterion variable.
Sometimes dependent variable is used as a synonym for criterion
variable. In one of our examples the predictor variable is the height
of sons and the criterion (or dependent) variable is the height
of fathers. In our other example, the predictor is amount of
smoking in early life and the criterion (or dependent) variable
is number of health problems in later life.
Prediction may or may
not imply causality. Surely no one thinks that the heights of sons
cause the heights of their fathers. But you can predict from the
heights of sons to the heights of fathers. On the other hand, smoking
has been identified as a causal agent in many health problems. So
the ability to predict something may or may not involve causality.
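To make the idea of prediction concrete, here is a rough Python sketch that fits a straight line by ordinary least squares and uses it to predict a father's height from a son's height. The lecture has not introduced any particular prediction method, so the least-squares line is an assumption chosen for illustration. The heights are invented numbers, not data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical heights in inches, invented for illustration only.
son_heights = [66, 68, 70, 72, 74]       # predictor variable
father_heights = [67, 68, 69, 71, 72]    # criterion variable

slope, intercept = fit_line(son_heights, father_heights)

# Predict the criterion (father's height) for a son who is 71 inches tall.
predicted_father = slope * 71 + intercept
print(f"Predicted father's height: {predicted_father:.2f} inches")
```

Nothing in the arithmetic says that sons' heights cause fathers' heights. The line only describes how one variable can be used to predict the other.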

Correlational Scientific
Hypotheses. We can have scientific hypotheses which simply propose
that there will be some relationship between two variables. These
are correlational hypotheses. For example, we might propose that
height is associated with weight. We don't propose that one causes
the other nor do we make a prediction from one to the other.
In this case we might call both variables dependent variables.
Summary. So we
have three degrees of scientific hypotheses which will be important
in choosing statistics later in the course. Right now we have just
introduced the ideas and some jargon that goes with them. 1) Causal
hypothesis: the independent variable causes changes in the dependent
variable 2) Predictive hypotheses: one variable (the predictor)
can be used to predict the other variable (criterion or dependent
variable). 3) Correlational hypotheses: There is an association
between two variables. We will get more explicit and detailed about
these types of hypotheses as we go along in the course.

We will now discuss how
the numbers (data) which we analyze in statistics come from dependent
variables in a research project.

Dependent Variables
generate data.
Dependent Variables are measurement operations. When we measure
something we get one or more numbers. For example, when we measure
someone's blood pressure we get two numbers (systolic pressure and
diastolic pressure). The systolic pressure is the pressure of blood
in the vessels as the heart beats. The diastolic pressure is the
pressure of the blood between heartbeats. The numbers are usually
written like a fraction with the systolic above or to the left.
A normal blood pressure is around 120/80 mm Hg (millimeters of mercury).
Both numbers count. Your blood pressure is high if the systolic
pressure is 140 or above, or the diastolic pressure is 90 or above,
or both are high.
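The rule just stated turns a pair of measurements into a yes-or-no classification, which is itself a small measurement operation. Here is a minimal Python sketch of it. The function name is made up, and the cutoffs are the ones given in this text.

```python
def is_high_blood_pressure(systolic, diastolic):
    """Apply the rule from the text: high if systolic is 140 or above,
    or diastolic is 90 or above, or both."""
    return systolic >= 140 or diastolic >= 90

# A reading is written systolic/diastolic, e.g. 120/80 mm Hg.
for reading in [(120, 80), (145, 85), (130, 92), (150, 95)]:
    print(reading, "high" if is_high_blood_pressure(*reading) else "not high")
```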

The Utah Department of
Public Safety has specific operations for measuring the number of
crashes on Utah highways. These numbers are often used as dependent
variables. For example we can look at the effect of an IV (intersections
with stoplights versus intersections with stop signs) to see if
stop lights cause a reduction in the number of crashes (DV).

Theoretical Construct
Validity. Scientists build theories around ideas. In the jargon
of science these ideas are called constructs. Construct validity
refers to how well the ideas or constructs are thought out and designed.
One part of construct validity refers to quality of the operational
definitions of variables.

Sometimes our operational
definitions, in fact, are poor.

Example of a poor
operational definition. Let's say we operationally define memory,
which of course is an abstract variable. Memory is of interest to
people who study psychology. Let's operationally define memory as
the answer to the question "How many books have you read in
the last month?" Try this measure of memory on yourself right
now. You have ten seconds to count all of the books you read in
the last 30 days. So please do that.
In class I measure about
10 seconds for students. If you are working online, I assume you
gave yourself about 10 seconds. You should have some number in
mind. Suppose that I'm going to measure how good memory is by this
operational definition. I will say that a person with a higher number
has a better memory than a person with a lower number.

That's a poor operational
definition of memory because it confounds your reading habits
with your memory. One person may read lots of books and another
person very few books. In reality they both may have equal memories.
But our operational definition confuses the actual number of books
a person reads with their memory for the last 30 days. So the person
who reads more books gets a higher score and is said to have a better
memory.
"Confound"
is jargon used in science. It simply means open to multiple interpretations.
An operational definition that can be interpreted to mean two different
ideas is said to confound those ideas. Our operational definition
of memory confounded memory with reading habits.

So to wrap this topic
up, DV's are measurement operations that generate the numbers we
run through statistical formulas. But the statistical formulas don't
know if the DV measurement operations are high quality or very poor.
The statistics just process numbers. They can't think. They can't
know if the numbers came from an elegant operation or a stupid operation.
It is the consumer of research results who must be careful to notice
what the operational definitions of the DV's are. Then the consumer
must think critically and decide carefully if they are sensible
operations or not.
Highway Safety Again.
Highway safety is an abstract variable. Let's say our operational
definition, which is fairly similar to the one we used previously,
is the total number of fatalities in one year on Utah roads, as
measured and reported by the Utah Department of Public Safety in
their annual Utah Crash Summary. The question is "Is this a
good or poor operational definition of highway safety?" It
has a certain appeal to it, but if we use it we will find that in
1943 there was a total of 103 fatalities and in 1995 there was a
total of 325. Are our roads three times as dangerous in 1995 as
in 1943?
Total fatalities confounds
highway safety with the number of vehicle miles. There weren't very
many vehicles, comparatively speaking, on our highways in 1943.
We could tune up our operational definition of highway safety and
say highway safety is the death rate per 100 million vehicle miles.
This operational definition is, in fact, the one which is typically
used by people who study highway safety. So in 1943 the death rate
per 100 million miles was 7.4 and in 1995 it was 1.74. By this operational
definition, the highways are safer right now than they were in 1943.
By the previous operational definition, the highways are more dangerous
now than they were in 1943. So how you operationally define your
variables can determine the results you get and the conclusion you
make.
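The reversal can be checked with a short Python sketch that uses only the four figures quoted above. The implied vehicle miles are back-calculated from those figures for illustration; they are not reported in this text.

```python
# Figures quoted in the lecture text.
fatalities = {1943: 103, 1995: 325}
rate_per_100m_miles = {1943: 7.4, 1995: 1.74}

# Definition 1: total fatalities (a lower total counts as safer).
safer_by_total = min(fatalities, key=fatalities.get)

# Definition 2: deaths per 100 million vehicle miles (lower is safer).
safer_by_rate = min(rate_per_100m_miles, key=rate_per_100m_miles.get)

# Vehicle miles implied by the quoted figures (back-calculated here):
# miles = fatalities / rate * 100 million.
implied_miles = {
    year: fatalities[year] / rate_per_100m_miles[year] * 100_000_000
    for year in fatalities
}

print("Safer year by total fatalities:", safer_by_total)
print("Safer year by death rate:      ", safer_by_rate)
for year, miles in implied_miles.items():
    print(f"{year}: about {miles / 1e9:.1f} billion vehicle miles")
```

The same two years come out in opposite order depending on which operational definition is used, because far more miles were driven in 1995.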
Rates. People
are inclined to use rates (like fatalities per 100 million miles)
in their operational definitions rather than totals. A rate (so
much of X per so many units of Y) adjusts for size of population,
amount of use and other important considerations.
An important point is
that whatever conclusions you draw, they could be reversed if you
used a different operational definition. One operational definition
of highway safety says 1943 is safer and the other says that 1995
is safer. Consequently, scientists think and argue a lot about the
operational definition of variables. So operationalizing variables
is very important.
This discussion isn't
technically part of statistics, but I want you to realize that statistics
as a whole just runs numbers through standard formulas. We'll be
doing a lot of that. But what those numbers are and what they mean
is something else. The evaluation of the measurement operations
is the responsibility first of the scientist doing the research
and second of the person reading or otherwise consuming that research.
It's your responsibility
as an educated consumer of information to challenge the measurement
operations and not just believe the statistical results. That's
fairly hard to do because newspaper articles and TV news stories
seldom report enough about the research to make an intelligent judgment
about the operational definitions. At least you can realize that
the numbers and conclusions came from some specific set of measurement
procedures (operations) and that those operations may be good or
poor.


We are connecting scientific
activities to statistical activities. This gives us a big picture
overview of how all the statistics we will be learning are conceptually
connected to both science and nature. Our next step in this big
picture is to define what we mean by abduction and random
variables.

Abduction. As
you can see by the graphic, knowledge about the structure and function
of computers has been applied to human mental processes. Scientists
have taken knowledge from computer science and applied it to psychology.
The resulting paradigm is called human information processing. This
process of moving knowledge structures from one area or discipline
to another is called abduction. When abduction is formal it is called
modeling.

Along the bottom of the
next graphic, you can see arrows indicating two-way transactions
between processes in nature, processes in science and statistical
models.
In general, scientists
take the infinite variety of natural processes and model them with
scientific ideas. Then scientific ideas are themselves modeled by
ideas from probability and statistics. It is worthwhile to notice
two steps to this general abductive process. 1) Concepts of probability
and statistics are applied to scientific concepts. 2) Scientific
concepts such as measurement operations are applied to nature. We
move back and forth from processes of nature through scientific
processes to statistical models.
Scientific Processes:
In the column for scientific processes you see listed, as examples,
scientific hypotheses and measurement operations (DVs). We've talked
about both of those in this lecture. Under scientific hypotheses
is listed the plausible competing hypothesis (PCH) of chance. We've
not talked about that yet but we will make a great deal of it later
when we talk about hypothesis testing. So for the moment we are
only foreshadowing how ideas will be organized later in the class.
Statistical Models.
In the column for statistical models you see statistical hypotheses
(Ho and H1) and Random Variables. We will talk about statistical
hypotheses later in the course. For now we want to talk about the
idea of random variables and how they are used to model measurement
operations (DVs).

Next we will look at
two examples of abduction. We will examine how measurement operations
are modeled by probability distributions in the case of the roll
of a die and in the case of the normal probability distribution.


The roll of a single
die. Suppose we roll a die and wait until it comes to rest.
As simple as it is, the roll of a die is a complex process and we
could focus on an infinite number of different types of measurement
operations about it. We could count the number of times it bounces,
we could describe the rhythm of sounds it makes, we could measure
the amount of time it takes to come to rest, and with sufficiently
sophisticated instruments we could even count the number of molecules
which break off it as it bounces along. The number of things we
could measure is only limited by our imagination.
The actual measurement
operation which is commonly used in our culture for the roll of
a die is to count the number of dots facing upward when the die
comes to rest. But it should be clear that this measurement
operation is not unique; it's just the one which we use in this
culture when we play with dice.
First: Nature to Science.
The main point I want to make here is that DV operations reduce
some kind of infinite process in nature to a single quantitative
aspect. As we've discussed earlier, the DV operations may or may
not make good sense. When we are playing a board game, counting
the number of dots facing up when the die comes to rest makes good
sense.

Second: Science to
Probability. Once we've got our scientific measurement operations
the next thing we do is model those operations with some kind of
probability distribution. This probability distribution is what
is often called a random variable.
In the Basic Probability
Lecture, we have already discussed how we think about the roll of
a die. Since a die is a cube it has six sides. So the number of
dots can vary from 1 through 6. So one thing that should be clear
is that the result of a die roll is a variable. With a fair die,
we assume that all six sides of the cube have an equal chance of
facing up. So there is a 1/6 chance the result will be a one. There
is also a 1/6 chance the result will be a two, and so forth. The
results of a die roll vary from 1 to 6 and the probability of each
result is 1/6.
What we have just done
is model our DV operation (counting the number of dots) in terms
of probability.
Probability Distribution.
The Authorware visual screen also shows that we can summarize all
this probability discussion with a graph. The graph shows along
its horizontal axis that the number of dots varies from 1 to 6.
The vertical axis is probability. The bars above the values of the
die roll (1 through 6) show that the probability of each of them
is the same: 1/6. This graph is called a probability distribution
because it shows how the probability is distributed over the values
of the variable. Later in the class we will sometimes call a probability
distribution a population.
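The same probability distribution can be written down as a small table in Python. This sketch assumes a fair six-sided die, as in the text. The text bar chart is only a rough stand-in for the Authorware graph.

```python
from fractions import Fraction

# Values of the variable: the number of dots facing up.
faces = range(1, 7)

# A fair die: every face gets the same probability, 1/6.
distribution = {face: Fraction(1, 6) for face in faces}

# The probabilities over all values must add up to 1.
total_probability = sum(distribution.values())

# Rough text version of the graph: one bar per face, all the same height.
for face, p in distribution.items():
    print(f"{face} dots: {'#' * 10}  probability {p}")
print("Total probability:", total_probability)
```

Using `Fraction` keeps each probability as exactly 1/6 rather than a rounded decimal.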
Random Variables.
The idea of random variables is not really different from what we
just said; it is another way of thinking and speaking about probability
distributions. It is also more jargon. When we model a variable
with a probability distribution we call the result a random variable.
By way of contrast, we
can speak of deterministic variables. In an algebraic equation (say
Y = 2X), when I give you a specific value of X you can know a specific
value of Y. In our example, if X = 5, then Y must be equal to 10.
The result is determined.
A random variable is
a variable whose values occur probabilistically. When you roll a
die you don't know what value it will give you. You can only make
probability statements about the value. You can say the probability
of a 3 is 1/6 or that the probability of an even number is 1/2.
The value that the random variable takes on is not determined. It
is probabilistic.
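The contrast between a deterministic variable and a random variable can be sketched in Python. The function names are invented for this illustration, and a fair six-sided die is assumed.

```python
import random
from fractions import Fraction

def deterministic_y(x):
    """Y = 2X: a given X always produces the same Y."""
    return 2 * x

def roll_die(rng=random):
    """One roll of a fair die: the value is not known in advance."""
    return rng.randint(1, 6)

def probability(event, faces=range(1, 7)):
    """Probability of an event on a fair die: favorable faces / all faces."""
    favorable = sum(1 for face in faces if event(face))
    return Fraction(favorable, len(faces))

print("Y when X = 5:", deterministic_y(5))
print("Three rolls:", [roll_die() for _ in range(3)])

p_three = probability(lambda face: face == 3)
p_even = probability(lambda face: face % 2 == 0)
print("P(3) =", p_three)
print("P(even) =", p_even)
```

Running the script twice gives the same Y but usually different rolls. Only the probability statements about the rolls stay fixed.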
Synonyms. For
our purposes in this course we will use the terms random variable,
probability distribution, and population interchangeably.

Overview. We have
broken the abductive process down into two steps. First we use our
Dependent Variable measurement operations (counting the number of
dots) to reduce an infinite process in nature (a die roll) to a
single value. Second we model the number of dots as a probability
distribution. We call this a random variable.
Abduction.
In studying the universe, one thing scientists do is measure things;
they reduce them to numbers using dependent variable measurement
operations. This reduction of what we are interested in to numbers
is what leads to the use of statistics. In statistics we model the
numbers we get as random variables or, in different words, as probability
distributions. These probability distributions we will eventually
call populations. This process of moving sideways across ideas and
models from nature to measurement to probability distribution is
called abduction.
By way of contrast with
abduction, induction is to infer upward from information
to a higher order principle. Deduction is to infer downward
from a principle to lower order consequences. Abduction is neither
upward nor downward. It is the sideways movement from one way of
conceptualizing to another way of conceptualizing on the same level.
For statistics we take the scientific idea of measurement and model
it as a probability distribution.

Now we will go on and
begin to think of how processes in nature might be measured and
modeled by an important probability distribution called the normal
distribution. This will just be a brief introduction. Later we will
devote a whole lecture to the normal distribution and it will be
used throughout the course.

Nature to numbers.
Let's say we have a complex phenomenon in nature such as a person.
Let's also say that we have some kind of DV measurement operation
called an IQ test. We take a particular person and put that person
through the measurement operations. That is, we measure the person
with the IQ test. The measurement operations give us a number, called
an IQ. Say that in our example the person's IQ is equal to 103.
Numbers to probabilities.
It is very common in statistics to assume that our DV's can be modeled
as a particular probability distribution called the normal distribution.
We will learn a great deal about this distribution later.

Very brief introduction
to the Normal Distribution. The normal distribution is bell-shaped;
in fact, it is often known as the bell curve. The current graphic
shows the general shape of a normal distribution. As we said, the
normal distribution is a very common way to model dependent variable
measurement operations.
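For readers who want to see the bell shape in numbers, here is a brief Python sketch using the standard library's `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are assumptions made for this example, a common convention for IQ scales that this lecture does not state. The score of 103 is the example IQ from the text.

```python
from statistics import NormalDist

# Assumption (not from the lecture): IQ modeled with mean 100 and SD 15.
iq_model = NormalDist(mu=100, sigma=15)

score = 103  # the example IQ from the text

# Height of the bell curve at the score, and the share of the
# distribution at or below that score.
density_at_score = iq_model.pdf(score)
share_at_or_below = iq_model.cdf(score)
print(f"Density at {score}: {density_at_score:.4f}")
print(f"Share at or below {score}: {share_at_or_below:.3f}")

# Crude text picture of the bell shape: tallest in the middle,
# falling off symmetrically on both sides.
for x in range(55, 146, 15):
    bar = "#" * round(iq_model.pdf(x) * 1500)
    print(f"{x:>3} {bar}")
```

Whether a normal model is sensible for a particular measurement operation is a separate question that the code cannot answer.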

We have given a simple
and clear picture of moving from infinite processes in nature to
numbers by means of dependent variable measurement operations and
then modeling these numbers in terms of probability distributions
like the normal distribution.
Caveat: There
are all kinds of explicit and implicit assumptions underlying the
move from processes in nature to numbers to probability distributions.
For thinking about a particular research question which you might
have, some of these assumptions may be sensible and others may not
be. Statistical procedures simply use probability theory to process
numbers. It's up to the person using statistics to examine and to
think critically about the assumptions leading up to statistical analysis.