Interface to Science Web Page



©Copyright 1997, 2000 Tom Malloy

This is the text of the in-class lecture which accompanied the Authorware visual graphics on this topic. You may print this text out and use it as a textbook. Or you may read it online. In either case it is coordinated with the online Authorware teaching program.




In this lecture we are going to introduce the relationship between science and statistics. We will show how scientific research generates the numbers which will then be used in statistical analysis. This means we will have to discuss important scientific ideas that are often the focus of research methods courses. We will also introduce scientific jargon which we will use throughout the remainder of the course.


In research anything which varies is called a variable. As you can see on the list, psychologists might be interested in memory, aggression, happiness, and mental health, among many other variables. An economist might be interested in profit, a physiologist in heart rate, an engineer in highway safety. Using memory as an example, memory varies within a person depending on health, age, and the type of material being remembered. It also varies from one person to another. So memory is a variable.



Abstractions versus Operations. When we talk about variables conversationally, we do so abstractly. For example, if we say "Highway safety improved recently," we have not defined what we mean by "highway safety." Highway safety is left as an abstraction, like truth, justice, and the American way. Without detailed definitions, highway safety could mean almost anything. Moreover, what do we mean by "improved" and by "recently"?

An operational definition of a variable is a set of detailed and public actions which specify a variable. In the current graphic, the abstract variable "highway safety" is given two different operational definitions so that the statement "Highway safety improved recently" begins to take on specific scientific meaning.

Operational Definition A. In operational definition A, highway safety is defined as the total number of crashes in a calendar year as reported in the Utah Crash Summary (1995) prepared by the Utah Department of Public Safety. Notice that the highway safety variable is no longer an abstraction; it is highly specific. More important than the specificity is that we have now given a set of procedures (or operations) which any other scientist can repeat to verify and to replicate our findings. Scientists simply have to contact the Utah Department of Public Safety, get a copy of the crash report, look at the table for total crashes in a year, and find the data for the years they are interested in. That's what we mean by an operational definition.

So our operational definition has done more than make the variable specific. It has defined a procedure by which anyone else can also get the data.

Operational Definition B. In the second operational definition we have defined highway safety as "the percent of alcohol-related deaths as recorded in the Utah Crash Summary." Notice that this is a very different operational definition of highway safety, one that focuses on the issue of driving under the influence of alcohol.

So the abstraction "highway safety" can mean as many things as there are operations for defining it. It is essential for scientists to know the operational definitions of abstractions in order to evaluate the quality of research and to be able to reproduce the research.

To complete our operational definition of the statement "Highway safety improved recently," we can operationalize "improved" and "recently." We do this in the next series of graphics.

So the statement "The total number of crashes in Utah dropped from 59,272 in 1994 to 57,644 in 1995" operationalizes the abstract statement "Highway safety improved recently."
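As a minimal sketch, assuming nothing beyond the lecture's own figures, the fully operationalized statement can be written as an explicit procedure that anyone could rerun. The function name `highway_safety_improved` and the dictionary layout are hypothetical; the crash totals are the ones quoted above.

```python
# Hypothetical sketch: an operational definition written as an explicit procedure.
# Crash totals are the Utah Crash Summary figures quoted in the lecture.
TOTAL_CRASHES = {1994: 59_272, 1995: 57_644}

def highway_safety_improved(totals, earlier, later):
    """Operationalizes "Highway safety improved recently":
    safety   = total crashes in a calendar year (Definition A),
    improved = fewer crashes in the later year than in the earlier one,
    recently = the comparison years chosen by the caller (1994 vs 1995 here)."""
    return totals[later] < totals[earlier]

change = TOTAL_CRASHES[1995] - TOTAL_CRASHES[1994]
percent = 100 * change / TOTAL_CRASHES[1994]

print(highway_safety_improved(TOTAL_CRASHES, 1994, 1995))  # True
print(change, round(percent, 2))                           # -1628 -2.75
```

Because every step is written down, another scientist can substitute other years, or another definition of safety, and rerun the same check.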

Summary. We have introduced the distinction between speaking of variables abstractly and speaking of them operationally. In normal conversation we typically speak abstractly. But variables like memory, highway safety, and aggression must be specified by operational definitions to allow the evaluation and replication of research.

Let's go on to the next topic.


Now we're going to move on to scientific hypotheses, independent variables and dependent variables because these will be important when we talk about statistics.

Scientific hypotheses. A scientific hypothesis simply proposes a relationship among or between variables. We will be interested in three general kinds of relationships among variables: Causal, Predictive, and Correlational.


Causal Scientific Hypotheses. The first general kind of scientific hypothesis which interests us is one in which one variable (Independent Variable) is proposed to cause changes in a second variable (Dependent Variable). We often shorten Independent Variable to IV and Dependent Variable to DV. So the general form of the causal scientific hypothesis is that the IV causes changes in the DV. [In terms of the discussion of operational definitions, these DV's are our measurement operations.]

Example. As a simple example, suppose that a scientist proposes that a pill she has recently developed will decrease blood pressure. The independent variable (IV) is the pill (or lack of it). The dependent variable (DV) is blood pressure. The pill is hypothesized to cause decreases in blood pressure. The independent variable is defined as the causal agent, in this case the pill. The dependent variable is the variable affected by the causal agent; in this case the DV is blood pressure.

A second perspective on IV and DV. Suppose that she runs a simple study. She randomly assigns volunteers to two groups. One group gets the pill. The other group gets a placebo (a pill without the active ingredient). A second way to define the IV is that it is the thing the experimenter actively manipulates in the experiment. In this case she actively manipulates, for each volunteer, whether he or she gets the real pill or the placebo. Notice that the IV does indeed vary: receiving pill or placebo varies from volunteer to volunteer. A second way to define the DV is that it is what we measure about the participants in the research. In this example what we measure is blood pressure.
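The random assignment step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a protocol: the volunteer names are hypothetical, and shuffling then splitting the list in half is just one simple way to form two equal-sized groups by chance.

```python
import random

def randomly_assign(volunteers, seed=None):
    """Randomly split volunteers into a pill group and a placebo group.

    The IV (pill vs placebo) is set by chance under the experimenter's
    control; the volunteers do not choose their own condition."""
    rng = random.Random(seed)
    shuffled = list(volunteers)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"pill": shuffled[:half], "placebo": shuffled[half:]}

volunteers = ["Ann", "Ben", "Cara", "Dev", "Eli", "Fay"]  # hypothetical names
groups = randomly_assign(volunteers, seed=1)
print(groups)
```

Fixing `seed` makes the assignment reproducible for checking; leaving it as `None` gives a fresh random split each run. The DV (blood pressure) would then be measured in both groups.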

Predictive Scientific Hypotheses. The second general category of scientific hypotheses important to us in statistics is prediction. For example, we can predict the heights of fathers if we know the heights of sons. Or we can predict the number of health problems people have later in life from the amount of smoking they do in early life.

One variable is called the predictor variable; the other is called the criterion variable. Sometimes dependent variable is used as a synonym for criterion variable. In one of our examples the predictor variable is the height of sons and the criterion (or dependent) variable is the height of fathers. In our other example, the predictor is the amount of smoking in early life and the criterion (or dependent) variable is the number of health problems in later life.
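To make "predictor" and "criterion" concrete, here is a hedged sketch that predicts a father's height from a son's height with a least-squares straight line. The course has not introduced least squares yet, so treat the technique as an assumption used only for illustration; the heights are invented numbers, not data.

```python
def fit_line(predictor, criterion):
    """Least-squares line: criterion is approximated by intercept + slope * predictor."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical heights in inches, invented only for illustration.
sons_heights = [66, 68, 70, 72, 74]      # predictor variable
fathers_heights = [67, 68, 69, 70, 71]   # criterion (dependent) variable

intercept, slope = fit_line(sons_heights, fathers_heights)
predicted_father = intercept + slope * 71   # predict for a son who is 71 inches
print(intercept, slope, predicted_father)   # 34.0 0.5 69.5
```

Note that the line runs from sons to fathers purely as a prediction; nothing in the arithmetic implies that sons' heights cause fathers' heights.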

Prediction may or may not imply causality. Surely no one thinks that the heights of sons cause the heights of their fathers. But you can predict from the heights of sons to the heights of fathers. On the other hand, smoking has been identified as a causal agent in many health problems. So the ability to predict something may or may not involve causality.

Correlational Scientific Hypotheses. We can have scientific hypotheses which simply propose that there will be some relationship between two variables. These are correlational hypotheses. For example, we might propose that height is associated with weight. We don't propose that one causes the other, nor do we make a prediction from one to the other. In this case we might call both variables dependent variables.
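For contrast, a correlational hypothesis treats the two variables symmetrically. The sketch below computes a Pearson correlation coefficient, one common index of association that the course has not yet covered, on invented height and weight numbers. Swapping the two variables gives the same answer, which is one way to see that neither variable plays the role of predictor.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: ranges from -1 to +1, symmetric in xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, invented only for illustration.
heights = [60, 64, 68, 72, 76]        # inches
weights = [115, 130, 150, 165, 190]   # pounds

print(round(pearson_r(heights, weights), 3))
print(round(pearson_r(weights, heights), 3))  # same value: order does not matter
```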

Summary. So we have three kinds of scientific hypotheses which will be important in choosing statistics later in the course. Right now we have just introduced the ideas and some of the jargon that goes with them.

  1. Causal hypotheses: the independent variable causes changes in the dependent variable.
  2. Predictive hypotheses: one variable (the predictor) can be used to predict the other variable (the criterion or dependent variable).
  3. Correlational hypotheses: there is an association between two variables.

We will get more explicit and detailed about these types of hypotheses as we go along in the course.


We will now discuss how the numbers (data) which we analyze in statistics come from dependent variables in a research project.


Dependent Variables generate data. Dependent Variables are measurement operations. When we measure something we get one or more numbers. For example, when we measure someone's blood pressure we get two numbers (systolic pressure and diastolic pressure). The systolic pressure is the pressure of blood in the vessels as the heart beats. The diastolic pressure is the pressure of the blood between heartbeats. The numbers are usually written like a fraction with the systolic above or to the left. A normal blood pressure is around 120/80 mm Hg (millimeters of mercury). Both numbers count. Your blood pressure is high if the systolic pressure is 140 or above, or the diastolic pressure is 90 or above, or both are high.
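The blood pressure measurement operation yields two numbers, and the rule for "high" above looks at both. Here is a minimal Python sketch of that rule, using the thresholds exactly as stated in the lecture; the function names and the sample readings are hypothetical.

```python
def parse_reading(reading):
    """Split a reading written like '120/80' into (systolic, diastolic) in mm Hg."""
    systolic, diastolic = (int(part) for part in reading.split("/"))
    return systolic, diastolic

def blood_pressure_is_high(systolic, diastolic):
    """Lecture's rule: high if systolic >= 140, or diastolic >= 90, or both."""
    return systolic >= 140 or diastolic >= 90

for reading in ["120/80", "145/85", "130/92", "150/95"]:  # hypothetical readings
    print(reading, blood_pressure_is_high(*parse_reading(reading)))
# 120/80 False
# 145/85 True
# 130/92 True
# 150/95 True
```

The `or` mirrors "both numbers count": a reading is flagged if either number crosses its threshold.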

The Utah Department of Public Safety has specific operations for measuring the number of crashes on Utah highways. These numbers are often used as dependent variables. For example, we can look at the effect of an IV (intersections with stoplights versus intersections with stop signs) to see if stoplights cause a reduction in the number of crashes (DV).


Theoretical Construct Validity. Scientists build theories around ideas. In the jargon of science these ideas are called constructs. Construct validity refers to how well the ideas or constructs are thought out and designed. One part of construct validity refers to the quality of the operational definitions of variables.


Sometimes our operational definitions, in fact, are poor.

Example of a poor operational definition. Let's say we operationally define memory, which of course is an abstract variable. Memory is of interest to people who study psychology. Let's operationally define memory as the answer to the question "How many books have you read in the last month?" Try this measure of memory on yourself right now. You have ten seconds to count all of the books you read in the last 30 days. So please do that.

In class I time about 10 seconds for students. If you are working online, I assume you gave yourself about 10 seconds. You should have some number in mind. Suppose that I measure how good a person's memory is by this operational definition. I will say that a person with a higher number has a better memory than a person with a lower number.

That's a poor operational definition of memory because it confounds your reading habits with your memory. One person may read lots of books and another person very few books. In reality they may both have equally good memories. But our operational definition confuses the actual number of books a person reads with their memory for the last 30 days. So the person who reads more books gets a higher score and is said to have a better memory.

"Confound" is jargon used in science. It simply means open to multiple interpretations. An operational definition that can be interpreted to mean two different ideas is said to confound those ideas. Our operational definition of memory confounded memory with reading habits.

So to wrap this topic up, DV's are measurement operations that generate the numbers we run through statistical formulas. But the statistical formulas don't know if the DV measurement operations are high quality or very poor. The statistics just process numbers. They can't think. They can't know if the numbers came from an elegant operation or a stupid operation. It is the consumer of research results who must be careful to notice what the operational definitions of the DV's are. Then the consumer must think critically and decide carefully if they are sensible operations or not.

Highway Safety Again. Highway safety is an abstract variable. Let's say our operational definition, which is fairly similar to the one we used previously, is the total number of fatalities in one year on Utah roads, as measured and reported by the Utah Department of Public Safety in their annual Utah Crash Summary. The question is "Is this a good or poor operational definition of highway safety?" It has a certain appeal to it, but if we use it we will find that in 1943 there was a total of 103 fatalities and in 1995 there was a total of 325. Are our roads three times as dangerous in 1995 as in 1943?

Total fatalities confounds highway safety with the number of vehicle miles driven. There weren't very many vehicles, comparatively speaking, on our highways in 1943. We could tune up our operational definition of highway safety and say highway safety is the death rate per 100 million vehicle miles. This operational definition is, in fact, the one typically used by people who study highway safety. In 1943 the death rate per 100 million vehicle miles was 7.4, and in 1995 it was 1.74. By this operational definition, the highways are safer now than they were in 1943. By the previous operational definition, the highways are more dangerous now than they were in 1943. So how you operationally define your variables can determine the results you get and the conclusions you draw.

Rates. People are inclined to use rates (like fatalities per 100 million miles) in their operational definitions rather than totals. A rate (so much of X per so many units of Y) adjusts for size of population, amount of use and other important considerations.
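The reversal between the two operational definitions can be checked directly with the figures quoted above. The lecture does not report vehicle-mile totals, so this sketch only backs out the miles implied by each year's fatality count and rate; because the published rates are rounded, treat those implied totals as approximate. The function names are hypothetical.

```python
# Figures quoted in the lecture from the Utah Crash Summary.
FATALITIES = {1943: 103, 1995: 325}
RATE_PER_100M = {1943: 7.4, 1995: 1.74}  # deaths per 100 million vehicle miles

def death_rate(fatalities, vehicle_miles):
    """A rate: so much of X (deaths) per so many units of Y (100 million miles)."""
    return fatalities / vehicle_miles * 100_000_000

def implied_vehicle_miles(fatalities, rate):
    """Invert death_rate to estimate the vehicle miles the two figures imply."""
    return fatalities / rate * 100_000_000

def safer_year(measure):
    """Under both definitions, a smaller number means safer."""
    return min(measure, key=measure.get)

print("Total fatalities says safer:", safer_year(FATALITIES))        # 1943
print("Rate per 100M miles says safer:", safer_year(RATE_PER_100M))  # 1995

miles = {year: implied_vehicle_miles(FATALITIES[year], RATE_PER_100M[year])
         for year in FATALITIES}
print(f"Implied growth in vehicle miles: {miles[1995] / miles[1943]:.1f}x")
```

Deaths roughly tripled while implied driving grew far more, which is exactly the confound that the rate adjusts for.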

An important point is that whatever conclusions you draw, they could be reversed if you used a different operational definition. One operational definition of highway safety says 1943 was safer and the other says that 1995 was safer. Consequently, scientists think and argue a lot about the operational definitions of variables. So operationalizing variables is very important.

This discussion isn't technically part of statistics, but I want you to realize that statistics as a whole just runs numbers through standard formulas. We'll be doing a lot of that. But what those numbers are and what they mean is something else. The evaluation of the measurement operations is the responsibility first of the scientist doing the research and second of the person reading or otherwise consuming that research.

It's your responsibility as an educated consumer of information to challenge the measurement operations and not just believe the statistical results. That's fairly hard to do because newspaper articles and TV news stories seldom report enough about the research to make an intelligent judgment about the operational definitions. At least you can realize that the numbers and conclusions came from some specific set of measurement procedures (operations) and that those operations may be good or poor.



We are connecting scientific activities to statistical activities. This gives us a big picture overview of how all the statistics we will be learning are conceptually connected to both science and nature. Our next step in this big picture is to define what we mean by abduction and random variables.


Abduction. As you can see by the graphic, knowledge about the structure and function of computers has been applied to human mental processes. Scientists have taken knowledge from computer science and applied it to psychology. The resulting paradigm is called human information processing. This process of moving knowledge structures from one area or discipline to another is called abduction. When abduction is formal it is called modeling.

Along the bottom of the next graphic, you can see arrows indicating two-way transactions between processes in nature, processes in science and statistical models.

In general, scientists take the infinite variety of natural processes and model them with scientific ideas. Then scientific ideas are themselves modeled by ideas from probability and statistics. It is worthwhile to notice two steps to this general abductive process. 1) Concepts of probability and statistics are applied to scientific concepts. 2) Scientific concepts such as measurement operations are applied to nature. We move back and forth from processes of nature through scientific processes to statistical models.

Scientific Processes: In the column for scientific processes you see listed, as examples, scientific hypotheses and measurement operations (DVs). We've talked about both of those in this lecture. Under scientific hypotheses is listed the plausible competing hypothesis (PCH) of chance. We've not talked about that yet but we will make a great deal of it later when we talk about hypothesis testing. So for the moment we are only foreshadowing how ideas will be organized later in the class.

Statistical Models. In the column for statistical models you see statistical hypotheses (H0 and H1) and random variables. We will talk about statistical hypotheses later in the course. For now we want to talk about the idea of random variables and how they are used to model measurement operations (DVs).

Next we will look at two examples of abduction. We will examine how measurement operations are modeled by probability distributions in the case of the roll of a die and in the case of the normal probability distribution.



The roll of a single die. Suppose we roll a die and wait until it comes to rest. As simple as it is, the roll of a die is a complex process and we could focus on an infinite number of different types of measurement operations about it. We could count the number of times it bounces, we could describe the rhythm of sounds it makes, we could measure the amount of time it takes to come to rest, and with sufficiently sophisticated instruments we could even count the number of molecules which break off it as it bounces along. The number of things we could measure is only limited by our imagination.

The actual measurement operation commonly used in our culture for the roll of a die is to count the number of dots facing upward when the die comes to rest. But it should be clear that this measurement operation is not unique; it's just the one we use in this culture when we play with dice.

First: Nature to Science. The main point I want to make here is that DV operations reduce some kind of infinite process in nature to a single quantitative aspect. As we've discussed earlier, the DV operations may or may not make good sense. When we are playing a board game, counting the number of dots facing up when the die comes to rest makes good sense.
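The idea that a DV operation picks out one quantitative aspect of a rich process can be sketched as follows. Everything here is hypothetical: the `DieRoll` record stands in for the physical roll, its values are simulated, and it lists only a few of the endlessly many aspects we could record. The two functions are two different measurement operations applied to the same event.

```python
import random
from dataclasses import dataclass

@dataclass
class DieRoll:
    """A few of the many aspects of one roll we could record (hypothetical)."""
    bounces: int
    seconds_to_rest: float
    dots_up: int

def simulate_roll(rng):
    """Stand-in for the physical process; every value here is simulated."""
    return DieRoll(bounces=rng.randint(1, 8),
                   seconds_to_rest=round(rng.uniform(0.5, 3.0), 2),
                   dots_up=rng.randint(1, 6))

# Two different measurement operations (DVs) applied to the same event.
def count_dots(roll):
    """The operation used when playing board games."""
    return roll.dots_up

def count_bounces(roll):
    """A different operation on the very same roll."""
    return roll.bounces

rng = random.Random(42)
roll = simulate_roll(rng)
print(roll)
print("DV = dots facing up:", count_dots(roll))
print("DV = number of bounces:", count_bounces(roll))
```

Each operation throws away everything except one number; which number we keep is a choice, and for a board game counting dots is the sensible one.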

Second: Science to Probability. Once we've got our scientific measurement operations the next thing we do is model those operations with some kind of probability distribution. This probability distribution is what is often called a random variable.

In the Basic Probability Lecture, we have already discussed how we think about the roll of a die. Since a die is a cube it has six sides. So the number of dots can vary from 1 through 6. So one thing that should be clear is that the result of a die roll is a variable. With a fair die, we assume that all six sides of the cube have an equal chance of facing up. So there is a 1/6 chance the result will be a one. There is also a 1/6 chance the result will be a two, and so forth. The results of a die roll vary from 1 to 6 and the probability of each result is 1/6.
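The probability model for a fair die can be written down exactly. This Python sketch, with hypothetical names, uses `fractions.Fraction` so that 1/6 stays exact, and includes a small helper for the probability of an event such as rolling an even number.

```python
from fractions import Fraction

# Model of the DV "number of dots facing up" for a fair die:
# each of the six values gets probability 1/6.
die = {dots: Fraction(1, 6) for dots in range(1, 7)}

def prob(event, distribution):
    """Probability of an event, given as a yes/no test on each value."""
    return sum(p for value, p in distribution.items() if event(value))

print(die[3])                           # 1/6
print(prob(lambda v: v % 2 == 0, die))  # 1/2
print(sum(die.values()))                # 1
```

The dictionary is the probability distribution: it lists every value of the variable and how much probability sits on each, and the probabilities add up to 1.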

What we have just done is model our DV operation (counting the number of dots) in terms of probability.

Probability Distribution. The Authorware visual screen also shows that we can summarize all this probability discussion with a graph. The graph shows along its horizontal axis that the number of dots varies from 1 to 6. The vertical axis is probability. The bars above the values of the die roll (1 through 6) show that the probability of each of them is the same: 1/6. This graph is called a probability distribution because it shows how the probability is distributed over the values of the variable. Later in the class we will sometimes call a probability distribution a population.

Random Variables. The idea of random variables is not really different from what we just said; it is another way of thinking and speaking about probability distributions. It is also more jargon. When we model a variable with a probability distribution we call the result a random variable.

By way of contrast, we can speak of deterministic variables. In an algebraic equation (say Y = 2X), when I give you a specific value of X you can know a specific value of Y. In our example, if X = 5, then Y must be equal to 10. The result is determined.

A random variable is a variable whose values occur probabilistically. When you roll a die you don't know what value it will give you. You can only make probability statements about the value. You can say the probability of a 3 is 1/6 or that the probability of an even number is 1/2. The value that the random variable takes on is not determined. It is probabilistic.
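The contrast between a deterministic variable and a random variable can be seen side by side in a short Python sketch. The functions are hypothetical illustrations, and `random.randint(1, 6)` plays the role of a fair die. The last loop adds one standard fact not discussed above: over many rolls, the relative frequency of each face tends toward its probability of 1/6.

```python
import random

def deterministic_y(x):
    """Y = 2X: a given X always yields the same Y."""
    return 2 * x

def roll_die(rng):
    """A random variable: each value 1..6 has probability 1/6, none is determined."""
    return rng.randint(1, 6)

print(deterministic_y(5))  # 10, every time

rng = random.Random()  # unseeded, so results differ from run to run
rolls = [roll_die(rng) for _ in range(10)]
print(rolls)

# Over many rolls, each face's relative frequency tends toward 1/6 (about 0.167).
many = [roll_die(rng) for _ in range(60_000)]
for face in range(1, 7):
    print(face, round(many.count(face) / len(many), 3))
```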

Synonyms. For our purposes in this course we will use the terms random variable, probability distribution, and population interchangeably.

Overview. We have broken the abductive process down into two steps. First we use our Dependent Variable measurement operations (counting the number of dots) to reduce an infinite process in nature (a die roll) to a single value. Second we model the number of dots as a probability distribution. We call this a random variable.

Abduction. In studying the universe, one thing scientists do is measure things; they reduce them to numbers using dependent variable measurement operations. This reduction of what we are interested in to numbers is what leads to the use of statistics. In statistics we model the numbers we get as random variables or, in different words, as probability distributions. These probability distributions we will eventually call populations. This process of moving sideways across ideas and models from nature to measurement to probability distribution is called abduction.

By way of contrast with abduction, induction is to infer upward from information to a higher order principle. Deduction is to infer downward from a principle to lower order consequences. Abduction is neither upward nor downward. It is the sideways movement from one way of conceptualizing to another way of conceptualizing on the same level. For statistics we take the scientific idea of measurement and model it as a probability distribution.


Now we will go on and begin to think of how processes in nature might be measured and modeled by an important probability distribution called the normal distribution. This will just be a brief introduction. Later we will devote a whole lecture to the normal distribution and it will be used throughout the course.


Nature to numbers. Let's say we have a complex phenomenon in nature such as a person. Let's also say that we have some kind of DV measurement operation called an IQ test. We take a particular person and put that person through the measurement operations. That is, we measure the person with the IQ test. The measurement operations give us a number, called an IQ. Say that in our example the person's IQ is equal to 103.

Numbers to probabilities. It is very common in statistics to assume that our DV's can be modeled as a particular probability distribution called the normal distribution. We will learn a great deal about this distribution later.

Very brief introduction to the Normal Distribution. The normal distribution is bell-shaped; in fact, it is often called the bell curve. The current graphic shows the general shape of a normal distribution. As we said, the normal distribution is a very common way to model dependent variable measurement operations.
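As a preview, Python's standard library (3.8 and later) includes `statistics.NormalDist`, which is enough to sketch the bell shape and attach a probability to the example IQ of 103. The mean of 100 and standard deviation of 15 are assumptions (a common scaling for IQ tests), not something stated in this lecture.

```python
from statistics import NormalDist

# Assumption: IQ scores modeled as normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

score = 103  # the example person's IQ from the text
print(f"Density at {score}: {iq.pdf(score):.4f}")
print(f"P(IQ <= {score}) = {iq.cdf(score):.4f}")

# Text picture of the bell: density peaks at the mean and falls off symmetrically.
for x in range(55, 150, 15):
    bar = "#" * round(iq.pdf(x) * 1500)
    print(f"{x:>3} {bar}")
```

The model, not the person, supplies the probability: once the DV is modeled as this normal distribution, any score can be located on the curve.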

We have given a simple and clear picture of moving from infinite processes in nature to numbers by means of dependent variable measurement operations and then modeling these numbers in terms of probability distributions like the normal distribution.

Caveat: There are all kinds of explicit and implicit assumptions underlying the move from processes in nature to numbers to probability distributions. For a particular research question you might have, some of these assumptions may be sensible and others may not be. Statistical procedures simply use probability theory to process numbers. It's up to the person using statistics to examine and think critically about the assumptions leading up to statistical analysis.

