Models of Distributions

A model of a frequency distribution is an algebraic expression describing the relative frequency (height of the curve) for every possible score. The questions that sometimes come to the mind of the student are "What is the advantage of this level of abstraction? Why is all this necessary?" The answers to these questions may be found in the following.

For example, suppose that the distribution of shoe sizes collected from a sample of fifteen individuals resulted in the following relative frequency polygon.

Because there are no individuals in the sample who wear size eight shoes, does that mean that the store owner should not stock the shelves with any size eight shoes? If a different sample were taken, would an individual who wore a size eight likely be included? Because the absence of size eights in the sample can reasonably be attributed to chance or sampling error, some method of ordering shoes other than directly from the sample distribution must be used.
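
The role of chance can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that 12% of the population wears a size eight (the figure this section later attaches to size eight under a probability model) and that the fifteen individuals are sampled independently. The function name is hypothetical.

```python
def prob_size_absent(p_size, n):
    """Probability that none of n independently sampled people wears a given size.

    p_size is the assumed proportion of the population wearing that size.
    """
    return (1.0 - p_size) ** n


# Assumption: 12% of the population wears a size eight.
p_none = prob_size_absent(0.12, 15)
print(f"Chance of no size eights in a sample of 15: {p_none:.3f}")
```

Under these assumptions, roughly one sample in seven would contain no size eights at all, even though the size is fairly common in the population.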

In order to better deal with random fluctuations when collecting information from a sample, the statistician has the option of creating a model of the sample frequency distribution. This model is called by different names, including probability model, theoretical probability distribution, probability density function (pdf), or simply population. A probability model attempts to capture the essential structure of the real world by asking what the world might look like if an infinite number of scores were obtained and each score was measured infinitely precisely. Nothing in the real world is exactly distributed as any given probability model. However, a probability model often describes the world well enough to be useful in making decisions.

Suppose that a probability model, such as a normal curve, were used to describe the distribution of shoe sizes. If this were the case, the proportion (.12) or percentage (12%) of size eight shoes could be computed by finding the relative area between the real limits for a size eight shoe (7.75 to 8.25). The relative area between two scores on any probability model is called probability. In this case, the probability that a randomly selected woman wears a size eight shoe would be .12. The concept of area under a curve will be covered in more detail in a later chapter.
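
The area between the real limits can be sketched in a few lines of code. The example below assumes a normal curve as the model and uses a hypothetical mean and standard deviation. The text does not state the parameters behind the .12 figure, so the result is not meant to reproduce it exactly.

```python
import math


def normal_cdf(x, mean, sd):
    """Area under a normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def area_between(low, high, mean, sd):
    """Relative area (probability) between two scores."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)


# Hypothetical model parameters, chosen only for illustration.
MEAN_SIZE = 8.5
SD_SIZE = 1.5

# Real limits of a size eight shoe, as given in the text.
p_eight = area_between(7.75, 8.25, MEAN_SIZE, SD_SIZE)
print(f"P(size eight) = {p_eight:.3f}")
```

Changing the assumed mean or standard deviation changes the area, which is why the choice of model parameters matters when the model is used to place an order.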

Q8.1

Probability models are used in statistics
in order to simplify the world.
because frequency polygons do not adequately describe the sample.
because they are so complex that no one understands them, which makes it possible to lie with statistics.
to bias results against stigmatized individuals.

Q8.2

Suppose a researcher collected information about the shoe sizes of everyone in a class of thirty students and drew a relative frequency polygon. Suppose the researcher repeated the data collection in a different class of thirty students, with the same distribution of males and females. The second relative frequency polygon
would be similar, but not identical, to the first distribution.
would be identical to the first.
would be closer to the "true" distribution of shoe sizes.
would bear no relationship to the first distribution.

Q8.3

A probability model
attempts to capture the essential structure of the real world.
exactly describes many real world phenomena.
seldom is useful in making decisions about the real world.
is based on a finite number of observations.

Variations of Probability Models

The statistician has at his or her disposal a number of probability models to describe the world. Different models are selected for practical or theoretical reasons. Some examples of probability models follow.

The Uniform or Rectangular Distribution

The uniform distribution is shaped like a rectangle, where each score is equally likely. An example is presented below.

If the uniform distribution were used to model shoe size, it would mean that between the two extremes, each shoe size would be equally likely. If the store owner were ordering shoes, it would mean that an equal number of each shoe size would be ordered. In most cases this would be a very poor model of the real world, because at the end of the year a large number of large or small shoe sizes would remain on the shelves and the middle sizes would be sold out.

The uniform distribution is a useful model when the phenomenon being modeled is relatively stable over a range of values. For example, the relative frequency of births on any day of the year in the United States is relatively constant. In this case a uniform distribution might be an adequate, but not perfect, model.
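
Under a uniform model, the probability of any interval is simply its width divided by the width of the whole range. The sketch below illustrates this for shoe sizes. The smallest and largest sizes are hypothetical values chosen for illustration, and the function name is not from the text.

```python
def uniform_area(low, high, minimum, maximum):
    """Probability of the interval [low, high] under a uniform model on [minimum, maximum]."""
    # Clip the interval to the range that the model covers.
    low = max(low, minimum)
    high = min(high, maximum)
    if high <= low:
        return 0.0
    return (high - low) / (maximum - minimum)


# Hypothetical range of shoe sizes, for illustration only.
SMALLEST, LARGEST = 5.0, 12.0

p_eight = uniform_area(7.75, 8.25, SMALLEST, LARGEST)
p_eleven = uniform_area(10.75, 11.25, SMALLEST, LARGEST)
print(p_eight, p_eleven)
```

Both intervals have the same width, so the model assigns them the same probability. That is exactly why the uniform distribution is a poor model for shoe sizes.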

Q8.4

A uniform distribution might be a reasonable model of
month of birth.
shoe sizes.
scores on a test.
number of computers in the USA, from 1950 to present.

The Negative Exponential Distribution

The negative exponential distribution is often used to model real-world events which are relatively rare, such as the occurrence of earthquakes. The negative exponential distribution would be a good model of the relative frequency of lottery winnings. An overly optimistic distribution is presented below:
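
The text does not give a formula for this model. One common algebraic form, assumed here, is f(x) = rate * exp(-rate * x) for x of zero or more, where the area between two scores has a simple closed form. The rate value below is hypothetical.

```python
import math


def exponential_area(low, high, rate):
    """Area under f(x) = rate * exp(-rate * x) between low and high."""
    if low < 0 or high < low or rate <= 0:
        raise ValueError("require 0 <= low <= high and rate > 0")
    return math.exp(-rate * low) - math.exp(-rate * high)


RATE = 0.5  # hypothetical value, for illustration only

p_small = exponential_area(0.0, 1.0, RATE)  # interval near zero
p_large = exponential_area(5.0, 6.0, RATE)  # equally wide interval further out
print(f"{p_small:.3f} {p_large:.3f}")
```

Intervals of equal width receive less and less area as the scores grow, which is what makes this model suitable for events where large values are rare.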

Q8.5

A negative exponential distribution might be a reasonable model of
number of heart attacks per year.
shoe sizes.
cost of textbooks.
distribution of income in the United States.

The Triangular Distribution

Not really a standard distribution, a triangular distribution could be created as follows:

It may be useful for describing some real-world phenomena, although exactly which ones is not clear. The statistician has the option of creating a distribution for a particular situation if mathematical equations can be found to describe the model. The statistician is not limited only to distributions that are widely used or that others have already discovered.
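
As a minimal sketch of how such a model could be written down, the code below defines a triangular curve and checks numerically that the total area under it is one. The limits, the position of the peak, and the function names are all hypothetical choices made for illustration.

```python
def triangular_density(x, low, peak, high):
    """Height of a triangular curve on [low, high] with its peak at `peak`.

    The height at the peak is 2 / (high - low), which makes the total
    area one. Assumes low < peak < high.
    """
    if x < low or x > high:
        return 0.0
    top = 2.0 / (high - low)
    if x <= peak:
        return top * (x - low) / (peak - low)
    return top * (high - x) / (high - peak)


def area(density, start, stop, steps=10000):
    """Approximate the area under `density` with the midpoint rule."""
    width = (stop - start) / steps
    return sum(density(start + (i + 0.5) * width) for i in range(steps)) * width


# Hypothetical limits and peak, for illustration only.
LOW, PEAK, HIGH = 5.0, 8.0, 12.0

total = area(lambda x: triangular_density(x, LOW, PEAK, HIGH), LOW, HIGH)
print(f"Total area: {total:.4f}")
```

The same `area` helper can be applied to any interval inside the range to obtain the probability of a score falling in that interval.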

The Normal Distribution or Normal Curve

The normal curve is one of a large number of possible distributions. It is very important in the social sciences and will be described in detail in the next chapter. An example of a normal curve was presented earlier as a model of shoe size.

Q8.6

A normal distribution might be a reasonable model of
shoe sizes.
birthdays per month.
percentages of "true" answers on a true/false test.

Properties of Probability Distributions

As described earlier, the statistician has the option of creating his or her own probability models. These models must be created using certain mathematical rules. These rules provide the properties of probability distributions.

The models that have been discussed up to this point assume continuous measurement. That is, every score on the continuum of scores is possible, so there are an infinite number of scores. In this case, no single score can have a nonzero relative frequency, because if each of infinitely many scores did, the total area would necessarily be greater than one. For that reason probability is defined over a range of scores rather than a single score. Thus a shoe size of exactly 8.00 would not have a specific probability associated with it, although the interval of shoe sizes between 7.75 and 8.25 would.
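
The point can be illustrated by shrinking the interval around a score and watching the area shrink with it. The sketch below assumes a normal curve with a hypothetical mean and standard deviation. Any continuous model would show the same pattern.

```python
import math

# Hypothetical model parameters, chosen only for illustration.
MEAN_SIZE = 8.5
SD_SIZE = 1.5


def normal_cdf(x):
    """Area under the assumed normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - MEAN_SIZE) / (SD_SIZE * math.sqrt(2.0))))


def area_around(score, half_width):
    """Area between score - half_width and score + half_width."""
    return normal_cdf(score + half_width) - normal_cdf(score - half_width)


# Narrow the interval around a shoe size of 8.00 until it is a single point.
half_widths = [0.25, 0.05, 0.005, 0.0]
areas = [area_around(8.0, h) for h in half_widths]
for h, a in zip(half_widths, areas):
    print(f"half-width {h:<6} area {a:.5f}")
```

When the half-width reaches zero the interval is the single score 8.00, and the area is exactly zero.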

Q8.7

The area under a probability model between two points is called
probability.
frequency.
algebraic anomaly.
modality area.

Q8.8

Properties of probability models include
the area under the curve is one.
the ability to be adequately described in words.
the use of the summation notation in its description.
bilateral symmetry.