8 Distributions
8.1 Random Variables
Because we first learn about variables in an algebra class, we tend to think of variables as having values that can be solved for—if we have enough information about them. If I say that x is a variable and that x+6=8, we can use algebra to find that x must equal 2.
Random variables are not like algebraic variables. Random variables simply take on values because of some random process. If we say that the outcome of a throw of a six-sided die is a random variable, there is nothing to “solve for.” There is no equation that determines the value of the die. Instead, it is determined by chance and the physical constraints of the die. That is, the outcome must be one of the numbers printed on the die, and the six numbers are equally likely to occur. This illustrates an important point. The word random here does not mean “anything can happen.” On a six-sided die, you will never roll a 7, 3.5, \sqrt{\pi}, −36,000, or any other number that does not appear on the six sides of the die. Random variables have outcomes that are subject to random processes, but those random processes do have constraints on them such that some outcomes are more likely than others—and some outcomes never occur at all.
When we say that the throw of a six-sided die is a random variable, we are not talking about any particular throw of a particular die but, in a sense, every throw (that has ever happened or ever could happen) of every die (that has ever existed or could exist). Imagine an immense, roaring, neverending, cascading flow of dice falling from the sky. As each die lands and disappears, a giant scoreboard nearby records the relative frequencies of ones, twos, threes, fours, fives, and sixes. That’s a random variable.
8.2 Sets
A set refers to a collection of objects. Each distinct object in a set is an element.
8.2.1 Discrete Sets
To show that a list of discrete elements is a discrete set, we can use curly braces. For example, the set of positive single-digit even numbers is \{2, 4, 6, 8\}. For large sets with repeating patterns, it is convenient to use an ellipsis (“…”), the punctuation mark signifying an omission or pause. For example, rather than listing every two-digit positive even number, we can show the pattern like so:
\{10, 12, 14,\ldots, 98\}
If we want the pattern to repeat forever, we can set an ellipsis on the left, right, or both sides. The set of odd integers extends to infinity in both directions:
\{\ldots, -5, -3, -1, 1, 3, 5, \ldots\}
8.2.2 Interval Sets
With continuous variables, we can define sets in terms of intervals. Whereas the discrete set \{0,1\} refers just to the numbers 0 and 1, the interval set (0,1) refers to all the numbers between 0 and 1.
As shown in Figure 8.2, some intervals include their endpoints and others do not. Intervals noted with square brackets include their endpoints and intervals written with parentheses exclude them. Some intervals extend to positive or negative infinity: (-\infty,5] and (-8,+\infty). Use a parenthesis with infinity instead of a square bracket because infinity is not a specific number that can be included in an interval.
8.3 Sample Spaces
The set of all possible outcomes of a random variable is the sample space. Continuing with our example, the sample space of a single throw of a six-sided die is the set \{1,2,3,4,5,6\}. Sample space is a curious term. Why sample and why space? With random variables, populations are infinitely large, at least theoretically. Random variables just keep spitting out numbers forever! So any time we actually observe numbers generated by a random variable, we are always observing a sample; actual infinities cannot be observed in their entirety. A space is a set that has mathematical structure. Most random variables generate either integers or real numbers, both of which are structured in many ways (e.g., order).
Unlike distributions having to do with dice, many distributions have a sample space with an infinite number of elements. Interestingly, there are two kinds of infinity we can consider. A distribution’s sample space might be the set of whole numbers: \{0,1,2,\ldots\}, which extends to positive infinity. The sample space of all integers extends to infinity in both directions: \{\ldots,-2,-1,0,1,2,\ldots\}.
The sample space of continuous variables is infinitely large for another reason. Between any two points in a continuous distribution, there is an infinite number of other points. For example, in the beta distribution, the sample space consists of all real numbers between 0 and 1: (0,1). Many continuous distributions have sample spaces that involve both kinds of infinity. For example, the sample space of the normal distribution consists of all real numbers from negative infinity to positive infinity: (-\infty, +\infty).
8.4 Probability Distributions
Each element of a random variable’s sample space occurs with a particular probability. When we list the probabilities of each possible outcome, we have specified the variable’s probability distribution. In other words, if we know the probability distribution of a variable, we know how probable each outcome is. In the case of a throw of a single die, each outcome is equally likely (Figure 8.3).
There is an infinite variety of probability distributions, but a small subset of them have been given names. Now, one can manage one’s affairs quite well without ever knowing what a Bernoulli distribution is, or what a \chi^2 distribution is, or even what a normal distribution is. However, sometimes life is a little easier if we have names for useful things that occur often. Most of the distributions with names are not really single distributions, but families of distributions. The various members of a family are unique, but they are united by the fact that their probability distributions are generated by a particular mathematical function (more on that later). In such cases, the probability distribution is often represented by a graph in which the sample space is on the X-axis and the associated probabilities are on the Y-axis. Figure 8.4 illustrates 16 probability distributions that might be interesting and useful to clinicians. Keep in mind that the pictured distributions are only particular members of the families listed; some family members look quite different from what is shown in Figure 8.4.
8.5 Discrete Distributions
The sample spaces in discrete distributions are discrete sets. Thus, on the x-axis of the probability distributions, you will see isolated numbers with gaps between each number (e.g., integers).
8.5.1 Discrete Uniform Distributions
The throw of a single die is a member of a family of distributions called the discrete uniform distribution. It is “discrete” because the elements in the sample space are countable, with evenly spaced gaps between them. For example, there might be a sequence of 8, 9, 10, and 11 in the sample space, but there are no numbers in between. It is “uniform” because all outcomes are equally likely. With dice, the numbers range from a lower bound of 1 to an upper bound of 6. In the family of discrete uniform distributions, the lower and upper bounds are typically integers, most likely starting with 1. However, any real number a can be the lower bound, and the spacing k between numbers can be any positive real number. For the sake of simplicity and convenience, I will assume that the discrete uniform distribution refers to consecutive integers ranging from a lower bound of a to an upper bound of b.
This kind of discrete uniform distribution has a number of characteristics listed in Table 8.1. I will explain each of them in the sections that follow. As we go, I will also explain the mathematical notation. For example, a \in \{\ldots,-1,0,1,\ldots\} means that a is an integer because \in means is a member of and \{\ldots,-1,0,1,\ldots\} is the set of all integers.1 Likewise, x \in \{a,a+1,\ldots,b\} means that each member x of the sample space belongs to the set of integers consisting of a, b, and all the integers between a and b. The notation for the probability mass function and the cumulative distribution function will be explained later in this chapter.
1 Notation note: Sometimes the set of all integers is referred to with the symbol \mathbb{Z}.
Feature | Symbol |
---|---|
Lower Bound | a \in \{\ldots,-1,0,1,\ldots\} |
Upper Bound | b \in \{a + 1, a + 2, \ldots\} |
Sample Space | x \in\{a, a + 1,\ldots,b\} |
Number of points | n=b-a+1 |
Mean | \mu=\frac{a+b}{2} |
Variance | \sigma^2=\frac{n^2-1}{12} |
Skewness | \gamma_1=0 |
Kurtosis | \gamma_2=-\frac{6(n^2+1)}{5(n^2-1)} |
Probability Mass Function | f_X(x;a,b)=\frac{1}{n} |
Cumulative Distribution Function | F_X(x;a,b)=\frac{x-a+1}{n} |
8.5.2 Parameters of Random Variables
The lower bound a and the upper bound b are the discrete uniform distribution’s parameters. The word parameter has many meanings, but here it refers to a characteristic of a distribution family that helps us identify precisely which member of the family we are talking about. Most distribution families have one, two, or three parameters.
If you have taken an algebra class, you have seen parameters before, though the word parameter may not have been used. Think about the equation of a line:
y=mx+b
Both x and y are variables, but what are m and b? Well, you probably remember that m is the slope of the line and that b is the y-intercept. If we know the slope and the intercept of a line, we know exactly which line we are talking about. No additional information is needed to graph the line. Therefore, m and b are the line’s parameters, because they uniquely identify the line.2 All lines have a lot in common but there is an infinite variety of lines because the parameters, the slope and the intercept, can take on the value of any real number. Each unique combination of parameter values (slope and intercept) will produce a unique line. So it is with probability distribution families. All family members are alike in many ways but they also differ because of different parameter values.
2 What about other mathematical functions? Do they have parameters? Yes! Most do! For example, in the equation for a parabola (y=ax^2+bx+c), a, b, and c determine its precise shape.
3 If we allow the lower bound to be any real number and the spacing to be any positive real number, the discrete uniform distribution can be specified by three parameters: the lower bound a, the spacing between numbers k (k>0), and the number of points n (n>1). The upper bound b of such a distribution would be b=a+k(n-1).
The discrete uniform distribution (i.e., the typical variety consisting of consecutive integers) is defined by its lower and upper bounds. Once we know the lower bound and the upper bound, we know exactly which distribution we are talking about.3 Not all distributions are defined by their lower and upper bounds. Indeed, many distribution families are unbounded on one or both sides. Therefore, other features are used to characterize the distributions, such as the population mean (\mu).
8.5.3 Probability Mass Functions
Many distribution families are united by the fact that their probability distributions are generated by a particular mathematical function. For discrete distributions, those functions are called probability mass functions. In general, a mathematical function is an expression that takes one or more constants (i.e., parameters) and one or more input variables, which are then transformed according to some sort of rule to yield a single number.
A probability mass function transforms a random variable’s sample space elements into probabilities. In Figure 8.3, the probability mass function can be thought of as the arrows between the sample space and the probabilities. That is, the probability mass function is the rule applied to the sample space elements to calculate the probabilities. In Figure 8.3, each outcome of a throw of the die was mapped onto a probability of ⅙. Why ⅙, and not some other number? The probability mass function of the discrete uniform distribution tells us the answer.
The probability mass function of the discrete uniform distribution is fairly simple but the notation can be intimidating at first (Figure 8.5). By convention, a single random variable is denoted by a capital letter X. Any particular value of X in its sample space is represented by a lowercase x. In other words, X represents the variable in its totality whereas x is merely one value that X can take on. Confusing? Yes, statisticians work very hard to confuse us—and most of the time they succeed!
The probability mass function of random variable X is denoted by f_X(x). This looks strange at first. It means, “When random variable X generates a number, what is the probability that the outcome will be a particular value x?” That is, f_X(x)=P(X=x), where P means “What is the probability that…?” Thus, P(X=x) reads, “What is the probability that random variable X will generate a number equal to a particular value x?” So, f_X(7) reads, “When random variable X generates a number, what is the probability that the outcome will be 7?”
Most probability mass functions also have parameters, which are listed after a semicolon. In the case of the discrete uniform distribution consisting of consecutive integers, the lower and upper bounds a and b are included in the function’s notation like so: f_X(x;a,b). This reads, “For random variable X with parameters a and b, what is the probability that the outcome will be x?” Some parameters can be derived from other parameters, as was the case with the number of points n in the sample space of a discrete uniform distribution: n=b-a+1. The probability for each outcome in the sample space is the same, and there are n possible outcomes. Therefore, the probability associated with each outcome is \frac{1}{n}.
Putting all of this together, if a and b are integers and a<b, for all n integers x between a and b, inclusive:
\begin{aligned} f_X\left(x;a,b\right)&=\frac{1}{b-a+1}\\[2ex] &=\frac{1}{n} \end{aligned}
Symbol | Meaning |
---|---|
X | A random variable with a discrete uniform distribution |
f_X | The probability mass function of X |
x | Any particular member of the sample space of X |
a | The lower bound of the sample space |
b | The upper bound of the sample space |
n | b-a+1 (The number of points in the sample space) |
You might notice that x is not needed to calculate the probability. Why? Because this is a uniform distribution. No matter which sample space element x we are talking about, the probability associated with it is always the same. In distributions that are not uniform, the position of x matters and thus influences the probability of its occurrence.
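As a minimal sketch, the probability mass function for consecutive integers can be written as a small R function. The name dunif_discrete is hypothetical; base R's dunif is the continuous uniform density, not this one. The function assumes integer bounds a < b. It returns 1/n for every x in the sample space and 0 for values that can never occur, such as 3.5 or 7 on a die.

```r
# Hypothetical helper (not part of base R): PMF of a discrete uniform
# distribution on the consecutive integers a, a + 1, ..., b
dunif_discrete <- function(x, a, b) {
  n <- b - a + 1                                # number of points in the sample space
  in_space <- x >= a & x <= b & x == round(x)   # integer and within the bounds
  ifelse(in_space, 1 / n, 0)                    # 1/n inside the sample space, 0 outside
}

# A six-sided die: a = 1, b = 6
dunif_discrete(1:6, a = 1, b = 6)              # 1/6 for every face
dunif_discrete(c(0, 3.5, 7), a = 1, b = 6)     # 0 0 0: impossible outcomes
```

Note that x enters the function only to check membership in the sample space, not to change the probability. This mirrors the point above.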
8.5.4 Cumulative Distribution Functions
The cumulative distribution function tells us where a sample space element ranks in a distribution. Whereas the probability mass function tells us the probability that a random variable will generate a particular number, the cumulative distribution function tells us the probability that a random variable will generate a particular number or less.
F_X(x) = P(X \le x)=p
The cumulative distribution function of the roll of a die (Figure 8.6) tells us that the probability of rolling a 4 or lower is 4⁄6 (i.e., ⅔).
The cumulative distribution function is often distinguished from the probability mass function with a capital F instead of a lowercase f. In the case of a discrete uniform distribution consisting of n consecutive integers from a to b, the cumulative distribution function is:
\begin{aligned} F_X(x;a,b)&=\frac{x-a+1}{b-a+1}\\[2ex] &=\frac{x-a+1}{n} \end{aligned}
Symbol | Meaning |
---|---|
X | A random variable with a discrete uniform distribution |
F_X | The cumulative distribution function of X |
x | Any particular member of the sample space of X |
a | The lower bound of the sample space |
b | The upper bound of the sample space |
n | b-a+1 (The number of points in the sample space) |
In the case of the six-sided die, the cumulative distribution function is
\begin{aligned} F_X(x;a=1,b=6)&=\frac{x-a+1}{b-a+1}\\[2ex] &=\frac{x-1+1}{6-1+1}\\[2ex] &=\frac{x}{6} \end{aligned}
The cumulative distribution function is so named because it adds all the probabilities in the probability mass function up to and including a particular member of the sample space. Figure 8.7 shows how each probability in the cumulative distribution function of the roll of a six-sided die is the sum of the current and all previous probabilities in the probability mass function.
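Here is a short sketch of the same idea in R. The cumsum function adds the die's probability mass function probabilities from the lowest face upward, and the running totals match the closed-form result x/6.

```r
# PMF of a six-sided die: each face has probability 1/6
pmf <- rep(1 / 6, 6)
# The CDF accumulates the PMF probabilities from the lowest face upward
cdf <- cumsum(pmf)
# Closed-form CDF for a = 1, b = 6: F(x) = x / 6
closed_form <- (1:6) / 6
all.equal(cdf, closed_form)   # TRUE
cdf[4]                        # P(X <= 4), about 0.667
```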
8.5.5 Quantile Functions
The inverse of the cumulative distribution function is the quantile function. The cumulative distribution function starts with a value x in the sample space and tells us p, the proportion of values in that distribution that are less than or equal to x. A quantile function starts with a proportion p and tells us the value x that splits the distribution such that the proportion p of the distribution is less than or equal to x.
As seen in Figure 8.8, if you see a graph of a cumulative distribution function, just flip the X and Y axes, and you have a graph of a quantile function.
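For a discrete distribution, the cumulative distribution function rises in steps, so flipping the axes does not give a single x for every p. A common convention, assumed here, is that the quantile function returns the smallest x whose cumulative probability is at least p. The sketch below applies that rule to the discrete uniform distribution. The name qunif_discrete is hypothetical and is not a base R function.

```r
# Hypothetical helper: quantile function of a discrete uniform
# distribution on a, a + 1, ..., b, for proportions p in (0, 1]
qunif_discrete <- function(p, a, b) {
  x <- a:b                              # the sample space
  cdf <- (x - a + 1) / (b - a + 1)      # closed-form CDF at each x
  # For each p, return the smallest x with F(x) >= p
  sapply(p, function(pp) x[which(cdf >= pp)[1]])
}

qunif_discrete(0.5, a = 1, b = 6)          # 3, because F(3) = 3/6
qunif_discrete((1:6) / 6, a = 1, b = 6)    # 1 2 3 4 5 6: undoes the CDF
```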
8.5.6 Generating a Random Sample in R
In R, the sample function generates numbers from the discrete uniform distribution.
# n = the sample size
n <- 6000
# a = the lower bound
a <- 1
# b = the upper bound
b <- 6
# The sample space is the sequence of integers from a to b
sample_space <- seq(a, b)
# X = the sample with a discrete uniform distribution
# The sample function selects n values
# from the sample space with replacement at random
X <- sample(sample_space,
            size = n,
            replace = TRUE)
The frequencies of the random sample can be seen in Figure 8.9. Because of sampling error, the frequencies are approximately the same, but not exactly the same. The larger the sample, the smaller the sampling error tends to be, meaning that the sample’s characteristics will tend to more closely resemble the population characteristics. In this case, a larger sample size will produce frequency counts that appear more even in their magnitude. However, as long as the sample is smaller than the population, sampling error will always be present. With random variables, the population is assumed to be infinitely large, and thus sampling error, at best, becomes negligibly small.
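A quick simulation can check the claim that sampling error shrinks as the sample grows. In this sketch, the seed and the two sample sizes are arbitrary choices. It rolls a simulated die 60 times and then 60,000 times, and it reports the largest gap between any face's observed proportion and its population probability of 1/6.

```r
set.seed(1)  # arbitrary seed so the example is reproducible

# Largest gap between the observed proportion of any face
# and its population probability of 1/6, for a sample of size n
max_gap <- function(n) {
  X <- sample(1:6, size = n, replace = TRUE)
  props <- table(factor(X, levels = 1:6)) / n
  max(abs(props - 1 / 6))
}

small <- max_gap(60)      # small sample
large <- max_gap(60000)   # large sample
c(small = small, large = large)  # the large-sample gap is typically much smaller
```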
8.5.7 Bernoulli Distributions
Feature | Symbol |
---|---|
Sample Space | x \in \{0,1\} |
Probability that x=1 | p \in [0,1] |
Probability that x=0 | q = 1 - p |
Mean | \mu = p |
Variance | \sigma^2 = pq |
Skewness | \gamma_1 = \frac{1 - 2p}{\sqrt{pq}} |
Kurtosis | \gamma_2 = \frac{1}{pq} - 6 |
Probability Mass Function | f_X(x;p) = p^xq^{1 - x} |
Cumulative Distribution Function | F_X(x;p) = x+q(1 - x) |
Notation note: Whereas \{a,b\} is the set of just two numbers, a and b, [a,b] is the set of all real numbers between a and b, including a and b themselves.
The toss of a single coin has the simplest probability distribution that I can think of—there are only two outcomes and each outcome is equally probable (Figure 8.10). This is a special case of the Bernoulli distribution. Jakob Bernoulli (Figure 8.11) was a famous mathematician from a famous family of mathematicians. The Bernoulli distribution is just one of the ideas that made Jakob and the other Bernoullis famous.
The Bernoulli distribution can describe any random variable that has two outcomes, one of which has a probability p and the other has a probability q=1-p. In the case of a coin flip, p=0.5. For other variables with a Bernoulli distribution, p can range from 0 to 1.
In psychological assessment, many of the variables we encounter have a Bernoulli distribution. In ability test items in which there is no partial credit, examinees either succeed or fail. The probability of success on an item (in the whole population) is p. In other words, p is the proportion of the entire population that correctly answers the question. Some ability test items are very easy and the probability of success is high. In such cases, p is close to 1. When p is close to 0, few people succeed and items are deemed hard. Thus, in the context of ability testing, p is called the difficulty parameter. This is confusing because when p is high, the item is easy, not difficult. Many people have suggested that it would make more sense to call it the “easiness parameter” but the idea has never caught on.
True/False and Yes/No items on questionnaires also have Bernoulli distributions. If an item is frequently endorsed as true (“I like ice cream.”), p is high. If an item is infrequently endorsed (“I like black licorice and mayonnaise in my ice cream.”), p is very low. Oddly, the language of ability tests prevails even here. Frequently endorsed questionnaire items are referred to as “easy” and infrequently endorsed items are referred to as “difficult,” even though there is nothing particularly easy or difficult about answering them either way.
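A Bernoulli variable is a binomial variable with a single trial, so R's dbinom and pbinom functions with size = 1 give the Bernoulli probabilities. This sketch uses p = 0.8, an arbitrary value that could represent a fairly easy test item. It checks the formula p^x q^(1 - x), the cumulative probabilities P(X ≤ 0) = q and P(X ≤ 1) = 1, and the mean p and variance pq.

```r
p <- 0.8        # probability of a 1 (e.g., passing an easy item)
q <- 1 - p      # probability of a 0
x <- c(0, 1)    # the sample space

# PMF from the formula p^x * q^(1 - x)
pmf_formula <- p^x * q^(1 - x)
# The same probabilities from R's binomial functions with one trial
pmf_r <- dbinom(x, size = 1, prob = p)
cdf_r <- pbinom(x, size = 1, prob = p)

pmf_formula                          # 0.2 0.8
cdf_r                                # 0.2 1.0, i.e., q and 1
sum(x * pmf_formula)                 # mean: p
sum((x - p)^2 * pmf_formula)         # variance: pq
```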
8.5.7.1 Generating a Random Sample from the Bernoulli Distribution
In R, there is no specialized function for the Bernoulli distribution because it turns out that the Bernoulli distribution is a special case of the binomial distribution, which will be described in the next section. With the function rbinom, we can generate data with a Bernoulli distribution by setting the size parameter equal to 1.
# n = sample size
n <- 1000
# p = probability
p <- 0.8
# X = sample
X <- rbinom(n, size = 1, prob = p)
# Make a basic plot
barplot(table(X))
In Figure 8.12, we can see that the random variable generated a sequence that consists of about 80% ones and 20% zeroes. However, because of sampling error, the results are rarely exactly what the population parameter specifies.
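The mean of a Bernoulli distribution is p, so the proportion of ones in a sample is a natural estimate of p. Here is a brief sketch that uses an arbitrary seed so the result is reproducible.

```r
set.seed(2)  # arbitrary seed for reproducibility
n <- 1000
p <- 0.8
X <- rbinom(n, size = 1, prob = p)

# The sample mean of 0/1 data is the proportion of ones,
# which estimates the population parameter p
p_hat <- mean(X)
p_hat                      # close to 0.8, but rarely exactly 0.8
prop.table(table(X))       # sample proportions of zeros and ones
```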
8.5.8 Binomial Distributions
Feature | Symbol |
---|---|
Number of Trials | n \in \{1,2,3,\ldots\} |
Sample Space | x \in \{0,1,\ldots,n\} |
Probability of success in each trial | p \in [0,1] |
Probability of failure in each trial | q = 1 - p |
Mean | \mu = np |
Variance | \sigma^2 = npq |
Skewness | \gamma_1 = \frac{1-2p}{\sqrt{npq}} |
Kurtosis | \gamma_2 = \frac{1}{npq} - \frac{6}{n} |
Probability Mass Function | f_X(x;n,p)=\frac{n!}{x!\left(n-x\right)!}p^x q^{n-x} |
Cumulative Distribution Function | F_X(x;n,p)=\sum_{i=0}^{x}{\frac{n!}{i!(n-i)!} p^i q^{n-i}} |
Let’s extend the idea of coin tosses and see where it leads. Imagine that two coins are tossed at the same time and we count how many heads there are. The outcome we might observe will be zero, one, or two heads. Thus, the sample space for the outcome of the tossing of two coins is the set \{0,1,2\} heads. There is only one way that we will observe no heads (both coins tails) and only one way that we will observe two heads (both coins heads). In contrast, as seen in Figure 8.13, there are two ways that we can observe one head (heads-tails & tails-heads).
The probability distribution of the number of heads observed when two coins are tossed at the same time is a member of the binomial distribution family. The binomial distribution occurs when independent random variables with the same Bernoulli distribution are added together. In fact, Bernoulli discovered the binomial distribution as well as the Bernoulli distribution.
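The two-coin example can be checked by brute force. This minimal sketch lists the four equally likely outcomes and counts the heads in each. It then compares the resulting proportions with dbinom for n = 2 trials and p = 0.5.

```r
# The four equally likely outcomes of two coins (1 = heads, 0 = tails)
outcomes <- expand.grid(coin1 = c(0, 1), coin2 = c(0, 1))
# Adding the two Bernoulli variables gives the number of heads
heads <- outcomes$coin1 + outcomes$coin2

# Probabilities by counting: 1, 2, and 1 ways out of 4
by_counting <- as.vector(table(heads)) / nrow(outcomes)
# Binomial PMF with n = 2 trials and p = 0.5
by_binomial <- dbinom(0:2, size = 2, prob = 0.5)

by_counting   # 0.25 0.50 0.25
by_binomial   # 0.25 0.50 0.25
```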
Imagine that a die is rolled 10 times and we count how often a 6 occurs.4 Each roll of the die is called a trial. The sample space of this random variable is \{0,1,2,...,10\}. What is the probability that a 6 will occur 5 times? or 1 time? or not at all? Such questions are answered by the binomial distribution’s probability mass function:
4 Wait! Hold on! I thought that throwing dice resulted in a (discrete) uniform distribution. Well, it still does. However, now we are asking a different question. We are only concerned with two outcomes each time the die is thrown: 6 and not 6. This is a Bernoulli distribution, not a uniform distribution, because the probabilities of the two events are unequal (⅙ and ⅚).
f_X(x;n,p)=\frac{n!}{x!\left(n-x\right)!}p^x\left(1-p\right)^{n-x}
Symbol | Meaning |
---|---|
X | The random variable (the number of sixes from 10 throws of the die) |
x | Any particular member of the sample space (i.e., x \in \{0,1,2,...,10\}) |
n | The number of times that the die is thrown (i.e., n=10) |
p | The probability that a six will occur on a single throw of the die (i.e., p=\frac{1}{6}) |
Because n=10 and p=\frac{1}{6}, the probability mass function simplifies to:
\begin{equation*} f_X(x)=\frac{10!}{x!\left(10-x\right)!}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{10-x} \end{equation*}
If we take each element x of the sample space from 0 to 10 and plug it into the equation above, the probability distribution will look like Figure 8.14.
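Here is a sketch of that calculation in R. The choose(n, x) function computes n!/(x!(n-x)!), so the formula can be evaluated directly for every element of the sample space and compared with the built-in dbinom function. The last line answers the earlier questions about getting 0, 1, and 5 sixes.

```r
n <- 10        # number of throws
p <- 1 / 6     # probability of a six on a single throw
x <- 0:n       # sample space: 0 to 10 sixes

# PMF written out from the formula
pmf_formula <- choose(n, x) * p^x * (1 - p)^(n - x)
# R's built-in binomial PMF
pmf_r <- dbinom(x, size = n, prob = p)
all.equal(pmf_formula, pmf_r)   # TRUE

# Probabilities of no sixes, one six, and five sixes
round(pmf_r[c(0, 1, 5) + 1], 4) # about 0.1615 0.3230 0.0130
```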
8.5.8.1 Clinical Applications of the Binomial Distribution
When would a binomial distribution be used by a clinician? One particularly important use of the binomial distribution is in the detection of malingering. Sometimes people pretend to have memory loss or attention problems in order to win a lawsuit or collect insurance benefits. There are a number of ways to detect malingering but a common method is to give a very easy test of memory in which the person has at least a 50% chance of getting each test item correct even if the person guesses randomly.
Suppose that there are 20 questions. Even if a person has the worst memory possible, that person is likely to get about half the questions correct. However, it is possible for someone with a legitimate memory problem to guess randomly and by bad luck answer fewer than half of the questions correctly. Suppose that a person gets 4 questions correct. How likely is it that a person would, by random guessing, answer 4 or fewer questions correctly?
We can use the binomial distribution’s cumulative distribution function. However, doing so by hand is rather tedious. Using R, the answer is found with the pbinom function:
p <- pbinom(4, 20, 0.5)
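To see what pbinom is adding up, you can find the same probability by summing the probability mass function from 0 to 4. You can also find it exactly by counting, because with p = 0.5 each of the 2^20 possible patterns of right and wrong answers is equally likely. Here is a short sketch.

```r
# P(X <= 4): add the PMF probabilities for 0, 1, 2, 3, and 4 correct
by_sum <- sum(dbinom(0:4, size = 20, prob = 0.5))
# The same probability from the cumulative distribution function
by_cdf <- pbinom(4, size = 20, prob = 0.5)
# Exact value: ways to get 0 to 4 correct, divided by 2^20 equally likely patterns
exact <- sum(choose(20, 0:4)) / 2^20
c(by_sum, by_cdf, exact)   # all about 0.0059
```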
We can see that the probability of randomly guessing and getting 4 or fewer items correct out of 20 items total is approximately 0.006, which is so low that the hypothesis that the person is malingering seems plausible. Note here that there is a big difference between these two questions:
- If the person is guessing at random (i.e., not malingering), what is the probability of answering correctly 4 questions or fewer out of 20?
- If the person answers 4 out of 20 questions correctly, what is the probability that the person is guessing at random (and therefore not malingering)?
Here we answer only the first question. It is an important question, but the answer to the second question is probably the one that we really want to know. We will answer it in another chapter when we discuss positive predictive power. For now, we should just remember that the questions are different and that the answers can be quite different.
8.5.8.2 Graphing the Binomial Distribution
Suppose that there are n=10 trials, each of which has a probability of success of p=0.8. The sample space is the sequence of integers from 0 to 10, which can be generated with the seq function (i.e., seq(0,10)) or with the colon operator 0:10. In the code below, the sample space is generated with the seq function. The associated probability mass function probabilities are found with the dbinom function, and the cumulative distribution function probabilities are found with the pbinom function.
# Make a sequence of numbers from 0 to 10
SampleSpace <- seq(0, 10)
# Probability mass function for
# binomial distribution (n = 10, p = 0.8)
pmfBinomial <- dbinom(SampleSpace,
                      size = 10,
                      prob = 0.8)
# Generate a basic plot of the
# probability mass function
plot(pmfBinomial ~ SampleSpace,
     type = "b")
# Cumulative distribution function
# for binomial distribution (n = 10, p = 0.8)
cdfBinomial <- pbinom(SampleSpace,
                      size = 10,
                      prob = 0.8)
# Generate a basic plot of the
# binomial cumulative distribution function
plot(cdfBinomial ~ SampleSpace,
type = "b")
Making the graph look professional involves quite a bit of code that can look daunting at first, but the results are often worth the effort. Try running the code below to see the difference. For presentation-worthy graphics, export the graph to the .pdf or .svg format. An .svg file can be imported directly into MS Word or MS PowerPoint.
8.5.9 Poisson Distributions
Feature | Symbol |
---|---|
Parameter | \lambda \in (0,\infty) |
Sample Space | x\in \{0,1,2,\ldots\} |
Mean | \mu = \lambda |
Variance | \sigma^2 = \lambda |
Skewness | \gamma_1 = \frac{1}{\sqrt{\lambda}} |
Kurtosis | \gamma_2 = \frac{1}{\lambda} |
Probability Mass Function | f_X(x;\lambda) = \frac{\lambda^x}{e^{\lambda} x!} |
Cumulative Distribution Function | F_X(x;\lambda) = \sum_{i=0}^{x}{\frac{\lambda^i}{e^{\lambda} i!}} |
Notation note: The notation (0,\infty) means all real numbers greater than 0.
Imagine that an event happens sporadically at random and we measure how often it occurs in regular time intervals (e.g., events per hour). Sometimes the event does not occur in the interval, sometimes just once, and sometimes more than once. However, we notice that over many intervals, the average number of events is constant. The distribution of the number of events in each interval will follow a Poisson distribution. Although “Poisson” means “fish” in French, fish have nothing to do with it. This distribution was named after Siméon Denis Poisson, whose work on the distribution made it famous.
The Poisson distribution has a single parameter \lambda, the average number of events per time interval. Interestingly, \lambda is both the mean and the variance of this distribution. The distribution’s shape will differ depending on how long our interval is. If an event occurs on average 30 times per hour, \lambda = 30. If we count how often the event occurs in 10-minute intervals, the same event will occur about 5 times per interval, on average (i.e., \lambda = 5). If we choose to count how often the same event occurs every minute, then \lambda = 0.5.
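This small sketch shows how λ rescales with the interval. It assumes, as the Poisson model does, that events occur at a constant average rate, so λ is proportional to the length of the interval. The second part sums over a long stretch of the sample space and confirms that the mean and variance both equal λ = 5. The cutoff of 200 is arbitrary; beyond it, the probabilities are negligible.

```r
rate_per_hour <- 30                  # average events per hour
interval_minutes <- c(60, 10, 1)     # lengths of the counting intervals
# lambda is proportional to the interval length
lambda <- rate_per_hour * interval_minutes / 60
lambda                               # 30, 5, and 0.5

# Check that the mean and the variance both equal lambda (here lambda = 5)
x <- 0:200                           # wide enough to hold nearly all the probability
pmf <- dpois(x, lambda = 5)
mean_5 <- sum(x * pmf)
var_5 <- sum((x - mean_5)^2 * pmf)
c(mean = mean_5, variance = var_5)   # both about 5
```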
8.5.9.1 A Clinical Application of the Poisson Distribution
Suppose that you begin treating an adult male client who has panic attacks that come at unpredictable times. Some weeks there are no panic attacks and some weeks there are many, but on average he has 2 panic attacks each week. The client knows this because he has kept detailed records in a spreadsheet for the last 5 years. The client had sought treatment once before, but terminated early and abruptly because, according to him, “It wasn’t working.” After sensitive querying, you discover that he expected that treatment should have quickly reduced the frequency of panic attacks to zero. When that did not happen, he became discouraged and stopped the treatment.
Because your client is well educated and quantitatively inclined, you decide to use the data he has collected as part of the intervention and also to help set more realistic expectations. Obviously, you and your client both would prefer 0 panic attacks per week, but sometimes it takes more time to get to the final goal. We do not want to terminate treatment that is working just because the final goal has not yet been achieved.
You plot the frequency of how often he had 0 panic attacks in a week, 1 panic attack in a week, 2 panic attacks in a week, and so forth, as shown in red in Figure 8.20. Because you have read this book, you immediately recognize that this is a Poisson distribution with \lambda = 2. When you graph an actual Poisson distribution and compare it with your client’s data, you see that it is almost a perfect match.5 Then you explain that although the goal is permanent cessation of the panic attacks, sometimes an intervention can be considered successful if the frequency of panic attacks is merely reduced. For example, suppose that in the early stages of treatment the frequency of panic attacks was reduced from twice per week to once every other week (\lambda = 0.5), on average. If such a reduction were achieved, there would still be weeks in which two or more panic attacks occur. According to Figure 8.23, this will occur about 9% of the time.
5 Note that I am not claiming that all clients’ panic attack frequencies have this kind of distribution. It just so happens to apply in this instance.
In R, the dpois function computes the Poisson probability mass function. For example, if the average number of events per time period is λ = 2, then the probability that there will be 0 events is dpois(x = 0, lambda = 2), which evaluates to 0.1353.
To calculate the cumulative distribution function of the Poisson distribution in R, use the ppois function. For example, to calculate the probability of having 4 or more panic attacks in a week when λ = 2, we subtract the probability of having 3 or fewer panic attacks from 1, like so:
1 - ppois(q = 3, lambda = 2)
p = 0.143
Here is a simple way to plot the probability mass function and the cumulative distribution function using the dpois and ppois functions:
# Make a sequence of integers from 0 to 7
PanicAttacks <- seq(0, 7)
# Generate the probability mass function with lambda = 2
Probability <- dpois(PanicAttacks, 2)
# Basic plot of the Poisson
# distribution's probability mass function
plot(Probability ~ PanicAttacks, type = "b")
# Generate the cumulative
# distribution function with lambda = 2
CumulativeProbability <- ppois(PanicAttacks, 2)
# Basic plot of the Poisson distribution's
# cumulative distribution function
plot(CumulativeProbability ~ PanicAttacks, type = "b")
With an additional series with \lambda = 0.5, the plot can look like Figure 8.23.
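A minimal sketch of how such a two-series plot might be drawn with base R graphics follows. The object names (Before, After, PTwoOrMore) and the use of lines and legend are illustrative assumptions; the published figure may have been produced differently. The last lines check the 9% figure mentioned earlier: with λ = 0.5, the probability of 2 or more panic attacks in a week is 1 minus the probability of 1 or fewer.

```r
# Number of panic attacks per week
PanicAttacks <- 0:7

# Probability mass functions before (lambda = 2) and after (lambda = 0.5) treatment
Before <- dpois(PanicAttacks, lambda = 2)
After <- dpois(PanicAttacks, lambda = 0.5)

# Plot both series on the same axes; ylim leaves room for the taller lambda = 0.5 series
plot(PanicAttacks, Before, type = "b", ylim = c(0, max(After)),
     xlab = "Panic attacks per week", ylab = "Probability")
lines(PanicAttacks, After, type = "b", lty = 2)
legend("topright", legend = c("lambda = 2", "lambda = 0.5"), lty = c(1, 2))

# Probability of 2 or more panic attacks in a week when lambda = 0.5
PTwoOrMore <- 1 - ppois(1, lambda = 0.5)
# The same value, computed from the upper tail directly
PTwoOrMoreUpper <- ppois(1, lambda = 0.5, lower.tail = FALSE)
round(PTwoOrMore, 3)  # about 0.09
```

The lower.tail = FALSE form avoids the subtraction and is slightly more accurate when the upper-tail probability is very small.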
8.5.10 Geometric Distributions
Feature | Symbol |
---|---|
Probability of success in each trial | p\in[0,1] |
Sample Space | x \in \{1,2,3,\ldots\} |
Mean | \mu = \frac{1}{p} |
Variance | \sigma^2 = \frac{1-p}{p^2} |
Skewness | \gamma_1 = \frac{2-p}{\sqrt{1-p}} |
Kurtosis | \gamma_2 = 6 + \frac{p^2}{1-p} |
Probability Mass Function | f_X(x;p) = (1-p)^{x-1}p |
Cumulative Distribution Function | F_X(x;p) = 1-(1-p)^x |
Atul Gawande (2007, pp. 219–223) tells a marvelous anecdote about how a doctor used some statistics to help a young patient with cystic fibrosis to return to taking her medication more regularly. Because the story is full of pathos and masterfully told, I will not repeat a clumsy version of it here. However, unlike Gawande, I will show how the doctor’s statistics were calculated.
According to the story, if a patient fails to take medication, the probability that a person with cystic fibrosis will develop a bad lung illness on any particular day is .005. If medication is taken, the risk is .0005. Although these probabilities are both close to zero, over the course of a year, they result in very different levels of risk. Off medication, the patient has about an 84% chance of getting sick within a year’s time. On medication, the patient’s risk falls to 17%. As seen in Figure 8.24, the cumulative risk over the course of 10 years is quite different. Without medication, becoming seriously ill at least once within 10 years is almost certain. With medication, however, a small but substantial percentage (~16%) of patients will go at least 10 years without becoming ill.
Such calculations make use of the geometric distribution. Consider a series of Bernoulli trials in which an event has a probability p of occurring on any particular trial. The probability mass function of the geometric distribution will tell us the probability that the x^{th} trial will be the first time the event occurs.
f_X(x;p)=(1-p)^{x-1}p
Symbol | Meaning |
---|---|
X | A random variable with a geometric distribution |
f_X | The probability mass function of X |
x | The number of Bernoulli trials on which the event first occurs |
p | The probability of an event occurring on a single Bernoulli trial |
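In R, the probability mass function of the geometric distribution is calculated with the dgeom function. Note that dgeom counts the number of failures before the first event (0, 1, 2, and so on) rather than the trial on which the event first occurs (1, 2, 3, and so on), so we pass it x - 1:
# Make a sequence of integers from 1 to 10 (trial numbers)
x <- seq(1, 10)
# Generate the probability mass
# function with p = 0.6
# (x - 1 converts trial numbers to R's count of failures)
Probability <- dgeom(x - 1, prob = 0.6)
# Basic plot of the geometric
# distribution's probability mass function
plot(Probability ~ x, type = "b")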
The cumulative distribution function of the geometric distribution was used to create Figure 8.24. It tells us the probability that the event will occur on the x^{th} trial or earlier:
F_X(x;p)=1-(1-p)^x
In R, the cumulative distribution function of the geometric distribution uses the pgeom function, again with x - 1 because pgeom counts failures before the first event:
# Generate the cumulative
# distribution function with p = 0.6
# (pgeom(x - 1, p) equals 1 - (1 - p)^x)
CumulativeProbability <- pgeom(x - 1, prob = 0.6)
# Basic plot of the geometric
# distribution's cumulative distribution function
plot(CumulativeProbability ~ x, type = "b")
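The percentages in the cystic fibrosis anecdote can be reproduced with this cumulative distribution function. The sketch below assumes a 365-day year and treats each day as an independent Bernoulli trial, as the anecdote implies; the helper geometric_cdf and the other object names are illustrative, not part of any package. The final line confirms that pgeom, given x - 1 because it counts failures before the first event, agrees with the formula.

```r
# Daily probability of a bad lung illness, as given in the anecdote
p_off <- 0.005   # not taking medication
p_on  <- 0.0005  # taking medication

# Geometric CDF: probability that the first illness occurs on day x or earlier
geometric_cdf <- function(x, p) 1 - (1 - p)^x

# Risk of illness within one year (365 days assumed)
risk_year_off <- geometric_cdf(365, p_off)  # about 0.84
risk_year_on  <- geometric_cdf(365, p_on)   # about 0.17

# Probability of going 10 years without illness while on medication
no_illness_decade_on <- 1 - geometric_cdf(10 * 365, p_on)  # about 0.16

round(c(risk_year_off, risk_year_on, no_illness_decade_on), 2)

# Same result from pgeom, which counts failures before the first event
all.equal(pgeom(365 - 1, prob = p_off), risk_year_off)
```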
8.6 Continuous Distributions
8.6.1 Probability Density Functions
Although there are many more discrete distribution families, we will now consider some continuous distribution families. Most of what we have learned about discrete distributions applies to continuous distributions. However, the probability mass function needs a new name. In a discrete distribution, we can calculate an actual probability for a particular value in the sample space. In a continuous distribution, we cannot do so in any useful way. We can always calculate the probability that a score will occur in a particular interval. However, in continuous distributions, the intervals can become very small, approaching a width of 0. When that happens, the probability associated with that interval also approaches 0. Yet some parts of the distribution are more probable than others. Therefore, we need a function that tells us how probable a value is relative to other values: the probability density function.
Considering the entire sample space of a discrete distribution, all of the associated probabilities from the probability mass function sum to 1. In a probability density function, it is the area under the curve that must equal 1. That is, there is a 100% probability that a value generated by the random variable will be somewhere under the curve. There is nowhere else for it to go!
However, unlike probability mass functions, probability density functions do not generate probabilities. Remember, the probability of any single value in the sample space of a continuous variable is 0. Density values can only be compared to each other. To see this, compare the discrete uniform distribution and continuous uniform distribution in Figure 8.4. Both distributions range from 1 to 4. In the discrete distribution, there are 4 points, each with a probability of ¼. It is easy to see that these 4 probabilities of ¼ sum to 1. Because of the scale of the figure, it is not easy to see exactly how high the probability density function is in the continuous distribution. It happens to be ⅓. Why? First, it does not mean that each value has a ⅓ probability. There are an infinite number of points between 1 and 4, and it would be absurd if each of them had a ⅓ probability. The distance between 1 and 4 is 3. In order for the rectangle to have an area of 1, its height must be ⅓. What does that ⅓ mean, then? In the case of a single value in the sample space, it does not mean much at all. It is simply a value that we can compare to other values in the sample space. It could be scaled to any value, but for the sake of convenience it is scaled such that the area under the curve is 1.
Note that some probability density functions can produce values greater than 1. If the range of a continuous uniform distribution is less than 1, its probability density function must be greater than 1 to make the area under the curve equal 1. For example, if the bounds of a continuous uniform distribution are 0 and ⅓, the height of the probability density function must be 3 so that the total area is equal to 1.
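Both points can be checked numerically. The following sketch, an illustration added here rather than part of the original example, uses R’s dunif to evaluate the continuous uniform density and integrate to confirm that the area under each curve is 1.

```r
# Continuous uniform distribution from 1 to 4: the density is 1/3 everywhere in range
dunif(2.5, min = 1, max = 4)
# The area under the curve is 1
area_1_to_4 <- integrate(dunif, lower = 1, upper = 4, min = 1, max = 4)$value

# Continuous uniform distribution from 0 to 1/3: the density is 3 (greater than 1)
dunif(0.1, min = 0, max = 1/3)
# Even so, the area under the curve is still 1
area_0_to_third <- integrate(dunif, lower = 0, upper = 1/3, min = 0, max = 1/3)$value

c(area_1_to_4, area_0_to_third)
```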
8.6.2 Continuous Uniform Distributions
Feature | Symbol |
---|---|
Lower Bound | a \in (-\infty,\infty) |
Upper Bound | b \in (a,\infty) |
Sample Space | x \in \lbrack a,b\rbrack |
Mean | \mu = \frac{a+b}{2} |
Variance | \sigma^2 = \frac{(b-a)^2}{12} |
Skewness | \gamma_1 = 0 |
Kurtosis | \gamma_2 = -\frac{6}{5} |
Probability Density Function | f_X(x;a,b) = \frac{1}{b-a} |
Cumulative Distribution Function | F_X(x;a,b) = \frac{x-a}{b-a} |
Unlike the discrete uniform distribution, the uniform distribution is continuous.6 In both distributions, there is an upper and lower bound and all members of the sample space are equally probable.
6 For the sake of clarity, the uniform distribution is often referred to as the continuous uniform distribution.
8.6.2.1 Generating random samples from the continuous uniform distribution
To generate a sample of n numbers with a continuous uniform distribution between a and b, use the runif function like so:
# Sample size
n <- 1000
# Lower and upper bounds
a <- 10
b <- 30
# Sample
x <- runif(n, min = a, max = b)
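To see that such a sample behaves as expected, one can compare its mean and variance with those of a continuous uniform distribution, (a+b)/2 = 20 and (b-a)^2/12 ≈ 33.3. In this sketch, the seed is an arbitrary choice made only so that the result is reproducible.

```r
set.seed(1)  # arbitrary seed so the example is reproducible

# Sample size and bounds (same values as above)
n <- 1000
a <- 10
b <- 30
x <- runif(n, min = a, max = b)

# Theoretical mean and variance of the continuous uniform distribution
mu <- (a + b) / 2          # 20
sigma2 <- (b - a)^2 / 12   # about 33.33

# Sample estimates should be close to these values
c(sample_mean = mean(x), theoretical_mean = mu)
c(sample_variance = var(x), theoretical_variance = sigma2)
```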
8.6.2.2 Using the continuous uniform distribution to generate random samples from other distributions
Uniform distributions can begin and end at any real number, but one member of the uniform distribution family is particularly important: the uniform distribution between 0 and 1. If you need to use Excel instead of a statistical package, you can use this distribution to generate random numbers from many other distributions.
Applying a continuous distribution’s cumulative distribution function to values drawn from that distribution produces values with a continuous uniform distribution between 0 and 1. Conversely, a distribution’s quantile function converts values from a continuous uniform distribution between 0 and 1 into values with that distribution. The quantile step works for discrete distributions as well. This process is particularly useful for generating random numbers with an unusual distribution. If the distribution’s quantile function is known, a sample with a continuous uniform distribution can easily be generated and converted.
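A minimal sketch of both directions in R, using a normal distribution with a mean of 100 and a standard deviation of 15 purely as an example (the distribution, seed, and sample size are arbitrary choices): pnorm applied to normal draws gives values that look uniform between 0 and 1, and qnorm applied to uniform draws gives values that look normal.

```r
set.seed(123)  # arbitrary seed for reproducibility
n <- 10000

# Cumulative distribution function: normal values -> uniform values
z <- rnorm(n, mean = 100, sd = 15)
u <- pnorm(z, mean = 100, sd = 15)
hist(u)  # roughly flat between 0 and 1

# Quantile function: uniform values -> normal values
u2 <- runif(n)
z2 <- qnorm(u2, mean = 100, sd = 15)
c(mean = mean(z2), sd = sd(z2))  # close to 100 and 15
```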
For example, the RAND function in Excel generates random numbers between 0 and 1 with a continuous uniform distribution. The BINOM.INV function is the binomial distribution’s quantile function. Suppose that n (number of Bernoulli trials) is 5 and p (probability of success on each Bernoulli trial) is 0.6. A random number from the binomial distribution with n=5 and p=0.6 can be generated like so:
=BINOM.INV(5,0.6,RAND())
Excel has quantile functions for many distributions (e.g., BETA.INV, BINOM.INV, CHISQ.INV, F.INV, GAMMA.INV, LOGNORM.INV, NORM.INV, T.INV). This method of combining RAND and a quantile function works reasonably well in Excel for quick-and-dirty projects, but when high levels of accuracy are needed, random samples should be generated in a dedicated statistical program like R, Python (via the numpy package), Julia, Stata, SAS, or SPSS.
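For comparison, here is a sketch of the same quantile-function approach in R, where qbinom plays the role of BINOM.INV and runif plays the role of RAND. In practice, rbinom draws binomial samples directly. The two approaches should produce samples with the same distribution, though not identical numbers; the seed and sample size are arbitrary.

```r
set.seed(42)  # arbitrary seed
size <- 5     # number of Bernoulli trials (n)
prob <- 0.6   # probability of success on each trial (p)

# Quantile-function method, analogous to =BINOM.INV(5, 0.6, RAND())
x_quantile <- qbinom(runif(10000), size = size, prob = prob)

# Direct method
x_direct <- rbinom(10000, size = size, prob = prob)

# Both sample means should be close to n * p = 3
c(quantile_method = mean(x_quantile), direct_method = mean(x_direct))
```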
8.6.3 Normal Distributions
(Unfinished)
Feature | Symbol |
---|---|
Sample Space | x \in (-\infty,\infty) |
Mean | \mu = \mathcal{E}\left(X\right) |
Variance | \sigma^2 = \mathcal{E}\left(\left(X - \mu\right)^2\right) |
Skewness | \gamma_1 = 0 |
Kurtosis | \gamma_2 = 0 |
Probability Density Function | f_X(x;\mu,\sigma^2) = \frac{1}{\sqrt{2 \pi \sigma ^ 2}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} |
Cumulative Distribution Function | F_X(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} {\displaystyle \int_{-\infty}^{x} e ^ {-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2}dt} |
The normal distribution is sometimes called the Gaussian distribution after its discoverer, Carl Friedrich Gauss (Figure 8.28). It is a small injustice that most people do not use Gauss’s name to refer to the normal distribution. Thankfully, Gauss is not exactly languishing in obscurity. He made so many discoveries that his name is all over mathematics and statistics.
The normal distribution is probably the most important distribution in statistics and in psychological assessment. In the absence of other information, assuming that an individual difference variable is normally distributed is a good bet. Not a sure bet, of course, but a good bet. Why? What is so special about the normal distribution?
To get a sense of the answer to this question, consider what happens to the binomial distribution as the number of events (n) increases. To make the example more concrete, let’s assume that we are tossing coins and counting the number of heads (p=0.5). In Figure 8.29, the first plot shows the probability mass function for the number of heads when there is a single coin (n=1). In the second plot, n=2 coins. That is, if we flip 2 coins, there will be 0, 1, or 2 heads. In each subsequent plot, we double the number of coins that we flip simultaneously. Even with as few as 4 coins, the distribution begins to resemble the normal distribution, although the resemblance is very rough. With 128 coins, however, the resemblance is very close.
The resemblance to the normal distribution in this example does not depend on the fact that p=0.5 makes the binomial distribution symmetric. If p is extreme (close to 0 or 1), the binomial distribution is asymmetric. However, if n is large enough, the binomial distribution eventually becomes very close to normal.
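One way to quantify the resemblance is to compare the binomial probability mass function with a normal density that has the same mean (np) and variance (np(1-p)). In this sketch, max_gap is an illustrative helper, and the particular values of n and p are arbitrary; the largest gap should shrink as n grows, in both the symmetric and the asymmetric case.

```r
# Largest absolute difference between the binomial PMF and a matching normal density
max_gap <- function(n, p) {
  x <- 0:n
  binomial <- dbinom(x, size = n, prob = p)
  normal <- dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p)))
  max(abs(binomial - normal))
}

# Symmetric case (p = 0.5): the gap shrinks as the number of coins grows
gaps_symmetric <- sapply(c(4, 16, 128), max_gap, p = 0.5)

# Asymmetric case (p = 0.05): the gap also shrinks, but a larger n is needed
gaps_asymmetric <- sapply(c(20, 200, 2000), max_gap, p = 0.05)

round(gaps_symmetric, 4)
round(gaps_asymmetric, 4)
```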
Many other distributions, such as the Poisson, Student’s t, F, and \chi^2 distributions, have distinctive shapes under some conditions but approximate the normal distribution in others (see Figure 8.30). Why? When these distributions approximate the normal distribution, it is because, as in Figure 8.29, many independent events are being summed.
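The summing idea can be illustrated with a quick simulation. In this sketch, each value is the sum of k independent exponential variates, a deliberately skewed choice made only for illustration; the seed, the k values, and the number of sums are also arbitrary. As k grows, the sample skewness moves toward 0, the skewness of a normal distribution (for these sums, the theoretical skewness is 2/\sqrt{k}).

```r
set.seed(2024)  # arbitrary seed for reproducibility
n_sums <- 10000

# Sample skewness: 0 for a normal distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Each value is the sum of k independent exponential variates (strongly right-skewed)
sum_of_exponentials <- function(k) {
  rowSums(matrix(rexp(n_sums * k, rate = 1), ncol = k))
}

# Skewness shrinks toward 0 as more values are summed
k_values <- c(1, 5, 50)
skew_by_k <- sapply(k_values, function(k) skewness(sum_of_exponentials(k)))
round(skew_by_k, 2)

# A histogram of the k = 50 sums looks roughly bell-shaped
hist(sum_of_exponentials(50))
```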
8.6.3.1 Notation for Normal Variates
Statisticians write about variables with normal distributions so often that it was useful to develop a compact notation for specifying a normal variable’s parameters. If I want to specify that X is a normally distributed variable with a mean of \mu and a variance of \sigma^2, I will use this notation:
X \sim \mathcal{N}(\mu, \sigma^2)
Symbol | Meaning |
---|---|
X | A random variable. |
\sim | Is distributed as |
\mathcal{N} | Has a normal distribution |
\mu | The population mean |
\sigma^2 | The population variance |
Many authors list the standard deviation \sigma instead of the variance \sigma^2. When I specify normal distributions with specific means and variances, I will avoid ambiguity by always showing the variance as the standard deviation squared. For example, a normal variate with a mean of 10 and a standard deviation of 3 will be written as X \sim \mathcal{N}(10,3^2).
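This ambiguity matters in software. R’s normal distribution functions (dnorm, pnorm, qnorm, rnorm) take the standard deviation, not the variance. A sketch for X \sim \mathcal{N}(10,3^2), with an arbitrary seed and sample size:

```r
set.seed(7)  # arbitrary seed
# X ~ N(10, 3^2): pass sd = 3, not the variance 9
x <- rnorm(100000, mean = 10, sd = 3)
c(mean = mean(x), variance = var(x))  # close to 10 and 9

# Probability that X is less than 16 (two standard deviations above the mean)
pnorm(16, mean = 10, sd = 3)  # about 0.977
```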
8.6.3.2 Half-Normal Distribution
(Unfinished)
Feature | Symbol |
---|---|
Sample Space | x \in [\mu,\infty) |
Mu | \mu \in (-\infty,\infty) |
Sigma | \sigma \in [0,\infty) |
Mean | \mu + \sigma\sqrt{\frac{2}{\pi}} |
Variance | \sigma^2\left(1-\frac{2}{\pi}\right) |
Skewness | \sqrt{2}(4-\pi)(\pi-2)^{-\frac{3}{2}} |
Kurtosis | 8(\pi-3)(\pi-2)^{-2} |
Probability Density Function | f_X(x;\mu,\sigma) = \sqrt{\frac{2}{\pi \sigma ^ 2}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} |
Cumulative Distribution Function | F_X(x;\mu,\sigma) = \sqrt{\frac{2}{\pi\sigma^2}} {\displaystyle \int_{\mu}^{x} e ^ {-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2}dt} |
Suppose that X is a normally distributed variable such that
X \sim \mathcal{N}(\mu, \sigma^2)
Then the variable Y = |X-\mu|+\mu has a half-normal distribution. In other words, imagine that a normal distribution is folded at the mean, with the left half of the distribution now stacked on top of the right half (see Figure 8.34).
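A simulation sketch of this folding, with arbitrary illustrative values \mu = 50 and \sigma = 10: the sample mean and variance of the folded values should be close to the half-normal formulas, \mu + \sigma\sqrt{2/\pi} \approx 57.98 and \sigma^2(1 - 2/\pi) \approx 36.34.

```r
set.seed(10)  # arbitrary seed
mu <- 50
sigma <- 10

# Fold a normal distribution at its mean
x <- rnorm(100000, mean = mu, sd = sigma)
y <- abs(x - mu) + mu

# Compare sample moments with the half-normal formulas
c(sample_mean = mean(y), theoretical_mean = mu + sigma * sqrt(2 / pi))
c(sample_variance = var(y), theoretical_variance = sigma^2 * (1 - 2 / pi))
min(y) >= mu  # no values fall below the mean
```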
8.6.3.3 Truncated Normal Distributions
(Unfinished)
8.6.3.4 Multivariate Normal Distributions
(Unfinished)
8.6.4 Chi Square Distributions
(Unfinished)
Feature | Symbol |
---|---|
Sample Space | x \in [0,\infty) |
Degrees of freedom | \nu \in (0,\infty) |
Mean | \nu |
Variance | 2\nu |
Skewness | \sqrt{8/\nu} |
Kurtosis | 12/\nu |
Probability Density Function | f_X(x;\nu) = \frac{x^{\nu/2-1}}{2^{\nu/2}\;\Gamma(\nu/2)\,\sqrt{e^x}} |
Cumulative Distribution Function | F_X(x;\nu) = \frac{\gamma\left(\frac{\nu}{2},\frac{x}{2}\right)}{\Gamma(\nu/2 )} |
I have always thought that the \chi^2 distribution has an unusual name. The chi part is fine, but why square? Why not call it the \chi distribution?7 As it turns out, the \chi^2 distribution is formed from squared quantities.
7 Actually, there is a \chi distribution. It is the distribution of the square root of a \chi^2 variate. The standard half-normal distribution (\mu = 0, \sigma = 1) happens to be a \chi distribution with 1 degree of freedom.
The \chi^2 distribution has a straightforward relationship with the normal distribution. A \chi^2 variate is the sum of independent squared standard normal variates. That is, if z is a standard normal variate (z\sim\mathcal{N}(0,1^2)), then z^2 has a \chi^2 distribution with 1 degree of freedom (\nu):
z^2\sim \chi^2_1
If z_1 and z_2 are independent standard normal variates, the sum of their squares has a \chi^2 distribution with 2 degrees of freedom:
z_1^2+z_2^2 \sim \chi^2_2
If \{z_1,z_2,\ldots,z_{\nu} \} is a series of \nu independent standard normal variates, the sum of their squares has a \chi^2 distribution with \nu degrees of freedom:
\sum^\nu_{i=1}{z_i^2} \sim \chi^2_\nu
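This definition can be checked by simulation. In the sketch below (\nu = 5, the seed, and the number of draws are arbitrary choices), sums of \nu squared standard normal variates have a mean near \nu and a variance near 2\nu, and the proportion of sums at or below a cutoff matches R’s \chi^2 cumulative distribution function, pchisq.

```r
set.seed(99)  # arbitrary seed
nu <- 5
n_draws <- 100000

# Each row holds nu independent standard normal variates; sum their squares
z <- matrix(rnorm(n_draws * nu), ncol = nu)
chi_sq <- rowSums(z^2)

c(mean = mean(chi_sq), variance = var(chi_sq))  # close to 5 and 10

# Proportion of simulated values at or below 3 versus the theoretical CDF
c(simulated = mean(chi_sq <= 3), theoretical = pchisq(3, df = nu))
```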
8.6.4.1 Clinical Uses of the \chi^2 distribution
The \chi^2 distribution has many applications, but the ones most likely to be used in psychological assessment are the \chi^2 Test of Goodness of Fit and the \chi^2 Test of Independence.
Suppose we suspect that a child’s temper tantrums are more likely to occur on weekdays than on weekends. The child’s mother has kept a record of each tantrum for the past year and was able to count the frequency of tantrums. If tantrums were equally likely to occur on any day, we would expect 5 of every 7 tantrums to occur on weekdays and 2 of every 7 to occur on weekends. The observed frequencies are compared with the expected frequencies below.
\begin{array}{r|c|c|c} & \text{Weekday} & \text{Weekend} & \text{Total} \\ \hline \text{Observed Frequency}\, (o) & 14 & 13 & n=27\\ \text{Expected Proportion}\,(p) & \frac{5}{7} & \frac{2}{7} & 1\\ \text{Expected Frequency}\, (e = np)& 27\times \frac{5}{7}= 19.2857& 27\times \frac{2}{7}= 7.7143& 27\\ \text{Difference}\,(o-e) & -5.2857 & 5.2857&0\\ \frac{(o-e)^2}{e} & 1.4487 & 3.6217 & \chi^2 = 5.07 \end{array}
In the table above, if the observed frequencies (o_i) are compared to their respective expected frequencies (e_i), then:
\chi^2_{k-1}=\sum_{i=1}^k{\frac{(o_i-e_i)^2}{e_i}}=5.07
Using the \chi^2 cumulative distribution function, we find that the probability of observing frequencies at least this far from the expected frequencies is low (p ≈ .024) under the assumption that tantrums are equally likely each day.
observed_frequencies <- c(14, 13)
expected_probabilities <- c(5, 2) / 7
fit <- chisq.test(observed_frequencies, p = expected_probabilities)
fit
Chi-squared test for given probabilities
data: observed_frequencies
X-squared = 5.0704, df = 1, p-value = 0.02434
# View expected frequencies and residuals
broom::augment(fit)
# A tibble: 2 × 6
Var1 .observed .prop .expected .resid .std.resid
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 14 0.519 19.3 -1.20 -2.25
2 B 13 0.481 7.71 1.90 2.25
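The same statistic and p-value can be obtained directly from the formula and the \chi^2 cumulative distribution function, pchisq. This sketch reuses the frequencies from the example; the object names are illustrative. The upper tail is used because large values of \chi^2 indicate poor fit.

```r
observed <- c(weekday = 14, weekend = 13)
expected <- sum(observed) * c(5, 2) / 7   # about 19.29 and 7.71

# Goodness-of-fit statistic with k - 1 = 1 degree of freedom
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq  # about 5.07

# Probability of a statistic at least this large if the expected proportions are correct
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)
p_value  # about 0.024
```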
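For the \chi^2 Test of Independence, suppose that two dichotomous variables, A and B, have the observed frequencies shown below (rows are values of B, and columns are values of A):
B | 0 | 1 |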
---|---|---|
0 | 36 | 39 |
1 | 5 | 20 |
# A tibble: 4 × 9
A B .observed .prop .row.prop .col.prop .expected .resid .std.resid
<fct> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 36 0.36 0.878 0.48 30.8 0.947 2.47
2 1 0 39 0.39 0.661 0.52 44.2 -0.789 -2.47
3 0 1 5 0.05 0.122 0.2 10.2 -1.64 -2.47
4 1 1 20 0.2 0.339 0.8 14.8 1.37 2.47
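The commands behind this output are not shown. The following sketch should reproduce the expected frequencies and residuals from the observed counts using base R’s chisq.test. Two things are assumptions: that A is the row variable and B the column variable (as in the output above), and that Yates’ continuity correction is turned off (correct = FALSE). The correction changes the \chi^2 statistic but not the expected frequencies or residuals.

```r
# Observed frequencies: rows are values of A, columns are values of B
observed <- matrix(c(36, 39, 5, 20), nrow = 2,
                   dimnames = list(A = c("0", "1"), B = c("0", "1")))

fit <- chisq.test(observed, correct = FALSE)

fit$expected   # 30.75, 44.25, 10.25, 14.75
fit$residuals  # Pearson residuals: (o - e) / sqrt(e)
fit$stdres     # standardized residuals: about 2.47 in absolute value
fit$statistic  # about 6.08 with 1 degree of freedom
fit$p.value
```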
8.6.5 Student’s t Distributions
Feature | Symbol |
---|---|
Sample Space | x \in (-\infty,\infty) |
Degrees of Freedom | \nu \in (0,\infty) |
Mean | \left\{ \begin{array}{ll} 0 & \nu \gt 1 \\ \text{Undefined} & \nu \le 1 \\ \end{array} \right. |
Variance | \left\{ \begin{array}{ll} \frac{\nu}{\nu-2} & \nu\gt 2 \\ \infty & 1 \lt \nu \le 2\\ \text{Undefined} & \nu \le 1 \\ \end{array} \right. |
Skewness | \left\{ \begin{array}{ll} 0 & \nu \gt 3 \\ \text{Undefined} & \nu \le 3 \\ \end{array} \right. |
Kurtosis | \left\{ \begin{array}{ll} \frac{6}{\nu-4} & \nu \gt 4 \\ \infty & 2 \lt \nu \le 4\\ \text{Undefined} & \nu \le 2 \\ \end{array} \right. |
Probability Density Function | f_X(x; \nu) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}} |
Cumulative Distribution Function | F_X(x; \nu)=\frac{1}{2} + x \Gamma \left( \frac{\nu+1}{2} \right) \frac{\phantom{\,}_{2}F_1 \left(\frac{1}{2},\frac{\nu+1}{2};\frac{3}{2};-\frac{x^2}{\nu} \right)} {\sqrt{\pi\nu}\,\Gamma \left(\frac{\nu}{2}\right)} |
Notation note: \Gamma is the gamma function. _2F_1 is the hypergeometric function.
(Unfinished)
Guinness Beer gets free advertising every time the origin story of the Student t distribution is retold, and statisticians retell the story often. The fact that the original purpose of the t distribution was to brew better beer seems too good to be true.
William Sealy Gosset (1876–1937), self-trained statistician and head brewer at Guinness Brewery in Dublin, continually experimented on small batches to improve and standardize the brewing process. With some help from statistician Karl Pearson, Gosset used then-current statistical methods to analyze his experimental results. Gosset found that Pearson’s methods required small adjustments when applied to small samples. With Pearson’s help and encouragement (and later from Ronald Fisher), Gosset published a series of innovative papers about a wide range of statistical methods, including the t distribution, which describes the distribution of a sample mean when it is standardized using the sample’s own standard deviation rather than the population standard deviation.
Worried about having its trade secrets divulged, Guinness did not allow its employees to publish scientific papers related to their work at Guinness. Thus, Gosset published his papers under the pseudonym “Student.” The straightforward names of most statistical concepts need no historical treatment. Few of us who regularly use the Bernoulli, Pareto, Cauchy, and Gumbel distributions could tell you anything about the people who discovered them. But the oddly named “Student’s t distribution” cries out for explanation. Thus, in the long run, it was Gosset’s anonymity that made him famous.
8.6.5.1 The t distribution’s relationship to the normal distribution
Suppose we have two independent standard normal variates z_0 \sim \mathcal{N}(0, 1^2) and z_1 \sim \mathcal{N}(0, 1^2).
A t distribution with one degree of freedom is created like so:
T_1 = z_0\sqrt{\frac{1}{z_1^2}}
A t distribution with two degrees of freedom is created like so:
T_2 = z_0\sqrt{\frac{2}{z_1^2 + z_2^2}}
Where z_0, z_1 and z_2 are independent standard normal variates.
A t distribution with \nu degrees of freedom is created like so:
T_\nu = z_0\sqrt{\frac{\nu}{\sum_{i=1}^\nu z_i^2}}
The sum of \nu squared standard normal variates \left(\sum_{i=1}^\nu z_i^2\right) has a \chi^2 distribution with \nu degrees of freedom, which has a mean of \nu and a variance of 2\nu. Therefore, \frac{1}{\nu}\sum_{i=1}^\nu z_i^2 has a mean of 1, and its variance, \frac{2}{\nu}, approaches 0 as \nu increases. As a result, \sqrt{\frac{\nu}{\sum_{i=1}^\nu z_i^2}} is increasingly close to 1 when \nu is large. When \nu is high, z_0 is being multiplied by a value very close to 1. Thus, T_\nu is nearly normal at high levels of \nu.
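A simulation sketch of this construction, with \nu = 10 and an arbitrary seed and sample size: a standard normal variate multiplied by \sqrt{\nu / \chi^2_\nu} gives values whose 97.5th percentile matches R’s qt and whose variance is close to \frac{\nu}{\nu-2} = 1.25. Here rchisq stands in for the sum of \nu squared standard normal variates, which, as stated above, has the same distribution.

```r
set.seed(314)  # arbitrary seed
nu <- 10
n_draws <- 100000

z0 <- rnorm(n_draws)                 # standard normal variates
chi_sq <- rchisq(n_draws, df = nu)   # same distribution as a sum of nu squared z's
t_values <- z0 * sqrt(nu / chi_sq)

# Simulated and theoretical 97.5th percentiles (theoretical is about 2.228)
c(simulated = unname(quantile(t_values, 0.975)), theoretical = qt(0.975, df = nu))

# The variance should be close to nu / (nu - 2) = 1.25
var(t_values)
```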
8.7 Additional Distributions
8.7.1 F Distributions
Suppose that X is the ratio of two independent \chi^2 variates U_1 and U_2 scaled by their degrees of freedom \nu_1 and \nu_2, respectively:
X=\frac{\frac{U_1}{\nu_1}}{\frac{U_2}{\nu_2}}
The random variate X will have an F distribution with parameters \nu_1 and \nu_2.
The primary application of the F distribution is testing ratios of variance estimates, most notably in ANOVA. I am unaware of any direct applications of the F distribution in psychological assessment.
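Although the F distribution may rarely appear directly in assessment work, its definition is easy to check by simulation. In this sketch, \nu_1 = 3, \nu_2 = 20, the seed, and the number of draws are arbitrary choices. The simulated ratios should agree with R’s F cumulative distribution function, pf, and with the F distribution’s mean, \frac{\nu_2}{\nu_2-2} when \nu_2 > 2.

```r
set.seed(2718)  # arbitrary seed
nu1 <- 3
nu2 <- 20
n_draws <- 100000

# Ratio of two independent chi-square variates, each divided by its degrees of freedom
u1 <- rchisq(n_draws, df = nu1)
u2 <- rchisq(n_draws, df = nu2)
f_values <- (u1 / nu1) / (u2 / nu2)

# Simulated versus theoretical proportion of values at or below 2
c(simulated = mean(f_values <= 2), theoretical = pf(2, df1 = nu1, df2 = nu2))

# Simulated versus theoretical mean
c(simulated = mean(f_values), theoretical = nu2 / (nu2 - 2))
```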
8.7.2 Weibull Distributions
How long do we have to wait before an event occurs? With Weibull distributions, we model wait times in which the probability of the event changes depending on how long we have waited. Some machines are designed to last a long time, but defects in a part might cause a machine to fail quickly. If the machine is going to fail, it is likely to fail early. If the machine works flawlessly in the early period, we worry about it less. Of course, all physical objects wear out eventually, but a good design and regular maintenance might allow a machine to operate for decades. The longer a machine has been working well, the less risk that it will irreparably fail on any particular day.
For some things, the risk of failure on any particular day increases the longer they have been in use. For example, biological aging causes the risk of death to accelerate over time, such that beyond a certain age no individual has been observed to survive. For other events, there is a constant probability that the event will occur. For still others, the probability is higher at first but becomes steadily lower over time.
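The Weibull distribution is not yet defined formally in this section, so the following sketch assumes R’s parameterization (dweibull and pweibull with a shape parameter and a scale parameter). It computes the hazard: the probability density at time t divided by the probability of having survived to t. With a scale of 1, the shape parameter alone determines the patterns described above: a shape below 1 gives a hazard that decreases over time (early failures), a shape of 1 gives a constant hazard, and a shape above 1 gives a hazard that increases (wear-out or aging). The helper weibull_hazard, the time points, and the shape values are illustrative choices.

```r
# Hazard: density at time t divided by the probability of surviving past t
weibull_hazard <- function(t, shape, scale = 1) {
  dweibull(t, shape = shape, scale = scale) /
    pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
}

t <- c(0.5, 1, 2, 4)

hazards <- rbind(
  decreasing = weibull_hazard(t, shape = 0.5),  # early failures
  constant   = weibull_hazard(t, shape = 1),    # same risk in every period
  increasing = weibull_hazard(t, shape = 3)     # wear-out or aging
)
colnames(hazards) <- paste0("t=", t)
round(hazards, 3)
```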
8.7.3 Unfinished
- Gumbel Distributions
- Beta Distributions
- Exponential Distributions
- Pareto Distributions