8  Distributions

8.1 Random Variables

Because we first learn about variables in an algebra class, we tend to think of variables as having values that can be solved for—if we have enough information about them. If I say that x is a variable and that x+6=8, we can use algebra to find that x must equal 2.

Random variables are not like algebraic variables. Random variables simply take on values because of some random process. If we say that the outcome of a throw of a six-sided die is a random variable, there is nothing to “solve for.” There is no equation that determines the value of the die. Instead, it is determined by chance and the physical constraints of the die. That is, the outcome must be one of the numbers printed on the die, and the six numbers are equally likely to occur. This illustrates an important point. The word random here does not mean “anything can happen.” On a six-sided die, you will never roll a 7, 3.5, \sqrt{\pi}, −36,000, or any other number that does not appear on the six sides of the die. Random variables have outcomes that are subject to random processes, but those random processes do have constraints on them such that some outcomes are more likely than others—and some outcomes never occur at all.

Random variables have values that are determined by a random process.
Figure 8.1: Rolling a six-sided die is a process that creates a randomly ordered series of integers from 1 to 6.

When we say that the throw of a six-sided die is a random variable, we are not talking about any particular throw of a particular die but, in a sense, every throw (that has ever happened or ever could happen) of every die (that has ever existed or could exist). Imagine an immense, roaring, never-ending, cascading flow of dice falling from the sky. As each die lands and disappears, a giant scoreboard nearby records the relative frequencies of ones, twos, threes, fours, fives, and sixes. That’s a random variable.

# Packages used in this chapter's code
library(tidyverse) # tibble, dplyr, purrr, ggplot2
library(ggforce)   # geom_arc
library(gganimate) # transition_manual, animate, magick_renderer

# Function to make dice
makedice <- function(i, id) {
  x = switch(
           i,
           `1` = 0,
           `2` = c(-1, 1),
           `3` = c(-1, 1, 0),
           `4` = c(-1, 1, -1, 1),
           `5` = c(-1, 1, -1, 1, 0),
           `6` = c(-1, 1, -1, 1, -1, 1)
         )
  y = switch(
           i,
           `1` = 0,
           `2` = c(1,-1),
           `3` = c(1,-1, 0),
           `4` = c(1,-1,-1, 1),
           `5` = c(1,-1,-1, 1, 0),
           `6` = c(1,-1,-1, 1, 0, 0))
  
  tibble(id = id * 1,
         i = i,
         x = x,
         y = y) %>% 
    add_case(id = id + 0.5,
             i = 0,
             x = NA,
             y = NA)
}

# Radius of the die's rounded corners
r <- 0.35
# Distance from the die's center to each corner arc center
dpos <- 1.5

# Die round arcs
dround <- tibble(x0 = c(-dpos, dpos, -dpos, dpos), 
                 y0 = c(dpos,dpos,-dpos,-dpos), 
                 r = r, 
                 start = c(-pi / 2, pi / 2, -pi, pi / 2) , 
                 end = c(0,0, -pi / 2,pi) )

# Line segments
dsegments <- tibble(x = c(-dpos, dpos + r, dpos, -dpos - r), 
                    y = c(dpos + r,dpos,-dpos - r,-dpos),
                    xend = c(dpos, dpos + r, -dpos,-dpos - r), 
                    yend = c(dpos + r,-dpos,-dpos - r, dpos))
# Number of throws
k <- 50

# Dot positions
d <- map2_df(sample(1:6, k, replace = TRUE), 1:k, makedice)  

# Plot
p <- ggplot(d) + 
  geom_point(pch = 16, size = 50, aes(x,y))  + 
  geom_arc(aes(x0 = x0,
               y0 = y0,
               r = r,
               start = start,
               end = end,
               linetype = factor(r)),
           data = dround,
           linewidth = 2) + 
  geom_segment(data = dsegments,
               aes(x = x, y = y, xend = xend, yend = yend),
               linewidth = 2) +
  coord_equal() + 
  theme_void() + 
  theme(legend.position = "none") + 
  transition_manual(id) 

# Render animation
animate(p, 
        fps = 1,
        device = "svg",
        renderer = magick_renderer(), 
        width = 8,
        height = 8)

8.2 Sets

A set refers to a collection of objects. Each distinct object in a set is an element.

A set is a collection of distinct objects. An element is a distinct member of a set.

8.2.1 Discrete Sets

To show that a list of discrete elements is a discrete set, we can use curly braces. For example, the set of positive single-digit even numbers is \{2, 4, 6, 8\}. For large sets with repeating patterns, it is convenient to use an ellipsis (“…”), the punctuation mark signifying an omission or pause. For example, rather than listing every two-digit positive even number, we can show the pattern like so:

A discrete set has numbers that are isolated, meaning that around each number there is a range in which it is the only member of the set.

\{10, 12, 14,\ldots, 98\}
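In R, for example, the seq function can generate this kind of evenly spaced set; a quick sketch:

# The set of two-digit positive even numbers: {10, 12, 14, ..., 98}
seq(from = 10, to = 98, by = 2)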

If we want the pattern to repeat forever, we can place an ellipsis on the left, the right, or both sides. The set of odd integers extends to infinity in both directions:

\{\ldots, -5, -3, -1, 1, 3, 5, \ldots\}

8.2.2 Interval Sets

With continuous variables, we can define sets in terms of intervals. Whereas the discrete set \{0,1\} refers just to the numbers 0 and 1, the interval set (0,1) refers to all the numbers between 0 and 1.

An interval is a continuous range of numbers.
Figure 8.2: Interval Notation

As shown in Figure 8.2, some intervals include their endpoints and others do not. Intervals noted with square brackets include their endpoints and intervals written with parentheses exclude them. Some intervals extend to positive or negative infinity: (-\infty,5] and (-8,+\infty). Use a parenthesis with infinity instead of a square bracket because infinity is not a specific number that can be included in an interval.
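R has no dedicated interval type, but we can test whether a value lies in an interval with comparison operators. A minimal sketch:

# Is x in the half-open interval [1, 5)?
x <- 5
x >= 1 & x < 5  # FALSE: the parenthesis excludes 5
# Is x in the closed interval [1, 5]?
x >= 1 & x <= 5 # TRUE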

# Interval notation
tibble(lb = 1L,
       ub = 5L,
       y = 1:4,
       meaning = c("includes 1 and 5",
                   "excludes 1 and 5",
                   "includes 1 but not 5",
                   "includes 5 but not 1"),
       l_bracket = c("[", "(", "[", "("),
       u_bracket = c("]", ")", ")", "]")) %>% 
  mutate(Interval = paste0(l_bracket, lb, ",", ub, u_bracket) %>% 
           fct_inorder() %>% 
           fct_rev,
         l_fill = ifelse(l_bracket == "[", myfills[1], "white"),
         u_fill = ifelse(u_bracket == "]", myfills[1], "white")) %>% 
  ggplot(aes(lb, Interval)) + 
  geom_segment(aes(xend = ub, yend = Interval), 
               linewidth = 2, 
               color = myfills[1]) +
  geom_point(aes(fill = l_fill), 
             size = 5, 
             pch = 21, 
             stroke = 2, color = myfills[1]) + 
  geom_point(aes(fill = u_fill, x = ub), 
             size = 5, 
             pch = 21, 
             stroke = 2, 
             color = myfills[1]) +
  geom_label(aes(label = paste(Interval, meaning), x = 3), 
             vjust = -.75, 
             label.padding = unit(0,"lines"), 
             label.size = 0, 
             family = bfont, 
             size = ggtext_size(27)) +
  scale_fill_identity() +
  scale_y_discrete(NULL, expand = expansion(c(0.08, 0.25))) +
  scale_x_continuous(NULL, minor_breaks = NULL) +
  theme_minimal(base_size = 27, 
                base_family = bfont) + 
  theme(axis.text.y = element_blank(), 
        panel.grid.major = element_blank(),
        axis.line.x = element_line(linewidth = .5),
        axis.ticks.x = element_line(linewidth = .25), 
        plot.margin = margin())

8.3 Sample Spaces

The set of all possible outcomes of a random variable is the sample space. Continuing with our example, the sample space of a single throw of a six-sided die is the set \{1,2,3,4,5,6\}. Sample space is a curious term. Why sample and why space? With random variables, populations are infinitely large, at least theoretically. Random variables just keep spitting out numbers forever! So any time we actually observe numbers generated by a random variable, we are always observing a sample; actual infinities cannot be observed in their entirety. A space is a set that has mathematical structure. Most random variables generate either integers or real numbers, both of which are structured in many ways (e.g., order).

A sample space is the set of all possible values that a random variable can assume. A population consists of all entities under consideration. A sample is a subset of a population.

Unlike distributions having to do with dice, many distributions have a sample space with an infinite number of elements. Interestingly, there are two kinds of infinity we can consider. A distribution’s sample space might be the set of whole numbers: \{0,1,2,\ldots\}, which extends to positive infinity. The sample space of all integers extends to infinity in both directions: \{\ldots,-2,-1,0,1,2,\ldots\}.

The sample space of continuous variables is infinitely large for another reason. Between any two points in a continuous distribution, there is an infinite number of other points. For example, in the beta distribution, the sample space consists of all real numbers between 0 and 1: (0,1). Many continuous distributions have sample spaces that involve both kinds of infinity. For example, the sample space of the normal distribution consists of all real numbers from negative infinity to positive infinity: (-\infty, +\infty).

8.4 Probability Distributions

Each element of a random variable’s sample space occurs with a particular probability. When we list the probabilities of each possible outcome, we have specified the variable’s probability distribution. In other words, if we know the probability distribution of a variable, we know how probable each outcome is. In the case of a throw of a single die, each outcome is equally likely (Figure 8.3).

In a probability distribution, each possible element in a variable’s sample space is assigned a probability.
Figure 8.3: The probability distribution of a throw of a single die
% Dice PMF

\documentclass[tikz = true, border = 2pt]{standalone}
\usepackage{tikz}
\usepackage{xfrac}
\usetikzlibrary{shapes,calc}
\usepackage{fontspec}
\setmainfont{Equity Text A}
\tikzset{
    dot hidden/.style={},
    line hidden/.style={},
    dot colour/.style={dot hidden/.append style={color=#1}},
    dot colour/.default=black,
    line colour/.style={line hidden/.append style={color=#1}},
    line colour/.default=black
}

\usepackage{xparse}
\NewDocumentCommand{\drawdie}{O{}m}{
    \begin{tikzpicture}[x=1em,y=1em,radius=0.1,#1]
    \draw[rounded corners=2,line hidden] (0,0) rectangle (1,1);
    \ifodd#2
    \fill[dot hidden] (0.5,0.5) circle;
    \fi
    \ifnum#2>1
    \fill[dot hidden] (0.25,0.25) circle;
    \fill[dot hidden] (0.75,0.75) circle;
    \ifnum#2>3
    \fill[dot hidden] (0.25,0.75) circle;
    \fill[dot hidden] (0.75,0.25) circle;
    \ifnum#2>5
    \fill[dot hidden] (0.75,0.5) circle;
    \fill[dot hidden] (0.25,0.5) circle;
    \ifnum#2>7
    \fill[dot hidden] (0.5,0.75) circle;
    \fill[dot hidden] (0.5,0.25) circle;
    \fi
    \fi
    \fi
    \fi
    \end{tikzpicture}
}

\begin{document}
    \begin{tikzpicture}
    \foreach \n in {1,...,6} {
        \node at ($(0,7)-(0,\n)$) {\drawdie [scale = 2]{\n}};
        \node [fill=gray!50,
               minimum height = 2cm,
               minimum width = 0.1cm,
               single arrow,
               single arrow head extend =.15cm,
               single arrow head indent =.08cm,
               inner sep=1mm] at ($(1.55,7)-(0,\n)$) {};
        \node  (p1) at (3,\n) {\large{$\sfrac{\text{1}}{\text{6}}$}};
    }
    \node [text centered,
           anchor=south,
           text height = 1.5ex,
           text depth = .25ex] (p3) at (0,6.6) {\large{{Sample Space}}};
    \node [text centered,
           anchor = south,
           text height = 1.5ex,
           text depth = .25ex] (p4) at (3,6.6) {\large{{Probability}}};
    \end{tikzpicture}
\end{document}

There is an infinite variety of probability distributions, but a small subset of them have been given names. Now, one can manage one’s affairs quite well without ever knowing what a Bernoulli distribution is, or what a \chi^2 distribution is, or even what a normal distribution is. However, sometimes life is a little easier if we have names for useful things that occur often. Most of the distributions with names are not really single distributions, but families of distributions. The various members of a family are unique, but they are united by the fact that their probability distributions are generated by a particular mathematical function (more on that later). In such cases, the probability distribution is often represented by a graph in which the sample space is on the X-axis and the associated probabilities are on the Y-axis. Figure 8.4 illustrates 16 probability distributions that might be interesting and useful to clinicians. Keep in mind that what are pictured are only particular members of the families listed; some family members look quite different from what is shown in Figure 8.4.

Figure 8.4: A gallery of useful distributions
# A gallery of useful distributions
# Run output file pdfIllustration.tex in LaTeX
# pdflatex --enable-write18 --extra-mem-bot=10000000 --synctex=1 pdfIllustration.tex
tikzpackages <- paste(
  "\\usepackage{tikz}",
  "\\usepackage{amsmath}",
  "\\usepackage[active,tightpage,psfixbb]{preview}",
  "\\PreviewEnvironment{pgfpicture}",
  collapse = "\n"
)

tikzDevice::tikz('pdfIllustration.tex',
                 standAlone = TRUE, 
                 packages = tikzpackages, 
                 width = 11, 
                 height = 11)
par(
  mar = c(1.75, 1.3, 1.75, 0),
  mfrow = c(4, 4),
  las = 1,
  xpd = TRUE,
  family = 'serif',
  pty = "s",
  mgp = c(2, 0.5, 0),
  tcl = -0.3,
  cex = 1.35
)

# Bernoulli
plot(
  c(0.2, 0.8) ~ seq(0, 1),
  type = "b",
  ylim = c(0, 1),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Bernoulli",
  lty = 3,
  pch = 19,
  xaxp = c(0, 1, 1),
  xlim = c(-0.1, 1)
)
text(x = .7, y = .8, "$p=0.8$")

# Binomial
plot(
  dbinom(seq(0, 5), 5, 0.2) ~ seq(0, 5),
  type = "b",
  xlim = c(-0.1, 5),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Binomial",
  lty = 3,
  pch = 19
)
text(
  x = c(3.5, 3.5),
  y = c(.35, .25),
  c("$p=0.2$", "$n=5$"),
  adj = 0
)

# Poisson
plot(
  dpois(0:10, 3) ~ seq(0, 10),
  type = "b",
  xlim = c(-0.1, 10),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Poisson",
  lty = 3,
  pch = 19
)
text(x = 7, y = .15, r"($\lambda=3$)")

# Geometric
plot(
  dgeom(0:4, prob = 0.8) ~ seq(0, 4),
  type = "b",
  ylim = c(0, .8),
  xlim = c(-0.1, 4),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Geometric",
  lty = 3,
  pch = 19
)
text(x = 2, y = .6, "$p=0.8$")

# Discrete Uniform
plot(
  rep(1 / 4, 4) ~ seq(1, 4),
  type = "b",
  ylim = c(0, 1),
  xlim = c(0, 5),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Discrete Uniform",
  lty = 3,
  pch = 19
)
text(
  x = c(1, 4),
  y = c(.5, .5),
  c("$a=1$", "$b=4$"),
  adj = 0.5
)

# Continuous Uniform
plot(
  c(0, 1 / 3, 1 / 3, 0) ~ c(1, 1, 4, 4),
  type = "n",
  ylim = c(0, 1),
  xlim = c(0, 5),
  bty = "n",
  col = myfills[1],
  lwd = 2,
  xlab = "",
  ylab = "",
  main = "Continuous Uniform"
)
polygon(
  c(1, 1, 4, 4),
  c(0, 1 / 3, 1 / 3, 0),
  col = myfills[1],
  xpd = FALSE,
  border = NA
)
text(
  x = c(1, 4),
  y = c(.5, .5),
  c("$a=1$", "$b=4$"),
  adj = 0.5
)

# Normal
x <- seq(-4, 4, 0.02)
plot(
  dnorm(x) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Normal",
  lwd = 2,
  bty = "n",
  axes = F
)
polygon(
  c(min(x), x, max(x)),
  c(0, dnorm(x), 0),
  col = myfills[1],
  lwd = 1,
  xpd = FALSE,
  border = NA
)
axis(2)

text(
  x = c(-1.5, -1.5),
  y = c(.35, .25),
  c("$\\mu=0$", "$\\sigma^2=1$"),
  adj = 1
)


center_neg <- function(x) {
  signs <- sign(x)
  paste0(ifelse(signs < 0,"$",""), x, ifelse(signs < 0,"\\phantom{-}$",""))
}

all_tick_labels <- function(side = 1, at, labels = at) {
  axis(side, labels = rep("",length(at)), at = at)
for (i in 1:length(at)) {
  axis(side, 
       at = at[i], 
       labels = labels[i],
       tick = F)
  }
}
axis_ticks <- seq(-4,4,2)
axis_labs <- center_neg(axis_ticks)
all_tick_labels(1, at = axis_ticks, labels = axis_labs)


# Student t
x <- seq(-6, 6, 0.02)
plot(
  dt(x, 2) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Student's $\\boldsymbol{t}$",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.4),
  axes = F
)
polygon(
  c(min(x), x, max(x)),
  c(0, dt(x, 2), 0),
  col = myfills[1],
  lwd = 1,
  xpd = FALSE,
  border = NA
)
text(x = 3, y = .3, "$\\nu=2$")
axis(2)
axis_ticks <- seq(-6,6,2)
axis_labs <- center_neg(axis_ticks)
all_tick_labels(1, at = axis_ticks, labels = axis_labs)

# Chi-Square
x <- seq(0, 40, 0.05)
plot(
  dchisq(x, 13) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "$\\boldsymbol{\\chi^2}$",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.1)
)
polygon(
  c(min(x), x, max(x)),
  c(0, dchisq(x, 13), 0),
  col = myfills[1],
  lwd = 1,
  xpd = FALSE,
  border = NA
)
text(x = 20, y = .08, "$k=13$", adj = 0)

# F
x <- seq(0, 6, 0.01)
plot(
  df(x, 3, 120) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "$\\boldsymbol{F}$",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.8)
)
polygon(
  c(min(x), x, max(x)),
  c(0, df(x, 3, 120), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(2, 2),
  y = c(.6, .4),
  c("$d_1=3$", "$d_2=120$"),
  adj = 0
)

# Weibull
x <- seq(6.5, 11.5, 0.01)
plot(
  dweibull(x, 20, 10) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Weibull",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.8)
)
polygon(
  c(min(x), x, max(x)),
  c(0, dweibull(x, 20, 10), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(7, 7),
  y = c(.6, .4),
  c("$k=20$", "$\\lambda=10$"),
  adj = 0
)

# Beta
x <- seq(0, 1, 0.01)
plot(
  dbeta(x, 2, 2.5) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Beta",
  lwd = 2,
  bty = "n",
  ylim = c(0, 2),
  xaxp = c(0, 1, 1)
)
polygon(
  c(min(x), x, max(x)),
  c(0, dbeta(x, 2, 2.5), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(0.75, 0.75),
  y = c(1.75, 1.25),
  c("$\\alpha=2$", "$\\beta=2.5$"),
  adj = 0
)

# Log Normal
x <- c(seq(0, .999, 0.001), seq(1, 15, .05))
plot(
  dlnorm(x, 1, 0.5) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Log Normal",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.4)
)
polygon(
  c(min(x), x, max(x)),
  c(0, dlnorm(x, 1, 0.5), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(7, 7),
  y = c(.3, .2),
  c("$\\mu=2$", "$\\sigma=0.5$"),
  adj = 0
)


# Skew Normal
x <- seq(-4, 4, 0.01)
plot(
  sn::dsn(x, xi = 1, omega = 1.5, alpha = -4) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Skew Normal",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.5),
  axes = F
)
polygon(
  c(min(x), x, max(x)),
  c(0, sn::dsn(
    x,
    xi = 1,
    omega = 1.5,
    alpha = -4
  ), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(-3.9, -3.9, -3.9),
  y = c(.45, .35, .25),
  c("$\\xi=1$", "$\\omega=1.5$", "$\\alpha=-4$"),
  adj = 0
)

axis(2)
axis_ticks <- seq(-4,4,2)
axis_labs <- center_neg(axis_ticks)
all_tick_labels(1, at = axis_ticks, labels =axis_labs)

# Normal Mixture
x <- seq(-4, 4, 0.01)
y <- (dnorm(x, -2, 0.5) * .25 + dnorm(x))
plot(
  y ~ x,
  type = "n",
  col = "violet",
  xlab = "",
  ylab = "",
  main = "Normal Mixture",
  lwd = 4,
  bty = "n",
  ylim = c(0, 0.5),
  axes = F
)
polygon(
  c(min(x), x, max(x)),
  c(0, y, 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
polygon(
  c(min(x), x, max(x)),
  c(0, dnorm(x, -2, 0.5) * .25, 0),
  col = myfills[2],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(-0.85, -0.85),
  y = c(.45, .35),
  c("$\\mu_1=-2$", "$\\sigma_1^2=0.5$"),
  adj = 1
)
text(
  x = c(3.5, 3.5),
  y = c(.45, .35),
  c("$\\mu_2=0$", "$\\sigma_2^2=1$"),
  adj = 1
)
axis(2)
axis_ticks <- seq(-4,4,2)
axis_labs <- center_neg(axis_ticks)
all_tick_labels(1, at = axis_ticks, labels =axis_labs)
# Bivariate Normal

x = seq(-4, 4, 0.1)
X = fMultivar::grid2d(x)
z = fMultivar::dnorm2d(X$x, X$y, rho = 0.6)
Z = list(x = x,
         y = x,
         z = matrix(z, ncol = length(x)))
persp(
  Z,
  theta = 20,
  phi = 25,
  col = "royalblue1",
  xlab = "\nX",
  ylab = "\nY",
  zlab = "",
  zlim = c(0, .20),
  border = NA,
  expand = .7,
  box = FALSE,
  ticktype = "simple",
  ltheta = 0,
  shade = 0.5,
  main = "Bivariate Normal",
  lwd = 0.5
)
text(c(-.11, .32), c(-.42, -.25), c("X", "Y"))
text(0, 0.25, "$\\rho_{XY}=0.6$")
dev.off()

8.5 Discrete Distributions

The sample spaces in discrete distributions are discrete sets. Thus, on the x-axis of a discrete probability distribution, you will see isolated numbers with gaps between them (e.g., integers).

8.5.1 Discrete Uniform Distributions

The throw of a single die is a member of a family of distributions called the discrete uniform distribution. It is “discrete” because the elements in the sample space are countable, with evenly spaced gaps between them. For example, there might be a sequence of 8, 9, 10, and 11 in the sample space, but there are no numbers in between. It is “uniform” because all outcomes are equally likely. With dice, the numbers range from a lower bound of 1 to an upper bound of 6. In the family of discrete uniform distributions, the lower and upper bounds are typically integers, most likely starting with 1. However, any real number a can be the lower bound and the spacing k between numbers can be any positive real number. For the sake of simplicity and convenience, I will assume that the discrete uniform distribution refers to consecutive integers ranging from a lower bound of a to an upper bound of b.

A discrete uniform distribution is a family of random variable distributions in which the sample space is an evenly spaced sequence of numbers, each of which is equally likely to occur.

This kind of discrete uniform distribution has a number of characteristics listed in Table 8.1. I will explain each of them in the sections that follow. As we go, I will also explain the mathematical notation. For example, a \in \{\ldots,-1,0,1,\ldots\} means that a is an integer because \in means is a member of and \{\ldots,-1,0,1,\ldots\} is the set of all integers.1 x \in \{a,a+1,\ldots,b\} means that each member of the sample space x is a member of the set of integers that includes a, b, and all the integers between a and b. The notation for the probability mass function and the cumulative distribution function will be explained later in this chapter.

1 Notation note: Sometimes the set of all integers is referred to with the symbol \mathbb{Z}.

Feature Symbol
Lower Bound a \in \{\ldots,-1,0,1,\ldots\}
Upper Bound b \in \{a + 1, a + 2, \ldots\}
Sample Space x \in\{a, a + 1,\ldots,b\}
Number of points n=b-a+1
Mean \mu=\frac{a+b}{2}
Variance \sigma^2=\frac{n^2-1}{12}
Skewness \gamma_1=0
Kurtosis \gamma_2=-\frac{6(n^2+1)}{5(n^2-1)}
Probability Mass Function f_X(x;a,b)=\frac{1}{n}
Cumulative Distribution Function F_X(x;a,b)=\frac{x-a+1}{n}
Table 8.1: Features of Discrete Uniform Distributions
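To make these formulas concrete, here is a small R sketch that computes the features in Table 8.1 for a six-sided die (a = 1, b = 6) and checks the mean and variance against a large simulated sample:

# Features of the discrete uniform distribution for a six-sided die
a <- 1
b <- 6
n <- b - a + 1           # number of points: 6
mu <- (a + b) / 2        # mean: 3.5
sigma2 <- (n^2 - 1) / 12 # variance: 35/12 = 2.92
# Check against a large random sample
X <- sample(seq(a, b), size = 1000000, replace = TRUE)
mean(X) # approximately 3.5
var(X)  # approximately 2.92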

8.5.2 Parameters of Random Variables

The lower bound a and the upper bound b are the discrete uniform distribution’s parameters. The word parameter has many meanings, but here it refers to a characteristic of a distribution family that helps us identify precisely which member of the family we are talking about. Most distribution families have one, two, or three parameters.

A parameter is a defining feature of a random variable’s probability distribution.

If you have taken an algebra class, you have seen parameters before, though the word parameter may not have been used. Think about the formula of a line:

y=mx+b

Both x and y are variables, but what are m and b? Well, you probably remember that m is the slope of the line and that b is the y-intercept. If we know the slope and the intercept of a line, we know exactly which line we are talking about. No additional information is needed to graph the line. Therefore, m and b are the line’s parameters, because they uniquely identify the line.2 All lines have a lot in common but there is an infinite variety of lines because the parameters, the slope and the intercept, can take on the value of any real number. Each unique combination of parameter values (slope and intercept) will produce a unique line. So it is with probability distribution families. All family members are alike in many ways but they also differ because of different parameter values.

2 What about other mathematical functions? Do they have parameters? Yes! Most do! For example, in the equation for a parabola (y=ax^2+bx+c), a, b, and c determine its precise shape.

3 If we allow the lower bound to be any real number and the spacing to be any positive real number, the discrete uniform distribution can be specified by three parameters: the lower bound a, the spacing between numbers k (k>0), and the number of points n (n>1). The upper bound b of such a distribution would be b=a+k(n-1).

The discrete uniform distribution (i.e., the typical variety consisting of consecutive integers) is defined by the lower and upper bound. Once we know the lower bound and the upper bound, we know exactly which distribution we are talking about.3 Not all distributions are defined by their lower and upper bounds. Indeed, many distribution families are unbounded on one or both sides. Therefore, other features are used to characterize the distributions, such as the population mean (\mu).

8.5.3 Probability Mass Functions

Many distribution families are united by the fact that their probability distributions are generated by a particular mathematical function. For discrete distributions, those functions are called probability mass functions. In general, a mathematical function is an expression that takes one or more constants (i.e., parameters) and one or more input variables, which are then transformed according to some sort of rule to yield a single number.

A probability mass function is a mathematical expression that gives the probability that a discrete random variable will equal a particular element of the variable’s sample space.

A probability mass function transforms a random variable’s sample space elements into probabilities. In Figure 8.3, the probability mass function can be thought of as the arrows between the sample space and the probabilities. That is, the probability mass function is the thing that was done to the sample space elements to calculate the probabilities. In Figure 8.3, each outcome of a throw of the die was mapped onto a probability of ⅙. Why ⅙, and not some other number? The probability mass function of the discrete uniform distribution tells us the answer.

Figure 8.5: Probability mass functions tell us how probable each sample space element is. That is, they are functions that convert sample spaces (\boldsymbol{x}) into probabilities (\boldsymbol{p}) according to specific parameters (\boldsymbol{\theta}).
% Probability mass functions tell us how probable each sample space element is

\documentclass[tikz = true, border = 2pt]{standalone}

\usepackage{amsmath}
\usepackage{tikz}
\usetikzlibrary{decorations.text, arrows}
\usetikzlibrary{shapes}
\usepackage{fontspec}
\setmainfont{Equity Text A}

\begin{document}
\begin{tikzpicture}[>=stealth,scale=0.9]
\definecolor{royalblue2}{RGB}{39,64,139};
\node [rectangle,
       draw,
       rounded corners= 2pt,
       text depth=0.25ex,
       minimum height = 7mm](ss) at (0,0) {$\boldsymbol{x}=x_1,x_2,x_3,\ldots,x_n$};
\node [rectangle,
       draw,
       rounded corners= 2pt,
       text depth=0.25ex,
       minimum height = 7mm](ps) at (7,0) {$\boldsymbol{p}=p_1,p_2,p_3,\ldots,p_n$};
\node [single arrow,
       fill=royalblue2,
       single arrow head extend=1.1ex,
       transform shape,
       minimum height=0.9cm,
       text depth=0.25ex, text=white] (fx) at (3.5,0) {$\quad\;\; f_X\quad\;\;$};
\node [text depth=2.25ex,
       text height= 5ex,
       anchor=south,
       yshift=-3.5ex](sst) at (0,0.75) {\textbf{Sample Space}};
\node [text depth=2.25ex,
       text height= 5ex,
       anchor=south,
       yshift=-3.5ex](pst) at (7,0.75) {\textbf{Probabilities}};
\node [shape=rectangle,
       text depth=2.25ex,
       color=royalblue2,
       align=center,
       text height= 5ex,
       anchor=south,
       yshift=-3.5ex](pmf) at (3.5,0.75) {\textbf{Probability}\\
    \textbf{Mass Function}};
\node [text depth=2.25ex,
       text height= 5ex,
       anchor=south,
       yshift=-3.5ex] (pt) at (3.5,-1.5) {\textbf{Parameters}};
\node [rectangle,
       draw,
       rounded corners = 2pt,
       text depth=0.25ex,
       minimum height = 7mm] (pts) at (3.5,-2.25) {$\boldsymbol{\theta}=\theta_1,\theta_2,\theta_3,\ldots,\theta_k$};
%\node [align=center,
%       shape=rectangle,
%       rounded corners = 3pt,
%       draw](formula) at (3.5,4) {\textbf{\Large{Scary~Math!}}\\
%   $f_X\!\left(\boldsymbol{x};\boldsymbol{\theta}\right)=\boldsymbol{p}$};
%\draw [rounded corners=5pt] (-2.5,-3) rectangle (9.5,2.65);
\node at (3.5,2) {$f_X\!\left(\boldsymbol{x};\boldsymbol{\theta}\right)=\boldsymbol{p}$};
\draw[->,>=latex', very thick] (3.5,-1.15) to (3.5,-0.5);
\end{tikzpicture}
\end{document}

The probability mass function of the discrete uniform distribution is fairly simple but the notation can be intimidating at first (Figure 8.5). By convention, a single random variable is denoted by a capital letter X. Any particular value of X in its sample space is represented by a lowercase x. In other words, X represents the variable in its totality whereas x is merely one value that X can take on. Confusing? Yes, statisticians work very hard to confuse us—and most of the time they succeed!

The probability mass function of random variable X is denoted by f_X(x). This looks strange at first. It means, “When random variable X generates a number, what is the probability that the outcome will be a particular value x?” That is, f_X(x)=P(X=x), where P means “What is the probability that…?” Thus, P(X=x) reads, “What is the probability that random variable X will generate a number equal to a particular value x?” So, f_X(7) reads, “When random variable X generates a number, what is the probability that the outcome will be 7?”

Most probability mass functions also have parameters, which are listed after a semi-colon. In the case of the discrete uniform distribution consisting of consecutive integers, the lower and upper bounds a and b are included in the function’s notation like so: f_X(x;a,b). This reads, “For random variable X with parameters a and b, what is the probability that the outcome will be x?” Some parameters can be derived from other parameters, as was the case with the number of points n in the sample space of a discrete uniform distribution: n=b-a+1. The probability for each outcome in the sample space is the same and there are n possible outcomes. Therefore, the probability associated with each outcome is \frac{1}{n}.

Putting all of this together, if a and b are integers and a<b, for all n integers x between a and b, inclusive:

\begin{aligned} f_X\left(x;a,b\right)&=\frac{1}{b-a+1}\\[2ex] &=\frac{1}{n} \end{aligned}

Symbol Meaning
X A random variable with a discrete uniform distribution
f_X The probability mass function of X
x Any particular member of the sample space of X
a The lower bound of the sample space
b The upper bound of the sample space
n b-a+1 (The number of points in the sample space)

You might notice that x is not needed to calculate the probability. Why? Because this is a uniform distribution. No matter which sample space element x we are talking about, the probability associated with it is always the same. In distributions that are not uniform, the position of x matters and thus influences the probability of its occurrence.
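Translated into R, the probability mass function of the discrete uniform distribution is a one-liner. Base R has no built-in function for this distribution, so the name duniform below is invented for illustration:

# Probability mass function of the discrete uniform distribution
# (consecutive integers from a to b); duniform is a hypothetical name
duniform <- function(x, a, b) {
  n <- b - a + 1
  # 1/n for integers inside [a, b], 0 otherwise
  ifelse(x >= a & x <= b & x == round(x), 1 / n, 0)
}
duniform(4, a = 1, b = 6) # 1/6 = .167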

8.5.4 Cumulative Distribution Functions

The cumulative distribution function tells us where a sample space element ranks in a distribution. Whereas the probability mass function tells us the probability that a random variable will generate a particular number, the cumulative distribution function tells us the probability that a random variable will generate a particular number or less.

A cumulative distribution function is a mathematical expression that gives the probability that a random variable will equal a particular element of the variable’s sample space or less.

F_X(x) = P(X \le x)=p

Figure 8.6: The cumulative distribution function of the roll of a die is F_X(x)=\frac{x}{6}

The cumulative distribution function of the roll of a die (Figure 8.6) tells us that the probability of rolling a 4 or less is 4⁄6 (i.e., ⅔).

% CDF Dice

\documentclass[tikz = true, border = 2pt]{standalone}
\usepackage{tikz}
\usepackage{xfrac}
\usepackage{fontspec}
\setmainfont{Equity Text A}

\definecolor{firebrick}{RGB}{205,38,38}
\definecolor{royalblue}{RGB}{67,110,238}

\tikzset{
    dot hidden/.style={},
    line hidden/.style={},
    dot colour/.style={dot hidden/.append style={color=#1}},
    dot colour/.default=black,
    line colour/.style={line hidden/.append style={color=#1}},
    line colour/.default=black
}

\NewDocumentCommand{\drawdie}{O{}m}{
    \begin{tikzpicture}[x=1em,y=1em,radius=0.1,#1]
    \draw[rounded corners=2,line hidden] (0,0) rectangle (1,1);
    \ifodd#2
    \fill[dot hidden] (0.5,0.5) circle;
    \fi
    \ifnum#2>1
    \fill[dot hidden] (0.25,0.25) circle;
    \fill[dot hidden] (0.75,0.75) circle;
    \ifnum#2>3
    \fill[dot hidden] (0.25,0.75) circle;
    \fill[dot hidden] (0.75,0.25) circle;
    \ifnum#2>5
    \fill[dot hidden] (0.75,0.5) circle;
    \fill[dot hidden] (0.25,0.5) circle;
    \ifnum#2>7
    \fill[dot hidden] (0.5,0.75) circle;
    \fill[dot hidden] (0.5,0.25) circle;
    \fi
    \fi
    \fi
    \fi
    \end{tikzpicture}
}

\begin{document}

\begin{tikzpicture}
\foreach \i in {1,...,6} {
    \node at (0.2,\i){$\sfrac{\text{\i}}{\text{6}}$};
    \node at (\i,0.2){\large{\i}};
    \foreach \j in {1,...,6} {
        \ifnum \j>\i
        \node at (\i,\j) {\drawdie [scale=2]{\j}};
        \else
        \node at (\i,\j) {\drawdie [scale=2,line colour=royalblue,dot colour=royalblue]{\j}};
        \fi
    }
}

\draw [firebrick,
       very thick,
       rounded corners]
       (0.55,0.5)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,-6)--cycle;
\node[rotate=90] at (-0.3,3.5) {{Probability}};
\node at (3.5,-0.2) {{Die roll is this value or less}};
\end{tikzpicture}

\end{document}

The cumulative distribution function is often distinguished from the probability mass function with a capital F instead of a lowercase f. In the case of a discrete uniform distribution consisting of n consecutive integers from a to b, the cumulative distribution function is:

\begin{aligned} F_X(x;a,b)&=\frac{x-a+1}{b-a+1}\\[2ex] &=\frac{x-a+1}{n} \end{aligned}

Symbol Meaning
X A random variable with a discrete uniform distribution
F_X The cumulative distribution function of X
x Any particular member of the sample space of X
a The lower bound of the sample space
b The upper bound of the sample space
n b-a+1 (The number of points in the sample space)

In the case of the six-sided die, the cumulative distribution function is

\begin{aligned} F_X(x;a=1,b=6)&=\frac{x-a+1}{b-a+1}\\[2ex] &=\frac{x-1+1}{6-1+1}\\[2ex] &=\frac{x}{6} \end{aligned}
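A quick numeric check of this formula in R (the helper name puniform is invented for illustration):

# Cumulative distribution function of the discrete uniform distribution
puniform <- function(x, a, b) (x - a + 1) / (b - a + 1)
puniform(4, a = 1, b = 6) # 4/6 = .667, the probability of rolling a 4 or less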

The cumulative distribution function is so-named because it adds all the probabilities in the probability mass function up to and including a particular member of the sample space. Figure 8.7 shows how each probability in the cumulative distribution function of the roll of a six-sided die is the sum of the current and all previous probabilities in the probability mass function.

Figure 8.7: The cumulative distribution function is the sum of the current and all previous elements of the probability mass function.
# The cumulative distribution function is the sum of the current 
# and all previous elements of the probability mass function
p <- crossing(id = 1:6,
         x = 1:6) %>% 
  mutate(pmf = 1 / 6) %>% 
  mutate(cdf = ifelse(id < x, 1 / 6, x / 6)) %>% 
  ggplot(aes(x = x, y = cdf)) + 
  geom_segment(aes(yend = cdf - pmf, xend = x), 
               color = myfills[2]) + 
  geom_segment(aes(y = 0, yend = pmf, xend = x), 
               color = myfills[1]) + 
  geom_line(aes(y = pmf), lty = "dotted", color = myfills[1]) +
  geom_line(data = . %>% dplyr::filter(x <= id), 
            lty = "dotted", 
            color = myfills[2] ) +
  geom_point(aes(y = pmf),
             color = myfills[1],
             size = 5) +
  geom_point(data = . %>% dplyr::filter(x <= id), 
             color = myfills[2],
             size = 3.5) +
  scale_x_continuous("Sample Space", 
                     breaks = 1:6,
                     expand = c(0.03,0),
                     minor_breaks = NULL) +
  scale_y_continuous("Probability",
                     breaks = 0:6 / 6, 
                     labels = c(0, paste0(1:5, "/", 6),1),
                     expand = c(0.03,0),
                     minor_breaks = NULL) +
  theme_minimal(base_size = 18, base_family = bfont) +
  coord_fixed(6) + 
  annotate("point",
           size = 5, 
           x = 1.2,  
           y = 5.33 / 6, 
           color = myfills[1]) + 
  annotate("point",
           size = 3.5, 
           x = 1.2,  
           y = 5.66 / 6, 
           color = myfills[2])  + 
  annotate("label",
           size = 6, 
           x = 1.33,  
           y = 5.33 / 6, 
           color = myfills[1],
           label = "Probability Mass Function",
           hjust = 0,
           family = bfont, 
           label.padding = unit(0, "lines"), 
           label.size = 0
           ) + 
  annotate("label",
           size = 6, 
           x = 1.33,  
           y = 5.66 / 6, 
           color = myfills[2],
           label = "Cumulative Distribution Function",
           hjust = 0,
           family = bfont, 
           label.padding = unit(0, "lines"), 
           label.size = 0) +
  transition_states(id,2,1) +
  ease_aes("sine-in-out")

animate(p, 
        device = "svg",
        renderer = magick_renderer(), 
        width = 8,
        height = 8)

8.5.5 Quantile Functions

The inverse of the cumulative distribution function is the quantile function. The cumulative distribution function starts with a value x in the sample space and tells us p, the proportion of values in that distribution that are less than or equal to x. A quantile function starts with a proportion p and tells us the value x that splits the distribution such that the proportion p of the distribution is less than or equal to x.

A quantile function tells us which value in the sample space of a random variable is greater than a particular proportion of the values the random variable generates.
Figure 8.8: The quantile function is the inverse of the cumulative distribution function: Just flip the X and Y axes!

As seen in Figure 8.8, if you see a graph of a continuous cumulative distribution function, just flip the X and Y axes, and you have a graph of the corresponding quantile function.
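In R, each named distribution has a cumulative distribution function (prefix p) and a quantile function (prefix q), and the two undo each other. A quick demonstration with the standard normal distribution:

# The quantile function inverts the cumulative distribution function
p <- pnorm(1.96) # proportion of the distribution at or below 1.96 (about .975)
qnorm(p)         # recovers 1.96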

# The quantile function is the inverse of the cumulative distribution function

d <- tibble(x = seq(-4, 4, 0.01)) %>%
  mutate(p = pnorm(x))

p1 <- ggplot(d, aes(x, p)) +
  geom_arrow(
    arrow_head = arrow_head_deltoid(),
    arrow_fins = arrow_head_deltoid(),
    color = myfills[1],
    lwd = 1
  ) +
  scale_x_continuous("Sample Space (*x*)", labels = \(x) signs::signs(x, accuracy = 1)) +
  scale_y_continuous("Proportion (*p*)", labels = prob_label) +
  theme_minimal(base_size = 26, base_family = bfont) +
  coord_fixed(8) +
  annotate(
    "richtext",
    x = -2,
    y = .5 + .25 / 4,
    label = paste0("*X* ~ ",
                   span_style("N", style = "font-family:'Lucida Calligraphy'"),
                   "(0, 1<sup>2</sup>)"),
    family = bfont,
    label.size = 0,
    label.padding = unit(0, "mm"),
    size = ggtext_size(26)
  ) +
  ggtitle("Cumulative Distribution Function *F<sub>X</sub>*(*x*) = *p*") +
  theme(
    plot.title = ggtext::element_markdown(size = 26 * 0.8), 
    plot.title.position = "plot",
    axis.title.x = ggtext::element_markdown(),
    axis.title.y = ggtext::element_markdown(),
    axis.text.x = element_text(hjust = c(0.7, 0.7, 0.5, 0.5, 0.5))
  )

p2 <- ggplot(d, aes(p, x)) +
  geom_arrow(
    arrow_head = arrow_head_deltoid(),
    arrow_fins = arrow_head_deltoid(),
    color = myfills[1],
    lwd = 1
  ) +
  scale_y_continuous("Sample Space (*x*)", labels = \(x) signs::signs(x, accuracy = 1)) +
  scale_x_continuous("Proportion (*p*)", labels = prob_label) +
  theme_minimal(base_size = 26, base_family = bfont) +
  coord_fixed(1 / 8) +
  annotate(
    "richtext",
    y = 0.5,
    x = .25,
    label = paste0("*X* ~ ",
                   span_style("N", style = "font-family:'Lucida Calligraphy'"),
                   "(0, 1<sup>2</sup>)"),
    family = bfont,
    label.size = 0,
    label.padding = unit(0, "mm"),
    size = ggtext_size(26)
  ) +
  ggtitle("Quantile Function *Q*<sub>*X*</sub>(*p*) = *x*") +
  theme(plot.title = ggtext::element_markdown(size = 26 * 0.8), 
        plot.title.position = "plot",
        axis.title.x = ggtext::element_markdown(),
        axis.title.y = ggtext::element_markdown())

p1 + p2

8.5.6 Generating a Random Sample in R

In R, the sample function generates numbers from the discrete uniform distribution.

# n = the sample size
n <- 6000
# a = the lower bound
a <- 1
# b = the upper bound
b <- 6
# The sample space is the sequence of integers from a to b
sample_space <- seq(a, b)
# X = the sample with a discrete uniform distribution
# The sample function selects n values
# from the sample space with replacement at random
X <- sample(sample_space, 
            size = n, 
            replace = TRUE)
Figure 8.9: Frequency distribution of a discrete uniform random variable from 1 to 6 (n = 6,000)
# Frequency distribution of a discrete uniform random variable from 1 to 6 

tibble(X = factor(X)) %>% 
  group_by(X) %>% 
  summarise(Frequency = n()) %>% 
  ggplot(aes(X,Frequency, fill = X)) + 
  geom_col(width = 0.7, fill = myfills[1]) + 
  geom_label(aes(label = Frequency), 
             vjust = -0.3, 
             label.size = 0,
             label.padding = unit(0,"mm"), 
             family = bfont,
             size = 8,
             color = "gray30",
             fill = "white") + 
  theme_minimal(base_size = 28, 
                base_family = bfont) + 
  scale_y_continuous("Count", 
                     expand = expansion(c(0,0.075)), 
                     breaks = seq(0,1000,200)) + 
  scale_x_discrete(NULL) +
  theme(panel.grid.major.x = element_blank(), 
        legend.position = "none") 

The frequencies of the random sample can be seen in Figure 8.9. Because of sampling error, the frequencies are approximately the same, but not exactly the same. If the sample is larger, the sampling error is smaller, meaning that the sample’s characteristics will tend to more closely resemble the population characteristics. In this case, a larger sample size will produce frequency counts that will appear more even in their magnitude. However, as long as the sample is smaller than the population, sampling error will always be present. With random variables, the population is assumed to be infinitely large, and thus sampling error at best becomes negligibly small.

Samples imperfectly represent the population from which they are drawn. Sampling error refers to differences between sample statistics and population parameters.
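As a quick sketch of this idea, watch the relative frequency of sixes settle toward the population probability of ⅙ as the sample size grows:

# Relative frequency of sixes at increasing sample sizes
sapply(c(60, 6000, 600000),
       \(n) mean(sample(1:6, size = n, replace = TRUE) == 6))
# values approach 1/6 = .167 as n grows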

8.5.7 Bernoulli Distributions

Feature Symbol
Sample Space: x \in \{0,1\}
Probability that x=1 p \in [0,1]
Probability that x=0 q = 1 - p
Mean \mu = p
Variance \sigma^2 = pq
Skewness \gamma_1 = \frac{1 - 2p}{\sqrt{pq}}
Kurtosis \gamma_2 = \frac{1}{pq} - 6
Probability Mass Function f_X(x;p) = p^xq^{1 - x}
Cumulative Distribution Function F_X(x;p) = x+q(1 - x)
Table 8.2: Features of Bernoulli Distributions
Notation note: Whereas {a,b} is the set of just two numbers, a and b, [a,b] is the set of all real numbers between a and b.
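Because the Bernoulli distribution is the binomial distribution with a single trial (as discussed later in this chapter), the probability mass function in Table 8.2 can be checked in R with dbinom by setting size = 1:

# Bernoulli probability mass function: f(x; p) = p^x * q^(1 - x)
p <- 0.8
dbinom(0:1, size = 1, prob = p) # 0.2 0.8
p^(0:1) * (1 - p)^(1 - 0:1)     # same result: 0.2 0.8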

The toss of a single coin has the simplest probability distribution that I can think of—there are only two outcomes and each outcome is equally probable (Figure 8.10). This is a special case of the Bernoulli distribution. Jakob Bernoulli (Figure 8.11) was a famous mathematician from a famous family of mathematicians. The Bernoulli distribution is just one of the ideas that made Jakob and the other Bernoullis famous.

In the Bernoulli distribution, there are only two outcomes: a “success” (1) and a “failure” (0). If a success has a probability p then a failure has a probability of q = 1 - p.
% Coin toss Bernoulli
\documentclass[tikz = true, border = 2pt]{standalone}
\usepackage{fontspec}
\setmainfont{Equity Text A}
\usepackage{tikz}
\usetikzlibrary{shapes}
\usepackage{xfrac}

\begin{document}
        \begin{tikzpicture}[scale=0.9]
        \node (H) at (0,2) {
            \includegraphics [width=1.5cm]{../images/QuarterHeads.png}
        };
        \node (T) at (0,0) {
            \includegraphics [width=1.5cm]{../images/QuarterTails.png}
        };
        \node [fill=gray!50,
               minimum height=1.5cm,
               minimum width=0.1cm,
               single arrow,
               single arrow head extend=.15cm,
               single arrow head indent=.08cm,
               inner sep=1mm] (arrowtails1) at (1.9,2) {};
        \node [fill=gray!50,
               minimum height=1.5cm,
               minimum width=0.1cm,
               single arrow,
               single arrow head extend=.15cm,
               single arrow head indent=.08cm,
               inner sep=1mm] (arrowheads2) at (1.9,0) {};
        \node  (p1) at (3.4,2) {\huge{$\sfrac{\text{1}}{\text{2}}$}};
        \node  (p2) at (3.4,0) {\huge{$\sfrac{\text{1}}{\text{2}}$}};
        \node [text centered,
               anchor=south,
               text height=1.5ex,
               text depth=.25ex] (p3) at (0,3) {\large{Sample Space}};
        \node [text centered,
               anchor=south,
               text height=1.5ex,
               text depth=.25ex] (p4) at (3.4,3) {\large{Probability}};
        \end{tikzpicture}

\end{document}
Figure 8.10: The probability distribution of a coin toss
Figure 8.11: Jakob Bernoulli (1654–1705)
Image Credits

The Bernoulli distribution can describe any random variable that has two outcomes, one of which has a probability p and the other has a probability q=1-p. In the case of a coin flip, p=0.5. For other variables with a Bernoulli distribution, p can range from 0 to 1.

In psychological assessment, many of the variables we encounter have a Bernoulli distribution. In ability test items in which there is no partial credit, examinees either succeed or fail. The probability of success on an item (in the whole population) is p. In other words, p is the proportion of the entire population that correctly answers the question. Some ability test items are very easy and the probability of success is high. In such cases, p is close to 1. When p is close to 0, few people succeed and items are deemed hard. Thus, in the context of ability testing, p is called the difficulty parameter. This is confusing because when p is high, the item is easy, not difficult. Many people have suggested that it would make more sense to call it the “easiness parameter” but the idea has never caught on.

The difficulty parameter is the proportion of people who succeed on an ability item or endorse a yes/no questionnaire item.

True/False and Yes/No items on questionnaires also have Bernoulli distributions. If an item is frequently endorsed as true (“I like ice cream.”), p is high. If an item is infrequently endorsed (“I like black licorice and mayonnaise in my ice cream.”), p is very low. Oddly, the language of ability tests prevails even here. Frequently endorsed questionnaire items are referred to as “easy” and infrequently endorsed items are referred to as “difficult,” even though there is nothing particularly easy or difficult about answering them either way.

8.5.7.1 Generating a Random Sample from the Bernoulli Distribution

Figure 8.12: Counts of a Random Variable with a Bernoulli Distribution (p = 0.8, n = 1000)

In R, there is no specialized function for the Bernoulli distribution because it turns out that the Bernoulli distribution is a special case of the binomial distribution, which will be described in the next section. With the function rbinom, we can generate data with a Bernoulli distribution by setting the size parameter equal to 1.

# n = sample size
n <- 1000
# p = probability
p <- 0.8
# X = sample
X <- rbinom(n, size = 1, prob = p)
# Make a basic plot
barplot(table(X))

In Figure 8.12, we can see that the random variable generated a sequence that consists of about 80% ones and 20% zeroes. However, because of sampling error, the results are rarely exactly what the population parameter specifies.

# Counts of a Random Variable with a Bernoulli Distribution
set.seed(4)
# n = sample size
n <- 1000
# p = probability
p <- 0.8
# X = sample
X <- rbinom(n, size = 1, prob = p)
# Make a basic plot
ggplot(tibble(X = factor(X)), aes(X)) +
  geom_bar(fill = myfills[1]) +
  scale_x_discrete(NULL) + 
  scale_y_continuous(NULL, expand = expansion(c(0.01,0.1))) +
  geom_text(aes(label = after_stat(count)), 
            stat = "count", 
            vjust = -0.4, 
            size = ggtext_size(18),
            family = bfont,
            color = "gray10") + 
    theme_minimal(18, bfont) + 
  theme(panel.grid.major.x = element_blank())

8.5.8 Binomial Distributions

Feature Symbol
Number of Trials n \in \{1,2,3,\ldots\}
Sample Space x \in \{0,...,n\}
Probability of success in each trial p \in [0,1]
Probability of failure in each trial q = 1 - p
Mean \mu = np
Variance \sigma^2 = npq
Skewness \gamma_1 = \frac{1-2p}{\sqrt{npq}}
Kurtosis \gamma_2 = \frac{1}{npq} - \frac{6}{n}
Probability Mass Function f_X(x;n,p)=\frac{n!}{x!\left(n-x\right)!}p^x q^{n-x}
Cumulative Distribution Function F_X(x;n,p)=\sum_{i=0}^{x}{\frac{n!}{i!(n-i)!} p^i q^{n-i}}
Table 8.3: Features of Binomial Distributions
Figure 8.13: Probability distribution of the number of heads observed when two coins are tossed

Let’s extend the idea of coin tosses and see where it leads. Imagine that two coins are tossed at the same time and we count how many heads there are. The outcome we might observe will be zero, one, or two heads. Thus, the sample space for the outcome of the tossing of two coins is the set \{0,1,2\} heads. There is only one way that we will observe no heads (both coins tails) and only one way that we will observe two heads (both coins heads). In contrast, as seen in Figure 8.13, there are two ways that we can observe one head (heads-tails & tails-heads).
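We can confirm these probabilities with dbinom: with two trials and p = 0.5, the probabilities of zero, one, and two heads are .25, .50, and .25, as in Figure 8.13.

# Probabilities of 0, 1, and 2 heads when two coins are tossed
dbinom(0:2, size = 2, prob = 0.5) # 0.25 0.50 0.25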

% Probability distribution of the number of heads observed when two coins are tossed
\documentclass[tikz = true,border = 2pt]{standalone}
\usepackage{tikz}
\usepackage{fontspec}
\setmainfont{Equity Text A}
\definecolor[named]{fillColor}{RGB}{39,64,139}
\begin{document}
\begin{tikzpicture}[x=1pt,
                      y=1pt,
                      xscale=.4,
                      yscale=.66,
                      axisline/.style={
                        draw=black!70,
                        line width= 0.4pt,
                        line join=round,
                        line cap=round},
                      axislabel/.style={
                        text=black!70,
                        inner sep=0pt,
                        font=\footnotesize,
                        outer sep=0pt,
                        anchor=east},
                      xaxislabel/.style={
                        text=black!70,
                        inner sep=0pt,
                        font=\footnotesize,
                        outer sep=0pt},
                       bar/.style={
                        draw=white,
                        line width= 0.4pt,
                        line join=round,
                        line cap=round,
                        fill=fillColor,
                        rounded corners = 1.5pt}                ]

% X-axis
\node[xaxislabel] at (100, 42) {0};
\node[xaxislabel] at (200, 42) {1};
\node[xaxislabel] at (300, 42) {2};
\node[black!70] at (204.07, 28) {Number of Heads};

% Y-axis
\path[axisline] ( 42, 50) -- ( 42,250);
\path[axisline] ( 38, 50) -- ( 42, 50);
\path[axisline] ( 38,150) -- ( 42,150);
\path[axisline] ( 38,250) -- ( 42,250);
\node[axislabel] at ( 36, 50) {0};
\node[axislabel] at ( 36,150) {.25};
\node[axislabel] at ( 36,250) {.50};

% Bars
\path[bar] (50, 50.00) rectangle (150,150);
\path[bar] (150, 50.00) rectangle (250,250);
\path[bar] (250, 50.00) rectangle (350,150);
\node at (100,  80) {\includegraphics [width=36pt]{../images/QuarterTails.png}};
\node at (100, 120) {\includegraphics [width=36pt]{../images/QuarterTails.png}};
\node at (200,  80) {\includegraphics [width=36pt]{../images/QuarterHeads.png}};
\node at (200, 120) {\includegraphics [width=36pt]{../images/QuarterTails.png}};
\node at (200, 180) {\includegraphics [width=36pt]{../images/QuarterTails.png}};
\node at (200, 220) {\includegraphics [width=36pt]{../images/QuarterHeads.png}};
\node at (300,  80) {\includegraphics [width=36pt]{../images/QuarterHeads.png}};
\node at (300, 120) {\includegraphics [width=36pt]{../images/QuarterHeads.png}};

\end{tikzpicture}
\end{document}

The probability distribution of the number of heads observed when two coins are tossed at the same time is a member of the binomial distribution family. The binomial distribution occurs when independent random variables with the same Bernoulli distribution are added together. In fact, Bernoulli discovered the binomial distribution as well as the Bernoulli distribution.

Two random variables are said to be independent if the outcome of one variable does not alter the probability of any outcome in the other variable.

Imagine that a die is rolled 10 times and we count how often a 6 occurs.4 Each roll of the die is called a trial. The sample space of this random variable is \{0,1,2,...,10\}. What is the probability that a 6 will occur 5 times? or 1 time? or not at all? Such questions are answered by the binomial distribution’s probability mass function:

Every time a random variable generates a number, that instance of the variable is called a trial, which is also known as an experiment.

4 Wait! Hold on! I thought that throwing dice resulted in a (discrete) uniform distribution. Well, it still does. However, now we are asking a different question. We are only concerned with two outcomes each time the die is thrown: 6 and not 6. This is a Bernoulli distribution, not a uniform distribution, because the probability of the two events is unequal: {⅙,⅚}

f_X(x;n,p)=\frac{n!}{x!\left(n-x\right)!}p^x\left(1-p\right)^{n-x}

Symbol Meaning
X The random variable (the number of sixes from 10 throws of the die)
x Any particular member of the sample space (i.e., x \in \{0,1,2,...,10\})
n The number of times that the die is thrown (i.e., n=10)
p The probability that a six will occur on a single throw of the die (i.e., p=\frac{1}{6})
Figure 8.14: The probability distribution of the number of sixes observed when a six-sided die is thrown 10 times.

Because n=10 and p=\frac{1}{6}, the probability mass function simplifies to:

\begin{equation*} f_X(x)=\frac{n!}{x!\left(n-x\right)!}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{10-x} \end{equation*}

If we take each element x of the sample space from 0 to 10 and plug it into the equation above, the probability distribution will look like Figure 8.14.
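We can check by hand that the formula agrees with R's dbinom function:

# Binomial pmf via the factorial formula, compared with dbinom
x <- 0:10
factorial(10) / (factorial(x) * factorial(10 - x)) * (1/6)^x * (5/6)^(10 - x)
dbinom(x, size = 10, prob = 1/6) # identical result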

# Probability of sixes

tibble(sample_space = 0:10) %>% 
  mutate(
    probability = dbinom(
      sample_space, 
      size = max(sample_space), 
      prob = 1 / 6),
    probability_label = prob_label(probability, 
                                   digits = 2)) %>% 
  ggplot(aes(factor(sample_space), probability)) +
  geom_col(fill = myfills[1]) + 
  ggtext::geom_richtext(
    aes(label = probability_label),
    size = ggtext_size(30),
    angle = 90, 
    hjust = 0,
    family = bfont, 
    label.margin = unit(c(0,0,0,1), "mm"),
    label.padding = unit(c(1.6,0,0,0),"mm"), 
    label.colour = NA,
    color = "gray40") +
  theme_minimal(base_family = bfont, base_size = 30) + 
  scale_y_continuous("Probability", 
                     expand = expansion(c(0,.09)), 
                     labels = prob_label) + 
  scale_x_discrete("Sample Space (Number of Sixes)") + 
  theme(panel.grid.major.x = element_blank()) 

8.5.8.1 Clinical Applications of the Binomial Distribution

When would a binomial distribution be used by a clinician? One particularly important use of the binomial distribution is in the detection of malingering. Sometimes people pretend to have memory loss or attention problems in order to win a lawsuit or collect insurance benefits. There are a number of ways to detect malingering, but a common method is to give a very easy memory test in which the person has at least a 50% chance of answering each item correctly, even when guessing at random.

A person who malingers is pretending to be sick to avoid work or some other responsibility.

Suppose that there are 20 questions. Even if a person has the worst memory possible, that person is likely to get about half the questions correct. However, it is possible for someone with a legitimate memory problem to guess randomly and, by bad luck, answer fewer than half of the questions correctly. Suppose that a person gets 4 questions correct. How likely is it that a person would, by random guessing, answer only 4 or fewer questions correctly?

We can use the binomial distribution’s cumulative distribution function. However, doing so by hand is rather tedious. Using R, the answer is found with the pbinom function:

p <- pbinom(4, 20, 0.5)
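
Because the cumulative distribution function is the sum of the probability mass function over the sample space values up to 4, the same answer can be verified with dbinom:

# pbinom(4, 20, 0.5) is equivalent to summing the
# probability mass function from 0 to 4
sum(dbinom(seq(0, 4), size = 20, prob = 0.5))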

We can see that the probability of randomly guessing and getting 4 or fewer items correct out of 20 items total is approximately 0.006, which is so low that the hypothesis that the person is malingering seems plausible. Note here that there is a big difference between these two questions:

  • If the person is guessing at random (i.e., not malingering), what is the probability of answering correctly 4 questions or fewer out of 20?
  • If the person answers 4 out of 20 questions correctly, what is the probability that the person is guessing at random (and therefore not malingering)?

Here we answer only the first question. It is an important question, but the answer to the second question is probably the one that we really want to know. We will answer it in another chapter when we discuss positive predictive power. For now, we should just remember that the questions are different and that the answers can be quite different.

8.5.8.2 Graphing the binomial distribution

Suppose that there are n=10 trials, each of which has a probability of p=0.8. The sample space is the sequence of integers from 0 to 10, which can be generated with the seq function (i.e., seq(0, 10)) or with the colon operator (i.e., 0:10). The associated probability mass function probabilities are found using the dbinom function, and the cumulative distribution function probabilities are found using the pbinom function.

# Make a sequence of numbers from 0 to 10
SampleSpace <- seq(0, 10)
# Probability mass distribution for 
# binomial distribution (n = 10, p = 0.8)
pmfBinomial <- dbinom(SampleSpace, 
                      size = 10, 
                      prob = 0.8)
# Generate a basic plot of the 
# probability mass distribution
plot(pmfBinomial ~ SampleSpace, 
     type = "b")
Figure 8.15: Basic plot of a binomial probability mass function
# Cumulative distribution function 
# for binomial distribution (n = 10, p = 0.8)
cdfBinomial <- pbinom(SampleSpace, 
                      size = 10, 
                      prob = 0.8)
# Generate a basic plot of the
# binomial cumulative distribution function
plot(cdfBinomial ~ SampleSpace, 
     type = "b")
Figure 8.16: Basic plot of a binomial cumulative distribution function

Making the graph look professional involves quite a bit of code that can seem daunting at first, but the results are often worth the effort. Try running the code below to see the difference. For presentation-worthy graphics, export the graph to the .pdf or .svg format. An .svg file can be imported directly into MS Word or MS PowerPoint.

Figure 8.17: Probability Mass Function and Cumulative Distribution Function of the Binomial Distribution (n = 10,~p = 0.8)
# Probability Mass Function and Cumulative Distribution 
# Function of the Binomial Distribution
tibble(SampleSpace = 0:10,
       pmf = dbinom(SampleSpace, 10, 0.8),
       cdf = pbinom(SampleSpace, 10, 0.8)) %>%
  pivot_longer(-SampleSpace, 
               names_to = "Function", 
               values_to = "Proportion") %>% 
  mutate(Function = factor(Function, 
                           levels = c("pmf", "cdf")) ) %>%
  arrange(desc(Function)) %>%
  ggplot(aes(x = SampleSpace,
             y = Proportion,
             color = Function)) +
  geom_pointline(lty = "dotted", distance = unit(3, "mm")) +
  geom_point(aes(size = Function)) +
  theme_minimal(base_family = "serif",
                base_size = 18) +
  scale_color_manual(values = myfills) +
  scale_x_continuous("Sample Space",
                     breaks = seq(0, 10, 2),
                     expand = expansion(0.02)) +
  scale_y_continuous(expand = expansion(0.02),
                     labels = prob_label) +
  scale_size_manual(values = c(3, 4.5)) +
  theme(legend.position = "none") +
  annotate(
    x = 10 - 0.16,
    y = dbinom(10, 10, 0.8),
    geom = "label",
    label = "Probability Mass Function",
    hjust = 1,
    size = 4.75,
    color = myfills[1],
    label.size = 0,
    family = "serif",
    label.padding = unit(0, "mm")
  ) +
  annotate(
    x = 10 - 0.2,
    y = 1,
    geom = "label",
    label = "Cumulative Distribution Function",
    hjust = 1,
    size = 4.75,
    color = myfills[2],
    label.size = 0,
    family = "serif",
    label.padding = unit(0, "mm")
  )

8.5.9 Poisson Distributions

Feature Symbol
Parameter \lambda \in (0,\infty)
Sample Space x\in \{0,1,2,\ldots\}
Mean \mu = \lambda
Variance \sigma^2 = \lambda
Skewness \gamma_1 = \frac{1}{\sqrt{\lambda}}
Kurtosis \gamma_2 = \frac{1}{\lambda}
Probability Mass Function f_X(x;\lambda) = \frac{\lambda^x}{e^{\lambda} x!}
Cumulative Distribution Function F_X(x;\lambda) = \sum_{i=0}^{x}{\frac{\lambda^i}{e^{\lambda} i!}}
Table 8.4: Features of Poisson Distributions
Notation note: The notation (0,\infty) means all real numbers greater than 0.
Figure 8.18: Siméon Denis Poisson (1781–1840)

Imagine that an event happens sporadically at random and we measure how often it occurs in regular time intervals (e.g., events per hour). Sometimes the event does not occur in the interval, sometimes just once, and sometimes more than once. However, we notice that over many intervals, the average number of events is constant. The distribution of the number of events in each interval will follow a Poisson distribution. Although “Poisson” means “fish” in French, fish have nothing to do with it. This distribution was named after Siméon Denis Poisson, whose work on the distribution made it famous.

The Poisson distribution is a discrete distribution used to model how often an event will occur during a particular interval of time.

The Poisson distribution has a single parameter \lambda, the average number of events per time interval. Interestingly, \lambda is both the mean and the variance of this distribution. The distribution’s shape will differ depending on how long our interval is. If an event occurs on average 30 times per hour, \lambda = 30. If we count how often the event occurs in 10-minute intervals, the same event will occur about 5 times per interval, on average (i.e., \lambda = 5). If we choose to count how often the same event occurs every minute, then \lambda = 0.5.
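
For example, continuing with an event that occurs 30 times per hour on average, the probability of observing an interval with no events at all depends strongly on the interval’s length:

# Probability of zero events in one interval,
# for three interval lengths with the same underlying rate
dpois(0, lambda = 30)   # in an hour: nearly 0
dpois(0, lambda = 5)    # in 10 minutes: ~ .007
dpois(0, lambda = 0.5)  # in 1 minute: ~ .61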

Figure 8.19: The shape of the Poisson Distribution depends on the interval used for counting events. Here, the event occurs once per minute, on average.
# The shape of the Poisson Distribution
crossing(lambda = c(0.5, 5, 30), x = 0:60) %>%
  mutate(p = dpois(x, lambda)) %>%
  mutate(lambda = factor(lambda, labels = paste0(
    "Every ",
    c("30 Seconds", "5 Minutes", "Half Hour"),
    " (\u03BB = ",
    c(0.5, 5, 30),
    ")"
  ))) %>%
  ggplot(aes(x, p, color = lambda)) +
  geom_line(linetype = 3, linewidth = 0.1) +
  geom_point(size = 2.5, color = "white") +
  geom_point(size = 1) +
  facet_grid(lambda ~ ., scales = "free") +
  scale_x_continuous("Number of Events", breaks = seq(0, 120, 20)) +
  scale_color_manual(values = c(myfills[1], "darkorchid", myfills[2])) +
  theme_light(base_size = 20, base_family = bfont) +
  theme(legend.position = "none") +
  scale_y_continuous("Probability", labels = prob_label)

8.5.9.1 A clinical application of the Poisson distribution

Suppose that you begin treating an adult male client who has panic attacks that come at unpredictable times. Some weeks there are no panic attacks and some weeks there are many, but on average he has 2 panic attacks each week. The client knows this because he has kept detailed records in a spreadsheet for the last 5 years. The client had sought treatment once before, but terminated early and abruptly because, according to him, “It wasn’t working.” After sensitive querying, you discover that he expected that treatment should have quickly reduced the frequency of panic attacks to zero. When that did not happen, he became discouraged and stopped the treatment.

Because your client is well educated and quantitatively inclined, you decide to use the data he has collected as part of the intervention and to help set more realistic expectations. Obviously, you and your client both would prefer 0 panic attacks per week, but sometimes it takes more time to get to the final goal. We do not want to terminate a treatment that is working just because the final goal has not yet been achieved.

You plot the frequency of how often he had 0 panic attacks in a week, 1 panic attack in a week, 2 panic attacks in a week, and so forth, as shown in red in Figure 8.20. Because you have read this book, you immediately recognize that this is a Poisson distribution with \lambda = 2. When you graph an actual Poisson distribution and compare it with your client’s data, you see that it is almost a perfect match.5 Then you explain that although the goal is permanent cessation of the panic attacks, sometimes an intervention can be considered successful if the frequency of panic attacks is merely reduced. For example, suppose that in the early stages of treatment the frequency of panic attacks were reduced from twice per week to once every other week (\lambda = 0.5), on average. Even with such a reduction, there would still be weeks in which two or more panic attacks occur. According to Figure 8.23, this will happen about 9% of the time.

5 Note that I am not claiming that all clients’ panic attack frequencies have this kind of distribution. It just so happens to apply in this instance.

Figure 8.20: The variability of a hypothetical client’s panic attack frequency before and after treatment
# The variability of a hypothetical 
# client's panic attack frequency
d_label <- tibble(
  x = c(0, 2),
  Time = c("After", "Before"),
  Proportion = dpois(x, lambda = c(0.5, 2)),
  Label = c("After Treatment: \u03BB = 0.5",
            "Before Treatment: \u03BB = 2")
)

tibble(
  x = seq(0, 7),
  Before = dpois(x, lambda = 2),
  After = dpois(x, lambda = 0.5)
) %>%
  gather(key = Time, value = Proportion,-x) %>%
  ggplot(aes(x, Proportion, color = Time)) +
  geom_pointline(linetype = "dotted", linesize = .5, size = 3, distance = unit(2, "mm")) +
  # geom_point(size = 3) +
  geom_label(
    data = d_label,
    aes(label = Label),
    hjust = 0,
    nudge_x = 0.2,
    family = bfont,
    label.padding = unit(0, "lines"),
    label.size = 0,
    size = 6
  ) +
  scale_color_manual(values = myfills) +
  scale_x_continuous("Panic Attacks Per Week",
                     breaks = 0:7,
                     minor_breaks = NULL) +
  scale_y_continuous(labels = . %>% prob_label(., 0.1)) +
  theme_minimal(base_size = 20, base_family = bfont) +
  theme(legend.position = "none")

In R, you can use the dpois function to compute values of the Poisson probability mass function. For example, if the average number of events per time period is λ = 2, then the probability that there will be 0 events is dpois(x = 0, lambda = 2), which evaluates to 0.1353.

To calculate the cumulative distribution function of the Poisson distribution in R, use the ppois function. For example, if we want to know the probability of having 4 or more panic attacks in a week when λ = 2, we subtract the probability of having 3 or fewer panic attacks from 1, like so:

1 - ppois(q = 3, lambda = 2)

p = 0.143

Here is a simple way to plot the probability mass function and the cumulative distribution function using the dpois and ppois functions:

# Make a sequence of integers from 0 to 7
PanicAttacks <- seq(0, 7)

# Generate the probability mass function with lambda = 2
Probability <- dpois(PanicAttacks, 2)

# Basic plot of the Poisson 
# distribution's probability mass function
plot(Probability ~ PanicAttacks, type = "b") 
Figure 8.21: Poisson Probability Mass Function (\lambda=2)
# Generate the cumulative 
# distribution function with lambda = 2
CumulativeProbability <- ppois(PanicAttacks, 2)

# Basic plot of the Poisson distribution's 
# cumulative distribution function
plot(CumulativeProbability ~ PanicAttacks, type = "b") 
Figure 8.22: Poisson Cumulative Distribution Function (\lambda=2)

With an additional series with \lambda = 0.5, the plot can look like Figure 8.23.

Figure 8.23: The cumulative distribution function of a hypothetical client’s panic attack frequency before and after treatment
# The cumulative distribution function 
# of a hypothetical client's
# panic attack frequency before 
# and after treatment

d_label <- tibble(
  x = c(0, 0),
  Time = c("After", "Before"),
  Proportion = ppois(x, lambda = c(0.5, 2)),
  Label = c("After Treatment: *&lambda;* = 0.5",
            "Before Treatment: *&lambda;* = 2")
)

tibble(
  x = seq(0, 7),
  Before = ppois(x, lambda = 2),
  After = ppois(x, lambda = 0.5)
) %>%
  gather(key = Time, value = Proportion,-x) %>%
  ggplot(aes(x, Proportion, color = Time)) +
  geom_pointline(linetype = "dotted", linesize = .5, size = 3, distance = unit(2, "mm")) +
  geom_richtext(
    data = d_label,
    aes(label = Label),
    hjust = 0,
    nudge_x = 0.2,
    family = bfont,
    label.padding = unit(0, "lines"),
    label.size = 0,
    size = ggtext_size(20)
  ) +
  scale_color_manual(values = myfills) +
  scale_y_continuous(breaks = seq(0, 1, 0.2), 
                     limits = c(0, 1),
                     labels = . %>% prob_label(., 0.1)) +
  scale_x_continuous("Panic Attacks Per Week",
                     breaks = 0:7,
                     minor_breaks = NULL) +
  theme_minimal(base_size = 20, base_family = bfont) +
  theme(legend.position = "none")

8.5.10 Geometric Distributions

Feature Symbol
Probability of success in each trial p\in[0,1]
Sample Space x \in \{1,2,3,\ldots\}
Mean \mu = \frac{1}{p}
Variance \sigma^2 = \frac{1-p}{p^2}
Skewness \gamma_1 = \frac{2-p}{\sqrt{1-p}}
Kurtosis \gamma_2 = 6 + \frac{p^2}{1-p}
Probability Mass Function f_X(x;p) = p(1-p)^{x-1}
Cumulative Distribution Function F_X(x;p) = 1-(1-p)^x
Table 8.5: Features of Geometric Distributions

Atul Gawande (2007, pp. 219–223) tells a marvelous anecdote about how a doctor used some statistics to help a young patient with cystic fibrosis to return to taking her medication more regularly. Because the story is full of pathos and masterfully told, I will not repeat a clumsy version of it here. However, unlike Gawande, I will show how the doctor’s statistics were calculated.

Gawande, A. (2007). Better: A surgeon’s notes on performance. Metropolitan Books.

According to the story, if a patient fails to take medication, the probability that a person with cystic fibrosis will develop a bad lung illness on any particular day is .005. If medication is taken, the risk is .0005. Although these probabilities are both close to zero, over the course of a year they result in very different levels of risk. Off medication, the patient has about an 84% chance of getting sick within a year’s time. On medication, the patient’s risk falls to 17%. As seen in Figure 8.24, the cumulative risk over the course of 10 years is quite different. Without medication, the patient is almost certain to become seriously ill at least once within 10 years. With medication, however, a small but substantial percentage (~16%) of patients will go at least 10 years without becoming ill.
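
These one-year and ten-year risks can be verified directly from the geometric distribution’s cumulative distribution function, F_X(x;p)=1-(1-p)^x:

# Cumulative risk of serious illness within one year
1 - (1 - 0.005)^365     # off medication: ~ .84
1 - (1 - 0.0005)^365    # on medication:  ~ .17
# Probability of going 10 years without serious
# illness while on medication
(1 - 0.0005)^(365 * 10) # ~ .16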

Figure 8.24: The cumulative risk of serious lung disease with and without medication
# The cumulative risk of serious lung 
# disease with and without medication

total_years <- 10

tibble(Days = seq(0, total_years * 365, 10),
       WithoutMeds = pgeom(Days, 0.005),
       WithMeds = pgeom(Days, 0.0005)) %>% 
  gather(Meds, p, -Days) %>% 
  mutate(Years = Days / 365) %>% 
  ggplot(aes(Years, p, color = Meds)) + 
  geom_line(linewidth = 1) + 
  theme_minimal(base_size = 18, base_family = bfont) + 
  theme(legend.position = "none") +
  scale_x_continuous(breaks = seq(0,total_years,2)) + 
  scale_y_continuous("Cumulative Risk", breaks = seq(0,1,0.2),
                     labels = . %>% prob_label(., 0.1)) +
  scale_color_manual(values = myfills) +
  coord_fixed(ratio = 10) +
  annotate(x = 4, y = 0.93, 
           label = "Without Medication\nDaily Risk = .005",
           geom = "label", 
           color = myfills[2],
           label.padding = unit(0,"lines"),
           label.size = 0,
           family = bfont,
           size = 5.2
           ) +
  annotate(x = 6.25, y = 0.535, 
           label = "With Medication\nDaily Risk = .0005",
           geom = "label", 
           color = myfills[1],
           label.padding = unit(0,"lines"),
           label.size = 0,
           family = bfont,
           size = 5.2
           )

Such calculations make use of the geometric distribution. Consider a series of Bernoulli trials in which an event has a probability p of occurring on any particular trial. The probability mass function of the geometric distribution will tell us the probability that the xth trial will be the first time the event occurs.

f_X(x;p)=p(1-p)^{x-1}

Symbol Meaning
X A random variable with a geometric distribution
f_X The probability mass function of X
x The number of Bernoulli trials on which the event first occurs
p The probability of an event occurring on a single Bernoulli trial

In R, the probability mass function of the geometric distribution is calculated with the dgeom function:

# Make a sequence of integers from 1 to 10
x <- seq(1, 10)

# Generate the probability mass function with p = 0.6.
# Note that R's dgeom counts the number of failures
# before the first success, so to get the probability
# that the first success occurs on trial x, shift x by 1.
Probability <- dgeom(x - 1, prob = 0.6)

# Basic plot of the geometric 
# distribution's probability mass function
plot(Probability ~ x, type = "b") 
Figure 8.25: Geometric Probability Mass Function (p=.6)

The cumulative distribution function of the geometric distribution was used to create Figure 8.24. It tells us the probability that the event will occur on the x^{th} trial or earlier:

F_X(x;p)=1-(1-p)^x

In R, the cumulative distribution function of the geometric distribution is calculated with the pgeom function:

# Generate the cumulative distribution function with
# p = 0.6 (x is again shifted by 1 because of R's
# failures-before-success parameterization)
CumulativeProbability <- pgeom(x - 1, prob = 0.6)

# Basic plot of the geometric
# distribution's cumulative distribution function
plot(CumulativeProbability ~ x, type = "b") 
Figure 8.26: Geometric Cumulative Distribution Function (p=.6)

8.6 Continuous Distributions

8.6.1 Probability Density Functions

Although there are many more discrete distribution families, we will now consider some continuous distribution families. Most of what we have learned about discrete distributions applies to continuous distributions. However, the probability mass function needs a name change. In a discrete distribution, we can calculate an actual probability for a particular value in the sample space. In continuous distributions, doing so is tricky. We can always calculate the probability that a score will fall within a particular interval. However, in continuous distributions, intervals can become very small, approaching a width of 0. When that happens, the probability associated with the interval also approaches 0. Yet some parts of the distribution are clearly more probable than others. Therefore, we need a measure that tells us the probability of a value relative to other values: the probability density function.

The probability density function is a function that shows the relative likelihood of each element in the sample space of a continuous random variable.

Considering the entire sample space of a discrete distribution, all of the associated probabilities from the probability mass function sum to 1. In a probability density function, it is the area under the curve that must sum to 1. That is, there is a 100% probability that a value generated by the random variable will be somewhere under the curve. There is nowhere else for it to go!

However, unlike probability mass functions, probability density functions do not generate probabilities. Remember, the probability of any value in the sample space of a continuous variable is infinitesimal. We can only compare the probabilities to each other. To see this, compare the discrete uniform distribution and continuous uniform distribution in Figure 8.4. Both distributions range from 1 to 4. In the discrete distribution, there are 4 points, each with a probability of ¼. It is easy to see that these 4 probabilities of ¼ sum to 1. Because of the scale of the figure, it is not easy to see exactly how high the probability density function is in the continuous distribution. It happens to be ⅓. Why? First, it does not mean that each value has a ⅓ probability. There are an infinite number of points between 1 and 4 and it would be absurd if each of them had a ⅓ probability. The distance between 1 and 4 is 3. In order for the rectangle to have an area of 1, its height must be ⅓. What does that ⅓ mean, then? In the case of a single value in the sample space, it does not mean much at all. It is simply a value that we can compare to other values in the sample space. It could be scaled to any value, but for the sake of convenience it is scaled such that the area under the curve is 1.

Note that some probability density functions can produce values greater than 1. If the range of a continuous uniform distribution is less than 1, at least some portions of the curve must be greater than 1 to make the area under the curve equal 1. For example, if the bounds of a continuous distribution are 0 and ⅓, the average height of the probability density function would need to be 3 so that the total area is equal to 1.
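
A quick check with dunif, the continuous uniform distribution’s probability density function, confirms this:

# The density of a continuous uniform distribution between
# 0 and 1/3 is 3 everywhere within the bounds, so the total
# area under the curve is 3 * (1/3) = 1
dunif(0.1, min = 0, max = 1/3)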

8.6.2 Continuous Uniform Distributions

Feature Symbol
Lower Bound a \in (-\infty,\infty)
Upper Bound b \in (a,\infty)
Sample Space x \in \lbrack a,b\rbrack
Mean \mu = \frac{a+b}{2}
Variance \sigma^2 = \frac{(b-a)^2}{12}
Skewness \gamma_1 = 0
Kurtosis \gamma_2 = -\frac{6}{5}
Probability Density Function f_X(x;a,b) = \frac{1}{b-a}
Cumulative Distribution Function F_X(x;a,b) = \frac{x-a}{b-a}
Table 8.6: Features of Continuous Uniform Distributions

Unlike the discrete uniform distribution, the uniform distribution is continuous.6 In both distributions, there is an upper and lower bound and all members of the sample space are equally probable.

6 For the sake of clarity, the uniform distribution is often referred to as the continuous uniform distribution.

8.6.2.1 Generating random samples from the continuous uniform distribution

To generate a sample of n numbers with a continuous uniform distribution between a and b, use the runif function like so:

# Sample size
n <- 1000
# Lower and upper bounds
a <- 10
b <- 30
# Sample
x <- runif(n, min = a, max = b)
Figure 8.27: Random sample (n = 1000) of a continuous uniform distribution between 10 and 30. Points are randomly jittered to show the distribution more clearly.
# Plot
tibble(x) %>% 
ggplot(aes(x, y = 0.5)) + 
  geom_jitter(size = 0.5, 
              pch = 16,
              color = myfills[1], 
              height = 0.45) +
  scale_x_continuous(NULL) +
  scale_y_continuous(NULL, 
                     breaks = NULL, 
                     limits = c(0,1), expand = expansion()) + 
  theme_minimal(base_family = bfont, base_size = bsize)

8.6.2.2 Using the continuous uniform distribution to generate random samples from other distributions

Uniform distributions can begin and end at any real number but one member of the uniform distribution family is particularly important—the uniform distribution between 0 and 1. If you need to use Excel instead of a statistical package, you can use this distribution to generate random numbers from many other distributions.

Applying a continuous distribution’s cumulative distribution function to values drawn from that distribution converts them into a continuous uniform distribution between 0 and 1. Conversely, a distribution’s quantile function converts a continuous uniform distribution into that distribution. Most of the time, this process also works for discrete distributions. It is particularly useful for generating random numbers with an unusual distribution: if the distribution’s quantile function is known, a sample with a continuous uniform distribution can easily be generated and converted.

For example, the RAND function in Excel generates random numbers between 0 and 1 with a continuous uniform distribution. The BINOM.INV function is the binomial distribution’s quantile function. Suppose that n (number of Bernoulli trials) is 5 and p (probability of success on each Bernoulli trial) is 0.6. A randomly generated number from the binomial distribution with n=5 and p=0.6 is generated like so:

=BINOM.INV(5,0.6,RAND())

Excel has quantile functions for many distributions (e.g., BETA.INV, BINOM.INV, CHISQ.INV, F.INV, GAMMA.INV, LOGNORM.INV, NORM.INV, T.INV). This method of combining RAND and a quantile function works reasonably well in Excel for quick-and-dirty projects, but when high levels of accuracy are needed, random samples should be generated in a dedicated statistical program like R, Python (via the numpy package), Julia, STATA, SAS, or SPSS.
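
The same method works in R, where each distribution’s quantile function begins with the letter q. Here is a minimal sketch:

# Uniform random numbers between 0 and 1
u <- runif(10000)
# Convert to a binomial distribution (n = 5, p = 0.6)
x_binomial <- qbinom(u, size = 5, prob = 0.6)
# Convert to a normal distribution (mu = 100, sigma = 15)
x_normal <- qnorm(u, mean = 100, sd = 15)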

8.6.3 Normal Distributions

(Unfinished)

Feature Symbol
Sample Space x \in (-\infty,\infty)
Mean \mu = \mathcal{E}\left(X\right)
Variance \sigma^2 = \mathcal{E}\left(\left(X - \mu\right)^2\right)
Skewness \gamma_1 = 0
Kurtosis \gamma_2 = 0
Probability Density Function f_X(x;\mu,\sigma^2) = \frac{1}{\sqrt{2 \pi \sigma ^ 2}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
Cumulative Distribution Function F_X(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} {\displaystyle \int_{-\infty}^{x} e ^ {-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}dx}
Table 8.7: Features of Normal Distributions
Figure 8.28: Carl Friedrich Gauss (1777–1855)

The normal distribution is sometimes called the Gaussian distribution after its discoverer, Carl Friedrich Gauss (Figure 8.28). It is a small injustice that most people do not use Gauss’s name to refer to the normal distribution. Thankfully, Gauss is not exactly languishing in obscurity. He made so many discoveries that his name is all over mathematics and statistics.

The normal distribution is probably the most important distribution in statistics and in psychological assessment. In the absence of other information, assuming that an individual difference variable is normally distributed is a good bet. Not a sure bet, of course, but a good bet. Why? What is so special about the normal distribution?

To get a sense of the answer to this question, consider what happens to the binomial distribution as the number of events (n) increases. To make the example more concrete, let’s assume that we are tossing coins and counting the number of heads (p=0.5). In Figure 8.29, the first plot shows the probability mass function for the number of heads when there is a single coin (n=1). In the second plot, n=2 coins. That is, if we flip 2 coins, there will be 0, 1, or 2 heads. In each subsequent plot, we double the number of coins that we flip simultaneously. Even with as few as 4 coins, the distribution begins to resemble the normal distribution, although the resemblance is very rough. With 128 coins, however, the resemblance is very close.

Figure 8.29: The binomial distribution begins to resemble the normal distribution when the number of events is large.
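
To see the resemblance numerically, we can overlay the binomial probability mass function with a normal density that has the same mean (np) and standard deviation (\sqrt{np(1-p)}). A minimal sketch:

# Compare the binomial (n = 128, p = 0.5) probability mass
# function with a normal density that has the same mean
# and standard deviation
n <- 128
p <- 0.5
x_heads <- seq(0, n)
p_heads <- dbinom(x_heads, n, p)
plot(p_heads ~ x_heads, type = "h")
curve(dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p))),
      add = TRUE)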

This resemblance to the normal distribution is helped by the fact that p=0.5, which makes the binomial distribution symmetric. If p is extreme (close to 0 or 1), the binomial distribution is asymmetric. However, if n is large enough, the binomial distribution eventually becomes very close to normal.

Many other distributions, such as the Poisson, Student’s t, F, and \chi^2 distributions, have distinctive shapes under some conditions but approximate the normal distribution in others (see Figure 8.30). Why? In each case, the conditions under which these distributions approximate the normal distribution are conditions in which, as in Figure 8.29, many independent quantities are summed.

Figure 8.30: Many distributions become nearly normal when their parameters are high.

8.6.3.1 Notation for Normal Variates

Statisticians write about variables with normal distributions so often that a compact notation for specifying a normal variable’s parameters has been developed. If I want to specify that X is a normally distributed variable with a mean of \mu and a variance of \sigma^2, I will use this notation:

X \sim \mathcal{N}(\mu, \sigma^2)

Symbol Meaning
X A random variable.
\sim Is distributed as
\mathcal{N} Has a normal distribution
\mu The population mean
\sigma^2 The population variance
Table 8.8: Notation for Normal Variates

Many authors list the standard deviation \sigma instead of the variance \sigma^2. When I specify normal distributions with specific means and variances, I will avoid ambiguity by always showing the variance as the standard deviation squared. For example, a normal variate with a mean of 10 and a standard deviation of 3 will be written as X \sim \mathcal{N}(10,3^2).
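
One caution when translating this notation into R: the normal distribution functions in R are parameterized with the standard deviation, not the variance.

# X ~ N(10, 3^2): rnorm takes the standard deviation (3),
# not the variance (9)
x <- rnorm(10000, mean = 10, sd = 3)
c(mean(x), sd(x))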

Figure 8.31: Percentiles convert a distribution into a uniform distribution
Figure 8.32: Evenly spaced percentile ranks are associated with unevenly spaced scores.
Figure 8.33: Evenly spaced scores are associated with unevenly spaced percentiles

8.6.3.2 Half-Normal Distribution

(Unfinished)

Feature Symbol
Sample Space x \in [\mu,\infty)
Mu \mu \in (-\infty,\infty)
Sigma \sigma \in [0,\infty)
Mean \mu + \sigma\sqrt{\frac{2}{\pi}}
Variance \sigma^2\left(1-\frac{2}{\pi}\right)
Skewness \sqrt{2}(4-\pi)(\pi-2)^{-\frac{3}{2}}
Kurtosis 8(\pi-3)(\pi-2)^{-2}
Probability Density Function f_X(x;\mu,\sigma) = \sqrt{\frac{2}{\pi \sigma ^ 2}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
Cumulative Distribution Function F_X(x;\mu,\sigma) = \sqrt{\frac{2}{\pi\sigma^2}} {\displaystyle \int_{\mu}^{x} e ^ {-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}dx}
Table 8.9: Features of Half-Normal Distributions
Figure 8.34: The half-normal distribution is the normal distribution with the left half of the distribution stacked on top of the right half of the distribution.
# Half normal distribution
xlim <- 4
n <- length(seq(-xlim, 0, 0.01))
t1 <- tibble(
  x = c(0,-xlim,
        seq(-xlim, 0, 0.01),
        0,
        0,
        seq(0, xlim, 0.01),
        xlim,
        0),
  y = c(0,
        0,
        dnorm(seq(-xlim, 0, 0.01)),
        0,
        0,
        dnorm(seq(0, xlim, 0.01)),
        0,
        0),
  side = c(rep(F, n + 3), rep(T, n + 3)),
  Type = 1
)
t2 <- t1 %>%
  mutate(y = if_else(side, y, 2 * y)) %>%
  mutate(x = abs(x),
         Type = 2)

bind_rows(t1, t2) %>%
  mutate(Type = factor(Type)) %>%
  ggplot(aes(x, y, fill = side)) +
  geom_polygon() +
  geom_text(
    data = tibble(
      x = 0,
      y = dnorm(0) * c(1, 2) + 0.14,
      Type = factor(c(1,2)),
      label = c(
        "Normal",
        "Half-Normal"),
      side = T),
    aes(label = label),
    family = bfont, fontface = "bold",
    size = ggtext_size(30), 
    vjust = 1
  ) +
  geom_richtext(
    data = tibble(
      x = 0,
      y = dnorm(0) * c(1, 2) + 0,
      Type = factor(c(1,2)),
      label = c(
        paste0("*X* ~ ",
               span_style("N", style = "font-family:'Lucida Calligraphy'"),
               "(*",
               span_style("&mu;"),
               "*, *",
               span_style("&sigma;"),
               "*<sup>2</sup>)"),
        paste0("*X* ~ |",
               span_style("N", style = "font-family:'Lucida Calligraphy'"),
               "(0, *",
               span_style("&sigma;"),
               "*<sup>2</sup>)| + *",
               span_style("&mu;"),
               "*")),
      side = T),
    aes(label = label),
    family = c("Equity Text A"),
    size = ggtext_size(30), 
    vjust = 0, 
    label.padding = unit(0,"lines"), 
    label.color = NA,
    fill = NA) +
  theme_void(base_size = 30,
                base_family = bfont) +
  theme(
    legend.position = "none",
    strip.text = element_blank()
  ) +
  scale_fill_manual(values = myfills) +
  facet_grid(rows = vars(Type), space = "free_y", scales = "free_y") 

Suppose that X is a normally distributed variable such that

X \sim \mathcal{N}(\mu, \sigma^2)

Variable Y then has a half-normal distribution such that Y = |X-\mu|+\mu. In other words, imagine that a normal distribution is folded at the mean with the left half of the distribution now stacked on top of the right half of the distribution (See Figure 8.34).
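
A quick simulation illustrates the folding operation:

# Fold a normal sample (mu = 10, sigma = 3) at its mean
x <- rnorm(100000, mean = 10, sd = 3)
y <- abs(x - 10) + 10
# The sample mean of y should approximate
# mu + sigma * sqrt(2 / pi) = 12.39
c(mean(y), 10 + 3 * sqrt(2 / pi))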

8.6.3.3 Truncated Normal Distributions

(Unfinished)

8.6.3.4 Multivariate Normal Distributions

(Unfinished)

8.6.4 Chi Square Distributions

(Unfinished)

Feature Symbol
Sample Space x \in [0,\infty)
Degrees of freedom \nu \in [0,\infty)
Mean \nu
Variance 2\nu
Skewness \sqrt{8/\nu}
Kurtosis 12/\nu
Probability Density Function f_X(x;\nu) = \frac{x^{\nu/2-1}}{2^{\nu/2}\;\Gamma(\nu/2)\,\sqrt{e^x}}
Cumulative Distribution Function F_X(x;\nu) = \frac{\gamma\left(\frac{\nu}{2},\frac{x}{2}\right)}{\Gamma(\nu/2 )}
Table 8.10: Features of Chi-Square Distributions

I have always thought that the \chi^2 distribution has an unusual name. The chi part is fine, but why square? Why not call it the \chi distribution?7 As it turns out, the \chi^2 distribution is formed from squared quantities.

7 Actually, there is a \chi distribution. It is simply the square root of the \chi^2 distribution. The half-normal distribution happens to be a \chi distribution with 1 degree of freedom.

Notation note: A \chi^2 distribution with \nu degrees of freedom can be written as \chi^2_\nu or \chi^2(\nu).

The \chi^2 distribution has a straightforward relationship with the normal distribution. It is the sum of multiple independent squared normal variates. That is, if z is a standard normal variate— z\sim\mathcal{N}(0,1^2)—then z^2 has a \chi^2 distribution with 1 degree of freedom (\nu):

z^2\sim \chi^2_1

If z_1 and z_2 are independent standard normal variates, the sum of their squares has a \chi^2 distribution with 2 degrees of freedom:

z_1^2+z_2^2 \sim \chi^2_2

If \{z_1,z_2,\ldots,z_{\nu}\} is a series of \nu independent standard normal variates, the sum of their squares has a \chi^2 distribution with \nu degrees of freedom:

\sum^\nu_{i=1}{z_i^2} \sim \chi^2_\nu
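
This relationship is easy to verify with a quick simulation:

# Sum nu = 3 squared standard normal variates
nu <- 3
z <- matrix(rnorm(100000 * nu), ncol = nu)
x <- rowSums(z ^ 2)
# The mean and variance should approximate nu and 2 * nu
c(mean(x), var(x))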

8.6.4.1 Clinical Uses of the \chi^2 distribution

The \chi^2 distribution has many applications, but the most likely to be used in psychological assessment are the \chi^2 Test of Goodness of Fit and the \chi^2 Test of Independence.

Suppose we suspect that a child’s temper tantrums are more likely to occur on weekdays than on weekends. The child’s mother has kept a record of each tantrum for the past year and was able to count the frequency of tantrums. If tantrums were equally likely to occur on any day, 5 of 7 tantrums should occur on weekdays, and 2 of 7 tantrums should occur on weekends. The observed frequencies are compared with the expected frequencies below.

\begin{array}{r|c|c|c} & \text{Weekday} & \text{Weekend} & \text{Total} \\ \hline \text{Observed Frequency}\, (o) & 14 & 13 & n=27\\ \text{Expected Proportion}\,(p) & \frac{5}{7} & \frac{2}{7} & 1\\ \text{Expected Frequency}\, (e = np)& 27\times \frac{5}{7}= 19.2857& 27\times \frac{2}{7}= 7.7143& 27\\ \text{Difference}\,(o-e) & -5.2857 & 5.2857&0\\ \frac{(o-e)^2}{e} & 1.4487 & 3.6217 & \chi^2 = 5.07 \end{array}

In the table above, if the observed frequencies (o_i) are compared to their respective expected frequencies (e_i), then:

\chi^2_{k-1}=\sum_{i=1}^k{\frac{(o_i-e_i)^2}{e_i}}=5.07

Using the \chi^2 cumulative distribution function, we find that the probability of observing the frequencies listed is low under the assumption that tantrums are equally likely each day.
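
Specifically, the p-value is the area of the \chi^2 distribution with 1 degree of freedom that lies beyond 5.07:

# Probability of a chi-square value of 5.07 or more
# with 1 degree of freedom
1 - pchisq(5.0704, df = 1) # ~ .024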

observed_frequencies <- c(14, 13)
expected_probabilities <- c(5,2) / 7

fit <- chisq.test(observed_frequencies, p = expected_probabilities)
fit

    Chi-squared test for given probabilities

data:  observed_frequencies
X-squared = 5.0704, df = 1, p-value = 0.02434
# View expected frequencies and residuals
broom::augment(fit)
# A tibble: 2 × 6
  Var1  .observed .prop .expected .resid .std.resid
  <fct>     <dbl> <dbl>     <dbl>  <dbl>      <dbl>
1 A            14 0.519     19.3   -1.20      -2.25
2 B            13 0.481      7.71   1.90       2.25
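A \chi^2 test of independence follows the same pattern. The snippet below is a reconstruction (the dichotomous variables are generically labeled A and B, with cell counts taken from the output that follows) showing how a 2 × 2 contingency table can be tested and inspected:

# Hypothetical 2 x 2 contingency table
# (A and B are generic labels)
observed_table <- matrix(c(36, 39, 5, 20),
                         nrow = 2,
                         dimnames = list(A = c(0, 1),
                                         B = c(0, 1)))
fit_independence <- chisq.test(observed_table)
# View expected frequencies and residuals
broom::augment(fit_independence)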
# A tibble: 4 × 9
  A     B     .observed .prop .row.prop .col.prop .expected .resid .std.resid
  <fct> <fct>     <int> <dbl>     <dbl>     <dbl>     <dbl>  <dbl>      <dbl>
1 0     0            36  0.36     0.878      0.48      30.8  0.947       2.47
2 1     0            39  0.39     0.661      0.52      44.2 -0.789      -2.47
3 0     1             5  0.05     0.122      0.2       10.2 -1.64       -2.47
4 1     1            20  0.2      0.339      0.8       14.8  1.37        2.47

8.6.5 Student’s t Distributions

Feature Symbol
Sample Space x \in (-\infty,\infty)
Degrees of Freedom \nu \in (0,\infty)
Mean \left\{ \begin{array}{ll} 0 & \nu \gt 1 \\ \text{Undefined} & \nu \le 1 \\ \end{array} \right.
Variance \left\{ \begin{array}{ll} \frac{\nu}{\nu-2} & \nu\gt 2 \\ \infty & 1 \lt \nu \le 2\\ \text{Undefined} & \nu \le 1 \\ \end{array} \right.
Skewness \left\{ \begin{array}{ll} 0 & \nu \gt 3 \\ \text{Undefined} & \nu \le 3 \\ \end{array} \right.
Kurtosis \left\{ \begin{array}{ll} \frac{6}{\nu-4} & \nu \gt 4 \\ \infty & 2 \lt \nu \le 4\\ \text{Undefined} & \nu \le 2 \\ \end{array} \right.
Probability Density Function f_X(x; \nu) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}}
Cumulative Distribution Function F_X(x; \nu)=\frac{1}{2} + x \Gamma \left( \frac{\nu+1}{2} \right) \frac{\phantom{\,}_{2}F_1 \left(\frac{1}{2},\frac{\nu+1}{2};\frac{3}{2};-\frac{x^2}{\nu} \right)} {\sqrt{\pi\nu}\,\Gamma \left(\frac{\nu}{2}\right)}
Table 8.11: Features of t Distributions
Notation note: \Gamma is the gamma function. _2F_1 is the hypergeometric function.
Figure 8.35: “Student” statistician, William Sealy Gosset (1876–1937)

(Unfinished)

Guinness Beer gets free advertising every time the origin story of the Student t distribution is retold, and statisticians retell the story often. The fact that the original purpose of the t distribution was to brew better beer seems too good to be true.

William Sealy Gosset (1876–1937), self-trained statistician and head brewer at Guinness Brewery in Dublin, continually experimented on small batches to improve and standardize the brewing process. With some help from statistician Karl Pearson, Gosset used then-current statistical methods to analyze his experimental results. Gosset found that Pearson’s methods required small adjustments when applied to small samples. With Pearson’s help and encouragement (and later from Ronald Fisher), Gosset published a series of innovative papers about a wide range of statistical methods, including the t distribution, which can be used to describe the distribution of sample means.

Worried about having its trade secrets divulged, Guinness did not allow its employees to publish scientific papers related to their work at Guinness. Thus, Gosset published his papers under the pseudonym “Student.” The straightforward names of most statistical concepts need no historical treatment. Few of us who regularly use the Bernoulli, Pareto, Cauchy, and Gumbel distributions could tell you anything about the people who discovered them. But the oddly named “Student’s t distribution” cries out for explanation. Thus, in the long run, it was Gosset’s anonymity that made him famous.

Figure 8.36: The t distribution approaches the standard normal distribution as the degrees of freedom (df) parameter increases.
# The t distribution approaches the normal distribution
d <- crossing(x = seq(-6,6,0.02), 
         df = c(seq(1,15,1),
                seq(20,45,5),
                seq(50,100,10),
                seq(200,700,100))) %>%
  mutate(y = dt(x,df),
         Normal = dnorm(x)) 

t_size <- 40

d_label <- d %>% 
  select(df) %>% 
  unique() %>% 
  mutate(lb = qt(.025, df),
         ub = qt(0.975, df)) %>% 
  pivot_longer(c(lb, ub), values_to = "x", names_to = "bounds") %>% 
  mutate(label_x = signs::signs(x, accuracy = .01),
         y = 0,
         yend = dt(x, df))

p <- ggplot(d, aes(x, y)) + 
  geom_area(aes(y = Normal), alpha = 0.25, fill = myfills[1]) +
  geom_line() +
  geom_area(data = . %>% filter(x >= 1.96), 
            alpha = 0.25, 
            fill = myfills[1],
            aes(y = Normal)) +
  geom_area(data = . %>% filter(x <= -1.96), 
            alpha = 0.25, 
            fill = myfills[1],
            aes(y = Normal)) +
  geom_text(data = d_label, 
            aes(label = label_x), 
            family = bfont, 
            vjust = 1.25,
            size = ggtext_size(t_size)) + 
  geom_text(data = d_label %>% select(df) %>% unique,
            aes(x = 0, y = 0, label = paste0("df = ", df)), 
            vjust = 1.25, 
            family = bfont,
            size = ggtext_size(t_size)) + 
  geom_segment(data = d_label, aes(xend = x, yend = yend)) +
  transition_states(states = df, 
                    transition_length =  1, 
                    state_length = 2) +
  theme_void(base_size = t_size, base_family = bfont) +
  # labs(title = "df = {closest_state}") +
  annotate(x = qnorm(c(0.025, 0.975)), 
           y = 0, 
           label = signs::signs(qnorm(c(0.025, 0.975)), accuracy = .01), 
           geom = "text", 
           size = ggtext_size(t_size),
           color = myfills[1],
           vjust = 2.6, 
           family = bfont) + 
  coord_cartesian(xlim = c(-6,6), ylim = c(-.045, NA)) 

animate(p, 
        renderer = magick_renderer(), 
        device = "svglite", 
        fps = 2, 
        height = 6, 
        width = 10)
gganimate::anim_save("tdist_norm.gif")

8.6.5.1 The t distribution’s relationship to the normal distribution

Suppose we have two independent standard normal variates z_0 \sim \mathcal{N}(0, 1^2) and z_1 \sim \mathcal{N}(0, 1^2).

A t distribution with one degree of freedom is created like so:

T_1 = z_0\sqrt{\frac{1}{z_1^2}}

A t distribution with two degrees of freedom is created like so:

T_2 = z_0\sqrt{\frac{2}{z_1^2 + z_2^2}}

Where z_0, z_1 and z_2 are independent standard normal variates.

A t distribution with \nu degrees of freedom is created like so:

T_\nu = z_0\sqrt{\frac{\nu}{\sum_{i=1}^\nu z_i^2}}

The sum of \nu squared standard normal variates \left(\sum_{i=1}^\nu z_i^2\right) has a \chi^2 distribution with \nu degrees of freedom, which has a mean of \nu. Therefore, \sqrt{\frac{\nu}{\sum_{i=1}^\nu z_i^2}} equals one, on average. The variability of this expression approaches 0 as \nu increases, so when \nu is high, z_0 is being multiplied by a value very close to 1. Thus, T_\nu is nearly normal at high values of \nu.
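
This construction is easy to check by simulation:

# Simulate T with nu = 5 degrees of freedom
nu <- 5
z0 <- rnorm(100000)
z <- matrix(rnorm(100000 * nu), ncol = nu)
t_5 <- z0 * sqrt(nu / rowSums(z ^ 2))
# The variance should approximate nu / (nu - 2) = 5 / 3
c(var(t_5), nu / (nu - 2))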

8.7 Additional Distributions

8.7.1 F Distributions

Suppose that X is the ratio of two independent \chi^2 variates U_1 and U_2 scaled by their degrees of freedom \nu_1 and \nu_2, respectively:

X=\frac{\frac{U_1}{\nu_1}}{\frac{U_2}{\nu_2}}

The random variate X will have an F distribution with parameters, \nu_1 and \nu_2.
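
A short simulation sketch confirms this construction:

# Simulate an F distribution with nu_1 = 4 and nu_2 = 20
u1 <- rchisq(100000, df = 4)
u2 <- rchisq(100000, df = 20)
x_f <- (u1 / 4) / (u2 / 20)
# Compare the simulated density with the theoretical density
hist(x_f, breaks = 200, freq = FALSE, xlim = c(0, 6))
curve(df(x, df1 = 4, df2 = 20), add = TRUE)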

The primary application of the F distribution is the analysis of variance (ANOVA), which tests the equality of group means by comparing two estimates of variance. I am unaware of any direct applications of the F distribution in psychological assessment.

8.7.2 Weibull Distributions

How long do we have to wait before an event occurs? With Weibull distributions, we model wait times in which the probability of the event changes depending on how long we have waited. Some machines are designed to last a long time, but a defect in a part might cause an early failure. If the machine is going to fail, it is likely to fail early. If the machine works flawlessly in the early period, we worry about it less. Of course, all physical objects wear out eventually, but a good design and regular maintenance might allow a machine to operate for decades. The longer a machine has been working well, the lower the risk that it will irreparably fail on any particular day.

For other things, the risk of failure on any particular day becomes increasingly likely the longer the thing has been in use. Biological aging, for example, causes the risk of death to accelerate over time, so much so that historical records have no verified instances of anyone living much beyond 120 years. For still other events, the risk is constant over time, no matter how long we have waited. The Weibull distribution’s shape parameter allows it to model all three situations: risk that decreases over time, risk that remains constant, and risk that increases over time, as sketched below.
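
The hazard function (the instantaneous risk of the event, given that it has not yet occurred) is the density divided by the survival probability. A minimal sketch in R, with the scale parameter fixed at 1:

# Weibull hazard = density / survival probability
hazard <- function(x, shape) {
  dweibull(x, shape) / (1 - pweibull(x, shape))
}
x <- seq(0.1, 3, 0.01)
# shape > 1: risk increases over time
plot(x, hazard(x, shape = 1.5), type = "l",
     ylim = c(0, 3), xlab = "Time", ylab = "Hazard")
# shape = 1: risk is constant over time
lines(x, hazard(x, shape = 1), lty = 2)
# shape < 1: risk decreases over time
lines(x, hazard(x, shape = 0.5), lty = 3)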

8.7.3 Unfinished

  • Gumbel Distributions
  • Beta Distributions
  • Exponential Distributions
  • Pareto Distributions