8  Distributions

8.1 Random Variables

Because we first learn about variables in an algebra class, we tend to think of variables as having values that can be solved for—if we have enough information about them. If I say that x is a variable and that x+6=8, we can use algebra to find that x must equal 2.

Random variables are not like algebraic variables. Random variables simply take on values because of some random process. If we say that the outcome of a throw of a six-sided die is a random variable, there is nothing to “solve for.” There is no equation that determines the value of the die. Instead, it is determined by chance and the physical constraints of the die. That is, the outcome must be one of the numbers printed on the die, and the six numbers are equally likely to occur. This illustrates an important point. The word random here does not mean “anything can happen.” On a six-sided die, you will never roll a 7, 3.5, \sqrt{\pi}, −36,000, or any other number that does not appear on the six sides of the die. Random variables have outcomes that are subject to random processes, but those random processes do have constraints on them such that some outcomes are more likely than others—and some outcomes never occur at all.

Random variables have values that are determined by a random process.
Figure 8.1: Rolling a six-sided die is a process that creates a randomly ordered series of integers from 1 to 6.

When we say that the throw of a six-sided die is a random variable, we are not talking about any particular throw of a particular die but, in a sense, every throw (that has ever happened or ever could happen) of every die (that has ever existed or could exist). Imagine an immense, roaring, never-ending, cascading flow of dice falling from the sky. As each die lands and disappears, a giant scoreboard nearby records the relative frequencies of ones, twos, threes, fours, fives, and sixes. That’s a random variable.

# Packages used in this chapter's code
library(tidyverse) # tibble, dplyr, purrr, ggplot2
library(ggforce)   # geom_arc
library(gganimate) # transition_manual, animate, magick_renderer

# Function to make dice
makedice <- function(i, id) {
  x = switch(
           i,
           `1` = 0,
           `2` = c(-1, 1),
           `3` = c(-1, 1, 0),
           `4` = c(-1, 1, -1, 1),
           `5` = c(-1, 1, -1, 1, 0),
           `6` = c(-1, 1, -1, 1, -1, 1)
         )
  y = switch(
           i,
           `1` = 0,
           `2` = c(1,-1),
           `3` = c(1,-1, 0),
           `4` = c(1,-1,-1, 1),
           `5` = c(1,-1,-1, 1, 0),
           `6` = c(1,-1,-1, 1, 0, 0))
  
  tibble(id = id * 1,
         i = i,
         x = x,
         y = y) %>% 
    add_case(id = id + 0.5,
             i = 0,
             x = NA,
             y = NA)
}

# Radius of the die's rounded corners
r <- 0.35
# Distance from the die's center to each corner arc center
dpos <- 1.5

# Die round arcs
dround <- tibble(x0 = c(-dpos, dpos, -dpos, dpos), 
                 y0 = c(dpos,dpos,-dpos,-dpos), 
                 r = r, 
                 start = c(-pi / 2, pi / 2, -pi, pi / 2) , 
                 end = c(0,0, -pi / 2,pi) )

# Line segments
dsegments <- tibble(x = c(-dpos, dpos + r, dpos, -dpos - r), 
                    y = c(dpos + r,dpos,-dpos - r,-dpos),
                    xend = c(dpos, dpos + r, -dpos,-dpos - r), 
                    yend = c(dpos + r,-dpos,-dpos - r, dpos))
# Number of throws
k <- 50

# Dot positions
d <- map2_df(sample(1:6, k, replace = TRUE), 1:k, makedice)  

# Plot
p <- ggplot(d) + 
  geom_point(pch = 16, size = 50, aes(x,y))  + 
  geom_arc(aes(x0 = x0,
               y0 = y0,
               r = r,
               start = start,
               end = end,
               linetype = factor(r)),
           data = dround,
           linewidth = 2) + 
  geom_segment(data = dsegments,
               aes(x = x, y = y, xend = xend, yend = yend),
               linewidth = 2) +
  coord_equal() + 
  theme_void() + 
  theme(legend.position = "none") + 
  transition_manual(id) 

# Render animation
animate(p, 
        fps = 1,
        device = "svg",
        renderer = magick_renderer(), 
        width = 8,
        height = 8)

8.2 Sets

A set refers to a collection of objects. Each distinct object in a set is an element.

A set is a collection of distinct objects. An element is a distinct member of a set.

8.2.1 Discrete Sets

To show that a list of discrete elements is a discrete set, we can use curly braces. For example, the set of positive single-digit even numbers is \{2, 4, 6, 8\}. For large sets with repeating patterns, it is convenient to use an ellipsis (“…”), the punctuation mark signifying an omission or pause. For example, rather than listing every two-digit positive even number, we can show the pattern like so:

A discrete set has numbers that are isolated, meaning that around each number there is a range in which it is the only member of the set.

\{10, 12, 14,\ldots, 98\}
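In R, for example, the seq function can generate this kind of evenly spaced set; a quick sketch:

# The set of two-digit positive even numbers: {10, 12, 14, ..., 98}
seq(from = 10, to = 98, by = 2)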

If we want the pattern to repeat forever, we can place an ellipsis on the left, the right, or both sides. The set of odd integers extends to infinity in both directions:

\{\ldots, -5, -3, -1, 1, 3, 5, \ldots\}

8.2.2 Interval Sets

With continuous variables, we can define sets in terms of intervals. Whereas the discrete set \{0,1\} refers just to the numbers 0 and 1, the interval set (0,1) refers to all the numbers between 0 and 1.

An interval is a continuous range of numbers.
Figure 8.2: Interval Notation

As shown in Figure 8.2, some intervals include their endpoints and others do not. Intervals noted with square brackets include their endpoints and intervals written with parentheses exclude them. Some intervals extend to positive or negative infinity: (-\infty,5] and (-8,+\infty). Use a parenthesis with infinity instead of a square bracket because infinity is not a specific number that can be included in an interval.
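R has no dedicated interval type, but we can test whether a value lies in an interval with comparison operators. A minimal sketch:

# Is x in the half-open interval [1, 5)?
x <- 5
x >= 1 & x < 5  # FALSE: the parenthesis excludes 5
# Is x in the closed interval [1, 5]?
x >= 1 & x <= 5 # TRUE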

# Interval notation
tibble(lb = 1L,
       ub = 5L,
       y = 1:4,
       meaning = c("includes 1 and 5",
                   "excludes 1 and 5",
                   "includes 1 but not 5",
                   "includes 5 but not 1"),
       l_bracket = c("[", "(", "[", "("),
       u_bracket = c("]", ")", ")", "]")) %>% 
  mutate(Interval = paste0(l_bracket, lb, ",", ub, u_bracket) %>% 
           fct_inorder() %>% 
           fct_rev,
         l_fill = ifelse(l_bracket == "[", myfills[1], "white"),
         u_fill = ifelse(u_bracket == "]", myfills[1], "white")) %>% 
  ggplot(aes(lb, Interval)) + 
  geom_segment(aes(xend = ub, yend = Interval), 
               linewidth = 2, 
               color = myfills[1]) +
  geom_point(aes(fill = l_fill), 
             size = 5, 
             pch = 21, 
             stroke = 2, color = myfills[1]) + 
  geom_point(aes(fill = u_fill, x = ub), 
             size = 5, 
             pch = 21, 
             stroke = 2, 
             color = myfills[1]) +
  geom_label(aes(label = paste(Interval, meaning), x = 3), 
             vjust = -.75, 
             label.padding = unit(0,"lines"), 
             label.size = 0, 
             family = bfont, 
             size = ggtext_size(27)) +
  scale_fill_identity() +
  scale_y_discrete(NULL, expand = expansion(c(0.08, 0.25))) +
  scale_x_continuous(NULL, minor_breaks = NULL) +
  theme_minimal(base_size = 27, 
                base_family = bfont) + 
  theme(axis.text.y = element_blank(), 
        panel.grid.major = element_blank(),
        axis.line.x = element_line(linewidth = .5),
        axis.ticks.x = element_line(linewidth = .25), 
        plot.margin = margin())

8.3 Sample Spaces

The set of all possible outcomes of a random variable is the sample space. Continuing with our example, the sample space of a single throw of a six-sided die is the set \{1,2,3,4,5,6\}. Sample space is a curious term. Why sample and why space? With random variables, populations are infinitely large, at least theoretically. Random variables just keep spitting out numbers forever! So any time we actually observe numbers generated by a random variable, we are always observing a sample; actual infinities cannot be observed in their entirety. A space is a set that has mathematical structure. Most random variables generate either integers or real numbers, both of which are structured in many ways (e.g., order).

A sample space is the set of all possible values that a random variable can assume. A population consists of all entities under consideration. A sample is a subset of a population.

Unlike distributions having to do with dice, many distributions have a sample space with an infinite number of elements. Interestingly, there are two kinds of infinity we can consider. A distribution’s sample space might be the set of whole numbers: \{0,1,2,\ldots\}, which extends to positive infinity. The sample space of all integers extends to infinity in both directions: \{\ldots,-2,-1,0,1,2,\ldots\}.

The sample space of continuous variables is infinitely large for another reason. Between any two points in a continuous distribution, there is an infinite number of other points. For example, in the beta distribution, the sample space consists of all real numbers between 0 and 1: (0,1). Many continuous distributions have sample spaces that involve both kinds of infinity. For example, the sample space of the normal distribution consists of all real numbers from negative infinity to positive infinity: (-\infty, +\infty).

8.4 Probability Distributions

Each element of a random variable’s sample space occurs with a particular probability. When we list the probabilities of each possible outcome, we have specified the variable’s probability distribution. In other words, if we know the probability distribution of a variable, we know how probable each outcome is. In the case of a throw of a single die, each outcome is equally likely (Figure 8.3).

In a probability distribution, each possible element in a variable’s sample space is assigned a probability.
Figure 8.3: The probability distribution of a throw of a single die
% Dice PMF

\documentclass[tikz = true, border = 2pt]{standalone}
\usepackage{tikz}
\usepackage{xfrac}
\usetikzlibrary{shapes,calc}
\usepackage{fontspec}
\setmainfont{Equity Text A}
\tikzset{
    dot hidden/.style={},
    line hidden/.style={},
    dot colour/.style={dot hidden/.append style={color=#1}},
    dot colour/.default=black,
    line colour/.style={line hidden/.append style={color=#1}},
    line colour/.default=black
}

\usepackage{xparse}
\NewDocumentCommand{\drawdie}{O{}m}{
    \begin{tikzpicture}[x=1em,y=1em,radius=0.1,#1]
    \draw[rounded corners=2,line hidden] (0,0) rectangle (1,1);
    \ifodd#2
    \fill[dot hidden] (0.5,0.5) circle;
    \fi
    \ifnum#2>1
    \fill[dot hidden] (0.25,0.25) circle;
    \fill[dot hidden] (0.75,0.75) circle;
    \ifnum#2>3
    \fill[dot hidden] (0.25,0.75) circle;
    \fill[dot hidden] (0.75,0.25) circle;
    \ifnum#2>5
    \fill[dot hidden] (0.75,0.5) circle;
    \fill[dot hidden] (0.25,0.5) circle;
    \ifnum#2>7
    \fill[dot hidden] (0.5,0.75) circle;
    \fill[dot hidden] (0.5,0.25) circle;
    \fi
    \fi
    \fi
    \fi
    \end{tikzpicture}
}

\begin{document}
    \begin{tikzpicture}
    \foreach \n in {1,...,6} {
        \node at ($(0,7)-(0,\n)$) {\drawdie [scale = 2]{\n}};
        \node [fill=gray!50,
               minimum height = 2cm,
               minimum width = 0.1cm,
               single arrow,
               single arrow head extend =.15cm,
               single arrow head indent =.08cm,
               inner sep=1mm] at ($(1.55,7)-(0,\n)$) {};
        \node  (p1) at (3,\n) {\large{$\sfrac{\text{1}}{\text{6}}$}};
    }
    \node [text centered,
           anchor=south,
           text height = 1.5ex,
           text depth = .25ex] (p3) at (0,6.6) {\large{{Sample Space}}};
    \node [text centered,
           anchor = south,
           text height = 1.5ex,
           text depth = .25ex] (p4) at (3,6.6) {\large{{Probability}}};
    \end{tikzpicture}
\end{document}

There is an infinite variety of probability distributions, but a small subset of them have been given names. Now, one can manage one’s affairs quite well without ever knowing what a Bernoulli distribution is, or what a \chi^2 distribution is, or even what a normal distribution is. However, sometimes life is a little easier if we have names for useful things that occur often. Most of the distributions with names are not really single distributions, but families of distributions. The various members of a family are unique, but they are united by the fact that their probability distributions are generated by a particular mathematical function (more on that later). In such cases, the probability distribution is often represented by a graph in which the sample space is on the X-axis and the associated probabilities are on the Y-axis. Figure 8.4 illustrates 16 probability distributions that might be interesting and useful to clinicians. Keep in mind that what are pictured are only particular members of the families listed; some family members look quite different from what is shown in Figure 8.4.

Figure 8.4: A gallery of useful distributions
# A gallery of useful distributions
# Run output file pdfIllustration.tex in LaTeX
# pdflatex --enable-write18 --extra-mem-bot=10000000 --synctex=1 pdfIllustration.tex
tikzpackages <- paste(
  "\\usepackage{tikz}",
  "\\usepackage{amsmath}",
  "\\usepackage[active,tightpage,psfixbb]{preview}",
  "\\PreviewEnvironment{pgfpicture}",
  collapse = "\n"
)

tikzDevice::tikz('pdfIllustration.tex',
                 standAlone = TRUE, 
                 packages = tikzpackages, 
                 width = 11, 
                 height = 11)
par(
  mar = c(1.75, 1.3, 1.75, 0),
  mfrow = c(4, 4),
  las = 1,
  xpd = TRUE,
  family = 'serif',
  pty = "s",
  mgp = c(2, 0.5, 0),
  tcl = -0.3,
  cex = 1.35
)

# Bernoulli
plot(
  c(0.2, 0.8) ~ seq(0, 1),
  type = "b",
  ylim = c(0, 1),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Bernoulli",
  lty = 3,
  pch = 19,
  xaxp = c(0, 1, 1),
  xlim = c(-0.1, 1)
)
text(x = .7, y = .8, "$p=0.8$")

# Binomial
plot(
  dbinom(seq(0, 5), 5, 0.2) ~ seq(0, 5),
  type = "b",
  xlim = c(-0.1, 5),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Binomial",
  lty = 3,
  pch = 19
)
text(
  x = c(3.5, 3.5),
  y = c(.35, .25),
  c("$p=0.2$", "$n=5$"),
  adj = 0
)

# Poisson
plot(
  dpois(0:10, 3) ~ seq(0, 10),
  type = "b",
  xlim = c(-0.1, 10),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Poisson",
  lty = 3,
  pch = 19
)
text(x = 7, y = .15, r"($\lambda=3$)")

# Geometric
plot(
  dgeom(0:4, prob = 0.8) ~ seq(0, 4),
  type = "b",
  ylim = c(0, .8),
  xlim = c(-0.1, 4),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Geometric",
  lty = 3,
  pch = 19
)
text(x = 2, y = .6, "$p=0.8$")

# Discrete Uniform
plot(
  rep(1 / 4, 4) ~ seq(1, 4),
  type = "b",
  ylim = c(0, 1),
  xlim = c(0, 5),
  bty = "n",
  col = myfills[1],
  lwd = 1,
  xlab = "",
  ylab = "",
  main = "Discrete Uniform",
  lty = 3,
  pch = 19
)
text(
  x = c(1, 4),
  y = c(.5, .5),
  c("$a=1$", "$b=4$"),
  adj = 0.5
)

# Continuous Uniform
plot(
  c(0, 1 / 3, 1 / 3, 0) ~ c(1, 1, 4, 4),
  type = "n",
  ylim = c(0, 1),
  xlim = c(0, 5),
  bty = "n",
  col = myfills[1],
  lwd = 2,
  xlab = "",
  ylab = "",
  main = "Continuous Uniform"
)
polygon(
  c(1, 1, 4, 4),
  c(0, 1 / 3, 1 / 3, 0),
  col = myfills[1],
  xpd = FALSE,
  border = NA
)
text(
  x = c(1, 4),
  y = c(.5, .5),
  c("$a=1$", "$b=4$"),
  adj = 0.5
)

# Normal
x <- seq(-4, 4, 0.02)
plot(
  dnorm(x) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Normal",
  lwd = 2,
  bty = "n",
  axes = F
)
polygon(
  c(min(x), x, max(x)),
  c(0, dnorm(x), 0),
  col = myfills[1],
  lwd = 1,
  xpd = FALSE,
  border = NA
)
axis(2)

text(
  x = c(-1.5, -1.5),
  y = c(.35, .25),
  c("$\\mu=0$", "$\\sigma^2=1$"),
  adj = 1
)


center_neg <- function(x) {
  signs <- sign(x)
  paste0(ifelse(signs < 0,"$",""), x, ifelse(signs < 0,"\\phantom{-}$",""))
}

all_tick_labels <- function(side = 1, at, labels = at) {
  axis(side, labels = rep("",length(at)), at = at)
for (i in 1:length(at)) {
  axis(side, 
       at = at[i], 
       labels = labels[i],
       tick = F)
  }
}
axis_ticks <- seq(-4,4,2)
axis_labs <- center_neg(axis_ticks)
all_tick_labels(1, at = axis_ticks, labels = axis_labs)


# Student t
x <- seq(-6, 6, 0.02)
plot(
  dt(x, 2) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Student's $\\boldsymbol{t}$",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.4),
  axes = F
)
polygon(
  c(min(x), x, max(x)),
  c(0, dt(x, 2), 0),
  col = myfills[1],
  lwd = 1,
  xpd = FALSE,
  border = NA
)
text(x = 3, y = .3, "$\\nu=2$")
axis(2)
axis_ticks <- seq(-6,6,2)
axis_labs <- center_neg(axis_ticks)
all_tick_labels(1, at = axis_ticks, labels = axis_labs)

# Chi-Square
x <- seq(0, 40, 0.05)
plot(
  dchisq(x, 13) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "$\\boldsymbol{\\chi^2}$",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.1)
)
polygon(
  c(min(x), x, max(x)),
  c(0, dchisq(x, 13), 0),
  col = myfills[1],
  lwd = 1,
  xpd = FALSE,
  border = NA
)
text(x = 20, y = .08, "$k=13$", adj = 0)

# F
x <- seq(0, 6, 0.01)
plot(
  df(x, 3, 120) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "$\\boldsymbol{F}$",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.8)
)
polygon(
  c(min(x), x, max(x)),
  c(0, df(x, 3, 120), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(2, 2),
  y = c(.6, .4),
  c("$d_1=3$", "$d_2=120$"),
  adj = 0
)

# Weibull
x <- seq(6.5, 11.5, 0.01)
plot(
  dweibull(x, 20, 10) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Weibull",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.8)
)
polygon(
  c(min(x), x, max(x)),
  c(0, dweibull(x, 20, 10), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(7, 7),
  y = c(.6, .4),
  c("$k=20$", "$\\lambda=10$"),
  adj = 0
)

# Beta
x <- seq(0, 1, 0.01)
plot(
  dbeta(x, 2, 2.5) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Beta",
  lwd = 2,
  bty = "n",
  ylim = c(0, 2),
  xaxp = c(0, 1, 1)
)
polygon(
  c(min(x), x, max(x)),
  c(0, dbeta(x, 2, 2.5), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(0.75, 0.75),
  y = c(1.75, 1.25),
  c("$\\alpha=2$", "$\\beta=2.5$"),
  adj = 0
)

# Log Normal
x <- c(seq(0, .999, 0.001), seq(1, 15, .05))
plot(
  dlnorm(x, 1, 0.5) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Log Normal",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.4)
)
polygon(
  c(min(x), x, max(x)),
  c(0, dlnorm(x, 1, 0.5), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(7, 7),
  y = c(.3, .2),
  c("$\\mu=2$", "$\\sigma=0.5$"),
  adj = 0
)


# Skew Normal
x <- seq(-4, 4, 0.01)
plot(
  sn::dsn(x, xi = 1, omega = 1.5, alpha = -4) ~ x,
  type = "n",
  col = myfills[1],
  xlab = "",
  ylab = "",
  main = "Skew Normal",
  lwd = 2,
  bty = "n",
  ylim = c(0, 0.5),
  axes = F
)
polygon(
  c(min(x), x, max(x)),
  c(0, sn::dsn(
    x,
    xi = 1,
    omega = 1.5,
    alpha = -4
  ), 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(-3.9, -3.9, -3.9),
  y = c(.45, .35, .25),
  c("$\\xi=1$", "$\\omega=1.5$", "$\\alpha=-4$"),
  adj = 0
)

axis(2)
axis_ticks <- seq(-4,4,2)
axis_labs <- center_neg(axis_ticks)
all_tick_labels(1, at = axis_ticks, labels =axis_labs)

# Normal Mixture
x <- seq(-4, 4, 0.01)
y <- (dnorm(x, -2, 0.5) * .25 + dnorm(x))
plot(
  y ~ x,
  type = "n",
  col = "violet",
  xlab = "",
  ylab = "",
  main = "Normal Mixture",
  lwd = 4,
  bty = "n",
  ylim = c(0, 0.5),
  axes = F
)
polygon(
  c(min(x), x, max(x)),
  c(0, y, 0),
  col = myfills[1],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
polygon(
  c(min(x), x, max(x)),
  c(0, dnorm(x, -2, 0.5) * .25, 0),
  col = myfills[2],
  lwd = 1,
  xpd = TRUE,
  border = NA
)
text(
  x = c(-0.85, -0.85),
  y = c(.45, .35),
  c("$\\mu_1=-2$", "$\\sigma_1^2=0.5$"),
  adj = 1
)
text(
  x = c(3.5, 3.5),
  y = c(.45, .35),
  c("$\\mu_2=0$", "$\\sigma_2^2=1$"),
  adj = 1
)
axis(2)
axis_ticks <- seq(-4,4,2)
axis_labs <- center_neg(axis_ticks)
all_tick_labels(1, at = axis_ticks, labels =axis_labs)
# Bivariate Normal

x = seq(-4, 4, 0.1)
X = fMultivar::grid2d(x)
z = fMultivar::dnorm2d(X$x, X$y, rho = 0.6)
Z = list(x = x,
         y = x,
         z = matrix(z, ncol = length(x)))
persp(
  Z,
  theta = 20,
  phi = 25,
  col = "royalblue1",
  xlab = "\nX",
  ylab = "\nY",
  zlab = "",
  zlim = c(0, .20),
  border = NA,
  expand = .7,
  box = FALSE,
  ticktype = "simple",
  ltheta = 0,
  shade = 0.5,
  main = "Bivariate Normal",
  lwd = 0.5
)
text(c(-.11, .32), c(-.42, -.25), c("X", "Y"))
text(0, 0.25, "$\\rho_{XY}=0.6$")
dev.off()

8.5 Discrete Distributions

The sample spaces in discrete distributions are discrete sets. Thus, on the x-axis of a discrete probability distribution, you will see isolated numbers with gaps between them (e.g., integers).

8.5.1 Discrete Uniform Distributions

The throw of a single die is a member of a family of distributions called the discrete uniform distribution. It is “discrete” because the elements in the sample space are countable, with evenly spaced gaps between them. For example, there might be a sequence of 8, 9, 10, and 11 in the sample space, but there are no numbers in between. It is “uniform” because all outcomes are equally likely. With dice, the numbers range from a lower bound of 1 to an upper bound of 6. In the family of discrete uniform distributions, the lower and upper bounds are typically integers, most likely starting with 1. However, any real number a can be the lower bound and the spacing k between numbers can be any positive real number. For the sake of simplicity and convenience, I will assume that the discrete uniform distribution refers to consecutive integers ranging from a lower bound of a to an upper bound of b.

A discrete uniform distribution is a family of random variable distributions in which the sample space is an evenly spaced sequence of numbers, each of which is equally likely to occur.

This kind of discrete uniform distribution has a number of characteristics listed in Table 8.1. I will explain each of them in the sections that follow. As we go, I will also explain the mathematical notation. For example, a \in \{\ldots,-1,0,1,\ldots\} means that a is an integer because \in means is a member of and \{\ldots,-1,0,1,\ldots\} is the set of all integers.1 x \in \{a,a+1,\ldots,b\} means that each member of the sample space x is a member of the set of integers that includes a, b, and all the integers between a and b. The notation for the probability mass function and the cumulative distribution function will be explained later in this chapter.

1 Notation note: Sometimes the set of all integers is referred to with the symbol \mathbb{Z}.

Feature Symbol
Lower Bound a \in \{\ldots,-1,0,1,\ldots\}
Upper Bound b \in \{a + 1, a + 2, \ldots\}
Sample Space x \in\{a, a + 1,\ldots,b\}
Number of points n=b-a+1
Mean \mu=\frac{a+b}{2}
Variance \sigma^2=\frac{n^2-1}{12}
Skewness \gamma_1=0
Kurtosis \gamma_2=-\frac{6(n^2+1)}{5(n^2-1)}
Probability Mass Function f_X(x;a,b)=\frac{1}{n}
Cumulative Distribution Function F_X(x;a,b)=\frac{x-a+1}{n}
Table 8.1: Features of Discrete Uniform Distributions
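To make these formulas concrete, here is a small R sketch that computes the features in Table 8.1 for a six-sided die (a = 1, b = 6) and checks the mean and variance against a large simulated sample:

# Features of the discrete uniform distribution for a six-sided die
a <- 1
b <- 6
n <- b - a + 1           # number of points: 6
mu <- (a + b) / 2        # mean: 3.5
sigma2 <- (n^2 - 1) / 12 # variance: 35/12 = 2.92
# Check against a large random sample
X <- sample(seq(a, b), size = 1000000, replace = TRUE)
mean(X) # approximately 3.5
var(X)  # approximately 2.92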

8.5.2 Parameters of Random Variables

The lower bound a and the upper bound b are the discrete uniform distribution’s parameters. The word parameter has many meanings, but here it refers to a characteristic of a distribution family that helps us identify precisely which member of the family we are talking about. Most distribution families have one, two, or three parameters.

A parameter is a defining feature of a random variable’s probability distribution.

If you have taken an algebra class, you have seen parameters before, though the word parameter may not have been used. Think about the formula of a line:

y=mx+b

Both x and y are variables, but what are m and b? Well, you probably remember that m is the slope of the line and that b is the y-intercept. If we know the slope and the intercept of a line, we know exactly which line we are talking about. No additional information is needed to graph the line. Therefore, m and b are the line’s parameters, because they uniquely identify the line.2 All lines have a lot in common but there is an infinite variety of lines because the parameters, the slope and the intercept, can take on the value of any real number. Each unique combination of parameter values (slope and intercept) will produce a unique line. So it is with probability distribution families. All family members are alike in many ways but they also differ because of different parameter values.

2 What about other mathematical functions? Do they have parameters? Yes! Most do! For example, in the equation for a parabola (y=ax^2+bx+c), a, b, and c determine its precise shape.

3 If we allow the lower bound to be any real number and the spacing to be any positive real number, the discrete uniform distribution can be specified by three parameters: the lower bound a, the spacing between numbers k (k>0), and the number of points n (n>1). The upper bound b of such a distribution would be b=a+k(n-1).

The discrete uniform distribution (i.e., the typical variety consisting of consecutive integers) is defined by the lower and upper bound. Once we know the lower bound and the upper bound, we know exactly which distribution we are talking about.3 Not all distributions are defined by their lower and upper bounds. Indeed, many distribution families are unbounded on one or both sides. Therefore, other features are used to characterize the distributions, such as the population mean (\mu).

8.5.3 Probability Mass Functions

Many distribution families are united by the fact that their probability distributions are generated by a particular mathematical function. For discrete distributions, those functions are called probability mass functions. In general, a mathematical function is an expression that takes one or more constants (i.e., parameters) and one or more input variables, which are then transformed according to some sort of rule to yield a single number.

A probability mass function is a mathematical expression that gives the probability that a discrete random variable will equal a particular element of the variable’s sample space.

A probability mass function transforms a random variable’s sample space elements into probabilities. In Figure 8.3, the probability mass function can be thought of as the arrows between the sample space and the probabilities. That is, the probability mass function is the thing that was done to the sample space elements to calculate the probabilities. In Figure 8.3, each outcome of a throw of the die was mapped onto a probability of ⅙. Why ⅙, and not some other number? The probability mass function of the discrete uniform distribution tells us the answer.

Figure 8.5: Probability mass functions tell us how probable each sample space element is. That is, they are functions that convert sample spaces (\boldsymbol{x}) into probabilities (\boldsymbol{p}) according to specific parameters (\boldsymbol{\theta}).
% Probability mass functions tell us how probable each sample space element is

\documentclass[tikz = true, border = 2pt]{standalone}

\usepackage{amsmath}
\usepackage{tikz}
\usetikzlibrary{decorations.text, arrows}
\usetikzlibrary{shapes}
\usepackage{fontspec}
\setmainfont{Equity Text A}

\begin{document}
\begin{tikzpicture}[>=stealth,scale=0.9]
\definecolor{royalblue2}{RGB}{39,64,139};
\node [rectangle,
       draw,
       rounded corners= 2pt,
       text depth=0.25ex,
       minimum height = 7mm](ss) at (0,0) {$\boldsymbol{x}=x_1,x_2,x_3,\ldots,x_n$};
\node [rectangle,
       draw,
       rounded corners= 2pt,
       text depth=0.25ex,
       minimum height = 7mm](ps) at (7,0) {$\boldsymbol{p}=p_1,p_2,p_3,\ldots,p_n$};
\node [single arrow,
       fill=royalblue2,
       single arrow head extend=1.1ex,
       transform shape,
       minimum height=0.9cm,
       text depth=0.25ex, text=white] (fx) at (3.5,0) {$\quad\;\; f_X\quad\;\;$};
\node [text depth=2.25ex,
       text height= 5ex,
       anchor=south,
       yshift=-3.5ex](sst) at (0,0.75) {\textbf{Sample Space}};
\node [text depth=2.25ex,
       text height= 5ex,
       anchor=south,
       yshift=-3.5ex](pst) at (7,0.75) {\textbf{Probabilities}};
\node [shape=rectangle,
       text depth=2.25ex,
       color=royalblue2,
       align=center,
       text height= 5ex,
       anchor=south,
       yshift=-3.5ex](pmf) at (3.5,0.75) {\textbf{Probability}\\
    \textbf{Mass Function}};
\node [text depth=2.25ex,
       text height= 5ex,
       anchor=south,
       yshift=-3.5ex] (pt) at (3.5,-1.5) {\textbf{Parameters}};
\node [rectangle,
       draw,
       rounded corners = 2pt,
       text depth=0.25ex,
       minimum height = 7mm] (pts) at (3.5,-2.25) {$\boldsymbol{\theta}=\theta_1,\theta_2,\theta_3,\ldots,\theta_k$};
%\node [align=center,
%       shape=rectangle,
%       rounded corners = 3pt,
%       draw](formula) at (3.5,4) {\textbf{\Large{Scary~Math!}}\\
%   $f_X\!\left(\boldsymbol{x};\boldsymbol{\theta}\right)=\boldsymbol{p}$};
%\draw [rounded corners=5pt] (-2.5,-3) rectangle (9.5,2.65);
\node at (3.5,2) {$f_X\!\left(\boldsymbol{x};\boldsymbol{\theta}\right)=\boldsymbol{p}$};
\draw[->,>=latex', very thick] (3.5,-1.15) to (3.5,-0.5);
\end{tikzpicture}
\end{document}

The probability mass function of the discrete uniform distribution is fairly simple but the notation can be intimidating at first (Figure 8.5). By convention, a single random variable is denoted by a capital letter X. Any particular value of X in its sample space is represented by a lowercase x. In other words, X represents the variable in its totality whereas x is merely one value that X can take on. Confusing? Yes, statisticians work very hard to confuse us—and most of the time they succeed!

The probability mass function of random variable X is denoted by f_X(x). This looks strange at first. It means, “When random variable X generates a number, what is the probability that the outcome will be a particular value x?” That is, f_X(x)=P(X=x), where P means “What is the probability that…?” Thus, P(X=x) reads, “What is the probability that random variable X will generate a number equal to a particular value x?” So, f_X(7) reads, “When random variable X generates a number, what is the probability that the outcome will be 7?”

Most probability mass functions also have parameters, which are listed after a semi-colon. In the case of the discrete uniform distribution consisting of consecutive integers, the lower and upper bounds a and b are included in the function’s notation like so: f_X(x;a,b). This reads, “For random variable X with parameters a and b, what is the probability that the outcome will be x?” Some parameters can be derived from other parameters, as was the case with the number of points n in the sample space of a discrete uniform distribution: n=b-a+1. The probability for each outcome in the sample space is the same and there are n possible outcomes. Therefore, the probability associated with each outcome is \frac{1}{n}.

Putting all of this together, if a and b are integers and a<b, for all n integers x between a and b, inclusive:

\begin{aligned} f_X\left(x;a,b\right)&=\frac{1}{b-a+1}\\[2ex] &=\frac{1}{n} \end{aligned}

Symbol Meaning
X A random variable with a discrete uniform distribution
f_X The probability mass function of X
x Any particular member of the sample space of X
a The lower bound of the sample space
b The upper bound of the sample space
n b-a+1 (The number of points in the sample space)

You might notice that x is not needed to calculate the probability. Why? Because this is a uniform distribution. No matter which sample space element x we are talking about, the probability associated with it is always the same. In distributions that are not uniform, the position of x matters and thus influences the probability of its occurrence.
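Translated into R, the probability mass function of the discrete uniform distribution is a one-liner. Base R has no built-in function for this distribution, so the name duniform below is invented for illustration:

# Probability mass function of the discrete uniform distribution
# (consecutive integers from a to b); duniform is a hypothetical name
duniform <- function(x, a, b) {
  n <- b - a + 1
  # 1/n for integers inside [a, b], 0 otherwise
  ifelse(x >= a & x <= b & x == round(x), 1 / n, 0)
}
duniform(4, a = 1, b = 6) # 1/6 = .167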

8.5.4 Cumulative Distribution Functions

The cumulative distribution function tells us where a sample space element ranks in a distribution. Whereas the probability mass function tells us the probability that a random variable will generate a particular number, the cumulative distribution function tells us the probability that a random variable will generate a particular number or less.

A cumulative distribution function is a mathematical expression that gives the probability that a random variable will equal a particular element of the variable’s sample space or less.

F_X(x) = P(X \le x)=p

Figure 8.6: The cumulative distribution function of the roll of a die is F_X(x)=\frac{x}{6}

The cumulative distribution function of the roll of a die (Figure 8.6) tells us that the probability of rolling a 4 or less is 4⁄6 (i.e., ⅔).

% CDF Dice

\documentclass[tikz = true, border = 2pt]{standalone}
\usepackage{tikz}
\usepackage{xfrac}
\usepackage{fontspec}
\setmainfont{Equity Text A}

\definecolor{firebrick}{RGB}{205,38,38}
\definecolor{royalblue}{RGB}{67,110,238}

\tikzset{
    dot hidden/.style={},
    line hidden/.style={},
    dot colour/.style={dot hidden/.append style={color=#1}},
    dot colour/.default=black,
    line colour/.style={line hidden/.append style={color=#1}},
    line colour/.default=black
}

\NewDocumentCommand{\drawdie}{O{}m}{
    \begin{tikzpicture}[x=1em,y=1em,radius=0.1,#1]
    \draw[rounded corners=2,line hidden] (0,0) rectangle (1,1);
    \ifodd#2
    \fill[dot hidden] (0.5,0.5) circle;
    \fi
    \ifnum#2>1
    \fill[dot hidden] (0.25,0.25) circle;
    \fill[dot hidden] (0.75,0.75) circle;
    \ifnum#2>3
    \fill[dot hidden] (0.25,0.75) circle;
    \fill[dot hidden] (0.75,0.25) circle;
    \ifnum#2>5
    \fill[dot hidden] (0.75,0.5) circle;
    \fill[dot hidden] (0.25,0.5) circle;
    \ifnum#2>7
    \fill[dot hidden] (0.5,0.75) circle;
    \fill[dot hidden] (0.5,0.25) circle;
    \fi
    \fi
    \fi
    \fi
    \end{tikzpicture}
}

\begin{document}

\begin{tikzpicture}
\foreach \i in {1,...,6} {
    \node at (0.2,\i){$\sfrac{\text{\i}}{\text{6}}$};
    \node at (\i,0.2){\large{\i}};
    \foreach \j in {1,...,6} {
        \ifnum \j>\i
        \node at (\i,\j) {\drawdie [scale=2]{\j}};
        \else
        \node at (\i,\j) {\drawdie [scale=2,line colour=royalblue,dot colour=royalblue]{\j}};
        \fi
    }
}

\draw [firebrick,
       very thick,
       rounded corners]
       (0.55,0.5)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,1)-- ++
       (1,0)-- ++
       (0,-6)--cycle;
\node[rotate=90] at (-0.3,3.5) {{Probability}};
\node at (3.5,-0.2) {{Die roll is this value or less}};
\end{tikzpicture}

\end{document}

The cumulative distribution function is often distinguished from the probability mass function with a capital F instead of a lowercase f. In the case of a discrete uniform distribution consisting of n consecutive integers from a to b, the cumulative distribution function is:

\begin{aligned} F_X(x;a,b)&=\frac{x-a+1}{b-a+1}\\[2ex] &=\frac{x-a+1}{n} \end{aligned}

Symbol Meaning
X A random variable with a discrete uniform distribution
F_X The cumulative distribution function of X
x Any particular member of the sample space of X
a The lower bound of the sample space
b The upper bound of the sample space
n b-a+1 (The number of points in the sample space)

In the case of the six-sided die, the cumulative distribution function is

\begin{aligned} F_X(x;a=1,b=6)&=\frac{x-a+1}{b-a+1}\\[2ex] &=\frac{x-1+1}{6-1+1}\\[2ex] &=\frac{x}{6} \end{aligned}
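A quick numeric check of this formula in R (the helper name puniform is invented for illustration):

# Cumulative distribution function of the discrete uniform distribution
puniform <- function(x, a, b) (x - a + 1) / (b - a + 1)
puniform(4, a = 1, b = 6) # 4/6 = .667, the probability of rolling a 4 or less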

The cumulative distribution function is so-named because it adds all the probabilities in the probability mass function up to and including a particular member of the sample space. Figure 8.7 shows how each probability in the cumulative distribution function of the roll of a six-sided die is the sum of the current and all previous probabilities in the probability mass function.

Figure 8.7: The cumulative distribution function is the sum of the current and all previous elements of the probability mass function.
# The cumulative distribution function is the sum of the current 
# and all previous elements of the probability mass function
p <- crossing(id = 1:6,
         x = 1:6) %>% 
  mutate(pmf = 1 / 6) %>% 
  mutate(cdf = ifelse(id < x, 1 / 6, x / 6)) %>% 
  ggplot(aes(x = x, y = cdf)) + 
  geom_segment(aes(yend = cdf - pmf, xend = x), 
               color = myfills[2]) + 
  geom_segment(aes(y = 0, yend = pmf, xend = x), 
               color = myfills[1]) + 
  geom_line(aes(y = pmf), lty = "dotted", color = myfills[1]) +
  geom_line(data = . %>% dplyr::filter(x <= id), 
            lty = "dotted", 
            color = myfills[2] ) +
  geom_point(aes(y = pmf),
             color = myfills[1],
             size = 5) +
  geom_point(data = . %>% dplyr::filter(x <= id), 
             color = myfills[2],
             size = 3.5) +
  scale_x_continuous("Sample Space", 
                     breaks = 1:6,
                     expand = c(0.03,0),
                     minor_breaks = NULL) +
  scale_y_continuous("Probability",
                     breaks = 0:6 / 6, 
                     labels = c(0, paste0(1:5, "/", 6),1),
                     expand = c(0.03,0),
                     minor_breaks = NULL) +
  theme_minimal(base_size = 18, base_family = bfont) +
  coord_fixed(6) + 
  annotate("point",
           size = 5, 
           x = 1.2,  
           y = 5.33 / 6, 
           color = myfills[1]) + 
  annotate("point",
           size = 3.5, 
           x = 1.2,  
           y = 5.66 / 6, 
           color = myfills[2])  + 
  annotate("label",
           size = 6, 
           x = 1.33,  
           y = 5.33 / 6, 
           color = myfills[1],
           label = "Probability Mass Function",
           hjust = 0,
           family = bfont, 
           label.padding = unit(0, "lines"), 
           label.size = 0
           ) + 
  annotate("label",
           size = 6, 
           x = 1.33,  
           y = 5.66 / 6, 
           color = myfills[2],
           label = "Cumulative Distribution Function",
           hjust = 0,
           family = bfont, 
           label.padding = unit(0, "lines"), 
           label.size = 0) +
  transition_states(id,2,1) +
  ease_aes("sine-in-out")

animate(p, 
        device = "svg",
        renderer = magick_renderer(), 
        width = 8,
        height = 8)

8.5.5 Quantile Functions

The inverse of the cumulative distribution function is the quantile function. The cumulative distribution function starts with a value x in the sample space and tells us p, the proportion of values in that distribution that are less than or equal to x. A quantile function starts with a proportion p and tells us the value x that splits the distribution such that the proportion p of the distribution is less than or equal to x.

A quantile function tells us which value in the sample space of a random variable is greater than a particular proportion of the values the random variable generates.
Figure 8.8: The quantile function is the inverse of the cumulative distribution function: Just flip the X and Y axes!

As seen in Figure 8.8, if you see a graph of a continuous cumulative distribution function, just flip the X and Y axes, and you have a graph of the corresponding quantile function.
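In R, each named distribution has a cumulative distribution function (prefix p) and a quantile function (prefix q), and the two undo each other. A quick demonstration with the standard normal distribution:

# The quantile function inverts the cumulative distribution function
p <- pnorm(1.96) # proportion of the distribution at or below 1.96 (about .975)
qnorm(p)         # recovers 1.96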

# The quantile function is the inverse of the cumulative distribution function

d <- tibble(x = seq(-4, 4, 0.01)) %>%
  mutate(p = pnorm(x))

p1 <- ggplot(d, aes(x, p)) +
  geom_arrow(
    arrow_head = arrow_head_deltoid(),
    arrow_fins = arrow_head_deltoid(),
    color = myfills[1],
    lwd = 1
  ) +
  scale_x_continuous("Sample Space (*x*)", labels = \(x) signs::signs(x, accuracy = 1)) +
  scale_y_continuous("Proportion (*p*)", labels = prob_label) +
  theme_minimal(base_size = 26, base_family = bfont) +
  coord_fixed(8) +
  annotate(
    "richtext",
    x = -2,
    y = .5 + .25 / 4,
    label = paste0("*X* ~ ",
                   span_style("N", style = "font-family:'Lucida Calligraphy'"),
                   "(0, 1<sup>2</sup>)"),
    family = bfont,
    label.size = 0,
    label.padding = unit(0, "mm"),
    size = ggtext_size(26)
  ) +
  ggtitle("Cumulative Distribution Function *F<sub>X</sub>*(*x*) = *p*") +
  theme(
    plot.title = ggtext::element_markdown(size = 26 * 0.8), 
    plot.title.position = "plot",
    axis.title.x = ggtext::element_markdown(),
    axis.title.y = ggtext::element_markdown(),
    axis.text.x = element_text(hjust = c(0.7, 0.7, 0.5, 0.5, 0.5))
  )

p2 <- ggplot(d, aes(p, x)) +
  geom_arrow(
    arrow_head = arrow_head_deltoid(),
    arrow_fins = arrow_head_deltoid(),
    color = myfills[1],
    lwd = 1
  ) +
  scale_y_continuous("Sample Space (*x*)", labels = \(x) signs::signs(x, accuracy = 1)) +
  scale_x_continuous("Proportion (*p*)", labels = prob_label) +
  theme_minimal(base_size = 26, base_family = bfont) +
  coord_fixed(1 / 8) +
  annotate(
    "richtext",
    y = 0.5,
    x = .25,
    label = paste0("*X* ~ ",
                   span_style("N", style = "font-family:'Lucida Calligraphy'"),
                   "(0, 1<sup>2</sup>)"),
    family = bfont,
    label.size = 0,
    label.padding = unit(0, "mm"),
    size = ggtext_size(26)
  ) +
  ggtitle("Quantile Function *Q*<sub>*X*</sub>(*p*) = *x*") +
  theme(plot.title = ggtext::element_markdown(size = 26 * 0.8), 
        plot.title.position = "plot",
        axis.title.x = ggtext::element_markdown(),
        axis.title.y = ggtext::element_markdown())

p1 + p2

8.5.6 Generating a Random Sample in R

In R, the sample function generates numbers from the discrete uniform distribution.

# n = the sample size
n <- 6000
# a = the lower bound
a <- 1
# b = the upper bound
b <- 6
# The sample space is the sequence of integers from a to b
sample_space <- seq(a, b)
# X = the sample with a discrete uniform distribution
# The sample function selects n values
# from the sample space with replacement at random
X <- sample(sample_space, 
            size = n, 
            replace = TRUE)
Figure 8.9: Frequency distribution of a discrete uniform random variable from 1 to 6 (n = 6,000)
# Frequency distribution of a discrete uniform random variable from 1 to 6 

tibble(X = factor(X)) %>% 
  group_by(X) %>% 
  summarise(Frequency = n()) %>% 
  ggplot(aes(X,Frequency, fill = X)) + 
  geom_col(width = 0.7, fill = myfills[1]) + 
  geom_label(aes(label = Frequency), 
             vjust = -0.3, 
             label.size = 0,
             label.padding = unit(0,"mm"), 
             family = bfont,
             size = 8,
             color = "gray30",
             fill = "white") + 
  theme_minimal(base_size = 28, 
                base_family = bfont) + 
  scale_y_continuous("Count", 
                     expand = expansion(c(0,0.075)), 
                     breaks = seq(0,1000,200)) + 
  scale_x_discrete(NULL) +
  theme(panel.grid.major.x = element_blank(), 
        legend.position = "none") 

The frequencies of the random sample can be seen in Figure 8.9. Because of sampling error, the frequencies are approximately the same, but not exactly the same. If the sample is larger, the sampling error is smaller, meaning that the sample’s characteristics will tend to more closely resemble the population characteristics. In this case, a larger sample size will produce frequency counts that will appear more even in their magnitude. However, as long as the sample is smaller than the population, sampling error will always be present. With random variables, the population is assumed to be infinitely large, and thus sampling error at best becomes negligibly small.

Samples imperfectly represent the population from which they are drawn. Sampling error refers to differences between sample statistics and population parameters.
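As a quick sketch of this idea, watch the relative frequency of sixes settle toward the population probability of ⅙ as the sample size grows:

# Relative frequency of sixes at increasing sample sizes
sapply(c(60, 6000, 600000),
       \(n) mean(sample(1:6, size = n, replace = TRUE) == 6))
# values approach 1/6 = .167 as n grows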

8.5.7 Bernoulli Distributions

Feature Symbol
Sample Space: x \in \{0,1\}
Probability that x=1 p \in [0,1]
Probability that x=0 q = 1 - p
Mean \mu = p
Variance \sigma^2 = pq
Skewness \gamma_1 = \frac{1 - 2p}{\sqrt{pq}}
Kurtosis \gamma_2 = \frac{1}{pq} - 6
Probability Mass Function f_X(x;p) = p^xq^{1 - x}
Cumulative Distribution Function F_X(x;p) = x+q(1 - x)
Table 8.2: Features of Bernoulli Distributions
Notation note: Whereas {a,b} is the set of just two numbers, a and b, [a,b] is the set of all real numbers between a and b.
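Because the Bernoulli distribution is the binomial distribution with a single trial (as discussed later in this chapter), the probability mass function in Table 8.2 can be checked in R with dbinom by setting size = 1:

# Bernoulli probability mass function: f(x; p) = p^x * q^(1 - x)
p <- 0.8
dbinom(0:1, size = 1, prob = p) # 0.2 0.8
p^(0:1) * (1 - p)^(1 - 0:1)     # same result: 0.2 0.8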

The toss of a single coin has the simplest probability distribution that I can think of—there are only two outcomes and each outcome is equally probable (Figure 8.10). This is a special case of the Bernoulli distribution. Jakob Bernoulli (Figure 8.11) was a famous mathematician from a famous family of mathematicians. The Bernoulli distribution is just one of the ideas that made Jakob and the other Bernoullis famous.

In the Bernoulli distribution, there are only two outcomes: a “success” (1) and a “failure” (0). If a success has a probability p then a failure has a probability of q = 1 - p.
% Coin toss Bernoulli
\documentclass[tikz = true, border = 2pt]{standalone}
\usepackage{fontspec}
\setmainfont{Equity Text A}
\usepackage{tikz}
\usetikzlibrary{shapes}
\usepackage{xfrac}

\begin{document}
        \begin{tikzpicture}[scale=0.9]
        \node (H) at (0,2) {
            \includegraphics [width=1.5cm]{../images/QuarterHeads.png}
        };
        \node (T) at (0,0) {
            \includegraphics [width=1.5cm]{../images/QuarterTails.png}
        };
        \node [fill=gray!50,
               minimum height=1.5cm,
               minimum width=0.1cm,
               single arrow,
               single arrow head extend=.15cm,
               single arrow head indent=.08cm,
               inner sep=1mm] (arrowtails1) at (1.9,2) {};
        \node [fill=gray!50,
               minimum height=1.5cm,
               minimum width=0.1cm,
               single arrow,
               single arrow head extend=.15cm,
               single arrow head indent=.08cm,
               inner sep=1mm] (arrowheads2) at (1.9,0) {};
        \node  (p1) at (3.4,2) {\huge{$\sfrac{\text{1}}{\text{2}}$}};
        \node  (p2) at (3.4,0) {\huge{$\sfrac{\text{1}}{\text{2}}$}};
        \node [text centered,
               anchor=south,
               text height=1.5ex,
               text depth=.25ex] (p3) at (0,3) {\large{Sample Space}};
        \node [text centered,
               anchor=south,
               text height=1.5ex,
               text depth=.25ex] (p4) at (3.4,3) {\large{Probability}};
        \end{tikzpicture}

\end{document}
Figure 8.10: The probability distribution of a coin toss
Figure 8.11: Jakob Bernoulli (1654–1705)
Image Credits

The Bernoulli distribution can describe any random variable that has two outcomes, one of which has a probability p and the other has a probability q=1-p. In the case of a coin flip, p=0.5. For other variables with a Bernoulli distribution, p can range from 0 to 1.

In psychological assessment, many of the variables we encounter have a Bernoulli distribution. In ability test items in which there is no partial credit, examinees either succeed or fail. The probability of success on an item (in the whole population) is p. In other words, p is the proportion of the entire population that correctly answers the question. Some ability test items are very easy and the probability of success is high. In such cases, p is close to 1. When p is close to 0, few people succeed and items are deemed hard. Thus, in the context of ability testing, p is called the difficulty parameter. This is confusing because when p is high, the item is easy, not difficult. Many people have suggested that it would make more sense to call it the “easiness parameter” but the idea has never caught on.

The difficulty parameter is the proportion of people who succeed on an ability item or endorse a yes/no questionnaire item.

True/False and Yes/No items on questionnaires also have Bernoulli distributions. If an item is frequently endorsed as true (“I like ice cream.”), p is high. If an item is infrequently endorsed (“I like black licorice and mayonnaise in my ice cream.”), p is very low. Oddly, the language of ability tests prevails even here. Frequently endorsed questionnaire items are referred to as “easy” and infrequently endorsed items are referred to as “difficult,” even though there is nothing particularly easy or difficult about answering them either way.

8.5.7.1 Generating a Random Sample from the Bernoulli Distribution

Figure 8.12: Counts of a Random Variable with a Bernoulli Distribution (p = 0.8, n = 1000)

In R, there is no specialized function for the Bernoulli distribution because it turns out that the Bernoulli distribution is a special case of the binomial distribution, which will be described in the next section. With the function rbinom, we can generate data with a Bernoulli distribution by setting the size parameter equal to 1.

# n = sample size
n <- 1000
# p = probability
p <- 0.8
# X = sample
X <- rbinom(n, size = 1, prob = p)
# Make a basic plot
barplot(table(X))

In Figure 8.12, we can see that the random variable generated a sequence that consists of about 80% ones and 20% zeroes. However, because of sampling error, the results are rarely exactly what the population parameter specifies.

# Counts of a Random Variable with a Bernoulli Distribution
set.seed(4)
# n = sample size
n <- 1000
# p = probability
p <- 0.8
# X = sample
X <- rbinom(n, size = 1, prob = p)
# Make a basic plot
ggplot(tibble(X = factor(X)), aes(X)) +
  geom_bar(fill = myfills[1]) +
  scale_x_discrete(NULL) + 
  scale_y_continuous(NULL, expand = expansion(c(0.01,0.1))) +
  geom_text(aes(label = after_stat(count)), 
            stat = "count", 
            vjust = -0.4, 
            size = ggtext_size(18),
            family = bfont,
            color = "gray10") + 
    theme_minimal(18, bfont) + 
  theme(panel.grid.major.x = element_blank())

8.5.8 Binomial Distributions

Feature Symbol
Number of Trials n \in \{1,2,3,\ldots\}
Sample Space x \in \{0,...,n\}
Probability of success in each trial p \in [0,1]
Probability of failure in each trial q = 1 - p
Mean \mu = np
Variance \sigma^2 = npq
Skewness \gamma_1 = \frac{1-2p}{\sqrt{npq}}
Kurtosis \gamma_2 = \frac{1}{npq} - \frac{6}{n}
Probability Mass Function f_X(x;n,p)=\frac{n!}{x!\left(n-x\right)!}p^x q^{n-x}
Cumulative Distribution Function F_X(x;n,p)=\sum_{i=0}^{x}{\frac{n!}{i!(n-i)!} p^i q^{n-i}}
Table 8.3: Features of Binomial Distributions
Figure 8.13: Probability distribution of the number of heads observed when two coins are tossed

Let’s extend the idea of coin tosses and see where it leads. Imagine that two coins are tossed at the same time and we count how many heads there are. The outcome we might observe will be zero, one, or two heads. Thus, the sample space for the outcome of the tossing of two coins is the set \{0,1,2\} heads. There is only one way that we will observe no heads (both coins tails) and only one way that we will observe two heads (both coins heads). In contrast, as seen in Figure 8.13, there are two ways that we can observe one head (heads-tails & tails-heads).
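We can confirm these probabilities with dbinom: with two trials and p = 0.5, the probabilities of zero, one, and two heads are .25, .50, and .25, as in Figure 8.13.

# Probabilities of 0, 1, and 2 heads when two coins are tossed
dbinom(0:2, size = 2, prob = 0.5) # 0.25 0.50 0.25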

% Probability distribution of the number of heads observed when two coins are tossed
\documentclass[tikz = true,border = 2pt]{standalone}
\usepackage{tikz}
\usepackage{fontspec}
\setmainfont{Equity Text A}
\definecolor[named]{fillColor}{RGB}{39,64,139}
\begin{document}
\begin{tikzpicture}[x=1pt,
                      y=1pt,
                      xscale=.4,
                      yscale=.66,
                      axisline/.style={
                        draw=black!70,
                        line width= 0.4pt,
                        line join=round,
                        line cap=round},
                      axislabel/.style={
                        text=black!70,
                        inner sep=0pt,
                        font=\footnotesize,
                        outer sep=0pt,
                        anchor=east},
                      xaxislabel/.style={
                        text=black!70,
                        inner sep=0pt,
                        font=\footnotesize,
                        outer sep=0pt},
                       bar/.style={
                        draw=white,
                        line width= 0.4pt,
                        line join=round,
                        line cap=round,
                        fill=fillColor,
                        rounded corners = 1.5pt}                ]

% X-axis
\node[xaxislabel] at (100, 42) {0};
\node[xaxislabel] at (200, 42) {1};
\node[xaxislabel] at (300, 42) {2};
\node[black!70] at (204.07, 28) {Number of Heads};

% Y-axis
\path[axisline] ( 42, 50) -- ( 42,250);
\path[axisline] ( 38, 50) -- ( 42, 50);
\path[axisline] ( 38,150) -- ( 42,150);
\path[axisline] ( 38,250) -- ( 42,250);
\node[axislabel] at ( 36, 50) {0};
\node[axislabel] at ( 36,150) {.25};
\node[axislabel] at ( 36,250) {.50};

% Bars
\path[bar] (50, 50.00) rectangle (150,150);
\path[bar] (150, 50.00) rectangle (250,250);
\path[bar] (250, 50.00) rectangle (350,150);
\node at (100,  80) {\includegraphics [width=36pt]{../images/QuarterTails.png}};
\node at (100, 120) {\includegraphics [width=36pt]{../images/QuarterTails.png}};
\node at (200,  80) {\includegraphics [width=36pt]{../images/QuarterHeads.png}};
\node at (200, 120) {\includegraphics [width=36pt]{../images/QuarterTails.png}};
\node at (200, 180) {\includegraphics [width=36pt]{../images/QuarterTails.png}};
\node at (200, 220) {\includegraphics [width=36pt]{../images/QuarterHeads.png}};
\node at (300,  80) {\includegraphics [width=36pt]{../images/QuarterHeads.png}};
\node at (300, 120) {\includegraphics [width=36pt]{../images/QuarterHeads.png}};

\end{tikzpicture}
\end{document}

The probability distribution of the number of heads observed when two coins are tossed at the same time is a member of the binomial distribution family. The binomial distribution occurs when independent random variables with the same Bernoulli distribution are added together. In fact, Bernoulli discovered the binomial distribution as well as the Bernoulli distribution.

Two random variables are said to be independent if the outcome of one variable does not alter the probability of any outcome in the other variable.

Imagine that a die is rolled 10 times and we count how often a 6 occurs.4 Each roll of the die is called a trial. The sample space of this random variable is \{0,1,2,...,10\}. What is the probability that a 6 will occur 5 times? or 1 time? or not at all? Such questions are answered by the binomial distribution’s probability mass function:

Every time a random variable generates a number, that instance of the variable is called a trial, which is also known as an experiment.

4 Wait! Hold on! I thought that throwing dice resulted in a (discrete) uniform distribution. Well, it still does. However, now we are asking a different question. We are only concerned with two outcomes each time the die is thrown: 6 and not 6. This is a Bernoulli distribution, not a uniform distribution, because the probability of the two events is unequal: {⅙,⅚}

f_X(x;n,p)=\frac{n!}{x!\left(n-x\right)!}p^x\left(1-p\right)^{n-x}

Symbol Meaning
X The random variable (the number of sixes from 10 throws of the die)
x Any particular member of the sample space (i.e., x \in \{0,1,2,...,10\})
n The number of times that the die is thrown (i.e., n=10)
p The probability that a six will occur on a single throw of the die (i.e., p=\frac{1}{6})
Figure 8.14: The probability distribution of the number of sixes observed when a six-sided die is thrown 10 times.

Because n=10 and p=\frac{1}{6}, the probability mass function simplifies to:

\begin{equation*} f_X(x)=\frac{n!}{x!\left(n-x\right)!}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{10-x} \end{equation*}

If we take each element x of the sample space from 0 to 10 and plug it into the equation above, the probability distribution will look like Figure 8.14.
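We can check by hand that the formula agrees with R's dbinom function:

# Binomial pmf via the factorial formula, compared with dbinom
x <- 0:10
factorial(10) / (factorial(x) * factorial(10 - x)) * (1/6)^x * (5/6)^(10 - x)
dbinom(x, size = 10, prob = 1/6) # identical result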

# Probability of sixes

tibble(sample_space = 0:10) %>% 
  mutate(
    probability = dbinom(
      sample_space, 
      size = max(sample_space), 
      prob = 1 / 6),
    probability_label = prob_label(probability, 
                                   digits = 2)) %>% 
  ggplot(aes(factor(sample_space), probability)) +
  geom_col(fill = myfills[1]) + 
  ggtext::geom_richtext(
    aes(label = probability_label),
    size = ggtext_size(30),
    angle = 90, 
    hjust = 0,
    family = bfont, 
    label.margin = unit(c(0,0,0,1), "mm"),
    label.padding = unit(c(1.6,0,0,0),"mm"), 
    label.colour = NA,
    color = "gray40") +
  theme_minimal(base_family = bfont, base_size = 30) + 
  scale_y_continuous("Probability", 
                     expand = expansion(c(0,.09)), 
                     labels = prob_label) + 
  scale_x_discrete("Sample Space (Number of Sixes)") + 
  theme(panel.grid.major.x = element_blank()) 

8.5.8.1 Clinical Applications of the Binomial Distribution

When would a binomial distribution be used by a clinician? One particularly important use of the binomial distribution is in the detection of malingering. Sometimes people pretend to have memory loss or attention problems in order to win a lawsuit or collect insurance benefits. There are a number of ways to detect malingering, but a common method is to give a very easy memory test in which the person has at least a 50% chance of answering each item correctly, even when guessing at random.

A person who malingers is pretending to be sick to avoid work or some other responsibility.

Suppose that there are 20 questions. Even if a person has the worst memory possible, that person is likely to get about half the questions correct. However, it is possible for someone with a legitimate memory problem to guess randomly and, by bad luck, answer fewer than half of the questions correctly. Suppose that a person gets 4 questions correct. How likely is it that a person would, by random guessing, answer only 4 or fewer questions correctly?

We can use the binomial distribution’s cumulative distribution function. However, doing so by hand is rather tedious. Using R, the answer is found with the pbinom function:

p <- pbinom(4, 20, 0.5)
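
Because the cumulative distribution function is the sum of the probability mass function over the sample space values up to 4, the same answer can be verified with dbinom:

# pbinom(4, 20, 0.5) is equivalent to summing the
# probability mass function from 0 to 4
sum(dbinom(seq(0, 4), size = 20, prob = 0.5))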

We can see that the probability of randomly guessing and getting 4 or fewer items correct out of 20 items total is approximately 0.006, which is so low that the hypothesis that the person is malingering seems plausible. Note here that there is a big difference between these two questions:

  • If the person is guessing at random (i.e., not malingering), what is the probability of answering correctly 4 questions or fewer out of 20?
  • If the person answers 4 out of 20 questions correctly, what is the probability that the person is guessing at random (and therefore not malingering)?

Here we answer only the first question. It is an important question, but the answer to the second question is probably the one that we really want to know. We will answer it in another chapter when we discuss positive predictive power. For now, we should just remember that the questions are different and that the answers can be quite different.

8.5.8.2 Graphing the binomial distribution

Suppose that there are n=10 trials, each of which has a probability of p=0.8. The sample space is the sequence of integers from 0 to 10, which can be generated with the seq function (i.e., seq(0, 10)) or with the colon operator (i.e., 0:10). The associated probability mass function probabilities are found using the dbinom function, and the cumulative distribution function probabilities are found using the pbinom function.

# Make a sequence of numbers from 0 to 10
SampleSpace <- seq(0, 10)
# Probability mass distribution for 
# binomial distribution (n = 10, p = 0.8)
pmfBinomial <- dbinom(SampleSpace, 
                      size = 10, 
                      prob = 0.8)
# Generate a basic plot of the 
# probability mass distribution
plot(pmfBinomial ~ SampleSpace, 
     type = "b")
Figure 8.15: Basic plot of a binomial probability mass function
# Cumulative distribution function 
# for binomial distribution (n = 10, p = 0.8)
cdfBinomial <- pbinom(SampleSpace, 
                      size = 10, 
                      prob = 0.8)
# Generate a basic plot of the
# binomial cumulative distribution function
plot(cdfBinomial ~ SampleSpace, 
     type = "b")
Figure 8.16: Basic plot of a binomial cumulative distribution function

Making the graph look professional involves quite a bit of code that can seem daunting at first, but the results are often worth the effort. Try running the code below to see the difference. For presentation-worthy graphics, export the graph to the .pdf or .svg format. An .svg file can be imported directly into MS Word or MS PowerPoint.

Figure 8.17: Probability Mass Function and Cumulative Distribution Function of the Binomial Distribution (n = 10,~p = 0.8)
# Probability Mass Function and Cumulative Distribution 
# Function of the Binomial Distribution
tibble(SampleSpace = 0:10,
       pmf = dbinom(SampleSpace, 10, 0.8),
       cdf = pbinom(SampleSpace, 10, 0.8)) %>%
  pivot_longer(-SampleSpace, 
               names_to = "Function", 
               values_to = "Proportion") %>% 
  mutate(Function = factor(Function, 
                           levels = c("pmf", "cdf")) ) %>%
  arrange(desc(Function)) %>%
  ggplot(aes(x = SampleSpace,
             y = Proportion,
             color = Function)) +
  geom_pointline(lty = "dotted", distance = unit(3, "mm")) +
  geom_point(aes(size = Function)) +
  theme_minimal(base_family = "serif",
                base_size = 18) +
  scale_color_manual(values = myfills) +
  scale_x_continuous("Sample Space",
                     breaks = seq(0, 10, 2),
                     expand = expansion(0.02)) +
  scale_y_continuous(expand = expansion(0.02),
                     labels = prob_label) +
  scale_size_manual(values = c(3, 4.5)) +
  theme(legend.position = "none") +
  annotate(
    x = 10 - 0.16,
    y = dbinom(10, 10, 0.8),
    geom = "label",
    label = "Probability Mass Function",
    hjust = 1,
    size = 4.75,
    color = myfills[1],
    label.size = 0,
    family = "serif",
    label.padding = unit(0, "mm")
  ) +
  annotate(
    x = 10 - 0.2,
    y = 1,
    geom = "label",
    label = "Cumulative Distribution Function",
    hjust = 1,
    size = 4.75,
    color = myfills[2],
    label.size = 0,
    family = "serif",
    label.padding = unit(0, "mm")
  )

8.5.9 Poisson Distributions

Feature Symbol
Parameter \lambda \in (0,\infty)
Sample Space x\in \{0,1,2,\ldots\}
Mean \mu = \lambda
Variance \sigma^2 = \lambda
Skewness \gamma_1 = \frac{1}{\sqrt{\lambda}}
Kurtosis \gamma_2 = \frac{1}{\lambda}
Probability Mass Function f_X(x;\lambda) = \frac{\lambda^x}{e^{\lambda} x!}
Cumulative Distribution Function F_X(x;\lambda) = \sum_{i=0}^{x}{\frac{\lambda^i}{e^{\lambda} i!}}
Table 8.4: Features of Poisson Distributions
Notation note: The notation (0,\infty) means all real numbers greater than 0.
Figure 8.18: Siméon Denis Poisson (1781–1840)

Imagine that an event happens sporadically at random and we measure how often it occurs in regular time intervals (e.g., events per hour). Sometimes the event does not occur in the interval, sometimes just once, and sometimes more than once. However, we notice that over many intervals, the average number of events is constant. The distribution of the number of events in each interval will follow a Poisson distribution. Although “Poisson” means “fish” in French, fish have nothing to do with it. This distribution was named after Siméon Denis Poisson, whose work on the distribution made it famous.

The Poisson distribution is a discrete distribution used to model how often an event will occur during a particular interval of time.

The Poisson distribution has a single parameter \lambda, the average number of events per time interval. Interestingly, \lambda is both the mean and the variance of this distribution. The distribution’s shape will differ depending on how long our interval is. If an event occurs on average 30 times per hour, \lambda = 30. If we count how often the event occurs in 10-minute intervals, the same event will occur about 5 times per interval, on average (i.e., \lambda = 5). If we choose to count how often the same event occurs every minute, then \lambda = 0.5.
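
For example, continuing with an event that occurs 30 times per hour on average, the probability of observing an interval with no events at all depends strongly on the interval’s length:

# Probability of zero events in one interval,
# for three interval lengths with the same underlying rate
dpois(0, lambda = 30)   # in an hour: nearly 0
dpois(0, lambda = 5)    # in 10 minutes: ~ .007
dpois(0, lambda = 0.5)  # in 1 minute: ~ .61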

Figure 8.19: The shape of the Poisson Distribution depends on the interval used for counting events. Here, the event occurs once per minute, on average.
# The shape of the Poisson Distribution
crossing(lambda = c(0.5, 5, 30), x = 0:60) %>%
  mutate(p = dpois(x, lambda)) %>%
  mutate(lambda = factor(lambda, labels = paste0(
    "Every ",
    c("30 Seconds", "5 Minutes", "Half Hour"),
    " (\u03BB = ",
    c(0.5, 5, 30),
    ")"
  ))) %>%
  ggplot(aes(x, p, color = lambda)) +
  geom_line(linetype = 3, linewidth = 0.1) +
  geom_point(size = 2.5, color = "white") +
  geom_point(size = 1) +
  facet_grid(lambda ~ ., scales = "free") +
  scale_x_continuous("Number of Events", breaks = seq(0, 120, 20)) +
  scale_color_manual(values = c(myfills[1], "darkorchid", myfills[2])) +
  theme_light(base_size = 20, base_family = bfont) +
  theme(legend.position = "none") +
  scale_y_continuous("Probability", labels = prob_label)

8.5.9.1 A clinical application of the Poisson distribution

Suppose that you begin treating an adult male client who has panic attacks that come at unpredictable times. Some weeks there are no panic attacks and some weeks there are many, but on average he has 2 panic attacks each week. The client knows this because he has kept detailed records in a spreadsheet for the last 5 years. The client had sought treatment once before, but terminated early and abruptly because, according to him, “It wasn’t working.” After sensitive querying, you discover that he expected that treatment should have quickly reduced the frequency of panic attacks to zero. When that did not happen, he became discouraged and stopped the treatment.

Because your client is well educated and quantitatively inclined, you decide to use the data he has collected as part of the intervention and to help set more realistic expectations. Obviously, you and your client both would prefer 0 panic attacks per week, but sometimes it takes more time to get to the final goal. We do not want to terminate a treatment that is working just because the final goal has not yet been achieved.

You plot the frequency of how often he had 0 panic attacks in a week, 1 panic attack in a week, 2 panic attacks in a week, and so forth, as shown in red in Figure 8.20. Because you have read this book, you immediately recognize that this is a Poisson distribution with \lambda = 2. When you graph an actual Poisson distribution and compare it with your client’s data, you see that it is almost a perfect match.5 Then you explain that although the goal is permanent cessation of the panic attacks, sometimes an intervention can be considered successful if the frequency of panic attacks is merely reduced. For example, suppose that in the early stages of treatment the frequency of panic attacks were reduced from twice per week to once every other week (\lambda = 0.5), on average. Even with such a reduction, there would still be weeks in which two or more panic attacks occur. According to Figure 8.23, this will happen about 9% of the time.

5 Note that I am not claiming that all clients’ panic attack frequencies have this kind of distribution. It just so happens to apply in this instance.

Figure 8.20: The variability of a hypothetical client’s panic attack frequency before and after treatment
# The variability of a hypothetical 
# client's panic attack frequency
d_label <- tibble(
  x = c(0, 2),
  Time = c("After", "Before"),
  Proportion = dpois(x, lambda = c(0.5, 2)),
  Label = c("After Treatment: \u03BB = 0.5",
            "Before Treatment: \u03BB = 2")
)

tibble(
  x = seq(0, 7),
  Before = dpois(x, lambda = 2),
  After = dpois(x, lambda = 0.5)
) %>%
  gather(key = Time, value = Proportion,-x) %>%
  ggplot(aes(x, Proportion, color = Time)) +
  geom_pointline(linetype = "dotted", linesize = .5, size = 3, distance = unit(2, "mm")) +
  # geom_point(size = 3) +
  geom_label(
    data = d_label,
    aes(label = Label),
    hjust = 0,
    nudge_x = 0.2,
    family = bfont,
    label.padding = unit(0, "lines"),
    label.size = 0,
    size = 6
  ) +
  scale_color_manual(values = myfills) +
  scale_x_continuous("Panic Attacks Per Week",
                     breaks = 0:7,
                     minor_breaks = NULL) +
  scale_y_continuous(labels = . %>% prob_label(., 0.1)) +
  theme_minimal(base_size = 20, base_family = bfont) +
  theme(legend.position = "none")

In R, you can use the dpois function to compute values of the Poisson probability mass function. For example, if the average number of events per time period is λ = 2, then the probability that there will be 0 events is dpois(x = 0, lambda = 2), which evaluates to 0.1353.

To calculate the cumulative distribution function of the Poisson distribution in R, use the ppois function. For example, if we want to know the probability of having 4 or more panic attacks in a week when λ = 2, we subtract the probability of having 3 or fewer panic attacks from 1, like so:

1 - ppois(q = 3, lambda = 2)

p = 0.143

Here is a simple way to plot the probability mass function and the cumulative distribution function using the dpois and ppois functions:

# Make a sequence of integers from 0 to 7
PanicAttacks <- seq(0, 7)

# Generate the probability mass function with lambda = 2
Probability <- dpois(PanicAttacks, 2)

# Basic plot of the Poisson 
# distribution's probability mass function
plot(Probability ~ PanicAttacks, type = "b") 
Figure 8.21: Poisson Probability Mass Function (\lambda=2)
# Generate the cumulative 
# distribution function with lambda = 2
CumulativeProbability <- ppois(PanicAttacks, 2)

# Basic plot of the Poisson distribution's 
# cumulative distribution function
plot(CumulativeProbability ~ PanicAttacks, type = "b") 
Figure 8.22: Poisson Cumulative Distribution Function (\lambda=2)

With an additional series with \lambda = 0.5, the plot can look like Figure 8.23.

Figure 8.23: The cumulative distribution function of a hypothetical client’s panic attack frequency before and after treatment
# The cumulative distribution function 
# of a hypothetical client's
# panic attack frequency before 
# and after treatment

d_label <- tibble(
  x = c(0, 0),
  Time = c("After", "Before"),
  Proportion = ppois(x, lambda = c(0.5, 2)),
  Label = c("After Treatment: *&lambda;* = 0.5",
            "Before Treatment: *&lambda;* = 2")
)

tibble(
  x = seq(0, 7),
  Before = ppois(x, lambda = 2),
  After = ppois(x, lambda = 0.5)
) %>%
  gather(key = Time, value = Proportion,-x) %>%
  ggplot(aes(x, Proportion, color = Time)) +
  geom_pointline(linetype = "dotted", linesize = .5, size = 3, distance = unit(2, "mm")) +
  geom_richtext(
    data = d_label,
    aes(label = Label),
    hjust = 0,
    nudge_x = 0.2,
    family = bfont,
    label.padding = unit(0, "lines"),
    label.size = 0,
    size = ggtext_size(20)
  ) +
  scale_color_manual(values = myfills) +
  scale_y_continuous(breaks = seq(0, 1, 0.2), 
                     limits = c(0, 1),
                     labels = . %>% prob_label(., 0.1)) +
  scale_x_continuous("Panic Attacks Per Week",
                     breaks = 0:7,
                     minor_breaks = NULL) +
  theme_minimal(base_size = 20, base_family = bfont) +
  theme(legend.position = "none")

8.5.10 Geometric Distributions

Feature Symbol
Probability of success in each trial p\in[0,1]
Sample Space x \in \{1,2,3,\ldots\}
Mean \mu = \frac{1}{p}
Variance \sigma^2 = \frac{1-p}{p^2}
Skewness \gamma_1 = \frac{2-p}{\sqrt{1-p}}
Kurtosis \gamma_2 = 6 + \frac{p^2}{1-p}
Probability Mass Function f_X(x;p) = p(1-p)^{x-1}
Cumulative Distribution Function F_X(x;p) = 1-(1-p)^x
Table 8.5: Features of Geometric Distributions

Atul Gawande (2007, pp. 219–223) tells a marvelous anecdote about how a doctor used some statistics to help a young patient with cystic fibrosis to return to taking her medication more regularly. Because the story is full of pathos and masterfully told, I will not repeat a clumsy version of it here. However, unlike Gawande, I will show how the doctor’s statistics were calculated.

Gawande, A. (2007). Better: A surgeon’s notes on performance. Metropolitan Books.

According to the story, if a patient fails to take medication, the probability that a person with cystic fibrosis will develop a bad lung illness on any particular day is .005. If medication is taken, the risk is .0005. Although these probabilities are both close to zero, over the course of a year they result in very different levels of risk. Off medication, the patient has about an 84% chance of getting sick within a year’s time. On medication, the patient’s risk falls to 17%. As seen in Figure 8.24, the cumulative risk over the course of 10 years is quite different. Without medication, the patient is almost certain to become seriously ill at least once within 10 years. With medication, however, a small but substantial percentage (~16%) of patients will go at least 10 years without becoming ill.
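
These one-year and ten-year risks can be verified directly from the geometric distribution’s cumulative distribution function, F_X(x;p)=1-(1-p)^x:

# Cumulative risk of serious illness within one year
1 - (1 - 0.005)^365     # off medication: ~ .84
1 - (1 - 0.0005)^365    # on medication:  ~ .17
# Probability of going 10 years without serious
# illness while on medication
(1 - 0.0005)^(365 * 10) # ~ .16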

Figure 8.24: The cumulative risk of serious lung disease with and without medication
# The cumulative risk of serious lung 
# disease with and without medication

total_years <- 10

tibble(Days = seq(0, total_years * 365, 10),
       WithoutMeds = pgeom(Days, 0.005),
       WithMeds = pgeom(Days, 0.0005)) %>% 
  gather(Meds, p, -Days) %>% 
  mutate(Years = Days / 365) %>% 
  ggplot(aes(Years, p, color = Meds)) + 
  geom_line(linewidth = 1) + 
  theme_minimal(base_size = 18, base_family = bfont) + 
  theme(legend.position = "none") +
  scale_x_continuous(breaks = seq(0,total_years,2)) + 
  scale_y_continuous("Cumulative Risk", breaks = seq(0,1,0.2),
                     labels = . %>% prob_label(., 0.1)) +
  scale_color_manual(values = myfills) +
  coord_fixed(ratio = 10) +
  annotate(x = 4, y = 0.93, 
           label = "Without Medication\nDaily Risk = .005",
           geom = "label", 
           color = myfills[2],
           label.padding = unit(0,"lines"),
           label.size = 0,
           family = bfont,
           size = 5.2
           ) +
  annotate(x = 6.25, y = 0.535, 
           label = "With Medication\nDaily Risk = .0005",
           geom = "label", 
           color = myfills[1],
           label.padding = unit(0,"lines"),
           label.size = 0,
           family = bfont,
           size = 5.2
           )

Such calculations make use of the geometric distribution. Consider a series of Bernoulli trials in which an event has a probability p of occurring on any particular trial. The probability mass function of the geometric distribution will tell us the probability that the xth trial will be the first time the event occurs.

f_X(x;p)=p(1-p)^{x-1}

Symbol Meaning
X A random variable with a geometric distribution
f_X The probability mass function of X
x The number of Bernoulli trials on which the event first occurs
p The probability of an event occurring on a single Bernoulli trial

In R, the probability mass function of the geometric distribution is calculated with the dgeom function:

# Make a sequence of integers from 1 to 10
x <- seq(1, 10)

# Generate the probability mass function with p = 0.6.
# Note that R's dgeom counts the number of failures
# before the first success, so to get the probability
# that the first success occurs on trial x, shift x by 1.
Probability <- dgeom(x - 1, prob = 0.6)

# Basic plot of the geometric 
# distribution's probability mass function
plot(Probability ~ x, type = "b") 
Figure 8.25: Geometric Probability Mass Function (p=.6)

The cumulative distribution function of the geometric distribution was used to create Figure 8.24. It tells us the probability that the event will occur on the x^{th} trial or earlier:

F_X(x;p)=1-(1-p)^x

In R, the cumulative distribution function of the geometric distribution is calculated with the pgeom function:

# Generate the cumulative distribution function with
# p = 0.6 (x is again shifted by 1 because of R's
# failures-before-success parameterization)
CumulativeProbability <- pgeom(x - 1, prob = 0.6)

# Basic plot of the geometric
# distribution's cumulative distribution function
plot(CumulativeProbability ~ x, type = "b") 
Figure 8.26: Geometric Cumulative Distribution Function (p=.6)

8.6 Continuous Distributions

8.6.1 Probability Density Functions

Although there are many more discrete distribution families, we will now consider some continuous distribution families. Most of what we have learned about discrete distributions applies to continuous distributions. However, the probability mass function needs a name change. In a discrete distribution, we can calculate an actual probability for a particular value in the sample space. In continuous distributions, doing so is tricky. We can always calculate the probability that a score will fall within a particular interval. However, in continuous distributions, intervals can become very small, approaching a width of 0. When that happens, the probability associated with the interval also approaches 0. Yet some parts of the distribution are clearly more probable than others. Therefore, we need a measure that tells us the probability of a value relative to other values: the probability density function.

The probability density function is a function that shows the relative likelihood of each element in the sample space of a continuous random variable.

Considering the entire sample space of a discrete distribution, all of the associated probabilities from the probability mass function sum to 1. In a probability density function, it is the area under the curve that must sum to 1. That is, there is a 100% probability that a value generated by the random variable will be somewhere under the curve. There is nowhere else for it to go!

However, unlike probability mass functions, probability density functions do not generate probabilities. Remember, the probability of any value in the sample space of a continuous variable is infinitesimal. We can only compare the probabilities to each other. To see this, compare the discrete uniform distribution and continuous uniform distribution in Figure 8.4. Both distributions range from 1 to 4. In the discrete distribution, there are 4 points, each with a probability of ¼. It is easy to see that these 4 probabilities of ¼ sum to 1. Because of the scale of the figure, it is not easy to see exactly how high the probability density function is in the continuous distribution. It happens to be ⅓. Why? First, it does not mean that each value has a ⅓ probability. There are an infinite number of points between 1 and 4 and it would be absurd if each of them had a ⅓ probability. The distance between 1 and 4 is 3. In order for the rectangle to have an area of 1, its height must be ⅓. What does that ⅓ mean, then? In the case of a single value in the sample space, it does not mean much at all. It is simply a value that we can compare to other values in the sample space. It could be scaled to any value, but for the sake of convenience it is scaled such that the area under the curve is 1.

Note that some probability density functions can produce values greater than 1. If the range of a continuous uniform distribution is less than 1, at least some portions of the curve must be greater than 1 to make the area under the curve equal 1. For example, if the bounds of a continuous distribution are 0 and ⅓, the average height of the probability density function would need to be 3 so that the total area is equal to 1.
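
A quick check with dunif, the continuous uniform distribution’s probability density function, confirms this:

# The density of a continuous uniform distribution between
# 0 and 1/3 is 3 everywhere within the bounds, so the total
# area under the curve is 3 * (1/3) = 1
dunif(0.1, min = 0, max = 1/3)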

8.6.2 Continuous Uniform Distributions

Feature Symbol
Lower Bound a \in (-\infty,\infty)
Upper Bound b \in (a,\infty)
Sample Space x \in \lbrack a,b\rbrack
Mean \mu = \frac{a+b}{2}
Variance \sigma^2 = \frac{(b-a)^2}{12}
Skewness \gamma_1 = 0
Kurtosis \gamma_2 = -\frac{6}{5}
Probability Density Function f_X(x;a,b) = \frac{1}{b-a}
Cumulative Distribution Function F_X(x;a,b) = \frac{x-a}{b-a}
Table 8.6: Features of Continuous Uniform Distributions

Unlike the discrete uniform distribution, the uniform distribution is continuous.6 In both distributions, there is an upper and lower bound and all members of the sample space are equally probable.

6 For the sake of clarity, the uniform distribution is often referred to as the continuous uniform distribution.

8.6.2.1 Generating random samples from the continuous uniform distribution

To generate a sample of n numbers with a continuous uniform distribution between a and b, use the runif function like so:

# Sample size
n <- 1000
# Lower and upper bounds
a <- 10
b <- 30
# Sample
x <- runif(n, min = a, max = b)
Figure 8.27: Random sample (n = 1000) of a continuous uniform distribution between 10 and 30. Points are randomly jittered to show the distribution more clearly.
# Plot
tibble(x) %>% 
ggplot(aes(x, y = 0.5)) + 
  geom_jitter(size = 0.5, 
              pch = 16,
              color = myfills[1], 
              height = 0.45) +
  scale_x_continuous(NULL) +
  scale_y_continuous(NULL, 
                     breaks = NULL, 
                     limits = c(0,1), expand = expansion()) + 
  theme_minimal(base_family = bfont, base_size = bsize)

8.6.2.2 Using the continuous uniform distribution to generate random samples from other distributions

Uniform distributions can begin and end at any real number but one member of the uniform distribution family is particularly important—the uniform distribution between 0 and 1. If you need to use Excel instead of a statistical package, you can use this distribution to generate random numbers from many other distributions.

Applying a continuous distribution’s cumulative distribution function to values drawn from that distribution converts them into a continuous uniform distribution between 0 and 1. Conversely, a distribution’s quantile function converts a continuous uniform distribution into that distribution. Most of the time, this process also works for discrete distributions. It is particularly useful for generating random numbers with an unusual distribution: if the distribution’s quantile function is known, a sample with a continuous uniform distribution can easily be generated and converted.

For example, the RAND function in Excel generates random numbers between 0 and 1 with a continuous uniform distribution. The BINOM.INV function is the binomial distribution’s quantile function. Suppose that n (number of Bernoulli trials) is 5 and p (probability of success on each Bernoulli trial) is 0.6. A randomly generated number from the binomial distribution with n=5 and p=0.6 is generated like so:

=BINOM.INV(5,0.6,RAND())

Excel has quantile functions for many distributions (e.g., BETA.INV, BINOM.INV, CHISQ.INV, F.INV, GAMMA.INV, LOGNORM.INV, NORM.INV, T.INV). This method of combining RAND and a quantile function works reasonably well in Excel for quick-and-dirty projects, but when high levels of accuracy are needed, random samples should be generated in a dedicated statistical program like R, Python (via the numpy package), Julia, STATA, SAS, or SPSS.
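
The same method works in R, where each distribution’s quantile function begins with the letter q. Here is a minimal sketch:

# Uniform random numbers between 0 and 1
u <- runif(10000)
# Convert to a binomial distribution (n = 5, p = 0.6)
x_binomial <- qbinom(u, size = 5, prob = 0.6)
# Convert to a normal distribution (mu = 100, sigma = 15)
x_normal <- qnorm(u, mean = 100, sd = 15)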

8.6.3 Normal Distributions

(Unfinished)

Feature Symbol
Sample Space x \in (-\infty,\infty)
Mean \mu = \mathcal{E}\left(X\right)
Variance \sigma^2 = \mathcal{E}\left(\left(X - \mu\right)^2\right)
Skewness \gamma_1 = 0
Kurtosis \gamma_2 = 0
Probability Density Function f_X(x;\mu,\sigma^2) = \frac{1}{\sqrt{2 \pi \sigma ^ 2}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
Cumulative Distribution Function F_X(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} {\displaystyle \int_{-\infty}^{x} e ^ {-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}dx}
Table 8.7: Features of Normal Distributions
Figure 8.28: Carl Friedrich Gauss (1777–1855)

The normal distribution is sometimes called the Gaussian distribution after its discoverer, Carl Friedrich Gauss (Figure 8.28). It is a small injustice that most people do not use Gauss’s name to refer to the normal distribution. Thankfully, Gauss is not exactly languishing in obscurity. He made so many discoveries that his name is all over mathematics and statistics.

The normal distribution is probably the most important distribution in statistics and in psychological assessment. In the absence of other information, assuming that an individual difference variable is normally distributed is a good bet. Not a sure bet, of course, but a good bet. Why? What is so special about the normal distribution?

To get a sense of the answer to this question, consider what happens to the binomial distribution as the number of events (n) increases. To make the example more concrete, let’s assume that we are tossing coins and counting the number of heads (p=0.5). In Figure 8.29, the first plot shows the probability mass function for the number of heads when there is a single coin (n=1). In the second plot, n=2 coins. That is, if we flip 2 coins, there will be 0, 1, or 2 heads. In each subsequent plot, we double the number of coins that we flip simultaneously. Even with as few as 4 coins, the distribution begins to resemble the normal distribution, although the resemblance is very rough. With 128 coins, however, the resemblance is very close.

Figure 8.29: The binomial distribution begins to resemble the normal distribution when the number of events is large.
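
To see the resemblance numerically, we can overlay the binomial probability mass function with a normal density that has the same mean (np) and standard deviation (\sqrt{np(1-p)}). A minimal sketch:

# Compare the binomial (n = 128, p = 0.5) probability mass
# function with a normal density that has the same mean
# and standard deviation
n <- 128
p <- 0.5
x_heads <- seq(0, n)
p_heads <- dbinom(x_heads, n, p)
plot(p_heads ~ x_heads, type = "h")
curve(dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p))),
      add = TRUE)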

This resemblance to the normal distribution is helped by the fact that p=0.5, which makes the binomial distribution symmetric. If p is extreme (close to 0 or 1), the binomial distribution is asymmetric. However, if n is large enough, the binomial distribution eventually becomes very close to normal.

Many other distributions, such as the Poisson, Student’s t, F, and \chi^2 distributions, have distinctive shapes under some conditions but approximate the normal distribution in others (see Figure 8.30). Why? In each case, the conditions under which these distributions approximate the normal distribution are conditions in which, as in Figure 8.29, many independent quantities are summed.

Figure 8.30: Many distributions become nearly normal when their parameters are high.

8.6.3.1 Notation for Normal Variates

Statisticians write about variables with normal distributions so often that a compact notation for specifying a normal variable’s parameters has been developed. If I want to specify that X is a normally distributed variable with a mean of \mu and a variance of \sigma^2, I will use this notation:

X \sim \mathcal{N}(\mu, \sigma^2)

Symbol Meaning
X A random variable.
\sim Is distributed as
\mathcal{N} Has a normal distribution
\mu The population mean
\sigma^2 The population variance
Table 8.8: Notation for Normal Variates

Many authors list the standard deviation \sigma instead of the variance \sigma^2. When I specify normal distributions with specific means and variances, I will avoid ambiguity by always showing the variance as the standard deviation squared. For example, a normal variate with a mean of 10 and a standard deviation of 3 will be written as X \sim \mathcal{N}(10,3^2).
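
One caution when translating this notation into R: the normal distribution functions in R are parameterized with the standard deviation, not the variance.

# X ~ N(10, 3^2): rnorm takes the standard deviation (3),
# not the variance (9)
x <- rnorm(10000, mean = 10, sd = 3)
c(mean(x), sd(x))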

Figure 8.31: Percentiles convert a distribution into a uniform distribution
Figure 8.32: Evenly spaced percentile ranks are associated with unevenly spaced scores.
Figure 8.33: Evenly spaced scores are associated with unevenly spaced percentiles

8.6.3.2 Half-Normal Distribution

(Unfinished)

Feature Symbol
Sample Space x \in [\mu,\infty)
Mu \mu \in (-\infty,\infty)
Sigma \sigma \in [0,\infty)
Mean \mu + \sigma\sqrt{\frac{2}{\pi}}
Variance \sigma^2\left(1-\frac{2}{\pi}\right)
Skewness \sqrt{2}(4-\pi)(\pi-2)^{-\frac{3}{2}}
Kurtosis 8(\pi-3)(\pi-2)^{-2}
Probability Density Function f_X(x;\mu,\sigma) = \sqrt{\frac{2}{\pi \sigma ^ 2}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
Cumulative Distribution Function F_X(x;\mu,\sigma) = \sqrt{\frac{2}{\pi\sigma^2}} {\displaystyle \int_{\mu}^{x} e ^ {-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}dx}
Table 8.9: Features of Half-Normal Distributions
Figure 8.34: The half-normal distribution is the normal distribution with the left half of the distribution stacked on top of the right half of the distribution.
# Half normal distribution
xlim <- 4
n <- length(seq(-xlim, 0, 0.01))
t1 <- tibble(
  x = c(0,-xlim,
        seq(-xlim, 0, 0.01),
        0,
        0,
        seq(0, xlim, 0.01),
        xlim,
        0),
  y = c(0,
        0,
        dnorm(seq(-xlim, 0, 0.01)),
        0,
        0,
        dnorm(seq(0, xlim, 0.01)),
        0,
        0),
  side = c(rep(F, n + 3), rep(T, n + 3)),
  Type = 1
)
t2 <- t1 %>%
  mutate(y = if_else(side, y, 2 * y)) %>%
  mutate(x = abs(x),
         Type = 2)

bind_rows(t1, t2) %>%
  mutate(Type = factor(Type)) %>%
  ggplot(aes(x, y, fill = side)) +
  geom_polygon() +
  geom_text(
    data = tibble(
      x = 0,
      y = dnorm(0) * c(1, 2) + 0.14,
      Type = factor(c(1,2)),
      label = c(
        "Normal",
        "Half-Normal"),
      side = T),
    aes(label = label),
    family = bfont, fontface = "bold",
    size = ggtext_size(30), 
    vjust = 1
  ) +
  geom_richtext(
    data = tibble(
      x = 0,
      y = dnorm(0) * c(1, 2) + 0,
      Type = factor(c(1,2)),
      label = c(
        paste0("*X* ~ ",
               span_style("N", style = "font-family:'Lucida Calligraphy'"),
               "(*",
               span_style("&mu;"),
               "*, *",
               span_style("&sigma;"),
               "*<sup>2</sup>)"),
        paste0("*X* ~ |",
               span_style("N", style = "font-family:'Lucida Calligraphy'"),
               "(0, *",
               span_style("&sigma;"),
               "*<sup>2</sup>)| + *",
               span_style("&mu;"),
               "*")),
      side = T),
    aes(label = label),
    family = c("Equity Text A"),
    size = ggtext_size(30), 
    vjust = 0, 
    label.padding = unit(0,"lines"), 
    label.color = NA,
    fill = NA) +
  theme_void(base_size = 30,
                base_family = bfont) +
  theme(
    legend.position = "none",
    strip.text = element_blank()
  ) +
  scale_fill_manual(values = myfills) +
  facet_grid(rows = vars(Type), space = "free_y", scales = "free_y") 

Suppose that X is a normally distributed variable such that

X \sim \mathcal{N}(\mu, \sigma^2)

Variable Y then has a half-normal distribution such that Y = |X-\mu|+\mu. In other words, imagine that a normal distribution is folded at the mean with the left half of the distribution now stacked on top of the right half of the distribution (See Figure 8.34).
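
A quick simulation illustrates the folding operation:

# Fold a normal sample (mu = 10, sigma = 3) at its mean
x <- rnorm(100000, mean = 10, sd = 3)
y <- abs(x - 10) + 10
# The sample mean of y should approximate
# mu + sigma * sqrt(2 / pi) = 12.39
c(mean(y), 10 + 3 * sqrt(2 / pi))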

8.6.3.3 Truncated Normal Distributions

(Unfinished)

8.6.3.4 Multivariate Normal Distributions

(Unfinished)

8.6.4 Chi Square Distributions

(Unfinished)

Feature Symbol
Sample Space x \in [0,\infty)
Degrees of freedom \nu \in [0,\infty)
Mean \nu
Variance 2\nu
Skewness \sqrt{8/\nu}
Kurtosis 12/\nu
Probability Density Function f_X(x;\nu) = \frac{x^{\nu/2-1}}{2^{\nu/2}\;\Gamma(\nu/2)\,\sqrt{e^x}}
Cumulative Distribution Function F_X(x;\nu) = \frac{\gamma\left(\frac{\nu}{2},\frac{x}{2}\right)}{\Gamma(\nu/2 )}
Table 8.10: Features of Chi-Square Distributions

I have always thought that the \chi^2 distribution has an unusual name. The chi part is fine, but why square? Why not call it the \chi distribution?7 As it turns out, the \chi^2 distribution is formed from squared quantities.

7 Actually, there is a \chi distribution. It is simply the square root of the \chi^2 distribution. The half-normal distribution happens to be a \chi distribution with 1 degree of freedom.

Notation note: A \chi^2 distribution with \nu degrees of freedom can be written as \chi^2_\nu or \chi^2(\nu).

The \chi^2 distribution has a straightforward relationship with the normal distribution. It is the sum of multiple independent squared normal variates. That is, if z is a standard normal variate— z\sim\mathcal{N}(0,1^2)—then z^2 has a \chi^2 distribution with 1 degree of freedom (\nu):

z^2\sim \chi^2_1

If z_1 and z_2 are independent standard normal variates, the sum of their squares has a \chi^2 distribution with 2 degrees of freedom:

z_1^2+z_2^2 \sim \chi^2_2

If \{z_1,z_2,\ldots,z_{\nu}\} is a series of \nu independent standard normal variates, the sum of their squares has a \chi^2 distribution with \nu degrees of freedom:

\sum^\nu_{i=1}{z_i^2} \sim \chi^2_\nu
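
This relationship is easy to verify with a quick simulation:

# Sum nu = 3 squared standard normal variates
nu <- 3
z <- matrix(rnorm(100000 * nu), ncol = nu)
x <- rowSums(z ^ 2)
# The mean and variance should approximate nu and 2 * nu
c(mean(x), var(x))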

8.6.4.1 Clinical Uses of the \chi^2 distribution

The \chi^2 distribution has many applications, but the most likely to be used in psychological assessment are the \chi^2 Test of Goodness of Fit and the \chi^2 Test of Independence.

Suppose we suspect that a child’s temper tantrums are more likely to occur on weekdays than on weekends. The child’s mother has kept a record of each tantrum for the past year and was able to count the frequency of tantrums. If tantrums were equally likely to occur on any day, 5 of 7 tantrums should occur on weekdays, and 2 of 7 tantrums should occur on weekends. The observed frequencies are compared with the expected frequencies below.

\begin{array}{r|c|c|c} & \text{Weekday} & \text{Weekend} & \text{Total} \\ \hline \text{Observed Frequency}\, (o) & 14 & 13 & n=27\\ \text{Expected Proportion}\,(p) & \frac{5}{7} & \frac{2}{7} & 1\\ \text{Expected Frequency}\, (e = np)& 27\times \frac{5}{7}= 19.2857& 27\times \frac{2}{7}= 7.7143& 27\\ \text{Difference}\,(o-e) & -5.2857 & 5.2857&0\\ \frac{(o-e)^2}{e} & 1.4487 & 3.6217 & \chi^2 = 5.07 \end{array}

In the table above, if the observed frequencies (o_i) are compared to their respective expected frequencies (e_i), then:

\chi^2_{k-1}=\sum_{i=1}^k{\frac{(o_i-e_i)^2}{e_i}}=5.07

Using the \chi^2 cumulative distribution function, we find that the probability of observing the frequencies listed is low under the assumption that tantrums are equally likely each day.
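
Specifically, the p-value is the area of the \chi^2 distribution with 1 degree of freedom that lies beyond 5.07:

# Probability of a chi-square value of 5.07 or more
# with 1 degree of freedom
1 - pchisq(5.0704, df = 1) # ~ .024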

observed_frequencies <- c(14, 13)
expected_probabilities <- c(5,2) / 7

fit <- chisq.test(observed_frequencies, p = expected_probabilities)
fit

    Chi-squared test for given probabilities

data:  observed_frequencies
X-squared = 5.0704, df = 1, p-value = 0.02434
# View expected frequencies and residuals
broom::augment(fit)
# A tibble: 2 × 6
  Var1  .observed .prop .expected .resid .std.resid
  <fct>     <dbl> <dbl>     <dbl>  <dbl>      <dbl>
1 A            14 0.519     19.3   -1.20      -2.25
2 B            13 0.481      7.71   1.90       2.25
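A \chi^2 test of independence follows the same pattern. The snippet below is a reconstruction (the dichotomous variables are generically labeled A and B, with cell counts taken from the output that follows) showing how a 2 × 2 contingency table can be tested and inspected:

# Hypothetical 2 x 2 contingency table
# (A and B are generic labels)
observed_table <- matrix(c(36, 39, 5, 20),
                         nrow = 2,
                         dimnames = list(A = c(0, 1),
                                         B = c(0, 1)))
fit_independence <- chisq.test(observed_table)
# View expected frequencies and residuals
broom::augment(fit_independence)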
# A tibble: 4 × 9
  A     B     .observed .prop .row.prop .col.prop .expected .resid .std.resid
  <fct> <fct>     <int> <dbl>     <dbl>     <dbl>     <dbl>  <dbl>      <dbl>
1 0     0            36  0.36     0.878      0.48      30.8  0.947       2.47
2 1     0            39  0.39     0.661      0.52      44.2 -0.789      -2.47
3 0     1             5  0.05     0.122      0.2       10.2 -1.64       -2.47
4 1     1            20  0.2      0.339      0.8       14.8  1.37        2.47

8.6.5 Student’s t Distributions

Feature Symbol
Sample Space x \in (-\infty,\infty)
Degrees of Freedom \nu \in (0,\infty)
Mean \left\{ \begin{array}{ll} 0 & \nu \gt 1 \\ \text{Undefined} & \nu \le 1 \\ \end{array} \right.
Variance \left\{ \begin{array}{ll} \frac{\nu}{\nu-2} & \nu\gt 2 \\ \infty & 1 \lt \nu \le 2\\ \text{Undefined} & \nu \le 1 \\ \end{array} \right.
Skewness \left\{ \begin{array}{ll} 0 & \nu \gt 3 \\ \text{Undefined} & \nu \le 3 \\ \end{array} \right.
Kurtosis \left\{ \begin{array}{ll} \frac{6}{\nu-4} & \nu \gt 4 \\ \infty & 2 \lt \nu \le 4\\ \text{Undefined} & \nu \le 2 \\ \end{array} \right.
Probability Density Function f_X(x; \nu) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}}
Cumulative Distribution Function F_X(x; \nu)=\frac{1}{2} + x \Gamma \left( \frac{\nu+1}{2} \right) \frac{\phantom{\,}_{2}F_1 \left(\frac{1}{2},\frac{\nu+1}{2};\frac{3}{2};-\frac{x^2}{\nu} \right)} {\sqrt{\pi\nu}\,\Gamma \left(\frac{\nu}{2}\right)}
Table 8.11: Features of t Distributions
Notation note: \Gamma is the gamma function. _2F_1 is the hypergeometric function.
Figure 8.35: “Student” statistician, William Sealy Gosset (1876–1937)

(Unfinished)

Guinness Beer gets free advertising every time the origin story of the Student t distribution is retold, and statisticians retell the story often. The fact that the original purpose of the t distribution was to brew better beer seems too good to be true.

William Sealy Gosset (1876–1937), self-trained statistician and head brewer at Guinness Brewery in Dublin, continually experimented on small batches to improve and standardize the brewing process. With some help from statistician Karl Pearson, Gosset used then-current statistical methods to analyze his experimental results. Gosset found that Pearson’s methods required small adjustments when applied to small samples. With Pearson’s help and encouragement (and later from Ronald Fisher), Gosset published a series of innovative papers about a wide range of statistical methods, including the t distribution, which can be used to describe the distribution of sample means.

Worried about having its trade secrets divulged, Guinness did not allow its employees to publish scientific papers related to their work at Guinness. Thus, Gosset published his papers under the pseudonym “Student.” The straightforward names of most statistical concepts need no historical treatment. Few of us who regularly use the Bernoulli, Pareto, Cauchy, and Gumbel distributions could tell you anything about the people who discovered them. But the oddly named “Student’s t distribution” cries out for explanation. Thus, in the long run, it was Gosset’s anonymity that made him famous.

Figure 8.36: The t distribution approaches the standard normal distribution as the degrees of freedom (df) parameter increases.
# The t distribution approaches the normal distribution
d <- crossing(x = seq(-6,6,0.02), 
         df = c(seq(1,15,1),
                seq(20,45,5),
                seq(50,100,10),
                seq(200,700,100))) %>%
  mutate(y = dt(x,df),
         Normal = dnorm(x)) 

t_size <- 40

d_label <- d %>% 
  select(df) %>% 
  unique() %>% 
  mutate(lb = qt(.025, df),
         ub = qt(0.975, df)) %>% 
  pivot_longer(c(lb, ub), values_to = "x", names_to = "bounds") %>% 
  mutate(label_x = signs::signs(x, accuracy = .01),
         y = 0,
         yend = dt(x, df))

p <- ggplot(d, aes(x, y)) + 
  geom_area(aes(y = Normal), alpha = 0.25, fill = myfills[1]) +
  geom_line() +
  geom_area(data = . %>% filter(x >= 1.96), 
            alpha = 0.25, 
            fill = myfills[1],
            aes(y = Normal)) +
  geom_area(data = . %>% filter(x <= -1.96), 
            alpha = 0.25, 
            fill = myfills[1],
            aes(y = Normal)) +
  geom_text(data = d_label, 
            aes(label = label_x), 
            family = bfont, 
            vjust = 1.25,
            size = ggtext_size(t_size)) + 
  geom_text(data = d_label %>% select(df) %>% unique,
            aes(x = 0, y = 0, label = paste0("df = ", df)), 
            vjust = 1.25, 
            family = bfont,
            size = ggtext_size(t_size)) + 
  geom_segment(data = d_label, aes(xend = x, yend = yend)) +
  transition_states(states = df, 
                    transition_length =  1, 
                    state_length = 2) +
  theme_void(base_size = t_size, base_family = bfont) +
  # labs(title = "df = {closest_state}") +
  annotate(x = qnorm(c(0.025, 0.975)), 
           y = 0, 
           label = signs::signs(qnorm(c(0.025, 0.975)), accuracy = .01), 
           geom = "text", 
           size = ggtext_size(t_size),
           color = myfills[1],
           vjust = 2.6, 
           family = bfont) + 
  coord_cartesian(xlim = c(-6,6), ylim = c(-.045, NA)) 

animate(p, 
        renderer = magick_renderer(), 
        device = "svglite", 
        fps = 2, 
        height = 6, 
        width = 10)
gganimate::anim_save("tdist_norm.gif")

8.6.5.1 The t distribution’s relationship to the normal distribution

Suppose we have two independent standard normal variates z_0 \sim \mathcal{N}(0, 1^2) and z_1 \sim \mathcal{N}(0, 1^2).

A t distribution with one degree of freedom is created like so:

T_1 = z_0\sqrt{\frac{1}{z_1^2}}

A t distribution with two degrees of freedom is created like so:

T_2 = z_0\sqrt{\frac{2}{z_1^2 + z_2^2}}

Where z_0, z_1 and z_2 are independent standard normal variates.

A t distribution with \nu degrees of freedom is created like so:

T_\nu = z_0\sqrt{\frac{\nu}{\sum_{i=1}^\nu z_i^2}}

The sum of \nu squared standard normal variates \left(\sum_{i=1}^\nu z_i^2\right) has a \chi^2 distribution with \nu degrees of freedom, which has a mean of \nu. Therefore, \sqrt{\frac{\nu}{\sum_{i=1}^\nu z_i^2}} equals one, on average. The variability of this expression approaches 0 as \nu increases, so when \nu is high, z_0 is being multiplied by a value very close to 1. Thus, T_\nu is nearly normal at high values of \nu.
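
This construction is easy to check by simulation:

# Simulate T with nu = 5 degrees of freedom
nu <- 5
z0 <- rnorm(100000)
z <- matrix(rnorm(100000 * nu), ncol = nu)
t_5 <- z0 * sqrt(nu / rowSums(z ^ 2))
# The variance should approximate nu / (nu - 2) = 5 / 3
c(var(t_5), nu / (nu - 2))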

8.7 Additional Distributions

8.7.1 F Distributions

Suppose that X is the ratio of two independent \chi^2 variates U_1 and U_2 scaled by their degrees of freedom \nu_1 and \nu_2, respectively:

X=\frac{\frac{U_1}{\nu_1}}{\frac{U_2}{\nu_2}}

The random variate X will have an F distribution with parameters, \nu_1 and \nu_2.
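
A short simulation sketch confirms this construction:

# Simulate an F distribution with nu_1 = 4 and nu_2 = 20
u1 <- rchisq(100000, df = 4)
u2 <- rchisq(100000, df = 20)
x_f <- (u1 / 4) / (u2 / 20)
# Compare the simulated density with the theoretical density
hist(x_f, breaks = 200, freq = FALSE, xlim = c(0, 6))
curve(df(x, df1 = 4, df2 = 20), add = TRUE)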

The primary application of the F distribution is the analysis of variance (ANOVA), which tests the equality of group means by comparing two estimates of variance. I am unaware of any direct applications of the F distribution in psychological assessment.

8.7.2 Weibull Distributions

How long do we have to wait before an event occurs? With Weibull distributions, we model wait times in which the probability of the event changes depending on how long we have waited. Some machines are designed to last a long time, but a defect in a part might cause an early failure. If the machine is going to fail, it is likely to fail early. If the machine works flawlessly in the early period, we worry about it less. Of course, all physical objects wear out eventually, but a good design and regular maintenance might allow a machine to operate for decades. The longer a machine has been working well, the lower the risk that it will irreparably fail on any particular day.

For other things, the risk of failure on any particular day becomes increasingly likely the longer the thing has been in use. Biological aging, for example, causes the risk of death to accelerate over time, so much so that historical records have no verified instances of anyone living much beyond 120 years. For still other events, the risk is constant over time, no matter how long we have waited. The Weibull distribution’s shape parameter allows it to model all three situations: risk that decreases over time, risk that remains constant, and risk that increases over time, as sketched below.
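
The hazard function (the instantaneous risk of the event, given that it has not yet occurred) is the density divided by the survival probability. A minimal sketch in R, with the scale parameter fixed at 1:

# Weibull hazard = density / survival probability
hazard <- function(x, shape) {
  dweibull(x, shape) / (1 - pweibull(x, shape))
}
x <- seq(0.1, 3, 0.01)
# shape > 1: risk increases over time
plot(x, hazard(x, shape = 1.5), type = "l",
     ylim = c(0, 3), xlab = "Time", ylab = "Hazard")
# shape = 1: risk is constant over time
lines(x, hazard(x, shape = 1), lty = 2)
# shape < 1: risk decreases over time
lines(x, hazard(x, shape = 0.5), lty = 3)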

8.7.3 Unfinished

  • Gumbel Distributions
  • Beta Distributions
  • Exponential Distributions
  • Pareto Distributions