NumPy - Binomial Distribution

The binomial distribution is a discrete probability distribution that expresses the probability of a given number of successes in a sequence of n independent trials, each with the same known probability of success.

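The probability mass function (pmf) of the binomial distribution is defined as:

$$ f(k; n, p) = \binom{n}{k} p^k q^{n-k} = \frac{n!}{k!\,(n-k)!} p^k q^{n-k}, \quad k = 0, 1, \ldots, n $$
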
Where,

p is the probability of success in each trial
q is the probability of failure in each trial, q = 1 - p
n is the number of trials
k is the number of successes (k = 0, 1, ..., n), which can occur anywhere among the n trials
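As a quick check of the formula, the pmf can be written directly in Python. This is a minimal sketch that uses the standard-library math.comb (Python 3.8+) for the binomial coefficient; the function name binomial_pmf is illustrative and is not part of NumPy.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials (illustrative helper)."""
    if not 0 <= k <= n:
        return 0.0  # impossible outcomes have zero probability
    q = 1.0 - p  # probability of failure
    return comb(n, k) * p**k * q**(n - k)

# Exactly 10 successes in 20 trials with p = 0.5
print(round(binomial_pmf(10, 20, 0.5), 6))  # 0.176197

# The pmf sums to 1 over k = 0, 1, ..., n
print(sum(binomial_pmf(k, 20, 0.3) for k in range(21)))
```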

A binomial distribution has mean np and variance npq.
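The mean and variance can be verified numerically by summing over the pmf, using E[X] = Σ k·f(k) and Var[X] = Σ (k - E[X])²·f(k). In this small sketch the values n = 20 and p = 0.3 are arbitrary choices.

```python
from math import comb

n, p = 20, 0.3
q = 1 - p
# Exact pmf for every outcome k = 0, 1, ..., n
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)          # both approximately 6.0
print(variance, n * p * q)  # both approximately 4.2
```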

The cumulative distribution function (cdf) evaluated at k is the probability that the random variable X takes a value less than or equal to k. The cdf of the binomial distribution is defined as:

$$ F(k; n, p) = P(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i} p^i q^{n-i} $$

Where, ⌊k⌋ is the greatest integer less than or equal to k.
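The cdf can be sketched the same way by summing pmf terms up to ⌊k⌋. The helper name binomial_cdf is hypothetical; math.floor supplies ⌊k⌋, and the upper limit is capped at n because no outcome exceeds n.

```python
from math import comb, floor

def binomial_cdf(k: float, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); k may be non-integer (illustrative helper)."""
    if k < 0:
        return 0.0  # no outcome is negative
    upper = min(floor(k), n)  # greatest integer <= k, capped at n
    q = 1.0 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(upper + 1))

print(binomial_cdf(10, 20, 0.5))    # P(X <= 10)
print(binomial_cdf(10.7, 20, 0.5))  # same value, since floor(10.7) = 10
```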

The NumPy random.binomial() function returns random samples from a binomial distribution.

Syntax

numpy.random.binomial(n, p, size=None)

Parameters

`n`	`Required.` Specify the number of trials; must be >= 0. int or array_like of ints. Floats are also accepted, but they will be truncated to integers.
`p`	`Required.` Specify the probability of success in each trial; must be in the range [0, 1]. float or array_like of floats.
`size`	`Optional.` Specify the output shape. int or tuple of ints. If the given shape is (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if n and p are both scalars. Otherwise, np.broadcast(n, p).size samples are drawn.
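The rules for size and broadcasting can be seen by checking output shapes. A minimal sketch; the seed and parameter values are arbitrary.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the draws are repeatable

# Scalar n and p with size=None: a single value is returned
single = np.random.binomial(20, 0.5)

# size=(2, 3): 2 * 3 = 6 samples arranged as a 2x3 array
grid = np.random.binomial(20, 0.5, size=(2, 3))

# Array-like n with size=None: np.broadcast(n, p).size samples are drawn
n_values = np.array([10, 20, 30])
per_n = np.random.binomial(n_values, 0.5)

# size must be compatible with the broadcast shape of n and p;
# here each row holds one draw for each of n = 10, 20, 30
rows = np.random.binomial(n_values, 0.5, size=(4, 3))

print(np.ndim(single))  # 0
print(grid.shape)       # (2, 3)
print(per_n.shape)      # (3,)
print(rows.shape)       # (4, 3)
```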

Return Value

Returns samples drawn from the parameterized binomial distribution, where each sample is the number of successes over the n trials. ndarray or scalar.

Example: Values from binomial distribution

In the example below, the random.binomial() function is used to create a matrix of the given shape containing random values drawn from the specified binomial distribution (n = 20, p = 0.5).

import numpy as np

size = (5,3)

sample = np.random.binomial(20, 0.5, size)
print(sample)

A possible output of the above code is:

[[ 8  8 10]
 [ 5  9  8]
 [11  9 12]
 [12  7 11]
 [ 9  9 10]]
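NumPy 1.17 and later also offer a Generator object, created with np.random.default_rng(), whose binomial(n, p, size=None) method takes the same parameters. The sketch below uses it to draw a matrix of the same shape, and then draws a larger sample to check that the sample mean and variance land close to np and npq. The seed 42 and the sample size of 100,000 are arbitrary choices.

```python
import numpy as np

# Generator-based API; the seed 42 is an arbitrary choice
rng = np.random.default_rng(42)

n, p = 20, 0.5
sample = rng.binomial(n, p, size=(5, 3))  # same shape as the example above
print(sample)

# A larger sample to compare sample statistics with theory
big = rng.binomial(n, p, size=100_000)
print(big.mean(), n * p)            # sample mean vs. np = 10.0
print(big.var(), n * p * (1 - p))   # sample variance vs. npq = 5.0
```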

Plotting binomial distribution

Example: Histogram plot

Matplotlib is a plotting library for Python. Its hist() function can be used to draw a histogram of samples from a binomial distribution, which approximates the shape of the distribution's probability mass function (pmf).

import matplotlib.pyplot as plt
import numpy as np

#fixing the seed for reproducibility
#of the result
np.random.seed(10)

size = 10000
#drawing 10000 samples from 
#binomial distribution
sample = np.random.binomial(20, 0.5, size)
#one bin per integer outcome 0, 1, ..., 20,
#centered on the integer value
bins = np.arange(-0.5, 21.5, 1)

plt.hist(sample, bins=bins, edgecolor='blue') 
plt.title("Binomial Distribution") 
plt.show()

The output of the above code will be:
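The histogram can also be checked numerically. With one bin per integer, each bar height divided by the sample size estimates f(k). This sketch uses np.bincount to compute those relative frequencies for the same seed and parameters, and compares them with the exact pmf computed via math.comb.

```python
from math import comb
import numpy as np

np.random.seed(10)  # same seed as the plotting example

n, p, size = 20, 0.5, 10000
sample = np.random.binomial(n, p, size)

# Relative frequency of each outcome k = 0, 1, ..., n
observed = np.bincount(sample, minlength=n + 1) / size

# Exact pmf values for the same outcomes
expected = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

for k in range(5, 16):
    print(f"k={k:2d}  observed={observed[k]:.4f}  pmf={expected[k]:.4f}")
print("largest gap:", np.abs(observed - expected).max())
```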

Example: Comparing pmfs

Multiple mass functions can be compared graphically using the Seaborn kdeplot() function, which draws a smoothed kernel density estimate of each sample. In the example below, the pmfs of three binomial distributions (each with a different number of trials but the same probability of success) are compared.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#fixing the seed for reproducibility
#of the result
np.random.seed(10)

size = 1000
#plotting 1000 samples from 
#three different binomial distributions
sns.kdeplot(np.random.binomial(15, 0.5, size))
sns.kdeplot(np.random.binomial(20, 0.5, size))
sns.kdeplot(np.random.binomial(25, 0.5, size))

plt.legend(["$n = 15, p = 0.5$", 
            "$n = 20, p = 0.5$", 
            "$n = 25, p = 0.5$"])
plt.show()

The output of the above code will be:
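Because kdeplot() treats the data as continuous and smooths it, its curves only approximate the pmf, which is nonzero only at integers. The exact pmf values can be computed directly instead. This sketch finds the most likely number of successes for each of the three distributions in the plot. For odd n with p = 0.5, two adjacent values of k are equally likely, and max() keeps the smaller one.

```python
from math import comb

p = 0.5
modes = {}
for n in (15, 20, 25):
    # Exact pmf over every possible outcome k = 0, 1, ..., n
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    # Most likely number of successes; max() keeps the first of any tie
    mode = max(range(n + 1), key=lambda k: pmf[k])
    modes[n] = mode
    print(f"n={n}: most likely k = {mode}, f({mode}) = {pmf[mode]:.4f}, mean np = {n * p}")
```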

Example: Comparing cdfs

Multiple cumulative distribution functions can be compared graphically using the Seaborn ecdfplot() function, which plots the empirical cdf of each sample. In the example below, the cdfs of three binomial distributions (each with a different number of trials but the same probability of success) are compared.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#fixing the seed for reproducibility
#of the result
np.random.seed(10)

size = 1000
#plotting 1000 samples from 
#three different binomial distributions
sns.ecdfplot(np.random.binomial(15, 0.5, size))
sns.ecdfplot(np.random.binomial(20, 0.5, size))
sns.ecdfplot(np.random.binomial(25, 0.5, size))

plt.legend(["$n = 15, p = 0.5$", 
            "$n = 20, p = 0.5$", 
            "$n = 25, p = 0.5$"])
plt.show()

The output of the above code will be:
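The height of an ECDF curve at k is simply the fraction of samples less than or equal to k. The sketch below computes that fraction at k = n // 2 (an arbitrary evaluation point) for the same three distributions, drawn with the same seed and in the same order as the plot, and compares it with the exact cdf obtained by summing pmf terms.

```python
from math import comb
import numpy as np

np.random.seed(10)  # same seed and draw order as the ecdfplot example

p, size = 0.5, 1000
results = {}
for n in (15, 20, 25):
    sample = np.random.binomial(n, p, size)
    k = n // 2  # evaluation point, chosen arbitrarily
    # Empirical cdf: fraction of draws less than or equal to k
    empirical = np.mean(sample <= k)
    # Exact cdf: sum of pmf values for 0, 1, ..., k
    exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
    results[n] = (empirical, exact)
    print(f"n={n}: ECDF({k}) = {empirical:.3f}, exact F({k}) = {exact:.3f}")
```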