NumPy - Geometric Distribution
The geometric distribution is a discrete probability distribution that models the number of Bernoulli trials (X) needed to get the first success.
The probability mass function (pmf) of the geometric distribution is defined as:

$$P(X = k) = (1 - p)^{k-1} \, p$$

where k is the number of Bernoulli trials (k = 1, 2, ...) and p is the probability of success in each trial.
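As a rough check of the pmf, the sketch below evaluates the standard formula P(X = k) = (1 - p)^(k - 1) * p for the first few values of k and compares it with the relative frequencies observed in a large sample. The helper name geometric_pmf is only illustrative and is not part of NumPy.

```python
import numpy as np

def geometric_pmf(k, p):
    """Illustrative helper (not a NumPy function): P(X = k) for k = 1, 2, ..."""
    k = np.asarray(k)
    return (1.0 - p) ** (k - 1) * p

# fixing the seed so the comparison is reproducible
np.random.seed(10)

p = 0.5
k = np.arange(1, 6)

# theoretical probabilities for k = 1..5
pmf = geometric_pmf(k, p)

# relative frequency of each k in a large sample
sample = np.random.geometric(p, 100000)
empirical = np.array([(sample == i).mean() for i in k])

print("theoretical:", pmf)
print("empirical:  ", empirical)
```

With a sample this large, the two rows should agree to about two decimal places.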
A geometric distribution has mean 1/p and variance (1 - p)/p^2.
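The mean and variance can be checked empirically. The sketch below draws a large sample and compares its sample mean and variance with 1/p and (1 - p)/p^2. The choice of p = 0.25 and the sample size are arbitrary values picked for illustration.

```python
import numpy as np

# fixing the seed so the comparison is reproducible
np.random.seed(10)

p = 0.25
sample = np.random.geometric(p, 200000)

# values predicted by the formulas
theoretical_mean = 1 / p             # 4.0
theoretical_var = (1 - p) / p ** 2   # 12.0

print("mean:    ", sample.mean(), "expected:", theoretical_mean)
print("variance:", sample.var(), "expected:", theoretical_var)
```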
The cumulative distribution function (cdf) evaluated at k is the probability that the random variable (X) takes a value less than or equal to k. The cdf of the geometric distribution is defined as:

$$F(k) = P(X \le k) = 1 - (1 - p)^{k}$$
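The cdf can be sketched in the same way, assuming the standard closed form F(k) = 1 - (1 - p)^k. The example below compares it with the fraction of sampled values that are less than or equal to k. The helper name geometric_cdf is only illustrative and is not part of NumPy.

```python
import numpy as np

def geometric_cdf(k, p):
    """Illustrative helper (not a NumPy function): P(X <= k) for k = 1, 2, ..."""
    k = np.asarray(k)
    return 1.0 - (1.0 - p) ** k

# fixing the seed so the comparison is reproducible
np.random.seed(10)

p = 0.3
k = np.arange(1, 8)

theoretical_cdf = geometric_cdf(k, p)

# the cdf is also the running sum of the pmf
pmf_cumsum = np.cumsum((1.0 - p) ** (k - 1) * p)

# fraction of sampled values that are <= k
sample = np.random.geometric(p, 100000)
empirical_cdf = np.array([(sample <= i).mean() for i in k])

print("theoretical:", theoretical_cdf)
print("empirical:  ", empirical_cdf)
```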
The NumPy random.geometric() function returns random samples from a geometric distribution.
Syntax
numpy.random.geometric(p, size=None)
Parameters
p |
Required. Probability of success in each trial. Must be in the range (0, 1]. float or array_like of floats. |
size |
Optional. Output shape. int or tuple of ints. If the given shape is (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if p is a scalar. Otherwise, np.array(p).size samples are drawn. |
Return Value
Returns samples from the parameterized geometric distribution. ndarray or scalar.
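The following sketch illustrates how p and size interact. A scalar p with no size gives a single value, an array-like p gives one sample per probability, and a size whose last dimension matches the length of p broadcasts p across the rows. The variable names are only illustrative.

```python
import numpy as np

# fixing the seed so the draws are reproducible
np.random.seed(10)

# scalar p, size=None: a single value is returned
single = np.random.geometric(0.5)

# array-like p, size=None: one sample per probability
p = [0.2, 0.5, 0.8]
one_each = np.random.geometric(p)

# p is broadcast against size: column j is drawn with p[j]
grid = np.random.geometric(p, size=(4, 3))

print(np.ndim(single))   # 0
print(one_each.shape)    # (3,)
print(grid.shape)        # (4, 3)
```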
Example: Values from geometric distribution
In the example below, the random.geometric() function is used to create a matrix of a given shape containing random values drawn from a specified geometric distribution.
    import numpy as np

    size = (5, 3)

    sample = np.random.geometric(0.5, size)
    print(sample)
One possible output of the above code is:

    [[2 2 4]
     [2 2 1]
     [1 1 1]
     [1 1 2]
     [5 2 1]]
Plotting geometric distribution
Example: Histogram plot
Matplotlib is a plotting library for Python. Its hist() function can be used to plot a histogram of the samples, which approximates the probability mass function (pmf) of the geometric distribution.
    import matplotlib.pyplot as plt
    import numpy as np

    # fixing the seed for reproducibility of the result
    np.random.seed(10)

    size = 10000
    # drawing 10000 samples from a geometric distribution
    sample = np.random.geometric(0.5, size)

    # bin edges at the integers 0, 1, ..., 19
    bins = np.arange(0, 20, 1)

    plt.hist(sample, bins=bins, edgecolor='blue')
    plt.title("Geometric Distribution")
    plt.show()
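The code above displays a histogram of the 10000 sampled values. The tallest bar is at 1, and each subsequent bar is roughly half the height of the previous one.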
Example: Comparing cdfs
Multiple cumulative distribution functions can be compared graphically using the Seaborn ecdfplot() function. In the example below, the cdfs of three geometric distributions (each with a different success probability) are compared.
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    # fixing the seed for reproducibility of the result
    np.random.seed(10)

    size = 1000
    # plotting 1000 samples from each of three geometric distributions
    sns.ecdfplot(np.random.geometric(0.2, size))
    sns.ecdfplot(np.random.geometric(0.5, size))
    sns.ecdfplot(np.random.geometric(0.8, size))

    plt.legend(["$p = 0.2$", "$p = 0.5$", "$p = 0.8$"])
    plt.show()
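The code above displays three empirical cdf curves. The curve for p = 0.8 rises toward 1 the fastest, and the curve for p = 0.2 rises the slowest.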