
SciPy - Normal Distribution



The normal (Gaussian) distribution is a continuous probability distribution that describes how the values of a variable are distributed. It is symmetric about its mean: most observations cluster around the mean, and the probability of values further from the mean tapers off equally in both directions. It is one of the most important distributions in statistics because many real-world quantities are approximately normal, for example the heights of people in a population or measurement errors.

The probability density function (pdf) of the normal distribution is defined as:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

Where, μ is the mean or expectation of the distribution and σ is the standard deviation of the distribution.

A normal distribution has mean μ and variance σ². A normal distribution with μ=0 and σ=1 is called the standard normal distribution.

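The cumulative distribution function (cdf) evaluated at x is the probability that the random variable X will take a value less than or equal to x. The cdf of the normal distribution is defined as:

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}\, dt = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$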
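As a quick sanity check, the pdf and cdf definitions can be written out by hand and compared with the values returned by scipy.stats.norm. This is a minimal sketch: the parameter values (mean 0, standard deviation 2) and the names pdf_manual and cdf_manual are chosen only for illustration.

```python
import math

import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 2.0            # illustrative parameters, not prescribed by the text
x = np.linspace(-6, 6, 7)       # a few evaluation points

# pdf written out from its definition
pdf_manual = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# cdf written out with the error function erf
cdf_manual = np.array(
    [0.5 * (1 + math.erf((xi - mu) / (sigma * math.sqrt(2)))) for xi in x]
)

# Both comparisons should print True
print(np.allclose(pdf_manual, norm.pdf(x, loc=mu, scale=sigma)))
print(np.allclose(cdf_manual, norm.cdf(x, loc=mu, scale=sigma)))
```

Note that σ enters the formulas as the standard deviation, not the variance, which matches the meaning of the scale argument in SciPy.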

The scipy.stats.norm object provides the methods needed to generate and work with a normal distribution. The most frequently used methods are listed below:

Syntax

scipy.stats.norm.pdf(x, loc=0, scale=1)
scipy.stats.norm.cdf(x, loc=0, scale=1)
scipy.stats.norm.ppf(q, loc=0, scale=1)
scipy.stats.norm.rvs(loc=0, scale=1, size=1)

Parameters

x Required. Specify float or array_like of floats: the value(s) of the random variable at which to evaluate the function.
q Required. Specify float or array_like of floats representing probabilities, each between 0 and 1.
loc Optional. Specify mean of the distribution. Default is 0.0.
scale Optional. Specify standard deviation of the distribution. Must be positive. Default is 1.0.
size Optional. Specify output shape (an int or a tuple of ints). Default is 1.
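The following sketch shows how these parameters can be passed in practice. It assumes nothing beyond the signatures listed above, plus the "frozen" form norm(loc=..., scale=...) that SciPy distributions also support. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

# loc and scale can be given by position or by keyword
a = norm.pdf(1.0, 0, 2)
b = norm.pdf(1.0, loc=0, scale=2)

# x may be an array; the result has the same shape as the input
values = norm.cdf(np.array([-2.0, 0.0, 2.0]), loc=0, scale=2)

# Freezing the distribution fixes loc and scale once for later calls
dist = norm(loc=0, scale=2)
c = dist.pdf(1.0)

print(np.isclose(a, b), np.isclose(a, c))
print(values.shape)
print(dist.mean(), dist.std())
```

Freezing is convenient when several methods are called with the same mean and standard deviation, since the parameters do not have to be repeated.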

norm.pdf()

The norm.pdf() function evaluates the probability density function (pdf) of the distribution at x.

from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

#creating an array of values between
#-10 to 10 with a difference of 0.1
x = np.arange(-10, 10, 0.1)
   
y = norm.pdf(x, 0, 2)
   
plt.plot(x, y) 
plt.show()

The output of the above code will be:

[Figure: pdf of the normal distribution with mean 0 and standard deviation 2]
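The values returned by norm.pdf() are densities, not probabilities. The sketch below illustrates this numerically, using scipy.integrate.quad for the integration. The parameters and variable names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 0, 2

# Total area under the pdf over the whole real line (should be very close to 1)
area, abs_err = quad(norm.pdf, -np.inf, np.inf, args=(mu, sigma))

# Height of the curve at the mean, compared with 1 / (sigma * sqrt(2 * pi))
peak = norm.pdf(mu, mu, sigma)
expected_peak = 1 / (sigma * np.sqrt(2 * np.pi))

# With a small standard deviation the density at the mean exceeds 1
narrow_peak = norm.pdf(0, 0, 0.1)

print(area)
print(peak, expected_peak)
print(narrow_peak)
```

A density above 1 is not an error: only the area under the curve, not its height, has to stay within 0 and 1.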

norm.cdf()

The norm.cdf() function evaluates the cumulative distribution function (cdf) of the distribution at x, that is, the probability P(X ≤ x).

from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

#creating an array of values between
#-10 to 10 with a difference of 0.1
x = np.arange(-10, 10, 0.1)
   
y = norm.cdf(x, 0, 2)
   
plt.plot(x, y) 
plt.show()

The output of the above code will be:

[Figure: cdf of the normal distribution with mean 0 and standard deviation 2]
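A common use of the cdf is computing the probability that a value falls inside an interval, since P(a < X ≤ b) = cdf(b) - cdf(a). The sketch below applies this to intervals of one, two and three standard deviations around the mean. The helper interval_probability and the dictionary coverage are hypothetical names introduced for this example and are not part of SciPy.

```python
from scipy.stats import norm

mu, sigma = 0, 2


def interval_probability(a, b, loc=0.0, scale=1.0):
    """Hypothetical helper: P(a < X <= b) for a normal distribution."""
    return norm.cdf(b, loc, scale) - norm.cdf(a, loc, scale)


# Probability of landing within k standard deviations of the mean
coverage = {
    k: interval_probability(mu - k * sigma, mu + k * sigma, mu, sigma)
    for k in (1, 2, 3)
}

for k, p in coverage.items():
    print(k, round(p, 4))   # roughly 0.6827, 0.9545, 0.9973
```

These three values do not depend on the particular mean and standard deviation chosen.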

norm.ppf()

The norm.ppf() function evaluates the percent point function (ppf), which is the inverse of the cdf. It takes a probability q and returns the value x for which P(X ≤ x) = q.

from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

#creating an array of probabilities from
#0.01 to 0.99 with a difference of 0.01
#(0 is excluded because ppf(0) is -inf)
x = np.arange(0.01, 1, 0.01)
   
y = norm.ppf(x, 0, 2)
   
plt.plot(x, y) 
plt.show()

The output of the above code will be:

[Figure: ppf of the normal distribution with mean 0 and standard deviation 2]
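Because the ppf is the inverse of the cdf, applying norm.cdf() to the output of norm.ppf() should recover the original probabilities. The sketch below checks this and also computes the familiar two-sided 95% critical value of the standard normal distribution. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

q = np.array([0.025, 0.5, 0.975])

# Quantiles of a normal distribution with mean 0 and standard deviation 2
x = norm.ppf(q, loc=0, scale=2)

# Feeding the quantiles back into the cdf recovers the probabilities
round_trip = norm.cdf(x, loc=0, scale=2)
print(np.allclose(round_trip, q))

# Critical value of the standard normal for a two-sided 95% interval
z = norm.ppf(0.975)
print(round(z, 2))          # about 1.96

# At the ends of the probability range the quantiles are infinite
print(norm.ppf(0), norm.ppf(1))
```

The median (q = 0.5) coincides with the mean here, since the distribution is symmetric.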

norm.rvs()

The norm.rvs() function generates an array containing the specified number of random numbers drawn from the given normal distribution. In the example below, a histogram is plotted to visualize the result.

from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

#fixing the seed for reproducibility
#of the result
np.random.seed(10)

#creating a vector containing 10000
#normally distributed random numbers
y = norm.rvs(0, 1, 10000)

#creating bin edges starting at -4 with a width of 0.1
#(named bins to avoid shadowing the built-in bin())
bins = np.arange(-4,4,0.1)  

plt.hist(y, bins=bins, edgecolor='blue') 
plt.show()

The output of the above code will be:

[Figure: histogram of 10000 random numbers drawn from the standard normal distribution]
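Besides plotting a histogram, the generated sample can be checked numerically. The sketch below uses the random_state argument of rvs() as an alternative way to get reproducible output without setting the global NumPy seed. The seed value 10 and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Draw 10000 values from the standard normal distribution;
# random_state makes the draw reproducible
samples = norm.rvs(loc=0, scale=1, size=10000, random_state=10)

print(samples.shape)
# Sample statistics should be close to loc=0 and scale=1
print(samples.mean(), samples.std())

# The same random_state gives the same sample again
again = norm.rvs(loc=0, scale=1, size=10000, random_state=10)
print(np.array_equal(samples, again))
```

The sample mean and standard deviation only approximate the distribution parameters, and the approximation improves as size grows.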