Pandas DataFrame - cov() function
The Pandas DataFrame cov() function computes the pairwise covariance of columns, excluding NA/null values. The returned DataFrame is the covariance matrix of the columns of the DataFrame. Missing values are excluded pair by pair: for each pair of columns, only the rows where both values are present are used in the calculation.
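To illustrate what "pairwise" means, the minimal sketch below uses a small made-up DataFrame df (its name and values are assumptions for illustration, not part of the original examples). It recomputes the x/y entry by hand from only the rows where both values are present, using the sample-covariance divisor N - 1, and compares the result with the value returned by cov().

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed): column "y" has one missing value
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, np.nan, 5.0, 9.0],
})

# Keep only the rows where both x and y are present (pairwise deletion)
pair = df[["x", "y"]].dropna()
x, y = pair["x"], pair["y"]

# Sample covariance with divisor N - 1 (the default ddof=1)
manual = ((x - x.mean()) * (y - y.mean())).sum() / (len(pair) - 1)

# The matching entry of the covariance matrix computed by pandas
from_pandas = df.cov().loc["x", "y"]
print(manual, from_pandas)
```

Both values agree, because the row with the missing "y" value is dropped only for the x/y pair; the variance of x on the diagonal still uses all four rows.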
Syntax
DataFrame.cov(min_periods=None, ddof=1)
Parameters
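min_periods | Optional. An int specifying the minimum number of observations required per pair of columns to have a valid result. Pairs with fewer shared observations get NaN. Default is None.
ddof | Optional. Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of observations. Default is 1. According to the pandas documentation, this argument applies only when the DataFrame contains no NaN values.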
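A short sketch of both parameters, assuming a small made-up DataFrame df (names and values are illustrative only). min_periods turns entries with too few shared observations into NaN. ddof is demonstrated on a column with no missing values, since the pandas documentation notes that it only applies when the data contain no NaN; the ddof parameter requires pandas 1.1.0 or later.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed): column "b" is missing two values
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, np.nan, np.nan, 3.0],
})

# Only 2 rows contain both a and b, so with min_periods=3 the (a, b)
# and (b, b) entries become NaN, while (a, a) uses all 4 rows
strict = df.cov(min_periods=3)
print(strict)

# ddof changes the divisor from N - 1 (default) to N - ddof.
# Shown on column "a" only, which has no missing values
sample = df[["a"]].cov()             # divisor N - 1 = 3
population = df[["a"]].cov(ddof=0)   # divisor N = 4
print(sample.loc["a", "a"], population.loc["a", "a"])
```

Using ddof=0 gives the population covariance, which is what some other tools report by default, so setting it explicitly can help when comparing results.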
Return Value
Returns a DataFrame containing the covariance matrix of the DataFrame's columns.
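The returned DataFrame has a few properties that are easy to check. The sketch below, using an assumed illustrative DataFrame df with no missing values, confirms that the result is square and labeled by the column names, that it is symmetric, and that its diagonal matches each column's variance from DataFrame.var() (both use ddof=1 by default).

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed), no missing values
df = pd.DataFrame({
    "p": [2.0, 4.0, 4.0, 5.0],
    "q": [1.0, 3.0, 2.0, 6.0],
})

matrix = df.cov()

# Square DataFrame whose index and columns are the column names
print(matrix.shape)                            # (2, 2)

# Symmetric: cov(p, q) equals cov(q, p)
print(np.allclose(matrix, matrix.T))           # True

# The diagonal holds each column's variance
print(np.allclose(np.diag(matrix), df.var()))  # True
```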
Example: Creating a covariance matrix using whole DataFrame
In the example below, a DataFrame called report is created. The cov() function is then used to compute the covariance matrix of all numeric columns of the DataFrame.
import pandas as pd
import numpy as np

report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03]},
  index= ["Q1", "Q2", "Q3", "Q4"]
)

print(report,"\n")
print(report.cov())
The output of the above code will be:
     GDP   GNP   HDI
Q1  1.02  1.05  1.02
Q2  1.03  0.99  1.01
Q3  1.04   NaN  1.02
Q4  0.98  1.04  1.03 

          GDP       GNP       HDI
GDP  0.000692 -0.000450 -0.000167
GNP -0.000450  0.001033  0.000250
HDI -0.000167  0.000250  0.000067
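Because missing values are excluded pair by pair, the GDP/HDI entry above still uses all four quarters even though GNP is missing in Q3. If every entry should instead be based on the same fully complete rows (listwise deletion), one option is to drop incomplete rows before calling cov(). The sketch below reuses the report data from this example; the variable names pairwise and listwise are illustrative.

```python
import numpy as np
import pandas as pd

report = pd.DataFrame({
    "GDP": [1.02, 1.03, 1.04, 0.98],
    "GNP": [1.05, 0.99, np.nan, 1.04],
    "HDI": [1.02, 1.01, 1.02, 1.03]},
    index=["Q1", "Q2", "Q3", "Q4"]
)

# Pairwise (default): the GDP/HDI entry uses all 4 quarters
pairwise = report.cov()

# Listwise: drop Q3 first, so every entry uses the same 3 quarters
listwise = report.dropna().cov()

print(pairwise.loc["GDP", "HDI"], listwise.loc["GDP", "HDI"])
```

Entries involving GNP are identical in both matrices, since GNP pairs already skip Q3; the GDP/HDI entry changes. Pairwise deletion keeps more data, but different entries may then be based on different sets of rows.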
Example: Creating a covariance matrix using selected columns
Instead of the whole DataFrame, the cov() function can be applied to selected columns. Consider the following example.
import pandas as pd
import numpy as np

report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03],
  "Agriculture": [1.02, 1.02, 0.99, 0.98]},
  index= ["Q1", "Q2", "Q3", "Q4"]
)

#displaying the dataframe
print(report,"\n")

#covariance matrix using two columns
print("report[['GDP', 'HDI']].cov() returns:")
print(report[['GDP', 'HDI']].cov(),"\n")

#covariance matrix using three columns
print("report[['GDP', 'HDI', 'Agriculture']].cov() returns:")
print(report[['GDP', 'HDI', 'Agriculture']].cov(),"\n")
The output of the above code will be:
     GDP   GNP   HDI  Agriculture
Q1  1.02  1.05  1.02         1.02
Q2  1.03  0.99  1.01         1.02
Q3  1.04   NaN  1.02         0.99
Q4  0.98  1.04  1.03         0.98 

report[['GDP', 'HDI']].cov() returns:
          GDP       HDI
GDP  0.000692 -0.000167
HDI -0.000167  0.000067 

report[['GDP', 'HDI', 'Agriculture']].cov() returns:
                  GDP       HDI  Agriculture
GDP          0.000692 -0.000167     0.000275
HDI         -0.000167  0.000067    -0.000133
Agriculture  0.000275 -0.000133     0.000425
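With the default settings, each entry depends only on its own pair of columns, so the matrix for the selected columns matches the corresponding block of the full covariance matrix. The sketch below reuses the report data from this example; the names cols, subset_cov, full_cov and block are illustrative.

```python
import numpy as np
import pandas as pd

report = pd.DataFrame({
    "GDP": [1.02, 1.03, 1.04, 0.98],
    "GNP": [1.05, 0.99, np.nan, 1.04],
    "HDI": [1.02, 1.01, 1.02, 1.03],
    "Agriculture": [1.02, 1.02, 0.99, 0.98]},
    index=["Q1", "Q2", "Q3", "Q4"]
)

cols = ["GDP", "HDI", "Agriculture"]

# Covariance computed from the selected columns only
subset_cov = report[cols].cov()

# The same block sliced out of the full covariance matrix
full_cov = report.cov()
block = full_cov.loc[cols, cols]

print(np.allclose(subset_cov, block))  # True
```

Selecting columns first avoids computing entries that are not needed, which can matter for DataFrames with many columns.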