Pandas DataFrame - corr() function
The Pandas DataFrame corr() function computes the pairwise correlation of columns. NA/null values are excluded automatically: each coefficient is computed only from the rows in which both columns of the pair have a value. The returned DataFrame is the correlation matrix of the columns of the DataFrame.
Syntax
DataFrame.corr(method='pearson', min_periods=1)
Parameters
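method | Optional. Specify the method of correlation. Default is 'pearson'. Possible values are:
    'pearson': standard (Pearson) correlation coefficient
    'kendall': Kendall Tau correlation coefficient
    'spearman': Spearman rank correlation
    callable: a function that takes two 1-D ndarrays as input and returns a float
min_periods | Optional. An int specifying the minimum number of observations required per pair of columns to have a valid result. Default is 1.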
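As a rough illustration of the two parameters, the sketch below compares the default Pearson result with a Spearman result and then raises min_periods. The DataFrame df and its values are made up for this illustration and are not part of the examples on this page.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration; column "b" has one missing value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, np.nan, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Default method is 'pearson'.
pearson = df.corr()

# Rank-based correlation instead of Pearson.
spearman = df.corr(method="spearman")

# Require 5 observations per pair of columns. Every pair that involves "b"
# has only 4 usable rows, so those entries become NaN.
strict = df.corr(min_periods=5)

print(pearson, "\n")
print(spearman, "\n")
print(strict)
```

With min_periods=5, only the entries that involve columns "a" and "c" alone remain valid, because those are the only pairs with five complete observations.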
Return Value
Returns a DataFrame containing the correlation matrix of the columns of the DataFrame. Its row and column labels are the column names of the original DataFrame.
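The shape of the returned object can be checked with a short sketch like the one below. The DataFrame df and its values are hypothetical and chosen only to show that the result is a square, symmetric DataFrame from which a single coefficient can be read with loc.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration, no missing values.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.1, 3.9, 6.2, 7.8],
    "z": [4.0, 1.0, 3.0, 2.0],
})

corr_matrix = df.corr()

# The result is a DataFrame with one row and one column per input column.
print(type(corr_matrix).__name__)
print(corr_matrix.shape)

# The matrix is symmetric, and each column correlates perfectly with itself.
print(np.allclose(corr_matrix, corr_matrix.T))
print(np.allclose(np.diag(corr_matrix), 1.0))

# Reading one coefficient by row and column label.
xy = corr_matrix.loc["x", "y"]
print(xy)
```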
Example: Creating a correlation matrix using the whole DataFrame
In the example below, a DataFrame report is created. The corr() function is used to create a correlation matrix using all numeric columns of the DataFrame.
import pandas as pd
import numpy as np

report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03]},
  index= ["Q1", "Q2", "Q3", "Q4"]
)

print(report,"\n")
print(report.corr())
The output of the above code will be:
     GDP   GNP   HDI
Q1  1.02  1.05  1.02
Q2  1.03  0.99  1.01
Q3  1.04   NaN  1.02
Q4  0.98  1.04  1.03 

          GDP       GNP       HDI
GDP  1.000000 -0.529107 -0.776151
GNP -0.529107  1.000000  0.777714
HDI -0.776151  0.777714  1.000000
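Because missing values are excluded pair by pair, the NaN in GNP affects only the coefficients that involve GNP. The GDP/HDI coefficient still uses all four quarters. The sketch below contrasts this with dropping incomplete rows first using dropna(), which is a different technique and not what corr() does by default. The variable names pairwise and listwise are introduced here for illustration.

```python
import numpy as np
import pandas as pd

report = pd.DataFrame({
    "GDP": [1.02, 1.03, 1.04, 0.98],
    "GNP": [1.05, 0.99, np.nan, 1.04],
    "HDI": [1.02, 1.01, 1.02, 1.03]},
    index=["Q1", "Q2", "Q3", "Q4"]
)

# Default behaviour: rows are excluded per pair of columns.
pairwise = report.corr()

# Alternative: drop every row that has any missing value, then correlate.
listwise = report.dropna().corr()

# GDP/HDI uses 4 rows in the first case and 3 rows in the second,
# so the two coefficients differ.
print(pairwise.loc["GDP", "HDI"])
print(listwise.loc["GDP", "HDI"])

# Pairs involving GNP use the same 3 rows either way, so they agree.
print(np.isclose(pairwise.loc["GDP", "GNP"], listwise.loc["GDP", "GNP"]))
```

Which approach is appropriate depends on the analysis. Pairwise exclusion keeps more data per coefficient, while dropping rows first guarantees that every coefficient is computed from the same set of observations.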
Example: Creating a correlation matrix using selected columns
Instead of the whole DataFrame, the corr() function can be applied to selected columns. Consider the following example.
import pandas as pd
import numpy as np

report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03],
  "Agriculture": [1.02, 1.02, 0.99, 0.98]},
  index= ["Q1", "Q2", "Q3", "Q4"]
)

#displaying the dataframe
print(report,"\n")

#correlation matrix using two columns
print("report[['GDP', 'HDI']].corr() returns:")
print(report[['GDP', 'HDI']].corr(),"\n")

#correlation matrix using three columns
print("report[['GDP', 'HDI', 'Agriculture']].corr() returns:")
print(report[['GDP', 'HDI', 'Agriculture']].corr(),"\n")
The output of the above code will be:
     GDP   GNP   HDI  Agriculture
Q1  1.02  1.05  1.02         1.02
Q2  1.03  0.99  1.01         1.02
Q3  1.04   NaN  1.02         0.99
Q4  0.98  1.04  1.03         0.98 

report[['GDP', 'HDI']].corr() returns:
          GDP       HDI
GDP  1.000000 -0.776151
HDI -0.776151  1.000000 

report[['GDP', 'HDI', 'Agriculture']].corr() returns:
                  GDP       HDI  Agriculture
GDP          1.000000 -0.776151     0.507212
HDI         -0.776151  1.000000    -0.792118
Agriculture  0.507212 -0.792118     1.000000
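When only one pair of columns is of interest, a single number may be more convenient than a 2x2 matrix. One possible approach, sketched below, is the Series corr() method, which returns a scalar. The variable names gdp_hdi and from_matrix are introduced here for illustration.

```python
import numpy as np
import pandas as pd

report = pd.DataFrame({
    "GDP": [1.02, 1.03, 1.04, 0.98],
    "GNP": [1.05, 0.99, np.nan, 1.04],
    "HDI": [1.02, 1.01, 1.02, 1.03],
    "Agriculture": [1.02, 1.02, 0.99, 0.98]},
    index=["Q1", "Q2", "Q3", "Q4"]
)

# Series.corr() returns one coefficient (Pearson by default).
gdp_hdi = report["GDP"].corr(report["HDI"])

# The same value read from the two-column correlation matrix.
from_matrix = report[["GDP", "HDI"]].corr().loc["GDP", "HDI"]

print(gdp_hdi)
print(np.isclose(gdp_hdi, from_matrix))
```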