Pandas DataFrame - diff() function
The Pandas DataFrame diff() function calculates the difference between each element of a DataFrame and another element of the same DataFrame (by default, the element in the previous row).
Syntax
DataFrame.diff(periods=1, axis=0)
Parameters
periods | Optional. Specify the number of periods to shift when calculating the difference (negative values can also be used). Default: 1
axis | Optional. Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', the difference is taken down each column (between rows). If 1 or 'columns', the difference is taken across each row (between columns). Default: 0
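The effect of the sign of periods can be seen in a short sketch. The data below is made up for illustration and the variable names backward and forward are not part of pandas. With a positive value each element is compared with an earlier row, and with a negative value it is compared with a later row.

```python
import pandas as pd

# Small illustrative DataFrame (values chosen only for this sketch)
df = pd.DataFrame({"GDP": [1.5, 2.5, 3.5, 1.5]},
                  index=["2015", "2016", "2017", "2018"])

# periods=1 (default): each row minus the previous row,
# so the first row has nothing to compare against and is NaN
backward = df.diff()

# periods=-1: each row minus the next row,
# so the last row has nothing to compare against and is NaN
forward = df.diff(periods=-1)

print(backward)
print(forward)
```

In this sketch the NaN moves from the first row to the last row when periods changes from 1 to -1.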
Return Value
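Returns a DataFrame of the same shape as the original, containing the first discrete difference of each element (or the difference over the specified number of periods). Positions that have no element to compare against contain NaN.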
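One way to reason about the returned values, for numeric data, is that diff(periods) gives the same numbers as subtracting a shifted copy of the DataFrame from itself. The following is a minimal sketch of that relationship, not a description of how pandas implements diff() internally. The names via_diff and via_shift are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"GDP": [1.5, 2.5, 3.5, 1.5], "GNP": [1, 2, 3, 3]},
                  index=["2015", "2016", "2017", "2018"])

# Difference over two periods using diff()
via_diff = df.diff(periods=2)

# The same values obtained by subtracting a copy shifted down by two rows
via_shift = df - df.shift(2)

# Raises an AssertionError if the two results differ
pd.testing.assert_frame_equal(via_diff, via_shift)

# The result keeps the shape of the original DataFrame
print(via_diff.shape == df.shape)  # True
```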
Example: using diff() column-wise on whole DataFrame
In the example below, a DataFrame df is created. The diff() function is used to compute the difference between each element and the element in the previous row, and then between each element and the element two rows earlier.
import pandas as pd
import numpy as np

df = pd.DataFrame({
  "GDP": [1.5, 2.5, 3.5, 1.5, 2.5, -1],
  "GNP": [1, 2, 3, 3, 2, -1],
  "HPI": [2, 3, 2, np.nan, 2, 2]},
  index= ["2015", "2016", "2017", "2018", "2019", "2020"]
)

print("The DataFrame is:")
print(df)

#difference from the previous row
print("\ndf.diff() returns:")
print(df.diff())

#difference from the row two periods earlier
print("\ndf.diff(2) returns:")
print(df.diff(2))
The output of the above code will be:
The DataFrame is:
      GDP  GNP  HPI
2015  1.5    1  2.0
2016  2.5    2  3.0
2017  3.5    3  2.0
2018  1.5    3  NaN
2019  2.5    2  2.0
2020 -1.0   -1  2.0

df.diff() returns:
      GDP  GNP  HPI
2015  NaN  NaN  NaN
2016  1.0  1.0  1.0
2017  1.0  1.0 -1.0
2018 -2.0  0.0  NaN
2019  1.0 -1.0  NaN
2020 -3.5 -3.0  0.0

df.diff(2) returns:
      GDP  GNP  HPI
2015  NaN  NaN  NaN
2016  NaN  NaN  NaN
2017  2.0  2.0  0.0
2018 -1.0  1.0  NaN
2019 -1.0 -1.0  0.0
2020 -2.5 -4.0  NaN
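Note that diff(2) subtracts the element two rows earlier. It is not a second-order difference (the difference of the differences). If a second-order difference is needed, one possible approach is to call diff() twice. The sketch below reuses the GDP values from the example above. The names lag2 and second_order are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"GDP": [1.5, 2.5, 3.5, 1.5, 2.5, -1]},
                  index=["2015", "2016", "2017", "2018", "2019", "2020"])

# Lag-2 difference: each value minus the value two rows earlier
lag2 = df.diff(2)

# Second-order difference: the difference of the first differences
second_order = df.diff().diff()

print(lag2)
print(second_order)
```

Both results start with two NaN rows, but the remaining values differ: for 2017, lag2 gives 3.5 - 1.5 = 2.0, while second_order gives 1.0 - 1.0 = 0.0.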
Example: using diff() row-wise on whole DataFrame
To perform the operation row-wise, the axis parameter can be set to 1.
import pandas as pd
import numpy as np

df = pd.DataFrame({
  "2015": [1.5, 1, 2],
  "2016": [2.5, 2, 3],
  "2017": [3.5, 3, 2],
  "2018": [1.5, 3, np.nan],
  "2019": [2.5, 2, 2],
  "2020": [-1, -1, 2]},
  index= ["GDP", "GNP", "HDI"]
)

print("The DataFrame is:")
print(df)

#difference from the previous column
print("\ndf.diff(axis=1) returns:")
print(df.diff(axis=1))

#difference from the column two periods earlier
print("\ndf.diff(2, axis=1) returns:")
print(df.diff(2, axis=1))
The output of the above code will be:
The DataFrame is:
     2015  2016  2017  2018  2019  2020
GDP   1.5   2.5   3.5   1.5   2.5    -1
GNP   1.0   2.0   3.0   3.0   2.0    -1
HDI   2.0   3.0   2.0   NaN   2.0     2

df.diff(axis=1) returns:
     2015  2016  2017  2018  2019  2020
GDP   NaN   1.0   1.0  -2.0   1.0  -3.5
GNP   NaN   1.0   1.0   0.0  -1.0  -3.0
HDI   NaN   1.0  -1.0   NaN   NaN   0.0

df.diff(2, axis=1) returns:
     2015  2016  2017  2018  2019  2020
GDP   NaN   NaN   2.0  -1.0  -1.0  -2.5
GNP   NaN   NaN   2.0   1.0  -1.0  -4.0
HDI   NaN   NaN   0.0   NaN   0.0   NaN
Example: using diff() on selected column
Instead of the whole DataFrame, the diff() function can be applied to selected columns. Consider the following example.
import pandas as pd
import numpy as np

df = pd.DataFrame({
  "GDP": [1.5, 2.5, 3.5, 1.5, 2.5, -1],
  "GNP": [1, 2, 3, 3, 2, -1],
  "HPI": [2, 3, 2, np.nan, 2, 2]},
  index= ["2015", "2016", "2017", "2018", "2019", "2020"]
)

print("The DataFrame is:")
print(df)

#first discrete difference of a single column (returns a Series)
print("\ndf['GDP'].diff() returns:")
print(df['GDP'].diff())

#first discrete difference of multiple columns (returns a DataFrame)
print("\ndf[['GDP', 'GNP']].diff() returns:")
print(df[['GDP', 'GNP']].diff())
The output of the above code will be:
The DataFrame is:
      GDP  GNP  HPI
2015  1.5    1  2.0
2016  2.5    2  3.0
2017  3.5    3  2.0
2018  1.5    3  NaN
2019  2.5    2  2.0
2020 -1.0   -1  2.0

df['GDP'].diff() returns:
2015    NaN
2016    1.0
2017    1.0
2018   -2.0
2019    1.0
2020   -3.5
Name: GDP, dtype: float64

df[['GDP', 'GNP']].diff() returns:
      GDP  GNP
2015  NaN  NaN
2016  1.0  1.0
2017  1.0  1.0
2018 -2.0  0.0
2019  1.0 -1.0
2020 -3.5 -3.0