
Pandas DataFrame - diff() function



The Pandas DataFrame diff() function calculates the difference between each element of a DataFrame and another element of the same DataFrame. By default, each element is compared with the element in the previous row.

Syntax

DataFrame.diff(periods=1, axis=0)

Parameters

periods Optional. Specify the number of periods to shift when calculating the difference. Negative values are accepted and compare each element with a later one. Default: 1
axis Optional. Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', the difference is taken down the rows, separately for each column. If 1 or 'columns', the difference is taken across the columns, separately for each row. Default: 0
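
As a minimal sketch of the periods parameter with a negative value, the snippet below uses a small made-up DataFrame (the column name x and its values are assumptions for illustration only). With periods=-1, each element is compared with the element in the next row, so the missing value appears in the last row instead of the first.

```python
import pandas as pd

# Hypothetical data, chosen only to make the arithmetic easy to follow
df = pd.DataFrame({"x": [1, 4, 9, 16]})

# periods=-1: each element minus the element in the NEXT row
forward = df.diff(periods=-1)
print(forward)
```

The last row has no following row to compare against, so it holds NaN.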

Return Value

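Returns a DataFrame containing the first discrete difference of each element (or the difference over the specified period). Positions that have no element to compare against contain NaN.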
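
One way to picture the returned values is as the DataFrame minus a shifted copy of itself. The sketch below checks this on made-up data (the column names a and b are assumptions for illustration); it describes the values that come back, not how pandas computes them internally.

```python
import pandas as pd

# Hypothetical float data for illustration
df = pd.DataFrame({"a": [1.0, 3.0, 6.0, 10.0], "b": [2.0, 2.5, 4.5, 4.0]})

via_diff = df.diff(periods=1)
via_shift = df - df.shift(periods=1)

# Compare the two results; both have NaN in the first row
print(via_diff)
print(via_diff.equals(via_shift))
```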

Example: using diff() column-wise on whole DataFrame

In the example below, a DataFrame df is created. The diff() function is then used to calculate the difference of each element, first with the default period of 1 and then with a period of 2.

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "GDP": [1.5, 2.5, 3.5, 1.5, 2.5, -1],
  "GNP": [1, 2, 3, 3, 2, -1],
  "HPI": [2, 3, 2, np.nan, 2, 2]},
  index=["2015", "2016", "2017",
         "2018", "2019", "2020"]
)

print("The DataFrame is:")
print(df)

#first discrete difference of element
print("\ndf.diff() returns:")
print(df.diff())

#difference with a period of 2 (each element minus the element two rows above)
print("\ndf.diff(2) returns:")
print(df.diff(2))

The output of the above code will be:

The DataFrame is:
      GDP  GNP  HPI
2015  1.5    1  2.0
2016  2.5    2  3.0
2017  3.5    3  2.0
2018  1.5    3  NaN
2019  2.5    2  2.0
2020 -1.0   -1  2.0

df.diff() returns:
      GDP  GNP  HPI
2015  NaN  NaN  NaN
2016  1.0  1.0  1.0
2017  1.0  1.0 -1.0
2018 -2.0  0.0  NaN
2019  1.0 -1.0  NaN
2020 -3.5 -3.0  0.0

df.diff(2) returns:
      GDP  GNP  HPI
2015  NaN  NaN  NaN
2016  NaN  NaN  NaN
2017  2.0  2.0  0.0
2018 -1.0  1.0  NaN
2019 -1.0 -1.0  0.0
2020 -2.5 -4.0  NaN
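
Note that diff(2) subtracts the element two rows above; it is not the second-order difference (the difference of the differences). As a rough sketch, the snippet below compares the two on the GDP values from this example, with the second-order difference obtained by chaining diff() twice.

```python
import pandas as pd

gdp = pd.Series([1.5, 2.5, 3.5, 1.5, 2.5, -1],
                index=["2015", "2016", "2017", "2018", "2019", "2020"],
                name="GDP")

lag_two = gdp.diff(2)              # element minus the element two rows above
second_order = gdp.diff().diff()   # difference of the first differences

# Show both results side by side
print(pd.DataFrame({"diff(2)": lag_two, "diff().diff()": second_order}))
```

Both results start with two NaN values, but the remaining values differ.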

Example: using diff() row-wise on whole DataFrame

To perform the operation row-wise, set the axis parameter to 1. Each element is then compared with the element in the previous column of the same row.

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "2015": [1.5, 1, 2],
  "2016": [2.5, 2, 3],
  "2017": [3.5, 3, 2],
  "2018": [1.5, 3, np.nan],
  "2019": [2.5, 2, 2],
  "2020": [-1, -1, 2]},
  index=["GDP", "GNP", "HDI"]
)

print("The DataFrame is:")
print(df)

#first discrete difference of element
print("\ndf.diff(axis=1) returns:")
print(df.diff(axis=1))

#difference with a period of 2 (each element minus the element two columns to the left)
print("\ndf.diff(2, axis=1) returns:")
print(df.diff(2, axis=1))

The output of the above code will be:

The DataFrame is:
     2015  2016  2017  2018  2019  2020
GDP   1.5   2.5   3.5   1.5   2.5    -1
GNP   1.0   2.0   3.0   3.0   2.0    -1
HDI   2.0   3.0   2.0   NaN   2.0     2

df.diff(axis=1) returns:
     2015  2016  2017  2018  2019  2020
GDP   NaN   1.0   1.0  -2.0   1.0  -3.5
GNP   NaN   1.0   1.0   0.0  -1.0  -3.0
HDI   NaN   1.0  -1.0   NaN   NaN   0.0

df.diff(2, axis=1) returns:
     2015  2016  2017  2018  2019  2020
GDP   NaN   NaN   2.0  -1.0  -1.0  -2.5
GNP   NaN   NaN   2.0   1.0  -1.0  -4.0
HDI   NaN   NaN   0.0   NaN   0.0   NaN
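
The axis can also be given by name, as listed under Parameters. The sketch below, on a reduced all-float version of the data (an assumption made to keep dtypes uniform), compares axis=1, axis='columns', and the equivalent route of transposing, taking the column-wise difference, and transposing back.

```python
import pandas as pd

# Reduced, all-float data so that transposing does not change dtypes
df = pd.DataFrame(
    {"2015": [1.5, 1.0], "2016": [2.5, 2.0], "2017": [3.5, 3.0]},
    index=["GDP", "GNP"],
)

by_number = df.diff(axis=1)          # axis given as a number
by_name = df.diff(axis="columns")    # axis given by name
via_transpose = df.T.diff().T        # column-wise diff on the transposed frame

print(by_number)
print(by_number.equals(by_name))
print(by_number.equals(via_transpose))
```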

Example: using diff() on selected column

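Instead of the whole DataFrame, the diff() function can be applied to selected columns. Applied to a single column it returns a Series; applied to a list of columns it returns a DataFrame. Consider the following example.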

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "GDP": [1.5, 2.5, 3.5, 1.5, 2.5, -1],
  "GNP": [1, 2, 3, 3, 2, -1],
  "HPI": [2, 3, 2, np.nan, 2, 2]},
  index=["2015", "2016", "2017",
         "2018", "2019", "2020"]
)

print("The DataFrame is:")
print(df)

#first discrete difference of single column
print("\ndf['GDP'].diff() returns:")
print(df['GDP'].diff())

#first discrete difference of multiple columns
print("\ndf[['GDP', 'GNP']].diff() returns:")
print(df[['GDP', 'GNP']].diff())

The output of the above code will be:

The DataFrame is:
      GDP  GNP  HPI
2015  1.5    1  2.0
2016  2.5    2  3.0
2017  3.5    3  2.0
2018  1.5    3  NaN
2019  2.5    2  2.0
2020 -1.0   -1  2.0

df['GDP'].diff() returns:
2015    NaN
2016    1.0
2017    1.0
2018   -2.0
2019    1.0
2020   -3.5
Name: GDP, dtype: float64

df[['GDP', 'GNP']].diff() returns:
      GDP  GNP
2015  NaN  NaN
2016  1.0  1.0
2017  1.0  1.0
2018 -2.0  0.0
2019  1.0 -1.0
2020 -3.5 -3.0
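
A common follow-up is to store the difference of a single column next to the original values. The sketch below assumes a hypothetical column name, GDP_change, that is not part of the example above, and then selects the rows where the value fell compared with the previous row.

```python
import pandas as pd

df = pd.DataFrame(
    {"GDP": [1.5, 2.5, 3.5, 1.5, 2.5, -1]},
    index=["2015", "2016", "2017", "2018", "2019", "2020"],
)

# "GDP_change" is a hypothetical column name chosen for this sketch
df["GDP_change"] = df["GDP"].diff()

# Keep only the rows where GDP is lower than in the previous row
declines = df[df["GDP_change"] < 0]
print(declines)
```

The first row is never selected, because its difference is NaN and a comparison with NaN evaluates to False.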
