Pandas - DataFrame Computation Functions

The Pandas package provides a number of computational functions that cover the basic computation operations on a DataFrame or Series. The most frequently used ones are listed below.

Functions	Description
max()	Returns the maximum of the values over the specified axis.
min()	Returns the minimum of the values over the specified axis.
sum()	Returns the sum of the values over the specified axis.
prod()	Returns the product of the values over the specified axis.
count()	Returns the count of non-NA cells for each column or row.
abs()	Returns a Series/DataFrame with absolute numeric value of each element.
round()	Rounds a DataFrame to a specified number of decimal places.

Let's discuss these functions in detail:

max() and min() functions

The Pandas DataFrame max() function returns the maximum of the values over the specified axis, whereas the min() function returns the minimum of the values over the specified axis.

Syntax

DataFrame.max(axis=None, skipna=None, 
              level=None, numeric_only=None)
DataFrame.min(axis=None, skipna=None, 
              level=None, numeric_only=None)

Parameters

`axis`	`Optional.` Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', maximum/minimum of the values are generated for each column. If 1 or 'columns', maximum/minimum of the values are generated for each row. Default: 0
`skipna`	`Optional.` Specify True to exclude NA/null values when computing the result. Default is True.
`level`	`Optional.` Specify level (int or str). If the axis is a MultiIndex (hierarchical), compute the maximum/minimum along a particular level, collapsing into a Series. A str specifies the level name.
`numeric_only`	`Optional.` Specify True to include only float, int or boolean data. Default: False

Example:

In the example below, a DataFrame df is created. The max() and min() functions are used to get the maximum and minimum of the values in each column, respectively.

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "Bonus": [5, 3, 2, 4],
  "Salary": [50, 62, 65, 59]},
  index= ["John", "Marry", "Sam", "Jo"]
)

print("The DataFrame is:")
print(df)

#maximum of values column-wise
print("\ndf.max() returns:")
print(df.max())

#minimum of values column-wise
print("\ndf.min() returns:")
print(df.min())

The output of the above code will be:

The DataFrame is:
       Bonus  Salary
John       5      50
Marry      3      62
Sam        2      65
Jo         4      59

df.max() returns:
Bonus      5
Salary    65
dtype: int64

df.min() returns:
Bonus      2
Salary    50
dtype: int64
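The numeric_only parameter described above matters when a DataFrame mixes text and numeric columns. The following is a minimal sketch, assuming a hypothetical DataFrame with a "Name" column added for illustration; it restricts max() and min() to the numeric columns only.

```python
import pandas as pd

# Hypothetical data: "Name" is a text column added for illustration
df = pd.DataFrame({
  "Name": ["John", "Sam"],
  "Bonus": [5, 2],
  "Salary": [50, 65]
})

# numeric_only=True drops the text column before computing the result
num_max = df.max(numeric_only=True)
num_min = df.min(numeric_only=True)

print(num_max)
print(num_min)
```

Only the "Bonus" and "Salary" columns appear in the result, because the "Name" column is excluded before the computation.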

Example: using axis=1

By using axis=1, the operation can be performed row-wise. Consider the example below:

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "John": [50, 5],
  "Marry": [62, 3],
  "Sam": [65, 2],
  "Jo": [59, 4]},
  index= ["Salary", "Bonus"]
)

print("The DataFrame is:")
print(df)

#maximum of values row-wise
print("\ndf.max(axis=1) returns:")
print(df.max(axis=1))

#minimum of values row-wise
print("\ndf.min(axis=1) returns:")
print(df.min(axis=1))

The output of the above code will be:

The DataFrame is:
        John  Marry  Sam  Jo
Salary    50     62   65  59
Bonus      5      3    2   4

df.max(axis=1) returns:
Salary    65
Bonus      5
dtype: int64

df.min(axis=1) returns:
Salary    50
Bonus      2
dtype: int64
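The skipna parameter controls how missing values affect the result. The sketch below is an illustration only, assuming a variant of the earlier DataFrame in which one Bonus value is replaced by NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Bonus value is missing
df = pd.DataFrame({
  "Bonus": [5, np.nan, 2, 4],
  "Salary": [50, 62, 65, 59]},
  index=["John", "Marry", "Sam", "Jo"]
)

# Default behaviour (skipna=True): the missing value is ignored
max_skip = df.max()

# skipna=False: a column that contains NaN yields NaN
max_keep = df.max(skipna=False)

print("df.max() returns:")
print(max_skip)
print("\ndf.max(skipna=False) returns:")
print(max_keep)
```

With the default, the maximum of "Bonus" is computed from the three available values. With skipna=False, the missing value propagates and the "Bonus" result is NaN, while "Salary" is unaffected.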

sum() and prod() functions

The Pandas DataFrame sum() function returns the sum of the values over the specified axis, whereas the prod() function returns the product of the values over the specified axis.

Syntax

DataFrame.sum(axis=None, skipna=None, level=None, 
              numeric_only=None, min_count=0)
DataFrame.prod(axis=None, skipna=None, level=None, 
               numeric_only=None, min_count=0)

Parameters

`axis`	`Optional.` Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', sums/products are generated for each column. If 1 or 'columns', sums/products are generated for each row. Default: 0
`skipna`	`Optional.` Specify True to exclude NA/null values when computing the result. Default is True.
`level`	`Optional.` Specify level (int or str). If the axis is a MultiIndex (hierarchical), compute the sum/product along a particular level, collapsing into a Series. A str specifies the level name.
`numeric_only`	`Optional.` Specify True to include only float, int or boolean data. Default: False
`min_count`	`Optional.` Specify required number of valid values to perform the operation. If the count of non-NA values is less than the min_count, the result will be NA.

Example:

In the example below, a DataFrame df is created. The sum() and prod() functions are used to get the sum and product of each column, respectively.

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "x": [1, 2, 3, 4],
  "y": [2, 4, 6, 8],
  "z": [3, 5, 7, 9]},
  index= ['a', 'b', 'c', 'd']
)

print("The DataFrame is:")
print(df)

#sum of values column-wise
print("\ndf.sum() returns:")
print(df.sum())

#product of values column-wise
print("\ndf.prod() returns:")
print(df.prod())

The output of the above code will be:

The DataFrame is:
   x  y  z
a  1  2  3
b  2  4  5
c  3  6  7
d  4  8  9

df.sum() returns:
x    10
y    20
z    24
dtype: int64

df.prod() returns:
x     24
y    384
z    945
dtype: int64

Example: using axis=1

By using axis=1, the operation can be performed row-wise. Consider the example below:

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "x": [1, 2, 3, 4],
  "y": [2, 4, 6, 8],
  "z": [3, 5, 7, 9]},
  index= ['a', 'b', 'c', 'd']
)

print("The DataFrame is:")
print(df)

#sum of values row-wise
print("\ndf.sum(axis=1) returns:")
print(df.sum(axis=1))

#product of values row-wise
print("\ndf.prod(axis=1) returns:")
print(df.prod(axis=1))

The output of the above code will be:

The DataFrame is:
   x  y  z
a  1  2  3
b  2  4  5
c  3  6  7
d  4  8  9

df.sum(axis=1) returns:
a     6
b    11
c    16
d    21
dtype: int64

df.prod(axis=1) returns:
a      6
b     40
c    126
d    288
dtype: int64
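The min_count parameter is easiest to understand with missing data. The following is a small sketch, assuming a hypothetical DataFrame in which column "y" contains no valid values at all.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "x" has one valid value, "y" has none
df = pd.DataFrame({
  "x": [1, np.nan, np.nan],
  "y": [np.nan, np.nan, np.nan]
})

# Default min_count=0: an all-NA column sums to 0 and multiplies to 1
sum_default = df.sum()
prod_default = df.prod()

# min_count=1: at least one valid value is required, otherwise the result is NA
sum_min1 = df.sum(min_count=1)

# min_count=2: "x" has only one valid value, so it also becomes NA
sum_min2 = df.sum(min_count=2)

print(sum_default)
print(prod_default)
print(sum_min1)
print(sum_min2)
```

Setting min_count is a way to distinguish "the sum is zero" from "there was nothing to sum".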

count() function

The Pandas DataFrame count() function is used to count non-NA cells for each column or row. The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

Syntax

DataFrame.count(axis=0, level=None, 
                numeric_only=False)

Parameters

`axis`	`Optional.` Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', counts are generated for each column. If 1 or 'columns', counts are generated for each row. Default: 0
`level`	`Optional.` Specify level (int or str). If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name.
`numeric_only`	`Optional.` Specify True to include only float, int or boolean data. Default: False

Example:

In the example below, a DataFrame info is created. The count() function is used to get the count of non-NA values of each column.

import pandas as pd
import numpy as np

info = pd.DataFrame({
  "Person": ["John", "Mary", "Jo", "Sam"],
  "Age": [25, 24, 30, 28],
  "Bonus": ["10K", np.nan, "10K", "9K"]
})

print(info,"\n")
print(info.count())

The output of the above code will be:

  Person  Age Bonus
0   John   25   10K
1   Mary   24   NaN
2     Jo   30   10K
3    Sam   28    9K 

Person    4
Age       4
Bonus     3
dtype: int64

Example: using axis=1

To get the row-wise count, the axis parameter can be set to 1.

import pandas as pd
import numpy as np

info = pd.DataFrame({
  "Person": ["John", "Mary", "Jo", "Sam"],
  "Age": [25, 24, 30, 28],
  "Bonus": ["10K", np.nan, "10K", "9K"]
})

print(info,"\n")
print(info.count(axis=1))

The output of the above code will be:

  Person  Age Bonus
0   John   25   10K
1   Mary   24   NaN
2     Jo   30   10K
3    Sam   28    9K 

0    3
1    2
2    3
3    3
dtype: int64
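To illustrate which values count() treats as NA, the sketch below uses a hypothetical DataFrame holding NaN, None, NaT and an infinite value. It relies on the default settings, under which infinity is counted as a regular value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing entry per column
df = pd.DataFrame({
  "a": [1.0, np.nan, np.inf],                                 # NaN is NA, inf is not (by default)
  "b": ["x", None, "z"],                                      # None is NA
  "c": pd.to_datetime(["2021-01-01", None, "2021-01-03"])     # None becomes NaT, which is NA
})

counts = df.count()
print(counts)
```

Each column reports 2 non-NA cells: the NaN, None and NaT entries are skipped, while the infinite value in column "a" is counted.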

abs() function

The Pandas DataFrame abs() function returns a Series/DataFrame with the absolute numeric value of each element. This function applies only when all elements are numeric.

Syntax

DataFrame.abs()

Parameters

No parameter is required.

Example:

In the example below, a DataFrame df is created. The abs() function is used to get a DataFrame with absolute numeric value of each element.

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "a": [5, -4, 2, -8],
  "b": [-10, -20, 2, -5],
  "c": [10, 20, -30, -5]
})

print("The DataFrame is:")
print(df)

#Getting absolute value of whole dataframe
print("\ndf.abs() returns:")
print(df.abs())

The output of the above code will be:

The DataFrame is:
   a   b   c
0  5 -10  10
1 -4 -20  20
2  2   2 -30
3 -8  -5  -5

df.abs() returns:
   a   b   c
0  5  10  10
1  4  20  20
2  2   2  30
3  8   5   5
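Because abs() expects numeric elements, a DataFrame that also holds text columns needs some preparation. One possible approach, sketched below with a hypothetical DataFrame, is to select the numeric columns first with select_dtypes(), or to call abs() on a single Series.

```python
import pandas as pd

# Hypothetical data: "name" is a text column, the others are numeric
df = pd.DataFrame({
  "name": ["p", "q"],
  "delta": [-1.5, 2.0],
  "steps": [-3, 4]
})

# Keep only the numeric columns, then take absolute values
numeric_abs = df.select_dtypes(include="number").abs()

# abs() is also available on a single column (a Series)
delta_abs = df["delta"].abs()

print(numeric_abs)
print(delta_abs)
```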

round() function

The Pandas DataFrame round() function rounds a DataFrame to a specified number of decimal places.

Syntax

DataFrame.round(decimals=0)

Parameters

`decimals`	`Optional.` Specify an int, dict, or Series to indicate the number of decimal places to round each column to. If an int is provided, every column is rounded to the same number of places. A dict or Series rounds columns to different numbers of places: the dict keys, or the Series index, should name the columns to be rounded. Any columns not included in decimals are left as is. Elements of decimals that are not columns of the input are ignored.

Example:

In the example below, a DataFrame df is created. The round() function is used to round a DataFrame to a specified number of decimal places.

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "Bonus": [5.344, 3.925, 2.150, 4.229],
  "Salary": [60.227, 62.550, 65.725, 59.328]},
  index= ["John", "Marry", "Sam", "Jo"]
)

print("The DataFrame is:")
print(df)

#rounding the whole dataframe to 0 decimal places
print("\ndf.round() returns:")
print(df.round())

#rounding the whole dataframe to 2 decimal places
print("\ndf.round(2) returns:")
print(df.round(2))

The output of the above code will be:

The DataFrame is:
       Bonus  Salary
John   5.344  60.227
Marry  3.925  62.550
Sam    2.150  65.725
Jo     4.229  59.328

df.round() returns:
       Bonus  Salary
John     5.0    60.0
Marry    4.0    63.0
Sam      2.0    66.0
Jo       4.0    59.0

df.round(2) returns:
       Bonus  Salary
John    5.34   60.23
Marry   3.92   62.55
Sam     2.15   65.72
Jo      4.23   59.33
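The decimals parameter also accepts a dict or a Series, as described in the parameter list. The following is a minimal sketch of both forms, assuming a smaller hypothetical DataFrame with the same column names.

```python
import pandas as pd

# Hypothetical data using the same column names as above
df = pd.DataFrame({
  "Bonus": [5.344, 4.229],
  "Salary": [60.227, 59.328]},
  index=["John", "Jo"]
)

# dict: round only "Bonus" to 1 decimal place, "Salary" is left as is
by_dict = df.round({"Bonus": 1})

# Series: the index names the columns, the values give the decimal places
by_series = df.round(pd.Series({"Bonus": 2, "Salary": 0}))

print(by_dict)
print(by_series)
```

In the dict form, "Salary" keeps its original values because it is not listed. In the Series form, "Bonus" is rounded to two places and "Salary" to whole numbers.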