Pandas - DataFrame Computation Functions
The Pandas package contains a number of computational functions that provide the functionality required for basic computations on a DataFrame or Series. The most frequently used computational functions are listed below.
Functions | Description |
---|---|
max() | Returns the maximum of the values over the specified axis. |
min() | Returns the minimum of the values over the specified axis. |
sum() | Returns the sum of the values over the specified axis. |
prod() | Returns the product of the values over the specified axis. |
count() | Returns the count of non-NA cells for each column or row. |
abs() | Returns a Series/DataFrame with absolute numeric value of each element. |
round() | Rounds a DataFrame to a specified number of decimal places. |
Let's discuss these functions in detail:
max() and min() functions
The Pandas DataFrame max() function returns the maximum of the values over the specified axis, whereas the min() function returns the minimum of the values over the specified axis.
Syntax
DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None)

DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None)
Parameters
Parameter | Description |
---|---|
axis | Optional. Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', the maximum/minimum is computed for each column. If 1 or 'columns', the maximum/minimum is computed for each row. Default: 0 |
skipna | Optional. Specify True to exclude NA/null values when computing the result. Default is True. |
level | Optional. Specify level (int or str). If the axis is a MultiIndex (hierarchical), compute the result along a particular level, collapsing into a Series. A str specifies the level name. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...) instead. |
numeric_only | Optional. Specify True to include only float, int or boolean data. Default: False |
Example:
In the example below, a DataFrame df is created. The max() and min() functions are used to get the maximum and minimum of the values in each column, respectively.

    import pandas as pd

    df = pd.DataFrame({
        "Bonus": [5, 3, 2, 4],
        "Salary": [50, 62, 65, 59]},
        index=["John", "Marry", "Sam", "Jo"]
    )

    print("The DataFrame is:")
    print(df)

    # Maximum of values column-wise
    print("\ndf.max() returns:")
    print(df.max())

    # Minimum of values column-wise
    print("\ndf.min() returns:")
    print(df.min())
The output of the above code will be:
    The DataFrame is:
           Bonus  Salary
    John       5      50
    Marry      3      62
    Sam        2      65
    Jo         4      59

    df.max() returns:
    Bonus      5
    Salary    65
    dtype: int64

    df.min() returns:
    Bonus      2
    Salary    50
    dtype: int64
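Example: using skipna and numeric_only

The skipna and numeric_only parameters described above only matter when the data contains missing values or non-numeric columns. The sketch below is a minimal illustration under assumed data: the "Team" column and the missing "Bonus" value are hypothetical and are not part of the earlier example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing Bonus value and one non-numeric column.
df = pd.DataFrame({
    "Bonus": [5.0, np.nan, 2.0, 4.0],
    "Salary": [50, 62, 65, 59],
    "Team": ["A", "B", "A", "B"]},
    index=["John", "Marry", "Sam", "Jo"]
)

# numeric_only=True drops the "Team" column from the result.
# With the default skipna=True, the NaN in "Bonus" is ignored.
max_skip = df.max(numeric_only=True)

# With skipna=False, a column that contains NaN yields NaN.
max_keep = df.max(numeric_only=True, skipna=False)

print("df.max(numeric_only=True) returns:")
print(max_skip)

print("\ndf.max(numeric_only=True, skipna=False) returns:")
print(max_keep)
```

The same two parameters behave the same way for min().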
Example: using axis=1
By using axis=1, the operation can be performed row-wise. Consider the example below:
    import pandas as pd

    df = pd.DataFrame({
        "John": [50, 5],
        "Marry": [62, 3],
        "Sam": [65, 2],
        "Jo": [59, 4]},
        index=["Salary", "Bonus"]
    )

    print("The DataFrame is:")
    print(df)

    # Maximum of values row-wise
    print("\ndf.max(axis=1) returns:")
    print(df.max(axis=1))

    # Minimum of values row-wise
    print("\ndf.min(axis=1) returns:")
    print(df.min(axis=1))
The output of the above code will be:
    The DataFrame is:
            John  Marry  Sam  Jo
    Salary    50     62   65  59
    Bonus      5      3    2   4

    df.max(axis=1) returns:
    Salary    65
    Bonus      5
    dtype: int64

    df.min(axis=1) returns:
    Salary    50
    Bonus      2
    dtype: int64
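Example: aggregating along a MultiIndex level

The level parameter is not available in recent pandas releases (it was removed in pandas 2.0), so per-level maxima and minima are usually computed with groupby(level=...), which works in both older and newer versions. The sketch below assumes a hypothetical two-level index of department and employee name; the department labels are invented for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (department, employee name).
index = pd.MultiIndex.from_tuples(
    [("Sales", "John"), ("Sales", "Marry"), ("IT", "Sam"), ("IT", "Jo")],
    names=["Dept", "Name"],
)
df = pd.DataFrame(
    {"Bonus": [5, 3, 2, 4], "Salary": [50, 62, 65, 59]},
    index=index,
)

# Maximum and minimum per department, i.e. along the "Dept" level.
max_by_dept = df.groupby(level="Dept").max()
min_by_dept = df.groupby(level="Dept").min()

print("Maximum per department:")
print(max_by_dept)

print("\nMinimum per department:")
print(min_by_dept)
```

Each result is a DataFrame with one row per department and one column per original column.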
sum() and prod() functions
The Pandas DataFrame sum() function returns the sum of the values over the specified axis, whereas the prod() function returns the product of the values over the specified axis.
Syntax
DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0)

DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0)
Parameters
Parameter | Description |
---|---|
axis | Optional. Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', sums/products are generated for each column. If 1 or 'columns', sums/products are generated for each row. Default: 0 |
skipna | Optional. Specify True to exclude NA/null values when computing the result. Default is True. |
level | Optional. Specify level (int or str). If the axis is a MultiIndex (hierarchical), compute the result along a particular level, collapsing into a Series. A str specifies the level name. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...) instead. |
numeric_only | Optional. Specify True to include only float, int or boolean data. Default: False |
min_count | Optional. Specify the required number of valid values to perform the operation. If the count of non-NA values is less than min_count, the result will be NA. Default: 0 |
Example:
In the example below, a DataFrame df is created. The sum() and prod() functions are used to get the sum and product of each column, respectively.

    import pandas as pd

    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],
        "z": [3, 5, 7, 9]},
        index=['a', 'b', 'c', 'd']
    )

    print("The DataFrame is:")
    print(df)

    # Sum of values column-wise
    print("\ndf.sum() returns:")
    print(df.sum())

    # Product of values column-wise
    print("\ndf.prod() returns:")
    print(df.prod())
The output of the above code will be:
    The DataFrame is:
       x  y  z
    a  1  2  3
    b  2  4  5
    c  3  6  7
    d  4  8  9

    df.sum() returns:
    x    10
    y    20
    z    24
    dtype: int64

    df.prod() returns:
    x     24
    y    384
    z    945
    dtype: int64
Example: using axis=1
By using axis=1, the operation can be performed row-wise. Consider the example below:
    import pandas as pd

    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],
        "z": [3, 5, 7, 9]},
        index=['a', 'b', 'c', 'd']
    )

    print("The DataFrame is:")
    print(df)

    # Sum of values row-wise
    print("\ndf.sum(axis=1) returns:")
    print(df.sum(axis=1))

    # Product of values row-wise
    print("\ndf.prod(axis=1) returns:")
    print(df.prod(axis=1))
The output of the above code will be:
    The DataFrame is:
       x  y  z
    a  1  2  3
    b  2  4  5
    c  3  6  7
    d  4  8  9

    df.sum(axis=1) returns:
    a     6
    b    11
    c    16
    d    21
    dtype: int64

    df.prod(axis=1) returns:
    a      6
    b     40
    c    126
    d    288
    dtype: int64
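Example: using min_count

The min_count parameter controls what happens when a column has too few valid values. The sketch below uses hypothetical data with missing entries. With the default min_count=0, an all-NA column sums to 0 and multiplies to 1; raising min_count turns such results into NaN instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "x" has 3 valid values, "y" has 1, "z" has none.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0],
    "y": [np.nan, np.nan, np.nan, 8.0],
    "z": [np.nan, np.nan, np.nan, np.nan]},
    index=["a", "b", "c", "d"]
)

# Default min_count=0: NA values are skipped, an empty sum is 0.0.
sum_default = df.sum()

# At least 2 valid values are required, so "y" and "z" become NaN.
sum_min2 = df.sum(min_count=2)

# Default min_count=0: an empty product is 1.0.
prod_default = df.prod()

# At least 1 valid value is required, so "z" becomes NaN.
prod_min1 = df.prod(min_count=1)

print("df.sum() returns:")
print(sum_default)
print("\ndf.sum(min_count=2) returns:")
print(sum_min2)
print("\ndf.prod() returns:")
print(prod_default)
print("\ndf.prod(min_count=1) returns:")
print(prod_min1)
```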
count() function
The Pandas DataFrame count() function is used to count non-NA cells for each column or row. The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.
Syntax
DataFrame.count(axis=0, level=None, numeric_only=False)
Parameters
Parameter | Description |
---|---|
axis | Optional. Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', counts are generated for each column. If 1 or 'columns', counts are generated for each row. Default: 0 |
level | Optional. Specify level (int or str). If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...) instead. |
numeric_only | Optional. Specify True to include only float, int or boolean data. Default: False |
Example:
In the example below, a DataFrame info is created. The count() function is used to get the count of non-NA values of each column.
    import pandas as pd
    import numpy as np

    info = pd.DataFrame({
        "Person": ["John", "Mary", "Jo", "Sam"],
        "Age": [25, 24, 30, 28],
        "Bonus": ["10K", np.nan, "10K", "9K"]
    })

    print(info, "\n")

    # Count of non-NA values in each column
    print(info.count())
The output of the above code will be:
      Person  Age Bonus
    0   John   25   10K
    1   Mary   24   NaN
    2     Jo   30   10K
    3    Sam   28    9K

    Person    4
    Age       4
    Bonus     3
    dtype: int64
Example: using axis=1
To get the row-wise count, the axis parameter can be set to 1.
    import pandas as pd
    import numpy as np

    info = pd.DataFrame({
        "Person": ["John", "Mary", "Jo", "Sam"],
        "Age": [25, 24, 30, 28],
        "Bonus": ["10K", np.nan, "10K", "9K"]
    })

    print(info, "\n")

    # Count of non-NA values in each row
    print(info.count(axis=1))
The output of the above code will be:
      Person  Age Bonus
    0   John   25   10K
    1   Mary   24   NaN
    2     Jo   30   10K
    3    Sam   28    9K

    0    3
    1    2
    2    3
    3    3
    dtype: int64
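Example: which values are treated as NA

The count() function skips None, NaN and NaT, but strings that merely look like missing values are still counted. The sketch below illustrates this with hypothetical data; the "Joined" and "Note" columns are assumptions added for the demonstration.

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing the different NA markers.
info = pd.DataFrame({
    "Person": ["John", None, "Jo", "Sam"],            # None is NA
    "Age": [25, np.nan, 30, 28],                      # NaN is NA
    "Joined": pd.to_datetime(
        ["2020-01-15", None, "2021-06-01", None]),    # NaT is NA
    "Note": ["", "n/a", "NA", "none"]                 # plain strings, not NA
})

# Non-NA count per column and per row.
col_counts = info.count()
row_counts = info.count(axis=1)

print(col_counts)
print()
print(row_counts)
```

Placeholder strings such as "n/a" need to be converted to real NA values first, for example with replace(), before count() will exclude them.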
abs() function
The Pandas DataFrame abs() function returns a Series/DataFrame with the absolute numeric value of each element. This function only applies when all elements are numeric.
Syntax
DataFrame.abs()
Parameters
No parameter is required.
Example:
In the example below, a DataFrame df is created. The abs() function is used to get a DataFrame with the absolute numeric value of each element.

    import pandas as pd

    df = pd.DataFrame({
        "a": [5, -4, 2, -8],
        "b": [-10, -20, 2, -5],
        "c": [10, 20, -30, -5]
    })

    print("The DataFrame is:")
    print(df)

    # Absolute value of every element in the DataFrame
    print("\ndf.abs() returns:")
    print(df.abs())
The output of the above code will be:
    The DataFrame is:
       a   b   c
    0  5 -10  10
    1 -4 -20  20
    2  2   2 -30
    3 -8  -5  -5

    df.abs() returns:
       a   b   c
    0  5  10  10
    1  4  20  20
    2  2   2  30
    3  8   5   5
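Example: applying abs() to numeric columns only

Because abs() expects numeric data, calling it on a DataFrame that also holds text columns typically raises a TypeError. One possible workaround, sketched below with hypothetical column names, is to select the numeric columns with select_dtypes() and apply abs() to those alone.

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns.
df = pd.DataFrame({
    "name": ["p", "q", "r"],
    "delta": [-1.5, 2.0, -3.25],
    "steps": [-4, 0, 7],
})

# Pick out the numeric columns and take absolute values only there.
numeric_cols = df.select_dtypes(include="number").columns
result = df.copy()
result[numeric_cols] = df[numeric_cols].abs()

# abs() is also available on a single column (a Series).
series_abs = df["steps"].abs()

print(result)
print()
print(series_abs)
```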
round() function
The Pandas DataFrame round() function rounds a DataFrame to a specified number of decimal places.
Syntax
DataFrame.round(decimals=0)
Parameters
Parameter | Description |
---|---|
decimals | Optional. Specify an int, dict or Series to indicate the number of decimal places to round each column to. If an int is provided, every column is rounded to the same number of places. A dict or Series rounds to a variable number of places: the dict keys, or the Series index, should specify the names of the columns to be rounded. Any columns not included in decimals are left as is. Elements of decimals which are not columns of the input are ignored. Default: 0 |
Example:
In the example below, a DataFrame df is created. The round() function is used to round a DataFrame to a specified number of decimal places.
    import pandas as pd

    df = pd.DataFrame({
        "Bonus": [5.344, 3.925, 2.150, 4.229],
        "Salary": [60.227, 62.550, 65.725, 59.328]},
        index=["John", "Marry", "Sam", "Jo"]
    )

    print("The DataFrame is:")
    print(df)

    # Rounding the whole DataFrame to 0 decimal places
    print("\ndf.round() returns:")
    print(df.round())

    # Rounding the whole DataFrame to 2 decimal places
    print("\ndf.round(2) returns:")
    print(df.round(2))
The output of the above code will be:
    The DataFrame is:
           Bonus  Salary
    John   5.344  60.227
    Marry  3.925  62.550
    Sam    2.150  65.725
    Jo     4.229  59.328

    df.round() returns:
           Bonus  Salary
    John     5.0    60.0
    Marry    4.0    63.0
    Sam      2.0    66.0
    Jo       4.0    59.0

    df.round(2) returns:
           Bonus  Salary
    John    5.34   60.23
    Marry   3.92   62.55
    Sam     2.15   65.72
    Jo      4.23   59.33
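Example: using a dict or Series for decimals

The decimals parameter also accepts a dict or a Series to round each column to a different number of places. The sketch below reuses the DataFrame from the previous example; the "Tax" key is a hypothetical name that matches no column and is included only to show that such entries are ignored.

```python
import pandas as pd

df = pd.DataFrame({
    "Bonus": [5.344, 3.925, 2.150, 4.229],
    "Salary": [60.227, 62.550, 65.725, 59.328]},
    index=["John", "Marry", "Sam", "Jo"]
)

# dict: "Bonus" to 1 place, "Salary" to 0 places.
# "Tax" is not a column of df, so it is ignored.
by_dict = df.round({"Bonus": 1, "Salary": 0, "Tax": 2})

# Series: only "Bonus" is listed, so "Salary" is left as is.
by_series = df.round(pd.Series({"Bonus": 2}))

print("Rounded with a dict:")
print(by_dict)

print("\nRounded with a Series:")
print(by_series)
```

Values that sit exactly halfway between two candidates, such as 2.150 at one decimal place, can round either way depending on their floating-point representation, so such cases are best checked explicitly.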