Pandas DataFrame - agg() function
The Pandas DataFrame agg() function is used to perform aggregation using one or more operations over the specified axis. The syntax for using this function is given below:
Syntax
DataFrame.agg(func=None, axis=0)
Parameters
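func
Required. Specifies the function or functions used to aggregate the data. A function must work either when passed a DataFrame or when passed to DataFrame.apply. Accepted combinations are:
- a function
- a string function name
- a list of functions and/or function names, e.g. [np.sum, 'mean']
- a dict mapping axis labels to functions, function names, or a list of such
axis
Optional. Specifies the axis along which the function is applied. Default is 0. If 0 or 'index', the function is applied to each column. If 1 or 'columns', the function is applied to each row.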
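To make the effect of the axis parameter concrete, here is a minimal sketch. The DataFrame and its values are made up for illustration and are not taken from the examples on this page.

```python
import pandas as pd

# Small hand-built DataFrame (illustrative values only)
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30]})

# axis=0 (the default): aggregate down each column, one result per column
col_totals = df.agg('sum', axis=0)

# axis=1: aggregate across each row, one result per row
row_totals = df.agg('sum', axis=1)

print(col_totals)
print(row_totals)
```

With axis=0 the result is indexed by column label; with axis=1 it is indexed by the original row labels.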
Return Value
Returns the following:
- Scalar when Series.agg is called with single function.
- Series when DataFrame.agg is called with a single function.
- DataFrame when DataFrame.agg is called with multiple functions.
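The three return types listed above can be checked with a short sketch. The data is a small illustrative DataFrame, and the variable names are assumptions chosen for this example.

```python
import pandas as pd

# Small hand-built DataFrame (illustrative values only)
df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, 6.0]})

# Series.agg with a single function returns a scalar
scalar_result = df['col1'].agg('sum')

# DataFrame.agg with a single function returns a Series (one value per column)
series_result = df.agg('sum')

# DataFrame.agg with a list of functions returns a DataFrame (one row per function)
frame_result = df.agg(['sum', 'mean'])

print(type(scalar_result).__name__)
print(type(series_result).__name__)
print(type(frame_result).__name__)
```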
Example: using agg() on the whole DataFrame
In the example below, a DataFrame df is created. The agg() function is applied to the whole DataFrame to calculate the sum of each column.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
  index = pd.date_range('1/1/2018', periods=5),
  columns = ['col1', 'col2', 'col3']
)

print("The DataFrame contains:")
print(df)
print("\nAggregation returns:")
print(df.agg(np.sum))
The output of the above code will be:
The DataFrame contains:
                col1      col2      col3
2018-01-01 -0.687624  0.831343  0.369147
2018-01-02 -0.196517  1.979898 -1.000479
2018-01-03  0.258959  1.040191  0.001425
2018-01-04  0.630665 -0.739803  0.875488
2018-01-05  0.082997 -0.826209  1.453134

Aggregation returns:
col1    0.088481
col2    2.285421
col3    1.698715
dtype: float64
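The examples on this page call np.random.randn without a seed, so the printed numbers change on every run. As a hedged sketch, a seeded generator makes the DataFrame reproducible. The helper name make_frame and the seed value 42 are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

def make_frame(seed=42):
    """Build the 5x3 example DataFrame from a seeded random generator."""
    rng = np.random.default_rng(seed)  # same seed, same sequence of values
    return pd.DataFrame(rng.standard_normal((5, 3)),
                        index=pd.date_range('1/1/2018', periods=5),
                        columns=['col1', 'col2', 'col3'])

df_a = make_frame()
df_b = make_frame()

# Both frames hold identical values, so the aggregation is repeatable
print(df_a.equals(df_b))
totals = df_a.agg('sum')
print(totals)
```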
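Example: using multiple operations on the whole DataFrame
Multiple operations can be applied to a DataFrame at the same time. In the example below, three operations (sum, mean and average) are applied together. Each operation produces one row of the result; as the output shows, average yields the same values as mean here.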
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
  index = pd.date_range('1/1/2018', periods=5),
  columns = ['col1', 'col2', 'col3']
)

print("The DataFrame is:")
print(df)
print("\nAggregation returns:")
print(df.agg([np.sum, np.mean, 'average']))
The output of the above code will be:
The DataFrame is:
                col1      col2      col3
2018-01-01  0.535302 -0.791378 -0.858626
2018-01-02 -1.465922  0.375763  0.588740
2018-01-03 -0.407567  0.452181  0.687858
2018-01-04  0.327220  0.626945 -2.319354
2018-01-05  0.337624  0.041807  0.278022

Aggregation returns:
             col1      col2      col3
sum     -0.673343  0.705318 -1.623361
mean    -0.134669  0.141064 -0.324672
average -0.134669  0.141064 -0.324672
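Example: using agg() on selected columns
Instead of the whole DataFrame, the agg() function can be applied to selected columns. Selecting a single column gives a Series, so the result is a scalar; selecting a list of columns gives a DataFrame, so the result is a Series. Consider the following example.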
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
  index = pd.date_range('1/1/2018', periods=5),
  columns = ['col1', 'col2', 'col3']
)

print("The DataFrame contains:")
print(df)

#aggregation on single column
print("\nAggregation on col2 returns:")
print(df['col2'].agg(np.sum))

#aggregation on multiple columns
print("\nAggregation on col2 and col3 returns:")
print(df[['col2', 'col3']].agg(np.sum))
The output of the above code will be:
The DataFrame contains:
                col1      col2      col3
2018-01-01 -0.569402 -0.074556  1.224784
2018-01-02 -0.185962 -0.914699 -0.399853
2018-01-03 -1.175717 -0.145105  0.434319
2018-01-04  1.127940 -0.699489  1.235873
2018-01-05  0.983465  0.676895  1.670025

Aggregation on col2 returns:
-1.1569553034087827

Aggregation on col2 and col3 returns:
col2   -1.156955
col3    4.165149
dtype: float64
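Example: using different operations on different columns
It is possible to apply a different operation to each column by passing a dict that maps column labels to operations. Columns not named in the dict are left out of the result. Consider the following example.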
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
  index = pd.date_range('1/1/2018', periods=5),
  columns = ['col1', 'col2', 'col3']
)

print("The DataFrame contains:")
print(df)

#different operation on different columns
print("\nAggregation on col2 and col3 returns:")
print(df.agg({'col2':np.sum, 'col3':'average'}))
The output of the above code will be:
The DataFrame contains:
                col1      col2      col3
2018-01-01  1.120440 -0.229896 -0.133962
2018-01-02  0.568975 -0.577267  1.605496
2018-01-03 -0.077285 -0.439441  0.763634
2018-01-04 -1.538413  2.900758 -0.848652
2018-01-05 -0.135597  0.477658 -0.108792

Aggregation on col2 and col3 returns:
col2    2.131813
col3    0.255545
dtype: float64
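A dict can also map a column to a list of operations. The sketch below uses a small hand-built DataFrame (illustrative values, not from the example above) to show the shape of the result: one row per operation, with NaN where an operation was not requested for a column.

```python
import pandas as pd

# Small hand-built DataFrame (illustrative values only)
df = pd.DataFrame({
    'col1': [1.0, 2.0, 3.0],
    'col2': [4.0, 5.0, 6.0],
    'col3': [7.0, 8.0, 9.0],
})

# Each key is a column label; each value lists the operations for that column
result = df.agg({'col2': ['sum', 'min'], 'col3': ['mean']})

print(result)
```

Here col1 does not appear in the result because it is not a key in the dict.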