Pandas - DataFrame Aggregations
The Pandas DataFrame aggregate() function performs aggregation using one or more operations over the specified axis.
Note: The agg() function is an alias for the aggregate() function.
The syntax for using this function is given below:
Syntax
DataFrame.aggregate(func=None, axis=0)
Parameters
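func | Required. Specify the function used for aggregating the data. If a function, it must work either when passed a DataFrame or when passed to DataFrame.apply. Accepted combinations are:
- a function
- a string function name
- a list of functions and/or function names, e.g. [np.sum, 'mean']
- a dict of axis labels -> functions, function names, or a list of such
axis | Optional. Specify the axis along which the function is applied. Default is 0. If 0 or 'index', the function is applied to each column. If 1 or 'columns', the function is applied to each row.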
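To make the effect of the axis parameter concrete, here is a minimal sketch. The small DataFrame and the variable names column_sums and row_sums are made up for illustration and are not part of the original examples. The string name "sum" is used rather than np.sum, since recent pandas versions may emit a warning when NumPy callables are passed.

```python
import pandas as pd

# Hypothetical data, chosen so the results are easy to check by hand
df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [10, 20, 30]},
    index=["a", "b", "c"],
)

# axis=0 (the default, same as axis="index"): one result per column
column_sums = df.aggregate("sum")

# axis=1 (same as axis="columns"): one result per row
row_sums = df.aggregate("sum", axis=1)

print(column_sums)
print(row_sums)
```

With axis=0 the result is indexed by the column labels (col1, col2); with axis=1 it is indexed by the row labels (a, b, c).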
Return Value
Returns one of the following:
- Scalar when Series.aggregate is called with single function.
- Series when DataFrame.aggregate is called with a single function.
- DataFrame when DataFrame.aggregate is called with multiple functions.
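The three return types listed above can be checked with a short sketch. The data and the names scalar_result, series_result and frame_result are illustrative assumptions, not taken from the original page.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

# Series.aggregate with a single function -> scalar
scalar_result = df["col1"].aggregate("sum")

# DataFrame.aggregate with a single function -> Series (one value per column)
series_result = df.aggregate("sum")

# DataFrame.aggregate with multiple functions -> DataFrame (one row per function)
frame_result = df.aggregate(["sum", "mean"])

print(type(scalar_result).__name__)
print(type(series_result).__name__)
print(type(frame_result).__name__)
```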
Example: using aggregate() on the whole DataFrame
In the example below, a DataFrame df is created. The aggregate() function is applied to the whole DataFrame to calculate the sum of each column.
import pandas as pd
import numpy as np

# 5 x 3 DataFrame of random values with a daily date index
df = pd.DataFrame(np.random.randn(5, 3),
                  index=pd.date_range('1/1/2018', periods=5),
                  columns=['col1', 'col2', 'col3'])

print("The DataFrame contains:")
print(df)

# sum of each column (axis=0 is the default)
print("\nAggregation returns:")
print(df.aggregate(np.sum))
The output of the above code will be similar to the following (the values are random, so they differ on each run):
The DataFrame contains:
                col1      col2      col3
2018-01-01 -0.687624  0.831343  0.369147
2018-01-02 -0.196517  1.979898 -1.000479
2018-01-03  0.258959  1.040191  0.001425
2018-01-04  0.630665 -0.739803  0.875488
2018-01-05  0.082997 -0.826209  1.453134

Aggregation returns:
col1    0.088481
col2    2.285421
col3    1.698715
dtype: float64
Example: using multiple operations on the whole DataFrame
Multiple operations can be applied to a DataFrame at the same time. In the example below, three operations (sum, mean and average) are applied together. The 'average' row matches the 'mean' row, as the output shows.
import pandas as pd
import numpy as np

# 5 x 3 DataFrame of random values with a daily date index
df = pd.DataFrame(np.random.randn(5, 3),
                  index=pd.date_range('1/1/2018', periods=5),
                  columns=['col1', 'col2', 'col3'])

print("The DataFrame is:")
print(df)

# a list of operations gives one result row per operation
print("\nAggregation returns:")
print(df.aggregate([np.sum, np.mean, 'average']))
The output of the above code will be:
The DataFrame is:
                col1      col2      col3
2018-01-01  0.535302 -0.791378 -0.858626
2018-01-02 -1.465922  0.375763  0.588740
2018-01-03 -0.407567  0.452181  0.687858
2018-01-04  0.327220  0.626945 -2.319354
2018-01-05  0.337624  0.041807  0.278022

Aggregation returns:
             col1      col2      col3
sum     -0.673343  0.705318 -1.623361
mean    -0.134669  0.141064 -0.324672
average -0.134669  0.141064 -0.324672
Example: using aggregate() on selected columns
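Instead of the whole DataFrame, the aggregate() function can be applied to selected columns. Selecting a single column gives a Series, so the result is a scalar; selecting several columns gives a DataFrame, so the result is a Series. Consider the following example.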
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
                  index=pd.date_range('1/1/2018', periods=5),
                  columns=['col1', 'col2', 'col3'])

print("The DataFrame contains:")
print(df)

# aggregation on a single column
print("\nAggregation on col2 returns:")
print(df['col2'].aggregate(np.sum))

# aggregation on multiple columns
print("\nAggregation on col2 and col3 returns:")
print(df[['col2', 'col3']].aggregate(np.sum))
The output of the above code will be:
The DataFrame contains:
                col1      col2      col3
2018-01-01 -0.495941  0.600591 -0.193495
2018-01-02  0.057907  1.990024  1.523120
2018-01-03  0.592138  0.260888 -0.547469
2018-01-04 -0.225838 -1.233463 -0.152349
2018-01-05  0.454969 -0.500580  0.703518

Aggregation on col2 returns:
1.11745945804

Aggregation on col2 and col3 returns:
col2    1.117459
col3    1.333325
dtype: float64
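Example: using different operations on different columns
It is possible to apply a different operation to each column by passing a dict that maps column labels to operations. Columns that are not listed in the dict, such as col1 here, are left out of the result. Consider the following example.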
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
                  index=pd.date_range('1/1/2018', periods=5),
                  columns=['col1', 'col2', 'col3'])

print("The DataFrame contains:")
print(df)

# different operation on different columns
print("\nAggregation on col2 and col3 returns:")
print(df.aggregate({'col2': np.sum, 'col3': 'average'}))
The output of the above code will be:
The DataFrame contains:
                col1      col2      col3
2018-01-01  1.120440 -0.229896 -0.133962
2018-01-02  0.568975 -0.577267  1.605496
2018-01-03 -0.077285 -0.439441  0.763634
2018-01-04 -1.538413  2.900758 -0.848652
2018-01-05 -0.135597  0.477658 -0.108792

Aggregation on col2 and col3 returns:
col2    2.131813
col3    0.255545
dtype: float64
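A dict can also map a column to a list of operations. The following is a minimal sketch of that variation, using made-up data and a hypothetical variable name result. In this case the return value is a DataFrame with one row per operation, and a cell is NaN where an operation was not requested for that column.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [10, 20, 30],
    "col3": [1.5, 2.5, 3.5],
})

# each column gets its own list of operations; col1 is not aggregated
result = df.aggregate({"col2": ["sum", "min"], "col3": ["min", "max"]})

print(result)
```

Here "sum" is computed only for col2 and "max" only for col3, so the corresponding cells in the other column are NaN, while "min" is filled in for both.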