Pandas - Introduction
Pandas is a fast, powerful, flexible, and easy-to-use tool for data analysis and manipulation, written for the Python programming language. The name is derived from the econometrics term "panel data", which refers to data sets that contain observations of the same subjects over multiple time periods. Wes McKinney started building pandas in 2008 at AQR Capital Management, when he needed a high-performance, flexible tool for data analysis. It is open-source software released under the three-clause BSD license. Python with pandas is used in a wide range of domains, including finance, economics, analytics, and statistics.
The following are the important features of the pandas package:
- Fast and efficient DataFrame object for data manipulation with integrated indexing.
- Tools for reading and writing data between in-memory data structures and different file formats.
- Data alignment and integrated handling of missing data.
- Group-by engine for split-apply-combine operations on data sets, including aggregation and transformation.
- Reshaping and pivoting of data sets.
- Data set merging and joining.
- Label-based slicing, fancy indexing, and subsetting of large data sets.
- Data structure column insertion and deletion.
- Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
- Time series functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging.
- Data filtering.
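
Several of the features listed above can be illustrated with a short sketch. The data set, column names (region, product, units, manager), and values below are hypothetical and chosen only for demonstration; the example assumes a reasonably recent pandas release with NumPy installed.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; all names and values are made up for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "product": ["apple", "apple", "pear", "pear"],
        "units": [10.0, np.nan, 7.0, 3.0],
    }
)

# Handling of missing data: replace the missing value with 0.
sales["units"] = sales["units"].fillna(0)

# Column insertion and deletion.
sales["double_units"] = sales["units"] * 2
sales = sales.drop(columns="double_units")

# Filtering rows with a boolean condition.
north = sales[sales["region"] == "north"]

# Group by (split-apply-combine): total units per region.
units_by_region = sales.groupby("region")["units"].sum()

# Reshaping: one row per region, one column per product.
pivoted = sales.pivot(index="region", columns="product", values="units")

# Merging: attach a (hypothetical) manager to each sales row.
managers = pd.DataFrame(
    {"region": ["north", "south"], "manager": ["Ana", "Ben"]}
)
merged = sales.merge(managers, on="region", how="left")

# Time series: date range generation, moving window statistics, and shifting.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)
rolling_mean = daily.rolling(window=2).mean()
shifted = daily.shift(1)

print(units_by_region)
print(pivoted)
print(merged)
print(rolling_mean)
```

Each step maps to one bullet in the list: fillna for missing data, drop for column deletion, groupby for split-apply-combine, pivot for reshaping, merge for joining, and date_range, rolling, and shift for time series work. This is only one way to exercise these features, not a recommended workflow.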