Library of the week #13: Polars 🐻‍❄️
Lightning-fast DataFrame library for Python

Why Polars?
While Pandas is the number one Python library for data manipulation (and the subject of the first Library of the Week article!), Polars is a compelling alternative that deserves attention. The choice of a data manipulation library matters, and Polars is increasingly proving itself a worthy contender, with better performance (especially on larger datasets) and a more modern, user-friendly syntax.
What is Polars?
At its core, Polars is a DataFrame library designed for efficient handling of large datasets. The project started in 2020.
A feature that sets Polars apart is lazy computation: instead of running each operation immediately, Polars can record the operations on a dedicated data type, the LazyFrame, and optimize the whole query before executing it. This makes complex data transformations and analyses considerably more efficient.
Polars also has the concepts of DataFrame and Series, familiar to Pandas users. This design choice eases the transition from Pandas, making Polars a flexible addition to any data professional’s toolkit. Polars also supports a variety of data sources, so data can easily be loaded and saved in formats ranging from CSV and Parquet files to databases.
Polars distinguishes itself by building on Apache Arrow and Rust. Apache Arrow is an efficient columnar in-memory format for both flat (table-like) and nested (tree-like) data, optimized for modern CPUs and GPUs. Rust is a high-performance programming language; Polars’ core is written in it, which brings efficient memory management, low-level control, and high-speed execution, and contributes to a substantial boost in overall performance. Polars’ efficiency also comes from parallelization (doing several operations at the same time, instead of one after the other).