These Methods Will Change How You Use Pandas
Make Pandas Faster with a Simple Change
Written by: Amal Hasni & Dhia Hmila

Pandas is the go-to Python package for manipulating data. It provides multiple methods to perform different operations on DataFrame objects. In this article, we'll discuss different ways of efficiently filtering data and creating new columns.
Building dataset
First, let’s create a dataset:
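The snippet below is a minimal sketch of how such a dataset could be generated. The number of rows, the random seed, and the value ranges are assumptions chosen to roughly match the sample shown in this section; only the column names come from the article.

```python
import numpy as np
import pandas as pd

# Assumed setup: the seed, row count, and value ranges are illustrative guesses.
rng = np.random.default_rng(seed=0)
n_rows = 1_000

df = pd.DataFrame({
    "height": rng.uniform(1.2, 2.2, size=n_rows),            # assumed range
    "weight": rng.uniform(150, 270, size=n_rows),            # assumed range
    "hip_circumference": rng.normal(100, 5, size=n_rows),    # assumed mean and spread
})

# Preview the first five rows
print(df.head())
```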
It should look something like this, where each line describes the height, weight, and hip circumference of an individual:
| height  | weight  | hip_circumference |
| ------- | ------- | ----------------- |
| 2.16584 | 257.631 | 98.9404           |
| 1.39118 | 245.736 | 107.727           |
| 1.25381 | 162.01  | 106.492           |
| 1.51138 | 269.898 | 103.189           |
| 1.52707 | 151.803 | 103.018           |
1 - Filtering a DataFrame
Suppose we want to keep only overweight individuals, based on their BMI (Body Mass Index), for further analysis. We want to filter our DataFrame and store the result in a new copy.
💡 The BMI (Body Mass Index) is the weight divided by the square of the height:
BMI = weight / height²
This is fairly easy to do in classic pandas:
bmi = df['weight'] / (df['height'] ** 2)
new_df = df[bmi >= 25].copy()
An alternative is to use the query method:
new_df = df.query('(weight / height ** 2) >= 25')
Notice that instead of using regular Python to define our filter, we pass a string expression that is evaluated directly by pandas (or, to be precise, by numexpr, the engine pandas uses by default when it is installed).
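To check that the two approaches select the same rows, here is a small sketch on a hand-made DataFrame. The three sample rows are made up for illustration and are not part of the article's dataset. The last call passes engine="python", which asks pandas to evaluate the expression without numexpr.

```python
import pandas as pd

# Hypothetical sample data, chosen so that the BMI values are far from the threshold
df = pd.DataFrame({
    "height": [1.80, 1.60, 1.75],
    "weight": [60.0, 80.0, 95.0],
})

# Classic pandas: build a boolean mask, then index with it
bmi = df["weight"] / (df["height"] ** 2)
masked = df[bmi >= 25].copy()

# Same filter expressed as a query string
queried = df.query("(weight / height ** 2) >= 25")

# Same query, evaluated with the plain Python engine instead of numexpr
queried_py = df.query("(weight / height ** 2) >= 25", engine="python")

# Both comparisons should report that the results are identical
print(masked.equals(queried))
print(queried.equals(queried_py))
```

Column names are written directly inside the string, so the expression stays close to the formula itself.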
Why would you want to use this?
- Well you might find it more readable without the visual clutter (notice there are no…