Pandas eval() and query() are NOT faster

Eric Lee
2 min read · May 5, 2020

The purpose of eval() and query() is to improve performance by handing expression evaluation to NumExpr, a compiled (C) engine that operates on NumPy arrays.

However, a reduction in computational time is not guaranteed in every situation, especially since these libraries are continually updated. Dated articles and books do not accurately reflect the current performance of these libraries.

I will not discuss Numba here, although it is also a powerful performance-enhancing tool.

NumPy vs. NumExpr

NumPy is built for mathematical computations and takes advantage of vectorization and broadcasting. One downside is the additional overhead of allocating memory for a full-size temporary array at each computational step of a compound expression.

This is where NumExpr is supposed to come in handy.

NumExpr does not allocate full-size temporary arrays for the intermediate steps. It evaluates the whole expression in chunks, which should reduce computational time when arrays are large.
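
To make the difference concrete, here is a minimal sketch. The array names and sizes are arbitrary choices for illustration, and NumExpr is treated as an optional install. NumPy evaluates the compound expression one operation at a time, while numexpr.evaluate() receives the whole expression as a string.

```python
import numpy as np

try:
    import numexpr as ne  # optional dependency; skipped below if missing
except ImportError:
    ne = None

# Illustrative data: sizes and names are assumptions, not from the book.
rng = np.random.default_rng(0)
x = rng.random(1_000_000)
y = rng.random(1_000_000)

# NumPy: (x > 0.5) and (y < 0.5) are each materialized as full-size
# temporary arrays before they are combined with &.
mask_np = (x > 0.5) & (y < 0.5)

if ne is not None:
    # NumExpr: the expression is passed as a string and evaluated in
    # chunks. local_dict makes the variable lookup explicit.
    mask_ne = ne.evaluate("(x > 0.5) & (y < 0.5)", local_dict={"x": x, "y": y})
    assert np.array_equal(mask_np, mask_ne)
```

Both paths produce the same boolean mask; only the way intermediates are handled differs.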

If we test the code from the well-known Python Data Science Handbook by Jake VanderPlas, the efficiency advantage no longer exists.

Image from Python Data Science Handbook
Screenshot of locally run code

Tested on a 2018 MacBook Pro

In the book, code block 2 ran in 87.1 ms and code block 3 in 42.2 ms.
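
The comparison can be reproduced with a sketch along these lines. It is modeled on the book's eval() example rather than copied from it: the scaled-down frame sizes, the frames dictionary, and the repeat count are assumptions made for illustration, and timeit stands in for the notebook's %timeit magic.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed sizes, kept small so the script runs quickly; increase them
# to approach the scale used in the book.
nrows, ncols = 20_000, 50
rng = np.random.RandomState(42)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols)) for _ in range(4))

# Passing the frames explicitly keeps name lookup independent of the
# calling scope.
frames = {"df1": df1, "df2": df2, "df3": df3, "df4": df4}

def plain_sum():
    # Ordinary pandas arithmetic, evaluated one operation at a time.
    return df1 + df2 + df3 + df4

def eval_sum():
    # Same expression as a string; pandas uses NumExpr if it is installed
    # and otherwise falls back to its Python engine.
    return pd.eval("df1 + df2 + df3 + df4", local_dict=frames)

repeats = 5  # assumed repeat count
t_plain = timeit.timeit(plain_sum, number=repeats) / repeats
t_eval = timeit.timeit(eval_sum, number=repeats) / repeats

print(f"plain: {t_plain * 1e3:.1f} ms per run")
print(f"eval:  {t_eval * 1e3:.1f} ms per run")
```

The printed timings depend on the machine and on the installed pandas, NumPy, and NumExpr versions, so treat them as a local measurement only.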

When to use eval() and query()

The author recommends using these functions when the size of the data frames is significant in comparison to system memory.

However, even after increasing the size of the data frames above 1 gigabyte, the execution times are almost identical.
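
A rough way to check this on your own data is to measure the frame's memory footprint and time query() against the equivalent boolean mask. This is a minimal sketch: the column names, frame size, and repeat count are hypothetical and far below the 1 gigabyte mentioned above, so scale them up to test the claim.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame for illustration; raise the row count to test larger data.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(100_000, 3), columns=["A", "B", "C"])

# Approximate in-memory size of the frame in megabytes.
size_mb = df.memory_usage(deep=True).sum() / 1e6

def with_mask():
    # Standard boolean masking: builds temporary boolean Series.
    return df[(df["A"] < 0.5) & (df["B"] < 0.5)]

def with_query():
    # Same filter expressed as a string for query().
    return df.query("A < 0.5 and B < 0.5")

masked = with_mask()
queried = with_query()

repeats = 5  # assumed repeat count
t_mask = timeit.timeit(with_mask, number=repeats) / repeats
t_query = timeit.timeit(with_query, number=repeats) / repeats

print(f"frame size: {size_mb:.1f} MB")
print(f"mask:  {t_mask * 1e3:.2f} ms per run")
print(f"query: {t_query * 1e3:.2f} ms per run")
```

Both approaches select the same rows, so any difference you see is purely about time and memory on your setup.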

Keep this example in mind as you build your data analytics skills.
