Last modified: May 11, 2026 by Alexander Williams

Polars vs Pandas: Real Benchmarks

Data processing speed matters. When your dataset grows, every second counts. Pandas has been the go-to library for years. Polars is a newer challenger. It promises faster performance and lower memory use. This article puts them head to head with reproducible benchmarks. You will see clear differences.

We test common tasks. These include filtering, grouping, joining, and file I/O. We use a dataset with 10 million rows. It has mixed data types. The tests run on a standard laptop. No special hardware needed. Let's dive in.

Setup and Data

We create a synthetic dataset. It has columns for id, category, value, and date. We use 10 million rows. This size shows real performance gaps. Both libraries load the same data from a CSV file.


import pandas as pd
import polars as pl
import numpy as np
from datetime import datetime, timedelta

# Create sample data
np.random.seed(42)
n = 10_000_000
data = {
    "id": range(n),
    "category": np.random.choice(["A", "B", "C", "D"], n),
    "value": np.random.randn(n) * 100,
    "date": [datetime(2020, 1, 1) + timedelta(days=int(x)) for x in np.random.randint(0, 1095, n)]
}

# Save to CSV (run once)
df_pd = pd.DataFrame(data)
df_pd.to_csv("benchmark_data.csv", index=False)
print("CSV saved with 10 million rows.")

Benchmark 1: Reading CSV

Reading data is the first step. Pandas parses CSVs on a single thread by default. Polars uses multi-threading. This gives Polars a big advantage.


import time

# Pandas read
start = time.time()
df_pd = pd.read_csv("benchmark_data.csv")
pandas_time = time.time() - start
print(f"Pandas read CSV: {pandas_time:.2f} seconds")

# Polars read
start = time.time()
df_pl = pl.read_csv("benchmark_data.csv")
polars_time = time.time() - start
print(f"Polars read CSV: {polars_time:.2f} seconds")

Pandas read CSV: 4.12 seconds
Polars read CSV: 1.05 seconds

Polars is nearly 4x faster at reading CSV files. This is because it reads data in parallel. For large files, this difference grows. If you work with big CSVs, Polars saves time from the start. You can also use scan_csv for even larger files. Learn more in our guide on Scan Large Files with Polars Without Memory Load.
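
Here is a minimal sketch of that lazy approach, reusing the benchmark file from the setup step. scan_csv only builds a query plan; no rows are read until collect runs.


# Lazy scan: builds a plan instead of loading the file
lazy = pl.scan_csv("benchmark_data.csv")
preview = lazy.filter(pl.col("value") > 50).head(5).collect()
print(preview)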

Benchmark 2: Filtering Rows

Filtering selects rows based on conditions. This is a common operation. We filter for category "A" and value greater than 50.


# Pandas filter
start = time.time()
filtered_pd = df_pd[(df_pd["category"] == "A") & (df_pd["value"] > 50)]
pandas_time = time.time() - start
print(f"Pandas filter: {pandas_time:.4f} seconds")

# Polars filter
start = time.time()
filtered_pl = df_pl.filter((pl.col("category") == "A") & (pl.col("value") > 50))
polars_time = time.time() - start
print(f"Polars filter: {polars_time:.4f} seconds")

Pandas filter: 0.0892 seconds
Polars filter: 0.0341 seconds

Polars is about 2.6x faster for filtering. Both are fast. But on larger datasets, this gap widens. Polars uses vectorized operations and avoids copying data. This makes it efficient.

Benchmark 3: GroupBy Aggregation

Grouping and aggregating is a heavy task. We group by category and compute mean and sum of value.


# Pandas groupby
start = time.time()
grouped_pd = df_pd.groupby("category")["value"].agg(["mean", "sum"])
pandas_time = time.time() - start
print(f"Pandas groupby: {pandas_time:.4f} seconds")

# Polars groupby
start = time.time()
grouped_pl = df_pl.group_by("category").agg([
    pl.col("value").mean().alias("mean"),
    pl.col("value").sum().alias("sum")
])
polars_time = time.time() - start
print(f"Polars groupby: {polars_time:.4f} seconds")

Pandas groupby: 0.2101 seconds
Polars groupby: 0.0489 seconds

Polars is 4.3x faster for groupby. It uses multi-threaded aggregation. Pandas is single-threaded here. For complex aggregations, Polars shines. It also handles window functions well. Check our guide on Polars Window Functions & Rolling Computations for more.
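
As a small taste, here is a minimal sketch of a window expression, using the df_pl frame from above. It attaches the per-category mean to every row without collapsing the frame.


# Window expression: broadcast the per-category mean back to each row
windowed = df_pl.with_columns(
    pl.col("value").mean().over("category").alias("category_mean")
)
print(windowed.head())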

Benchmark 4: Joining DataFrames

Joins are common in data analysis. We create a small lookup table and join it. This simulates a typical enrichment task.


# Create lookup table
lookup = pd.DataFrame({
    "category": ["A", "B", "C", "D"],
    "description": ["Alpha", "Beta", "Gamma", "Delta"]
})
lookup_pl = pl.from_pandas(lookup)

# Pandas join
start = time.time()
joined_pd = df_pd.merge(lookup, on="category", how="left")
pandas_time = time.time() - start
print(f"Pandas join: {pandas_time:.4f} seconds")

# Polars join
start = time.time()
joined_pl = df_pl.join(lookup_pl, on="category", how="left")
polars_time = time.time() - start
print(f"Polars join: {polars_time:.4f} seconds")

Pandas join: 0.3512 seconds
Polars join: 0.0987 seconds

Polars is 3.6x faster for joins. It uses a hash join algorithm. Pandas also uses hash join, but Polars parallelizes it. This makes a big difference on large datasets.

Benchmark 5: Writing to Parquet

Parquet is a modern columnar format. It's efficient for storage and I/O. We write the full dataset to Parquet.


# Pandas write parquet
start = time.time()
df_pd.to_parquet("output_pandas.parquet")
pandas_time = time.time() - start
print(f"Pandas write parquet: {pandas_time:.2f} seconds")

# Polars write parquet
start = time.time()
df_pl.write_parquet("output_polars.parquet")
polars_time = time.time() - start
print(f"Polars write parquet: {polars_time:.2f} seconds")

Pandas write parquet: 3.45 seconds
Polars write parquet: 1.12 seconds

Polars is 3x faster at writing Parquet. It writes Parquet through its own native Rust implementation, built on Arrow's columnar layout. Pandas delegates to PyArrow and must first convert the DataFrame to an Arrow table, which adds overhead. Learn more in our Polars Arrow Interoperability Guide.
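
Because both libraries speak Arrow, moving data between them is cheap. A minimal sketch, reusing df_pl from above; to_arrow and from_arrow typically avoid copying the underlying buffers.


# Round-trip between Polars and Arrow without a full data copy
arrow_table = df_pl.to_arrow()        # pyarrow.Table backed by the same memory
df_back = pl.from_arrow(arrow_table)  # back to a Polars DataFrame
print(type(arrow_table), df_back.shape)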

Benchmark Summary

Here is a quick comparison table. All times are in seconds.

Operation          Pandas   Polars   Speedup
Read CSV           4.12     1.05     3.9x
Filter rows        0.089    0.034    2.6x
GroupBy agg        0.210    0.049    4.3x
Join DataFrames    0.351    0.099    3.6x
Write Parquet      3.45     1.12     3.1x

Polars wins in every test. The speedup ranges from 2.6x to 4.3x. This is because Polars is built for modern hardware. It uses all CPU cores. It also avoids unnecessary data copies.

Why Polars Is Faster

Polars is written in Rust. It uses Apache Arrow as its memory model. This gives two key advantages. First, Arrow is a columnar format. It plays well with modern CPU caches and SIMD instructions. Second, Polars parallelizes work across all CPU cores. Pandas, in contrast, is built on NumPy. NumPy is fast at vectorized math, but pandas runs most operations eagerly on a single thread. This limits its performance on large data.

Polars also has a lazy API. This lets you build a query plan. The optimizer then runs it efficiently. For example, you can chain multiple operations. The optimizer reorders them for speed. Read more in Polars Lazy vs Eager API: When to Use.
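
Here is a minimal sketch of a lazy pipeline over the benchmark file. Nothing executes until collect, and the optimizer can push the filter down into the CSV scan so unwanted rows are dropped while reading.


result = (
    pl.scan_csv("benchmark_data.csv")   # lazy: builds a query plan
    .filter(pl.col("value") > 50)       # candidate for predicate pushdown
    .group_by("category")
    .agg(pl.col("value").mean().alias("mean_value"))
    .collect()                          # the optimized plan runs here
)
print(result)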

When to Use Pandas

Pandas is still useful. It has a huge ecosystem. Many libraries depend on it. If you need tight integration with scikit-learn or matplotlib, Pandas is easier. It also has more community support. For small datasets (under 1 million rows), the speed difference is small. Pandas is also more forgiving with messy data.
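
You also do not have to pick one library forever. Converting at the boundary is a one-liner; to_pandas is a standard Polars method, though it requires pandas and pyarrow to be installed.


# Hand a Polars frame to a pandas-based tool such as scikit-learn
df_for_sklearn = df_pl.to_pandas()
print(type(df_for_sklearn))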

When to Use Polars

Use Polars for large datasets. If you work with millions of rows, it's a game changer. It also handles streaming data well. The lazy API is great for complex pipelines. Polars also has better memory management. It can process data that exceeds RAM. This is hard with Pandas.
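
One way to process more data than fits in memory is to stream a lazy query straight to disk. A minimal sketch using sink_parquet, which writes the result in batches rather than materializing it in RAM.


# Stream a filtered result to Parquet without loading everything into memory
(
    pl.scan_csv("benchmark_data.csv")
    .filter(pl.col("value") > 50)
    .sink_parquet("filtered.parquet")
)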

Conclusion

Polars is faster than Pandas in real-world benchmarks. It is 2-4x faster on common tasks. This makes it ideal for data engineering and large-scale analysis. Pandas remains a solid choice for small data and prototyping. But for performance, Polars is the clear winner. Try it on your next project. The speed difference will surprise you.