Last modified: May 11, 2026 by Alexander Williams
Polars Multi-threading & Performance Tuning
Polars is built for speed. It uses all CPU cores by default. But default settings may not be optimal for your hardware or data. This guide shows you how to tune Polars for maximum performance.
How Polars Uses Multi-Threading
Polars uses Rayon, a Rust library for parallel computing. It splits work across all available cores. This makes Polars faster than Pandas for most operations.
By default, Polars uses all CPU threads. This is great for big datasets. But for small data, overhead from threading can slow things down.
Check Current Thread Count
You can see how many threads Polars uses:
import polars as pl
# Check current thread count
print(pl.thread_pool_size())
16
This shows 16 threads on an 8-core CPU with hyper-threading.
Controlling Thread Count
You can cap the thread pool size. This helps when running Polars alongside other processes. Polars reads the POLARS_MAX_THREADS environment variable once, at import time, so set it before importing:
import os
# Must be set before polars is imported
os.environ["POLARS_MAX_THREADS"] = "4"
import polars as pl
# Verify
print(pl.thread_pool_size())
4
There is no public setter for the pool size at runtime. Setting the variable after Polars has been imported has no effect.
When to Reduce Threads
Reduce threads when:
- Your dataset fits in cache (under 1 GB)
- You run multiple Polars scripts at once
- Other apps need CPU time
- You do I/O-bound work like reading many small files
More threads do not always mean faster. Overhead from context switching hurts small tasks.
Performance Tuning with Streaming
For datasets larger than RAM, use streaming. Streaming processes data in chunks. It avoids memory overload.
import polars as pl
# Create a large lazy query
q = (
pl.scan_csv("large_file.csv")
.group_by("category")
.agg(pl.sum("value"))
)
# Execute with the streaming engine
# (older Polars versions: q.collect(streaming=True))
result = q.collect(engine="streaming")
print(result)
shape: (10, 2)
┌──────────┬───────┐
│ category ┆ value │
╞══════════╪═══════╡
│ A        ┆ 54321 │
│ B        ┆ 12345 │
│ …        ┆ …     │
└──────────┴───────┘
Streaming uses less memory. It can be slower for small data. Use it only when data exceeds RAM.
Profiling Query Performance
Use profile() to see where time goes. It returns a tuple: the query result and a DataFrame with per-node timings.
import polars as pl
df = pl.DataFrame({
"group": ["A", "B", "A", "B"] * 100,
"value": range(400)
})
# Profile the query
q = df.lazy().group_by("group").agg(pl.mean("value"))
profile = q.profile()
print(profile[1])  # the timing DataFrame
shape: (2, 3)
┌─────────────────────────────┬───────┬─────┐
│ node                        ┆ start ┆ end │
╞═════════════════════════════╪═══════╪═════╡
│ optimization                ┆ 0     ┆ 7   │
│ group_by_partitioned(group) ┆ 7     ┆ 342 │
└─────────────────────────────┴───────┴─────┘
The start and end columns are in microseconds, measured from the start of the query. Node names and timings vary with your query and Polars version.
Look for slow nodes. GroupBy and joins are common bottlenecks.
Optimize with Lazy Evaluation
Lazy evaluation lets Polars optimize your query plan. It reorders operations for speed. Use lazy() and collect() instead of eager methods.
Learn more about Polars LazyFrame Query Optimization for deeper insights.
import polars as pl
df = pl.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"score": [85, 92, 78]
})
# Lazy: build the plan, let the optimizer rewrite it, then collect
result = (
    df.lazy()
    .filter(pl.col("score") > 80)
    .collect()
)
print(result)
shape: (2, 2)
┌───────┬───────┐
│ name ┆ score │
╞═══════╪═══════╡
│ Alice ┆ 85 │
│ Bob ┆ 92 │
└───────┴───────┘
Use Expressions Over Custom Functions
Polars expressions are fast. They run in Rust. Custom Python functions are slow because they break parallelism.
For complex logic, see Polars Custom Functions with map_elements & map_batches.
import polars as pl
df = pl.DataFrame({"x": [1, 2, 3, 4]})
# Fast: Expression
result_fast = df.with_columns(
(pl.col("x") * 2).alias("double_x")
)
# Slow: Custom function
def double(val):
return val * 2
result_slow = df.with_columns(
pl.col("x").map_elements(double, return_dtype=pl.Int64).alias("double_x_slow")
)
print(result_fast)
print(result_slow)
shape: (4, 2)
┌─────┬──────────┐
│ x ┆ double_x │
╞═════╪══════════╡
│ 1 ┆ 2 │
│ 2 ┆ 4 │
│ 3 ┆ 6 │
│ 4 ┆ 8 │
└─────┴──────────┘
shape: (4, 2)
┌─────┬───────────────┐
│ x   ┆ double_x_slow │
╞═════╪═══════════════╡
│ 1   ┆ 2             │
│ 2   ┆ 4             │
│ 3   ┆ 6             │
│ 4   ┆ 8             │
└─────┴───────────────┘
Use expressions whenever possible. They are often 10-100x faster than per-row Python functions.
Memory Management Tips
Polars manages memory efficiently. But you can help it:
- Use scan_csv() instead of read_csv() for large files
- Drop unused columns early
- Use shrink_to_fit() to reduce memory after filtering
For large files, read Scan Large Files with Polars Without Memory Load.
import polars as pl
df = pl.DataFrame({
"id": range(1000),
"value": range(1000),
"unused": ["x"] * 1000
})
# Drop unused column
df_clean = df.drop("unused")
# Shrink memory
df_shrunk = df_clean.shrink_to_fit()
print(df_shrunk.estimated_size("mb"))
0.015
Choosing the Right Data Types
Polars uses Arrow for data. Arrow supports many types. Using the smallest type saves memory and speeds up operations.
import polars as pl
df = pl.DataFrame({
"big_int": pl.Series([1, 2, 3], dtype=pl.Int64),
"small_int": pl.Series([1, 2, 3], dtype=pl.Int8),
})
print(df.schema)
{'big_int': Int64, 'small_int': Int8}
Use Int8 for small ranges. Use Float32 instead of Float64 when precision is not critical.
Conclusion
Polars is fast by default. But you can make it faster. Control thread count for your hardware. Use streaming for big data. Profile queries to find bottlenecks. Prefer expressions over custom functions. Choose the right data types.
These tuning steps will help you process data faster. Start with profiling. Then apply changes one by one. Measure the impact. You will see big improvements.