Last modified: May 11, 2026 by Alexander Williams
Polars Custom Functions with map_elements & map_batches
Polars is fast. But sometimes you need custom logic. Built-in expressions don't cover everything. That's when you use map_elements and map_batches. These functions let you apply your own Python code to columns or whole DataFrames.
This guide explains both. You will learn when to use each. You will see clear examples. By the end, you can add custom functions without slowing down your pipeline.
Why Use Custom Functions in Polars?
Polars is designed for speed. It uses vectorized operations. But real-world data needs special rules. Maybe you need to parse a complex string. Maybe you need to call an external API. Custom functions fill this gap.
They let you break out of Polars' expression system. Use them sparingly. They are slower than native Polars. But they are essential for unique tasks.
Understanding map_elements
map_elements works on a single column. It applies a function to each element. Think of it like Python's map() but for a Polars Series. The function takes one value and returns one value.
Use map_elements when your logic is per-row. For example, cleaning a single text field. It is simple and direct.
Example: Cleaning a Name Column
Imagine you have a column with messy names. You want to remove numbers and make them lowercase.
import polars as pl

# Sample DataFrame
df = pl.DataFrame({
    "name": ["Alice123", "Bob456", "Charlie789"]
})

# Custom function to clean a single name
def clean_name(name: str) -> str:
    # Remove digits and convert to lowercase
    cleaned = ''.join(c for c in name if not c.isdigit())
    return cleaned.lower()

# Apply using map_elements
df_cleaned = df.with_columns(
    pl.col("name").map_elements(clean_name, return_dtype=pl.Utf8).alias("clean_name")
)
print(df_cleaned)
shape: (3, 2)
┌────────────┬────────────┐
│ name       ┆ clean_name │
│ ---        ┆ ---        │
│ str        ┆ str        │
╞════════════╪════════════╡
│ Alice123   ┆ alice      │
│ Bob456     ┆ bob        │
│ Charlie789 ┆ charlie    │
└────────────┴────────────┘
The function runs once per value. You should pass return_dtype so Polars knows the output type up front. Without it, Polars infers the type from your function's output, which is slower and can guess wrong.
Important: map_elements is not vectorized. It calls back into Python once per row. For large datasets, this is slow. Use it only when the logic cannot be expressed with native expressions.
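For the name-cleaning task above, for example, native string expressions do the same job with no per-row Python calls:

import polars as pl

df = pl.DataFrame({"name": ["Alice123", "Bob456", "Charlie789"]})

# Same cleanup with native string expressions: strip digits, then lowercase
df_native = df.with_columns(
    pl.col("name").str.replace_all(r"\d", "").str.to_lowercase().alias("clean_name")
)

Reach for map_elements only when no expression chain like this exists.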
Understanding map_batches
map_batches works on a whole column at once. It receives the full Series and returns a new Series. This is more efficient than map_elements because you can use vectorized operations inside your function.
Use map_batches when your logic needs the whole column. For example, applying a machine learning model to a batch of rows.
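Here is a minimal sketch of that idea. The toy LinearRegression (and scikit-learn itself) is an assumption, a stand-in for whatever fitted model you already have:

import numpy as np
import polars as pl
from sklearn.linear_model import LinearRegression

# Toy fitted model; in practice you would load your own (assumption)
model = LinearRegression().fit(np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0]))

def predict_batch(series: pl.Series) -> pl.Series:
    # One model call for the whole column instead of one call per row
    features = series.to_numpy().reshape(-1, 1)
    return pl.Series(model.predict(features))

df = pl.DataFrame({"feature": [4.0, 5.0]})
df_pred = df.with_columns(
    pl.col("feature").map_batches(predict_batch).alias("prediction")
)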
Example: Scaling a Column with Custom Logic
Suppose you want to normalize a numeric column using min-max scaling. You can do this with native expressions, but this shows the pattern.
import polars as pl

# Sample DataFrame
df = pl.DataFrame({
    "value": [10, 20, 30, 40, 50]
})

# Custom batch function
def min_max_scale(series: pl.Series) -> pl.Series:
    min_val = series.min()
    max_val = series.max()
    # Vectorized operation on the whole Series
    scaled = (series - min_val) / (max_val - min_val)
    return scaled

# Apply using map_batches
df_scaled = df.with_columns(
    pl.col("value").map_batches(min_max_scale).alias("scaled_value")
)
print(df_scaled)
shape: (5, 2)
┌───────┬──────────────┐
│ value ┆ scaled_value │
│ ---   ┆ ---          │
│ i64   ┆ f64          │
╞═══════╪══════════════╡
│ 10    ┆ 0.0          │
│ 20    ┆ 0.25         │
│ 30    ┆ 0.5          │
│ 40    ┆ 0.75         │
│ 50    ┆ 1.0          │
└───────┴──────────────┘
Here, map_batches passes the whole Series. You can use NumPy or other libraries inside the function. This is much faster than iterating per element.
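As a minimal sketch of that pattern, with np.log1p standing in for any vectorized NumPy routine:

import numpy as np
import polars as pl

def log_transform(series: pl.Series) -> pl.Series:
    # One conversion, then one vectorized NumPy call for the whole column
    return pl.Series(np.log1p(series.to_numpy()))

df = pl.DataFrame({"value": [10, 20, 30, 40, 50]})
df_logged = df.with_columns(
    pl.col("value").map_batches(log_transform).alias("log_value")
)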
When to Use Which?
Choose map_elements for simple, per-element logic. Choose map_batches for batch operations. The rule is simple: if you can use vectorized operations, use map_batches. If you must process each element individually, use map_elements.
For performance, map_batches is almost always better. It replaces one Python call per row with one call per column. It also lets you use efficient libraries like NumPy.
If you work with complex data structures, see our guide on Nested Data in Polars: Lists & Structs. It helps when your custom functions deal with nested columns.
Performance Considerations
Custom functions are opaque to Polars' query optimizer. The engine cannot see inside your Python code, so it cannot optimize through it, and Python's GIL limits parallelism. This can slow down your pipeline. Always try native expressions first.
For large datasets, map_batches is faster than map_elements. But both are slower than native Polars. Use them only when necessary.
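The min-max scaling from earlier is a good example: it needs no custom function at all, and the native version keeps the whole query visible to the optimizer:

import polars as pl

df = pl.DataFrame({"value": [10, 20, 30, 40, 50]})

# Same min-max scaling, written with native expressions only
df_scaled = df.with_columns(
    ((pl.col("value") - pl.col("value").min())
     / (pl.col("value").max() - pl.col("value").min()))
    .alias("scaled_value")
)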
If you work with lazy queries, custom functions can cause issues. Learn more about Polars LazyFrame Query Optimization to keep your pipeline fast.
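One sketch of how to keep a lazy query well-behaved: pass return_dtype to map_batches so the planner knows the output schema up front (here the i64 input becomes f64 after scaling).

import polars as pl

df = pl.DataFrame({"value": [10, 20, 30, 40, 50]})

def min_max_scale(series: pl.Series) -> pl.Series:
    return (series - series.min()) / (series.max() - series.min())

# Declare the output dtype so the lazy engine can plan the schema
df_scaled = (
    df.lazy()
    .with_columns(
        pl.col("value")
        .map_batches(min_max_scale, return_dtype=pl.Float64)
        .alias("scaled_value")
    )
    .collect()
)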
Error Handling in Custom Functions
Your custom functions may fail. A bad value can crash the whole operation. Use try-except blocks inside your functions. Return a default value or None on error.
import polars as pl

df = pl.DataFrame({
    "value": ["1", "2", "not_a_number", "4"]
})

def safe_parse(s: str) -> float | None:
    # Fall back to null when the string is not a valid number
    try:
        return float(s)
    except ValueError:
        return None

df_parsed = df.with_columns(
    pl.col("value").map_elements(safe_parse, return_dtype=pl.Float64).alias("parsed")
)
print(df_parsed)
shape: (4, 2)
┌──────────────┬────────┐
│ value        ┆ parsed │
│ ---          ┆ ---    │
│ str          ┆ f64    │
╞══════════════╪════════╡
│ 1            ┆ 1.0    │
│ 2            ┆ 2.0    │
│ not_a_number ┆ null   │
│ 4            ┆ 4.0    │
└──────────────┴────────┘
This keeps your pipeline running. Handle errors gracefully. It is a best practice.
Working with DataFrames in map_batches
map_batches can also see several columns at once. The trick is to pack the columns into a struct, then unnest that struct back into a DataFrame inside your function. Use this when your custom logic needs multiple columns. For example, combining two columns into one.
import polars as pl

df = pl.DataFrame({
    "first": ["Alice", "Bob"],
    "last": ["Smith", "Jones"]
})

def combine_names(df_in: pl.DataFrame) -> pl.Series:
    # Combine first and last name into one string per row
    return pl.Series([f"{f} {l}" for f, l in zip(df_in["first"], df_in["last"])])

df_combined = df.with_columns(
    pl.struct(["first", "last"])
    .map_batches(lambda s: combine_names(s.struct.unnest()))
    .alias("full_name")
)
print(df_combined)
shape: (2, 3)
┌───────┬───────┬─────────────┐
│ first ┆ last  ┆ full_name   │
│ ---   ┆ ---   ┆ ---         │
│ str   ┆ str   ┆ str         │
╞═══════╪═══════╪═════════════╡
│ Alice ┆ Smith ┆ Alice Smith │
│ Bob   ┆ Jones ┆ Bob Jones   │
└───────┴───────┴─────────────┘
This pattern is powerful. It lets you combine columns in custom ways. For more on reshaping data, check Reshape Data in Polars: Pivot, Melt & Transpose.
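That said, this particular combination also has a native form; pl.concat_str joins columns with a separator and avoids the Python loop entirely:

import polars as pl

df = pl.DataFrame({
    "first": ["Alice", "Bob"],
    "last": ["Smith", "Jones"]
})

# Native alternative: join the columns with a separator
df_native = df.with_columns(
    pl.concat_str([pl.col("first"), pl.col("last")], separator=" ").alias("full_name")
)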
Conclusion
map_elements and map_batches are your tools for custom logic in Polars. Use map_elements for per-element tasks. Use map_batches for batch operations. Both are slower than native expressions. But they are essential when you need flexibility.
Always prefer vectorized operations inside map_batches. Handle errors gracefully. Keep your functions simple. This keeps your code clean and your data pipeline fast.
Now you can add custom Python functions to Polars with confidence. Start with small examples. Test on sample data. Then scale up.