Last modified: Nov 25, 2025 by Alexander Williams
Batch Process Excel Files with Python pyexcel
Working with multiple Excel files by hand can be time-consuming. Python's pyexcel library makes it easy to automate batch processing tasks efficiently.
This guide shows you how to handle many files at once. You will learn to read, process, and combine Excel data in bulk.
Why Batch Process Excel Files?
Manual Excel file handling is slow and error-prone. Batch processing saves time and reduces mistakes. It ensures consistency across all files.
Common use cases include monthly reports, data consolidation, and bulk data cleaning. Automation makes these tasks manageable.
Python pyexcel provides simple tools for these operations. You don't need advanced programming skills to get started.
Installing pyexcel
First, install the pyexcel library. Use pip for installation. The command below installs pyexcel with Excel support.
# Install pyexcel and xlsx support
pip install pyexcel pyexcel-xlsx
If you encounter installation issues, check our guide on Fix Python ImportError: No Module Named pyexcel.
Reading Multiple Excel Files
Start by reading all Excel files from a folder. Use Python's glob module to find files. Then process each file with pyexcel.
import glob
import pyexcel as pe

# Get all Excel files in directory
excel_files = glob.glob("data/*.xlsx")

# Process each file
for file_path in excel_files:
    # Load spreadsheet
    sheet = pe.get_sheet(file_name=file_path)
    print(f"Processing: {file_path}")
    print(f"Rows: {sheet.number_of_rows()}, Columns: {sheet.number_of_columns()}")
Processing: data/sales_january.xlsx
Rows: 150, Columns: 8
Processing: data/sales_february.xlsx
Rows: 145, Columns: 8
Processing: data/sales_march.xlsx
Rows: 160, Columns: 8
Combining Data from Multiple Files
Often you need to combine data from multiple files. Create a master dataset by appending rows from all files. This is useful for quarterly reports.
import glob
import pyexcel as pe

# Create empty master sheet
master_sheet = pe.Sheet()
excel_files = glob.glob("reports/*.xlsx")
header_added = False

for file_path in excel_files:
    rows = pe.get_sheet(file_name=file_path).to_array()
    if not header_added:
        # Add header from first file
        master_sheet.row += rows[0]
        header_added = True
    # Add data rows (skip header)
    for row in rows[1:]:
        master_sheet.row += row

# Save combined data
master_sheet.save_as("combined_data.xlsx")
print(f"Combined {len(excel_files)} files into master sheet")
This approach works well for files with identical structures. All data ends up in one comprehensive spreadsheet.
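Before appending, it is worth checking that the files really do share a structure. A minimal sketch in plain Python, operating on row lists such as those returned by a sheet's to_array() (the helper name headers_match is illustrative, not part of pyexcel):

```python
def headers_match(files_rows):
    """Return True when every file shares the first file's header row."""
    if not files_rows:
        return True
    reference = files_rows[0][0]  # header row of the first file
    return all(rows[0] == reference for rows in files_rows[1:])

# Two files with matching headers, and one with a renamed column
a = [["date", "amount"], ["2024-01-01", 10]]
b = [["date", "amount"], ["2024-02-01", 20]]
c = [["date", "total"], ["2024-03-01", 30]]

print(headers_match([a, b]))     # True
print(headers_match([a, b, c]))  # False
```

Running this check first lets you skip or report mismatched files instead of silently producing a misaligned master sheet.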
Batch Data Transformation
You can apply transformations to multiple files. Clean data, add columns, or calculate values across all files. This ensures consistent processing.
import glob
import pyexcel as pe

def process_sales_file(file_path):
    """Add a calculated Total Sales column to one sales file."""
    rows = pe.get_sheet(file_name=file_path).to_array()
    # Add calculated column header
    rows[0].append("Total Sales")
    # Calculate total for each data row (price * quantity)
    for row in rows[1:]:
        price = float(row[2])
        quantity = float(row[3])
        row.append(price * quantity)
    # Save processed file
    output_path = file_path.replace(".xlsx", "_processed.xlsx")
    pe.Sheet(rows).save_as(output_path)
    return output_path

# Process all sales files
sales_files = glob.glob("sales_data/*.xlsx")
processed_files = []

for file in sales_files:
    processed_files.append(process_sales_file(file))

print(f"Processed {len(processed_files)} files")
For more advanced data cleaning, see Clean Normalize Spreadsheet Data Python pyexcel.
Filtering Data Across Multiple Files
Filter specific data from multiple Excel files. Extract records that meet certain criteria. This helps in focused analysis.
import glob
import pyexcel as pe

def filter_high_value_orders(input_files, output_file, threshold=1000):
    """Extract high value orders from multiple files"""
    result_sheet = pe.Sheet()
    header_set = False
    for file_path in input_files:
        rows = pe.get_sheet(file_name=file_path).to_array()
        if not header_set:
            result_sheet.row += rows[0]  # Add header
            header_set = True
        # Keep rows where the amount (column 4) exceeds the threshold
        for row in rows[1:]:
            if float(row[3]) > threshold:
                result_sheet.row += row
    result_sheet.save_as(output_file)
    return result_sheet.number_of_rows() - 1  # Exclude header

# Filter high value orders
files = glob.glob("orders/*.xlsx")
high_value_count = filter_high_value_orders(files, "high_value_orders.xlsx", 1000)
print(f"Found {high_value_count} high value orders")
Found 47 high value orders
Batch File Format Conversion
Convert multiple Excel files to other formats. Pyexcel supports CSV, JSON, and other formats. This is useful for system integration.
import glob
import os
import pyexcel as pe

def convert_excel_to_csv(excel_files, output_dir):
    """Convert Excel files to CSV format"""
    os.makedirs(output_dir, exist_ok=True)
    converted_files = []
    for excel_file in excel_files:
        # Load Excel file
        sheet = pe.get_sheet(file_name=excel_file)
        # Build the CSV filename from the source filename
        csv_name = os.path.basename(excel_file).replace(".xlsx", ".csv")
        csv_path = os.path.join(output_dir, csv_name)
        # Save as CSV; pyexcel infers the format from the extension
        sheet.save_as(csv_path)
        converted_files.append(csv_path)
    return converted_files

# Convert all Excel files to CSV
excel_files = glob.glob("source_data/*.xlsx")
csv_files = convert_excel_to_csv(excel_files, "csv_output")
print(f"Converted {len(csv_files)} files to CSV")
Learn more about format conversion in Python pyexcel Guide: Convert CSV XLSX JSON.
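If JSON is the target format, rows loaded from any sheet can also be turned into records with the standard library alone. A sketch, assuming the first row is a header (the rows_to_json helper is illustrative, not a pyexcel API):

```python
import json

def rows_to_json(rows):
    """Convert a header-plus-data row list into a JSON array of records."""
    header = rows[0]
    return json.dumps([dict(zip(header, row)) for row in rows[1:]], indent=2)

# Sample data shaped like a sheet's to_array() output
sample = [["product", "amount"], ["Widget", 3], ["Gadget", 5]]
print(rows_to_json(sample))
```

Each data row becomes one JSON object keyed by the header, which is usually the shape downstream APIs expect.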
Error Handling in Batch Processing
Batch processing should handle errors gracefully. Skip corrupt files and continue processing. Log errors for review.
import glob
import pyexcel as pe
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def safe_batch_process(files):
    """Process files with error handling"""
    successful = 0
    failed = 0
    for file_path in files:
        try:
            sheet = pe.get_sheet(file_name=file_path)
            # Your processing logic here
            logger.info(f"Successfully processed {file_path}")
            successful += 1
        except Exception as e:
            logger.error(f"Failed to process {file_path}: {e}")
            failed += 1
    return successful, failed

# Process with error handling
files = glob.glob("data/*.xlsx")
successful, failed = safe_batch_process(files)
print(f"Processing complete: {successful} successful, {failed} failed")
INFO:__main__:Successfully processed data/file1.xlsx
ERROR:__main__:Failed to process data/corrupt.xlsx: File is not a zip file
INFO:__main__:Successfully processed data/file2.xlsx
Processing complete: 2 successful, 1 failed
Performance Tips for Large Batches
Processing many large files can be slow. Use these tips to improve performance. They help with memory usage and speed.
Process files sequentially rather than loading all at once. This reduces memory usage significantly for large datasets.
Use generator expressions for large data operations. They process data incrementally without loading everything into memory.
Consider using pyexcel-io for better performance with very large files. It provides optimized input/output operations.
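The generator tip can be as simple as summing a column lazily. The helper below accepts any iterable of rows, so it works on an in-memory list as well as on a streaming iterator such as the one returned by pyexcel's iget_array (call pe.free_resources() afterwards when you use that API). The column index and function name here are illustrative:

```python
def total_of_column(rows, column=1):
    """Sum one column lazily, one row at a time, skipping the header."""
    it = iter(rows)
    next(it, None)  # skip the header row
    return sum(float(row[column]) for row in it)

rows = [["product", "amount"], ["Widget", "3.5"], ["Gadget", "2.5"]]
print(total_of_column(rows))  # 6.0
```

Because the generator expression consumes one row at a time, memory use stays flat no matter how many rows the source file has.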
Conclusion
Batch processing Excel files with Python pyexcel saves tremendous time. It automates repetitive tasks and ensures data consistency.
You can read multiple files, combine data, apply transformations, and convert formats. Error handling makes your scripts robust.
Start with small batches and gradually tackle larger projects. The techniques shown here scale from a handful of files to large recurring data processing jobs.
Pyexcel's simple API makes Excel automation accessible to all Python users. Batch processing becomes straightforward and reliable.