Last modified: Nov 25, 2025 By Alexander Williams

Batch Process Excel Files with Python pyexcel

Working through multiple Excel files by hand is time-consuming. The pyexcel library for Python gives you one small API for reading, editing, and saving spreadsheets, so batch jobs are easy to script.

This guide shows you how to handle many files at once. You will learn to read, process, and combine Excel data in bulk.

Why Batch Process Excel Files?

Manual Excel file handling is slow and error-prone. Batch processing saves time and reduces mistakes. It ensures consistency across all files.

Common use cases include monthly reports, data consolidation, and bulk data cleaning. Automation makes these tasks manageable.

Python pyexcel provides simple tools for these operations. You don't need advanced programming skills to get started.

Installing pyexcel

First, install pyexcel with pip. The core library reads and writes CSV on its own; the pyexcel-xlsx plugin adds support for .xlsx files. The command below installs both.

 
# Install pyexcel and xlsx support
pip install pyexcel pyexcel-xlsx

If you encounter installation issues, check our guide on Fix Python ImportError: No Module Named pyexcel.
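Before running a batch job, it can help to confirm that both packages are importable. The following is a minimal sketch that uses only the standard library. The helper name missing_modules is made up for this example, and it assumes the plugin's import name is pyexcel_xlsx (the pip package name with an underscore).

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# pyexcel_xlsx is assumed to be the import name of the pyexcel-xlsx package
missing = missing_modules(["pyexcel", "pyexcel_xlsx"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("pyexcel and its xlsx plugin are installed")
```

If the script reports a missing module, run the pip command above in the same Python environment that runs your batch script.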

Reading Multiple Excel Files

Start by reading all Excel files from a folder. Use Python's glob module to find files. Then process each file with pyexcel.

 
import glob
import pyexcel as pe

# Get all Excel files in directory
excel_files = glob.glob("data/*.xlsx")

# Process each file
for file_path in excel_files:
    # Load spreadsheet
    sheet = pe.get_sheet(file_name=file_path)
    
    print(f"Processing: {file_path}")
    # number_of_rows() and number_of_columns() are methods, so call them
    print(f"Rows: {sheet.number_of_rows()}, Columns: {sheet.number_of_columns()}")

Processing: data/sales_january.xlsx
Rows: 150, Columns: 8
Processing: data/sales_february.xlsx
Rows: 145, Columns: 8
Processing: data/sales_march.xlsx
Rows: 160, Columns: 8
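glob returns files in arbitrary order, and a folder often contains files you do not want to process. The sketch below shows one possible way to build a cleaner file list with pathlib. The function name find_excel_files and the skip_suffix parameter are hypothetical; the "_processed" default matches the output naming used later in this guide, and "~$" is the prefix Excel gives to the lock files it creates while a workbook is open.

```python
from pathlib import Path

def find_excel_files(folder, skip_suffix="_processed"):
    """Return the .xlsx files in `folder`, sorted by name.

    Skips Excel lock files (names starting with "~$") and files whose
    stem ends with `skip_suffix`, so a re-run ignores earlier output.
    """
    files = []
    for path in Path(folder).glob("*.xlsx"):
        if path.name.startswith("~$"):
            continue  # Lock file left by an open workbook
        if skip_suffix and path.stem.endswith(skip_suffix):
            continue  # Output of a previous run
        files.append(path)
    return sorted(files)
```

Each returned path can be passed to pe.get_sheet as file_name=str(path). Sorting makes every run process the files in the same order, which keeps combined output reproducible.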

Combining Data from Multiple Files

Often you need to combine data from multiple files. Create a master dataset by appending rows from all files. This is useful for quarterly reports.

 
import glob
import pyexcel as pe

# Create empty master sheet
master_sheet = pe.Sheet()

excel_files = glob.glob("reports/*.xlsx")
header_added = False

for file_path in excel_files:
    sheet = pe.get_sheet(file_name=file_path)
    
    if not header_added:
        # Add header from first file
        master_sheet.row += sheet.row[0]
        header_added = True
    
    # Add data rows (skip header); rows() returns an iterator,
    # so turn it into a list before slicing
    for row in list(sheet.rows())[1:]:
        master_sheet.row += row

# Save combined data
master_sheet.save_as("combined_data.xlsx")
print(f"Combined {len(excel_files)} files into master sheet")

This approach only works when every file has the same columns in the same order, because rows are appended by position, not by column name. All data ends up in one spreadsheet with a single header row.
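Because the merge depends on identical structures, it is worth checking the headers before appending. The sketch below works on plain lists of rows, such as those returned by pe.get_array(file_name=...). The function name merge_tables and the (name, rows) input shape are assumptions made for this example.

```python
def merge_tables(tables):
    """Merge tables (lists of rows) that share the same header row.

    `tables` is a list of (name, rows) pairs; rows[0] is the header.
    Raises ValueError if a header differs from the first one seen.
    """
    merged = []
    expected_header = None
    for name, rows in tables:
        if not rows:
            continue  # Skip empty files
        header, data = rows[0], rows[1:]
        if expected_header is None:
            expected_header = header
            merged.append(header)
        elif header != expected_header:
            raise ValueError(
                f"{name}: header {header} does not match {expected_header}"
            )
        merged.extend(data)
    return merged
```

The merged list can then be written out in one step, for example with pe.save_as(array=merged, dest_file_name="combined_data.xlsx"). Failing early on a mismatched header is a design choice; you could also log the file and skip it.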

Batch Data Transformation

You can apply transformations to multiple files. Clean data, add columns, or calculate values across all files. This ensures consistent processing.

 
import glob
import pyexcel as pe

def process_sales_file(file_path):
    """Process individual sales file"""
    sheet = pe.get_sheet(file_name=file_path)
    
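    # Index the new column will get (one past the last existing column)
    total_col = sheet.number_of_columns()

    # Add calculated column (example: total sales)
    sheet.column += ["Total Sales"]

    # Calculate total for each row (price * quantity);
    # assumes price is at column index 2 and quantity at column index 3
    for i, row in enumerate(list(sheet.rows())[1:], 1):  # Skip header
        price = float(row[2])
        quantity = float(row[3])
        total = price * quantity
        sheet[i, total_col] = total  # Write into the new column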
    
    # Save processed file
    output_path = file_path.replace(".xlsx", "_processed.xlsx")
    sheet.save_as(output_path)
    return output_path

# Process all sales files
sales_files = glob.glob("sales_data/*.xlsx")
processed_files = []

for file in sales_files:
    result = process_sales_file(file)
    processed_files.append(result)

print(f"Processed {len(processed_files)} files")

For more advanced data cleaning, see Clean Normalize Spreadsheet Data Python pyexcel.
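Real sales files often contain blank or non-numeric cells, and a single float() error would stop the whole batch. The sketch below shows the same price times quantity calculation as a pure function over a list of rows, so it can be tested without any Excel file. The function name add_total_column and its parameters are hypothetical, and the column positions are assumptions.

```python
def add_total_column(rows, price_idx=2, qty_idx=3, header="Total Sales"):
    """Return a copy of `rows` with a price * quantity column appended.

    rows[0] is the header. Cells that cannot be parsed as numbers
    produce an empty total instead of raising an error.
    """
    if not rows:
        return []
    width = len(rows[0])
    result = [list(rows[0]) + [header]]
    for row in rows[1:]:
        # Pad short rows so the new column lines up with the header
        cells = list(row) + [""] * (width - len(row))
        try:
            total = float(cells[price_idx]) * float(cells[qty_idx])
        except (TypeError, ValueError, IndexError):
            total = ""  # Leave the cell blank for bad or missing input
        result.append(cells + [total])
    return result
```

The input rows are not modified, which makes it easy to compare the data before and after the transformation.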

Filtering Data Across Multiple Files

Filter specific data from multiple Excel files. Extract records that meet certain criteria. This helps in focused analysis.

 
import glob
import pyexcel as pe

def filter_high_value_orders(input_files, output_file, threshold=1000):
    """Extract high value orders from multiple files"""
    result_sheet = pe.Sheet()
    header_set = False
    
    for file_path in input_files:
        sheet = pe.get_sheet(file_name=file_path)
        
        if not header_set:
            result_sheet.row += sheet.row[0]  # Add header
            header_set = True
        
        # Filter rows where amount > threshold
        # (assuming amount is in the fourth column, index 3)
        for row in list(sheet.rows())[1:]:
            if float(row[3]) > threshold:
                result_sheet.row += row

    result_sheet.save_as(output_file)
    # number_of_rows() is a method; exclude the header from the count
    return max(result_sheet.number_of_rows() - 1, 0)

# Filter high value orders
files = glob.glob("orders/*.xlsx")
high_value_count = filter_high_value_orders(files, "high_value_orders.xlsx", 1000)

print(f"Found {high_value_count} high value orders")

Found 47 high value orders
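The filter condition itself can be separated from the file handling. The sketch below is one way to write it as a generator that skips rows whose amount is missing or not numeric. The name high_value_rows and the default column index are assumptions for illustration.

```python
def high_value_rows(rows, amount_idx=3, threshold=1000):
    """Yield data rows whose amount is strictly greater than `threshold`.

    `rows` is any iterable of rows without the header. Rows whose
    amount is missing or not numeric are skipped.
    """
    for row in rows:
        try:
            amount = float(row[amount_idx])
        except (TypeError, ValueError, IndexError):
            continue  # Cannot compare this row; ignore it
        if amount > threshold:
            yield row
```

Note that an order of exactly 1000 is not included, because the comparison uses > rather than >=.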

Batch File Format Conversion

Convert multiple Excel files to other formats. Pyexcel supports CSV, JSON, and other formats. This is useful for system integration.

 
import glob
import os
import pyexcel as pe

def convert_excel_to_csv(excel_files, output_dir):
    """Convert Excel files to CSV format"""
    # Create the output folder if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)
    converted_files = []
    
    for excel_file in excel_files:
        # Load Excel file (get_sheet reads the first sheet by default)
        sheet = pe.get_sheet(file_name=excel_file)
        
        # Build the CSV path from the file name only; os.path works
        # with both forward slashes and backslashes
        base_name = os.path.splitext(os.path.basename(excel_file))[0]
        csv_path = os.path.join(output_dir, base_name + ".csv")
        
        # Save as CSV
        sheet.save_as(csv_path)
        converted_files.append(csv_path)
    
    return converted_files

# Convert all Excel files to CSV
excel_files = glob.glob("source_data/*.xlsx")
csv_files = convert_excel_to_csv(excel_files, "csv_output")

print(f"Converted {len(csv_files)} files to CSV")

Learn more about format conversion in Python pyexcel Guide: Convert CSV XLSX JSON.
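When the input files come from several folders, two of them can map to the same CSV name and silently overwrite each other. The sketch below plans the output paths first and reports such collisions. The function name plan_conversions is hypothetical; it only computes paths and does not read or write any file.

```python
from pathlib import Path

def plan_conversions(excel_files, output_dir):
    """Map each Excel file to its CSV path in `output_dir`.

    Raises ValueError if two inputs would be written to the same CSV file.
    """
    plan = {}
    seen = {}
    for excel_file in excel_files:
        source = Path(excel_file)
        # with_suffix replaces only the final extension
        target = Path(output_dir) / source.with_suffix(".csv").name
        if target in seen:
            raise ValueError(f"{source} and {seen[target]} both map to {target}")
        seen[target] = source
        plan[source] = target
    return plan
```

Each (source, target) pair in the plan can then be converted with the loop shown above.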

Error Handling in Batch Processing

Batch processing should handle errors gracefully. Skip corrupt files and continue processing. Log errors for review.

 
import glob
import pyexcel as pe
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def safe_batch_process(files):
    """Process files with error handling"""
    successful = 0
    failed = 0
    
    for file_path in files:
        try:
            sheet = pe.get_sheet(file_name=file_path)
            # Your processing logic here
            logger.info(f"Successfully processed {file_path}")
            successful += 1
            
        except Exception as e:
            logger.error(f"Failed to process {file_path}: {str(e)}")
            failed += 1
            continue
    
    return successful, failed

# Process with error handling
files = glob.glob("data/*.xlsx")
successful, failed = safe_batch_process(files)

print(f"Processing complete: {successful} successful, {failed} failed")

INFO:__main__:Successfully processed data/file1.xlsx
ERROR:__main__:Failed to process data/corrupt.xlsx: File is not a zip file
INFO:__main__:Successfully processed data/file2.xlsx
Processing complete: 2 successful, 1 failed
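Counting failures tells you how many files went wrong, but not which ones. A small variation, sketched below, records the error message per file so the failed files can be reviewed or retried. The names run_batch and handler are assumptions for this example; handler stands for any function that processes one file.

```python
def run_batch(files, handler):
    """Apply `handler` to each file and collect results and failures.

    Returns (results, failures): `results` maps each file to the value
    returned by `handler`, `failures` maps each file to an error message.
    """
    results = {}
    failures = {}
    for file_path in files:
        try:
            results[file_path] = handler(file_path)
        except Exception as exc:  # Keep going; record what went wrong
            failures[file_path] = f"{type(exc).__name__}: {exc}"
    return results, failures
```

After the run, list(failures) gives the files to retry, and the messages can be written to a log or report.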

Performance Tips for Large Batches

Processing many large files can be slow. Use these tips to improve performance. They help with memory usage and speed.

Process files sequentially rather than loading all at once. This reduces memory usage significantly for large datasets.

Use generator expressions for large data operations. They process data incrementally without loading everything into memory.

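pyexcel already uses pyexcel-io for its file access, so installing it separately does not add speed. For very large files, consider pyexcel's streaming readers, iget_array and iget_records, which yield rows one at a time instead of building the whole sheet in memory. Call free_resources() when you are done so the open file handles are closed.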
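The generator tip can be sketched in plain Python. The example below sums one column across several row sources without ever holding all rows in memory. The function names iter_data_rows and column_sum are made up for this example; each source can be any iterable of rows, for instance a row generator obtained from pyexcel.

```python
def iter_data_rows(sources):
    """Yield data rows from each source in turn, skipping each header."""
    for rows in sources:
        iterator = iter(rows)
        next(iterator, None)  # Drop the header row (if any)
        yield from iterator

def column_sum(rows, idx):
    """Sum column `idx` over an iterable of rows, one row at a time."""
    return sum(float(row[idx]) for row in rows)
```

Because both functions consume their input lazily, only one row is in memory at a time as long as the sources themselves are lazy.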

Conclusion

Batch processing Excel files with Python pyexcel saves a great deal of manual work. It automates repetitive tasks and applies the same steps to every file.

You can read multiple files, combine data, apply transformations, and convert formats. Error handling makes your scripts robust.

Start with small batches, check the output, and then move on to larger jobs. The same loop-over-files pattern applies at any batch size, although very large workbooks may call for the memory-saving tips above.

Pyexcel's simple API makes Excel automation accessible to all Python users. Batch processing becomes straightforward and reliable.