Last modified: Feb 08, 2026 By Alexander Williams

Convert Parquet to JSON in Python

Data formats are key in modern programming. You often need to switch between them. Parquet is great for storage. JSON is perfect for web APIs. Converting between them is a common task.

This guide shows you how. We will use Python. The process is simple and powerful.

Why Convert Parquet to JSON?

Parquet is a columnar storage format. It is highly efficient for analytics. It compresses data well. It is fast for reading specific columns.

JSON is a universal data interchange format. It is human-readable. It is the standard for web services. Many applications consume JSON data directly.

You might convert Parquet to JSON for a web API. Or to share data with a tool that cannot read Parquet. Sometimes, you just need a readable format for debugging.

Understanding data type conversion is also crucial. For instance, when your JSON output requires specific number formats, you might need to convert integers to binary or handle float to integer conversions.

Prerequisites for the Conversion

You need Python installed. We will use two main libraries. They are pandas and pyarrow.

Install them using pip. Open your terminal or command prompt. Run the following command.


pip install pandas pyarrow
    

Pandas provides the DataFrame structure. PyArrow is the engine pandas uses to read and write Parquet files. It is the default engine when installed; `fastparquet` is an alternative engine.

Method 1: Using Pandas and PyArrow

This is the most common method. Pandas has a simple read_parquet() function. It also has a to_json() method for DataFrames.

Let's walk through a full example. We will create a sample Parquet file first. Then we will convert it.


import pandas as pd

# 1. Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'score': [85.5, 92.0, 78.5, 88.0],
    'active': [True, False, True, True]
}
df = pd.DataFrame(data)

# 2. Write the DataFrame to a Parquet file
df.to_parquet('sample_data.parquet', engine='pyarrow')
print("Sample Parquet file 'sample_data.parquet' created.")
    

Sample Parquet file 'sample_data.parquet' created.
    

Now, let's read this Parquet file and convert it to JSON.


# 3. Read the Parquet file back into a DataFrame
df_from_parquet = pd.read_parquet('sample_data.parquet', engine='pyarrow')

# 4. Convert the DataFrame to JSON
json_string = df_from_parquet.to_json(orient='records', indent=2)

# 5. Print and save the JSON
print("Converted JSON:")
print(json_string)

# Save to a file
with open('output_data.json', 'w') as f:
    f.write(json_string)
print("\nJSON saved to 'output_data.json'.")
    

Converted JSON:
[
  {
    "id":1,
    "name":"Alice",
    "score":85.5,
    "active":true
  },
  {
    "id":2,
    "name":"Bob",
    "score":92.0,
    "active":false
  },
  {
    "id":3,
    "name":"Charlie",
    "score":78.5,
    "active":true
  },
  {
    "id":4,
    "name":"Diana",
    "score":88.0,
    "active":true
  }
]

JSON saved to 'output_data.json'.
    

The to_json() method has a key parameter: orient. We used `'records'`. This creates a list of JSON objects. Each object is a row from the DataFrame.

Other useful `orient` values are `'split'`, `'index'`, and `'columns'`. Choose based on your needs. The `indent` parameter makes the JSON pretty-printed.
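
As a quick illustration, here is how a tiny two-row DataFrame looks under a few different `orient` values (a minimal sketch; the sample data is made up):


import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob']})

# One JSON object per row -- the usual choice for APIs
print(df.to_json(orient='records'))
# [{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]

# Columns, index, and data stored separately -- compact for round-tripping
print(df.to_json(orient='split'))
# {"columns":["id","name"],"index":[0,1],"data":[[1,"Alice"],[2,"Bob"]]}

# One JSON object per column, keyed by index
print(df.to_json(orient='columns'))
# {"id":{"0":1,"1":2},"name":{"0":"Alice","1":"Bob"}}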

Method 2: Using PyArrow Directly

Pandas is user-friendly. But PyArrow can be more memory-friendly for large files. It avoids building one large pandas DataFrame all at once.

Here is how to do it with PyArrow's Table API.


import pyarrow.parquet as pq
import json

# 1. Read the Parquet file into a PyArrow Table
table = pq.read_table('sample_data.parquet')

# 2. Convert the PyArrow Table to a list of dictionaries
# Processing batch by batch avoids building one large DataFrame at once
list_of_dicts = []
for batch in table.to_batches():
    # Convert RecordBatch to Pandas DataFrame chunk
    df_chunk = batch.to_pandas()
    # Extend the main list with records from this chunk
    list_of_dicts.extend(df_chunk.to_dict('records'))

# 3. Convert the list to a JSON string
json_string_direct = json.dumps(list_of_dicts, indent=2)

print("JSON from PyArrow direct method (first 200 chars):")
print(json_string_direct[:200] + "...")

# Save it
with open('output_pyarrow.json', 'w') as f:
    f.write(json_string_direct)
    

JSON from PyArrow direct method (first 200 chars):
[
  {
    "id": 1,
    "name": "Alice",
    "score": 85.5,
    "active": true
  },
  {
    "id": 2,
    "name": "Bob",
    "score": 92.0,
    "active": false
  },
...
    

This method uses pq.read_table(). It reads the file as a PyArrow Table. We then process it batch by batch, so we never build one giant pandas DataFrame.

The to_batches() method is key. It allows chunked processing. Note that read_table() still loads the whole file into memory; for files that do not fit in memory, open them with pq.ParquetFile() and stream with iter_batches(), as in the sketch below.
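
Here is a minimal sketch of that streaming approach. It assumes a reasonably recent PyArrow (for `iter_batches()` and `to_pylist()`); the batch size and the `output_stream.jsonl` file name are just examples. It writes one JSON object per line (JSON Lines), so no large structure is ever held in memory.


import pyarrow.parquet as pq
import json

# Open the file without loading it, then stream fixed-size batches
parquet_file = pq.ParquetFile('sample_data.parquet')

with open('output_stream.jsonl', 'w') as f:
    for batch in parquet_file.iter_batches(batch_size=1000):
        # to_pylist() yields plain Python dicts that json.dumps can handle
        for row in batch.to_pylist():
            f.write(json.dumps(row) + '\n')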

Handling Data Types and Potential Issues

Data types can cause surprises. Parquet has precise types like `int32` and `timestamp[ns]`. JSON is more limited. It only has numbers, strings, booleans, null, arrays, and objects.

Pandas and PyArrow handle most conversions automatically. But you should be aware.

Dates and Timestamps: By default, pandas `to_json()` writes these as epoch milliseconds. Pass `date_format='iso'` to get ISO 8601 strings instead.
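
A small sketch of the difference (the timestamp value is made up; the exact ISO formatting can vary slightly between pandas versions):


import pandas as pd

df = pd.DataFrame({'event': ['signup'], 'ts': pd.to_datetime(['2024-03-01 12:30:00'])})

# Default: timestamps are serialized as epoch milliseconds
print(df.to_json(orient='records'))
# [{"event":"signup","ts":1709296200000}]

# date_format='iso' produces ISO 8601 strings instead
print(df.to_json(orient='records', date_format='iso'))
# e.g. [{"event":"signup","ts":"2024-03-01T12:30:00.000"}]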

Nested Structures: Parquet can store lists and structs. Pandas loads these as Python objects, and `orient='records'` usually handles them, but edge cases may need custom serialization. Serializing straight from PyArrow is often simpler, as in the sketch below.
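
This hedged alternative skips pandas and serializes PyArrow's own Python representation. It assumes a recent PyArrow (for `Table.to_pylist()`); the `nested.parquet` file name is just an example:


import json
import pyarrow as pa
import pyarrow.parquet as pq

# Create a small Parquet file with a nested list column for the demo
table = pa.table({'id': [1, 2], 'tags': [['a', 'b'], ['c']]})
pq.write_table(table, 'nested.parquet')

# to_pylist() yields plain dicts, lists, and scalars that json.dumps accepts
rows = pq.read_table('nested.parquet').to_pylist()
print(json.dumps(rows, indent=2))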

Large Numbers: JSON itself has no integer size limit, but JavaScript-based parsers are only exact up to 2^53 - 1. If your consumers need exact values beyond that, serialize the numbers as strings.
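
One common workaround, sketched below, is to cast very large integer columns to strings before serializing (the column name and values are made up):


import pandas as pd

df = pd.DataFrame({'big_id': [9007199254740993, 9007199254740994]})

# Values above 2**53 - 1 lose precision in JavaScript-based parsers,
# so serialize them as strings to keep them exact
df['big_id'] = df['big_id'].astype(str)
print(df.to_json(orient='records'))
# [{"big_id":"9007199254740993"},{"big_id":"9007199254740994"}]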

Sometimes, data comes as strings that need reformatting. For example, you might need to convert a string to a float before analysis, or convert a number back to a string for specific JSON formatting.

Performance Tips and Best Practices

Conversion can be slow for big files. Follow these tips.

Use PyArrow Engine: pandas picks PyArrow automatically when it is installed, but specifying `engine='pyarrow'` in `read_parquet()` makes the choice explicit and avoids a silent fallback to `fastparquet`.

Convert Only Needed Columns: Read a subset of columns. Use the `columns` parameter in `read_parquet()`.


# Read only 'id' and 'name' columns
df_subset = pd.read_parquet('sample_data.parquet', columns=['id', 'name'], engine='pyarrow')
    

Process in Chunks: For huge files, use the PyArrow batch method shown earlier, or stream the file with `pq.ParquetFile()` and `iter_batches()`. Note that `read_parquet()` itself has no `chunksize` parameter.

Choose the Right JSON Orientation: The `orient` affects file size and structure. `'records'` is common for APIs. `'split'` can be more efficient for certain operations.

Compress JSON Output: If file size matters, skip `indent` (leave it as `None`, the default). The JSON will be a single line. It is much smaller. For further savings, `to_json()` can also compress the output when writing to a file path.
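
For example, a minimal sketch of gzip-compressed output (the `.gz` path is just an example):


import pandas as pd

df = pd.read_parquet('sample_data.parquet', engine='pyarrow')

# No indent (single-line JSON) plus gzip compression on disk
df.to_json('output_data.json.gz', orient='records', compression='gzip')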

Just like optimizing file conversions in other contexts, such as when you convert images between file formats or convert Sass to CSS, the right tool and method save time and resources.

Conclusion

Converting Parquet to JSON in Python is straightforward. The pandas library makes it easy with read_parquet() and to_json().

For advanced use or large datasets, use PyArrow directly. It offers more control and better memory management.

Remember to consider data types. Watch out for dates and nested structures. Use the performance tips for big files.

Now you can bridge the gap between efficient storage and universal data exchange. Your data pipelines just became more flexible.