Last modified: Jun 14, 2026

Install OnnxRuntime-GenAI in Python

Installing OnnxRuntime-GenAI in Python allows you to run generative AI models efficiently. This guide covers all steps from prerequisites to verification. You will learn to set up the library on Windows, macOS, and Linux.

OnnxRuntime-GenAI extends ONNX Runtime with the generation loop that language models need: tokenization, search and sampling, and key-value cache management. It supports models like GPT-2, LLaMA, and more. The installation process is straightforward but requires attention to dependencies.

Prerequisites

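Before you start, ensure Python 3.8 or higher is installed. Recent releases of the package may require a newer Python, so check its PyPI page if the install fails. Check your Python version with this command (use python3 if python is not available on your system):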


python --version
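The same check can be scripted, which is handy at the top of a setup script. This is a minimal sketch: the helper name meets_minimum is made up for illustration, and the 3.8 floor is simply the one stated above.

```python
import sys

MIN_VERSION = (3, 8)  # minimum stated in this guide; adjust if your release needs more


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```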

You also need pip updated to the latest version. Run this command to upgrade pip:


python -m pip install --upgrade pip

For GPU acceleration, install CUDA 11.8 or later and cuDNN 8.6 or later. This is optional but recommended for faster inference.

Installation Steps

Step 1: Create a Virtual Environment

Isolate your project dependencies. Use venv to create a clean environment. Run this command:


python -m venv onnx_genai_env

Activate the environment. On Windows use:


onnx_genai_env\Scripts\activate

On macOS and Linux use:


source onnx_genai_env/bin/activate
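If you are unsure whether activation worked, Python itself can tell you. The sketch below relies on the standard behavior of venv, where sys.prefix points at the environment and sys.base_prefix at the base installation; the function name in_virtual_env is illustrative.

```python
import sys


def in_virtual_env():
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix is the environment directory while
    # sys.base_prefix still refers to the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtual_env())
```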

Step 2: Install OnnxRuntime-GenAI

The simplest method is using pip. Install the package directly from PyPI:


pip install onnxruntime-genai

This installs the CPU version. For GPU support, install the CUDA variant instead of the CPU package, not alongside it:


pip install onnxruntime-genai-cuda

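If you need the latest development version, you can try installing from GitHub. This builds the package from source, so expect to need a C++ toolchain and CMake; if the command fails, follow the build instructions in the repository: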


pip install git+https://github.com/microsoft/onnxruntime-genai.git

Step 3: Verify Installation

Test the installation with a simple Python script. Create a file called test_install.py:


# test_install.py
import onnxruntime_genai as og

# Check version
print("OnnxRuntime-GenAI version:", og.__version__)

# Verify basic functionality
print("Installation successful!")

Run the script:


python test_install.py

Expected output (the version number will match the release you installed):


OnnxRuntime-GenAI version: 0.1.0
Installation successful!

Example: Run a Generative Model

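Now let's test with a real model. Download a pre-converted ONNX model like GPT-2. Note that og.Model loads a model folder, not a single .onnx file: the folder must hold the ONNX weights together with a genai_config.json file and the tokenizer files. Text is then generated with a tokenizer and a generator created from that model.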

First, install the model downloader:


pip install huggingface_hub

Download a small model:


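# download_model.py
from huggingface_hub import snapshot_download

# Download the whole model repository rather than a single .onnx file:
# og.Model loads a folder that also holds genai_config.json and tokenizer files.
# The repo id is the one used in this guide; confirm that it ships
# genai_config.json, or substitute a repository published for ONNX Runtime GenAI.
model_dir = snapshot_download(repo_id="onnx-community/gpt2")
print("Model downloaded to:", model_dir)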
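Because the library loads a model folder that contains a genai_config.json file alongside the ONNX weights, a quick pre-flight check can save a confusing load error. The sketch below is illustrative only: missing_model_files is a hypothetical helper, and the *.onnx glob is an assumption about how the weight files are named.

```python
from pathlib import Path


def missing_model_files(model_dir):
    """Return a list of problems found in model_dir (an empty list means it looks usable)."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return [f"{folder} is not a directory"]
    problems = []
    # The generation settings live in this config file.
    if not (folder / "genai_config.json").is_file():
        problems.append("genai_config.json not found")
    # Assumption: the weights sit at the top level of the folder as *.onnx.
    if not any(folder.glob("*.onnx")):
        problems.append("no .onnx file found")
    return problems
```

Call it with the folder you downloaded, for example missing_model_files("path/to/model_folder"), and fix anything it reports before loading the model.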

Now run inference with the downloaded model:


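# run_inference.py
import onnxruntime_genai as og

# Path to the model folder (the directory that contains genai_config.json)
model_dir = "path/to/model_folder"

# Load the model and create its tokenizer
model = og.Model(model_dir)
tokenizer = og.Tokenizer(model)

# Prepare input text
input_text = "Artificial intelligence is"
tokens = tokenizer.encode(input_text)

# Configure generation
params = og.GeneratorParams(model)
params.set_search_options(max_length=50)

# Generate token by token. This is the generator loop used by recent releases;
# older releases exposed a different API, so check the docs for your version.
generator = og.Generator(model, params)
generator.append_tokens(tokens)
while not generator.is_done():
    generator.generate_next_token()

# Decode and print result
result = tokenizer.decode(generator.get_sequence(0))
print("Generated text:", result)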

Expected output (may vary):


Generated text: Artificial intelligence is a field of computer science that focuses on creating intelligent machines that can perform tasks that typically require human intelligence.

Troubleshooting Common Issues

Import Error: No module named 'onnxruntime_genai'

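This usually means the package was installed into a different Python environment than the one running your script, or was not installed at all. Verify the installation with python -m pip list, which queries the same interpreter you run your script with. Ensure you activated the correct virtual environment. Reinstall if needed.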
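A small diagnostic can show which interpreter is running and whether it can see the module, without triggering the import error itself. This is a sketch using only the standard library; describe_import is an illustrative name.

```python
import importlib.util
import sys


def describe_import(module_name="onnxruntime_genai"):
    """Report which interpreter is running and whether it can find module_name."""
    # find_spec returns None for a missing top-level module instead of raising.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_import().items():
        print(f"{key}: {value}")
```

If the interpreter path is not inside your virtual environment, the environment is not active for that shell.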

CUDA Errors

If you get CUDA-related errors, check your GPU drivers first. Update the CUDA toolkit to version 11.8 or higher, and install the cuDNN version that matches your CUDA installation. Also confirm that you installed the CUDA variant of the package rather than the CPU one.
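Before reinstalling anything, it can help to see what the current process can actually find. The sketch below gathers coarse hints with the standard library only. Treat a miss as a hint rather than proof: find_library follows platform-specific search rules and can overlook libraries with versioned file names, especially on Windows. The function name is illustrative.

```python
import ctypes.util
import shutil


def cuda_environment_report():
    """Collect coarse hints about whether NVIDIA tooling is visible to this process."""
    return {
        # nvidia-smi is installed with the NVIDIA driver; if it is not on PATH,
        # the driver may be missing or PATH may need adjusting.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Look for the CUDA runtime and cuDNN in the usual library locations.
        "cudart_found": ctypes.util.find_library("cudart") is not None,
        "cudnn_found": ctypes.util.find_library("cudnn") is not None,
    }


if __name__ == "__main__":
    for name, found in cuda_environment_report().items():
        print(f"{name}: {found}")
```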

Version Conflicts

OnnxRuntime-GenAI may conflict with other ONNX packages. Use a fresh virtual environment to avoid dependency issues. If conflicts persist, install without dependencies using pip install --no-deps onnxruntime-genai, then install the dependencies it needs yourself, since --no-deps skips all of them.
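To see what is already installed before you start removing packages, you can list every distribution whose name starts with "onnx". This is a sketch built on importlib.metadata; the function name and the prefix filter are choices made for illustration.

```python
from importlib import metadata


def onnx_related_packages(names_and_versions=None):
    """Return {name: version} for distributions whose name starts with 'onnx'."""
    if names_and_versions is None:
        # Scan the distributions visible to the current interpreter.
        names_and_versions = [
            (dist.metadata["Name"], dist.version) for dist in metadata.distributions()
        ]
    return {
        name.lower(): version
        for name, version in names_and_versions
        if name and name.lower().startswith("onnx")
    }


if __name__ == "__main__":
    for name, version in sorted(onnx_related_packages().items()):
        print(f"{name}=={version}")
```

Finding both a CPU and a GPU variant of the same package in the output is a likely source of conflicts and a good reason to start over in a clean environment.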

Best Practices

Always use a virtual environment for each project. This prevents version clashes. Pin the OnnxRuntime-GenAI version in your requirements.txt file for reproducibility.
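One way to produce the pinned line is to read the installed version and format it the way requirements.txt expects. A minimal sketch, assuming the distribution name onnxruntime-genai from the install step; pinned_requirement is an illustrative helper.

```python
from importlib import metadata


def pinned_requirement(package="onnxruntime-genai"):
    """Return 'name==version' for an installed package, or None if it is not installed."""
    try:
        return f"{package}=={metadata.version(package)}"
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    line = pinned_requirement()
    print(line if line else "onnxruntime-genai is not installed in this environment")
```

Copy the printed line into requirements.txt, or run python -m pip freeze to pin everything in the environment at once.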

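For production, use the GPU version if available. It can be substantially faster than CPU inference, although the actual speedup depends on the model, its precision, and your hardware, so measure it on your own workload. Monitor memory usage with large models like LLaMA-7B.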
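To get a feel for the memory involved, you can estimate the size of the weights alone from the parameter count and the precision. This is back-of-the-envelope arithmetic, not a measurement: it ignores the key-value cache, activations, and runtime overhead, all of which add to the total.

```python
def weight_memory_gib(num_parameters, bits_per_weight):
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_parameters * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    # A 7-billion-parameter model at three common precisions.
    for label, bits in [("fp32", 32), ("fp16", 16), ("int4", 4)]:
        print(f"7B parameters at {label}: ~{weight_memory_gib(7e9, bits):.1f} GiB")
```

At 16 bits per weight a 7B model needs roughly 13 GiB for the weights, which is why quantized variants are popular on consumer GPUs.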

Conclusion

Installing OnnxRuntime-GenAI in Python is simple when you follow these steps. Create a virtual environment, install the package via pip, and verify with a test script. For GPU acceleration, install the CUDA variant. The library enables fast generative AI inference with ONNX models. Start building your GenAI applications today.