Last modified: Jun 14, 2026
Install OnnxRuntime-GenAI in Python
Installing OnnxRuntime-GenAI in Python allows you to run generative AI models efficiently. This guide covers all steps from prerequisites to verification. You will learn to set up the library on Windows, macOS, and Linux.
OnnxRuntime-GenAI extends the ONNX Runtime with generation capabilities. It supports models like GPT-2, LLaMA, and more. The installation process is straightforward but requires attention to dependencies.
Prerequisites
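Before you start, make sure Python 3.8 or higher is installed. Newer onnxruntime-genai releases may require a more recent Python, so check the supported versions on the package's PyPI page. Check your Python version with this command (on some macOS and Linux systems the command is python3):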
python --version
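You can run the same check from inside Python, for example at the top of a setup script, by comparing sys.version_info against the minimum. This is a minimal sketch. The helper name meets_minimum_python is hypothetical, and the (3, 8) minimum only mirrors the requirement stated above.

```python
import sys

# Minimum Python version stated in this guide (adjust for the release you install)
MINIMUM = (3, 8)


def meets_minimum_python(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    # Compare only major and minor, as a tuple, so (3, 10) > (3, 8) works correctly
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum_python():
        print(f"Python {current} meets the minimum requirement.")
    else:
        print(f"Python {current} is too old; please upgrade.")
```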
You also need pip updated to the latest version. Run this command to upgrade pip:
python -m pip install --upgrade pip
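GPU acceleration is optional but recommended for faster inference. It requires an NVIDIA GPU with an up-to-date driver, the CUDA toolkit, and cuDNN. The CUDA and cuDNN versions must match the ones the onnxruntime-genai-cuda release was built against. Older releases used CUDA 11.8, while recent PyPI releases target CUDA 12, so check the release notes before installing. CUDA is not available on macOS.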
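Before installing the CUDA variant, you can quickly check whether an NVIDIA driver is present by querying nvidia-smi, the tool that ships with the driver. This sketch assumes nvidia-smi is on your PATH when the driver is installed, and the helper name nvidia_driver_summary is hypothetical. A positive result only shows that the driver is installed. It does not prove that your CUDA toolkit and cuDNN versions are the right ones.

```python
import shutil
import subprocess


def nvidia_driver_summary():
    """Return 'GPU name, driver version' from nvidia-smi, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No nvidia-smi on PATH: most likely no NVIDIA driver is installed
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed (for example, the driver is not loaded)
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    summary = nvidia_driver_summary()
    print("NVIDIA driver:", summary if summary else "not detected")
```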
Installation Steps
Step 1: Create a Virtual Environment
Isolate your project dependencies by creating a clean environment with venv:
python -m venv onnx_genai_env
Activate the environment. On Windows use:
onnx_genai_env\Scripts\activate
On macOS and Linux use:
source onnx_genai_env/bin/activate
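To confirm from within Python that the environment is active, compare sys.prefix with sys.base_prefix. A venv sets sys.prefix to the environment directory, while sys.base_prefix keeps pointing at the base installation. Here is a minimal sketch; the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv-created virtual environment."""
    # Inside a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the base Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```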
Step 2: Install OnnxRuntime-GenAI
The simplest method is using pip. Install the package directly from PyPI:
pip install onnxruntime-genai
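This installs the CPU version. For GPU support, install the CUDA variant instead:
pip install onnxruntime-genai-cuda
Install only one variant per environment. Both packages provide the same onnxruntime_genai module, so installing them side by side can cause conflicts.
Pre-release versions, when published on PyPI, can be installed with pip's --pre flag:
pip install --pre onnxruntime-genai
Installing the development version from the GitHub repository (https://github.com/microsoft/onnxruntime-genai) generally requires compiling its C++ core with the build script in the repository, not a plain pip install. Follow the build-from-source instructions there if you need unreleased changes.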
Step 3: Verify Installation
Test the installation with a simple Python script. Create a file called test_install.py:
# test_install.py
import onnxruntime_genai as og
# Print the installed version
print("OnnxRuntime-GenAI version:", og.__version__)
# Reaching this line means the package and its native library were imported successfully
print("Installation successful!")
Run the script:
python test_install.py
Expected output (the version number matches the release you installed):
OnnxRuntime-GenAI version: 0.1.0
Installation successful!
Example: Run a Generative Model
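Now let's test with a real model. OnnxRuntime-GenAI does not load a bare .onnx file. Instead, og.Model expects a folder that contains the ONNX model together with a genai_config.json file and the tokenizer files. The simplest way to get one is to download a model that has already been converted for OnnxRuntime-GenAI, such as Microsoft's Phi-3 mini ONNX build on Hugging Face.
First, install the Hugging Face Hub client:
pip install huggingface_hub
Download the CPU int4 variant of the model:
# download_model.py
from huggingface_hub import snapshot_download
# Subfolder of the repository that holds the CPU int4 model
model_subdir = "cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4"
# Download only that subfolder into the current directory
snapshot_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct-onnx",
    allow_patterns=[f"{model_subdir}/*"],
    local_dir=".",
)
print("Model downloaded to:", model_subdir)
Now run inference with the downloaded model. This script uses the generator API of recent releases (og.GeneratorParams and og.Generator). Older releases used a different API, so adjust the script if you pinned an older version:
# run_inference.py
import onnxruntime_genai as og
# Load the model from its folder (not from a single .onnx file)
model = og.Model("cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
# Create a tokenizer from the model's tokenizer files
tokenizer = og.Tokenizer(model)
# Prepare input text
input_text = "Artificial intelligence is"
tokens = tokenizer.encode(input_text)
# Limit the total sequence length (prompt plus generated tokens) to 50
params = og.GeneratorParams(model)
params.set_search_options(max_length=50)
# Generate one token at a time until the generator is done
generator = og.Generator(model, params)
generator.append_tokens(tokens)
while not generator.is_done():
    generator.generate_next_token()
# Decode and print the full sequence
result = tokenizer.decode(generator.get_sequence(0))
print("Generated text:", result)
Example output (the generated text will vary):
Generated text: Artificial intelligence is a field of computer science that focuses on creating intelligent machines that can perform tasks that typically require human intelligence.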
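og.Model needs a complete model folder rather than a single file, so a small pre-flight check can turn a confusing load error into a clear message. This sketch uses only the standard library. The helper check_model_dir is hypothetical, and it assumes a usable folder contains genai_config.json and at least one .onnx file. That may not cover every file a given model needs, such as tokenizer files or external weight data.

```python
from pathlib import Path

# Files this sketch expects in an OnnxRuntime-GenAI model folder (assumption)
REQUIRED_FILES = ("genai_config.json",)


def check_model_dir(model_dir):
    """Return a list of problems found with `model_dir` (empty if it looks usable)."""
    path = Path(model_dir)
    if not path.is_dir():
        return [f"{path} is not a directory; og.Model expects a folder, not a single .onnx file"]
    # Report each required file that is missing
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (path / name).is_file()]
    # The folder should also hold the ONNX model itself
    if not any(path.glob("*.onnx")):
        problems.append("no .onnx file found")
    return problems


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    issues = check_model_dir(folder)
    print("Model folder looks usable." if not issues else "\n".join(issues))
```

Call it with the model folder path before og.Model(...) and stop if the returned list is not empty.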
Troubleshooting Common Issues
Import Error: No module named 'onnxruntime_genai'
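This usually means the package is not installed for the interpreter you are running. Confirm that it is installed with pip show onnxruntime-genai (or pip show onnxruntime-genai-cuda for the GPU variant), and make sure you activated the correct virtual environment. If needed, reinstall with python -m pip install onnxruntime-genai. Using python -m pip guarantees that pip installs into the same interpreter that runs your script.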
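If the error persists, print which interpreter is running and check whether it can locate the module at all. importlib.util.find_spec does this without importing the package. Here is a minimal sketch; the helper name diagnose_import is hypothetical.

```python
import importlib.util
import sys


def diagnose_import(module_name="onnxruntime_genai"):
    """Report the running interpreter and whether `module_name` can be found."""
    # find_spec returns None for a top-level module that is not installed
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in diagnose_import().items():
        print(f"{key}: {value}")
```

If "found" is False, the package is not installed for the interpreter shown, which usually points to the wrong environment being active.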
CUDA Errors
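If you get CUDA-related errors, first check that your NVIDIA driver is installed and up to date. Then confirm that your CUDA toolkit and cuDNN versions match the ones your onnxruntime-genai-cuda release was built against (see its release notes). Also make sure the CUDA and cuDNN libraries are on your system's library search path.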
Version Conflicts
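OnnxRuntime-GenAI may conflict with other ONNX packages. This happens most often when several variants that provide the same module are installed together, for example onnxruntime and onnxruntime-gpu, or onnxruntime-genai and onnxruntime-genai-cuda. Use a fresh virtual environment to avoid dependency issues. As a last resort, you can install without dependencies using pip install --no-deps onnxruntime-genai. You then have to install its required packages (such as numpy) yourself; pip check reports any that are missing.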
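To see which ONNX-related packages are installed and spot overlapping variants, you can inspect the installed distributions with importlib.metadata. This is a sketch. The helper names are hypothetical, and the CONFLICT_GROUPS table is an assumption that lists common variants installing the same module; it is not exhaustive.

```python
from importlib.metadata import distributions

# Assumed groups of distributions that install the same importable module;
# having more than one member of a group installed commonly causes conflicts.
CONFLICT_GROUPS = {
    "onnxruntime_genai": {"onnxruntime-genai", "onnxruntime-genai-cuda", "onnxruntime-genai-directml"},
    "onnxruntime": {"onnxruntime", "onnxruntime-gpu", "onnxruntime-directml"},
}


def installed_onnx_packages():
    """Return {normalized name: version} for installed distributions named 'onnx*'."""
    found = {}
    for dist in distributions():
        meta = dist.metadata
        raw_name = meta.get("Name") if meta is not None else None
        if not raw_name:
            continue
        # Normalize so "onnxruntime_genai" and "onnxruntime-genai" compare equal
        name = raw_name.lower().replace("_", "-")
        if name.startswith("onnx"):
            found[name] = dist.version
    return found


def find_conflicts(installed):
    """Return {module: sorted distribution names} for groups with more than one installed."""
    conflicts = {}
    for module, members in CONFLICT_GROUPS.items():
        present = sorted(members & set(installed))
        if len(present) > 1:
            conflicts[module] = present
    return conflicts


if __name__ == "__main__":
    packages = installed_onnx_packages()
    for name, version in sorted(packages.items()):
        print(f"{name}=={version}")
    for module, members in find_conflicts(packages).items():
        print(f"Possible conflict for module '{module}': {', '.join(members)}")
```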
Best Practices
Always use a virtual environment for each project. This prevents version clashes. Pin the OnnxRuntime-GenAI version in your requirements.txt file for reproducibility.
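For production, use the GPU version if suitable hardware is available. GPU inference is usually much faster than CPU inference, but the speedup depends on the model, precision, and hardware, so benchmark it on your own setup. Monitor memory usage with large models like LLaMA-7B, since the weights, the KV cache, and activations must all fit in RAM or GPU memory.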
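For a first estimate of memory needs, multiply the parameter count by the bytes per parameter. This sketch uses a hypothetical helper name and covers model weights only. The KV cache, activations, and runtime overhead come on top, and quantized formats also store extra scale data, so treat the result as a lower bound.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate size of the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    # bits -> bytes (divide by 8), then bytes -> gigabytes (divide by 1e9)
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    llama_7b_params = 6.7e9  # LLaMA-7B has about 6.7 billion parameters
    for label, bits in [("fp32", 32), ("fp16", 16), ("int4", 4)]:
        print(f"{label}: ~{weight_memory_gb(llama_7b_params, bits):.2f} GB of weights")
```

For LLaMA-7B, this gives roughly 26.8 GB at fp32, 13.4 GB at fp16, and 3.35 GB at int4, before any runtime overhead.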
Conclusion
Installing OnnxRuntime-GenAI in Python is simple when you follow these steps. Create a virtual environment, install the package via pip, and verify with a test script. For GPU acceleration, install the CUDA variant. The library enables fast generative AI inference with ONNX models. Start building your GenAI applications today.