Last modified: Jun 11, 2026

Install Bitsandbytes in Python

Bitsandbytes is a Python library that helps you run large machine learning models with less memory. It uses quantization to store model weights and optimizer states in fewer bits, so larger models fit on the same GPU for training and inference. In this guide, you will learn how to install Bitsandbytes in Python, step by step. The article is written with beginners in mind.

What is Bitsandbytes?

Bitsandbytes is a library for CUDA-based quantization. It allows you to load models in 8-bit or 4-bit precision, which reduces GPU memory usage. For example, the weights of a 7B parameter model take about 14 GB in 16-bit precision, about 7 GB in 8-bit, and about 3.5 GB in 4-bit, so the model can fit on a single consumer GPU. It also provides 8-bit versions of optimizers such as AdamW, which saves memory during training.
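The memory savings follow from simple arithmetic on bits per parameter. The sketch below is an estimate only: it counts weights and Adam-style optimizer states (two values per parameter) and ignores activations, caches, and the small overhead of quantization constants. The helper names are made up for this example and are not part of Bitsandbytes.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def adam_state_memory_gb(num_params: float, bits_per_state: int) -> float:
    """Approximate memory for Adam's two state values per parameter."""
    return 2 * num_params * bits_per_state / 8 / 1e9


params = 7e9  # a 7B-parameter model
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
for bits in (32, 8):
    print(f"{bits:>2}-bit Adam states: {adam_state_memory_gb(params, bits):.1f} GB")
```

For a 7B model this works out to roughly 14, 7, and 3.5 GB of weights, and 56 versus 14 GB of Adam state, which is why quantization matters most on GPUs with limited memory.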

The library is widely used with Hugging Face Transformers. It works well with PyTorch. Many developers use it for large language models (LLMs). Installing it correctly is key to avoid errors.

Prerequisites for Installation

Before you install Bitsandbytes, check your system. You need a CUDA-compatible NVIDIA GPU with an up-to-date driver; support for other hardware is limited and depends on the release. You also need CUDA 11 or higher and Python 3.8 or later, though recent releases may raise these minimums, so check the release notes. Use a virtual environment for a clean setup.

Make sure you have PyTorch installed, because Bitsandbytes depends on it. Check your GPU and driver with this command:


nvidia-smi

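This shows your GPU, the driver version, and the highest CUDA version that driver supports. It does not show which CUDA toolkit is installed. If the command is missing, install the NVIDIA driver from NVIDIA's website. Then install PyTorch with CUDA support.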
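Because Bitsandbytes depends on PyTorch, it also helps to confirm what PyTorch itself sees. This is a minimal sketch that relies only on torch.version.cuda and torch.cuda.is_available(); the check_prerequisites helper is a name invented for this example, and the script still runs if PyTorch is not installed.

```python
import importlib.util
import sys


def check_prerequisites() -> dict:
    """Collect basic facts about the Python and PyTorch setup."""
    report = {
        "python_version": tuple(sys.version_info[:3]),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": None,  # CUDA version PyTorch was built against
        "cuda_available": False,   # True only if PyTorch can use a GPU
    }
    if report["torch_installed"]:
        import torch

        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


for key, value in check_prerequisites().items():
    print(f"{key}: {value}")
```

If torch_cuda_build is None while PyTorch is installed, you likely have a CPU-only build and should reinstall PyTorch with CUDA support.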

Install Bitsandbytes via pip

The easiest way is using pip. Open your terminal. Activate your virtual environment. Run this command:


pip install bitsandbytes

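This installs the latest stable version. You do not need a CUDA-specific package name (such as bitsandbytes-cuda118): recent releases of the main bitsandbytes package bundle binaries for several CUDA versions and select one that matches your PyTorch build when the library is imported. If you already have an older version installed, upgrade it:


pip install --upgrade bitsandbytes

This keeps a single package name regardless of your CUDA version. After install, verify it works.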
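A common source of confusion is running a pip that belongs to a different Python than the one you use for your project. One way to avoid this, sketched below, is to invoke pip through the current interpreter. The pip_install_command helper is hypothetical and only builds the command; it does not install anything by itself.

```python
import sys


def pip_install_command(package: str = "bitsandbytes", upgrade: bool = False) -> list:
    """Build a pip command bound to the interpreter running this script."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    command.append(package)
    return command


print(" ".join(pip_install_command()))
# To actually run it:
#   import subprocess
#   subprocess.run(pip_install_command(), check=True)
```

Running pip as a module of sys.executable guarantees the package lands in the environment that is currently active.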

Example: Verify Installation

Run this Python code to test:


import bitsandbytes as bnb
print(bnb.__version__)

Example output (your version number may differ):


0.43.0

If you see a version number, installation succeeded. If you get an error, check the next sections.
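If the import itself fails, it is still useful to know whether the package is installed at all. The sketch below reads the installed version from package metadata without importing Bitsandbytes, so it works even when the native library cannot be loaded. The installed_version helper is a name chosen for this example.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("bitsandbytes")
if version is None:
    print("bitsandbytes is not installed in this environment")
else:
    print(f"bitsandbytes {version} is installed")
```

If the package is installed but importing it raises an error, recent releases can also print a diagnostic report when you run python -m bitsandbytes.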

Install from Source (Optional)

Sometimes pip fails. You can build from source. This gives you the latest features. First, clone the repository:


git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes

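Then build the native CUDA library and install the package. Recent releases use CMake for the build step:


cmake -DCOMPUTE_BACKEND=cuda -S . && make
pip install -e .

The first command compiles the CUDA kernels, which may take a few minutes. Ensure you have a C++ compiler such as gcc, CMake, and the CUDA toolkit installed. This method is useful for custom setups.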
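Before starting a source build, you can check that the required tools are on your PATH. This is a rough sketch: the tool list is an assumption about a typical Linux build and may not match your platform or release, and missing_build_tools is a hypothetical helper.

```python
import shutil

# Tools a source build is assumed to need; adjust for your platform.
BUILD_TOOLS = ("git", "cmake", "make", "g++", "nvcc")


def missing_build_tools(tools=BUILD_TOOLS, which=shutil.which) -> list:
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_build_tools()
print("Missing tools:", ", ".join(missing) if missing else "none")
```

Install anything reported as missing before you run the build, otherwise the compile step fails partway through.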

Common Installation Errors and Fixes

Error: "CUDA not found"

Bitsandbytes needs CUDA. If you see this error, check your CUDA installation. Run nvcc --version in a terminal. If the command is missing, install the CUDA toolkit. If it is installed but not found, set the CUDA_HOME environment variable and add its bin directory to your PATH.

Example for Linux:


export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH

Then reinstall Bitsandbytes.
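To confirm that the environment variables took effect, you can inspect them from Python. A minimal sketch, assuming the toolkit layout where nvcc lives in the bin directory under CUDA_HOME (CUDA_PATH is also checked because it is the usual name on Windows); find_cuda_toolkit is a made-up helper name.

```python
import os
import shutil


def find_cuda_toolkit(env=None) -> dict:
    """Report where (or whether) the CUDA toolkit can be found."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    nvcc_in_home = None
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            nvcc_in_home = candidate
    return {
        "cuda_home": cuda_home,             # value of CUDA_HOME / CUDA_PATH, if set
        "nvcc_in_cuda_home": nvcc_in_home,  # nvcc found under that directory
        "nvcc_on_path": shutil.which("nvcc"),  # nvcc reachable through PATH
    }


for key, value in find_cuda_toolkit().items():
    print(f"{key}: {value}")
```

If cuda_home is set but nvcc_in_cuda_home is None, the variable probably points at the wrong directory.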

Error: "libcudart.so not found"

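This means the CUDA runtime library is missing or is not on the library search path. Install the CUDA toolkit properly. On Ubuntu, one option is the distribution package, which may be older than the version on NVIDIA's website: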


sudo apt install nvidia-cuda-toolkit

If the library is installed but still not found, add its directory to LD_LIBRARY_PATH. Then retry the installation.
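You can also search for the library yourself to see whether it is missing or just not on the search path. The sketch below scans a list of directories for files named libcudart.so*. The default directories (LD_LIBRARY_PATH entries plus /usr/local/cuda/lib64) are an assumption about a typical Linux install, and find_libcudart is a hypothetical helper.

```python
import glob
import os


def find_libcudart(search_dirs=None) -> list:
    """List libcudart shared libraries found in the given directories."""
    if search_dirs is None:
        # Assumed defaults: LD_LIBRARY_PATH entries plus the usual CUDA location.
        search_dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        search_dirs.append("/usr/local/cuda/lib64")
    found = []
    for directory in search_dirs:
        if directory:  # skip empty entries from an unset variable
            pattern = os.path.join(directory, "libcudart.so*")
            found.extend(sorted(glob.glob(pattern)))
    return found


libraries = find_libcudart()
print("\n".join(libraries) if libraries else "libcudart.so not found")
```

An empty result suggests the runtime is not installed in the usual places; a non-empty result with a persistent error suggests a search path problem.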

Error: "bitsandbytes not compatible with this PyTorch version"

Use a compatible version. Check the Bitsandbytes GitHub for compatibility. For PyTorch 2.0+, use Bitsandbytes 0.41+. Upgrade PyTorch if needed.
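A version rule like the one above can be checked in a few lines. This sketch compares only the leading numeric parts of a version string, which is enough for simple cases; for full correctness the third-party packaging library is the more robust choice. The 0.41 threshold is taken from the text above, not independently verified, and both helper names are invented for this example.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    numbers = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as 'dev0'
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("0.43.0", "0.41"))  # newer than the threshold
print(meets_minimum("0.39.1", "0.41"))  # older than the threshold
```

Comparing tuples of integers avoids the classic mistake of comparing version strings alphabetically, where "0.9" would sort after "0.41".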

Using Bitsandbytes for Model Quantization

After installation, you can use it. Here's a simple example that loads a Hugging Face model in 8-bit precision. It also needs the transformers and accelerate packages (pip install transformers accelerate).


from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # Use 8-bit quantization
    device_map="auto",  # Place the model on the available GPU
)

# Test inference (inputs must be on the same device as the model)
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

This code loads GPT-2 in 8-bit, so the weights use less GPU memory. The device_map="auto" argument handles device placement. The output is the prompt followed by generated text.

Example Output (illustrative; the actual text will vary)


Hello, how are you? I am fine, thank you. How can I help you today?

You can also use 4-bit quantization. Replace load_in_8bit=True with load_in_4bit=True. This saves even more memory, usually at some cost in accuracy. Bitsandbytes makes this simple.
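For 4-bit loading, the Transformers BitsAndBytesConfig class accepts a few extra options. The sketch below shows one commonly used combination; the specific values are a reasonable starting point rather than a recommendation from the Bitsandbytes authors, and the two helper functions are names invented here. Calling load_4bit_model requires a GPU plus the transformers, accelerate, and bitsandbytes packages, so the example only prints the settings.

```python
def four_bit_config_kwargs() -> dict:
    """Keyword arguments for transformers.BitsAndBytesConfig (4-bit setup)."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",          # NormalFloat4 data type
        "bnb_4bit_use_double_quant": True,     # also quantize the quantization constants
        "bnb_4bit_compute_dtype": "bfloat16",  # use "float16" on GPUs without bfloat16
    }


def load_4bit_model(model_name: str = "gpt2"):
    """Load a causal LM in 4-bit. Not called here because it needs a GPU."""
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    config = BitsAndBytesConfig(**four_bit_config_kwargs())
    return AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=config, device_map="auto"
    )


print(four_bit_config_kwargs())
```

Keeping the settings in a plain dictionary makes it easy to log or compare them between experiments.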

Installing on Windows

Windows support has historically been limited, and Bitsandbytes works best on Linux. Recent releases publish Windows wheels, so try pip install bitsandbytes first. If that fails, use WSL2 (Windows Subsystem for Linux): install WSL2 and Ubuntu, then follow the Linux instructions. Alternatively, use Google Colab for free GPU access.

Steps for WSL2:

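Install WSL2 and Ubuntu from the Microsoft Store, or run wsl --install.
Open the Ubuntu terminal.
Install the CUDA toolkit inside WSL2. Keep the NVIDIA driver on the Windows side; do not install a Linux GPU driver inside WSL2.
Install PyTorch and Bitsandbytes as above.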

This gives you a Linux-like environment. It avoids Windows-specific issues.
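If you are unsure whether a terminal is running inside WSL, a quick heuristic is to look at the kernel release string, which contains "microsoft" on WSL. This is a convention rather than a documented guarantee, and the is_wsl helper is a name chosen for this sketch.

```python
import platform


def is_wsl(system=None, release=None) -> bool:
    """Heuristic: WSL kernels report 'microsoft' in their release string."""
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release
    return system == "Linux" and "microsoft" in release.lower()


print("Running under WSL:", is_wsl())
```

The optional arguments exist only to make the function easy to test with sample values.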

Installing on macOS

Bitsandbytes relies on CUDA, which is not available on Mac, so the GPU features described in this guide do not work on macOS. Use cloud services like AWS or Google Colab instead, or a remote Linux machine with an NVIDIA GPU. This is a limitation to note.

Best Practices for Installation

Always use a virtual environment. This prevents package conflicts. Use venv or conda. For example:


python -m venv bnb_env
source bnb_env/bin/activate  # On Linux/Mac
bnb_env\Scripts\activate     # On Windows

Then install Bitsandbytes inside the environment. Also, keep your NVIDIA driver updated and check your GPU's compute capability. Bitsandbytes supports compute capability 5.0 and above, although some features may need newer hardware. Older GPUs may not work.
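Both checks can be scripted. The sketch below detects a venv by comparing sys.prefix with sys.base_prefix (conda environments are not detected this way) and compares a compute capability against the 5.0 threshold quoted above. The helper names are invented for this example.

```python
import sys

MIN_CAPABILITY = (5, 0)  # threshold quoted in the text above


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from the base install."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def capability_supported(capability, minimum=MIN_CAPABILITY) -> bool:
    """Compare a (major, minor) compute capability against the minimum."""
    return tuple(capability) >= tuple(minimum)


print("Inside a virtual environment:", in_virtualenv())
print("Capability (7, 5) supported:", capability_supported((7, 5)))
```

With PyTorch installed and a GPU present, torch.cuda.get_device_capability(0) returns the (major, minor) pair to pass to capability_supported.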

Important: Always match Bitsandbytes version with PyTorch. Use pip list to check versions. Upgrade if needed.

Conclusion

Installing Bitsandbytes in Python is straightforward. Use pip for most cases, and build from source only for custom needs. Ensure your GPU driver, CUDA setup, and PyTorch build are compatible. The library cuts memory use so that larger models fit on your GPU, which makes it a key tool for large language model work. Follow this guide to avoid common errors. Now you can quantize models with ease. Happy coding!