Last modified: Feb 21, 2026 By Alexander Williams

Python Character Encoding Guide for Beginners

Working with text is a core part of programming. You might read a file, get data from the web, or process user input. In Python, handling text correctly means understanding character encoding. This guide explains what encoding is and how to use it.

We will cover the basics. You will learn about common encodings like UTF-8. We will also solve frequent errors like UnicodeDecodeError. By the end, you will handle text with confidence.

What is Character Encoding?

A computer stores everything as numbers. Text characters like 'A' or '😀' must be converted into numbers for storage. This mapping is called character encoding. It is a set of rules.

An encoding defines how to represent text as bytes. A byte is a sequence of 8 bits. Different encodings map characters to different byte sequences. Using the wrong one causes errors.

Think of it like a secret code. The sender (your program) and receiver (a file or network) must use the same codebook (encoding) to understand each other.
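
To make the codebook idea concrete, here is a small sketch that encodes one character under three different encodings. The variable names are illustrative only.

```python
# The same character maps to different bytes under different encodings.
char = "é"

utf8_bytes = char.encode("utf-8")        # two bytes in UTF-8
latin1_bytes = char.encode("latin-1")    # one byte in Latin-1
utf16_bytes = char.encode("utf-16-le")   # two bytes in UTF-16 (little-endian, no BOM)

print(utf8_bytes)    # b'\xc3\xa9'
print(latin1_bytes)  # b'\xe9'
print(utf16_bytes)   # b'\xe9\x00'
```

A receiver that assumes a different codebook than the sender would read these bytes as something other than 'é'.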

Key Encoding Standards

You will encounter several encoding standards. The most important ones are ASCII and UTF-8.

ASCII: The Basic Set

ASCII was one of the first encodings. It uses 7 bits to represent 128 characters. This includes English letters, digits, and basic symbols. It cannot handle characters from other languages like 'é' or 'α'.
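
As a quick illustration of the 128-character limit, the sketch below checks that every character in a short English string has a code below 128, and that 'é' falls outside the ASCII range. It uses only built-in functions.

```python
letters = "Hi!"

# ord() returns the numeric code of a single character.
codes = [ord(ch) for ch in letters]
print(codes)                               # [72, 105, 33]
print(all(code < 128 for code in codes))   # True

# Pure ASCII text encodes to exactly one byte per character.
print(letters.encode("ascii"))             # b'Hi!'

# str.isascii() (Python 3.7+) reports whether every character is ASCII.
print("é".isascii())                       # False
```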

Unicode: The Universal Standard

Unicode is not an encoding. It is a universal character set. It assigns a unique number (code point) to every character from every writing system. For example, 'A' is U+0041 and '😀' is U+1F600.

Unicode removes the 128-character limit of ASCII. Its code space has room for over a million code points (U+0000 through U+10FFFF), far more than today's writing systems need.
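
You can inspect code points directly in Python. This minimal sketch uses the built-in ord() and chr() functions to move between characters and their code points.

```python
import sys

# Print each character next to its code point in U+XXXX notation.
for ch in ["A", "é", "α", "😀"]:
    print(ch, f"U+{ord(ch):04X}")
# A U+0041
# é U+00E9
# α U+03B1
# 😀 U+1F600

# chr() is the inverse of ord().
smiley = chr(0x1F600)
print(smiley == "😀")          # True

# The highest code point Python's str can hold.
print(hex(sys.maxunicode))     # 0x10ffff
```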

UTF-8: The Recommended Encoding

UTF-8 is an encoding that implements Unicode. It is the dominant standard on the web and in modern systems. It is a variable-width encoding.

This means common ASCII characters use 1 byte. Other characters use 2, 3, or 4 bytes. UTF-8 is backward compatible with ASCII. An ASCII text file is also a valid UTF-8 file.
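
The variable width is easy to observe. The following sketch encodes four characters and records how many bytes each one needs, then checks the ASCII compatibility claim.

```python
# Map each character to the number of bytes it needs in UTF-8.
widths = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "€", "😀"]}
print(widths)  # {'A': 1, 'é': 2, '€': 3, '😀': 4}

# Pure ASCII text produces identical bytes under ASCII and UTF-8.
same_bytes = "hello".encode("ascii") == "hello".encode("utf-8")
print(same_bytes)  # True
```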

You should use UTF-8 by default for all your text in Python 3.

Strings and Bytes in Python

Python 3 made a crucial distinction. It has two main types for representing text and binary data.

The str Type (Unicode Strings)

The str type holds Unicode text. When you write a string literal like s = "hello", you create a str object. It is a sequence of Unicode code points.


# Creating a Unicode string
text = "Python 🐍 is fun!"
print(type(text))  # Output will show <class 'str'>
print(text)


<class 'str'>
Python 🐍 is fun!
    

The bytes Type (Raw Data)

The bytes type holds raw byte sequences. It is used for binary data, like images or encoded text. You create it with a b prefix.


# Creating a bytes object
data = b"Hello"
print(type(data))  # Output will show <class 'bytes'>
print(data)


<class 'bytes'>
b'Hello'
    

A bytes literal may contain only ASCII characters. To include any other byte value, write it as an escape such as \xc3, or start from a str and encode it.
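
Here is a short sketch of the two ways to get non-ASCII content into a bytes object. It also shows that indexing a bytes object returns integers, not characters.

```python
# Writing b"café" would raise a SyntaxError, because bytes literals
# accept ASCII characters only. Two alternatives:
from_text = "café".encode("utf-8")   # encode a str
from_escapes = b"caf\xc3\xa9"        # spell out the bytes with \x escapes

print(from_text == from_escapes)     # True

# Each element of a bytes object is an integer from 0 to 255.
print(from_text[0])                  # 99
print(list(from_text))               # [99, 97, 102, 195, 169]
```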

Converting Between str and bytes

You must convert text to bytes for saving or sending. You convert bytes back to text for reading. This is done with the encode() and decode() methods.

Encoding: str to bytes

Use the encode() method on a string and pass the encoding name, like 'utf-8'. In Python 3 the argument defaults to 'utf-8', but naming it explicitly makes the intent clear.


# Encoding a string to bytes
unicode_text = "Café & Python"
byte_data = unicode_text.encode('utf-8')  # Encode using UTF-8
print(byte_data)
print(f"Encoded to {len(byte_data)} bytes")
    

b'Caf\xc3\xa9 & Python'
Encoded to 14 bytes
    

The 'é' character is encoded as two bytes: \xc3\xa9.

Decoding: bytes to str

Use the decode() method on a bytes object. Pass the same encoding that was used to produce the bytes.


# Decoding bytes back to a string
received_bytes = b'Caf\xc3\xa9 & Python'
decoded_text = received_bytes.decode('utf-8')  # Decode using UTF-8
print(decoded_text)
    

Café & Python
    

If you use the wrong encoding to decode, you will get an error or garbled text.
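
The garbled-text case is worth seeing, because it raises no error at all. This sketch decodes UTF-8 bytes as Latin-1, then reverses the mistake.

```python
original = "Café"
utf8_bytes = original.encode("utf-8")   # b'Caf\xc3\xa9'

# Latin-1 maps every byte to some character, so this never fails.
# It silently produces the wrong text instead.
garbled = utf8_bytes.decode("latin-1")
print(garbled)   # CafÃ©

# If you know which wrong decoding was applied, you can undo it.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # Café
```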

Common Encoding Errors and Solutions

Errors occur when text cannot be converted to or from bytes under the chosen encoding. The two most common are UnicodeDecodeError and UnicodeEncodeError.

UnicodeDecodeError

This happens when you try to decode a sequence of bytes into a string using the wrong encoding. Python cannot map some bytes to a valid character.


# This will cause a UnicodeDecodeError
# We have bytes encoded in 'latin-1', but we try to decode as 'utf-8'
latin_bytes = 'café'.encode('latin-1')  # Encoded as latin-1
try:
    text = latin_bytes.decode('utf-8')  # Wrong encoding!
except UnicodeDecodeError as e:
    print(f"Error: {e}")
    

Error: 'utf-8' codec can't decode byte 0xe9 in position 3: unexpected end of data
    

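Solution: Know the correct encoding of your data. If unsure, try common ones like 'utf-8', 'latin-1', or 'cp1252'. Keep in mind that 'latin-1' accepts every possible byte, so a decode that succeeds is not proof that the encoding is right. You can also use libraries like `chardet` to guess.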
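
The "try common ones" approach could be sketched as below. The helper name decode_with_fallback and the candidate order are assumptions made for this example, not a standard API, and the result is only a heuristic guess.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_name) for the first encoding that decodes data.

    Hypothetical helper. 'latin-1' goes last because it never fails.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


latin_bytes = "café".encode("latin-1")   # b'caf\xe9', not valid UTF-8
text, used = decode_with_fallback(latin_bytes)
print(text, used)  # café cp1252
```

UTF-8 goes first because it is strict: bytes from another encoding rarely decode as valid UTF-8 by accident.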

UnicodeEncodeError

This happens when you try to encode a string containing characters that cannot be represented in the target encoding.


# This will cause a UnicodeEncodeError
# The string has a character not supported by ASCII
text_with_emoji = "Hello 😀"
try:
    data = text_with_emoji.encode('ascii')  # ASCII can't encode emoji
except UnicodeEncodeError as e:
    print(f"Error: {e}")
    

Error: 'ascii' codec can't encode character '\U0001f600' in position 6: ordinal not in range(128)
    

Solution: Use an encoding that supports all your characters, like UTF-8. You can also use the `errors` parameter to handle problematic characters.
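
To show what the `errors` parameter does, this sketch runs the same string through several of the built-in error handlers.

```python
text_with_emoji = "Hello 😀"

# 'replace' substitutes a question mark for each unencodable character.
replaced = text_with_emoji.encode("ascii", errors="replace")
print(replaced)   # b'Hello ?'

# 'ignore' drops the character entirely (data is lost).
ignored = text_with_emoji.encode("ascii", errors="ignore")
print(ignored)    # b'Hello '

# 'backslashreplace' keeps the information as a Python escape sequence.
escaped = text_with_emoji.encode("ascii", errors="backslashreplace")
print(escaped)    # b'Hello \\U0001f600'

# 'xmlcharrefreplace' writes an XML/HTML numeric character reference.
xml_ref = text_with_emoji.encode("ascii", errors="xmlcharrefreplace")
print(xml_ref)    # b'Hello &#128512;'

# When decoding, 'replace' inserts U+FFFD, the replacement character.
decoded = b"caf\xe9".decode("utf-8", errors="replace")
print(decoded == "caf\ufffd")  # True
```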

Working with Files and Encoding

When reading or writing text files, you should specify the encoding. Use the `encoding` parameter in the open() function.


# Writing to a file with UTF-8 encoding
content = "This file contains an emoji: 🎉"
with open('example.txt', 'w', encoding='utf-8') as f:
    f.write(content)
print("File written successfully.")

# Reading from the file with the same encoding
with open('example.txt', 'r', encoding='utf-8') as f:
    read_content = f.read()
print(f"Read from file: {read_content}")
    

File written successfully.
Read from file: This file contains an emoji: 🎉
    

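If you omit the `encoding` parameter, Python falls back to a default that depends on the platform and locale settings (often UTF-8 on Linux and macOS, but frequently cp1252 on Windows). The same script can therefore behave differently on different machines. Always specify encoding='utf-8' for text files.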
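
You can check which default your own machine would use, and see that a text file is just encoded bytes on disk. This sketch uses a temporary directory so it leaves nothing behind. The file name is arbitrary.

```python
import locale
import os
import tempfile

# The locale's preferred encoding. The value varies from machine to machine.
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    # Write text with an explicit encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write("naïve café")

    # Binary mode shows the raw bytes stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with the matching encoding recovers the original string.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(len(raw))   # 12 (10 characters, two of which need 2 bytes each)
print(text)       # naïve café
```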

Best Practices for Encoding in Python

Follow these rules to avoid common pitfalls.

  • Use UTF-8 Everywhere: Make UTF-8 your default for files, network data, and databases.
  • Decode Early, Encode Late: In your program, work with str objects. Only convert to bytes when you must output data.
  • Specify Encoding Explicitly: Never rely on the system default. Always pass `encoding='utf-8'` to open().
  • Handle Errors Gracefully: Use the `errors` parameter (e.g., `errors='ignore'` or `errors='replace'`) if you must process malformed data. Note that `errors='ignore'` silently discards the bytes or characters it cannot convert.
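
The "Decode Early, Encode Late" rule can be sketched as a small function that accepts and returns bytes but does all of its work on str. The function name shout is hypothetical and chosen only for this example.

```python
def shout(raw: bytes) -> bytes:
    """Uppercase UTF-8 encoded text (hypothetical example function)."""
    text = raw.decode("utf-8")        # decode early, at the input boundary
    result = text.upper()             # work with str inside the program
    return result.encode("utf-8")     # encode late, at the output boundary


incoming = "café".encode("utf-8")
print(shout(incoming))     # b'CAF\xc3\x89'  (the bytes for 'CAFÉ')

# Operating on the bytes directly only affects ASCII letters,
# so the 'é' is left in lowercase.
print(incoming.upper())    # b'CAF\xc3\xa9'  (the bytes for 'CAFé')
```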

Conclusion

Character encoding is a fundamental concept. Python 3's clear separation of str and bytes helps you manage it. Remember that str is for text and bytes is for binary data.

Always encode and decode with purpose. Use UTF-8 as your standard encoding. This will prevent most common errors.

When you see a UnicodeDecodeError, check the source of your data. Find the correct encoding and use it in your decode() call. With this knowledge, you can handle text from any source confidently.