Last modified: Feb 16, 2026 By Alexander Williams

Python GitHub MS Word Accessibility Verification

Creating accessible documents is crucial. It ensures everyone can read your content. This includes people using screen readers. Microsoft Word is a common tool for creating documents. But how do you check if a Word file is accessible?

Manually checking is slow and error-prone. This is where Python and GitHub come in. You can automate the verification process. This guide shows you how.

Why Automate Accessibility Checks?

Accessibility is often an afterthought. Manual checks are tedious. You might forget to add alt text to an image. Or you might use incorrect heading levels. Automated scripts catch these errors quickly.

Using Python, you can build a script. This script will analyze a Word document. It will check for common accessibility issues. You can then run this script on GitHub. This ensures every document update is checked automatically.

This method saves time. It also makes your content more inclusive. It is a win-win for developers and readers.

Setting Up Your Python Environment

First, you need the right tools. You will use the python-docx library. This library lets Python read and write Word files. Install it using pip.


pip install python-docx

Create a new Python file. You can name it check_accessibility.py. This file will contain your verification logic.

You will also need a Word document to test. Create a simple document with some text, a heading, and an image. Save it as sample.docx.

Core Accessibility Checks with Python

What makes a Word document accessible? Several key elements. Your script will check for these.

Alt Text for Images: Screen readers describe images using alt text. Images without alt text are inaccessible. Your script must find all images. It must then check if they have a description.

Proper Heading Structure: Headings create a document outline. They should follow a logical order (H1, then H2, etc.). Your script should verify this hierarchy.

Table Headers: Tables should have defined header rows. This helps screen readers interpret table data correctly.

Link Descriptions: Hyperlinks should have meaningful text. "Click here" is not helpful. The link text should describe the destination.
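The heading rule above can be sketched as a small pure function. This is a minimal sketch, not a complete checker. It assumes headings use Word's built-in English style names ("Heading 1", "Heading 2", and so on). The function names heading_level and find_heading_skips are hypothetical.

```python
import re

def heading_level(style_name):
    """Return the numeric level of a 'Heading N' style name, or None."""
    match = re.fullmatch(r"Heading (\d+)", style_name)
    return int(match.group(1)) if match else None

def find_heading_skips(style_names):
    """Return (index, previous_level, level) for each heading that skips a level."""
    skips = []
    previous = 0  # document start counts as level 0, so the first heading should be level 1
    for index, name in enumerate(style_names):
        level = heading_level(name)
        if level is None:
            continue  # not a numbered heading style (assumption: only 'Heading N' counts)
        if level > previous + 1:
            skips.append((index, previous, level))
        previous = level
    return skips

print(find_heading_skips(["Heading 1", "Heading 3", "Heading 2", "Heading 4"]))
# prints [(1, 1, 3), (3, 2, 4)]
```

Moving back up the outline (Heading 3 to Heading 1) is allowed. Only jumps that go more than one level deeper are reported.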

Building the Verification Script

Let's write the Python code. Start by importing the necessary module.


from docx import Document

def check_word_accessibility(docx_path):
    """
    Checks a Word document for basic accessibility features.
    Args:
        docx_path (str): The file path to the .docx document.
    Returns:
        dict: A summary of findings and errors.
    """
    doc = Document(docx_path)
    findings = {
        'images_without_alt': [],
        'headings': [],
        'tables_without_headers': [],
        'links': []
    }
    
    # Check paragraphs for headings and links
    for i, paragraph in enumerate(doc.paragraphs):
        # Check for heading styles
        if paragraph.style.name.startswith('Heading'):
            level = paragraph.style.name
            findings['headings'].append({'text': paragraph.text[:50], 'level': level, 'para_index': i})
        
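        # Check for hyperlinks. Run objects have no `hyperlink` attribute;
        # Paragraph.hyperlinks is available in python-docx 1.0 and later.
        for hyperlink in paragraph.hyperlinks:
            link_text = hyperlink.text if hyperlink.text else "[No Text]"
            findings['links'].append({'text': link_text, 'address': hyperlink.address})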
    
    # Check drawings (inline and floating images) for alt text.
    # python-docx has no public API for alt text, so read the 'descr'
    # attribute of each <wp:docPr> element in the document XML directly.
    for doc_pr in doc.element.body.xpath('.//wp:docPr'):
        alt_text = (doc_pr.get('descr') or '').strip()
        if not alt_text:
            findings['images_without_alt'].append(doc_pr.get('name'))
    
    # Check tables for a designated header row.
    # Word marks a header row with <w:tblHeader> inside the row's <w:trPr>.
    # python-docx has no public API for it, so query the row XML.
    for table_index, table in enumerate(doc.tables):
        rows = table.rows
        has_header = len(rows) > 0 and bool(rows[0]._tr.xpath('./w:trPr/w:tblHeader'))
        if not has_header:
            findings['tables_without_headers'].append(table_index)
    
    return findings

# Example usage
if __name__ == "__main__":
    results = check_word_accessibility("sample.docx")
    print("Accessibility Check Results:")
    print(f"Headings found: {len(results['headings'])}")
    print(f"Images potentially without alt text: {len(results['images_without_alt'])}")
    print(f"Hyperlinks found: {len(results['links'])}")
    print(f"Tables without header check: {len(results['tables_without_headers'])}")

The script loads a document. It iterates through paragraphs to find headings and links. It also looks for images and tables. The results are stored in a dictionary.
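Collecting links is only half the job. The dictionary can then be inspected for unhelpful link text. Below is a minimal sketch that works on entries shaped like those in findings['links']. The function name find_vague_links and the list of generic phrases are assumptions. Adjust the list to suit your content.

```python
# Assumed list of phrases that say nothing about the link destination.
GENERIC_LINK_TEXT = {"click here", "here", "link", "read more", "more", "this page"}

def find_vague_links(links):
    """Return link entries whose text is empty, generic, or just the raw URL."""
    vague = []
    for link in links:
        text = (link.get("text") or "").strip()
        address = (link.get("address") or "").strip()
        normalized = text.lower().rstrip(".!")
        if not text or text == "[No Text]" or normalized in GENERIC_LINK_TEXT or text == address:
            vague.append(link)
    return vague

sample_links = [
    {"text": "Click here", "address": "https://example.com/report"},
    {"text": "2025 accessibility report", "address": "https://example.com/report"},
    {"text": "https://example.com", "address": "https://example.com"},
]
for link in find_vague_links(sample_links):
    print(f"Vague link text: {link['text']!r} -> {link['address']}")
```

With the sample data, the first and third entries are flagged. The second describes its destination and passes.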

Run the script from your terminal.


python check_accessibility.py

You should see output like this. The exact counts depend on your document.


Accessibility Check Results:
Headings found: 2
Images potentially without alt text: 1
Hyperlinks found: 3
Tables without header check: 0

Integrating with GitHub Actions

The real power comes with automation. You can use GitHub Actions. This runs your script automatically. It runs when you push new code or update a document.

Create a new file in your repo. The path is .github/workflows/check-accessibility.yml.


name: Word Accessibility Check

on:
  push:
    paths:
      - '**.docx' # Trigger only when .docx files are changed
  pull_request:
    paths:
      - '**.docx'

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install python-docx

      - name: Run accessibility checker
        run: python check_accessibility.py

      # Optional: Fail the check if issues are found
      - name: Evaluate Results
        run: |
          # Add logic here to parse script output and exit with an error if criteria are not met
          echo "Accessibility check completed. Review the output above."

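This workflow is triggered by changes to .docx files. It sets up Python, installs the library, and runs your script. The results appear in the Actions tab of your GitHub repository. As written, the script checks only sample.docx, so adapt it to accept file paths if you want to check every changed document.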
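The workflow's final step is only a placeholder. GitHub Actions fails a step when its command exits with a non-zero status. One option is to have the script itself decide the exit code. The sketch below shows one way to do that. It assumes the image and table checks report genuine problems. The function names and the choice of which findings block the build are assumptions.

```python
def count_blocking_issues(findings):
    """Count findings that should fail the build (assumed: missing alt text and missing table headers)."""
    return len(findings.get("images_without_alt", [])) + len(findings.get("tables_without_headers", []))

def exit_code_for(findings):
    """Return 1 when blocking issues exist so the CI step fails, otherwise 0."""
    return 1 if count_blocking_issues(findings) else 0

# Stand-in data shaped like the dictionary returned by check_word_accessibility().
example_findings = {
    "images_without_alt": ["Picture 1"],
    "headings": [],
    "tables_without_headers": [],
    "links": [],
}
print(exit_code_for(example_findings))  # prints 1

# In check_accessibility.py, after `import sys`, the __main__ block could end with:
#     sys.exit(exit_code_for(results))
```

Headings and links are left out of the count here because the script only collects them. Add them once you have rules that decide when they are wrong.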

Limitations and Advanced Tools

The python-docx library has limits. It has no public API for image alt text or table header rows, so checking them means reading the document's underlying XML. For a more robust check, consider other tools.

You could use the docx2python library. It might offer better access to document properties. Another option is to convert the .docx to HTML. Then use an HTML accessibility checker.
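One more route needs no third-party library at all. A .docx file is a ZIP archive, and Word stores image alt text in the descr attribute of each wp:docPr element in word/document.xml. The sketch below reads that attribute with the standard library. The function name drawings_missing_alt_text is hypothetical. The sketch ignores headers, footers, and images deliberately marked as decorative, so treat it as a starting point.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the wordprocessingDrawing elements that wrap pictures in a .docx file.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"

def drawings_missing_alt_text(docx_file):
    """Return names of drawings in word/document.xml with an empty or missing description.

    docx_file may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    missing = []
    for doc_pr in root.iter(f"{{{WP_NS}}}docPr"):
        # Alt text lives in the 'descr' attribute of <wp:docPr>.
        if not (doc_pr.get("descr") or "").strip():
            missing.append(doc_pr.get("name", "(unnamed)"))
    return missing

# Example (assumes sample.docx exists in the working directory):
#     print(drawings_missing_alt_text("sample.docx"))
```

Because it relies only on zipfile and xml.etree, this check runs in a GitHub Actions job without an install step.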

For enterprise needs, commercial software exists. But for most projects, a simple Python script is a great start. It raises awareness and catches major issues.

Conclusion

Automating MS Word accessibility checks is smart. It uses Python for the analysis. It uses GitHub for the automation. This ensures your documents are inclusive.

Start with the basic script provided. Check for headings, links, and images. Integrate it into your GitHub workflow. This creates a safety net for your content.

Remember, accessibility is not a feature. It is a fundamental requirement. Automated checks help you meet this requirement consistently. They free you to focus on creating great content.

Explore more about Python automation to enhance your projects. Dive deeper into web accessibility guidelines to understand all requirements. Your readers will thank you.