Last modified: Jan 12, 2026, by Alexander Williams

Scrape AJAX Content with BeautifulSoup

BeautifulSoup is a great tool for parsing HTML, but it has one major limitation: it cannot execute JavaScript. That is a problem for pages that load content with AJAX.

AJAX loads content dynamically after the initial page load, so the HTML your scraper downloads is often nearly empty. This guide shows you how to work around that.

What is AJAX Content?

AJAX stands for Asynchronous JavaScript and XML. It lets websites update content without reloading the page. Data is fetched from a server in the background.

This data is often in JSON format. It is then inserted into the page using JavaScript. This creates dynamic, fast user experiences.

For a scraper, the initial page source is incomplete. The data you see in your browser is not in the HTML. BeautifulSoup alone cannot see it.

The Core Problem with BeautifulSoup

BeautifulSoup only parses static HTML. It is a library for pulling data out of HTML and XML files. It does not execute any client-side code.

When you fetch a page with requests.get() and feed the response to BeautifulSoup, you get only the initial HTML. This HTML often lacks the dynamic content.

You might see empty <div> tags or loading placeholders. The real data is missing. This is the main challenge.
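You can see this for yourself. Here is a minimal sketch (the URL and the product-list id are hypothetical) that fetches a page and inspects a container the browser would normally fill in after the AJAX call completes:


import requests
from bs4 import BeautifulSoup

# Hypothetical page whose #product-list div is filled in by AJAX
url = 'https://example.com/ajax-page'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')

container = soup.find('div', id='product-list')
print(container)
# Typically prints only an empty shell or a loading placeholder:
# <div id="product-list"><span class="spinner">Loading...</span></div>

The data your browser displays simply is not there yet.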

Strategy 1: Find the Data Source

The first method is the most efficient. Do not try to render the page. Instead, find where the data comes from.

Modern web apps often load data from a JSON API. This data is embedded in the page or fetched separately. You can find it.

Open your browser's Developer Tools (F12). Go to the Network tab. Reload the page and look for XHR or Fetch requests.

You will see requests to API endpoints. These often return clean JSON. You can call these endpoints directly with Python.

Example: Finding Embedded JSON

Sometimes, JSON data is embedded in a <script> tag. It might be assigned to a JavaScript variable. You can extract it.

Look for patterns like window.DATA = {...} or var products = [...] inside script tags. Use BeautifulSoup to find these scripts.

Then, use string methods or regex to isolate the JSON string. Finally, parse it with Python's json module.


import requests
from bs4 import BeautifulSoup
import json
import re

url = 'https://example.com/ajax-page'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find all script tags
scripts = soup.find_all('script')
for script in scripts:
    if script.string:  # Check if the script tag contains text
        # Look for a variable assignment ('myData' here is site-specific)
        match = re.search(r'var myData\s*=\s*(\{.*?\});', script.string, re.DOTALL)
        if match:
            json_str = match.group(1)
            data = json.loads(json_str)
            print("Found embedded JSON data!")
            print(data)
            break

This code searches for a JavaScript variable assignment, extracts the JSON string, and parses it. The variable name ('myData' here) differs from site to site, so adjust the regex to match the page you are scraping.
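One caveat: the non-greedy {.*?} pattern can stop at the first closing brace and truncate objects that contain nested braces. A more robust approach, sketched here with a made-up script string, is to find where the object starts and let json.JSONDecoder().raw_decode consume exactly one complete JSON value:


import json
import re

snippet = 'var myData = {"items": [{"name": "Laptop"}], "total": 1}; init();'

# Locate the start of the object literal after the assignment
start = re.search(r'var myData\s*=\s*', snippet).end()

# raw_decode parses exactly one JSON value, nested braces included,
# and returns the object plus the index where parsing stopped
data, _ = json.JSONDecoder().raw_decode(snippet[start:])
print(data['total'])  # 1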

For more advanced techniques on handling JSON within HTML, see our guide on Scrape JSON Data in HTML with BeautifulSoup.

Strategy 2: Simulate the AJAX Request

If the data comes from a separate API call, you can mimic it. Inspect the Network tab to see the request details.

Note the Request URL, method (GET/POST), and headers. Often, you need to copy the exact request to get a valid response.

You might need to include specific headers like X-Requested-With. The API might require certain parameters or authentication.

Example: Direct API Request

Let's say you find an API endpoint that returns product data. You can call it directly with the requests library.


import requests

# The API endpoint found in Network tab
api_url = 'https://example.com/api/products'
# Headers often needed for AJAX requests
headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0'
}
# Possible query parameters
params = {
    'page': 1,
    'limit': 50
}

response = requests.get(api_url, headers=headers, params=params)
if response.status_code == 200:
    products = response.json()  # Parse JSON response
    for product in products:
        print(f"Product: {product.get('name')}, Price: {product.get('price')}")
else:
    print(f"Request failed with status {response.status_code}")

Example output:

Product: Laptop, Price: 999.99
Product: Mouse, Price: 24.99
Product: Keyboard, Price: 79.99

This method is fast and gets clean data. It avoids parsing HTML altogether. It is the preferred method for scraping AJAX content.
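Because the example endpoint accepts page and limit parameters, you can often walk through every page with a simple loop. A sketch assuming the same hypothetical API, and assuming an empty response marks the last page:


import time
import requests

api_url = 'https://example.com/api/products'  # hypothetical endpoint
headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0'
}

all_products = []
page = 1
while True:
    response = requests.get(api_url, headers=headers,
                            params={'page': page, 'limit': 50})
    response.raise_for_status()
    batch = response.json()
    if not batch:  # an empty page usually signals the end
        break
    all_products.extend(batch)
    page += 1
    time.sleep(1)  # small delay between requests to be polite

print(f"Collected {len(all_products)} products")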

When making many requests, be respectful. To learn how to avoid being blocked, read Avoid Getting Blocked While Scraping BeautifulSoup.

Strategy 3: Use a Headless Browser (Last Resort)

Sometimes the data cannot be pulled from a clean API or an embedded script. It might be assembled by convoluted JavaScript at runtime. In these rare cases, use a headless browser.

Tools like Selenium or Playwright can control a real browser. They execute JavaScript and render the full page. Then you can parse the final HTML with BeautifulSoup.

This method is slow and resource-heavy. Use it only when the first two strategies fail. It should be your last option.

Example: Selenium with BeautifulSoup


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
import time

# Setup Selenium driver (Selenium 4.6+ manages the browser driver for you)
driver = webdriver.Chrome()  # Or use Firefox, etc.
driver.get("https://example.com/infinite-scroll-page")

# Wait for dynamic content to load
try:
    # Wait up to 10 seconds for an element that appears after the AJAX load
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "product-item"))
    )
    # Optional: Scroll to trigger more AJAX loads
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Let new content load
except TimeoutException:
    print("Timed out waiting for dynamic content")
finally:
    # Get the final page source, then close the browser
    page_source = driver.page_source
    driver.quit()

# Now parse with BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')
for product in soup.find_all('div', class_='product-item'):
    name_tag = product.find('h3')
    if name_tag:
        print(name_tag.get_text(strip=True))

This script uses Selenium to load the page and wait for content. It then passes the HTML to BeautifulSoup for parsing. It is powerful but inefficient for large-scale tasks.

For large projects, consider the best practices outlined in BeautifulSoup Large-Scale Scraping Best Practices.

Conclusion

Scraping AJAX content requires a shift in thinking. Do not fight the missing HTML. Look for the source of the data.

The best approach is to find and call the underlying API directly. It is fast, reliable, and gets clean data.

If the data is embedded in a script tag, use regex to extract the JSON. Use a headless browser only as a final, last-resort solution.

Always check the website's robots.txt and terms of service. Scrape responsibly and ethically. Use delays between requests to avoid overloading servers.
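Python's standard library can even perform the robots.txt check for you. A minimal sketch (the user-agent string and URLs are placeholders):


from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url('https://example.com/robots.txt')
robots.read()

# Check whether our user agent may fetch a given path
if robots.can_fetch('MyScraper/1.0', 'https://example.com/api/products'):
    print("Allowed to fetch")
else:
    print("Disallowed by robots.txt")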

With these techniques, you can scrape almost any dynamic website. Combine BeautifulSoup with smart network analysis for success.