Handling Heavy Workloads in Async Python (FastAPI + WebSockets)



When building WebSocket handlers in FastAPI (or any async framework), one common issue is mixing blocking work with async I/O. If blocking code runs inside your async handler, the event loop stalls, and clients see dropped connections (e.g. WebSocket close code 1006, "abnormal closure").

The Problem

Async event loops need to stay responsive to service sockets and timers. If you run blocking or CPU-heavy code directly in your handler, the loop can’t do its job — leading to timeouts or disconnects.
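
To make this concrete, here is a minimal sketch of the anti-pattern (the endpoint path and the five-second sleep are illustrative): one blocking call freezes every connection served by the loop.

from fastapi import FastAPI, WebSocket
import time

app = FastAPI()

@app.websocket("/ws-bad")
async def websocket_endpoint(ws: WebSocket):
    await ws.accept()
    while True:
        data = await ws.receive_text()
        # time.sleep() blocks the whole event loop: every client stalls,
        # pings go unanswered, and connections drop with close code 1006
        time.sleep(5)
        await ws.send_text(f"Got: {data}")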

The Naïve Fix: Make Everything Async

A common first reaction is to “sprinkle” async everywhere — converting synchronous methods into async, and chaining await calls throughout the code.

This usually makes things worse:

  • Sync code becomes needlessly cluttered.

  • Some methods are async, some are sync, forcing awkward boundaries.

  • The overall design turns into a hodge-podge of mismatched styles.
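
To make the "creep" concrete, here is a hypothetical call chain (all names invented for illustration): one async method at the bottom forces every caller above it to become async as well.

async def fetch_user(user_id):        # made async to await an HTTP call
    ...

async def build_report(user_id):      # forced to become async too
    user = await fetch_user(user_id)
    ...

async def handle_request(user_id):    # and so on, all the way up the stack
    return await build_report(user_id)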

A better approach is to keep your core logic synchronous and clean, and only offload where necessary.

Why Threads Alone Don’t Solve It

It’s tempting to think “I’ll just launch a thread for heavy work.” However, that only works for I/O-bound tasks.

For CPU-bound Python code, threads won't help — the Global Interpreter Lock (GIL) ensures only one thread can execute Python bytecode at a time. If a worker thread is busy crunching numbers, it holds the GIL for long stretches, starving the thread that runs the async loop, and your WebSocket will still stall.

This is why CPU-heavy work needs a process pool, not just threads.
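
You can see this for yourself with a small, self-contained experiment (the workload and iteration count are arbitrary): running a pure-Python CPU task on two threads takes about as long as running it twice in a row, because the GIL serializes them.

import time
from concurrent.futures import ThreadPoolExecutor

def crunch() -> int:
    # pure-Python CPU work: holds the GIL while it runs
    return sum(i * i for i in range(10_000_000))

start = time.perf_counter()
crunch()
crunch()
print(f"sequential:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(crunch) for _ in range(2)]
    for f in futures:
        f.result()
print(f"two threads: {time.perf_counter() - start:.2f}s  # about the same")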

Options for Running Work

1. I/O-bound work (API calls, disk, DB, network)

  • Use asyncio.to_thread() to offload to the default thread pool.

  • This is syntactic sugar over loop.run_in_executor(None, ...), which uses the default ThreadPoolExecutor.

  • If you want explicit control (e.g. pool size, custom executor), you can use ThreadPoolExecutor directly.

import asyncio
from concurrent.futures import ThreadPoolExecutor

async def handler(data):
    # simple way: offload to the default thread pool
    result = await asyncio.to_thread(blocking_io_func, data)

    # explicit control: size the pool yourself
    # (in real code, create the pool once and reuse it)
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=5) as pool:
        result = await loop.run_in_executor(pool, blocking_io_func, data)

2. CPU-bound work in pure Python (loops, math in Python code)

  • Use ProcessPoolExecutor to sidestep the GIL.

  • Threads won’t help here; the GIL would block the event loop.

loop = asyncio.get_running_loop()
# process_pool is a ProcessPoolExecutor created once at startup (see the full example below)
result = await loop.run_in_executor(process_pool, cpu_bound_func, data)

3. CPU-bound work in C extensions (NumPy, Pandas, TensorFlow, cryptography, etc.)

  • Offload them with asyncio.to_thread(), just like blocking I/O.

  • Well-written C libraries release the GIL while crunching, so the worker thread runs truly in parallel and the event loop stays responsive. (Calling them directly in the handler would still stall the loop: the loop's own thread would be stuck inside the call, GIL or no GIL.)

result = await asyncio.to_thread(numpy_heavy_operation, arr)

Caveat: not all C extensions release the GIL. The "good citizens" (NumPy, Pandas, OpenCV, SciPy, TensorFlow, PyTorch, lxml, cryptography, etc.) do. But smaller libraries may not; in that case the worker thread hogs the GIL, you're effectively back in case (2), and a process pool is safer.

(In CPython’s C API, this is done by wrapping work with Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS — if a library does that, you’re safe.)
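
For instance, here is a minimal sketch with NumPy (assuming the arrays are large enough that the product is genuinely heavy): np.dot dispatches to BLAS, which releases the GIL, so the worker thread runs in parallel with the event loop.

import asyncio
import numpy as np

async def multiply(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # BLAS releases the GIL during the multiplication, so this worker
    # thread runs in parallel while the event loop keeps serving sockets
    return await asyncio.to_thread(np.dot, a, b)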

Rule of Thumb

  • I/O-bound → asyncio.to_thread (or ThreadPoolExecutor if you need control)

  • CPU-bound in Python → ProcessPoolExecutor

  • CPU-bound in C libraries → asyncio.to_thread (threads suffice here, but check that the library releases the GIL)

Takeaway

Don’t turn your entire codebase async just to keep WebSockets alive — that leads to clutter and confusion. Instead, keep your business logic synchronous and clean, and only offload to threads or processes when necessary.

This way, your async handlers stay thin, your WebSockets stay alive, and your overall design remains maintainable.

Example: I/O-bound Work in WebSockets

Here’s a minimal FastAPI WebSocket handler that runs a blocking API call without stalling the event loop:

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
import asyncio
import requests  # blocking HTTP client

app = FastAPI()

def blocking_api_call(url: str) -> str:
    # Simulate a slow external API
    response = requests.get(url, timeout=10)
    return response.text[:100]  # return first 100 chars

@app.websocket("/ws")
async def websocket_endpoint(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            data = await ws.receive_text()
            # offload blocking work to a thread
            result = await asyncio.to_thread(blocking_api_call, data)
            await ws.send_text(f"Got: {result}")
    except WebSocketDisconnect:
        pass  # client closed the connection

  • The blocking requests.get() runs in a worker thread via asyncio.to_thread.

  • The async event loop remains free to handle WebSocket pings/pongs and other clients.

  • From the client’s perspective, the connection stays alive and responsive.
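
To try it out, here is a minimal client sketch using the websockets library (the URL assumes the app is running locally on port 8000):

import asyncio
import websockets

async def main():
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        await ws.send("https://example.com")
        print(await ws.recv())  # prints the handler's "Got: ..." reply

asyncio.run(main())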

Example: CPU-bound Work in WebSockets

For heavy number crunching, threads won’t help — you need a process pool:

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
import asyncio
from concurrent.futures import ProcessPoolExecutor

app = FastAPI()
process_pool = ProcessPoolExecutor()

def cpu_heavy_task(n: int) -> int:
    # Simulate heavy number crunching
    total = 0
    for i in range(10**7):
        total += (i * n) % 97
    return total

@app.websocket("/ws-cpu")
async def websocket_endpoint(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            data = await ws.receive_text()
            n = int(data)

            # Offload CPU work to a separate process
            loop = asyncio.get_running_loop()
            result = await loop.run_in_executor(process_pool, cpu_heavy_task, n)

            await ws.send_text(f"Result: {result}")
    except WebSocketDisconnect:
        pass  # client closed the connection

  • cpu_heavy_task would block the event loop if run inline.

  • A thread wouldn’t help — the GIL would still block the loop.

  • By running it in a process, the work happens outside the interpreter, keeping the async loop responsive.
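
One practical note: the module-level ProcessPoolExecutor above is never shut down. Here is a sketch of one way to tie its lifetime to the app, using FastAPI's lifespan hook (the wiring below is an assumption, not part of the original example):

from contextlib import asynccontextmanager
from concurrent.futures import ProcessPoolExecutor
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # create the pool at startup...
    app.state.process_pool = ProcessPoolExecutor()
    yield
    # ...and shut it down cleanly with the app
    app.state.process_pool.shutdown()

app = FastAPI(lifespan=lifespan)

Handlers can then reach the pool as ws.app.state.process_pool instead of a module-level global.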


Disclaimer

This article was written with the assistance of ChatGPT. The problem described and the order in which it was explored are based on my real development experience. My main contribution was in defining the sequence, scope, and relevance of the content.

In my work, I think of this as Collaborative AI Use: treating AI not as a replacement for my thinking, but as a partner that helps structure ideas, explore options, and summarize effectively. The responsibility for correctness and clarity remains with me, while AI serves as a tool to enhance and accelerate the work.

I chose to share this article because the summary and takeaway provide a useful reference for others who may face similar issues with FastAPI, WebSockets, and async workloads.