cuesheet - case study · Giorgos Moustakas

/ THE CHALLENGE

Testing LLM-calling code is expensive and flaky. Every test run hits a real API: tokens cost money, requests can fail due to rate limits, and model responses drift across versions. Most teams either skip testing the LLM layer entirely or pay for every CI run.

/ WHAT I BUILT

A Python library that sits at the httpx transport layer and intercepts all HTTP traffic to LLM providers. The first time a test runs, it records the full request and response to a YAML file you commit to your repo. Every run after that replays from the file with no network calls. Because it works at the transport layer, any SDK built on httpx is covered without per-provider code: Anthropic, OpenAI, Gemini, Mistral, Cohere, Groq, and more. Streaming responses are recorded as raw SSE chunks and played back in order. The pytest plugin auto-discovers cassette files and wires them to test functions by convention. A local web UI lists every cassette, updates live as tests run, and shows the full request and response side by side.
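
The record-then-replay flow described above can be sketched in a few lines. This is a dependency-free illustration, not cuesheet's actual code: the `RecordReplay` class, the tuple-based request and response shapes, and the in-memory dict standing in for the YAML file are all assumptions made for the example. With httpx, the same logic would presumably live in the `handle_request` method of an `httpx.BaseTransport` subclass.

```python
import json
from typing import Callable

# Hypothetical stand-ins, not cuesheet's real types:
# a request is (method, url, body); a response is (status, raw body chunks).
Request = tuple[str, str, bytes]
Response = tuple[int, list[bytes]]


class RecordReplay:
    """Record a response the first time a request is seen, replay it afterwards."""

    def __init__(self, send: Callable[[Request], Response]) -> None:
        self.send = send                      # the real network call
        self.cassette: dict[str, dict] = {}   # stand-in for the YAML cassette file

    def handle(self, request: Request) -> Response:
        method, url, body = request
        key = json.dumps([method, url, body.decode("utf-8")])
        entry = self.cassette.get(key)
        if entry is None:
            # First run: hit the network and store the chunks in arrival order.
            status, chunks = self.send(request)
            entry = {"status": status, "chunks": [c.decode("utf-8") for c in chunks]}
            self.cassette[key] = entry
        # Every path returns from the cassette, so replay matches the recording.
        return entry["status"], [c.encode("utf-8") for c in entry["chunks"]]


calls = []


def fake_network(request: Request) -> Response:
    """Pretend provider that streams two SSE chunks and counts how often it is hit."""
    calls.append(request)
    return 200, [b'data: {"delta": "Hel"}\n\n', b'data: {"delta": "lo"}\n\n']


transport = RecordReplay(fake_network)
req = ("POST", "https://api.example.com/v1/messages", b'{"prompt": "hi"}')
first = transport.handle(req)    # recorded: goes through fake_network
second = transport.handle(req)   # replayed: fake_network is not called again
```

Because the streamed chunks are stored as an ordered list, replay yields them in the same order they arrived, which is the property the SSE playback relies on.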

/ OUTCOME

Tests that previously burned API tokens run in under 100ms with no network calls. CI pipelines no longer fail due to rate limits or model response drift. Cassette files are committed to git as readable YAML, so pull request reviewers can see exactly what changed in an LLM interaction. The library is open source, MIT-licensed, and available on PyPI.

/ KEY DECISIONS

httpx transport layer
Intercepting at the transport layer means the library works with every httpx-based SDK automatically. A new provider that ships an httpx client costs zero additional code to support.

YAML cassette format
YAML is more readable than JSON in git diffs. Since cassettes get committed and reviewed in pull requests, readability beats compactness.

MIT license
The goal was adoption in CI pipelines and commercial repos. A restrictive license would block the most common use case.
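
To illustrate the transport-layer decision: if a recorded interaction is looked up using only what is on the wire (method, URL, body), nothing in the lookup needs to know which provider is being called. The sketch below is one way such matching could work. The `request_key` helper and its normalization rules are hypothetical; cuesheet's real matching logic is not described here.

```python
import hashlib
import json


def request_key(method: str, url: str, body: bytes) -> str:
    """Build a provider-agnostic lookup key from the raw HTTP request."""
    try:
        # Normalize JSON bodies so that key order does not change the match.
        canonical = json.dumps(json.loads(body), sort_keys=True).encode("utf-8")
    except ValueError:
        canonical = body  # non-JSON or empty body: hash the raw bytes
    digest = hashlib.sha256(canonical).hexdigest()[:16]
    return f"{method.upper()} {url} {digest}"


# Same request, different JSON key order: the keys match.
a = request_key("post", "https://api.openai.com/v1/chat/completions",
                b'{"model": "m", "messages": []}')
b = request_key("POST", "https://api.openai.com/v1/chat/completions",
                b'{"messages": [], "model": "m"}')

# A different provider needs no special handling, just a different URL.
c = request_key("POST", "https://api.anthropic.com/v1/messages",
                b'{"model": "m", "messages": []}')
```

Short, deterministic keys like these also keep cassette diffs stable across runs, which matters when the files are reviewed in pull requests.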
