
cuesheet.
cuesheet is a Python library for testing LLM-calling code without burning the API. You wrap your test in `@cuesheet.cassette(...)`. The first run hits the real provider and saves the request and response to a YAML file you commit to your repo. Every run after that replays from the file, byte for byte. No network calls, no rate-limit flakes, no per-PR token cost.

It works with any SDK that sits on top of httpx, so the same library covers Anthropic, OpenAI, Gemini, Mistral, Cohere, Groq, DeepSeek, Together, and anything else built on the standard transport. The pytest plugin auto-discovers cassettes in `tests/cassettes/`. Streaming responses are recorded as raw SSE chunks and replayed in order. API keys, JWTs, and emails are scrubbed before write. Cassettes are diff-friendly YAML, so code review actually shows what changed.

The library ships with a local web UI for browsing every recorded cassette. The dashboard watches the filesystem and pushes change events over SSE, so the page updates live as your tests run in another terminal. Open source, MIT, on GitHub.
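The scrubbing step mentioned above (API keys, JWTs, and emails removed before a cassette is written) can be pictured with a small regex pass. This is only an illustrative sketch: the `scrub` function, the placeholder strings, and the patterns (including the assumed `sk-` key prefix) are hypothetical and are not taken from cuesheet's actual rules.

```python
import re

# Hypothetical patterns for illustration; cuesheet's real rules may differ.
PATTERNS = [
    # JWT: three base64url segments separated by dots, header starting "eyJ".
    ("<JWT>", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    # Secret keys with an assumed "sk-" prefix.
    ("<API_KEY>", re.compile(r"\bsk-[A-Za-z0-9_-]{8,}")),
    # Email addresses (deliberately simple, not RFC-complete).
    ("<EMAIL>", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
]


def scrub(text: str) -> str:
    """Replace anything that looks like a secret with a fixed placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(scrub("Authorization: Bearer sk-abc123DEF456ghi"))
    print(scrub("reply-to: jane.doe@example.com"))
```

Running the scrub before the file is written, rather than at commit time, means a secret never reaches disk in the first place.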
pip install cuesheet

Testing LLM-calling code is expensive and flaky. Every test run hits a real API: tokens cost money, requests can fail due to rate limits, and model responses drift across versions. Most teams either skip testing the LLM layer entirely or pay for every CI run.
cuesheet is a Python library that sits at the httpx transport layer and intercepts all HTTP traffic to LLM providers. The first time a test runs, it records the full request and response to a YAML file you commit to your repo. Every run after that replays from the file with no network calls. Because it works at the transport layer, any SDK built on httpx is covered without per-provider code: Anthropic, OpenAI, Gemini, Mistral, Cohere, Groq, and more. Streaming responses are recorded as raw SSE chunks and played back in order. The pytest plugin auto-discovers cassette files and wires them to test functions by convention. A local web UI lists every cassette, updates live as tests run, and shows the full request and response side by side.
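The record-then-replay flow described above can be sketched with the standard library alone. This is not cuesheet's implementation: the `Cassette` class, `request_key`, and the `Send` callable are hypothetical names, the "transport" is a plain function rather than an httpx transport (a real integration would presumably wrap `httpx.BaseTransport`), and JSON stands in for YAML only because it ships with Python.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Iterator

# Hypothetical stand-in for a transport: (method, url, body) -> raw chunks.
Send = Callable[[str, str, bytes], Iterator[bytes]]


def request_key(method: str, url: str, body: bytes) -> str:
    """Identify a request by method, URL, and a hash of its body."""
    return f"{method} {url} {hashlib.sha256(body).hexdigest()}"


class Cassette:
    def __init__(self, path: Path, real_send: Send):
        self.path = path
        self.real_send = real_send
        # Load earlier recordings if the cassette file already exists.
        self.entries = json.loads(path.read_text()) if path.exists() else {}

    def send(self, method: str, url: str, body: bytes) -> Iterator[bytes]:
        key = request_key(method, url, body)
        if key not in self.entries:
            # First run: call the real sender once and keep every chunk,
            # preserving chunk boundaries (assumes UTF-8 text such as SSE).
            chunks = [c.decode("utf-8") for c in self.real_send(method, url, body)]
            self.entries[key] = chunks
            self.path.write_text(json.dumps(self.entries, indent=2))
        # Every run, including the first: yield the stored chunks in order.
        for chunk in self.entries[key]:
            yield chunk.encode("utf-8")
```

Keying on a hash of the request body is one simple matching strategy; it means a changed prompt misses the cassette and triggers a fresh recording instead of silently replaying a stale response.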
Tests that previously burned API tokens run in under 100ms with no network calls. CI pipelines no longer fail due to rate limits or model response drift. Cassette files are committed to git as readable YAML, so pull request reviewers can see exactly what changed in an LLM interaction. The library is open source, MIT-licensed, and available on PyPI.
- httpx transport layer
Intercepting at the transport layer means the library works with every httpx-based SDK automatically. A new provider that ships an httpx client costs zero additional code to support.
- YAML cassette format
YAML is more readable than JSON in git diffs. Since cassettes get committed and reviewed in pull requests, readability beats compactness.
- MIT license
The goal was adoption in CI pipelines and commercial repos. A restrictive license would block the most common use case.

