Show HN: Baml_vcr - Record your LLM calls and play them back during tests


A recording and playback system for BAML function calls, inspired by the VCR pattern in testing. BAML VCR allows you to capture LLM interactions during test runs and replay them without making actual API calls, making your tests faster, more reliable, and cost-effective.

Source: https://github.com/BoundaryML/baml/tree/canary

  • Record & Replay: Capture BAML function calls and their responses, then replay them in subsequent test runs
  • Streaming Support: Full support for streaming BAML functions with chunk-by-chunk recording
  • Multiple Recording Modes: Choose between "once", "new_episodes", "none", or "all" recording strategies
  • Async Support: Works with both synchronous and asynchronous BAML functions
  • YAML Storage: Human-readable cassette files stored in YAML format
  • Automatic Test Discovery: Cassettes are automatically named based on test class and method names
  • Type Preservation: Preserves complex BAML response types during serialization

BAML VCR is not yet available on PyPI. To install, clone the repository and install from source:

    git clone https://github.com/gr-b/baml_vcr.git
    cd baml_vcr
    pip install -e .

Or install directly from GitHub:

    pip install git+https://github.com/gr-b/baml_vcr.git

Decorate a test with use_cassette to record its BAML calls on the first run and replay them afterwards:

    from baml_vcr import baml_vcr
    import baml_client
    from baml_client import b

    class TestMyBAMLFunctions:
        @baml_vcr.use_cassette()
        def test_simple_function(self):
            # First run: makes real LLM call and saves to cassette
            result = b.MyBAMLFunction(arg1="value1", arg2="value2")
            assert result.success
            # Subsequent runs: loads from cassette without LLM call

"once"

  • Records interactions if the cassette doesn't exist
  • Replays from the cassette if it exists
  • Perfect for standard test scenarios

"new_episodes"

  • Replays existing interactions
  • Records any new, unmatched calls
  • Useful when adding new test cases

"none"

  • Only replays, never records
  • Raises an error if an interaction is not found
  • Use in CI/CD pipelines

"all"

  • Always records, overwriting the existing cassette
  • Useful for refreshing test data

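The four modes above amount to a small decision table. The sketch below is only an illustration: `decide_action` is a hypothetical helper, not part of baml_vcr, and the "once" behaviour for an unmatched call against an existing cassette (raising an error) is an assumption borrowed from how VCR-style libraries commonly behave.

```python
def decide_action(record_mode: str, cassette_exists: bool, has_match: bool) -> str:
    """Return "record", "replay", or "error" for one intercepted call.

    Hypothetical helper for illustration; not baml_vcr's actual internals.
    """
    if record_mode == "all":
        return "record"  # always call the LLM and overwrite the cassette
    if record_mode == "none":
        return "replay" if has_match else "error"  # never records
    if record_mode == "new_episodes":
        return "replay" if has_match else "record"
    if record_mode == "once":
        if not cassette_exists:
            return "record"  # first run creates the cassette
        # Assumption: an unmatched call against an existing cassette fails.
        return "replay" if has_match else "error"
    raise ValueError(f"unknown record_mode: {record_mode!r}")
```

Reading the table this way makes it easier to see why "none" suits CI (it can never trigger a paid API call) while "new_episodes" suits a test file that is still growing.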
To name a cassette explicitly instead of relying on the test name, pass cassette_name:

    @baml_vcr.use_cassette(cassette_name="custom_test_name")
    def test_with_custom_name(self):
        result = b.MyFunction(input="test")

Different Recording Modes

    @baml_vcr.use_cassette(record_mode="new_episodes")
    def test_incremental_recording(self):
        # Existing calls are replayed
        result1 = b.Function1(input="test")
        # New calls are recorded
        result2 = b.Function2(input="new test")

Streaming functions use the same decorator:

    @baml_vcr.use_cassette()
    async def test_streaming_function(self):
        stream = b.stream.StreamingFunction(prompt="Generate a story")
        # First run: records each chunk
        async for chunk in stream:
            print(chunk.message)
        final = await stream.get_final_response()
        # Subsequent runs: replays chunks with realistic timing
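
To make "replays chunks with realistic timing" concrete, here is a minimal sketch of one way such a replay could work. It assumes each recorded chunk is stored with the delay since the previous chunk; that storage shape and the names `replay_chunks` and `collect` are hypothetical and not taken from baml_vcr.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

# Hypothetical recording shape: (seconds since previous chunk, chunk text).
RecordedChunk = Tuple[float, str]

async def replay_chunks(chunks: List[RecordedChunk], speed: float = 1.0) -> AsyncIterator[str]:
    """Yield recorded chunks in order, sleeping for each recorded gap."""
    for delay, text in chunks:
        await asyncio.sleep(delay / speed)  # speed > 1 shortens the waits
        yield text

async def collect(chunks: List[RecordedChunk]) -> List[str]:
    """Drain the replay stream into a list, sped up so tests stay fast."""
    return [text async for text in replay_chunks(chunks, speed=100.0)]

recorded = [(0.05, "Once"), (0.02, " upon"), (0.03, " a time")]
replayed = asyncio.run(collect(recorded))
```

A speed factor like the one above is a design choice worth considering in tests: it keeps the chunk order and relative pacing while avoiding real waiting time.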

Cassettes are stored in a baml_cassettes/ directory next to your test files:

    tests/
    ├── test_my_functions.py
    └── baml_cassettes/
        ├── TestClass_test_method.cassette.yaml
        └── TestClass_test_streaming.streaming.cassette.yaml
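
The naming scheme in this layout can be sketched as a small path helper. `cassette_path` is a hypothetical function written only to mirror the file names shown above; baml_vcr's own naming code may differ.

```python
from pathlib import Path

def cassette_path(test_file: str, class_name: str, method_name: str,
                  streaming: bool = False) -> Path:
    """Build a cassette path following the layout shown above.

    Hypothetical helper; not baml_vcr's actual implementation.
    """
    suffix = ".streaming.cassette.yaml" if streaming else ".cassette.yaml"
    directory = Path(test_file).parent / "baml_cassettes"
    return directory / f"{class_name}_{method_name}{suffix}"
```

For example, `cassette_path("tests/test_my_functions.py", "TestClass", "test_method")` points at `tests/baml_cassettes/TestClass_test_method.cassette.yaml`.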

A cassette file looks like this:

    version: '1.0'
    interactions:
    - function_name: ExtractUserInfo
      args:
        text: "John Doe, 30 years old, [email protected]"
      response:
        _type: UserInfo
        _module: baml_client.types
        name: John Doe
        age: 30
        email: [email protected]
      response_type: baml_client.types.UserInfo
      usage:
        input_tokens: 15
        output_tokens: 12
      is_streaming: false
      created_at: '2024-01-15T10:30:00.123456'
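
The `_type` and `_module` keys in the cassette hint at how a typed response can survive a round trip through plain YAML data. The sketch below illustrates that idea only. It uses a dataclass as a stand-in for a generated BAML type and a hypothetical `KNOWN_TYPES` lookup table; the real library may resolve `_module` by importing it instead.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict

@dataclass
class UserInfo:  # stand-in for a generated baml_client.types class
    name: str
    age: int

# Hypothetical lookup table; a real implementation might import `_module`.
KNOWN_TYPES = {"UserInfo": UserInfo}

def to_cassette(obj: Any) -> Dict[str, Any]:
    """Flatten a typed response into a plain dict tagged with its type."""
    return {"_type": type(obj).__name__, "_module": type(obj).__module__, **asdict(obj)}

def from_cassette(data: Dict[str, Any]) -> Any:
    """Rebuild the typed response from a tagged dict."""
    fields = {k: v for k, v in data.items() if not k.startswith("_")}
    return KNOWN_TYPES[data["_type"]](**fields)

restored = from_cassette(to_cassette(UserInfo(name="John Doe", age=30)))
```

Because the type tag travels with the data, a test that asserts on `result.name` or `isinstance(result, UserInfo)` behaves the same on replay as it did on the recording run.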

How It Works

  1. Interception: BAML VCR patches BAML client functions at runtime
  2. Recording: When recording is enabled, it uses BAML's Collector to capture function calls and responses
  3. Storage: Interactions are serialized to YAML, preserving type information
  4. Playback: On replay, responses are returned from the cassette without making API calls
  5. Streaming: For streaming functions, individual chunks are recorded and replayed with realistic timing
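
The intercept, record, and play back steps can be illustrated with a toy decorator. This is a sketch under simplifying assumptions, not baml_vcr's implementation: the cassette is a plain in-memory dict rather than a YAML file, `with_cassette` and `fake_llm` are hypothetical names, and calls are matched on the function name plus exact keyword arguments.

```python
import functools
import json
from typing import Any, Callable, Dict

def with_cassette(cassette: Dict[str, Any]) -> Callable:
    """Wrap a function so repeated calls with identical kwargs are replayed.

    `cassette` is a plain dict standing in for the YAML file.
    """
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(**kwargs: Any) -> Any:
            # Match on function name plus exact, order-independent arguments.
            key = f"{func.__name__}:{json.dumps(kwargs, sort_keys=True)}"
            if key not in cassette:
                cassette[key] = func(**kwargs)  # record: the real call happens here
            return cassette[key]  # replay: served from the cassette
        return wrapper
    return decorator

calls = []  # tracks how often the "real" function body runs
cassette: Dict[str, Any] = {}

@with_cassette(cassette)
def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return prompt.upper()

first = fake_llm(prompt="hello")
second = fake_llm(prompt="hello")  # replayed; fake_llm's body does not run again
```

The toy behaves like the "new_episodes" mode: matched calls are replayed and unmatched ones are recorded. It also shows why arguments must match exactly, since any change to `prompt` produces a different key.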

Best Practices

  1. Commit Cassettes: Include cassette files in version control for consistent test behavior
  2. Refresh Periodically: Use record_mode="all" occasionally to update test data
  3. Separate Test Data: Use different cassettes for different test scenarios
  4. Review Changes: Check cassette diffs when updating to ensure expected behavior
  5. CI/CD: Use record_mode="none" in CI to ensure deterministic tests
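
One way to apply the refresh and CI recommendations without editing tests is to choose the record mode from the environment. The sketch below assumes the CI service sets a `CI` variable (many do) and invents a `BAML_VCR_REFRESH` flag for re-recording; both the flag and the `pick_record_mode` helper are hypothetical, not part of baml_vcr.

```python
import os
from typing import Mapping

def pick_record_mode(env: Mapping[str, str] = os.environ) -> str:
    """Choose a record mode from environment variables.

    Assumes CI sets a CI variable and that a hypothetical
    BAML_VCR_REFRESH=1 flag requests a full re-recording.
    """
    if env.get("CI"):
        return "none"  # deterministic: replay only, never call the LLM
    if env.get("BAML_VCR_REFRESH") == "1":
        return "all"  # refresh every cassette
    return "once"  # local default: record missing cassettes
```

The result can then be passed as `record_mode=pick_record_mode()` to `baml_vcr.use_cassette`, which accepts a `record_mode` argument as shown in the earlier examples.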

If you see "No recorded response found", try one of the following:

  • Delete the cassette to re-record
  • Change record_mode to "once" or "all"
  • Check that the function arguments match exactly

Streaming cassettes are saved with the .streaming.cassette.yaml extension. Make sure you are not mixing streaming and non-streaming calls in the same test.

BAML VCR preserves type information during serialization. If you encounter type errors, check that your BAML client version matches between recording and playback.

Contributions are welcome! Please feel free to submit a Pull Request. If it does not break existing functionality, I will merge it.

MIT License - see LICENSE file for details
