Catching Prompt Regressions Before They Ship: Semantic Diffing for LLM Workflows

The walkthrough below uses LLM Prompt Semantic Diff, an open-source CLI that packages prompts and computes embedding-based similarity scores between versions. The source lives at https://github.com/aatakansalar/llm-prompt-semantic-diff, and the example runs exactly as shown in any Python 3.10+ environment.

Install the CLI once:

pip install llm-prompt-semantic-diff

Step 1 — scaffold

prompt init refund-policy

Step 2 — package with embeddings

prompt pack refund-policy.prompt # creates refund-policy.pp.json
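
Conceptually, packing stores the prompt text next to an embedding vector so that a later diff can compare meaning rather than characters. The Python sketch below illustrates the idea using the sentence-transformers library; the model name, the pack_prompt helper, and the JSON layout are assumptions for illustration, not the CLI's actual format.

# Conceptual sketch of "pack with embeddings" -- the real .pp.json layout
# and embedding model used by llm-prompt-semantic-diff may differ.
import json
from pathlib import Path

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

def pack_prompt(prompt_path: str) -> str:
    """Read a prompt file and write a JSON package containing its embedding."""
    text = Path(prompt_path).read_text()
    embedding = model.encode(text).tolist()  # plain list of floats for JSON
    packed = {"source": prompt_path, "text": text, "embedding": embedding}
    out_path = prompt_path.replace(".prompt", ".pp.json")
    Path(out_path).write_text(json.dumps(packed, indent=2))
    return out_path

pack_prompt("refund-policy.prompt")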

Step 3 — iterate and guard

cp refund-policy.prompt refund-policy-v2.prompt
# …edit text…
prompt pack refund-policy-v2.prompt

prompt diff refund-policy.pp.json refund-policy-v2.pp.json --threshold 0.80

Typical console output when meaning drifts:

Semantic similarity: 76.5 %
Threshold: 80.0 %
Exit code: 1
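
Under the hood, a gate like this boils down to computing the similarity between the two stored embedding vectors (cosine similarity is the usual choice) and exiting non-zero when the score falls below the threshold. A minimal Python equivalent, assuming the hypothetical .pp.json layout from the packing sketch above rather than the tool's actual implementation, might look like:

# Minimal sketch of an embedding-based diff gate; the real CLI's scoring
# and file layout may differ.
import json
import math
import sys
from pathlib import Path

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def diff(old_pack: str, new_pack: str, threshold: float = 0.80) -> int:
    old = json.loads(Path(old_pack).read_text())["embedding"]
    new = json.loads(Path(new_pack).read_text())["embedding"]
    score = cosine_similarity(old, new)
    print(f"Semantic similarity: {score:.1%}  (threshold {threshold:.1%})")
    return 0 if score >= threshold else 1  # non-zero exit fails the CI job

sys.exit(diff("refund-policy.pp.json", "refund-policy-v2.pp.json"))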

Any CI job that treats non-zero exit codes as failures will block the merge automatically.
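
For instance, a GitHub Actions workflow along these lines would block a pull request whenever the similarity check fails; the job definition is a hypothetical example, not something shipped with the tool, so adjust paths, branches, and the Python version to your repository.

# Hypothetical CI wiring for the commands shown above.
name: prompt-semantic-diff
on: pull_request

jobs:
  diff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install llm-prompt-semantic-diff
      - run: prompt diff refund-policy.pp.json refund-policy-v2.pp.json --threshold 0.80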
