A composable, pipeline-based testing framework for AI systems and APIs.
Build complex test scenarios using simple, reusable blocks with semantic validation.
npm install @blade47/semantic-test
Testing AI systems is hard. Responses are non-deterministic, you need to validate tool usage, and semantic meaning matters more than exact text matching.
SemanticTest solves this with:
Composable blocks for HTTP, parsing, validation, and AI evaluation
Pipeline architecture where data flows through named slots
LLM Judge to evaluate responses semantically using GPT-4
JSON test definitions that are readable and version-controllable
npm install @blade47/semantic-test
{
"name" : " API Test" ,
"version" : " 1.0.0" ,
"context" : {
"BASE_URL" : " https://api.example.com"
},
"tests" : [
{
"id" : " get-user" ,
"name" : " Get User" ,
"pipeline" : [
{
"id" : " request" ,
"block" : " HttpRequest" ,
"input" : {
"url" : " ${BASE_URL}/users/1" ,
"method" : " GET"
},
"output" : " response"
},
{
"id" : " parse" ,
"block" : " JsonParser" ,
"input" : " ${response.body}" ,
"output" : " user"
},
{
"id" : " validate" ,
"block" : " ValidateContent" ,
"input" : {
"from" : " user.parsed.name" ,
"as" : " text"
},
"config" : {
"contains" : " John"
},
"output" : " validation"
}
],
"assertions" : {
"response.status" : 200 ,
"user.parsed.id" : 1 ,
"validation.passed" : true
}
}
]
}
Tests are pipelines of blocks that execute in sequence:
HttpRequest → JsonParser → Validate → Assert
Each block:
Reads inputs from named slots
Does one thing well
Writes outputs to named slots
Data flows through a DataBus with named slots:
{
"pipeline" : [
{
"id" : " fetch" ,
"block" : " HttpRequest" ,
"output" : " response" // Writes to 'response' slot
},
{
"id" : " parse" ,
"block" : " JsonParser" ,
"input" : " ${response.body}" , // Reads from 'response.body'
"output" : " data" // Writes to 'data' slot
}
]
}
1. String - becomes { body: value }
"input" : " ${response.body}"
2. From/As - maps slot to parameter
"input" : {
"from" : " response.body" ,
"as" : " text"
}
3. Object - deep resolves all values
"input" : {
"url" : " ${BASE_URL}/api" ,
"method" : " POST" ,
"headers" : {
"Authorization" : " Bearer ${token}"
}
}
1. String - stores entire output
2. Object - maps output fields to slots
"output" : {
"parsed" : " data" ,
"error" : " parseError"
}
3. Default - uses block ID
{
"id" : " parse"
// Output stored in 'parse' slot
}
HttpRequest - Make HTTP requests
{
"block" : " HttpRequest" ,
"input" : {
"url" : " https://api.example.com/users" ,
"method" : " POST" ,
"headers" : {
"Authorization" : " Bearer ${token}"
},
"body" : {
"name" : " John Doe"
},
"timeout" : 5000
}
}
JsonParser - Parse JSON
{
"block" : " JsonParser" ,
"input" : " ${response.body}"
}
StreamParser - Parse streaming responses
{
"block" : " StreamParser" ,
"input" : " ${response.body}" ,
"config" : {
"format" : " sse-vercel" // or "sse-openai", "sse"
}
}
Outputs: text, toolCalls, chunks, metadata
ValidateContent - Validate text
{
"block" : " ValidateContent" ,
"input" : {
"from" : " data.message" ,
"as" : " text"
},
"config" : {
"contains" : [" success" , " confirmed" ],
"notContains" : [" error" , " failed" ],
"minLength" : 10 ,
"maxLength" : 1000 ,
"matches" : " ^[A-Z].*"
}
}
ValidateTools - Validate AI tool usage
{
"block" :
" ValidateTools" ,
"input" : {
"from" :
" parsed.toolCalls" ,
"as" :
" toolCalls"
},
"config" : {
"expected" : [
" search_database" ,
" send_email" ],
"forbidden" : [
" delete_all" ],
"order" : [
" search_database" ,
" send_email" ],
"minTools" :
1 ,
"maxTools" :
5 ,
"validateArgs" : {
"send_email" : {
"to" :
" [email protected] "
}
}
}
}
LLMJudge - Semantic evaluation with GPT-4
{
"block" : " LLMJudge" ,
"input" : {
"text" : " ${response.text}" ,
"toolCalls" : " ${response.toolCalls}" ,
"expected" : {
"expectedBehavior" : " Should greet the user and offer to help with their calendar"
}
},
"config" : {
"model" : " gpt-4o-mini" ,
"criteria" : {
"accuracy" : 0.4 ,
"completeness" : 0.3 ,
"relevance" : 0.3
}
}
}
Returns: score (0-1), reasoning, shouldContinue, nextPrompt
Loop - Loop back to previous blocks
{
"block" : " Loop" ,
"config" : {
"target" : " retry-request" ,
"maxIterations" : 3
}
}
Organize multiple tests with shared setup/teardown:
{
"name" : " User API Tests" ,
"version" : " 1.0.0" ,
"context" : {
"BASE_URL" : " ${env.API_URL}" ,
"API_KEY" : " ${env.API_KEY}"
},
"setup" : [
{
"id" : " auth" ,
"block" : " HttpRequest" ,
"input" : {
"url" : " ${BASE_URL}/auth/login" ,
"method" : " POST" ,
"body" : {
"username" : " test" ,
"password" : " test123"
}
},
"output" : " auth"
}
],
"tests" : [
{
"id" : " create-user" ,
"name" : " Create User" ,
"pipeline" : [
{
"id" : " request" ,
"block" : " HttpRequest" ,
"input" : {
"url" : " ${BASE_URL}/users" ,
"method" : " POST" ,
"headers" : {
"Authorization" : " Bearer ${auth.body.token}"
},
"body" : {
"name" : " Jane Doe"
}
},
"output" : " createResponse"
}
],
"assertions" : {
"createResponse.status" : 201
}
},
{
"id" : " get-user" ,
"name" : " Get User" ,
"pipeline" : [
{
"id" : " request" ,
"block" : " HttpRequest" ,
"input" : {
"url" : " ${BASE_URL}/users/${createResponse.body.id}" ,
"method" : " GET" ,
"headers" : {
"Authorization" : " Bearer ${auth.body.token}"
}
},
"output" : " getResponse"
}
],
"assertions" : {
"getResponse.status" : 200 ,
"getResponse.body.name" : " Jane Doe"
}
}
],
"teardown" : [
{
"id" : " cleanup" ,
"block" : " HttpRequest" ,
"input" : {
"url" : " ${BASE_URL}/users/${createResponse.body.id}" ,
"method" : " DELETE" ,
"headers" : {
"Authorization" : " Bearer ${auth.body.token}"
}
}
}
]
}
Validate final results with operators:
{
"assertions" : {
"response.status" : 200 , // Equality
"data.count" : { "gt" : 10 }, // Greater than
"data.count" : { "lt" : 100 }, // Less than
"data.message" : { "contains" : " success" }, // Contains
"data.email" : { "matches" : " .*@.*\\ .com" } // Regex
}
}
Use .env file:
API_URL=https://api.example.com
API_KEY=secret123
OPENAI_API_KEY=sk-...
Reference in tests:
{
"context" : {
"BASE_URL" : " ${env.API_URL}" ,
"API_KEY" : " ${env.API_KEY}"
}
}
{
"name" : " AI Chat Tests" ,
"context" : {
"CHAT_URL" : " ${env.CHAT_API_URL}" ,
"API_KEY" : " ${env.API_KEY}"
},
"tests" : [
{
"id" : " chat-test" ,
"name" : " Chat with Tool Usage" ,
"pipeline" : [
{
"id" : " chat" ,
"block" : " HttpRequest" ,
"input" : {
"url" : " ${CHAT_URL}" ,
"method" : " POST" ,
"headers" : {
"Authorization" : " Bearer ${API_KEY}"
},
"body" : {
"messages" : [
{
"role" : " user" ,
"content" : " Search for users named John"
}
]
}
},
"output" : " chatResponse"
},
{
"id" : " parse" ,
"block" : " StreamParser" ,
"input" : " ${chatResponse.body}" ,
"config" : {
"format" : " sse-vercel"
},
"output" : " parsed"
},
{
"id" : " validate-tools" ,
"block" : " ValidateTools" ,
"input" : {
"from" : " parsed.toolCalls" ,
"as" : " toolCalls"
},
"config" : {
"expected" : [" search_users" ]
},
"output" : " toolValidation"
},
{
"id" : " judge" ,
"block" : " LLMJudge" ,
"input" : {
"text" : " ${parsed.text}" ,
"toolCalls" : " ${parsed.toolCalls}" ,
"expected" : {
"expectedBehavior" : " Should use search_users tool and confirm searching for John"
}
},
"config" : {
"model" : " gpt-4o-mini"
},
"output" : " judgement"
}
],
"assertions" : {
"chatResponse.status" : 200 ,
"toolValidation.passed" : true ,
"judgement.score" : { "gt" : 0.7 }
}
}
]
}
AI outputs vary. Exact text matching fails. Instead, use another LLM to evaluate semantic meaning :
"2:00 PM", "2 PM", "14:00" are all acceptable
Focuses on intent and helpfulness
Provides reasoning for failures
Configurable scoring criteria
// blocks/custom/MyBlock.js
import { Block } from '@blade47/semantic-test' ;
export class MyBlock extends Block {
static get inputs ( ) {
return {
required : [ 'data' ] ,
optional : [ 'config' ]
} ;
}
static get outputs ( ) {
return {
produces : [ 'result' , 'metadata' ]
} ;
}
async process ( inputs , context ) {
const { data, config } = inputs ;
// Your logic
const result = await processData ( data , config ) ;
return {
result,
metadata : { timestamp : Date . now ( ) }
} ;
}
}
import { blockRegistry } from '@blade47/semantic-test' ;
import { MyBlock } from './blocks/custom/MyBlock.js' ;
blockRegistry . register ( 'MyBlock' , MyBlock ) ;
{
"block" : " MyBlock" ,
"input" : {
"data" : " ${previous.output}" ,
"config" : { "mode" : " fast" }
},
"output" : " myResult"
}
See blocks/examples/ for complete examples.
# Run single test
npx semtest test.json
# Run multiple tests
npx semtest tests/* .json
# Generate HTML report
npx semtest test.json --html
# Custom output file
npx semtest test.json --html --output report.html
# Debug mode
LOG_LEVEL=DEBUG npx semtest test.json
import { PipelineBuilder } from '@blade47/semantic-test' ;
import fs from 'fs/promises' ;
const testDef = JSON . parse ( await fs . readFile ( 'test.json' , 'utf-8' ) ) ;
const pipeline = PipelineBuilder . fromJSON ( testDef ) ;
const result = await pipeline . execute ( ) ;
if ( result . success ) {
console . log ( 'Test passed!' ) ;
} else {
console . error ( 'Test failed:' , result . error ) ;
}
See examples/ directory:
simple-api-test.json - Basic REST API testing
test-llm-judge.json - AI response evaluation
test-error-reporting.json - Error handling
test-reporting.json - Rich output formatting
{
"block" : " LLMJudge" ,
"input" : {
"text" : " ${response.text}" ,
"history" : [
{ "role" : " user" , "content" : " Hello" },
{ "role" : " assistant" , "content" : " Hi there!" },
{ "role" : " user" , "content" : " What's the weather?" }
]
},
"config" : {
"continueConversation" : true ,
"maxTurns" : 5
}
}
import { StreamParser } from '@blade47/semantic-test' ;
function myCustomParser ( body ) {
// Parse your custom format
return {
text : extractedText ,
toolCalls : extractedTools ,
chunks : allChunks ,
metadata : { format : 'custom' }
} ;
}
StreamParser . register ( 'my-format' , myCustomParser ) ;
Use it:
{
"block" : " StreamParser" ,
"config" : {
"format" : " my-format"
}
}
{
"pipeline" : [
{
"id" : " attempt" ,
"block" : " HttpRequest" ,
"input" : { "url" : " ${API_URL}" }
},
{
"id" : " check" ,
"block" : " ValidateContent" ,
"input" : { "from" : " attempt.body" , "as" : " text" },
"config" : { "contains" : " success" }
},
{
"id" : " retry" ,
"block" : " Loop" ,
"config" : {
"target" : " attempt" ,
"maxIterations" : 3
}
}
]
}
1. Use Meaningful Slot Names
// Good
"output" : " userProfile"
"output" : " authToken"
// Bad
"output" : " data"
"output" : " result"
{
"pipeline" : [
{ "block" : " HttpRequest" , "output" : " response" },
{ "block" : " JsonParser" , "output" : " data" },
{ "block" : " ValidateContent" }, // Validate before expensive operations
{ "block" : " LLMJudge" } // Expensive: calls GPT-4
]
}
Always clean up test data:
{
"setup" : [
{ "id" : " create-test-data" , "block" : " ..." }
],
"tests" : [ /* ... */ ],
"teardown" : [
{ "id" : " delete-test-data" , "block" : " ..." }
]
}
4. Semantic Validation for AI
Don't match exact text:
// Bad - too brittle
{
"assertions" : {
"response.text" : " The meeting is scheduled for 2:00 PM"
}
}
// Good - semantic validation
{
"block" : " LLMJudge" ,
"input" : {
"expected" : {
"expectedBehavior" : " Should confirm meeting is scheduled for 2 PM"
}
}
}
git clone https://github.com/blade47/semantic-test.git
cd semantic-test
npm install
npm test
Create block in blocks/[category]/YourBlock.js
Add tests in tests/unit/blocks/YourBlock.test.js
Register in src/core/BlockRegistry.js
Document in README
npm test # All tests
npm run test:unit # Unit tests only
npm run test:integration # Integration tests
npm run test:watch # Watch mode
MIT
Built for testing AI systems that don't play by traditional rules.