Show HN: Token-efficient zod-like schema definition library for LLMs
Token-efficient schema definition for getting structured output from LLMs.
Compact schema definition: StructLM uses a custom object notation that is more compact and more token-efficient than JSON Schema.
Clear and readable: StructLM's schema definition is human-readable and closely resembles TypeScript syntax. See SPECIFICATION.md for the full specification.
More expressive validation: Validations are defined as functions, and are serialized to be sent to LLMs.
No accuracy loss: Despite being more compact, StructLM does not lose accuracy compared to JSON Schema when generating structured output. See BENCHMARKS.md for details on our benchmarks.
Lightweight: Zero dependencies, focused solely on runtime schema definition, and output validation.
Type-safety: StructLM provides full zod-like TypeScript type inference at compile time, and assertions at run time.
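To make "zod-like type inference" concrete, here is a minimal sketch of the general technique. It is not StructLM's implementation: `Schema`, `Infer`, `Shape`, `primitive`, `str`, `num`, and `obj` are hypothetical names invented for this illustration. It shows how a conditional type with `infer` can recover a static type from a schema value while the same schema performs the run-time check.

```typescript
// Hypothetical miniature schema library, for illustration only.
interface Schema<T> {
  stringify(): string;      // compact description to embed in a prompt
  check(value: unknown): T; // run-time assertion; throws on mismatch
}

// Compile-time inference: recover T from a Schema<T>.
type Infer<S> = S extends Schema<infer T> ? T : never;
type Shape<F> = { [K in keyof F]: Infer<F[K]> };

function primitive<T>(name: "string" | "number"): Schema<T> {
  return {
    stringify: () => name,
    check(value) {
      if (typeof value !== name) throw new Error(`expected ${name}`);
      return value as T;
    },
  };
}
const str = primitive<string>("string");
const num = primitive<number>("number");

function obj<F extends Record<string, Schema<unknown>>>(fields: F): Schema<Shape<F>> {
  return {
    stringify() {
      const parts = Object.keys(fields).map((key) => `${key}: ${fields[key].stringify()}`);
      return `{ ${parts.join(", ")} }`;
    },
    check(value) {
      if (typeof value !== "object" || value === null) throw new Error("expected object");
      const input = value as Record<string, unknown>;
      const out: Record<string, unknown> = {};
      for (const key of Object.keys(fields)) out[key] = fields[key].check(input[key]);
      return out as unknown as Shape<F>;
    },
  };
}

const person = obj({ name: str, age: num });
type Person = Infer<typeof person>; // { name: string; age: number }
const ada: Person = person.check(JSON.parse('{"name":"Ada","age":36}'));
```

The point of the design is that the schema is written once: the compiler derives `Person` from it, and `check` enforces the same shape on untrusted input.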
This is a benchmark of StructLM vs JSON Schema, using Claude 3.5 Haiku. For the full benchmark, see BENCHMARKS.md.
JSON-Schema: 414 tokens (average)
StructLM: 222 tokens (average)
Reduction: 46.4% (average)
Accuracy: Equal
JSON-Schema: 1460 tokens (average)
StructLM: 610 tokens (average)
Reduction: 58.2% (average)
Accuracy: StructLM is slightly better (+0.4% on average)
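As a sanity check on the figures above, the reduction percentages follow directly from the average token counts. The helper name `tokenReduction` is made up for this sketch.

```typescript
// Percentage of tokens saved relative to the baseline, rounded to one decimal place.
function tokenReduction(baselineTokens: number, compactTokens: number): number {
  const saved = (baselineTokens - compactTokens) / baselineTokens;
  return Math.round(saved * 1000) / 10;
}

const firstCase = tokenReduction(414, 222);   // 46.4
const secondCase = tokenReduction(1460, 610); // 58.2
```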
Here's a complete example showing how to use StructLM with an LLM to extract structured data:
    import { s } from 'structlm';

    // 1. Define your schema
    const contactSchema = s.object({
      name: s.string(),
      email: s.string().validate(email => email.includes('@')),
      phone: s.string().optional(),
      company: s.string()
    });

    // 2. Create your prompt with the schema
    const text = "Contact John Doe at john@example.com or call (555) 123-4567. He works at Tech Corp.";

    const prompt = `
    Extract contact information from the following text and return it as JSON matching this structure:

    ${contactSchema.stringify()}

    Text: "${text}"

    Return only the JSON object, no additional text.
    `;

    // The schema.stringify() outputs:
    // { name: string, email: string /* email=>email.includes('@') */, phone: string /* optional */, company: string }

    // 3. Send prompt to LLM (the LLM returns this JSON string)
    const llmResponse = `{
      "name": "John Doe",
      "email": "john@example.com",
      "phone": "(555) 123-4567",
      "company": "Tech Corp"
    }`;

    // 4. Parse and validate the LLM response
    const contact = contactSchema.parse(llmResponse);
    // Returns: { name: "John Doe", email: "john@example.com", phone: "(555) 123-4567", company: "Tech Corp" }
    // The parse() method validates the email format and ensures all required fields are present
For the specification of the custom object notation, see SPECIFICATION.md.
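For a rough feel of the size difference, the sketch below compares the compact string from the contact example with a JSON Schema for the same shape. The JSON Schema here is hand-written for this comparison (an assumption, not the output of any tool), and character counts are only a crude proxy: real token counts depend on the model's tokenizer.

```typescript
// Compact form, copied from the stringify() output in the contact example.
const compact =
  "{ name: string, email: string /* email=>email.includes('@') */, " +
  "phone: string /* optional */, company: string }";

// Hand-written JSON Schema for the same shape (assumption for this sketch).
// The '@' check is approximated with a regex pattern.
const jsonSchema = JSON.stringify({
  type: "object",
  properties: {
    name: { type: "string" },
    email: { type: "string", pattern: "@" },
    phone: { type: "string" },
    company: { type: "string" },
  },
  required: ["name", "email", "company"],
});

// Compare sizes; even minified, the JSON Schema is the longer of the two.
console.log(`compact: ${compact.length} chars, JSON Schema: ${jsonSchema.length} chars`);
```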
Creates a string schema.
    const nameSchema = s.string();
    console.log(nameSchema.stringify()); // "string"

    // Parse and validate a string
    const name = nameSchema.parse('"John"'); // "John"
Creates a number schema.
    const ageSchema = s.number();
    console.log(ageSchema.stringify()); // "number"

    // Parse and validate a number
    const age = ageSchema.parse('25'); // 25
Creates a boolean schema.
    const activeSchema = s.boolean();
    console.log(activeSchema.stringify()); // "boolean"

    // Parse and validate a boolean
    const isActive = activeSchema.parse('true'); // true
Q: How is StructLM different from Zod?
A: While StructLM is inspired by Zod's API, it is designed specifically for LLM integration. StructLM generates compact schema descriptions optimized for prompts (46.4% to 58.2% fewer tokens on average in the benchmark above), while Zod focuses on general TypeScript validation. StructLM's .stringify() method produces LLM-friendly output directly, whereas Zod relies on zod-to-json-schema or similar tools to produce a JSON Schema.
Q: Can I use StructLM for regular data validation without LLMs?
A: Yes! StructLM works for most standard TypeScript data validation. Use .parse() for validation and type inference just like Zod. However, StructLM's main focus is token-efficient LLM integration, so some of the more advanced TypeScript features, such as discriminated unions and recursive types, may not work as expected right now.
Q: Which LLMs work with StructLM?
A: StructLM itself is model-agnostic: it is a schema definition and data validation library, so it works with any LLM. Reliability may vary by model, but our benchmarks show consistent results across major providers.
Q: Does StructLM work in the browser?
A: Yes! StructLM is a lightweight TypeScript library with zero dependencies that works in browsers, Node.js, Deno, and Bun.
Q: Can validation functions access other fields in the object?
A: No, validation functions only receive the current field's value. Cross-field validation isn't currently supported.
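One possible workaround is to run cross-field rules on the object after it has been parsed. The sketch below is plain TypeScript and does not use any StructLM API; the `Booking` type and `assertValidBooking` function are hypothetical names chosen for the example.

```typescript
// Hypothetical shape of an already-parsed result.
interface Booking {
  checkIn: string;  // ISO date, e.g. "2024-05-01"
  checkOut: string; // ISO date
}

// Cross-field rule, applied after per-field validation has passed.
function assertValidBooking(booking: Booking): Booking {
  const start = Date.parse(booking.checkIn);
  const end = Date.parse(booking.checkOut);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    throw new Error("checkIn and checkOut must be valid dates");
  }
  if (end <= start) {
    throw new Error("checkOut must be after checkIn");
  }
  return booking;
}

const booking = assertValidBooking({ checkIn: "2024-05-01", checkOut: "2024-05-03" });
```

Throwing on failure mirrors how .parse() reports invalid data, so both kinds of errors can share one catch block.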
Q: Do LLMs really understand StructLM's compact format better?
A: Our benchmarks show equal or better accuracy compared to JSON Schema. The compact format:
Is less verbose and less confusing
Closely resembles TypeScript syntax
Includes validation hints inline
Reduces prompt complexity
Q: Can I combine multiple schemas in one prompt?
A: Yes! Use .stringify() on multiple schemas:
    const userSchema = s.object({...});
    const orderSchema = s.object({...});

    const prompt = `
    Process this data and return:
    - User: ${userSchema.stringify()}
    - Order: ${orderSchema.stringify()}
    `;
Q: How do I handle LLM responses that don't match the schema?
A: StructLM's .parse() method throws descriptive errors for invalid data:
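    try {
      const result = schema.parse(llmResponse);
    } catch (error) {
      // In strict TypeScript the caught value is `unknown`, so narrow it first
      const message = error instanceof Error ? error.message : String(error);
      console.log('LLM returned invalid data:', message);
      // Handle error: retry, use fallback, etc.
    }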
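The "retry" option can be sketched as a small generic helper. This is one possible approach, not part of StructLM: `parseWithRetry` and its parameters are hypothetical, and the caller supplies both the function that calls the model and the function that parses its output.

```typescript
// Hypothetical helper: call the LLM again until its output passes `parse`.
// `parse` must throw on invalid data, as a schema's .parse() does.
async function parseWithRetry<T>(
  callLlm: (attempt: number) => Promise<string>,
  parse: (raw: string) => T,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callLlm(attempt);
    try {
      return parse(raw);
    } catch (error) {
      lastError = error; // remember why this attempt failed, then try again
    }
  }
  const reason = lastError instanceof Error ? lastError.message : String(lastError);
  throw new Error(`No valid response after ${maxAttempts} attempts: ${reason}`);
}
```

With the contact example, usage might look like `parseWithRetry(() => callModel(prompt), (raw) => contactSchema.parse(raw))`, where `callModel` stands in for whatever client function sends the prompt. A common refinement is to append the previous error message to the prompt on each retry.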
Q: What's the performance overhead?
A: StructLM is lightweight:
Schema creation: Minimal overhead
.stringify(): Fast string concatenation
.parse(): JSON.parse + validation functions
No runtime dependencies
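The ".parse(): JSON.parse + validation functions" step can be pictured roughly as follows. This is a simplified sketch for a string field only, not StructLM's source; `Validator` and `parseString` are names invented for the illustration.

```typescript
type Validator<T> = (value: T) => boolean;

// Hypothetical stand-in for a string schema's parse step.
function parseString(raw: string, validators: Validator<string>[] = []): string {
  const value: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof value !== "string") {
    throw new Error(`Expected string, got ${typeof value}`);
  }
  for (const validate of validators) {
    if (!validate(value)) {
      throw new Error(`Validation failed: ${validate.toString()}`);
    }
  }
  return value;
}

// Accepts a JSON string literal that passes the '@' check.
const email = parseString('"john@example.com"', [(v) => v.includes("@")]);
```

The cost is one JSON.parse call plus one function call per validator, which is why the overhead stays small.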
Q: Can I pre-compile schemas for better performance?
A: Schema stringification is already very fast, but you can cache results:
    const userSchemaString = userSchema.stringify();
    // Reuse userSchemaString in multiple prompts
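If many schemas are involved, the caching can be centralized in a small memoizing helper. This is a sketch under two assumptions: a schema is any object exposing `stringify()` (the `Stringifiable` interface below is hypothetical), and a schema's output does not change after creation.

```typescript
// Anything with a stringify() method; hypothetical interface for this sketch.
interface Stringifiable {
  stringify(): string;
}

// WeakMap keys do not prevent garbage collection of unused schemas.
const stringifyCache = new WeakMap<Stringifiable, string>();

// Returns the cached text for a schema object, computing it at most once.
function cachedStringify(schema: Stringifiable): string {
  let text = stringifyCache.get(schema);
  if (text === undefined) {
    text = schema.stringify();
    stringifyCache.set(schema, text);
  }
  return text;
}
```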
Q: Why is my validation function not working in LLM prompts?
A: Validation functions are serialized as text hints for LLMs but only enforced during .parse(). Make sure your function:
Uses simple, clear logic
Doesn't reference external variables
Is readable when converted to string
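The "external variables" point is easiest to see by looking at a function's source text. The sketch below assumes the hint is derived from something like `Function.prototype.toString()`, which matches the stringify() output in the contact example but is not confirmed here; `minLength` is a hypothetical helper.

```typescript
// A validator built by a factory captures `min` in a closure.
function minLength(min: number): (value: string) => boolean {
  return (value) => value.length >= min;
}

const viaClosure = minLength(8);
const selfContained = (value: string) => value.length >= 8;

// Both behave the same at run time, but their source text differs:
// the closure's text mentions `min` without the number it holds,
// so a reader of the hint (human or LLM) cannot tell what the limit is.
const closureText = viaClosure.toString();
const inlineText = selfContained.toString();
console.log(closureText);
console.log(inlineText);
```

Writing the literal value inside the function body keeps the serialized hint meaningful.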
Q: Can I see what the validation hints look like?
A: Yes! Use .stringify() to see exactly what gets sent to the LLM:
    console.log(schema.stringify());
    // Shows the compact format with validation hints
We welcome contributions! Please open an issue or submit a pull request on GitHub.