Summary
I propose extending the LLMs.txt standard with inline HTML data attributes that carry AI-friendly structured data directly on web page elements. This would complement the existing /llms.txt file by preserving context that is otherwise lost, particularly in complex content such as comparison tables and pricing information.
Problem Statement
The current LLMs.txt helps AI systems locate important content, but it doesn't address the fundamental challenge of semantic disambiguation within that content. Specifically:
Table and Structured Content Issues
- Lost context: When RAG systems scrape comparison tables, they often confuse "our pricing" with "competitor pricing"
- Relationship fragmentation: Table headers become disconnected from their data during embedding
- Ambiguous ownership: Content like "$50/month" loses meaning without knowing which company/product it refers to
Real-World Example
On comparison pages (e.g., "Formester vs Fillout"), current AI systems might incorrectly extract:
- ❌ "Formester costs $20/month" (actually Fillout's price)
- ❌ "Our basic plan includes 20MB uploads" (actually the competitor's feature)
Proposed Solution: data-llm Attributes
Add standardized data-llm attributes to HTML elements. Each attribute holds structured JSON that gives AI systems the context and semantics of the element it is attached to.
Basic Syntax
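A minimal sketch of the idea, assuming the attribute value is a single JSON object. The keys used here (type, owner, product) and the helper name extract_data_llm are illustrative assumptions, not a finalized schema; only the content type name our_product comes from this proposal. The HTML is embedded in a Python string so the example also shows how a consumer might read the attribute using only the standard library.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup: the JSON keys are assumptions made for illustration.
SAMPLE_HTML = """
<div data-llm='{"type": "our_product", "owner": "self", "product": "Formester"}'>
  <h2>Formester</h2>
</div>
"""


class DataLLMExtractor(HTMLParser):
    """Collect the parsed JSON payload of every data-llm attribute."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-llm" and value:
                try:
                    payload = json.loads(value)
                except json.JSONDecodeError:
                    continue  # skip malformed JSON rather than fail the whole page
                self.records.append({"tag": tag, "data": payload})


def extract_data_llm(html: str) -> list:
    """Return one record per element that carries a data-llm attribute."""
    parser = DataLLMExtractor()
    parser.feed(html)
    return parser.records


if __name__ == "__main__":
    for record in extract_data_llm(SAMPLE_HTML):
        print(record["tag"], record["data"])
```

Wrapping the attribute value in single quotes keeps the JSON's double quotes unescaped, which is one possible convention; escaping the quotes as HTML entities would work as well.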
Example Implementations
Pricing Comparison Tables
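One way this could look for the Formester vs Fillout case, sketched under assumptions: each cell carries its own data-llm payload, and the keys owner, company, and field are hypothetical names chosen for this illustration. The Fillout values are the ones quoted in the problem statement; the Formester cell is a placeholder because no Formester price is given here. The small parser turns each tagged cell into a statement that keeps its ownership attached.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup; "PLACEHOLDER" stands in for a value not stated in this proposal.
TABLE_HTML = """
<table>
  <tr><th>Feature</th><th>Formester</th><th>Fillout</th></tr>
  <tr>
    <td>Price</td>
    <td data-llm='{"type": "pricing_comparison", "owner": "self", "company": "Formester", "field": "price"}'>PLACEHOLDER</td>
    <td data-llm='{"type": "pricing_comparison", "owner": "competitor", "company": "Fillout", "field": "price"}'>$20/month</td>
  </tr>
  <tr>
    <td>Uploads</td>
    <td>-</td>
    <td data-llm='{"type": "pricing_comparison", "owner": "competitor", "company": "Fillout", "field": "upload_limit"}'>20MB</td>
  </tr>
</table>
"""


class CellCollector(HTMLParser):
    """Pair each tagged cell's visible text with its data-llm payload.

    Simplification: tagged elements are assumed not to be nested, and the
    attribute is assumed to contain valid JSON.
    """

    def __init__(self):
        super().__init__()
        self.statements = []
        self._meta = None   # payload of the tagged element currently open
        self._tag = None    # tag name of that element
        self._text = []     # text fragments seen inside it

    def handle_starttag(self, tag, attrs):
        raw = dict(attrs).get("data-llm")
        if raw and self._meta is None:
            self._meta, self._tag, self._text = json.loads(raw), tag, []

    def handle_data(self, data):
        if self._meta is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._meta is not None and tag == self._tag:
            value = "".join(self._text).strip()
            m = self._meta
            self.statements.append(f'{m["company"]} ({m["owner"]}) {m["field"]}: {value}')
            self._meta = None


def extract_statements(html: str) -> list:
    collector = CellCollector()
    collector.feed(html)
    return collector.statements
```

With this markup, "$20/month" can only be read as Fillout's price, because the company and owner travel with the cell instead of being inferred from the column header.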
Product Information
Contact Information
Benefits
1. Solves Context Preservation
- AI systems can definitively distinguish "our" vs "competitor" information
- Table relationships are explicitly maintained in structured form
- Prices in RAG responses can be attributed to the correct company
2. Backward Compatible
- Doesn't interfere with existing HTML, CSS, or JavaScript
- Works alongside current LLMs.txt files
- Search engines ignore unknown data attributes
3. Developer Friendly
- Easy to implement during development
- Single source of truth: update once, and both the human and AI versions stay current
- No separate file management required
4. Scalable
- Works for any type of content, not just tables
- Extensible schema system for different content types
- Can be validated against JSON schemas
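To make the validation point concrete, here is a rough sketch of a per-type check. It is not full JSON Schema validation: it only verifies that the payload is a JSON object with a known type and the fields required for that type. The content type names come from this proposal, while the required fields listed in REQUIRED_FIELDS are assumptions made up for the example. A real implementation would more likely use published JSON Schema documents and a schema validation library.

```python
import json

# Assumed required fields per content type (illustrative, not a ratified schema).
REQUIRED_FIELDS = {
    "pricing_comparison": {"owner", "company"},
    "our_product": {"product"},
    "company_contact": {"company"},
}


def validate_payload(raw: str) -> list:
    """Return a list of error messages; an empty list means the payload passed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]

    content_type = payload.get("type")
    if content_type not in REQUIRED_FIELDS:
        return [f"unknown type: {content_type!r}"]

    # Report missing fields in a stable, sorted order.
    missing = sorted(REQUIRED_FIELDS[content_type] - set(payload))
    return [f"missing field: {name}" for name in missing]


if __name__ == "__main__":
    print(validate_payload('{"type": "our_product", "product": "Formester"}'))
    print(validate_payload('{"type": "pricing_comparison", "owner": "competitor"}'))
```

A check like this could run in a build step or linter so that malformed attributes are caught before a page ships.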
Integration with LLMs.txt
This proposal complements rather than replaces LLMs.txt:
- LLMs.txt - Guides AI to important pages and sections
- data-llm attributes - Provide semantic understanding of content within those pages
Updated LLMs.txt Example
Implementation Strategy
Phase 1: Schema Definition
- Define common content types (pricing_comparison, our_product, company_contact, etc.)
- Create JSON schema specifications for validation
- Document best practices and examples
Phase 2: Tooling
- Build parsers for common RAG frameworks
- Create validation tools for developers
- Develop browser extensions for testing
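As a sketch of what a framework-agnostic parser might hand to a RAG pipeline, the snippet below turns a piece of visible text plus its data-llm payload into a chunk whose text is prefixed with the ownership context. The Chunk class, the to_chunk helper, and the owner and company keys are all hypothetical; real frameworks have their own document types, and an adapter would map this onto them.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Framework-neutral stand-in for a document chunk (hypothetical type)."""
    text: str
    metadata: dict = field(default_factory=dict)


def to_chunk(visible_text: str, payload: dict) -> Chunk:
    """Prefix the visible text with ownership context before embedding.

    Keeping the context inside the text means it survives embedding, while
    the copied payload stays available for metadata filtering.
    """
    owner = payload.get("owner", "unknown")
    company = payload.get("company", "unknown company")
    return Chunk(
        text=f"[{owner}: {company}] {visible_text}",
        metadata=dict(payload),  # copy so later edits don't mutate the source
    )


if __name__ == "__main__":
    chunk = to_chunk("$20/month", {"owner": "competitor", "company": "Fillout"})
    print(chunk.text)
```

Putting the context in both the text and the metadata is a design choice: the text prefix helps similarity search, and the metadata allows a retriever to exclude competitor content outright.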
Phase 3: Community Adoption
- Share with RAG system builders
- Integrate


