Cognition as Behavioral Type System: This framework uses "cognition" as engineering terminology for LLM behavioral classification—a type system analogous to enums in programming (ETHOS | PATHOS | LOGOS). Like computer science's borrowed use of "memory," "learning," and "neural networks," this is functional terminology, not substrate claims. LLMs don't possess consciousness or subjective experience. They do exhibit measurably different behavioral patterns when configured with cognitive type specifications.
Empirical Results:
- 36-point quality improvement (80% increase) with cognitive frameworks
- +39% performance boost from constitutional cognitive grounding
- 93.5% effectiveness score in optimal cognitive-task mapping
- 89% production adoption across 54 agent roles
- 31.3% quality improvement through sequential cognitive processing
Purpose: Independent testing and validation. Test this. Prove it wrong.
Cognitive types are behavioral specifications that configure LLM response patterns through constrained instruction sets. Each type defines:
- MUST_ALWAYS: Required behavioral patterns
- MUST_NEVER: Prohibited behavioral patterns
- PRIME_DIRECTIVE: Core processing orientation
- CORE_GIFT: Primary capability
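The four fields above can be captured in a small data structure. Below is a minimal Python sketch of that shape; the field names mirror the list above, while the example values are illustrative rather than the canonical wording in specs/.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CognitiveType:
    """Behavioral specification for one cognitive type."""
    name: str                 # ETHOS | PATHOS | LOGOS
    prime_directive: str      # Core processing orientation
    core_gift: str            # Primary capability
    must_always: list[str] = field(default_factory=list)  # Required behavioral patterns
    must_never: list[str] = field(default_factory=list)   # Prohibited behavioral patterns

# Illustrative instance; see specs/ for the canonical wording.
ETHOS = CognitiveType(
    name="ETHOS",
    prime_directive="Validate what is",
    core_gift="Constraint enforcement, validation, reality checking",
    must_always=["Ground every claim in verifiable evidence"],
    must_never=["Accept unvalidated assumptions as fact"],
)
```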
| Type | Archetype | Prime Directive | Core Gift |
|------|-----------|-----------------|-----------|
| ETHOS | The Guardian | "Validate what is" | Constraint enforcement, validation, reality checking |
| PATHOS | The Explorer | "Seek what could be" | Innovation, ideation, possibility exploration |
| LOGOS | The Synthesizer | "Transcend either/or" | Integration, synthesis, tension resolution |
For validation tasks (ETHOS):
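A minimal sketch of prepending an ETHOS specification to a prompt; the wording here is illustrative, the full specification lives in specs/.

```python
# Illustrative ETHOS prefix; see specs/ for the full MUST_ALWAYS/MUST_NEVER text.
ethos_system_prompt = (
    "COGNITION::ETHOS\n"
    "PRIME_DIRECTIVE: Validate what is.\n"
    "MUST_ALWAYS: ground claims in evidence; flag every unverified assumption.\n"
    "MUST_NEVER: speculate beyond the provided material.\n"
)
task = "Review this deployment plan and list concrete risks with supporting evidence."
```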
For exploration tasks (PATHOS):
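The same pattern with a PATHOS specification (again, illustrative wording):

```python
# Illustrative PATHOS prefix; see specs/ for the full specification.
pathos_system_prompt = (
    "COGNITION::PATHOS\n"
    "PRIME_DIRECTIVE: Seek what could be.\n"
    "MUST_ALWAYS: generate multiple distinct possibilities before judging any.\n"
    "MUST_NEVER: prematurely converge on a single option.\n"
)
task = "Propose five alternative onboarding flows for this product."
```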
For synthesis tasks (LOGOS):
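And with a LOGOS specification (illustrative wording):

```python
# Illustrative LOGOS prefix; see specs/ for the full specification.
logos_system_prompt = (
    "COGNITION::LOGOS\n"
    "PRIME_DIRECTIVE: Transcend either/or.\n"
    "MUST_ALWAYS: surface the tension between competing options and integrate them.\n"
    "MUST_NEVER: discard one side of a trade-off without accounting for it.\n"
)
task = "Reconcile the performance and maintainability proposals into one design."
```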
Run the same prompt with and without the cognitive type specification. Compare:
- Behavioral patterns
- Output structure
- Response quality
- Consistency
See test-protocol/ for structured validation methodology.
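A sketch of a with/without harness for that comparison. `call_model` is a placeholder for whatever LLM client you use, and the scoring step is whatever rubric you adopt from test-protocol/.

```python
def call_model(system_prompt: str, user_prompt: str) -> str:
    """Placeholder: route this to your own LLM client (OpenAI, Anthropic, local, ...)."""
    raise NotImplementedError

def run_comparison(user_prompt: str, cognitive_spec: str,
                   baseline_prompt: str = "You are a helpful assistant.") -> dict:
    """Run the same task with and without a cognitive type specification."""
    return {
        "control": call_model(baseline_prompt, user_prompt),
        "cognitive": call_model(baseline_prompt + "\n\n" + cognitive_spec, user_prompt),
    }

# Score both outputs against the same rubric and repeat runs to compare
# behavioral patterns, output structure, response quality, and consistency.
```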
ETHOS Subtle Flaw Detection Study (N=40) (evidence/ethos-flaw-detection.md)
- Most rigorous validation: N=40, 2×2 factorial design, blind scoring, objective metrics
- Label Only group detected 20% more flaws (19.8 vs 16.5 baseline)
- +1.0 point advantage on subtle concurrency detection (most critical dimension)
- Conclusion: "STRONGLY and DEFINITIVELY SUPPORTED"
- Design principle validated: Concise symbolic labels > verbose semantic priming
Cognitive Type Isolation Test (N=36) (evidence/cognitive-isolation-test.md)
- 3-group controlled design across 6 different models
- ETHOS: 49.4/60 vs CONTROL: 46.3/60 (+3.1 points, +6.7% improvement)
- 47% improvement in actionability (7.5/10 vs 5.1/10)
- Cross-model consistency: Advantage present in 4 of 6 models tested
- Validates: Cognitive type specification adds measurable value beyond behavioral instructions alone
Task-Cognition Specialization Probe (N=21) (evidence/cognitive-specialization-probe.md)
- Compares: ETHOS, LOGOS, PATHOS, CONTROL, BASELINE configurations
- ETHOS dominance for assessment: 53.0/60 (highest overall score)
- LOGOS specialty for planning: 9.7/10 actionability (#1 ranking)
- PATHOS unsuitable for assessment: 43.0/60 (lowest, clarity 5.3/10)
- 10-point spread validates task-cognition matching hypothesis
- Conclusion: "Strongly supported, with clear evidence of specialization"
ETHOS Factorial Test (N=20) (evidence/ethos-factorial-test.md)
- 2×2 factorial design: Isolates label effect vs semantic priming effect
- Label Only: 52.4/60 (optimal configuration)
- Label+Priming: 51.2/60 (priming adds noise without benefit)
- Label effect (+1.1 points) >> Priming effect (+0.1 points)
- Validates: Symbolic cognitive identity more effective than verbose explanations
RAPH Cognitive Optimization Study (evidence/raph-optimization-study.md)
- 12 systematic tests across cognitive types and processing phases
- Overall effectiveness: 93.5% (374/400 points)
- Optimal mapping: READ→ETHOS, ABSORB→PATHOS, PERCEIVE→LOGOS, HARMONISE→LOGOS
- Krippendorff's α=0.84 (strong inter-rater reliability)
Meta-Analysis (evidence/statistical-validation.md)
- n=56 test runs, 6 expert assessors
- 31.3% quality improvement through cognitive sequential processing
- +39% performance boost from constitutional cognitive foundation (empirically documented)
- 78% reduction in unsubstantiated claims (anti-validation theater)
Multi-Role Capability Comparison (evidence/multi-role-comparison.md)
- 5 agents, identical task, controlled conditions
- Quantified behavioral differences (1-10 scoring)
- ETHOS: Highest consistency (9/9 across test variants)
- LOGOS: Best architecture (10/10 code quality) with synthesis patterns
- PATHOS: Creative exploration, with predictably undisciplined patterns
What Happens WITHOUT Cognitive Types (evidence/failure-analysis.md)
- 75% failure rate in C039 agent generation without cognitive grounding
- Requirements drift causing "technically sound but functionally wrong" outputs
- 33-67%→100% swings in functional reliability
- Validation theater: agents going through motions without genuine processing
Cross-Model Validation (evidence/model-independence.md)
- Tested across Claude Opus 4, Gemini 2.5 Pro, GPT-4
- Cognitive types maintained across different model architectures
- Model-specific expressions of same cognitive foundation (e.g., Gemini-LOGOS vs Opus-LOGOS)
- Proves cognitive types are architectural concepts, not model-specific tricks
Complete ETHOS, PATHOS, and LOGOS behavioral specifications with MUST_ALWAYS/MUST_NEVER constraints.
Validation methodology for independent testing:
- Expected behavioral patterns per cognitive type
- Measurement criteria and scoring rubrics
- Control vs experimental comparison protocols
Concrete before/after comparisons showing:
- Same task with different cognitive types
- Measurable output differences
- Real-world use cases
Extracted research validating cognitive types:
- Statistical analyses
- Controlled experiments
- Failure case studies
- Cross-model validation
Production examples:
- PATHOS build sprint violation (exploration without constraints)
- Requirements drift prevention (ETHOS validation)
- 75% crisis recovery (cost of ignoring cognitive types)
Test 1: Validation Task (ETHOS)
Test 2: Synthesis Task (LOGOS)
Test 3: Exploration Task (PATHOS)
See test-protocol/validation-methodology.md for comprehensive testing framework.
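One way to script the three tests is a small test matrix like the sketch below; the task wording and expected markers are placeholders, not the rubric from test-protocol/validation-methodology.md.

```python
# Illustrative test matrix; tasks and expected markers are placeholders.
REPLICATION_TESTS = [
    {"id": "test-1", "cognition": "ETHOS",
     "task": "Audit this API design for unvalidated assumptions.",
     "expect": ["explicit evidence citations", "flagged constraint violations"]},
    {"id": "test-2", "cognition": "LOGOS",
     "task": "Integrate these two conflicting architecture proposals.",
     "expect": ["named trade-offs", "a synthesis preserving both goals"]},
    {"id": "test-3", "cognition": "PATHOS",
     "task": "Generate alternative approaches to this caching problem.",
     "expect": ["multiple distinct options", "no premature convergence"]},
]
```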
AI agents exhibit:
- Cognitive drift: Loss of objectives over time
- Inconsistent quality: Same agent, different results on similar tasks
- Validation theater: Going through motions without genuine processing
- Misaligned reasoning: Wrong cognitive approach for task type
Behavioral specifications that:
- ✅ Configure measurably different response patterns
- ✅ Maintain consistency across interactions
- ✅ Prevent validation theater through MUST_ALWAYS/MUST_NEVER constraints
- ✅ Match cognitive mode to task requirements
- 36-point quality improvement (80% increase)
- 93.5% effectiveness in optimal cognitive-task mapping
- 89% production adoption across 54 agent roles
- 75% failure rate WITHOUT cognitive types (negative validation)
| Dimension | Cognitive Type System | Generic Prompting | Persona Prompting |
|-----------|-----------------------|-------------------|-------------------|
| Behavioral Specificity | High (MUST_ALWAYS/MUST_NEVER) | Low (vague instructions) | Medium (role descriptions) |
| Consistency | 93.5% effectiveness | Variable | 60-70% |
| Validation | Statistical (α=0.84, n=56) | Anecdotal | Limited |
| Measurability | Quantified differences | Subjective | Partially quantified |
| Model Independence | Tested across models | Model-specific | Model-specific |
Based on controlled experimental evidence (N=117 across 4 studies):
Evidence:
- Factorial Test (N=20): Label Only (52.4/60) outperforms Label+Priming (51.2/60)
- A016 Flaw Detection (N=40): Label effect (+2.8 flaws found) vs Priming effect (negative)
- Label improves actionability +42% (6.8 vs 4.8), priming adds only +0.1 points
Principle: Concise symbolic cognitive identity (COGNITION::ETHOS) is more effective than explanatory text or semantic priming.
Implication: Minimalist design - use symbolic labels, avoid verbose philosophical explanations in prompt.
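To make the contrast concrete, here is a hedged sketch of the two configurations; the exact wording used in the factorial test is in evidence/ethos-factorial-test.md.

```python
# Label Only: concise symbolic identity (the better-performing configuration).
label_only = (
    "COGNITION::ETHOS\n"
    "Task: Review this schema migration for subtle flaws."
)

# Label + Priming: the same label wrapped in verbose explanation, which the
# factorial test found to add noise without measurable benefit.
label_plus_priming = (
    "COGNITION::ETHOS\n"
    "You embody credibility, rigor, and the guardian's duty to validate what is\n"
    "before imagining what could be...\n"
    "Task: Review this schema migration for subtle flaws."
)
```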
Evidence:
- Specialization Probe (N=21): 10-point spread between optimal and suboptimal cognitive type
- ETHOS excels at assessment (53.0/60, risk severity 8.0/10)
- LOGOS excels at planning (9.7/10 actionability, #1 ranking)
- PATHOS unsuitable for assessment (43.0/60, clarity 5.3/10)
Principle: Match cognitive type to task requirements for optimal outcomes.
Implication: Don't use single cognitive type for all tasks. Use ETHOS for validation, LOGOS for synthesis, PATHOS for exploration.
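A routing table is one simple way to apply this; the task categories below are illustrative.

```python
# Illustrative task-to-cognition routing; extend categories to fit your agents.
TASK_TO_COGNITION = {
    "validation": "ETHOS",    # reviews, audits, reality checks
    "assessment": "ETHOS",
    "exploration": "PATHOS",  # ideation, alternatives, divergent thinking
    "synthesis": "LOGOS",     # integration, tension resolution
    "planning": "LOGOS",
}

def select_cognition(task_category: str) -> str:
    # Default choice here is arbitrary; pick whatever suits your workload.
    return TASK_TO_COGNITION.get(task_category, "ETHOS")
```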
Evidence:
- Isolation Test (N=36): ETHOS 49.4/60 vs CONTROL 46.3/60 (+3.1 points)
- Actionability improvement: +47% (7.5 vs 5.1)
- Consistent across 4 of 6 models tested
- Flaw Detection (N=40): +20% detection rate with cognitive type
Principle: Cognitive identity specification adds measurable operational value beyond behavioral rules alone.
Implication: Behavioral instructions (MUST_ALWAYS/MUST_NEVER) are necessary but insufficient. Add cognitive type for optimal performance.
Yes, but with three critical differences:
- Type System: ETHOS|PATHOS|LOGOS provides formal classification (like enums in programming)
- Empirical Validation: 93.5% effectiveness, 36-point improvement, statistical rigor
- Behavioral Specifications: MUST_ALWAYS/MUST_NEVER constraints (not vague instructions)
Correct—LLMs don't have consciousness, subjective experience, or human cognition (substrate).
Cognitive types describe behavioral patterns (function), not consciousness (substrate).
Computer science uses "memory" (RAM ≠ human memory), "learning" (gradient descent ≠ human learning), "neural networks" (transformers ≠ neurons). Same principle—functional terminology borrowed from other domains.
Tested and validated across:
- Claude Opus 4
- Gemini 2.5 Pro
- GPT-4
Model-specific expressions vary (Gemini-LOGOS ≠ Opus-LOGOS in style) but cognitive foundations remain consistent. See evidence/model-independence.md.
Three options:
- Hybrid Cognitive Types: Combine constraints from multiple types
- Sequential Application: Use PATHOS for exploration phase, ETHOS for validation phase, LOGOS for integration phase
- Extend Type System: Create new cognitive types following same specification pattern
See test-protocol/edge-cases.md for complex scenarios.
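A sketch of the sequential option, passing each phase's output forward; `specs` maps type names to their specification text, and `call_model` is whatever client function you use.

```python
from typing import Callable

def sequential_pipeline(task: str, specs: dict[str, str],
                        call_model: Callable[[str, str], str]) -> str:
    """Explore (PATHOS), validate (ETHOS), then integrate (LOGOS) in sequence."""
    options = call_model(specs["PATHOS"],
                         f"Generate candidate approaches for: {task}")
    critique = call_model(specs["ETHOS"],
                          f"Validate these candidates against real constraints:\n{options}")
    return call_model(specs["LOGOS"],
                      f"Integrate the validated candidates into one plan:\n{options}\n\n{critique}")
```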
- Copy cognitive type specification from specs/
- Add to your agent's system prompt
- Run comparison tests (with/without cognitive type)
- Measure behavioral differences
- Report results via GitHub Issues
- Replication: Does this work in your environment?
- Falsification: Where does it fail?
- Edge Cases: What scenarios break the model?
- Improvements: How can specifications be refined?
- Bug Reports: Cognitive type doesn't produce expected behavior
- Test Results: Your validation data (positive or negative)
- New Specifications: Additional cognitive types or refinements
- Documentation: Improvements to examples, protocols, explanations
Formal equivalence: the type system operates at three layers:
- Programming: a closed set of named variants (ETHOS | PATHOS | LOGOS), analogous to an enum
- Behavioral Specification: each variant's MUST_ALWAYS/MUST_NEVER constraints, PRIME_DIRECTIVE, and CORE_GIFT
- Configuration: the specification attached to an agent's system prompt
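A compact Python sketch of those three layers; the names are illustrative and the spec strings are abbreviated stand-ins for the full text in specs/.

```python
from enum import Enum

class Cognition(Enum):  # Programming: a closed set of named variants
    ETHOS = "ETHOS"
    PATHOS = "PATHOS"
    LOGOS = "LOGOS"

SPECS = {  # Behavioral Specification: constraints per variant (abbreviated)
    Cognition.ETHOS: "COGNITION::ETHOS\nPRIME_DIRECTIVE: Validate what is.",
    Cognition.PATHOS: "COGNITION::PATHOS\nPRIME_DIRECTIVE: Seek what could be.",
    Cognition.LOGOS: "COGNITION::LOGOS\nPRIME_DIRECTIVE: Transcend either/or.",
}

def configure(base_prompt: str, cognition: Cognition) -> str:
    """Configuration: attach the specification to an agent's system prompt."""
    return SPECS[cognition] + "\n\n" + base_prompt
```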
- Semantic Priming: "Cognition" primes associations with reasoning, evaluation, judgment
- Constraint Enforcement: MUST_ALWAYS/MUST_NEVER creates behavioral boundaries
- Archetypal Activation: Type specification triggers consistent patterns
- Behavioral Typing: Classification system enables validation and testing
No magic. No consciousness. Just constrained behavioral specification producing measurable patterns.
Primary Research:
- RAPH Cognitive Optimization Study (2025) - 93.5% effectiveness validation
- RAPH Cognitive Priming Synthesis (2025) - Statistical meta-analysis (α=0.84, n=56)
- Multi-Role Capability Comparison (C003) - Controlled experimental validation
- Cognitive Foundation Empirical Evidence (004) - 75% failure analysis
Key Metrics:
- 36-point quality improvement: AI Role Enhancement Validation Study
- +39% performance boost: Constitutional Foundation Production Validation
- 89% production adoption: Agent Pattern Analysis (54 agent roles)
- 31.3% quality improvement: RAPH Benchmarking Evidence
Full citations available in evidence/ directory.
MIT License - Use freely, credit appreciated, contributions welcome.
Author: Shaun Buswell
Repository: https://github.com/shaunbuswell/cognitive-type-system
Issues: https://github.com/shaunbuswell/cognitive-type-system/issues
This framework emerged from 6+ months of production AI agent development across 54 roles, systematic testing with 56 validation runs, and empirical observation of what actually works vs. what should theoretically work.
Special thanks to the AI research community for rigorous critique that strengthened the framework's validation methodology.
Test this. Prove it wrong. Or help make it rigorous.