Show HN: Term – Data validation that runs anywhere, no infrastructure needed
Every data pipeline is a ticking time bomb. Null values crash production. Duplicate IDs corrupt databases. Format changes break downstream systems. Yet most teams discover these issues only after the damage is done.
Traditional data validation tools assume you have a data team, a Spark cluster, and weeks to implement. Term takes a different approach:
🚀 5-minute setup - From install to first validation. No clusters, no configs, no complexity
⚡ 100MB/s single-core performance - Validate millions of rows in seconds, not hours
🛡️ Fail fast, fail safe - Catch data issues before they hit production
📊 See everything - Built-in OpenTelemetry means you're never debugging blind
🔧 Zero infrastructure - Single binary runs on your laptop, in CI/CD, or in the cloud
Term is data validation for the 99% of engineering teams who just want their data to work.
```sh
# Add to your Cargo.toml
cargo add term-guard tokio --features tokio/full
```
```rust
use term_guard::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Load your data
    let ctx = SessionContext::new();
    ctx.register_csv("users", "users.csv", CsvReadOptions::new()).await?;

    // Define what good data looks like
    let checks = ValidationSuite::builder("User Data Quality")
        .check(
            Check::builder("No broken data")
                .is_complete("user_id")          // No missing IDs
                .is_unique("email")              // No duplicate emails
                .has_pattern("email", r"@", 1.0) // All emails have @
                .build(),
        )
        .build();

    // Validate and get instant feedback
    let report = checks.run(&ctx).await?;
    println!("{}", report); // ✅ All 3 checks passed!

    Ok(())
}
```
That's it! No clusters to manage, no JVMs to tune, no YAML to write.
🔥 Real-World Example: Validate 1M Rows in Under 1 Second
```rust
// Validate a production dataset with multiple quality checks
let suite = ValidationSuite::builder("Production Pipeline")
    .check(
        Check::builder("Data Freshness")
            .satisfies("created_at > now() - interval '1 day'")
            .has_size(Assertion::GreaterThan(1000))
            .build(),
    )
    .check(
        Check::builder("Business Rules")
            .has_min("revenue", Assertion::GreaterThan(0.0))
            .has_mean("conversion_rate", Assertion::Between(0.01, 0.10))
            .has_correlation("ad_spend", "revenue", Assertion::GreaterThan(0.5))
            .build(),
    )
    .build();

// Runs all checks in a single optimized pass
let report = suite.run(&ctx).await?;
```
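In CI, this report can gate the deploy directly. A minimal sketch, assuming the report exposes a pass/fail accessor (the `is_passed()` method below is hypothetical; check the actual report API):

```rust
// Sketch: fail the CI job when any check fails.
// `is_passed()` is a hypothetical accessor, not confirmed term-guard API.
if !report.is_passed() {
    eprintln!("{report}");  // Show which checks failed
    std::process::exit(1);  // Non-zero exit fails the pipeline step
}
```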
🎯 Incremental Analysis - Process Only What's Changed
```rust
use term_guard::analyzers::{IncrementalAnalysisRunner, FilesystemStateStore};

// Initialize with state persistence
let store = FilesystemStateStore::new("./metrics_state");
let runner = IncrementalAnalysisRunner::new(store);

// Process daily partitions incrementally
let state = runner.analyze_partition(
    &ctx,
    "2025-09-30", // Today's partition
    vec![analyzer],
).await?;

// Only new data is processed, previous results are reused!
```
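The same call works for backfills: loop over partition dates and let the saved state skip work that's already done. A sketch under that assumption, reusing `analyze_partition` from above, with one of the analyzers shown below standing in for `analyzer`:

```rust
// Sketch: backfill three daily partitions. Thanks to the state store,
// re-running this loop is assumed to reuse prior results rather than
// rescanning old partitions.
for day in ["2025-09-28", "2025-09-29", "2025-09-30"] {
    let analyzer = KllSketchAnalyzer::new("response_time"); // any analyzer fits here
    let _state = runner.analyze_partition(&ctx, day, vec![analyzer]).await?;
    println!("processed partition {day}");
}
```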
📊 Advanced Analytics - KLL Sketches & Correlation
```rust
use term_guard::analyzers::{KllSketchAnalyzer, CorrelationAnalyzer};

// Approximate quantiles with minimal memory
let kll = KllSketchAnalyzer::new("response_time")
    .with_k(256) // Higher k = better accuracy
    .with_quantiles(vec![0.5, 0.95, 0.99]);

// Detect relationships between metrics
let correlation = CorrelationAnalyzer::new("ad_spend", "revenue")
    .with_method(CorrelationMethod::Spearman); // Handles non-linear relationships

let results = runner.run_analyzers(vec![kll, correlation]).await?;
```
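What you do with `results` depends on the results API, which isn't shown here; purely as a hypothetical illustration, gating a latency SLO on the estimated p99 might look like:

```rust
// Hypothetical sketch: `quantile()` is illustrative, not confirmed API.
// The point: KLL gives approximate quantiles cheaply enough to assert
// SLOs on every pipeline run.
let p99 = results.quantile("response_time", 0.99)?; // estimated 99th percentile
assert!(p99 < 250.0, "p99 latency {p99}ms exceeds the 250ms SLO");
```

A useful property of KLL sketches is that memory stays fixed as row counts grow, and accuracy improves with `k`, so the same check scales from a laptop sample to the full dataset.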
🔍 Multi-Table Validation - Foreign Keys & Joins
```rust
// Validate relationships across tables with the fluent API
let suite = ValidationSuite::builder("Cross-table integrity")
    .check(
        Check::builder("Referential integrity")
            .foreign_key("orders.customer_id", "customers.id")
            .temporal_consistency("orders", "created_at", "updated_at")
            .build(),
    )
    .build();
```
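Both sides of the join need to be visible to the same `SessionContext`, using the same `register_csv` call as the quickstart (file names here are illustrative):

```rust
// Register both tables on one context so the foreign-key check can join them.
// File names are illustrative.
let ctx = SessionContext::new();
ctx.register_csv("orders", "orders.csv", CsvReadOptions::new()).await?;
ctx.register_csv("customers", "customers.csv", CsvReadOptions::new()).await?;

let report = suite.run(&ctx).await?;
```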
🛡️ Enhanced Security - SSN & PII Detection
```rust
// New format validators, including SSN detection
let check = Check::builder("PII Protection")
    .contains_ssn("ssn_field")        // Validates SSN format
    .contains_credit_card("cc_field") // Credit card detection
    .contains_email("email_field")    // Email validation
    .build();
```
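PII checks compose like any other check: wrap them in a suite and run against the registered tables, exactly as in the earlier examples:

```rust
// Same run path as every other check: wrap in a suite and execute.
let suite = ValidationSuite::builder("PII Audit")
    .check(check)
    .build();

let report = suite.run(&ctx).await?;
println!("{}", report);
```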
🚨 Anomaly Detection - Catch Sudden Metric Changes
```rust
use term_guard::analyzers::{AnomalyDetector, RelativeRateOfChangeStrategy};

// Detect sudden metric changes
let detector = AnomalyDetector::new().with_strategy(
    RelativeRateOfChangeStrategy::new()
        .max_rate_increase(0.5)  // Flag 50%+ increases
        .max_rate_decrease(0.3), // Flag 30%+ decreases
);

let anomalies = detector.detect(&historical_metrics, &current_metric)?;
```
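To make those thresholds concrete: with `max_rate_increase(0.5)`, a metric jumping from 100 to 160 between runs (a 60% increase) gets flagged while 100 to 140 passes; with `max_rate_decrease(0.3)`, a drop from 100 to 65 is flagged but 100 to 75 is not.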
📈 Grouped Metrics - Segment-Level Analysis
```rust
use term_guard::analyzers::GroupedCompletenessAnalyzer;

// Analyze data quality by segment
let analyzer = GroupedCompletenessAnalyzer::new()
    .group_by(vec!["region", "product_category"])
    .analyze_column("revenue");

// Get metrics for each group combination
let results = analyzer.compute(&ctx).await?;
// e.g., completeness for region=US & category=Electronics
```
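The exact shape of the grouped results isn't shown above; as a hypothetical sketch, scanning for weak segments could look like:

```rust
// Hypothetical sketch: the iteration API is illustrative, not confirmed.
// Each entry pairs a group key (e.g., region=US, category=Electronics)
// with its completeness ratio for `revenue`.
for (group, completeness) in &results {
    if *completeness < 0.99 {
        eprintln!("low completeness in {group:?}: {completeness:.2}");
    }
}
```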