Show HN: Ato – A thin ML layer. When results change, it tells you why
Minimal reproducibility for ML.
Tracks config structure, code, and runtime so you can explain why runs differ — without a platform.
Your experiment breaks. Here's how to debug it:

```
python train.py manual                                    # See exactly how configs merged
finder.get_trace_statistics('my_project', 'train_step')   # See which code versions ran
finder.find_similar_runs(run_id=123)                       # Find experiments with same structure
```
Ato is a thin layer that fingerprints your config structure, function logic, and runtime outputs.
It doesn't replace your stack; it sits beside it to answer one question: "Why did this result change?"
Three pieces, zero coupling:
ADict — Structural hashing for configs (tracks architecture changes, not just values; see the sketch below)
Scope — Priority-based config merging with reasoning and code fingerprinting
SQLTracker — Local-first experiment tracking in SQLite (zero setup, zero servers)
Each works alone. Together, they explain why experiments diverge.
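To make "structural hashing" concrete, here is a standalone sketch of the idea in plain Python — an illustration only, not ADict's implementation: hash a config's keys and value types, and ignore the values themselves.

```python
import hashlib
import json

# Illustration only (not ADict): hash a config's keys and value types, not its values.
def structural_hash(config: dict) -> str:
    def describe(node):
        if isinstance(node, dict):
            return {key: describe(value) for key, value in sorted(node.items())}
        return type(node).__name__
    payload = json.dumps(describe(config), sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

a = {'lr': 0.001, 'batch_size': 32, 'model': 'resnet50'}
b = {'lr': 0.01, 'batch_size': 64, 'model': 'resnet101'}  # different values, same structure
c = dict(a, scheduler={'name': 'cosine'})                 # new key: the structure changed

print(structural_hash(a) == structural_hash(b))  # True  -> only hyperparameters changed
print(structural_hash(a) == structural_hash(c))  # False -> the experiment architecture changed
```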
Config is not logging — it's reasoning.
Ato makes config merge order, priority, and causality visible.
Config Superpowers (That Make Reproducibility Real)
These aren't features. They're how Ato is built:
| Capability | What It Does | Why It Matters |
| --- | --- | --- |
| Structural hashing | Hash based on keys + types, not values | Detect when experiment architecture changes, not just hyperparameters |
| Priority/merge reasoning | Explicit merge order with manual inspection | See why a config value won — trace the entire merge path |
| Namespace isolation | Each scope owns its keys | Team/module collisions are impossible — no need for model_lr vs data_lr prefixes |
```
python train.py manual   # See exactly how configs merged
```
Step 5: Track experiments locally
Three lines to tracked experiments:

```python
from ato.db_routers.sql.manager import SQLLogger

logger = SQLLogger(config)
run_id = logger.run(tags=['baseline'])

# Your training loop
logger.log_metric('loss', loss, step=epoch)

logger.finish(status='completed')
```

A minimal training script with Scope:
```python
from ato.scope import Scope

scope = Scope()

@scope.observe(default=True)
def config(config):
    config.lr = 0.001
    config.batch_size = 32
    config.model = 'resnet50'

@scope
def train(config):
    print(f"Training {config.model} with lr={config.lr}")

if __name__ == '__main__':
    train()
```
Run it:
```
python train.py           # Uses defaults
python train.py lr=0.01   # Override from CLI
python train.py manual    # See config merge order
```
Lazy Evaluation
Compute configs after CLI args are applied. (Note: lazy evaluation requires Python 3.8 or higher.)
```python
@scope.observe()
def base_config(config):
    config.model = 'resnet50'
    config.dataset = 'imagenet'

@scope.observe(lazy=True)  # Evaluated AFTER CLI args
def computed_config(config):
    # Adjust based on dataset
    if config.dataset == 'imagenet':
        config.num_classes = 1000
        config.image_size = 224
    elif config.dataset == 'cifar10':
        config.num_classes = 10
        config.image_size = 32
```
Or defer only part of a config with the Scope.lazy() context manager:

```python
@scope.observe()
def my_config(config):
    config.model = 'resnet50'
    config.num_layers = 50
    with Scope.lazy():  # Evaluated after CLI
        if config.model == 'resnet101':
            config.num_layers = 101
```
MultiScope: Namespace Isolation
Manage completely separate configuration namespaces with independent priority systems.
Use case: Different teams own different scopes without key collisions.
```python
from ato.scope import Scope, MultiScope

model_scope = Scope(name='model')
data_scope = Scope(name='data')
scope = MultiScope(model_scope, data_scope)

@model_scope.observe(default=True)
def model_config(model):
    model.backbone = 'resnet50'
    model.lr = 0.1  # Model-specific learning rate

@data_scope.observe(default=True)
def data_config(data):
    data.dataset = 'cifar10'
    data.lr = 0.001  # Data augmentation learning rate (no conflict!)

@scope
def train(model, data):  # Named parameters match scope names
    # Both have 'lr' but in separate namespaces!
    print(f"Model LR: {model.lr}, Data LR: {data.lr}")
```
Key advantage: model.lr and data.lr are completely independent. No naming prefixes needed.
CLI with MultiScope:
```
# Override model scope only
python train.py model.backbone=%resnet101%

# Override both
python train.py model.backbone=%resnet101% data.dataset=%imagenet%
```
Config Documentation & Debugging
The manual command visualizes the exact order of configuration application.
```python
@scope.observe(default=True)
def config(config):
    config.lr = 0.001
    config.batch_size = 32
    config.model = 'resnet50'

@scope.manual
def config_docs(config):
    config.lr = 'Learning rate for optimizer'
    config.batch_size = 'Number of samples per batch'
    config.model = 'Model architecture (resnet50, resnet101, etc.)'
```
Output:
```
--------------------------------------------------
[Scope "config"]
(The Applying Order of Views)
config → (CLI Inputs)

(User Manuals)
lr: Learning rate for optimizer
batch_size: Number of samples per batch
model: Model architecture (resnet50, resnet101, etc.)
--------------------------------------------------
```
Why this matters:
When debugging "why is this config value not what I expect?", you see exactly which function set it and in what order.
Static Tracing (@scope.trace)
@scope.trace generates a fingerprint of the function's logic, not its name or formatting:
```python
# These three functions have IDENTICAL fingerprints

@scope.trace(trace_id='train_step')
@scope
def train_v1(config):
    loss = model(data)
    return loss

@scope.trace(trace_id='train_step')
@scope
def train_v2(config):
    # Added comments
    loss = model(data)  # Compute loss
    return loss

@scope.trace(trace_id='train_step')
@scope
def completely_different_name(config):
    loss = model(data)    # Different whitespace
    return loss
```
All three produce the same fingerprint because the underlying logic is identical.
Change the actual computation, however, and the fingerprint changes — as the refactoring example below shows.
Example: Catching refactoring bugs
```python
# Original implementation
@scope.trace(trace_id='forward_pass')
@scope
def forward(model, x):
    out = model(x)
    return out

# Safe refactoring: added comments, changed variable name, different whitespace
@scope.trace(trace_id='forward_pass')
@scope
def forward(model, x):
    # Forward pass through model
    result = model(x)  # No spaces
    return result
```
These have the same fingerprint because the underlying logic is identical — only cosmetic changes.
```python
# Unsafe refactoring: logic changed
@scope.trace(trace_id='forward_pass')
@scope
def forward(model, x):
    features = model.backbone(x)  # Now calling backbone + head separately!
    logits = model.head(features)
    return logits
```
This has a different fingerprint — the logic changed. If you expected them to be equivalent but they have different fingerprints, you've caught a refactoring bug.
Runtime Tracing (@scope.runtime_trace)
Track what the function produces, not what it does.
```python
import numpy as np

# Basic: track full output
@scope.runtime_trace(trace_id='predictions')
@scope
def evaluate(model, data):
    return model.predict(data)

# With init_fn: fix randomness for reproducibility
@scope.runtime_trace(
    trace_id='predictions',
    init_fn=lambda: np.random.seed(42)  # Initialize before execution
)
@scope
def evaluate_with_dropout(model, data):
    return model.predict(data)  # Now deterministic

# With inspect_fn: track specific parts of the output
@scope.runtime_trace(
    trace_id='predictions',
    inspect_fn=lambda preds: preds[:100]  # Only hash first 100 predictions
)
@scope
def evaluate_large_output(model, data):
    return model.predict(data)

# Advanced: type-only checking (ignore values)
@scope.runtime_trace(
    trace_id='predictions',
    inspect_fn=lambda preds: type(preds).__name__  # Track output type only
)
@scope
def evaluate_structure(model, data):
    return model.predict(data)
```
Parameters:
init_fn: Optional function called before execution (e.g., seed fixing, device setup)
inspect_fn: Optional function to extract/filter what to track (e.g., first N items, specific fields, types only)
Even if the code hasn't changed, the runtime fingerprint changes whenever the outputs do.
Use @scope.trace() when:
You want to track code changes automatically
You're refactoring and want to isolate performance impact
You need to audit "which code produced this result?"
You want to ignore cosmetic changes (comments, whitespace, renaming)
Use @scope.runtime_trace() when:
You want to detect silent failures (code unchanged, output wrong)
You're debugging non-determinism
You need to verify model behavior across versions
You care about what the function produces, not how it's written
Use both when (see the sketch after this list):
Building production ML systems
Running long-term research experiments
Multiple people modifying the same codebase
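As a rough sketch of what "both" can look like on a single function — this reuses the scope and decorator signatures from the examples above, but the trace IDs, the decorator stacking, and their order here are assumptions, not documented behavior:

```python
import numpy as np

# Sketch (assumptions noted above): static fingerprint of the code,
# plus a runtime fingerprint of what the function returns.
@scope.trace(trace_id='eval_code')          # catches logic changes across refactors
@scope.runtime_trace(
    trace_id='eval_outputs',
    init_fn=lambda: np.random.seed(42),     # fix randomness before execution
    inspect_fn=lambda preds: preds[:100]    # hash only the first 100 predictions
)
@scope
def evaluate(model, data):
    return model.predict(data)
```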
SQL Tracker: Local Experiment Tracking
Lightweight experiment tracking using SQLite.
Zero Setup: Just a SQLite file, no servers
Full History: Track all runs, metrics, and artifacts
Smart Search: Find similar experiments by config structure
Code Versioning: Track code changes via fingerprints
```python
from ato.db_routers.sql.manager import SQLFinder

finder = SQLFinder(config)

# Get all runs in a project
runs = finder.get_runs_in_project('image_classification')
for run in runs:
    print(f"Run {run.id}: {run.config.model} - {run.status}")

# Find the best performing run
best_run = finder.find_best_run(
    project_name='image_classification',
    metric_key='val_accuracy',
    mode='max'  # or 'min' for loss
)
print(f"Best config: {best_run.config}")

# Find similar experiments (same config structure)
similar = finder.find_similar_runs(run_id=123)
print(f"Found {len(similar)} runs with similar config structure")

# Trace statistics (code fingerprints)
stats = finder.get_trace_statistics('image_classification', trace_id='model_forward')
print(f"Model forward pass has {stats['static_trace_versions']} versions")
```
| Feature | Description |
| --- | --- |
| Structural Hash | Auto-track config structure changes |
| Metric Logging | Time-series metrics with step tracking |
| Artifact Management | Track model checkpoints, plots, data files |
| Fingerprint Tracking | Version control for code (static & runtime) |
| Smart Search | Find similar configs, best runs, statistics |
Hyperparameter Optimization
Built-in Hyperband algorithm for efficient hyperparameter search with early stopping.
Hyperband uses successive halving (see the sketch after this list):
Start with many configs, train briefly
Keep top performers, discard poor ones
Train survivors longer
Repeat until one winner remains
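To make the schedule concrete, here is a back-of-the-envelope sketch in plain Python — not Ato's scheduler, which may round or stop slightly differently — using the same halving_rate=0.3 and num_min_samples=3 as the example below:

```python
# Illustration only: how many configs survive each successive-halving round.
def halving_schedule(num_configs, halving_rate=0.3, num_min_samples=3):
    rounds = [num_configs]
    while rounds[-1] > num_min_samples:
        rounds.append(max(1, int(rounds[-1] * halving_rate)))
    return rounds

print(halving_schedule(20))  # [20, 6, 1]: 20 brief runs, 6 longer runs, 1 winner
```

The Ato API for the same idea: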
```python
from ato.adict import ADict
from ato.hyperopt.hyperband import HyperBand
from ato.scope import Scope

scope = Scope()

# Define search space
search_spaces = ADict(
    lr=ADict(
        param_type='FLOAT',
        param_range=(1e-5, 1e-1),
        num_samples=20,
        space_type='LOG'  # Logarithmic spacing
    ),
    batch_size=ADict(
        param_type='INTEGER',
        param_range=(16, 128),
        num_samples=5,
        space_type='LOG'
    ),
    model=ADict(
        param_type='CATEGORY',
        categories=['resnet50', 'resnet101', 'efficientnet_b0']
    )
)

# Create Hyperband optimizer
hyperband = HyperBand(
    scope,
    search_spaces,
    halving_rate=0.3,    # Keep top 30% each round
    num_min_samples=3,   # Stop when <= 3 configs remain
    mode='max'           # Maximize metric (use 'min' for loss)
)

@hyperband.main
def train(config):
    # Your training code
    model = create_model(config.model)
    optimizer = Adam(lr=config.lr)

    # Use __num_halved__ for early stopping
    num_epochs = compute_epochs(config.__num_halved__)

    # Train and return metric
    val_acc = train_and_evaluate(model, optimizer, num_epochs)
    return val_acc

if __name__ == '__main__':
    # Run hyperparameter search
    best_result = train()
    print(f"Best config: {best_result.config}")
    print(f"Best metric: {best_result.metric}")
```
| Type | Description | Example |
| --- | --- | --- |
| FLOAT | Continuous values | Learning rate, dropout |
| INTEGER | Discrete integers | Batch size, num layers |
| CATEGORY | Categorical choices | Model type, optimizer |
Space types:
LOG: Logarithmic spacing (good for learning rates)
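For intuition, log spacing spreads samples evenly across orders of magnitude rather than linearly — a quick numpy illustration (not Ato's sampler) over the same (1e-5, 1e-1) range with 20 samples:

```python
import numpy as np

# Log spacing: dense coverage of small learning rates
log_lrs = np.logspace(np.log10(1e-5), np.log10(1e-1), num=20)
print(log_lrs[:3])  # ~[1.0e-05, 1.6e-05, 2.6e-05]

# Linear spacing over the same range spends almost every sample on "large" values
lin_lrs = np.linspace(1e-5, 1e-1, num=20)
print(lin_lrs[:3])  # ~[1.0e-05, 5.3e-03, 1.1e-02]
```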
Is Ato tested?
Yes. Ato has ~100 unit tests that pass on every release.
The Python codebase is ~10 files — small, readable, auditable.
What's the performance overhead?
Minimal:
Config fingerprinting: microseconds
Code fingerprinting: happens once at decoration time
Runtime fingerprinting: depends on inspect_fn complexity
SQLite logging: milliseconds per metric
Do I need to host anything?
Ato runs entirely locally. There's nothing to host.
If you need centralized tracking, use MLflow/W&B alongside Ato.
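For example, nothing stops you from logging to both in the same loop — a sketch that uses standard wandb calls plus the SQLLogger from the tracking example above; train_one_epoch and num_epochs are placeholders for your own code:

```python
import wandb
from ato.db_routers.sql.manager import SQLLogger

# Local-first record in SQLite, centralized dashboard in W&B.
logger = SQLLogger(config)
run_id = logger.run(tags=['baseline'])
wandb.init(project='image_classification', config=dict(config))  # assumes config is dict-like

for epoch in range(num_epochs):
    loss = train_one_epoch()                     # placeholder training step
    logger.log_metric('loss', loss, step=epoch)  # local SQLite record
    wandb.log({'loss': loss}, step=epoch)        # hosted dashboard

logger.finish(status='completed')
wandb.finish()
```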
Every release passes 100+ unit tests.
No unchecked code. No silent failure.
This isn't a feature. It's a commitment.
When you fingerprint experiments, you're trusting the fingerprints are correct.
When you merge configs, you're trusting the merge order is deterministic.
When you trace code, you're trusting the bytecode hashing is stable.
Ato has zero tolerance for regressions.
Tests cover every module — ADict, Scope, MultiScope, SQLTracker, HyperBand — and every edge case we've encountered in production use.
```
python -m pytest unit_tests/   # Run locally. Always passes.
```
If a test fails, the release doesn't ship. Period.
Codebase size: ~10 Python files
Small, readable, auditable. No magic, no metaprogramming.