If you find it useful, consider giving it a ⭐ on GitHub; it helps others discover the project!
A lightweight library to make PyTorch training memory visible in real time, in the CLI and in notebooks.
Training large machine learning models often feels like a black box. One minute everything's running and the next, you're staring at a cryptic "CUDA out of memory" error.
Pinpointing which part of the model is consuming too much memory or slowing things down is frustrating and time-consuming. Traditional profiling tools can be overly complex or lack the granularity deep learning developers need.
TraceML is designed to give you real-time, granular insights into memory usage without heavy overhead. It works both in the terminal (CLI) and inside Jupyter notebooks, so you can pick the workflow that fits you best:
✅ System + process-level usage (CPU, RAM, GPU)
✅ PyTorch layer-level memory allocation (via decorator/instance tracing)
✅ Live activation & gradient memory
No config, no setup, just plug-and-trace.
For developer mode, you first need to register your model with TraceML so it can capture memory usage. There are two simple ways:
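Option 1: decorate the model class. A minimal sketch; the `trace_model` decorator name and import path below are assumptions, so check the TraceML docs for the exact ones:

```python
import torch
import torch.nn as nn

from traceml import trace_model  # assumption: the real import path may differ


@trace_model  # registers every instance of this class with TraceML
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))
```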
✅ Any instance of TinyNet will now be automatically traced.
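Option 2: trace an already-built instance. Again a sketch; `trace_model_instance` is an assumed helper name:

```python
import torch.nn as nn

from traceml import trace_model_instance  # assumption: name/path may differ

# Handy for models assembled on the fly, e.g. with nn.Sequential
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model = trace_model_instance(model)  # register this instance for tracing
```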
✅ Best when you build models dynamically or don't want to decorate the class.
Then choose whichever run mode fits your workflow:

- Terminal (CLI): wrap your training script to see live dashboards in your terminal, e.g. something like `traceml run train.py` (the command shown is illustrative; check the project docs for exact usage).
- Notebook: run TraceML directly in Jupyter/Colab. See the full example notebook for a working demo.
Notebook output will refresh live per interval, similar to the terminal dashboard.
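Putting it together, a notebook cell might look roughly like this. Every TraceML name below is an assumption rather than the confirmed API; the example notebook shows the real calls:

```python
# Hypothetical notebook usage -- TraceML entry points here are assumptions.
import torch
import torch.nn as nn

from traceml import start_dashboard  # assumption: actual API may differ

start_dashboard(interval=1.0)  # live tables re-render once per interval

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(100):  # toy training loop to generate memory activity
    x = torch.randn(32, 128)
    loss = model(x).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```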
Under the hood, TraceML uses samplers that collect memory usage at intervals, rather than relying on layer-by-layer traces alone (see the sketch after this list):

- SystemSampler → CPU, RAM, GPU usage sampled at a fixed frequency.
- LayerMemorySampler → Parameter allocation (per module, not per parameter).
- ActivationMemorySampler → Tracks per-layer forward activations. Maintains current and global peak values, and estimates total activation memory for a forward pass.
- GradientMemorySampler → Tracks per-layer backward gradients. Maintains current and global peak values, and estimates total gradient memory during backpropagation.
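Conceptually, a sampler is just a loop that polls memory counters at a fixed frequency. Here is a simplified illustration of the idea using `psutil` and PyTorch's CUDA counters (not TraceML's actual implementation):

```python
import time

import psutil
import torch


def sample_system(interval: float = 1.0, steps: int = 10) -> None:
    """Poll process RAM and GPU memory at a fixed frequency (illustration only)."""
    process = psutil.Process()
    for _ in range(steps):
        ram_mb = process.memory_info().rss / 1e6
        gpu_mb = torch.cuda.memory_allocated() / 1e6 if torch.cuda.is_available() else 0.0
        print(f"RAM: {ram_mb:.1f} MB | GPU allocated: {gpu_mb:.1f} MB")
        time.sleep(interval)
```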
This means what you see in your terminal is a rolling snapshot of memory over time, giving you:

- Live per-layer breakdowns
- Current vs global peaks
- Running totals of activation + gradient memory
This design makes TraceML lightweight compared to full profilers — you get practical insights without slowing training to a crawl.
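For intuition, the kind of per-layer activation tracking that ActivationMemorySampler provides can be approximated with plain PyTorch forward hooks. A simplified sketch, not TraceML's code:

```python
import torch
import torch.nn as nn


def attach_activation_meter(model: nn.Module) -> dict:
    """Record current and peak activation bytes per layer via forward hooks."""
    stats: dict[str, tuple[int, int]] = {}

    def make_hook(name: str):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                nbytes = output.element_size() * output.nelement()
                peak = max(stats.get(name, (0, 0))[1], nbytes)
                stats[name] = (nbytes, peak)  # (current, global peak)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            module.register_forward_hook(make_hook(name))
    return stats


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
stats = attach_activation_meter(model)
model(torch.randn(32, 128))
for name, (current, peak) in stats.items():
    print(f"{name}: current={current} B, peak={peak} B")
```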
- Live CPU, RAM, and GPU usage (System + Current Process)
- PyTorch module-level memory tracking
- Live activation memory tracking (per layer, plus totals)
- Live gradient memory tracking (per layer, plus totals)
- Real-time terminal dashboards via Rich
- Notebook support
- Step & operation timers (forward, backward, optimizer)
- Export logs as JSON / CSV
- More visual dashboards
TraceML is early-stage and evolving quickly. Contributions, feedback, and ideas are welcome!

- Found it useful? Please ⭐ the repo to support development.
- Issues / feature requests → open a GitHub issue.
- Want to contribute? See CONTRIBUTING.md (coming soon).
📧 Contact: [email protected]
TraceML - Making PyTorch memory usage visible, one trace at a time.