Deep learning framework built from the ground up in pure Go


go-torch is an open-source deep learning framework built from the ground up in pure Go. It provides a modular, PyTorch-like API for building and training neural networks with a stable auto-differentiation engine.

mail - [email protected]

blog - https://abinesh-mathivanan.vercel.app/en/posts/post-5/

  • dynamic computation graph: tensors track their history, allowing automatic gradient calculation during the backward pass (a minimal autograd sketch follows this list).
  • extensible module system (nn.Layer, nn.Sequential): build complex model architectures with a flexible, Keras-like sequential API (see the Layer/Sequential sketch after this list).
  • layer and function library: includes Conv2D, Linear, MaxPooling2D, Flatten, ReLU, CrossEntropyLoss, and SGD.
  • real-time TUI dashboard: live graphs for batch-wise loss and epoch-wise validation accuracy, plus monitoring of memory usage (heap / total alloc), GC cycles, and active goroutines, along with a Keras-like model summary.
  • optimized performance: BLAS-backed linear algebra, goroutine parallelism, topological-sort autograd, and gradient accumulation.
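To make the dynamic-graph idea concrete, here is a minimal, self-contained sketch of reverse-mode autodiff on scalars. It illustrates the general technique only, not go-torch's internals; a real engine (like the topological autograd noted above) visits nodes in topological order so shared subgraphs are handled exactly once.

```go
package main

import "fmt"

// Value is one scalar node in a dynamically built graph.
// Each operation records a closure that knows how to push
// gradients back to its inputs — the "tensors track their
// history" idea in miniature.
type Value struct {
	Data     float64
	Grad     float64
	backward func()
}

// Mul creates a product node and records its backward rule.
func (a *Value) Mul(b *Value) *Value {
	out := &Value{Data: a.Data * b.Data}
	out.backward = func() {
		a.Grad += b.Data * out.Grad // d(a*b)/da = b
		b.Grad += a.Data * out.Grad // d(a*b)/db = a
		if a.backward != nil {
			a.backward()
		}
		if b.backward != nil {
			b.backward()
		}
	}
	return out
}

func main() {
	x := &Value{Data: 3}
	y := &Value{Data: 4}
	z := x.Mul(y) // the forward pass records the graph

	z.Grad = 1 // seed dz/dz = 1, then walk the history
	z.backward()
	fmt.Println(x.Grad, y.Grad) // prints: 4 3
}
```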
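Likewise, the Layer/Sequential pattern can be shown in a few lines of plain Go. The Layer, ReLU, Scale, and Sequential types below are an independent sketch of the pattern, not go-torch's actual nn API.

```go
package main

import "fmt"

// Layer is the minimal contract a sequential container needs:
// transform an input and return the output.
type Layer interface {
	Forward(x []float64) []float64
}

// ReLU zeroes negative activations element-wise.
type ReLU struct{}

func (ReLU) Forward(x []float64) []float64 {
	out := make([]float64, len(x))
	for i, v := range x {
		if v > 0 {
			out[i] = v
		}
	}
	return out
}

// Scale multiplies every element by a constant, standing in
// for a real parameterized layer such as Linear.
type Scale struct{ K float64 }

func (s Scale) Forward(x []float64) []float64 {
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = s.K * v
	}
	return out
}

// Sequential chains layers, feeding each output to the next.
type Sequential struct{ Layers []Layer }

func (s *Sequential) Forward(x []float64) []float64 {
	for _, l := range s.Layers {
		x = l.Forward(x)
	}
	return x
}

func main() {
	model := &Sequential{Layers: []Layer{Scale{K: 2}, ReLU{}}}
	fmt.Println(model.Forward([]float64{-1, 2, -3, 4})) // [0 4 0 8]
}
```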

TUI Dashboard

(screenshot of the TUI dashboard)


Roadmap

  • add support for RNN, LSTM, and Transformer architectures
  • implement Adam (with GaLore and LoRA techniques), RMSProp, and other optimizers (the standard Adam update rule is sketched after this list)
  • model.load() and model.save() without gob
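For context on the planned optimizers, this is a small self-contained sketch of the standard Adam update rule with bias correction; the adamStep helper and its signature are hypothetical, not go-torch API.

```go
package main

import (
	"fmt"
	"math"
)

// adamStep applies one standard Adam update in place.
// m and v hold running first/second moment estimates; t is
// the 1-based step count used for bias correction.
func adamStep(params, grads, m, v []float64, t int, lr, beta1, beta2, eps float64) {
	for i := range params {
		m[i] = beta1*m[i] + (1-beta1)*grads[i]
		v[i] = beta2*v[i] + (1-beta2)*grads[i]*grads[i]
		mHat := m[i] / (1 - math.Pow(beta1, float64(t)))
		vHat := v[i] / (1 - math.Pow(beta2, float64(t)))
		params[i] -= lr * mHat / (math.Sqrt(vHat) + eps)
	}
}

func main() {
	p := []float64{1.0}
	g := []float64{0.5}
	m := make([]float64, 1)
	v := make([]float64, 1)
	adamStep(p, g, m, v, 1, 0.001, 0.9, 0.999, 1e-8)
	fmt.Println(p[0]) // parameter nudged against the gradient
}
```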

Requirements

  • Go 1.18 or later.
  • a system-installed BLAS library is recommended for maximum performance, but not required.
  • some TODOs are noted inside the source files; the 'Better Comments' extension makes them easier to read.

git clone https://github.com/abinesh-mathivanan/go-torch.git
cd go-torch

run the MNIST training file to test out the features:

go run ./cnn_benchmark/go_bench.go

| Benchmark | 128x128 | 512x512 | 1024x1024 |
|---|---|---|---|
| Matrix Multiply | 510.33 µs | 13.54 ms | 130.50 ms |
| Element-wise Add | 71.72 µs | 1.29 ms | 4.13 ms |
| Element-wise Mul | 47.83 µs | 1.63 ms | 3.91 ms |
| ReLU Activation | 121.18 µs | 1.75 ms | 6.45 ms |

| Benchmark | Time |
|---|---|
| Linear Layer Forward (B32, I128, O10) | 71.93 µs |
| CrossEntropyLoss (B32, C10) | 11.16 µs |
| Full Fwd-Bwd (Net: 128-256-10, B32) | 4.02 ms |
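For a rough independent baseline on the matrix-multiply row, a plain timing harness like the one below works. It uses a naive triple-loop matmul with no BLAS, so expect it to run slower than the numbers above; it is a sketch, not the project's benchmark harness (./cnn_benchmark/go_bench.go is).

```go
package main

import (
	"fmt"
	"time"
)

// naive O(n^3) matmul on row-major slices, used purely as a
// timing baseline; a BLAS-backed path should be much faster.
func matmul(a, b []float64, n int) []float64 {
	c := make([]float64, n*n)
	for i := 0; i < n; i++ {
		for k := 0; k < n; k++ {
			aik := a[i*n+k]
			for j := 0; j < n; j++ {
				c[i*n+j] += aik * b[k*n+j]
			}
		}
	}
	return c
}

func main() {
	n := 128
	a := make([]float64, n*n)
	b := make([]float64, n*n)
	for i := range a {
		a[i], b[i] = 1, 2
	}
	start := time.Now()
	_ = matmul(a, b, n)
	fmt.Printf("%dx%d naive matmul: %v\n", n, n, time.Since(start))
}
```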

