Loom: Universal AI Runtime for Local, Cross-Platform Inference

Planetbridging

Running the same AI model on your phone, in your browser, on your server, and inside a game engine — without converting anything

A technical deep-dive for developers and a practical guide for everyone else on why local AI is about to get a lot more accessible

The Problem Everyone Has (But Nobody Talks About)

You’ve probably used ChatGPT. Maybe you’ve tried Claude or Gemini. They’re impressive, but they all share the same fundamental limitations:

  • They require internet (no WiFi on that flight? Too bad)
  • They cost money per use (every message = money to OpenAI/Google/Anthropic)
  • Your data goes to their servers (hope you trust them with your conversations)
  • They can change or disappear (remember when Twitter’s API got expensive overnight?)

For regular users, this means AI is something that happens “somewhere else” — in the cloud, controlled by big tech companies, requiring constant connectivity and trust.

For developers, it’s even worse. Try to build an app that uses AI, and you face:

  • Platform fragmentation: Different tools for web, mobile, desktop, embedded devices
  • Conversion hell: Convert your model to ONNX for this, TensorFlow Lite for that, CoreML for iOS
  • Inconsistent outputs: The same model produces slightly different results on different platforms
  • Dependency nightmares: Installing AI libraries means downloading gigabytes of software
  • Vendor lock-in: Build on OpenAI’s API? Good luck switching to something else later

What if there was a better way?

The Vision: AI That Works Everywhere, Identically

Imagine downloading an AI model once and running it:

  • On your Android phone (offline, in airplane mode)
  • In your web browser (no server needed)
  • On your laptop (Windows, Mac, or Linux)
  • Inside a video game (for intelligent NPCs)
  • On a Raspberry Pi (edge computing)
  • In a Python script, JavaScript app, or C# program

With identical behavior everywhere. Not “close enough.” Identical.

With zero format conversions. Just load and run.

With no cloud dependency. Everything local, private, under your control.

That’s LOOM. And it’s working right now.

What Is LOOM?

LOOM (Layered Omni-architecture Openfluke Machine) is a neural network framework written in Go that solves the fundamental problems of AI deployment.

For Non-Technical Readers:

Think of LOOM like a universal translator for AI models. Just as you can play an MP3 file on your phone, computer, or car stereo without converting it to different formats, LOOM lets the same AI model run on any device without modification.

Want to chat with an AI on your phone without sending your data to the cloud? LOOM does that.

Want a video game where NPCs can actually hold conversations? LOOM does that too.

Want to use AI in a medical app where patient data can’t leave the device? LOOM enables it.

For Technical Readers:

LOOM is a cross-platform ML runtime with several unique properties:

  • Native HuggingFace support: Loads safetensors directly, no conversion to ONNX/TFLite/GGUF
  • Cross-platform determinism: Same model produces identical outputs (MAE < 1e-8) across all platforms
  • Universal API: Same function calls in Python, JavaScript, C#, Go, and WASM
  • Zero Python dependency: Pure Go core + C-ABI bindings = single binary deployment
  • Game engine native: First framework with native integration into Godot (and extensible to Unity/Unreal)
  • Published packages: Already on PyPI, npm, and NuGet

Technical architecture:

HuggingFace Model (safetensors)
↓ loads directly into
LOOM Core (Go) - 10 layer types with full forward/backward
↓ compiles to
C-ABI library (.so/.dylib/.dll)
↓ bindings for
Python | JavaScript | C# | Go | WASM | Mobile

Supported layers: Dense, Conv2D, Multi-Head Attention (with GQA), RNN, LSTM, LayerNorm, RMSNorm, SwiGLU, Softmax (10 variants including native MoE), Residual.

Why This Matters: Three Real-World Stories

Story 1: The Indie Game Developer

The problem: Sarah is building a story-rich RPG. She wants NPCs that can hold actual conversations, remember what players said, and respond contextually. But:

  • Unity ML-Agents requires a Python server (can’t ship that to players)
  • Cloud APIs cost money per conversation (impossible for indie budgets)
  • Most “AI NPCs” are just decision trees pretending to be smart

The LOOM solution: Sarah loads SmolLM2–360M directly into her Godot game via C-ABI. Now her NPCs can:

  • Generate unique dialogue based on game context
  • Remember past conversations (stored locally)
  • Work offline (no internet required)
  • Cost nothing per interaction (one-time model download)

Result: The first indie game with truly conversational NPCs, running entirely on players’ devices.

See it working: Godot + LOOM Demo

Story 2: The Healthcare Startup

The problem: Dr. Martinez’s team is building a medical note-taking assistant. But healthcare regulations are strict:

  • Patient data CANNOT go to cloud APIs (HIPAA violations = massive fines)
  • AI outputs must be reproducible for audits (regulatory requirement)
  • Must work in hospitals with restricted internet (many are air-gapped)

The LOOM solution: Deploy LOOM to doctors’ tablets and desktops. Same model, identical outputs on both platforms, everything stays local.

Technical win: MAE < 1e-8 across platforms means audit trails are provable. “This output came from this model with this input” is verifiable.

Result: AI-powered medical tools that comply with regulations and protect patient privacy.

Story 3: The Mobile App That Works Offline

The problem: You’re traveling internationally. Your flight has no WiFi. You want to:

  • Translate text
  • Get writing suggestions
  • Chat with an AI assistant
  • Generate ideas

But every AI app says “No internet connection.”

The LOOM solution: An Android app running SmolLM2–135M locally. Full conversational AI in your pocket. No internet required. No data leaving your device. No subscription fees.

See it working: Android Demo

The Technical Deep Dive: How Does It Actually Work?

Architecture Overview

LOOM is built on three core principles:

1. Universal Model Format

Instead of requiring conversion to platform-specific formats (ONNX, TFLite, CoreML, GGUF), LOOM loads HuggingFace’s native safetensors format directly.

# No conversion needed - just load
import welvet
welvet.Transformer.load_model("HuggingFaceTB/SmolLM2-360M-Instruct")
welvet.Transformer.generate("Once upon a time")

Why this matters: Every conversion introduces:

  • Potential bugs (ops that don’t translate cleanly)
  • Maintenance overhead (re-convert after every update)
  • Accuracy drift (subtle numerical differences)

LOOM eliminates all of this.

2. Cross-Platform Determinism

Most ML frameworks are “mostly reproducible” — outputs vary slightly between platforms due to:

  • Different math libraries (Intel MKL vs OpenBLAS vs Apple Accelerate)
  • GPU vendor differences (NVIDIA cuDNN vs AMD ROCm)
  • Floating point implementation details
  • Compiler optimizations

LOOM achieves effectively bit-exact determinism (MAE < 1e-8) by:

  • Pure Go implementation with explicit math operations
  • Fixed-precision arithmetic where needed
  • Deterministic layer implementations
  • Comprehensive cross-platform testing

Validation example:

Platform | Output Hash | MAE vs Reference
--------------|--------------|------------------
Go (native) | a4f8e2c1... | 0.00000000
Python | a4f8e2c1... | 0.00000000
JavaScript | a4f8e2c1... | 0.00000001
C# | a4f8e2c1... | 0.00000000
WASM | a4f8e2c1... | 0.00000002
Android | a4f8e2c1... | 0.00000001
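
To make this concrete, a check of this kind can be reproduced with nothing but the Go standard library. The sketch below is illustrative (the helper names are not part of the LOOM API): it fingerprints one platform's raw logits for the hash column and computes the MAE against a reference run.

package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
    "math"
)

// hashLogits produces the kind of fingerprint shown in the table above.
func hashLogits(logits []float32) string {
    buf := make([]byte, 4*len(logits))
    for i, v := range logits {
        binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(v))
    }
    sum := sha256.Sum256(buf)
    return fmt.Sprintf("%x", sum[:4]) // short prefix, e.g. "a4f8e2c1"
}

// meanAbsError compares a platform's output against the reference run.
func meanAbsError(ref, got []float32) float64 {
    var total float64
    for i := range ref {
        total += math.Abs(float64(ref[i]) - float64(got[i]))
    }
    return total / float64(len(ref))
}

func main() {
    ref := []float32{0.12, -1.53, 2.07} // reference platform logits (illustrative)
    got := []float32{0.12, -1.53, 2.07} // same prompt, different platform
    fmt.Println("hash:", hashLogits(got))
    fmt.Printf("MAE vs reference: %.8f\n", meanAbsError(ref, got))
}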

3. Zero-Dependency Deployment

Traditional ML stacks require:

  • Python runtime (300MB+)
  • NumPy, SciPy (100MB+)
  • PyTorch or TensorFlow (2GB+)
  • CUDA libraries for GPU (several GB)

LOOM requires:

  • Single compiled binary (~10MB)
  • Model file (varies by model size)

That’s it. No pip install, no virtual environments, no dependency conflicts.

Deployment comparison:

Stack        | Binary Size  | Dependencies        | Platforms
-------------|--------------|---------------------|-------------------------------
PyTorch      | N/A (Python) | Python + 2GB libs   | Linux, Windows (x86)
TensorFlow   | N/A (Python) | Python + 500MB libs | Linux, Windows, limited mobile
ONNX Runtime | 50–100MB     | C++ runtime         | Most platforms (with effort)
LOOM         | 10MB         | Zero                | All platforms, identical API

Supported Models and Performance

Models That Work Right Now

LOOM supports any transformer using the Llama architecture, including:

Text Generation:

  • ✅ Qwen2.5 (0.5B — 7B parameters)
  • ✅ SmolLM2 (135M — 1.7B parameters)
  • ✅ TinyLlama (1.1B parameters)
  • ✅ Mistral (7B parameters)
  • ✅ Llama 2/3 (with appropriate quantization)

Just-released models work immediately: if a model is on HuggingFace with a Llama-style architecture, LOOM loads it.

Performance Characteristics

Current state (v0.0.3):

  • CPU-only implementation
  • ~0.5–3 tokens/second on small models (SmolLM2–360M)
  • Deterministic across all platforms
  • Full forward + backward passes for training

This is deliberately “correctness-first”:

  • No platform-specific optimizations that break determinism
  • No CUDA randomness
  • No vendor-specific math libraries
  • Reproducible outputs for compliance/auditing

GPU acceleration (in progress):

  • WebGPU compute shaders for Dense, Conv2D, Attention layers
  • Expected 10–50x speedup on supported operations
  • Maintaining determinism is the priority

Memory and Storage

Model          | Parameters | Disk Size | RAM Required | Typical Use Case
---------------|------------|-----------|--------------|-----------------------------------
SmolLM2–135M   | 135M       | ~270MB    | ~500MB       | Mobile, embedded, quick responses
SmolLM2–360M   | 360M       | ~720MB    | ~1GB         | Balanced quality/size for mobile
Qwen2.5–0.5B   | 500M       | ~1GB      | ~1.5GB       | Desktop, high-quality generation
TinyLlama-1.1B | 1.1B       | ~2.2GB    | ~3GB         | Desktop, specialized tasks

Installation and Quick Start

For Python Developers

pip install welvet

from welvet import Transformer

# Load any HuggingFace model (downloads automatically if needed)
Transformer.load_tokenizer("HuggingFaceTB/SmolLM2-360M-Instruct")
Transformer.load_model("HuggingFaceTB/SmolLM2-360M-Instruct")

# Generate text
result = Transformer.generate("The meaning of life is", max_tokens=50)
print(result.text)

# Streaming generation (token-by-token)
for token in Transformer.generate_stream("Once upon a time", max_tokens=100):
    print(token, end='', flush=True)

For JavaScript/TypeScript Developers

npm install @openfluke/welvet

import { initLoom } from '@openfluke/welvet';

async function main() {
  // Initialize WASM module
  const loom = await initLoom();

  // Load model (works in Node.js and browsers)
  await loom.LoadTransformer("HuggingFaceTB/SmolLM2-360M-Instruct");

  // Generate text
  const result = loom.Generate("The future of AI is", 50);
  console.log(result);
}

Browser-native inference — no server needed:

<script type="module">
import { initLoom } from 'https://cdn.jsdelivr.net/npm/@openfluke/welvet/+esm';
// Now running a transformer entirely in the browser
</script>

For C# / .NET Developers

dotnet add package Welvet

using Welvet;

// Load model
Transformer.LoadTokenizer("HuggingFaceTB/SmolLM2-360M-Instruct");
Transformer.LoadModel("HuggingFaceTB/SmolLM2-360M-Instruct");

// Generate text
var result = Transformer.Generate("In a galaxy far away", maxTokens: 50);
Console.WriteLine(result.Text);

// Streaming (Unity/Godot game integration)
foreach (var token in Transformer.GenerateStream("Player: Hello\nNPC:", 100))
{
    Console.Write(token); // Display in real-time
}

For Go Developers

go get github.com/openfluke/loom/nn

package main

import (
    "fmt"

    "github.com/openfluke/loom/nn"
    "github.com/openfluke/loom/tokenizer"
)

func main() {
    // Load tokenizer and model
    tk, _ := tokenizer.LoadFromFile("models/SmolLM2-360M/tokenizer.json")
    network, _ := nn.LoadTransformerFromSafetensors("models/SmolLM2-360M")

    // Generate text
    prompt := "The key to success is"
    tokens := tk.Encode(prompt, false)

    // Run inference
    // (Full example in repo)
    fmt.Println("prompt token IDs:", tokens)
    _ = network
}

The Game Engine Integration: A First

Why Game Engines Matter

Video games are one of the most demanding AI environments:

  • Real-time performance (30–60 FPS, no stuttering)
  • Local execution (no internet dependency for single-player)
  • Platform variety (PC, consoles, mobile)
  • Resource constraints (memory, CPU budget)

Traditional ML frameworks don’t work for games because:

  • ❌ Unity ML-Agents requires Python server (can’t ship to players)
  • ❌ TensorFlow has massive runtime (too big for games)
  • ❌ Cloud APIs introduce latency (breaks immersion)
  • ❌ Platform fragmentation (different tools for PC vs console)

LOOM + Godot: Native AI in Games

LOOM integrates with Godot Engine via C-ABI, enabling:

Local LLM-powered NPCs:

# GDScript (Godot's scripting language)
var ai = load("res://loom.gdnlib")

func _ready():
    ai.load_model("SmolLM2-360M-Instruct")

func talk_to_npc(player_message):
    var context = "You are a wise wizard. Player says: " + player_message
    var response = ai.generate(context, 100)
    display_dialogue(response)

What this enables:

  • NPCs that remember past conversations (stored in game save)
  • Dynamic quest generation based on player history
  • Procedural dialogue that fits game context
  • All running offline, no API costs, works on any platform

First working demo: Watch LOOM running in Godot

Mobile demo: SmolLM2 on Android in a game engine

Extending to Other Engines

The C-ABI means LOOM can integrate with:

  • Unity (via P/Invoke, like the C# package)
  • Unreal (via C++ FFI)
  • Custom engines (any engine with C interop)

Unity example:

using UnityEngine;
using Welvet;

public class NPCController : MonoBehaviour {
    void Start() {
        Transformer.LoadModel("SmolLM2-360M-Instruct");
    }

    public string GetNPCResponse(string playerInput) {
        return Transformer.Generate("NPC: " + playerInput, 50).Text;
    }
}

The Privacy Angle: AI Without Surveillance

Why Local AI Matters

Cloud AI has a fundamental problem: your data becomes their data.

When you use ChatGPT, Claude, or Gemini:

  • Your conversations are stored on their servers
  • They can analyze your usage patterns
  • They may use your data for training (depending on terms)
  • Government subpoenas can access your history
  • Service outages mean no access
  • Price changes affect your costs
  • Content policies can restrict what you can ask

LOOM flips this:

  • All processing happens on your device
  • No data sent to external servers
  • No usage tracking or analytics
  • Works in airplane mode
  • Can’t be censored or shut down
  • Zero ongoing costs after model download

Use Cases That Require Privacy

Healthcare:

  • Medical note-taking assistants
  • Patient symptom analyzers
  • Clinical decision support
  • All must comply with HIPAA (no cloud allowed)

Legal:

  • Contract analysis
  • Case research
  • Client communication drafting
  • Attorney-client privilege (can’t use cloud)

Financial:

  • Personal finance advisors
  • Trading strategy analysis
  • Sensitive document review
  • Regulatory compliance (data sovereignty)

Personal:

  • Journaling with AI feedback
  • Therapy/mental health chatbots
  • Creative writing assistance
  • Private brainstorming

LOOM enables all of these without compromise.

Platform Coverage: Where LOOM Runs

Current Platform Support

Platform              | Status        | Installation   | Use Case
----------------------|---------------|----------------|----------------------------
Linux (x86-64)        | ✅ Production | pip/npm/go get | Servers, development
Linux (ARM64)         | ✅ Production | Same           | Raspberry Pi, edge devices
macOS (Intel)         | ✅ Production | Same           | Development, desktop apps
macOS (Apple Silicon) | ✅ Production | Same           | M1/M2/M3 Macs
Windows (x86-64)      | ✅ Production | Same           | Desktop apps, games
Browser (WASM)        | ✅ Production | npm            | Web apps, no server needed
Android (ARM64)       | ✅ Production | C-ABI          | Mobile apps, games
iOS (ARM64)           | 🔄 Testing    | C-ABI          | Mobile apps, games

Same code. Same model. Eight platforms.

Cross-Compilation Example

Build for all platforms from a single machine:

# Linux AMD64
GOOS=linux GOARCH=amd64 go build
# Linux ARM64 (Raspberry Pi)
GOOS=linux GOARCH=arm64 go build
# macOS (Intel and Apple Silicon)
GOOS=darwin GOARCH=amd64 go build
GOOS=darwin GOARCH=arm64 go build
# Windows
GOOS=windows GOARCH=amd64 go build
# WASM (for browsers)
GOOS=js GOARCH=wasm go build
# Android
CGO_ENABLED=1 GOOS=android GOARCH=arm64 go build -buildmode=c-shared

One codebase. Eight binaries. All with identical behavior.

Use Cases: What People Are Building

1. Offline-First Mobile Apps

Problem: Most AI apps require constant internet connection.

LOOM Solution: Fully functional AI apps that work in airplane mode.

Examples:

  • Language learning apps with conversational practice
  • Writing assistants that work offline
  • Personal journaling with AI feedback
  • Travel companions that work internationally without data

Technical approach:

  • Bundle model with app, or download once (see the sketch after this list)
  • All inference runs on device
  • Offline-first database (SQLite) for conversation history
  • Sync to cloud optional, not required
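
As promised above, here is one way the "bundle model with app" step can look in Go, using go:embed together with the LoadTransformerFromBytes entry point shown later in this article. The asset paths are placeholders and error handling is minimal; treat it as a sketch, not the canonical pattern.

package main

import (
    _ "embed"
    "log"

    "github.com/openfluke/loom/nn"
)

// Embed the HuggingFace files into the binary so the app needs no network.
// Paths are illustrative; point them at the files shipped with your model.
//
//go:embed assets/config.json
var configBytes []byte

//go:embed assets/model.safetensors
var weightsBytes []byte

func main() {
    // LoadTransformerFromBytes is shown in the model-loading section below.
    network, err := nn.LoadTransformerFromBytes(configBytes, weightsBytes)
    if err != nil {
        log.Fatal(err)
    }
    _ = network // hand off to the app's inference loop
}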

2. Privacy-Preserving Healthcare Tools

Problem: Patient data cannot go to cloud APIs (HIPAA).

LOOM Solution: Medical AI that runs entirely on hospital hardware.

Examples:

  • Clinical note-taking from voice dictation
  • Drug interaction checkers
  • Diagnostic suggestion tools
  • Medical education simulations

Technical approach:

  • Deploy LOOM to hospital desktops/tablets
  • Air-gapped network (no internet connection)
  • Deterministic outputs for audit trails
  • Same model across all devices for consistency

3. Intelligent Game NPCs

Problem: Game NPCs are scripted and repetitive.

LOOM Solution: NPCs with actual conversational AI, running locally.

Examples:

  • Story-rich RPGs with dynamic dialogue
  • Strategy games with adaptive opponents
  • Educational games with AI tutors
  • Simulation games with intelligent agents

Technical approach:

  • Integrate LOOM via C-ABI into game engine
  • Load appropriate model size for target hardware
  • Combine LLM output with game logic
  • Store conversation history in save files

4. Edge Computing and IoT

Problem: Edge devices can’t depend on cloud latency.

LOOM Solution: Run AI directly on edge hardware.

Examples:

  • Smart home assistants (local voice control)
  • Industrial monitoring systems
  • Retail kiosks (works without internet)
  • Agricultural sensors with AI analysis

Technical approach:

  • Compile for ARM64 (Raspberry Pi, etc.)
  • Use quantized models for smaller footprint
  • On-device inference with sub-second latency
  • Optional cloud sync for aggregated insights

5. AI for Regulated Industries

Problem: Compliance requires deterministic, auditable AI.

LOOM Solution: Bit-exact outputs enable regulatory compliance.

Examples:

  • Financial services (audit trails)
  • Legal tech (reproducible analysis)
  • Government systems (data sovereignty)
  • Scientific research (reproducibility)

Technical approach:

  • Deploy identical model across all machines
  • MAE < 1e-8 ensures outputs match exactly
  • Hash inputs/outputs for audit logs (see the sketch after this list)
  • No cloud dependency = no data leakage
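
The audit-log item above can be as small as the sketch below. The record type and field names are illustrative, not part of LOOM; the point is that with deterministic inference, re-running the same prompt against the same model must reproduce the same output hash, which is what makes the trail verifiable.

package main

import (
    "crypto/sha256"
    "encoding/json"
    "fmt"
    "time"
)

// AuditRecord is an illustrative log entry, not a LOOM type.
type AuditRecord struct {
    Timestamp  time.Time `json:"timestamp"`
    ModelID    string    `json:"model_id"`
    PromptHash string    `json:"prompt_hash"`
    OutputHash string    `json:"output_hash"`
}

func hashText(s string) string {
    sum := sha256.Sum256([]byte(s))
    return fmt.Sprintf("%x", sum)
}

func main() {
    prompt := "Summarize the intake note for patient 0421."
    output := "..." // whatever the local model generated for this prompt

    rec := AuditRecord{
        Timestamp:  time.Now().UTC(),
        ModelID:    "SmolLM2-360M-Instruct",
        PromptHash: hashText(prompt),
        OutputHash: hashText(output),
    }
    line, _ := json.Marshal(rec)
    fmt.Println(string(line)) // append to an append-only audit log
}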

The Technology Stack: How LOOM Works Under the Hood

Core Architecture

Language: Pure Go (with C-ABI for foreign function interface)

Why Go:

  • Compiles to native binaries (fast execution)
  • Cross-compilation built-in (one command for any platform)
  • No runtime dependency (unlike Python)
  • Memory safe (unlike C/C++)
  • Excellent concurrency support (goroutines)
  • Growing ecosystem

Layer Implementation

LOOM implements 10 layer types, all with full CPU forward and backward passes:

1. Dense (Fully-Connected)

  • Matrix multiplication with activation
  • Supports: ReLU, Sigmoid, Tanh, Softplus, LeakyReLU, Linear
  • Used in: MLPs, feedforward networks, output layers

2. Conv2D (2D Convolution)

  • Standard convolution with stride, padding, dilation
  • Multiple filters, channels, activations
  • Used in: Image processing, spatial feature extraction

3. Multi-Head Attention

  • Transformer-style scaled dot-product attention
  • Supports Grouped Query Attention (GQA) for efficiency
  • Q/K/V projections with output projection
  • Used in: Transformers, sequence modeling

4. RNN (Recurrent Neural Network)

  • Simple recurrent layer with hidden state
  • Backpropagation Through Time (BPTT)
  • Used in: Sequence modeling, time series

5. LSTM (Long Short-Term Memory)

  • Forget, input, output, cell gates
  • Addresses vanishing gradient problem
  • Used in: Long sequences, temporal dependencies

6. LayerNorm (Layer Normalization)

  • Normalizes across feature dimension
  • Learned scale (gamma) and shift (beta)
  • Used in: Transformers, stabilizing training

7. RMSNorm (Root Mean Square Normalization)

  • Llama-style normalization (no beta parameter)
  • More efficient than LayerNorm
  • Used in: Modern transformers (Llama, Qwen, Mistral)

8. SwiGLU (Swish-Gated Linear Unit)

  • Gated activation: down(silu(gate(x)) * up(x)) (a minimal sketch appears after this layer list)
  • Used in Llama/Qwen FFN layers
  • Better than standard ReLU for transformers

9. Softmax (10 variants)

  • Standard, Grid, Hierarchical, Temperature, Gumbel
  • Masked, Sparsemax, Entmax, Adaptive, Mixture
  • Grid Softmax = Native MoE (mathematically proven)

10. Residual (Skip Connections)

  • Adds input to output (residual connections)
  • Essential for deep networks
  • Used in: ResNets, Transformers
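
To make item 8 concrete, here is a minimal, dependency-free SwiGLU in Go that follows the down(silu(gate(x)) * up(x)) form given above. The weight matrices are plain slices for illustration only; LOOM's internal tensor layout will differ.

package main

import (
    "fmt"
    "math"
)

// silu is the SiLU/Swish activation: x * sigmoid(x).
func silu(x float64) float64 {
    return x / (1 + math.Exp(-x))
}

// matVec multiplies a weight matrix (rows x cols) by a vector of length cols.
func matVec(w [][]float64, x []float64) []float64 {
    out := make([]float64, len(w))
    for i, row := range w {
        for j, v := range row {
            out[i] += v * x[j]
        }
    }
    return out
}

// swiGLU computes down(silu(gate(x)) * up(x)), the Llama/Qwen FFN form.
func swiGLU(gate, up, down [][]float64, x []float64) []float64 {
    g := matVec(gate, x)
    u := matVec(up, x)
    h := make([]float64, len(g))
    for i := range g {
        h[i] = silu(g[i]) * u[i]
    }
    return matVec(down, h)
}

func main() {
    x := []float64{1, -2}                        // input vector
    gate := [][]float64{{0.5, -0.1}, {0.2, 0.3}} // hidden x input
    up := [][]float64{{1, 0}, {0, 1}}            // hidden x input
    down := [][]float64{{0.7, -0.4}}             // output x hidden
    fmt.Println(swiGLU(gate, up, down, x))
}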

The Tokenizer: Pure Go BPE

Most frameworks depend on HuggingFace’s transformers library (Python) for tokenization.

LOOM implements BPE (Byte Pair Encoding) from scratch in Go:

  • Loads tokenizer.json directly
  • Supports multiple encoding schemes (GPT-2, Llama, Qwen, T5)
  • No Python dependency
  • Identical behavior to HuggingFace tokenizers

Why this matters:

  • Desktop apps don’t need Python installed
  • Mobile apps can tokenize on-device
  • WASM runs without external libraries
  • Deterministic across platforms
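
A minimal sketch of on-device tokenization, reusing the tokenizer.LoadFromFile and Encode calls from the Go quick-start earlier in this article (the file path is illustrative):

package main

import (
    "fmt"
    "log"

    "github.com/openfluke/loom/tokenizer"
)

func main() {
    // tokenizer.json is the same file HuggingFace ships alongside the model.
    tk, err := tokenizer.LoadFromFile("models/SmolLM2-360M/tokenizer.json")
    if err != nil {
        log.Fatal(err)
    }

    // Encode text into token IDs entirely in Go - no Python involved.
    tokens := tk.Encode("Hello from a pure Go BPE tokenizer", false)
    fmt.Println("token IDs:", tokens)
}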

Model Loading: Direct Safetensors Support

Safetensors format:

  • Developed by HuggingFace as a safer alternative to pickle
  • Binary format with header + tensors (inspected in the sketch below)
  • No arbitrary code execution (unlike pickle)
  • Memory-mapped for efficiency
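
The format is simple enough to inspect by hand. This standalone sketch (not LOOM code) reads the 8-byte little-endian header length, decodes the JSON header, and prints the tensor names; the model path is illustrative.

package main

import (
    "encoding/binary"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    f, err := os.Open("models/SmolLM2-360M/model.safetensors")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // The file starts with an 8-byte little-endian length of the JSON header.
    var headerLen uint64
    if err := binary.Read(f, binary.LittleEndian, &headerLen); err != nil {
        log.Fatal(err)
    }

    // The JSON header maps tensor names to dtype, shape, and byte offsets.
    headerBytes := make([]byte, headerLen)
    if _, err := io.ReadFull(f, headerBytes); err != nil {
        log.Fatal(err)
    }

    var header map[string]json.RawMessage
    if err := json.Unmarshal(headerBytes, &header); err != nil {
        log.Fatal(err)
    }
    for name := range header {
        fmt.Println(name) // tensor names plus the optional "__metadata__" entry
    }
}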

LOOM’s implementation:

// Load model from a HuggingFace model directory
network, err := nn.LoadTransformerFromSafetensors("models/SmolLM2-360M")

// Or from bytes (for embedding the model inside an app)
configBytes, _ := os.ReadFile("config.json")
weightsBytes, _ := os.ReadFile("model.safetensors")
network, err = nn.LoadTransformerFromBytes(configBytes, weightsBytes)

No conversion needed. If it’s on HuggingFace, LOOM loads it.

C-ABI: The Universal Interface

The C Application Binary Interface (C-ABI) is how LOOM talks to other languages:

// C header (auto-generated)
typedef void* NetworkHandle;
NetworkHandle LoadModel(const char* modelPath);
char* Generate(NetworkHandle handle, const char* prompt, int maxTokens);
void FreeNetwork(NetworkHandle handle);

Bindings for each language:

  • Python: Uses ctypes to call C functions
  • JavaScript: WASM exports C functions automatically
  • C#: Uses DllImport (P/Invoke) to call native library
  • Go: Native (no FFI needed)

Result: Same underlying C library, different language-specific wrappers.
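
For reference, this is roughly how such an export is produced on the Go side with cgo's //export directive and -buildmode=c-shared. The sketch mirrors the header above but is illustrative; it is not LOOM's actual export file, and runInference is a stand-in for the real forward pass.

package main

import "C"

import "unsafe"

// Generate mirrors the C prototype above:
//
//	char* Generate(NetworkHandle handle, const char* prompt, int maxTokens);
//
//export Generate
func Generate(handle unsafe.Pointer, prompt *C.char, maxTokens C.int) *C.char {
    text := runInference(C.GoString(prompt), int(maxTokens)) // call into the Go core
    return C.CString(text)                                   // caller frees via the C ABI
}

// runInference stands in for LOOM's real tokenize-and-generate loop.
func runInference(prompt string, maxTokens int) string {
    return prompt // placeholder
}

// main is required for -buildmode=c-shared; it is never called by the host.
func main() {}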

Comparison to Existing Solutions

LOOM vs PyTorch/TensorFlow

PyTorch/TensorFlow:

  • ✅ Extensive layer types (100+)
  • ✅ Mature ecosystem
  • ✅ Fast GPU training
  • ❌ Python dependency (2GB+ stack)
  • ❌ Platform-specific deployment
  • ❌ Non-deterministic (CUDA randomness)
  • ❌ No mobile deployment story

LOOM:

  • ⚠️ Fewer layers (10 types)
  • ⚠️ Smaller ecosystem (new framework)
  • ⚠️ CPU-only currently (GPU in progress)
  • ✅ Zero dependencies (10MB binary)
  • ✅ Universal deployment (8 platforms)
  • ✅ Deterministic (MAE < 1e-8)
  • ✅ Native mobile/game support

Use PyTorch/TF for: Research, training large models, maximum flexibility

Use LOOM for: Deployment, cross-platform apps, compliance, privacy, games

LOOM vs ONNX Runtime

ONNX Runtime:

  • ✅ Cross-platform (with effort)
  • ✅ Optimized inference
  • ✅ Industry adoption
  • ❌ Requires model conversion
  • ❌ Not all ops supported
  • ❌ Large binary (50–100MB)
  • ❌ Complex integration

LOOM:

  • ✅ No conversion needed
  • ✅ Simple integration
  • ✅ Small binary (10MB)
  • ⚠️ Fewer optimizations (yet)
  • ⚠️ Newer, less proven

Use ONNX for: Maximum compatibility with existing models

Use LOOM for: HuggingFace-first workflow, determinism, simplicity

LOOM vs llama.cpp

llama.cpp:

  • ✅ Excellent performance
  • ✅ Mature, well-tested
  • ✅ Broad model support
  • ❌ Requires GGUF conversion
  • ❌ C++ (harder to extend)
  • ❌ Limited language bindings

LOOM:

  • ✅ No conversion (safetensors native)
  • ✅ Easy to extend (Go)
  • ✅ Multiple language APIs
  • ⚠️ Slower (for now)
  • ⚠️ Less optimized

Use llama.cpp for: Maximum inference speed on CPUs

Use LOOM for: No-conversion workflow, game engines, cross-language apps

LOOM vs Mobile ML SDKs (TFLite, CoreML)

TensorFlow Lite / CoreML:

  • ✅ Optimized for mobile
  • ✅ OS-level integration
  • ❌ Requires conversion
  • ❌ Platform-specific (TFLite=Android, CoreML=iOS)
  • ❌ Limited ops
  • ❌ Different APIs per platform

LOOM:

  • ✅ No conversion
  • ✅ Same API on both platforms
  • ✅ Full transformer support
  • ⚠️ Newer, less optimized
  • ⚠️ CPU-only (mobile GPU coming)

Use TFLite/CoreML for: Maximum mobile performance on specific platforms

Use LOOM for: Cross-platform apps, same code iOS+Android, game engines

Roadmap: What’s Next

Short Term (3 months)

GPU Acceleration:

  • Complete WebGPU implementation for all layers
  • Target: 10–50x speedup on supported operations
  • Maintain determinism (no CUDA randomness)

Additional Layers:

  • Conv1D (audio/signal processing)
  • MaxPool2D / AvgPool2D (downsampling)
  • Embedding (categorical features)
  • Dropout (regularization)

iOS Production:

  • Complete testing on iOS devices
  • App Store submission guide
  • Example iOS + LOOM app

Documentation:

  • Complete API reference
  • Tutorials for each platform
  • Best practices guide
  • Architecture deep-dive

Medium Term (6 months)

Quantization:

  • INT8 quantization for smaller models
  • FP16 for mixed-precision inference
  • Dynamic quantization at runtime

Model Zoo:

  • Pre-optimized models for different targets
  • Benchmark suite across platforms
  • Model selection guide

Advanced Features:

  • LoRA adapter support
  • Model fine-tuning on-device
  • Streaming from compressed formats

Ecosystem:

  • Unity plugin (official)
  • Unreal Engine integration
  • VS Code extension
  • Model conversion tools

Long Term (12+ months)

Research:

  • Academic paper on cross-platform determinism
  • Benchmark against other frameworks
  • Novel optimization techniques

Community:

  • Model hub integration
  • Community-contributed layers
  • Plugin system for extensions
  • Conference talks / workshops

Enterprise:

  • Commercial support offerings
  • Compliance documentation (FDA, SOC2)
  • Enterprise deployment guides
  • Professional services

Getting Involved

For Users

Try LOOM:

Share your projects:

  • Built something with LOOM? Tag @openfluke on social media
  • Write about your experience
  • Request features you need

For Developers

Contribute:

  • Code: GitHub
  • Documentation: Help improve guides
  • Examples: Share your integration patterns
  • Testing: Validate on different platforms

Join discussions:

  • GitHub Discussions
  • Discord (coming soon)
  • Twitter: @openfluke

For Companies

Interested in:

  • Enterprise support?
  • Custom features?
  • Integration assistance?
  • Compliance documentation?

Contact: Open an issue on GitHub or reach out via social media.

Frequently Asked Questions

Technical Questions

Q: Can LOOM train models, or just run inference?

A: LOOM has full forward and backward passes for all layers, so training is possible. However, it’s optimized for inference. For training large models, use PyTorch/JAX, then export to LOOM for deployment.

Q: Why is LOOM slower than PyTorch on GPU?

A: Current version is CPU-only with focus on correctness and determinism. GPU acceleration is in progress. When complete, it will be competitive for inference while maintaining determinism.

Q: What about models that aren’t Llama-architecture?

A: Currently, LOOM focuses on transformer architectures used by most modern LLMs. Future versions will expand to other architectures based on community needs.

Q: Can I use LOOM with my existing PyTorch models?

A: If your model is available on HuggingFace or uses Llama-style architecture, yes. For custom models, you’d need to export weights to the LOOM format (guide coming).

Q: How does determinism work across different CPU architectures?

A: LOOM uses explicit floating-point operations and avoids architecture-specific optimizations that introduce non-determinism. This is validated through extensive cross-platform testing.

Deployment Questions

Q: What’s the memory footprint in production?

A: Depends on model size:

  • SmolLM2–135M: ~500MB RAM
  • SmolLM2–360M: ~1GB RAM
  • Qwen2.5–0.5B: ~1.5GB RAM
  • TinyLlama-1.1B: ~3GB RAM

Binary itself is ~10MB.

Q: Can I bundle models with my app?

A: Yes. Models can be embedded in the app bundle (mobile) or downloaded on first run. Example code in the documentation.

Q: Does LOOM work offline after initial model download?

A: Yes, completely. Once the model is downloaded, no internet connection is needed ever again.

Q: What about updates to models?

A: Download new model version, swap out the file, restart app. LOOM’s determinism means you can validate the new model behaves identically across all deployments before rolling out.

Privacy and Security Questions

Q: Does LOOM send any telemetry?

A: No. Zero telemetry, analytics, or data collection. Everything runs locally.

Q: Can I audit what LOOM is doing?

A: Yes. LOOM is open source. Audit the code, compile it yourself, verify the binaries match.

Q: Is LOOM HIPAA/GDPR compliant?

A: LOOM itself is just a library (no data collection), so it doesn’t have compliance obligations. However, its local-only processing makes it suitable for building compliant applications. Consult your compliance team.

Q: What about model licensing?

A: LOOM is Apache 2.0 (permissive). Models have their own licenses (check HuggingFace). Most open models allow commercial use, but verify before deploying.

Business Questions

Q: Is LOOM free for commercial use?

A: Yes. Apache 2.0 license allows commercial use without fees.

Q: Do you offer commercial support?

A: Not yet, but it’s on the roadmap. For now, support is community-based via GitHub issues and discussions.

Q: Can I get consulting help for integration?

A: Contact via GitHub issues. May consider consulting engagements depending on project scope.

Q: Will LOOM stay open source?

A: Yes. Core LOOM will always be open source. Potential future commercial offerings would be additional services (support, hosted tools, enterprise features), not the core framework.

The Bigger Picture: Why This Matters

The Current AI Landscape

We’re at an inflection point in AI:

The Cloud Era (2020–2024):

  • All AI happens in datacenters
  • Users send data to APIs
  • Pay per token
  • No privacy
  • Internet required

The Edge Era (2025+):

  • AI runs on devices
  • Data stays local
  • One-time cost
  • Total privacy
  • Works offline

LOOM is infrastructure for the Edge Era.

What’s Changing

Hardware:

  • Modern smartphones = 2015 datacenter performance
  • Neural accelerators in every device (Apple Neural Engine, Qualcomm NPU)
  • RAM increasing (8GB+ is common in phones now)
  • Storage is cheap (256GB+ standard)

Models:

  • Smaller models (1–7B parameters) are surprisingly good
  • Quantization makes them even smaller
  • Specialized models beat generalist models for specific tasks
  • Open source models competitive with closed APIs

Regulation:

  • GDPR, CCPA, HIPAA all restrict cloud data
  • AI Act in EU requires transparency
  • Data sovereignty becoming standard
  • Privacy is a competitive advantage

Economics:

  • Cloud AI costs add up fast ($1M+/month for moderate apps)
  • One-time hardware cost < ongoing API fees
  • Edge inference is essentially free after model download
  • Enables sustainable business models

The Privacy Revolution

People are waking up to surveillance capitalism:

  • Your conversations shouldn’t train someone else’s model
  • Your medical data shouldn’t leave your device
  • Your creative work shouldn’t feed corporate AI
  • Your private thoughts shouldn’t be stored in someone else’s datacenter

Local AI isn’t just technically better. It’s ethically better.

The Accessibility Angle

Cloud AI creates inequality:

  • Geographic: Requires fast internet (excludes rural, developing nations)
  • Economic: Pay-per-use excludes low-income users
  • Political: Censorship and control by platform owners

Local AI democratizes access:

  • Works everywhere, even offline
  • One-time cost (or free with open models)
  • Can’t be censored or shut down
  • Users control their own AI

LOOM makes powerful AI accessible to everyone, everywhere.

Conclusion: The Future Is Local

LOOM started as a solution to deployment hell. It became something bigger: infrastructure for the post-cloud AI era.

What we’ve built:

  • ✅ Universal runtime (8 platforms, one model file)
  • ✅ Deterministic outputs (provable, auditable)
  • ✅ Zero dependencies (10MB binary)
  • ✅ Privacy-first (everything local)
  • ✅ Game engine native (first of its kind)
  • ✅ Production packages (PyPI, npm, NuGet)

What this enables:

  • Offline-first mobile apps
  • Privacy-preserving healthcare tools
  • Intelligent game NPCs
  • Edge computing solutions
  • Compliant enterprise AI
  • Accessible AI for everyone

The vision: AI that works everywhere, costs nothing to run, respects privacy, and empowers users instead of corporations.

We’re not there yet. GPU acceleration needs work. More layers would help. The ecosystem is young.

But the foundation is solid. And it’s already working in production.

Try It Now

Developers:

pip install welvet # Python
npm install @openfluke/welvet # JavaScript
dotnet add package Welvet # C#

Everyone else:

  • Watch the demos
  • Star the GitHub repo
  • Share this article
  • Try building something

The future of AI is local. And it’s already here.

Written by the LOOM/OpenFluke team. Questions? Open an issue on GitHub or reach out on social media.

Last updated: November 2025

Tags: #AI #MachineLearning #Privacy #OpenSource #Go #Golang #Mobile #GameDev #EdgeComputing #LocalAI #HuggingFace #Transformers #LLM #Godot #CrossPlatform
