The Problem Everyone Has (But Nobody Talks About)
You’ve probably used ChatGPT. Maybe you’ve tried Claude or Gemini. They’re impressive, but they all share the same fundamental limitations:
- They require internet (no WiFi on that flight? Too bad)
- They cost money per use (every message = money to OpenAI/Google/Anthropic)
- Your data goes to their servers (hope you trust them with your conversations)
- They can change or disappear (remember when Twitter’s API got expensive overnight?)
For regular users, this means AI is something that happens “somewhere else” — in the cloud, controlled by big tech companies, requiring constant connectivity and trust.
For developers, it’s even worse. Try to build an app that uses AI, and you face:
- Platform fragmentation: Different tools for web, mobile, desktop, embedded devices
- Conversion hell: Convert your model to ONNX for this, TensorFlow Lite for that, CoreML for iOS
- Inconsistent outputs: The same model produces slightly different results on different platforms
- Dependency nightmares: Installing AI libraries means downloading gigabytes of software
- Vendor lock-in: Build on OpenAI’s API? Good luck switching to something else later
What if there was a better way?
The Vision: AI That Works Everywhere, Identically
Imagine downloading an AI model once and running it:
- On your Android phone (offline, in airplane mode)
- In your web browser (no server needed)
- On your laptop (Windows, Mac, or Linux)
- Inside a video game (for intelligent NPCs)
- On a Raspberry Pi (edge computing)
- In a Python script, JavaScript app, or C# program
With identical behavior everywhere. Not “close enough.” Identical.
With zero format conversions. Just load and run.
With no cloud dependency. Everything local, private, under your control.
That’s LOOM. And it’s working right now.
What Is LOOM?
LOOM (Layered Omni-architecture Openfluke Machine) is a neural network framework written in Go that solves the fundamental problems of AI deployment.
For Non-Technical Readers:
Think of LOOM like a universal translator for AI models. Just as you can play an MP3 file on your phone, computer, or car stereo without converting it to different formats, LOOM lets the same AI model run on any device without modification.
Want to chat with an AI on your phone without sending your data to the cloud? LOOM does that.
Want a video game where NPCs can actually hold conversations? LOOM does that too.
Want to use AI in a medical app where patient data can’t leave the device? LOOM enables it.
For Technical Readers:
LOOM is a cross-platform ML runtime with several unique properties:
- Native HuggingFace support: Loads safetensors directly, no conversion to ONNX/TFLite/GGUF
- Cross-platform determinism: Same model produces identical outputs (MAE < 1e-8) across all platforms
- Universal API: Same function calls in Python, JavaScript, C#, Go, and WASM
- Zero Python dependency: Pure Go core + C-ABI bindings = single binary deployment
- Game engine native: First framework with native integration into Godot (and extensible to Unity/Unreal)
- Published packages: Already on PyPI, npm, and NuGet
Technical architecture:
HuggingFace Model (safetensors)
    ↓
LOOM Core (Go) - 10 layer types with full forward/backward
    ↓ compiles to
C-ABI library (.so/.dylib/.dll)
    ↓ bindings for
Python | JavaScript | C# | Go | WASM | Mobile
Supported layers: Dense, Conv2D, Multi-Head Attention (with GQA), RNN, LSTM, LayerNorm, RMSNorm, SwiGLU, Softmax (10 variants including native MoE), Residual.
Why This Matters: Three Real-World Stories
Story 1: The Indie Game Developer
The problem: Sarah is building a story-rich RPG. She wants NPCs that can hold actual conversations, remember what players said, and respond contextually. But:
- Unity ML-Agents requires a Python server (can’t ship that to players)
- Cloud APIs cost money per conversation (impossible for indie budgets)
- Most “AI NPCs” are just decision trees pretending to be smart
The LOOM solution: Sarah loads SmolLM2–360M directly into her Godot game via C-ABI. Now her NPCs can:
- Generate unique dialogue based on game context
- Remember past conversations (stored locally)
- Work offline (no internet required)
- Cost nothing per interaction (one-time model download)
Result: The first indie game with truly conversational NPCs, running entirely on players’ devices.
See it working: Godot + LOOM Demo
Story 2: The Healthcare Startup
The problem: Dr. Martinez’s team is building a medical note-taking assistant. But healthcare regulations are strict:
- Patient data CANNOT go to cloud APIs (HIPAA violations = massive fines)
- AI outputs must be reproducible for audits (regulatory requirement)
- Must work in hospitals with restricted internet (many are air-gapped)
The LOOM solution: Deploy LOOM to doctors’ tablets and desktops. Same model, identical outputs on both platforms, everything stays local.
Technical win: MAE < 1e-8 across platforms means audit trails are provable. “This output came from this model with this input” is verifiable.
Result: AI-powered medical tools that comply with regulations and protect patient privacy.
Story 3: The Mobile App That Works Offline
The problem: You’re traveling internationally. Your flight has no WiFi. You want to:
- Translate text
- Get writing suggestions
- Chat with an AI assistant
- Generate ideas
But every AI app says “No internet connection.”
The LOOM solution: An Android app running SmolLM2–135M locally. Full conversational AI in your pocket. No internet required. No data leaving your device. No subscription fees.
See it working: Android Demo
The Technical Deep Dive: How Does It Actually Work?
Architecture Overview
LOOM is built on three core principles:
1. Universal Model Format
Instead of requiring conversion to platform-specific formats (ONNX, TFLite, CoreML, GGUF), LOOM loads HuggingFace’s native safetensors format directly.
# No conversion needed - just load
import welvet

welvet.Transformer.load_model("HuggingFaceTB/SmolLM2-360M-Instruct")
welvet.Transformer.generate("Once upon a time")
Why this matters: Every conversion introduces:
- Potential bugs (ops that don’t translate cleanly)
- Maintenance overhead (re-convert after every update)
- Accuracy drift (subtle numerical differences)
LOOM eliminates all of this.
2. Cross-Platform Determinism
Most ML frameworks are “mostly reproducible” — outputs vary slightly between platforms due to:
- Different math libraries (Intel MKL vs OpenBLAS vs Apple Accelerate)
- GPU vendor differences (NVIDIA cuDNN vs AMD ROCm)
- Floating point implementation details
- Compiler optimizations
LOOM achieves deterministic outputs (MAE < 1e-8 across platforms) by:
- Pure Go implementation with explicit math operations
- Fixed-precision arithmetic where needed
- Deterministic layer implementations
- Comprehensive cross-platform testing
Validation example:
| Platform    | Output Hash | MAE vs Reference |
|-------------|-------------|------------------|
| Go (native) | a4f8e2c1... | 0.00000000       |
| Python      | a4f8e2c1... | 0.00000000       |
| JavaScript  | a4f8e2c1... | 0.00000001       |
| C#          | a4f8e2c1... | 0.00000000       |
| WASM        | a4f8e2c1... | 0.00000002       |
| Android     | a4f8e2c1... | 0.00000001       |
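The kind of check behind the table above can be reproduced in a few lines of Go. This is a minimal sketch, not LOOM's actual test harness: it assumes you have captured the same model outputs on two platforms and simply compares their mean absolute error against the 1e-8 tolerance.

```go
package main

import (
	"fmt"
	"math"
)

// mae returns the mean absolute error between two output vectors of equal length.
func mae(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum / float64(len(a))
}

func main() {
	// Hypothetical logits captured from the same model on two platforms.
	goNative := []float64{0.12345678, -0.98765432, 1.00000000}
	wasm := []float64{0.12345679, -0.98765432, 1.00000001}

	diff := mae(goNative, wasm)
	if diff < 1e-8 {
		fmt.Printf("PASS: MAE %.10f within tolerance\n", diff)
	} else {
		fmt.Printf("FAIL: MAE %.10f exceeds 1e-8\n", diff)
	}
}
```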
3. Zero-Dependency Deployment
Traditional ML stacks require:
- Python runtime (300MB+)
- NumPy, SciPy (100MB+)
- PyTorch or TensorFlow (2GB+)
- CUDA libraries for GPU (several GB)
LOOM requires:
- Single compiled binary (~10MB)
- Model file (varies by model size)
That’s it. No pip install, no virtual environments, no dependency conflicts.
Deployment comparison:
| Stack        | Binary Size  | Dependencies        | Platforms                      |
|--------------|--------------|---------------------|--------------------------------|
| PyTorch      | N/A (Python) | Python + 2GB libs   | Linux, Windows (x86)           |
| TensorFlow   | N/A (Python) | Python + 500MB libs | Linux, Windows, limited mobile |
| ONNX Runtime | 50–100MB     | C++ runtime         | Most platforms (with effort)   |
| LOOM         | 10MB         | Zero                | All platforms, identical API   |
Supported Models and Performance
Models That Work Right Now
LOOM supports any transformer using the Llama architecture, including:
Text Generation:
- ✅ Qwen2.5 (0.5B — 7B parameters)
- ✅ SmolLM2 (135M — 1.7B parameters)
- ✅ TinyLlama (1.1B parameters)
- ✅ Mistral (7B parameters)
- ✅ Llama 2/3 (with appropriate quantization)
Just-released models work immediately: if a model is on HuggingFace with a Llama-style architecture, LOOM loads it.
Performance Characteristics
Current state (v0.0.3):
- CPU-only implementation
- ~0.5–3 tokens/second on small models (SmolLM2–360M)
- Deterministic across all platforms
- Full forward + backward passes for training
This is deliberately “correctness-first”:
- No platform-specific optimizations that break determinism
- No CUDA randomness
- No vendor-specific math libraries
- Reproducible outputs for compliance/auditing
GPU acceleration (in progress):
- WebGPU compute shaders for Dense, Conv2D, Attention layers
- Expected 10–50x speedup on supported operations
- Maintaining determinism is the priority
Memory and Storage
| Model          | Parameters | Disk Size | RAM Required | Typical Use Case                  |
|----------------|------------|-----------|--------------|-----------------------------------|
| SmolLM2-135M   | 135M       | ~270MB    | ~500MB       | Mobile, embedded, quick responses |
| SmolLM2-360M   | 360M       | ~720MB    | ~1GB         | Balanced quality/size for mobile  |
| Qwen2.5-0.5B   | 500M       | ~1GB      | ~1.5GB       | Desktop, high-quality generation  |
| TinyLlama-1.1B | 1.1B       | ~2.2GB    | ~3GB         | Desktop, specialized tasks        |
Installation and Quick Start
For Python Developers
pip install welvet

from welvet import Transformer

# Load any HuggingFace model (downloads automatically if needed)
Transformer.load_tokenizer("HuggingFaceTB/SmolLM2-360M-Instruct")
Transformer.load_model("HuggingFaceTB/SmolLM2-360M-Instruct")

# Generate text
result = Transformer.generate("The meaning of life is", max_tokens=50)
print(result.text)

# Streaming generation (token-by-token)
for token in Transformer.generate_stream("Once upon a time", max_tokens=100):
    print(token, end='', flush=True)
For JavaScript/TypeScript Developers
npm install @openfluke/welvet

import { initLoom } from '@openfluke/welvet';

async function main() {
  // Initialize WASM module
  const loom = await initLoom();

  // Load model (works in Node.js and browsers)
  await loom.LoadTransformer("HuggingFaceTB/SmolLM2-360M-Instruct");

  // Generate text
  const result = loom.Generate("The future of AI is", 50);
  console.log(result);
}
Browser-native inference — no server needed:
<script type="module">
  import { initLoom } from 'https://cdn.jsdelivr.net/npm/@openfluke/welvet/+esm';

  const loom = await initLoom();
  // Now running a transformer entirely in the browser
</script>
For C# / .NET Developers
dotnet add package Welvet

using Welvet;

// Load model
Transformer.LoadTokenizer("HuggingFaceTB/SmolLM2-360M-Instruct");
Transformer.LoadModel("HuggingFaceTB/SmolLM2-360M-Instruct");

// Generate text
var result = Transformer.Generate("In a galaxy far away", maxTokens: 50);
Console.WriteLine(result.Text);

// Streaming (Unity/Godot game integration)
foreach (var token in Transformer.GenerateStream("Player: Hello\nNPC:", 100))
{
    Console.Write(token); // Display in real-time
}
For Go Developers
go get github.com/openfluke/loom/nn

package main

import (
	"fmt"

	"github.com/openfluke/loom/nn"
	"github.com/openfluke/loom/tokenizer"
)

func main() {
	// Load tokenizer and model
	tk, _ := tokenizer.LoadFromFile("models/SmolLM2-360M/tokenizer.json")
	network, _ := nn.LoadTransformerFromSafetensors("models/SmolLM2-360M")

	// Generate text
	prompt := "The key to success is"
	tokens := tk.Encode(prompt, false)

	// Run inference with network and tokens
	// (Full example in repo)
	fmt.Printf("model loaded: %v, prompt has %d tokens\n", network != nil, len(tokens))
}
The Game Engine Integration: A First
Why Game Engines Matter
Video games are one of the most demanding AI environments:
- Real-time performance (30–60 FPS, no stuttering)
- Local execution (no internet dependency for single-player)
- Platform variety (PC, consoles, mobile)
- Resource constraints (memory, CPU budget)
Traditional ML frameworks don’t work for games because:
- ❌ Unity ML-Agents requires a Python server (can’t ship to players)
- ❌ TensorFlow has massive runtime (too big for games)
- ❌ Cloud APIs introduce latency (breaks immersion)
- ❌ Platform fragmentation (different tools for PC vs console)
LOOM + Godot: Native AI in Games
LOOM integrates with Godot Engine via C-ABI, enabling:
Local LLM-powered NPCs:
# GDScript (Godot's scripting language)
var ai = load("res://loom.gdnlib")

func _ready():
    ai.load_model("SmolLM2-360M-Instruct")

func talk_to_npc(player_message):
    var context = "You are a wise wizard. Player says: " + player_message
    var response = ai.generate(context, 100)
    display_dialogue(response)
What this enables:
- NPCs that remember past conversations (stored in game save)
- Dynamic quest generation based on player history
- Procedural dialogue that fits game context
- All running offline, no API costs, works on any platform
First working demo: Watch LOOM running in Godot
Mobile demo: SmolLM2 on Android in a game engine
Extending to Other Engines
The C-ABI means LOOM can integrate with:
- Unity (via P/Invoke, like the C# package)
- Unreal (via C++ FFI)
- Custom engines (any engine with C interop)
Unity example:
using Welvet;
using UnityEngine;

public class NPCController : MonoBehaviour {
    void Start() {
        Transformer.LoadModel("SmolLM2-360M-Instruct");
    }

    public string GetNPCResponse(string playerInput) {
        return Transformer.Generate("NPC: " + playerInput, 50).Text;
    }
}
The Privacy Angle: AI Without Surveillance
Why Local AI Matters
Cloud AI has a fundamental problem: your data becomes their data.
When you use ChatGPT, Claude, or Gemini:
- Your conversations are stored on their servers
- They can analyze your usage patterns
- They may use your data for training (depending on terms)
- Government subpoenas can access your history
- Service outages mean no access
- Price changes affect your costs
- Content policies can restrict what you can ask
LOOM flips this:
- All processing happens on your device
- No data sent to external servers
- No usage tracking or analytics
- Works in airplane mode
- Can’t be censored or shut down
- Zero ongoing costs after model download
Use Cases That Require Privacy
Healthcare:
- Medical note-taking assistants
- Patient symptom analyzers
- Clinical decision support
- All must comply with HIPAA (no cloud allowed)
Legal:
- Contract analysis
- Case research
- Client communication drafting
- Attorney-client privilege (can’t use cloud)
Financial:
- Personal finance advisors
- Trading strategy analysis
- Sensitive document review
- Regulatory compliance (data sovereignty)
Personal:
- Journaling with AI feedback
- Therapy/mental health chatbots
- Creative writing assistance
- Private brainstorming
LOOM enables all of these without compromise.
Platform Coverage: Where LOOM Runs
Current Platform Support
| Platform              | Status        | Installation   | Use Case                    |
|-----------------------|---------------|----------------|-----------------------------|
| Linux (x86-64)        | ✅ Production | pip/npm/go get | Servers, development        |
| Linux (ARM64)         | ✅ Production | Same           | Raspberry Pi, edge devices  |
| macOS (Intel)         | ✅ Production | Same           | Development, desktop apps   |
| macOS (Apple Silicon) | ✅ Production | Same           | M1/M2/M3 Macs               |
| Windows (x86-64)      | ✅ Production | Same           | Desktop apps, games         |
| Browser (WASM)        | ✅ Production | npm            | Web apps, no server needed  |
| Android (ARM64)       | ✅ Production | C-ABI          | Mobile apps, games          |
| iOS (ARM64)           | 🔄 Testing    | C-ABI          | Mobile apps, games          |
Same code. Same model. Eight platforms.
Cross-Compilation Example
Build for all platforms from a single machine:
# Linux AMD64
GOOS=linux GOARCH=amd64 go build

# Linux ARM64 (Raspberry Pi)
GOOS=linux GOARCH=arm64 go build

# macOS (Intel and Apple Silicon)
GOOS=darwin GOARCH=amd64 go build
GOOS=darwin GOARCH=arm64 go build

# Windows
GOOS=windows GOARCH=amd64 go build

# WASM (for browsers)
GOOS=js GOARCH=wasm go build

# Android
CGO_ENABLED=1 GOOS=android GOARCH=arm64 go build -buildmode=c-shared
One codebase. Eight binaries. All with identical behavior.
Use Cases: What People Are Building
1. Offline-First Mobile Apps
Problem: Most AI apps require constant internet connection.
LOOM Solution: Fully functional AI apps that work in airplane mode.
Examples:
- Language learning apps with conversational practice
- Writing assistants that work offline
- Personal journaling with AI feedback
- Travel companions that work internationally without data
Technical approach:
- Bundle model with app (or download once)
- All inference runs on device
- Offline-first database (SQLite) for conversation history
- Sync to cloud optional, not required
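To illustrate the offline-first pattern above, here is a minimal Go sketch that persists conversation turns to a local SQLite file. The schema and the go-sqlite3 driver are assumptions made for this example; LOOM itself does not prescribe any storage layer.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumed driver choice; any SQLite driver works
)

func main() {
	// Local, offline-first store; nothing leaves the device.
	db, err := sql.Open("sqlite3", "conversations.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Hypothetical schema: one row per conversation turn.
	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS turns (
		id INTEGER PRIMARY KEY AUTOINCREMENT,
		role TEXT NOT NULL,      -- "user" or "assistant"
		content TEXT NOT NULL,
		created_at DATETIME DEFAULT CURRENT_TIMESTAMP
	)`)
	if err != nil {
		log.Fatal(err)
	}

	// Store a user prompt and the locally generated reply.
	_, _ = db.Exec(`INSERT INTO turns (role, content) VALUES (?, ?)`, "user", "Translate 'hello' to French")
	_, _ = db.Exec(`INSERT INTO turns (role, content) VALUES (?, ?)`, "assistant", "Bonjour")
}
```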
2. Privacy-Preserving Healthcare Tools
Problem: Patient data cannot go to cloud APIs (HIPAA).
LOOM Solution: Medical AI that runs entirely on hospital hardware.
Examples:
- Clinical note-taking from voice dictation
- Drug interaction checkers
- Diagnostic suggestion tools
- Medical education simulations
Technical approach:
- Deploy LOOM to hospital desktops/tablets
- Air-gapped network (no internet connection)
- Deterministic outputs for audit trails
- Same model across all devices for consistency
3. Intelligent Game NPCs
Problem: Game NPCs are scripted and repetitive.
LOOM Solution: NPCs with actual conversational AI, running locally.
Examples:
- Story-rich RPGs with dynamic dialogue
- Strategy games with adaptive opponents
- Educational games with AI tutors
- Simulation games with intelligent agents
Technical approach:
- Integrate LOOM via C-ABI into game engine
- Load appropriate model size for target hardware
- Combine LLM output with game logic
- Store conversation history in save files
4. Edge Computing and IoT
Problem: Edge devices can’t depend on cloud latency.
LOOM Solution: Run AI directly on edge hardware.
Examples:
- Smart home assistants (local voice control)
- Industrial monitoring systems
- Retail kiosks (works without internet)
- Agricultural sensors with AI analysis
Technical approach:
- Compile for ARM64 (Raspberry Pi, etc.)
- Use quantized models for smaller footprint
- On-device inference with sub-second latency
- Optional cloud sync for aggregated insights
5. AI for Regulated Industries
Problem: Compliance requires deterministic, auditable AI.
LOOM Solution: Bit-exact outputs enable regulatory compliance.
Examples:
- Financial services (audit trails)
- Legal tech (reproducible analysis)
- Government systems (data sovereignty)
- Scientific research (reproducibility)
Technical approach:
- Deploy identical model across all machines
- MAE < 1e-8 ensures outputs match exactly
- Hash inputs/outputs for audit logs
- No cloud dependency = no data leakage
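To make the audit-log idea concrete, here is a minimal sketch (not part of LOOM's API) that fingerprints the model identifier, the prompt, and the generated output. Because inference is deterministic, an auditor can re-run the same model on the same prompt and confirm the fingerprint matches.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// auditRecord captures everything needed to reproduce and verify one inference.
type auditRecord struct {
	ModelID string
	Prompt  string
	Output  string
}

// fingerprint hashes the record; with deterministic inference, re-running the
// same model on the same prompt must reproduce the same output hash.
func fingerprint(r auditRecord) string {
	h := sha256.Sum256([]byte(r.ModelID + "\x00" + r.Prompt + "\x00" + r.Output))
	return hex.EncodeToString(h[:])
}

func main() {
	rec := auditRecord{
		ModelID: "HuggingFaceTB/SmolLM2-360M-Instruct",
		Prompt:  "Summarize the patient visit notes.",
		Output:  "generated summary text (placeholder)",
	}
	fmt.Println("audit fingerprint:", fingerprint(rec))
}
```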
The Technology Stack: How LOOM Works Under the Hood
Core Architecture
Language: Pure Go (with C-ABI for foreign function interface)
Why Go:
- Compiles to native binaries (fast execution)
- Cross-compilation built-in (one command for any platform)
- No runtime dependency (unlike Python)
- Memory safe (unlike C/C++)
- Excellent concurrency support (goroutines)
- Growing ecosystem
Layer Implementation
LOOM implements 10 layer types, all with full CPU forward and backward passes:
1. Dense (Fully-Connected)
- Matrix multiplication with activation
- Supports: ReLU, Sigmoid, Tanh, Softplus, LeakyReLU, Linear
- Used in: MLPs, feedforward networks, output layers
2. Conv2D (2D Convolution)
- Standard convolution with stride, padding, dilation
- Multiple filters, channels, activations
- Used in: Image processing, spatial feature extraction
3. Multi-Head Attention
- Transformer-style scaled dot-product attention
- Supports Grouped Query Attention (GQA) for efficiency
- Q/K/V projections with output projection
- Used in: Transformers, sequence modeling
4. RNN (Recurrent Neural Network)
- Simple recurrent layer with hidden state
- Backpropagation Through Time (BPTT)
- Used in: Sequence modeling, time series
5. LSTM (Long Short-Term Memory)
- Forget, input, output, cell gates
- Addresses vanishing gradient problem
- Used in: Long sequences, temporal dependencies
6. LayerNorm (Layer Normalization)
- Normalizes across feature dimension
- Learned scale (gamma) and shift (beta)
- Used in: Transformers, stabilizing training
7. RMSNorm (Root Mean Square Normalization)
- Llama-style normalization (no beta parameter)
- More efficient than LayerNorm
- Used in: Modern transformers (Llama, Qwen, Mistral)
8. SwiGLU (Swish-Gated Linear Unit)
- Gated activation: down(silu(gate(x)) * up(x))
- Used in Llama/Qwen FFN layers
- Better than standard ReLU for transformers (a minimal sketch of the RMSNorm and SwiGLU math follows this list)
9. Softmax (10 variants)
- Standard, Grid, Hierarchical, Temperature, Gumbel
- Masked, Sparsemax, Entmax, Adaptive, Mixture
- Grid Softmax = Native MoE (mathematically proven)
10. Residual (Skip Connections)
- Adds input to output (residual connections)
- Essential for deep networks
- Used in: ResNets, Transformers
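As a concrete reference for items 7 and 8 above, here is a small, framework-free Go sketch of the RMSNorm and SwiGLU math as commonly defined for Llama-style models. It illustrates the formulas only, not LOOM's internal implementation.

```go
package main

import (
	"fmt"
	"math"
)

// rmsNorm scales x by the reciprocal root-mean-square, then applies the learned gamma.
func rmsNorm(x, gamma []float64, eps float64) []float64 {
	var ss float64
	for _, v := range x {
		ss += v * v
	}
	inv := 1.0 / math.Sqrt(ss/float64(len(x))+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * inv * gamma[i]
	}
	return out
}

// silu is the SiLU/Swish activation used inside SwiGLU.
func silu(v float64) float64 { return v / (1.0 + math.Exp(-v)) }

// swiGLU computes silu(gate) * up elementwise; a down projection follows in a real FFN,
// matching the down(silu(gate(x)) * up(x)) formula above.
func swiGLU(gate, up []float64) []float64 {
	out := make([]float64, len(gate))
	for i := range gate {
		out[i] = silu(gate[i]) * up[i]
	}
	return out
}

func main() {
	x := []float64{0.5, -1.0, 2.0}
	gamma := []float64{1.0, 1.0, 1.0}
	fmt.Println("rmsnorm:", rmsNorm(x, gamma, 1e-6))
	fmt.Println("swiglu :", swiGLU([]float64{0.5, -1.0, 2.0}, []float64{1.0, 2.0, 3.0}))
}
```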
The Tokenizer: Pure Go BPE
Most frameworks depend on HuggingFace’s transformers library (Python) for tokenization.
LOOM implements BPE (Byte Pair Encoding) from scratch in Go:
- Loads tokenizer.json directly
- Supports multiple encoding schemes (GPT-2, Llama, Qwen, T5)
- No Python dependency
- Identical behavior to HuggingFace tokenizers
Why this matters:
- Desktop apps don’t need Python installed
- Mobile apps can tokenize on-device
- WASM runs without external libraries
- Deterministic across platforms
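The Go snippet below is a minimal sketch of on-device tokenization, reusing the tokenizer calls shown in the Go quick-start earlier (tokenizer.LoadFromFile, Encode); the model path is just an example.

```go
package main

import (
	"fmt"
	"log"

	"github.com/openfluke/loom/tokenizer"
)

func main() {
	// Load the HuggingFace tokenizer.json shipped with the model (example path).
	tk, err := tokenizer.LoadFromFile("models/SmolLM2-360M/tokenizer.json")
	if err != nil {
		log.Fatal(err)
	}

	// Encode a prompt to token IDs entirely on-device; no Python, no network.
	tokens := tk.Encode("Hello, offline world!", false)
	fmt.Printf("%d tokens: %v\n", len(tokens), tokens)
}
```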
Model Loading: Direct Safetensors Support
Safetensors format:
- Developed by HuggingFace as safer alternative to pickle
- Binary format with header + tensors
- No arbitrary code execution (unlike pickle)
- Memory-mapped for efficiency
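The format is simple enough to inspect by hand: a safetensors file starts with an 8-byte little-endian length, followed by that many bytes of JSON describing each tensor's dtype, shape, and byte offsets. The sketch below (illustrative, not LOOM's loader) prints that header for any .safetensors file.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("model.safetensors") // example path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// First 8 bytes: little-endian uint64 giving the size of the JSON header.
	var headerLen uint64
	if err := binary.Read(f, binary.LittleEndian, &headerLen); err != nil {
		log.Fatal(err)
	}

	// Next headerLen bytes: JSON mapping tensor names to dtype, shape, and data offsets.
	raw := make([]byte, headerLen)
	if _, err := io.ReadFull(f, raw); err != nil {
		log.Fatal(err)
	}

	var header map[string]json.RawMessage
	if err := json.Unmarshal(raw, &header); err != nil {
		log.Fatal(err)
	}
	for name := range header {
		fmt.Println(name) // tensor names, plus an optional "__metadata__" entry
	}
}
```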
LOOM’s implementation:
// Load model from HuggingFace directory
network, err := nn.LoadTransformerFromSafetensors("models/SmolLM2-360M")

// Or from bytes (for embedding in apps)
configBytes, _ := os.ReadFile("config.json")
weightsBytes, _ := os.ReadFile("model.safetensors")
network, err = nn.LoadTransformerFromBytes(configBytes, weightsBytes)
No conversion needed. If it’s on HuggingFace, LOOM loads it.
C-ABI: The Universal Interface
C Application Binary Interface is how LOOM talks to other languages:
// C header (auto-generated)
typedef void* NetworkHandle;

NetworkHandle LoadModel(const char* modelPath);
char* Generate(NetworkHandle handle, const char* prompt, int maxTokens);
void FreeNetwork(NetworkHandle handle);
Bindings for each language:
- Python: Uses ctypes to call C functions
- JavaScript: WASM exports C functions automatically
- C#: Uses DllImport (P/Invoke) to call native library
- Go: Native (no FFI needed)
Result: Same underlying C library, different language-specific wrappers.
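On the Go side, exporting such a C-ABI surface relies on cgo's //export mechanism combined with the -buildmode=c-shared flag shown in the cross-compilation example earlier. The snippet below is a hedged sketch of that pattern with placeholder logic; the function names are illustrative, not LOOM's actual exported symbols.

```go
// Build with: go build -buildmode=c-shared -o libexample.so
package main

/*
#include <stdlib.h>
*/
import "C"

import "unsafe"

// Generate is exported with a C-compatible signature so Python (ctypes),
// C# (DllImport), and others can call it from the shared library.
// A real library would run the loaded model here; this sketch just echoes.
//
//export Generate
func Generate(prompt *C.char, maxTokens C.int) *C.char {
	goPrompt := C.GoString(prompt)
	result := "echo: " + goPrompt // placeholder for actual inference
	return C.CString(result)      // caller must release via FreeString
}

// FreeString releases a string previously returned by Generate.
//
//export FreeString
func FreeString(s *C.char) {
	C.free(unsafe.Pointer(s))
}

// main is required for -buildmode=c-shared but is never called by consumers.
func main() {}
```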
Comparison to Existing Solutions
LOOM vs PyTorch/TensorFlow
PyTorch/TensorFlow:
- ✅ Extensive layer types (100+)
- ✅ Mature ecosystem
- ✅ Fast GPU training
- ❌ Python dependency (2GB+ stack)
- ❌ Platform-specific deployment
- ❌ Non-deterministic (CUDA randomness)
- ❌ No mobile deployment story
LOOM:
- ⚠️ Fewer layers (10 types)
- ⚠️ Smaller ecosystem (new framework)
- ⚠️ CPU-only currently (GPU in progress)
- ✅ Zero dependencies (10MB binary)
- ✅ Universal deployment (8 platforms)
- ✅ Deterministic (MAE < 1e-8)
- ✅ Native mobile/game support
Use PyTorch/TF for: Research, training large models, maximum flexibility
Use LOOM for: Deployment, cross-platform apps, compliance, privacy, games
LOOM vs ONNX Runtime
ONNX Runtime:
- ✅ Cross-platform (with effort)
- ✅ Optimized inference
- ✅ Industry adoption
- ❌ Requires model conversion
- ❌ Not all ops supported
- ❌ Large binary (50–100MB)
- ❌ Complex integration
LOOM:
- ✅ No conversion needed
- ✅ Simple integration
- ✅ Small binary (10MB)
- ⚠️ Fewer optimizations (yet)
- ⚠️ Newer, less proven
Use ONNX for: Maximum compatibility with existing models
Use LOOM for: HuggingFace-first workflow, determinism, simplicity
LOOM vs llama.cpp
llama.cpp:
- ✅ Excellent performance
- ✅ Mature, well-tested
- ✅ Broad model support
- ❌ Requires GGUF conversion
- ❌ C++ (harder to extend)
- ❌ Limited language bindings
LOOM:
- ✅ No conversion (safetensors native)
- ✅ Easy to extend (Go)
- ✅ Multiple language APIs
- ⚠️ Slower (for now)
- ⚠️ Less optimized
Use llama.cpp for: Maximum inference speed on CPUs
Use LOOM for: No-conversion workflow, game engines, cross-language apps
LOOM vs Mobile ML SDKs (TFLite, CoreML)
TensorFlow Lite / CoreML:
- ✅ Optimized for mobile
- ✅ OS-level integration
- ❌ Requires conversion
- ❌ Platform-specific (TFLite=Android, CoreML=iOS)
- ❌ Limited ops
- ❌ Different APIs per platform
LOOM:
- ✅ No conversion
- ✅ Same API on both platforms
- ✅ Full transformer support
- ⚠️ Newer, less optimized
- ⚠️ CPU-only (mobile GPU coming)
Use TFLite/CoreML for: Maximum mobile performance on specific platforms
Use LOOM for: Cross-platform apps, same code iOS+Android, game engines
Roadmap: What’s Next
Short Term (3 months)
GPU Acceleration:
- Complete WebGPU implementation for all layers
- Target: 10–50x speedup on supported operations
- Maintain determinism (no CUDA randomness)
Additional Layers:
- Conv1D (audio/signal processing)
- MaxPool2D / AvgPool2D (downsampling)
- Embedding (categorical features)
- Dropout (regularization)
iOS Production:
- Complete testing on iOS devices
- App Store submission guide
- Example iOS + LOOM app
Documentation:
- Complete API reference
- Tutorials for each platform
- Best practices guide
- Architecture deep-dive
Medium Term (6 months)
Quantization:
- INT8 quantization for smaller models
- FP16 for mixed-precision inference
- Dynamic quantization at runtime
Model Zoo:
- Pre-optimized models for different targets
- Benchmark suite across platforms
- Model selection guide
Advanced Features:
- LoRA adapter support
- Model fine-tuning on-device
- Streaming from compressed formats
Ecosystem:
- Unity plugin (official)
- Unreal Engine integration
- VS Code extension
- Model conversion tools
Long Term (12+ months)
Research:
- Academic paper on cross-platform determinism
- Benchmark against other frameworks
- Novel optimization techniques
Community:
- Model hub integration
- Community-contributed layers
- Plugin system for extensions
- Conference talks / workshops
Enterprise:
- Commercial support offerings
- Compliance documentation (FDA, SOC2)
- Enterprise deployment guides
- Professional services
Getting Involved
For Users
Try LOOM:
- Install: pip install welvet or npm install @openfluke/welvet
- Run the demos: Desktop | Godot | Android
- Report issues: GitHub Issues
Share your projects:
- Built something with LOOM? Tag @openfluke on social media
- Write about your experience
- Request features you need
For Developers
Contribute:
- Code: GitHub
- Documentation: Help improve guides
- Examples: Share your integration patterns
- Testing: Validate on different platforms
Join discussions:
- GitHub Discussions
- Discord (coming soon)
- Twitter: @openfluke
For Companies
Interested in:
- Enterprise support?
- Custom features?
- Integration assistance?
- Compliance documentation?
Contact: Open an issue on GitHub or reach out via social media.
Frequently Asked Questions
Technical Questions
Q: Can LOOM train models, or just run inference?
A: LOOM has full forward and backward passes for all layers, so training is possible. However, it’s optimized for inference. For training large models, use PyTorch/JAX, then export to LOOM for deployment.
Q: Why is LOOM slower than PyTorch on GPU?
A: Current version is CPU-only with focus on correctness and determinism. GPU acceleration is in progress. When complete, it will be competitive for inference while maintaining determinism.
Q: What about models that aren’t Llama-architecture?
A: Currently, LOOM focuses on transformer architectures used by most modern LLMs. Future versions will expand to other architectures based on community needs.
Q: Can I use LOOM with my existing PyTorch models?
A: If your model is available on HuggingFace or uses Llama-style architecture, yes. For custom models, you’d need to export weights to the LOOM format (guide coming).
Q: How does determinism work across different CPU architectures?
A: LOOM uses explicit floating-point operations and avoids architecture-specific optimizations that introduce non-determinism. This is validated through extensive cross-platform testing.
Deployment Questions
Q: What’s the memory footprint in production?
A: Depends on model size:
- SmolLM2–135M: ~500MB RAM
- SmolLM2–360M: ~1GB RAM
- Qwen2.5–0.5B: ~1.5GB RAM
- TinyLlama-1.1B: ~3GB RAM
Binary itself is ~10MB.
Q: Can I bundle models with my app?
A: Yes. Models can be embedded in the app bundle (mobile) or downloaded on first run. Example code in the documentation.
Q: Does LOOM work offline after initial model download?
A: Yes, completely. Once the model is downloaded, no internet connection is needed ever again.
Q: What about updates to models?
A: Download new model version, swap out the file, restart app. LOOM’s determinism means you can validate the new model behaves identically across all deployments before rolling out.
Privacy and Security Questions
Q: Does LOOM send any telemetry?
A: No. Zero telemetry, analytics, or data collection. Everything runs locally.
Q: Can I audit what LOOM is doing?
A: Yes. LOOM is open source. Audit the code, compile it yourself, verify the binaries match.
Q: Is LOOM HIPAA/GDPR compliant?
A: LOOM itself is just a library (no data collection), so it doesn’t have compliance obligations. However, its local-only processing makes it suitable for building compliant applications. Consult your compliance team.
Q: What about model licensing?
A: LOOM is Apache 2.0 (permissive). Models have their own licenses (check HuggingFace). Most open models allow commercial use, but verify before deploying.
Business Questions
Q: Is LOOM free for commercial use?
A: Yes. Apache 2.0 license allows commercial use without fees.
Q: Do you offer commercial support?
A: Not yet, but it’s on the roadmap. For now, support is community-based via GitHub issues and discussions.
Q: Can I get consulting help for integration?
A: Contact via GitHub issues. May consider consulting engagements depending on project scope.
Q: Will LOOM stay open source?
A: Yes. Core LOOM will always be open source. Potential future commercial offerings would be additional services (support, hosted tools, enterprise features), not the core framework.
The Bigger Picture: Why This Matters
The Current AI Landscape
We’re at an inflection point in AI:
The Cloud Era (2020–2024):
- All AI happens in datacenters
- Users send data to APIs
- Pay per token
- No privacy
- Internet required
The Edge Era (2025+):
- AI runs on devices
- Data stays local
- One-time cost
- Total privacy
- Works offline
LOOM is infrastructure for the Edge Era.
What’s Changing
Hardware:
- Modern smartphones = 2015 datacenter performance
- Neural accelerators in every device (Apple Neural Engine, Qualcomm NPU)
- RAM increasing (8GB+ is common in phones now)
- Storage is cheap (256GB+ standard)
Models:
- Smaller models (1–7B parameters) are surprisingly good
- Quantization makes them even smaller
- Specialized models beat generalist models for specific tasks
- Open source models competitive with closed APIs
Regulation:
- GDPR, CCPA, HIPAA all restrict cloud data
- AI Act in EU requires transparency
- Data sovereignty becoming standard
- Privacy is a competitive advantage
Economics:
- Cloud AI costs add up fast ($1M+/month for moderate apps)
- One-time hardware cost < ongoing API fees
- Edge inference is essentially free after model download
- Enables sustainable business models
The Privacy Revolution
People are waking up to surveillance capitalism:
- Your conversations shouldn’t train someone else’s model
- Your medical data shouldn’t leave your device
- Your creative work shouldn’t feed corporate AI
- Your private thoughts shouldn’t be stored in someone else’s datacenter
Local AI isn’t just technically better. It’s ethically better.
The Accessibility Angle
Cloud AI creates inequality:
- Geographic: Requires fast internet (excludes rural, developing nations)
- Economic: Pay-per-use excludes low-income users
- Political: Censorship and control by platform owners
Local AI democratizes access:
- Works everywhere, even offline
- One-time cost (or free with open models)
- Can’t be censored or shut down
- Users control their own AI
LOOM makes powerful AI accessible to everyone, everywhere.
Conclusion: The Future Is Local
LOOM started as a solution to deployment hell. It became something bigger: infrastructure for the post-cloud AI era.
What we’ve built:
- ✅ Universal runtime (8 platforms, one model file)
- ✅ Deterministic outputs (provable, auditable)
- ✅ Zero dependencies (10MB binary)
- ✅ Privacy-first (everything local)
- ✅ Game engine native (first of its kind)
- ✅ Production packages (PyPI, npm, NuGet)
What this enables:
- Offline-first mobile apps
- Privacy-preserving healthcare tools
- Intelligent game NPCs
- Edge computing solutions
- Compliant enterprise AI
- Accessible AI for everyone
The vision: AI that works everywhere, costs nothing to run, respects privacy, and empowers users instead of corporations.
We’re not there yet. GPU acceleration needs work. More layers would help. The ecosystem is young.
But the foundation is solid. And it’s already working in production.
Try It Now
Developers:
pip install welvet              # Python
npm install @openfluke/welvet   # JavaScript
dotnet add package Welvet       # C#
Everyone else:
- Watch the demos
- Star the GitHub repo
- Share this article
- Try building something
The future of AI is local. And it’s already here.
Written by the LOOM/OpenFluke team. Questions? Open an issue on GitHub or reach out on social media.
Last updated: November 2025
Tags: #AI #MachineLearning #Privacy #OpenSource #Go #Golang #Mobile #GameDev #EdgeComputing #LocalAI #HuggingFace #Transformers #LLM #Godot #CrossPlatform