Live demo: uhop.dev
UHOP is an open hardware optimization platform that unifies GPU acceleration across CUDA, ROCm/HIP, Metal, OpenCL, and future architectures. It detects your machine, dispatches to the best backend, can generate kernels with AI and validate them, and caches the fastest path for reuse, so developers can write simple code that runs fast everywhere.
Key capabilities today:
- Automatic backend detection: Torch (CUDA/MPS/CPU), OpenCL (GPU/CPU), Triton (Linux), CPU fallback
- Drop‑in acceleration via the @uhop.optimize("op") decorator (e.g., matmul)
- AI kernel generation (OpenAI) for OpenCL/CUDA/Python/Triton with validation/smoke tests
- On‑disk caching of selected kernels/implementations per device
- Friendly CLI for hardware info, demos, AI codegen, and cache tools
- Optional Local Agent so the web portal can run operations on your hardware
Vision: a universal, community-driven runtime optimizer that makes high‑performance computing approachable, portable, and fun — across vendors and form factors.
Planned (see issues/): multi‑backend benchmarking/policies, correctness suites, distributed training loops for AI‑generated kernels, richer dashboard, and tighter framework integrations (PyTorch/JAX).
The platform has four layers working together:
- Frontend (Vite + React) — live controls, real‑time logs, and benchmarks
- Backend (Node/Express + ws) — routes jobs to your Local Agent or server runtime
- Local Agent (Python) — runs UHOP operations on your machine securely
- UHOP Core (Python) — backends, optimizer, AI codegen/validation, caching
See also: docs/architecture.svg (source image) for sharing in blogs/slides.
At a glance, the request flow prefers the Local Agent when connected, and falls back to server‑side execution when not.
Prereqs
- Python 3.10+
- OS: Windows, macOS, or Linux
- Drivers/toolchains as applicable: CUDA (NVIDIA), OpenCL runtime (AMD/Intel/NVIDIA), Apple MPS (macOS)
- Optional: OPENAI_API_KEY for AI codegen
Install
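The exact package name and distribution channel aren't pinned down here, so the sketch below assumes an editable install from a local checkout of the repository; adjust to the project's actual layout.

```bash
# Assumed steps from a local clone; package name/layout are not confirmed by this README.
python -m venv .venv && source .venv/bin/activate
pip install -e .                # editable install of the UHOP Python package
export OPENAI_API_KEY=...       # optional: enables AI kernel generation
```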
Verify your setup
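A quick sanity check via the CLI's hardware-info command (uhop info --json is referenced under good‑first‑issues below; the plain form is assumed to behave the same way):

```bash
uhop info          # detected backends/devices
uhop info --json   # machine-readable output
```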
Run a demo
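A hedged example: the subcommand name and flags below are assumptions based on the CLI description above and the /demo/matmul HTTP endpoint later in this README; check the CLI help for the real invocation.

```bash
# Hypothetical invocation; the sizes mirror the /demo/matmul example payload.
uhop demo matmul --size 256 --iters 3
```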
Try OpenCL elementwise add vs naive
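Again only a sketch: the subcommand is hypothetical, while the device override is the UHOP_OPENCL_DEVICE_INDEX knob documented under environment knobs below.

```bash
# Hypothetical subcommand comparing the OpenCL kernel against a naive CPU baseline.
UHOP_OPENCL_DEVICE_INDEX=0 uhop demo add-opencl
```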
Integrate in your code
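A minimal sketch of the decorator flow described under key capabilities; the import path and the convention of decorating a naive reference implementation are assumptions.

```python
import numpy as np
import uhop  # assumed top-level import

# Assumed usage: decorate a reference implementation and let UHOP dispatch the call
# to the fastest detected backend (Torch / OpenCL / Triton / CPU) and cache the choice.
@uhop.optimize("matmul")
def matmul(a, b):
    return a @ b  # naive fallback; UHOP may swap in a tuned kernel

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)
print(matmul(a, b).shape)
```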
Environment knobs
- UHOP_OPENCL_DEVICE_INDEX=<idx> — default OpenCL device override
- UHOP_STRICT_VALIDATE=1 — tighten AI‑kernel validation during codegen
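For example (both variables are the knobs listed above; the values are placeholders):

```bash
export UHOP_OPENCL_DEVICE_INDEX=1   # pin the second OpenCL device
export UHOP_STRICT_VALIDATE=1       # stricter validation of AI-generated kernels
```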
Expose a local HTTP API for demos/automation:
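The launch command isn't shown here; a hypothetical sketch (subcommand and port are assumptions):

```bash
# Hypothetical entry point; substitute the project's actual server/agent command.
uhop serve --port 8000
```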
Endpoints
- GET /health
- GET /info
- POST /demo/matmul with { "size": 256, "iters": 3 }
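For instance, assuming the API listens on localhost:8000 (the port is an assumption; the paths and payload are the ones listed above):

```bash
curl http://localhost:8000/health
curl http://localhost:8000/info
curl -X POST http://localhost:8000/demo/matmul \
  -H "Content-Type: application/json" \
  -d '{ "size": 256, "iters": 3 }'
```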
Docker
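No image name or Dockerfile details are given here, so this is only a sketch with placeholder tags and ports; NVIDIA GPU passthrough additionally requires the NVIDIA Container Toolkit.

```bash
# Hypothetical build/run from a Dockerfile at the repo root.
docker build -t uhop:local .
docker run --rm -p 8000:8000 uhop:local
# docker run --rm --gpus all -p 8000:8000 uhop:local   # with NVIDIA GPU passthrough
```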
We’re building UHOP as a friendly, long‑term open platform. All experience levels welcome — and we especially invite:
- GPU engineers (CUDA/ROCm/Metal/OpenCL)
- Compiler/runtime developers (Triton/MLIR/TVM)
- ML engineers and researchers (kernels, validation, datasets)
- Frontend devs (Vite/React/Tailwind, data viz)
Start here:
- Read CONTRIBUTING.md for local setup, tests, and PR tips
- Run ./contributing.sh setup and ./contributing.sh test
- Explore issues/ for scoped design notes and milestones
Expectations:
- Keep public APIs stable; update docs/tests with behavior changes
- Aim for reproducible steps and minimal dependencies
- Small, focused PRs with clear titles (Conventional Commits encouraged)
| Milestone | Scope | Status |
| --- | --- | --- |
| Pre‑MVP | Runtime decorator, hardware detection, caching, CLI demo | In progress |
| MVP | Multi‑backend benchmarking and selection policies | Planned |
| AI Kernels v1 | Automated validation, correctness suites, smoke tests | Planned |
| Dashboard | Logging, benchmark viz, local agent UX | Planned |
| Frameworks | PyTorch/JAX wrappers, training loop integration | Planned |
| All‑systems support | CUDA, ROCm/HIP, Metal, OpenCL (explore Vulkan/oneAPI) | Vision |
| All‑ops coverage | Elementwise, reductions, convs, attention, norms, fused ops | Vision |
| Protocol Spec v1.0 | Stable spec: device negotiation, cache manifests, kernel metadata | Vision |
See the issues/ directory for detailed write‑ups:
- 01 Implement runtime decorator
- 02 Hardware detection refinement
- 03 Caching metadata schema
- 04 CLI demo
- 05 AI kernel validation
- 06 Logging & benchmark viz
- 07 Multi‑backend benchmarking
Jump in with these approachable starters:
- Improve OpenCL/kernel templates and add simple correctness tests
- Add a CUDA/HIP example at parity with the OpenCL elementwise add
- Enhance uhop info --json fields (driver versions, memory footprints)
- Add README snippets with Windows/macOS-specific setup tips
- Polish the frontend build or add a minimal dashboard card
- Optimize CI/CD workflow and docs for PRs and promotions (badges, faster CI, templates) — see issues/15-ci-cd-workflow-docs-promo.md
Or pick one of the tracked proposals in issues/ above and comment to claim it.
Run the test suite (GPU‑dependent tests skip automatically):
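./contributing.sh test is the wrapper mentioned under "Start here"; running pytest directly assumes a standard pytest layout.

```bash
./contributing.sh test
# or
pytest -q
```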
Targeted runs:
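For example, filtering by keyword or file (the keyword and path below are placeholders):

```bash
pytest -k opencl                  # only tests whose names match "opencl"
pytest tests/test_matmul.py -q    # hypothetical path; adjust to the actual test files
```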
MIT © UHOP Systems
Tags: gpu, compiler, rocm, cuda, opencl, metal, hpc, mlops, deep-learning, open-hardware