Show HN: We made our own inference engine for Apple Silicon


uzu is a high-performance inference engine for AI models on Apple Silicon. Key features:

  • Simple, high-level API
  • Hybrid architecture, where layers can be computed as GPU kernels or via MPSGraph (a low-level API beneath CoreML with ANE access)
  • Unified model configurations, making it easy to add support for new models
  • Traceable computations to ensure correctness against the source-of-truth implementation
  • Utilizes unified memory on Apple devices

First, add the uzu dependency to your Cargo.toml:

[dependencies]
uzu = { git = "https://github.com/trymirai/uzu", branch = "main", package = "uzu" }

Then, create an inference Session with a specific model and configuration:

use std::path::PathBuf;

use uzu::{
    backends::metal::sampling_config::SamplingConfig,
    session::{
        session::Session, session_config::SessionConfig, session_input::SessionInput,
        session_output::SessionOutput, session_run_config::SessionRunConfig,
    },
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a session from an exported model directory and load it
    // with the default configuration.
    let model_path = PathBuf::from("MODEL_PATH");
    let mut session = Session::new(model_path.clone())?;
    session.load_with_session_config(SessionConfig::default())?;

    // Prepare the prompt and generation settings.
    let input = SessionInput::Text("Tell about London".to_string());
    let tokens_limit = 128;
    let run_config = SessionRunConfig::new_with_sampling_config(
        tokens_limit,
        Some(SamplingConfig::default()),
    );

    // Run generation; the callback receives partial output after each step
    // and returns true to keep generating.
    let output = session.run(input, run_config, Some(|_: SessionOutput| true));
    println!("{}", output.text);

    Ok(())
}
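The closure passed to run is a progress callback: it is invoked with the partial SessionOutput as generation proceeds, and its boolean return value tells the engine whether to continue (true) or stop early (false). As a minimal sketch, assuming the partial SessionOutput exposes the generated text the same way the final output does, you could stream output to the terminal like this:

use std::io::Write;

// Streaming sketch (assumption: partial outputs expose `text`).
let output = session.run(
    input,
    run_config,
    Some(|partial: SessionOutput| {
        print!("\r{}", partial.text);
        std::io::stdout().flush().ok();
        true // return false here to stop generation early
    }),
);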

For a detailed explanation of the architecture, please refer to the documentation.

uzu uses its own model format. To export a model, use lalamo. First, get the list of supported models:

uv run lalamo list-models

Then, export the one you need:

uv run lalamo convert meta-llama/Llama-3.2-1B-Instruct --precision float16
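The conversion produces a model directory in uzu's format; its path is what you pass as MODEL_PATH when creating a Session (the exact output location depends on lalamo).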

Alternatively, you can download a pre-exported model using the sample script:

./scripts/download_test_model.sh $MODEL_PATH

You can also run uzu in CLI mode:

cargo run --release -p cli -- help
Usage: uzu_cli [COMMAND]

Commands:
  run    Run a model with the specified path
  serve  Start a server with the specified model path
  help   Print this message or the help of the given subcommand(s)
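For example, to run a model you exported or downloaded earlier (assuming, per the usage text above, that the run subcommand takes the model path as its argument):

cargo run --release -p cli -- run $MODEL_PATH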
If you want to use uzu from Swift, check out uzu-swift, a prebuilt Swift framework ready to use with SPM.

This project is licensed under the MIT License. See the LICENSE file for details.
