Current AI tool-use models are broken. They lock you into centralized platforms, compromise your privacy, and limit your control. We propose a new client-centric paradigm called Orakle.
Orakle decouples tool execution from the LLM, running “skills” rather than “tools” (they are no longer external to the system) in a local, user-controlled environment. Through a novel hybrid matching system, it achieves superior reliability. We present benchmark data proving that Orakle’s architecture enables flawless skill selection across all major LLMs.
Crucially, we demonstrate that a self-hosted model on consumer hardware can compete with — and in some cases, outperform — commercial cloud APIs in both speed and accuracy. This is the technical foundation for a new generation of private, powerful, and user-owned AI applications.
This document is the proof.
The dominant paradigm for LLM tool use, often called “function calling,” is a gilded cage. While convenient, it forces developers and users into a model that benefits the platform, not them.
- Your Data is Not Your Own: To use a tool, your query, and often your personal data, are sent to a third-party server for processing. This is a fundamental privacy violation.
- You are Locked In: The tool-calling mechanism is tied to the provider’s specific API. Switching models means re-engineering a core part of your application, or relying on a third-party abstraction library while standards are still evolving quickly, which compromises real compatibility.
- You Have No Control: The execution environment is a black box. You cannot control its security, its dependencies, or its configuration. This is unacceptable for tools that touch sensitive internal systems.
- You Pay Per-Use, Forever: Every tool call is another metered event, adding to a perpetually growing bill.
This centralized model is fundamentally at odds with the promise of a decentralized, user-owned internet.
Orakle flips the model. Instead of the LLM provider dictating tool execution, control is returned to the client application.
The process is simple and transparent:
- The user issues a command. The LLM, prompted by the Ainara Pybridge (chat) server, determines that a skill is needed and outputs a natural language request in a HEREDOC format, e.g. “<<<ORAKLE\nwhat is the weather in London\nORAKLE”.
- The Orakle Middleware on the client intercepts this request (a minimal sketch follows this list).
- It uses the Orakle Hybrid Matcher (a fast embedding-based pre-filter followed by LLM-based refinement) to reliably identify the exact skill and its required parameters.
- The skill is executed locally in a secure, sandboxed environment on the user’s machine.
- The result is sent back to the LLM, which interprets it in light of the original user query.
- Only the final natural-language answer is injected into the conversation, keeping the context clean and free of raw data.
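
A minimal sketch of the interception step, assuming the HEREDOC delimiters shown above (the function name and everything beyond the delimiters are illustrative, not the actual Ainara API):

```python
import re

# Matches a skill request of the form:
#   <<<ORAKLE
#   what is the weather in London
#   ORAKLE
# The delimiter spelling follows the example above; the function name
# and surrounding logic are illustrative, not the actual Ainara API.
ORAKLE_BLOCK = re.compile(r"<<<ORAKLE\s*\n(.*?)\nORAKLE", re.DOTALL)

def intercept(llm_output: str) -> str | None:
    """Return the natural-language skill request embedded in LLM output, if any."""
    match = ORAKLE_BLOCK.search(llm_output)
    return match.group(1).strip() if match else None

# intercept("Sure.\n<<<ORAKLE\nwhat is the weather in London\nORAKLE")
# -> "what is the weather in London"
```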
The LLM is used for what it’s best at — language understanding and generation — while the critical execution and data handling happens in a trusted environment controlled by the user.
Figure 1: High-Level Architecture of the Orakle Framework
The “magic” of Orakle is its two-phase skill matching process, which makes it incredibly robust and model-agnostic.
- Phase 1: Semantic Pre-selection: The natural language query from the LLM is passed to a local transformer model. This model performs a rapid semantic search against the descriptions of all available skills, instantly producing a short list of the most relevant candidates. This is fast, efficient, and weeds out 99% of irrelevant options.
- Phase 2: LLM-based Refinement: This short list of candidate skills, along with their detailed schemas, is presented to the LLM in a structured prompt. The LLM’s task is now trivial: select the single best match from a handful of options and extract the parameters.
This hybrid approach combines the raw speed of semantic search with the nuanced reasoning of an LLM, resulting in near-perfect accuracy. As our data shows, it just works.
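
To make the two phases concrete, here is a minimal sketch assuming sentence-transformers for the local embedding model; the skill catalog, model choice, and prompt wording are illustrative, not the production matcher:

```python
from sentence_transformers import SentenceTransformer, util

# Phase 1: semantic pre-selection with a small local embedding model.
# The model choice and skill catalog are illustrative.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

skills = {
    "weather.get_forecast": "Fetch the current weather or forecast for a city.",
    "calc.evaluate": "Evaluate an arithmetic or mathematical expression.",
    "files.search": "Search the local filesystem for files matching a pattern.",
}
skill_vecs = encoder.encode(list(skills.values()), convert_to_tensor=True)

def preselect(query: str, top_k: int = 3) -> list[str]:
    """Rank every skill description against the query; keep the best few."""
    query_vec = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, skill_vecs)[0]
    ranked = sorted(zip(skills, scores.tolist()), key=lambda x: x[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Phase 2: hand only the short list (with schemas) to the LLM, whose job
# is now a trivial pick-one-and-extract-parameters task.
def refinement_prompt(query: str, candidates: list[str]) -> str:
    listing = "\n".join(f"- {name}: {skills[name]}" for name in candidates)
    return (
        f"User request: {query}\n"
        f"Candidate skills:\n{listing}\n"
        "Reply with the single best skill name and its parameters as JSON."
    )
```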
We benchmarked the Orakle framework with a suite of tests ranging from simple calculations to complex, ambiguous queries. We used a variety of commercial cloud APIs and, critically, a self-hosted model running on a consumer-grade gaming PC.
Test Suites: general_skills (simple, one-shot tasks) and complex_queries (tasks requiring disambiguation and complex parameter extraction).
Metrics: Skill Selection Correctness, Parameter Extraction Score, Interpretation Score, Overall Success Rate, and Average End-to-End Duration.
Local Setup: A quantized Qwen 2.5 Coder model running via llama.cpp on a PC with an NVIDIA RTX 3060 GPU.
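
The full benchmark harness is not reproduced here, but the local setup is easy to approximate: llama.cpp’s built-in server exposes an OpenAI-compatible endpoint, so the same client code can drive both the cloud APIs and the local model. A minimal sketch, with the port, key, and model alias as assumptions:

```python
from openai import OpenAI

# llama.cpp's built-in server speaks the OpenAI chat API, so the benchmark
# client needs no special casing for the local model. Port, key, and model
# alias below are assumptions, not the actual harness configuration.
local = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

reply = local.chat.completions.create(
    model="qwen2.5-coder",  # alias configured when launching the server
    messages=[{"role": "user", "content": "what is the weather in London"}],
)
print(reply.choices[0].message.content)
```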
The data speaks for itself.
- Skill Selection is a Solved Problem: The Orakle Hybrid Matcher achieved a 100% success rate in selecting the correct skill across every model and every test case. The architecture is fundamentally sound.
- Performance is a Trade-off: The fastest models (Haiku, Gemini) proved highly effective, making the latency of the client-centric approach a non-issue for interactive applications.
- The Local Contender: The most important result is local/qwen-2.5-coder-rtx3060. This is the proof.
Let’s look closer at the “local/qwen-2.5-coder-rtx3060” result. This isn’t just a participant; it’s a top-tier competitor.
- It achieved a 75% success rate, matching the commercial options gpt-4.1-mini and deepseek-chat.
- Its average speed of 7.23 seconds is more than twice as fast as Deepseek’s API (15.64s).
- It achieved one of the highest Interpretation Scores (93%), meaning it’s excellent at producing human-friendly output.
This data proves that a user can achieve a combination of benefits that large corporations cannot offer:
- Total Privacy & Control: The model, the data, and the tools never leave the user’s machine.
- Competitive Performance: The speed and reliability are in the same league as premium, cloud-based services.
- Zero Per-Use Cost: After the one-time hardware cost, execution is free.
- Magic Hot-Swap: Because the technology is no longer attached to any server-side architecture, you can swap the LLM provider mid-conversation and bring every skill with you (sketched below).
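
A sketch of what that hot-swap amounts to in practice, assuming OpenAI-compatible endpoints on both sides (endpoints, keys, and model names are illustrative):

```python
import os
from openai import OpenAI

# Because the matcher and every skill run on the client, swapping the LLM
# provider is just a client swap. Endpoints and model names are illustrative.
PROVIDERS = {
    "cloud": {
        "base_url": "https://api.openai.com/v1",
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "model": "gpt-4.1-mini",
    },
    "local": {
        "base_url": "http://localhost:8080/v1",
        "api_key": "not-needed",  # the local server ignores the key
        "model": "qwen2.5-coder",
    },
}

def make_client(name: str) -> tuple[OpenAI, str]:
    cfg = PROVIDERS[name]
    return OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"]), cfg["model"]

# Mid-conversation swap: the history, the matcher, and all local skills
# carry over untouched; only the text generator behind them changes.
client, model = make_client("cloud")
# ... later in the same conversation:
client, model = make_client("local")
```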
The Orakle framework makes local-first AI not just a dream for privacy advocates, but a practical, high-performance reality.
Orakle is more than just a clever piece of engineering. It is the foundational engine for the Ainara Protocol.
While this manifesto focuses on the technical proof, Orakle is the engine that will power AI-Driven Applications (AID Apps) — a new class of user-owned software detailed in our main whitepaper. Orakle proves that the core of our vision is not just possible, but already built and benchmarked.
We are building a new, open operating system for AI, and it starts with giving control back to the user and the developer. Orakle is how we do it.
The era of centralized, walled-garden AI is over before it has truly begun. A new, open paradigm is possible.
- Explore the Code: https://github.com/khromalabs/ainara
- Read the Grand Vision: https://ainara.app/AINARA_AID_PLATFORM_WHITEPAPER_V1.pdf
- Join the Community: https://discord.com/invite/WqRAwXKN9u