The Orakle Manifesto: Why Your AI Apps (Should) Belong to You


Rubén Gómez

Current AI tool-use models are broken. They lock you into centralized platforms, compromise your privacy, and limit your control. We propose a new client-centric paradigm called Orakle.

Orakle decouples tool execution from the LLM and runs “skills” in a local, user-controlled environment. We call them “skills” rather than “tools” because they are no longer external to the system. Through a novel hybrid matching system, Orakle achieves superior reliability. We present benchmark data proving that its architecture enables flawless skill selection across all major LLMs.

Crucially, we demonstrate that a self-hosted model on consumer hardware can compete with — and in some cases, outperform — commercial cloud APIs in both speed and accuracy. This is the technical foundation for a new generation of private, powerful, and user-owned AI applications.

This document is the proof.

The dominant paradigm for LLM tool use, often called “function calling,” is a gilded cage. While convenient, it forces developers and users into a model that benefits the platform, not them.

Your Data is Not Your Own: To use a tool, your query and often your personal data are sent to a third-party server for processing. This is a fundamental privacy violation.
You are Locked In: The tool-calling mechanism is tied to the provider’s specific API. Switching models means re-engineering a core part of your application, or relying on a third-party library while standards are still evolving rapidly, which compromises real compatibility.
You Have No Control: The execution environment is a black box. You cannot control its security, its dependencies, or its configuration. This is unacceptable for skills that touch sensitive internal systems.
You Pay Per-Use, Forever: Every tool call is another metered event, adding to a perpetually growing bill.

This centralized model is fundamentally at odds with the promise of a decentralized, user-owned internet.

Orakle flips the model. Instead of the LLM provider dictating tool execution, control is returned to the client application.

The process is simple and transparent:

  1. The user issues a command. The LLM, prompted by the Ainara Pybridge (chat) server, determines a skill is needed and outputs a natural language request in a HEREDOC-style format, e.g.: “<<<ORAKLE \n what is the weather in London \n ORAKLE”
  2. The Orakle Middleware on the client intercepts this request.
  3. It uses the Orakle Hybrid Matcher (a fast embedding-based pre-filter followed by an LLM refinement step) to reliably identify the exact skill and its required parameters.
  4. The skill is executed locally in a secure, sandboxed environment on the user’s machine.
  5. The result is sent back to the LLM for interpretation into natural language.
  6. Only the LLM’s final answer, interpreted according to the user’s query, is injected back into the conversation, keeping the context clean and free of raw data.

The LLM is used for what it’s best at — language understanding and generation — while the critical execution and data handling happens in a trusted environment controlled by the user.
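
To make the interception step concrete, here is a minimal sketch of how middleware might extract a HEREDOC request from raw LLM output, assuming the delimiter format shown above. The regex and function names are illustrative assumptions, not the actual Ainara/Orakle implementation.

```python
import re

# Matches the HEREDOC-style block described above:
# <<<ORAKLE ... ORAKLE, with the delimiters on their own lines.
ORAKLE_BLOCK = re.compile(r"<<<ORAKLE\s*\n(.*?)\n\s*ORAKLE", re.DOTALL)

def intercept(llm_output: str) -> str | None:
    """Return the natural-language skill request, or None if there is none."""
    match = ORAKLE_BLOCK.search(llm_output)
    return match.group(1).strip() if match else None

print(intercept("Checking.\n<<<ORAKLE\nwhat is the weather in London\nORAKLE"))
# -> what is the weather in London
```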

Fig. 1: High-Level Architecture of the Orakle Framework

The “magic” of Orakle is its two-phase skill matching process, which makes it incredibly robust and model-agnostic.

  • Phase 1: Semantic Pre-selection: The natural language query from the LLM is passed to a local transformer model. This model performs a rapid semantic search against the descriptions of all available skills, instantly producing a short list of the most relevant candidates. This is fast, efficient, and weeds out 99% of irrelevant options.
  • Phase 2: LLM-based Refinement: This short list of candidate skills, along with their detailed schemas, is presented to the LLM in a structured prompt. The LLM’s task is now trivial: select the single best match from a handful of options and extract the parameters.

This hybrid approach combines the raw speed of semantic search with the nuanced reasoning of an LLM, resulting in near-perfect accuracy. As our data shows, it just works.
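
As an illustration of the two phases, the sketch below implements a tiny version of this pipeline with the sentence-transformers library. The skill registry, embedding model, and top-k cutoff are assumptions made for the example; Orakle’s actual registry and prompt format may differ.

```python
from sentence_transformers import SentenceTransformer, util

# Toy skill registry; real skills would carry full parameter schemas.
SKILLS = {
    "weather.current": "Get the current weather conditions for a given city.",
    "calc.evaluate": "Evaluate a simple arithmetic expression.",
    "news.headlines": "Fetch today's top news headlines for a topic.",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small local transformer
skill_ids = list(SKILLS)
skill_vecs = encoder.encode(list(SKILLS.values()), convert_to_tensor=True)

def preselect(query: str, top_k: int = 2) -> list[str]:
    """Phase 1: rapid semantic search over skill descriptions."""
    q_vec = encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_vec, skill_vecs, top_k=top_k)[0]
    return [skill_ids[hit["corpus_id"]] for hit in hits]

def refinement_prompt(query: str, candidates: list[str]) -> str:
    """Phase 2: structured prompt from which the LLM picks one skill."""
    listing = "\n".join(f"- {c}: {SKILLS[c]}" for c in candidates)
    return (
        "Select the single best skill for this request and extract its "
        f"parameters as JSON.\nRequest: {query}\nCandidates:\n{listing}"
    )

print(refinement_prompt("what is the weather in London",
                        preselect("what is the weather in London")))
```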

We benchmarked the Orakle framework with a suite of tests ranging from simple calculations to complex, ambiguous queries. We used a variety of commercial cloud APIs and, critically, a self-hosted model running on a consumer-grade gaming PC.

Test Suites: general_skills (simple, one-shot tasks) and complex_queries (tasks requiring disambiguation and complex parameter extraction).
Metrics: Skill Selection Correctness, Parameter Extraction Score, Interpretation Score, Overall Success Rate, and Average End-to-End Duration.
Local Setup: A quantized Qwen 2.5 Coder model running via llama.cpp on a PC with an NVIDIA RTX 3060 GPU.
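
For reference, a comparable local setup can be reproduced in a few lines with the llama-cpp-python bindings. The specific GGUF file name, quantization, and context size below are assumptions, not the exact benchmark configuration.

```python
from llama_cpp import Llama

# Load a quantized Qwen 2.5 Coder GGUF (file name and settings assumed).
llm = Llama(
    model_path="qwen2.5-coder-7b-instruct-q4_k_m.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU (fits on an RTX 3060)
    n_ctx=8192,
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the weather in London?"}],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```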

The data speaks for itself.

Fig. 2: Orakle test suite results against local and commercial providers

  1. Skill Selection is a Solved Problem: The Orakle Hybrid Matcher achieved a 100% success rate in selecting the correct skill across every model and every test case. The architecture is fundamentally sound.
  2. Performance is a Trade-off: The fastest models (Haiku, Gemini) proved highly effective, making the latency of the client-centric approach a non-issue for interactive applications.
  3. The Local Contender: The most important result is local/qwen-2.5-coder-rtx3060. This is the proof.

Let’s look closer at the “local/qwen-2.5-coder-rtx3060” result. This isn’t just a participant; it’s a top-tier competitor.

  • It achieved a 75% success rate, matching the commercial options gpt-4.1-mini and deepseek-chat.
  • Its average speed of 7.23 seconds is more than twice as fast as Deepseek’s API (15.64s).
  • It achieved one of the highest Interpretation Scores (93%), meaning it’s excellent at producing human-friendly output.

This data proves that a user can achieve four benefits that large corporations cannot offer:

  1. Total Privacy & Control: The model, the data, and the tools never leave the user’s machine.
  2. Competitive Performance: The speed and reliability are in the same league as premium, cloud-based services.
  3. Zero Per-Use Cost: After the one-time hardware cost, execution is free.
  4. Magic Hot-Swap: Because skills are no longer attached to any server-side architecture, you can swap LLM providers mid-conversation and bring all your skills/tools with you, as sketched below.
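
The sketch below illustrates the hot-swap idea: because the skill registry and conversation history live on the client, switching providers reduces to pointing at a different OpenAI-compatible endpoint. The class, endpoints, and key values are illustrative assumptions, not Orakle’s actual API.

```python
from openai import OpenAI

class OrakleClient:
    """Illustrative client: skills and history live locally, so the
    LLM provider is just a replaceable endpoint."""

    def __init__(self, base_url: str, api_key: str, model: str):
        self.skills = {}   # local skill registry, survives provider swaps
        self.history = []  # conversation context, survives provider swaps
        self.swap_provider(base_url, api_key, model)

    def swap_provider(self, base_url: str, api_key: str, model: str):
        """Point the same conversation at another OpenAI-compatible API."""
        self.llm = OpenAI(base_url=base_url, api_key=api_key)
        self.model = model

# Start on a commercial API, then hot-swap to a local llama.cpp server
# mid-conversation; nothing on the client side has to be rebuilt.
client = OrakleClient("https://api.deepseek.com", "sk-...", "deepseek-chat")
client.swap_provider("http://localhost:8080/v1", "none", "qwen2.5-coder")
```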

The Orakle framework makes local-first AI not just a dream for privacy advocates, but a practical, high-performance reality.

Orakle is more than just a clever piece of engineering. It is the foundational engine for the Ainara Protocol.

While this manifesto focuses on the technical proof, Orakle is the engine that will power AI-Driven Applications (AID Apps) — a new class of user-owned software detailed in our main whitepaper. Orakle proves that the core of our vision is not just possible, but already built and benchmarked.

We are building a new, open operating system for AI, and it starts with giving control back to the user and the developer. Orakle is how we do it.

The era of centralized, walled-garden AI is over before it has truly begun. A new, open paradigm is possible.
