Isaac 0.1 – Perception for the Physical World

Today, we're introducing Isaac 0.1, our first perceptive-language model and a major step toward building AI systems that can understand and interact with the physical world. Isaac 0.1 is an open-source, 2B-parameter model built for real-world applications. It sets a new standard for efficiency, delivering capabilities that meet or exceed those of models over 50 times its size.

Founded by the team behind Meta's Chameleon multimodal models, Perceptron is tackling a fundamental challenge: bringing the power of physical AI to the dynamic, multimodal, and real-time environments we live and work in.

Isaac 0.1 is the first in our family of models built to be the intelligence layer for the physical world. It's now available open source for researchers and developers everywhere.

The Efficient Frontier of Perception

Perception workloads are continuous, latency-sensitive, and often run near the sensor. Capability alone isn’t enough—you need capability at the efficient frontier, where cost, power, and tail latency meet real-world constraints.

Isaac 0.1 (2B) matches or exceeds the performance of significantly larger models on key perceptive benchmarks while using orders of magnitude fewer weights. That translates to drastically lower serving cost and power, edge-ready tail latencies, and scalable deployment across manufacturing, logistics, security, and robotics—delivering more powerful capabilities to more applications without compromising quality.

What’s new in Isaac 0.1

  • Visual QA, simply trained
    Strong results on standard understanding benchmarks with a straightforward, reproducible training recipe.

  • Grounded spatial intelligence
    Precise pointing and localization with robust spatial reasoning. Ask “what’s broken in this machine?” and get grounded answers with highlighted regions—handling occlusions, relationships, and object interactions.

  • In-context learning for perception
    Show a few annotated examples (defects, safety conditions, etc.) in the prompt and the model adapts; no YOLO-style fine-tuning or custom detector stacks required. A minimal prompting sketch follows this list.

  • OCR & fine-grained detail
    Reads small text and dense scenes reliably, with dynamic resolution handling that preserves tiny features and cluttered layouts.

  • Conversational pointing
    A new interaction pattern where language and vision stay in lockstep: every claim is grounded and visually cited, reducing hallucinations and making reasoning auditable.
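
To make the in-context pattern concrete, here is a minimal sketch of few-shot perception prompting through the Hugging Face transformers chat interface. The repo id (PerceptronAI/Isaac-0.1), the model class, the message schema, and the image files are all assumptions for illustration; the model card documents the actual interface.

```python
# Minimal sketch: few-shot, in-context perception prompting.
# NOTE: the repo id, model class, and message schema below are
# assumptions for illustration -- see the Isaac 0.1 model card
# for the real input format.
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

MODEL_ID = "PerceptronAI/Isaac-0.1"  # assumed Hugging Face repo id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Two annotated in-context examples, then the query image.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("defect_example_1.jpg")},
        {"type": "text", "text": "Example: the scratched bearing housing is a defect."},
        {"type": "image", "image": Image.open("defect_example_2.jpg")},
        {"type": "text", "text": "Example: the discolored weld seam is a defect."},
        {"type": "image", "image": Image.open("query.jpg")},
        {"type": "text", "text": "Point to any defects in this image."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(processor.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```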

Performance

Isaac 0.1 sets a new bar for 2B-class perception models across both standard perceptive benchmarks and internal evaluations tuned to real, economically important tasks. We report results with a single 2B checkpoint and a consistent inference setup; full methodology and ablations are in the technical report.

Grounding & localization

High-precision pointing and region localization, robust to occlusion and clutter—key for reliable automation and HCI.
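
In practice, a grounded reply interleaves prose with machine-readable coordinates that downstream automation can consume. The `<point>` tag syntax below is purely hypothetical (the real output format is specified in the technical report); the sketch only illustrates the kind of lightweight parsing a deployment might layer on top of pointing output.

```python
# Sketch: extracting (label, x, y) points from a grounded reply.
# The <point x=".." y="..">label</point> syntax is hypothetical;
# adapt the pattern to the model's documented output format.
import re

POINT_RE = re.compile(
    r'<point\s+x="(?P<x>[\d.]+)"\s+y="(?P<y>[\d.]+)">(?P<label>.*?)</point>'
)

def extract_points(reply: str) -> list[dict]:
    """Return [{'label': ..., 'x': ..., 'y': ...}] for each cited point."""
    return [
        {"label": m["label"], "x": float(m["x"]), "y": float(m["y"])}
        for m in POINT_RE.finditer(reply)
    ]

reply = 'The loose bolt is here: <point x="412.5" y="233.0">loose bolt</point>.'
print(extract_points(reply))
# -> [{'label': 'loose bolt', 'x': 412.5, 'y': 233.0}]
```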

Visual Question Answering

Leads its size class and is competitive with much larger systems on diagram and real-world benchmarks.

In-context learning

Learns novel categories from a handful of annotated examples in the prompt, matching or surpassing fine-tuned detector baselines without task-specific training.

The results above are highlights; for complete tables, prompts, and evaluation settings, see the technical report.

Try Isaac Now

We believe in building in the open and empowering the community. You can start building with Isaac today.
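
As a starting point, one plausible quickstart is the transformers image-text-to-text pipeline, sketched below. The repo id and image URL are placeholders, and the pipeline wiring is an assumption; the official release notes and model card are the authoritative loading instructions.

```python
# Quickstart sketch -- the repo id and image URL are placeholders.
from transformers import pipeline

pipe = pipeline("image-text-to-text",
                model="PerceptronAI/Isaac-0.1",  # assumed repo id
                trust_remote_code=True)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/factory_floor.jpg"},
        {"type": "text", "text": "What safety issues do you see? Point to each one."},
    ],
}]

out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```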

The Road Ahead

Isaac 0.1 is the beginning. We are focused on pushing the frontier of what’s possible for AI in the physical world—building systems that are more capable, more efficient, and more adaptable to the complexities of real-world environments.

We're working with enterprise customers to deploy Isaac in manufacturing, logistics, and security, and are hard at work on the next generation of our models that will further expand these capabilities.

Get in touch
