DeepMind Introduces Gemini Robotics 1.5: The “Think First” AI to Bring Agents into the Physical World

Google DeepMind has made a qualitative leap in its robotics efforts with Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, two models that work together so a robot can perceive, plan, think, use tools, and act in complex, multi-step physical tasks. The company describes the launch as a foundational step, with a clear focus on transparency, safety, and generalization across different robotic “bodies”.

The approach is simple to state but hard to achieve: when asked to sort laundry by color or to classify waste according to local regulations, recognizing objects isn’t enough. The robot must understand the context, consult relevant information (for example, searching the municipal recycling guidelines online), break the task into steps, and execute them with robust motion control. For this, DeepMind presents two “brains” that share the workload: a deliberative planner and a reflective executor.

Two models, one agent: thinking and acting with transparency

  • Gemini Robotics-ER 1.5 (VLM) is the embodied reasoning model. It acts as the “high-level brain”: it plans in natural language, makes logical decisions in physical environments, and natively calls tools (such as search) to gather external data or invoke user-defined functions. It also estimates progress and success probability, and achieves state-of-the-art results across a suite of 15 academic benchmarks for spatial reasoning inspired by real-world cases.
  • Gemini Robotics 1.5 (VLA) is the vision-language-action model. It translates the planner’s instructions into motor commands, guiding each step with visual input. The key innovation: it thinks before acting. The model generates an internal natural-language reasoning sequence, a dialogue with itself, that explains its process and improves performance on semantically complex tasks. This provides explainability: the system can show how each decision was reached.

Together, this agentic framework (reason, plan, act, with tool use) improves generalization on longer tasks and in diverse environments, moving past the classic “one instruction, one move” pattern of previous generations.

From everyday examples to complexity: from laundry baskets to recycling centers

DeepMind illustrates the approach with relatable scenarios. If asked to “sort my laundry by colors”, the planner understands that whites go into one container and colors into another; it breaks the task into steps (locate the red garment, bring it closer, grasp it, deposit it in the black bin) and validates progress. If the task is “classify these objects as organic, recyclable, or other based on my location”, the system checks local regulations, identifies the objects, and executes the movements needed to complete the process. In both cases, the VLA “thinks” through micro-strategies before moving the robot, breaking long tasks into shorter, safer, and more reliable segments.
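The plan-then-act loop described above can be sketched as a simple control loop. Everything below (the `Planner`/`Executor` split, the function names, the hard-coded laundry steps) is a hypothetical illustration of the architecture described in the article, not DeepMind’s actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Planner:  # stands in for the role of Gemini Robotics-ER 1.5
    def decompose(self, mission: str) -> list[str]:
        # A real planner reasons in natural language and can call tools;
        # here we hard-code the laundry example from the text.
        if "laundry" in mission:
            return [
                "locate the red garment",
                "bring it closer",
                "grasp it",
                "deposit it in the black bin",
            ]
        return [mission]

@dataclass
class Executor:  # stands in for the role of Gemini Robotics 1.5 (the VLA)
    trace: list[str] = field(default_factory=list)

    def act(self, step: str) -> bool:
        # The real VLA "thinks before acting"; recording that inner
        # reasoning as readable text is what enables auditing later.
        self.trace.append(f"thinking: how to '{step}' given the current view")
        self.trace.append(f"acting: {step}")
        return True  # success signal fed back to the planner

def run(mission: str) -> list[str]:
    planner, executor = Planner(), Executor()
    for step in planner.decompose(mission):
        if not executor.act(step):
            break  # a real system would replan on failure
    return executor.trace
```

The point of the sketch is the shape of the loop: the planner owns decomposition and progress checks, while the executor owns per-step reasoning and motion, leaving a readable trace behind.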

Transfer across “bodies”: learn once, perform many

A classic obstacle in robotics has been that each platform — with different kinematics, sensors, and degrees of freedom — requires its own models. DeepMind claims that Gemini Robotics 1.5 demonstrates remarkable cross-embodiment learning: skills trained on one system (e.g., the bimanual ALOHA 2) can work on other form factors such as the humanoid Apollo or the bi-arm Franka, without specialized tuning. This transfer accelerates skill learning and reduces deployment costs by reusing policies across robots.

What’s available and for whom

  • Gemini Robotics-ER 1.5: available now to developers via the Gemini API in Google AI Studio. It can generate detailed plans and action sequences for robotics projects, with a configurable “thinking budget” that balances latency against quality.
  • Gemini Robotics 1.5: the action model is limited to select partners and trusted testers. It is the component that controls real robots, and it requires stricter safety validation and accountability before a broader release.

This availability gap makes sense: planning is less risky than moving hardware in uncontrolled environments. Still, opening the ER planner to the community already enables exploration of physically capable agents with greater deliberation and explainability.

“Think before acting”: precision, latency, and explainability

Intermediate thinking improves success rates on complex tasks, but it costs time. The system therefore exposes a tunable “thinking budget”: more deliberation for long missions (e.g., packing a suitcase based on the weather forecast), less for reactive actions (open/close, pick/place). The traceability of the reasoning, accessible as readable text, is key for auditing decisions, debugging failures, and aligning behavior with safety norms and dialogue policies.

Safety and responsibility: ASIMOV and layered controls

DeepMind pairs the launch with advances in semantic safety and alignment. Development has been overseen by internal responsibility teams and a Safety Council, and the ASIMOV benchmark has been updated to evaluate safety understanding and adherence to physical constraints, with better coverage of edge cases, new question types, and a video modality. In these assessments, Gemini Robotics-ER 1.5 shows top-tier performance, and its “thinking” capacity helps it better assess risks and respect safety limits.

In operation, Gemini Robotics 1.5 incorporates a holistic layered safety approach: reasoning about safety at the semantic level before acting, aligning dialogues with existing policies, and triggering onboard low-level subsystems (e.g., collision avoidance) when needed. The philosophy: prevent at high level, mitigate at low level.

What this means for the robotics community

  1. Clear agent architecture: deliberative planner + reflective executor, both multimodal and based on the Gemini family.
  2. Generalization: increased robustness in long tasks and diverse environments; transferability across platforms without specific tuning.
  3. Native tools: the ER invokes searches and external functions to expand action space without retraining.
  4. Governance: readable natural language reasoning that facilitates audits, validations, and potentially certification for use in sensitive environments.
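The tool use in point 3 follows the familiar function-calling pattern: the planner emits a structured tool request, the host executes it, and the result is fed back into the plan. This is a generic sketch with invented tool names and request format, not the Gemini API’s actual interface.

```python
# Generic function-calling dispatch: the planner requests a tool by
# name, the host runs it and returns the result. The tool name, stub
# result, and request shape are all hypothetical.

def search_recycling_rules(city: str) -> str:
    # Stub standing in for a real web search or municipal API lookup.
    return f"{city}: glass, paper, and plastic go in the yellow bin"

TOOLS = {"search_recycling_rules": search_recycling_rules}

def handle(request: dict) -> str:
    """Dispatch one tool call requested by the planner."""
    fn = TOOLS[request["name"]]
    return fn(**request["args"])
```

The practical payoff noted in the list: new capabilities arrive by registering a function in `TOOLS`, not by retraining the model.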

A milestone… with grounded expectations

The rhetoric is ambitious (generalist physical agents), but the team and the community agree on two realities:

  • Concrete progress in physical reasoning, planning, and explainability, with improved transfer across robots.
  • Open challenges remain in fine motor skills, robustness outside the lab, operational safety, and few-shot learning in chaotic environments.

In other words: no household robots folding clothes tomorrow, but better performance wherever multi-step planning, spatial understanding, and process transparency make a difference: lightweight logistics, labs, healthcare, flexible manufacturing, or structured service interactions.

Getting started today: safe experimentation

Those experimenting with robotics can start with Gemini Robotics-ER 1.5: send images of the environment, request step-by-step plans, and adjust the thinking budget. Best practices include adding an interpreter that reviews actions before execution, simulating in a digital twin, and measuring everything end-to-end (p95/p99 latencies, jitter, sub-step success rates). When the action model becomes widely available, these principles, along with safety safeguards, will form the foundation of responsible operation.
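The end-to-end measurement suggested above is easy to bootstrap with a small percentile helper. This sketch assumes latency samples collected per sub-step, in milliseconds, and uses the nearest-rank percentile definition; "jitter" is taken here as the standard deviation of the samples, one of several reasonable definitions.

```python
import statistics

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in [0, 100]) of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_report(samples_ms: list[float]) -> dict[str, float]:
    # pstdev = population standard deviation, used here as "jitter".
    return {
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "jitter": statistics.pstdev(samples_ms),
    }
```

For example, `latency_report([120.0, 130.0, 125.0, 900.0, 128.0])` makes the tail outlier visible in p95/p99 even though the mean looks acceptable, which is exactly why tail percentiles, not averages, belong in a robotics latency budget.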

Positioning in the race for physical agents

Universities and companies compete for the agent that combines perception, language, and control. DeepMind’s proposal leverages the multimodal maturity of Gemini and data specialization for the physical world. The ceiling of the system will depend on the quality and diversity of multi-robot data, realistic safety metrics, and hardware costs capable of scaling beyond demos.

A step towards AGI… with traceability

The team frames Gemini Robotics 1.5 as a foundational step towards robots capable of reasoning and generalizing in complex environments. The difference lies not only in accuracy but in explainability: showing the chain of reasoning before moving the arm. For an industry that must certify behaviors and assume risks, this traceability can be as crucial as precision.


Frequently Asked Questions

What is Gemini Robotics-ER 1.5 and how is it used in robotics?
It’s the embodied reasoning model that plans in natural language, calls tools (like search), and orchestrates mission steps. It helps a robot understand context, estimate progress, and select strategies before acting.

How does Gemini Robotics 1.5 differ from a traditional VLA?
It’s a VLA that “thinks before acting”: it generates legible internal reasoning explaining how each sub-step is addressed, segments long tasks, and uses vision to guide movement. This explainability enhances robustness and simplifies audits.

Can I use these models today in a project?
The planner Gemini Robotics-ER 1.5 is available via API for developers; the action model Gemini Robotics 1.5 is restricted to partners and trusted testers. A practical approach is to prototype with the ER and validate in simulation before deploying on a real robot.

What are the safety advances and how are they evaluated?
DeepMind has updated the ASIMOV benchmark to assess semantic and physical safety, reporting top-tier performance. Additionally, the VLA integrates high-level safety reasoning and onboard subsystems (e.g., collision avoidance).

Can the system learn on one robot and deploy on another?
Yes: it demonstrates transfer between embodiments (e.g., from ALOHA 2 to Apollo or the bi-arm Franka) without dedicated tuning, speeding up skill learning and reducing deployment costs.


via: Noticias inteligencia artificial
