Investing.com -- Google (NASDAQ:GOOGL) DeepMind has announced two new artificial intelligence models, Gemini Robotics and Gemini Robotics-ER, both built on its Gemini 2.0 model. They are intended to lay the groundwork for the next generation of practical robots.
Gemini Robotics is an advanced vision-language-action (VLA) model that extends Gemini 2.0 to include physical actions, enabling direct control of robots. Gemini Robotics-ER enhances Gemini’s embodied reasoning (ER) abilities, offering advanced spatial understanding on which roboticists can run their own programs.
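The VLA pattern described here can be pictured as a simple perceive-and-act loop: a camera image and a natural-language instruction go in, low-level robot actions come out. The sketch below is purely illustrative; none of the class or method names come from Google DeepMind's release.

```python
# Hypothetical sketch of a vision-language-action (VLA) control loop.
# All names here are assumptions made for illustration only.

from dataclasses import dataclass

@dataclass
class Observation:
    camera_image: bytes   # raw frame from the robot's camera
    joint_positions: list # current joint state of the robot

class VLAModel:
    """Stand-in for a model like Gemini Robotics: maps an observation
    plus a natural-language instruction to a short chunk of actions."""
    def predict_actions(self, obs: Observation, instruction: str) -> list:
        raise NotImplementedError  # served by the real model in practice

def control_loop(model: VLAModel, robot, instruction: str):
    # The model closes the loop: perceive, act, then re-perceive.
    while not robot.task_done():
        obs = Observation(robot.camera_frame(), robot.joint_state())
        for action in model.predict_actions(obs, instruction):
            robot.apply(action)
```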
The new models are designed to allow a wide variety of robots to perform a broader range of real-world tasks. Google DeepMind is collaborating with Apptronik to create the next generation of humanoid robots using Gemini 2.0. Additionally, they are working with a select group of trusted testers to guide the development of Gemini Robotics-ER.
To be effective and beneficial to people, AI models for robotics need to be general, interactive, and dexterous. Gemini Robotics has made significant progress in all these areas, bringing us closer to truly multipurpose robots.
Gemini Robotics uses Gemini’s world understanding to generalize to new situations and solve a wide range of tasks. It is also skilled at handling new objects, diverse instructions, and new environments. The model is interactive due to its foundation on Gemini 2.0, allowing it to understand and respond to commands in everyday, conversational language. It can also adapt its behavior based on changes in its environment or instructions.
Gemini Robotics can perform complex, multi-step tasks that require precise manipulation, such as origami folding or packing a snack into a Ziploc bag. The model has been designed to adapt to different types of robots, with training primarily on data from the bi-arm robotic platform, ALOHA 2.
The Gemini Robotics-ER model enhances Gemini’s understanding of the world in ways necessary for robotics, focusing especially on spatial reasoning. It improves Gemini 2.0’s existing abilities, such as pointing and 3D detection, by a large margin. Gemini Robotics-ER can perform all the steps necessary to control a robot right out of the box, including perception, state estimation, spatial understanding, planning, and code generation.
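Taken together, those steps form a pipeline from pixels to executable robot code. The following sketch shows one plausible shape for such a pipeline; the function names are assumptions for illustration, not Gemini Robotics-ER's actual API.

```python
# Hypothetical sketch of the out-of-the-box pipeline listed above:
# perception, state estimation, spatial understanding, planning, and
# code generation. Every method name is illustrative only.

def run_er_pipeline(model, camera_frame, instruction):
    # 1. Perception: detect objects and estimate their poses in the frame.
    scene = model.detect_objects(camera_frame)

    # 2. State estimation + spatial understanding: where things sit
    #    relative to the robot (e.g. grasp points, free space).
    layout = model.spatial_relations(scene)

    # 3. Planning: break the instruction into executable steps.
    plan = model.plan(instruction, layout)

    # 4. Code generation: emit robot-control code for each step, which
    #    a roboticist's own stack then executes.
    return [model.generate_code(step) for step in plan]
```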
Google DeepMind is taking a holistic approach to addressing safety in their research, from low-level motor control to high-level semantic understanding. They are also releasing a new dataset to evaluate and improve semantic safety in embodied AI and robotics, and have developed a framework to automatically generate data-driven constitutions (rules expressed directly in natural language) to steer a robot’s behavior.
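One way to picture a natural-language constitution in practice is as a filter that checks each proposed action against the rules before execution. The rules and the judge call below are assumptions made for illustration, not DeepMind's actual framework.

```python
# Hypothetical sketch of steering behavior with a natural-language
# "constitution". The rules and the judge_model interface are invented
# for this example.

CONSTITUTION = [
    "Do not apply force to a person.",
    "Do not pick up objects marked fragile without confirmation.",
]

def permitted(judge_model, proposed_action: str) -> bool:
    # Ask a language model to judge the action against each rule;
    # reject on the first violation.
    for rule in CONSTITUTION:
        verdict = judge_model.ask(
            f"Rule: {rule}\nProposed action: {proposed_action}\n"
            "Does the action violate the rule? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            return False
    return True
```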
Google DeepMind is collaborating with experts in their Responsible Development and Innovation team as well as their Responsibility and Safety Council to assess the societal implications of their work. They are also consulting with external specialists on the challenges and opportunities presented by embodied AI in robotics applications.
The Gemini Robotics-ER model is also available to trusted testers including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools. Google DeepMind looks forward to exploring the capabilities of these models and continuing to develop AI for the next generation of more helpful robots.
This article was generated with the support of AI and reviewed by an editor. For more information see our T&C.