Google’s artificial intelligence lab published a new paper explaining the development of a “first-of-its-kind” vision-language-action (VLA) model that learns from scraping the internet and other data, allowing robots to understand plain-language commands from humans while navigating their environments, like the robot from the Disney movie Wall-E or the robot from the late-1990s flick Bicentennial Man.

“For decades, when people have imagined the distant future, they’ve almost always included a starring role for robots,” Vincent Vanhoucke, the head of robotics for Google DeepMind, wrote in a blog post. 

Do you recall Bicentennial Man, the 1999 sci-fi comedy-drama starring Robin Williams?

Vanhoucke continued, “Robots have been cast as dependable, helpful and even charming. Yet across those same decades, the technology has remained elusive — stuck in the imagined realm of science fiction.” 

Until now… 

DeepMind introduced the Robotics Transformer 2 (RT-2), a VLA model that learns from both web and robotics data and translates that knowledge into an understanding of its environment and of human commands.