"Our research investigates Gemini as a visual language model (VLM) and its agentic behaviors in diverse environments from robustness and safety perspectives. So far, we have evaluated Gemini's robustness against distractions such as pop-up windows when VLM agents perform computer tasks, and have leveraged Gemini to analyze social interaction, temporal events as well as risk factors based on video input."

"Gemini Pro and Flash, with their long context window, have been helping us in OK-Robot, our open-vocabulary mobile manipulation project. Gemini enables complex natural language queries and commands over the robot's "memory": in this case, previous observations made by the robot over a long operation duration. Mahi Shafiullah and I are also using Gemini to decompose tasks into code that the robot can execute in the real world."