Google has introduced the Gemini 2.5 Computer Use model, which developers previously got a sneak peek at through Project Mariner. The model is designed to operate graphical user interfaces, primarily browsers and websites.
So how does it work? Here’s a simple breakdown:
- The client sends a request containing the user's task, a screenshot of the current screen, and a history of recent actions.
- The model analyzes these inputs and responds with a proposed UI action, such as clicking or typing.
- The client executes that action, captures a new screenshot, and sends it back to the model, repeating the loop until the task is finished.
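To make this loop concrete, here's a minimal sketch of the client side in Python. Everything in it is a hypothetical stand-in: `take_screenshot`, `call_model`, and `execute_action` are placeholders for the screenshot capture, the actual Gemini API call, and the browser automation layer, not real API names.

```python
from dataclasses import dataclass

@dataclass
class ModelResponse:
    done: bool                   # model signals the task is complete
    action: dict | None = None   # proposed UI action, e.g. {"name": "click_at", "x": 100, "y": 200}

def run_task(task: str, max_turns: int = 20) -> None:
    """Hypothetical client-side agent loop. take_screenshot(), call_model(),
    and execute_action() stand in for the real Gemini API and browser layer."""
    history: list[dict] = []             # recent actions, echoed back each turn
    for _ in range(max_turns):
        screenshot = take_screenshot()   # capture the current browser state
        response = call_model(task, screenshot, history)  # model proposes next step
        if response.done:                # model indicates the task is finished
            break
        execute_action(response.action)  # e.g. click, type, scroll
        history.append(response.action)  # keep the action history for the next turn
```

Capping the number of turns is a simple guard against the loop running indefinitely if the model never signals completion.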
Supported actions include navigating, scrolling, and using keyboard shortcuts (a client-side dispatch sketch follows the examples below). Google has shared a couple of example prompts showcasing its capabilities:
“From this link, get details for any pet in California, add them as a guest to my spa CRM at this site, and set up a follow-up appointment for October 10th.”
“Help organize our art club tasks into categories. Check out this site and move notes to the right sections.”
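Executing prompts like these client-side comes down to mapping each proposed action onto a browser automation tool. Here's a hedged sketch using Playwright, which is my choice for illustration (the announcement doesn't prescribe an executor); the action names and dict shape are assumptions, not the model's exact schema.

```python
from playwright.sync_api import Page

def execute_action(page: Page, action: dict) -> None:
    """Map a model-proposed action onto Playwright calls.
    The action names here are illustrative, not Gemini's exact schema."""
    name = action["name"]
    if name == "click_at":
        page.mouse.click(action["x"], action["y"])  # click at screen coordinates
    elif name == "type_text":
        page.keyboard.type(action["text"])          # type into the focused element
    elif name == "scroll":
        page.mouse.wheel(0, action["delta_y"])      # vertical scroll
    elif name == "key_combination":
        page.keyboard.press(action["keys"])         # e.g. "Control+F"
    elif name == "navigate":
        page.goto(action["url"])                    # load a new URL
    else:
        raise ValueError(f"Unsupported action: {name}")
```

Keeping the executor a thin dispatch layer like this makes it easy to add or restrict actions on the client side, independent of what the model proposes.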
While Gemini 2.5 Computer Use is primarily geared toward web browsers, tests also show promising results on mobile UI tasks. Google reports that it outperformed competing models, including Anthropic's Claude and OpenAI's computer-use offerings, particularly in speed and efficiency.
According to early reports, developers have responded positively to the model's speed and ease of use, valuing its potential for workflow automation and handling routine tasks.
The model is part of Google's broader agent strategy: it builds on Gemini's visual understanding and reasoning capabilities, and Google already uses it internally to streamline UI testing and software development. It's currently available in public preview through the Gemini API in Google AI Studio and Vertex AI. Look out for more updates as third-party developers start building tools on this technology!
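For reference, here's a minimal sketch of calling the preview through the google-genai Python SDK. The model ID and the computer-use tool configuration reflect my understanding of the preview at announcement time and may change; treat them as assumptions.

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Preview model ID and tool config as announced; both may change (assumption).
config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER))],
)

with open("screenshot.png", "rb") as f:  # current screen capture
    screenshot = f.read()

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",
    contents=[
        types.Part.from_bytes(data=screenshot, mime_type="image/png"),
        "Open the pricing page on this site.",
    ],
    config=config,
)
print(response.candidates[0].content)  # the proposed UI action lives here
```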
For further details, see Google's official [announcement](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) on how the model is shaping the future of AI-driven task automation.

