Accelerating Gemini Nano models on Pixel with frozen Multi-Token Prediction

Google has announced a new architecture that retrofits Multi-Token Prediction (MTP) onto existing, “frozen” Gemini Nano v3 models, aiming to make on-device AI faster and more efficient on mobile phones. The update is designed for edge computing and has already rolled out to the Pixel 9 and 10 series.

The company said on-device models like Gemini Nano and Gemma can power features such as instantly summarizing notifications or proofreading text messages without sending private data off device. It said the challenge is making these features work efficiently under the energy and memory limits of mobile devices.

Google said standard language models generate text “autoregressively” by producing one word, or token, at a time. That process can underuse a phone’s processing power, strain memory bandwidth, and slow down the user experience while draining the battery.

To address that bottleneck, Google said it designed new architectural components for mobile environments, building on prior approaches like the EAGLE framework and Confident Adaptive Language Modeling (CALM). It also pointed to its recent announcements about accelerating Gemma 4 with MTP and making it available to developers.
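
For readers who want a concrete picture of the bottleneck, the toy Python sketch below contrasts one-token-at-a-time generation with a generic draft-and-verify loop, the broad idea behind multi-token approaches. It is not Google's implementation: target_next, draft_next, and the arithmetic inside them are hypothetical stand-ins for a full model and a cheap drafter, and the "passes" counter only simulates the number of full-model calls.

```python
from typing import List, Tuple

Token = int


def target_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the full model: a fixed rule on the last token."""
    return (context[-1] * 3 + 1) % 7


def draft_next(context: List[Token]) -> Token:
    """Hypothetical cheap drafter: matches the target except after token 5."""
    if context[-1] == 5:
        return 0  # deliberate mistake so verification has something to reject
    return target_next(context)


def autoregressive(prompt: List[Token], n_new: int) -> Tuple[List[Token], int]:
    """Baseline: one simulated full-model pass per generated token."""
    tokens = list(prompt)
    passes = 0
    for _ in range(n_new):
        tokens.append(target_next(tokens))
        passes += 1
    return tokens, passes


def draft_and_verify(prompt: List[Token], n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Draft k tokens cheaply, then check them in one simulated full-model pass."""
    tokens = list(prompt)
    passes = 0
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # Cheap step: the drafter proposes k tokens in a row.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))

        # Expensive step, counted once: the target checks every drafted position.
        passes += 1
        accepted: List[Token] = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        tokens.extend(accepted)
    return tokens[:goal], passes


slow_tokens, slow_passes = autoregressive([1], 12)
fast_tokens, fast_passes = draft_and_verify([1], 12, k=4)
print(slow_tokens == fast_tokens, slow_passes, fast_passes)
```

In this toy setup both loops produce the same text, but the draft-and-verify loop needs fewer simulated full-model passes because each pass can confirm several tokens at once. Real systems verify drafted tokens in a single batched forward pass and differ in many details.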

Google said the new approach provides an out-of-the-box speedup on the Pixel 9 and 10 series. For users, that means features like AI Notification Summaries and Proofread generate text significantly faster and with less energy consumption. For developers, Google said it removes the need to fine-tune separate, memory-heavy drafting models for every new task.

Source: research.google.

