Voyager: A Leap in Automated Video Processing
Tencent has launched Voyager, a new model that builds on its earlier HunyuanWorld 1.0. Voyager is part of Tencent’s “Hunyuan” ecosystem, which also includes Hunyuan3D-2 for text-to-3D generation and HunyuanVideo for video creation.
What makes Voyager exciting? Its training data was prepared by software that analyzes videos automatically, estimating camera movement and depth for each frame. That removes the tedious task of manually labeling countless hours of footage. The system processed more than 100,000 video clips, drawn from real-world recordings and Unreal Engine renders.
Running Voyager isn’t simple, however. It needs at least 60GB of GPU memory to generate at 540p resolution, and 80GB is recommended for optimal performance. Tencent has made the model weights available on Hugging Face, along with code for both single-GPU and multi-GPU setups.
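As a quick way to reason about those memory figures, the minimal sketch below classifies a GPU by its total memory against the 60GB and 80GB thresholds quoted above. The constant and function names are hypothetical and are not part of Tencent's released code; the sketch only encodes the two numbers from this article.

```python
# Thresholds are the figures quoted in the article for 540p generation.
# All names here are hypothetical, chosen for illustration only.
MIN_GPU_MEMORY_GB = 60
RECOMMENDED_GPU_MEMORY_GB = 80


def classify_gpu_memory(total_gb: float) -> str:
    """Return a coarse label for a GPU with `total_gb` of memory."""
    if total_gb < MIN_GPU_MEMORY_GB:
        return "insufficient"
    if total_gb < RECOMMENDED_GPU_MEMORY_GB:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    for gb in (24, 48, 64, 80):
        print(f"{gb}GB -> {classify_gpu_memory(gb)}")
```

In practice the total would come from the driver or framework (for example, PyTorch exposes it in bytes via torch.cuda.get_device_properties(0).total_memory). The reported value can differ slightly from the marketed capacity, so a small tolerance may be needed.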
Licensing restrictions also apply. The model cannot be used in the EU, the UK, or South Korea, and any application serving more than 100 million monthly users needs a special license from Tencent.
On the WorldScore benchmark, Voyager scored 77.62, outperforming competitors such as WonderWorld and CogVideoX-I2V. It excelled in object control, style consistency, and subjective quality. It also performed well in camera control, though it came second to WonderWorld in that category.
Despite these strong benchmark results, the heavy compute requirements will make broad deployment difficult. Developers who want faster results can run parallel inference with the xDiT framework, which is up to 6.69 times faster on eight GPUs than on one.
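To put the 6.69x figure in context, the short sketch below computes the parallel efficiency it implies, that is, the speedup divided by the number of GPUs. It also estimates the serial fraction under Amdahl's law. That model is an assumption made here for illustration, not something the source claims about xDiT, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup / num_gpus


def amdahl_serial_fraction(speedup: float, num_gpus: int) -> float:
    """Serial fraction s implied by Amdahl's law:
    speedup = 1 / (s + (1 - s) / num_gpus), solved for s.
    Assumes the workload fits that simple model.
    """
    if num_gpus < 2:
        raise ValueError("need at least 2 GPUs to estimate a serial fraction")
    return (num_gpus / speedup - 1) / (num_gpus - 1)


if __name__ == "__main__":
    # Figures quoted in the article: 6.69x on eight GPUs.
    eff = parallel_efficiency(6.69, 8)
    print(f"parallel efficiency: {eff:.3f}")  # about 0.836
    print(f"implied serial fraction: {amdahl_serial_fraction(6.69, 8):.4f}")
```

An efficiency of roughly 84% means eight GPUs deliver most, but not all, of the ideal eightfold gain. Since the source gives 6.69x as an upper bound, typical efficiency may be lower.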
Although the technology shows significant promise, it is still some way from delivering real-time interactive experiences. Even so, we may be on the brink of a new era in generative art and interactive experiences, much as we were in the early days of Google’s Genie. Innovations like Voyager could redefine how we create and interact with digital worlds.