About InfinityStar

InfinityStar is a unified spacetime autoregressive framework for high-resolution image and dynamic video synthesis, developed by FoundationVision. Rather than treating spatial and temporal modeling as separate stages, it combines both within a single architecture.

What is InfinityStar?

InfinityStar generates high-resolution images and dynamic videos from text descriptions using a purely discrete approach that jointly captures spatial and temporal dependencies within a single architecture. This unified design naturally supports a variety of generation tasks, including text-to-image, text-to-video, image-to-video, and long-duration video synthesis, through straightforward temporal autoregression.
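The core idea behind temporal autoregression can be sketched as a toy next-token loop: a video is a sequence of frames, each frame is a grid of discrete tokens, and every new token is predicted conditioned on all tokens generated so far. The vocabulary size, frame size, and `predict_next` stand-in below are illustrative assumptions, not the actual InfinityStar model or API:

```python
# Toy sketch of spacetime autoregressive generation (NOT the real model).
# A video is a sequence of frames; each frame is a grid of discrete tokens
# drawn from a codebook. Tokens are generated one at a time, conditioned
# on the prompt and on everything generated so far.

import random

VOCAB_SIZE = 16          # hypothetical codebook size
TOKENS_PER_FRAME = 4     # hypothetical 2x2 token grid per frame

def predict_next(context, prompt):
    """Stand-in for the transformer: returns one discrete token.

    A real model would score all VOCAB_SIZE tokens given the prompt
    embedding and the full spacetime context, then sample from that
    distribution. Here we just draw from a deterministically seeded RNG.
    """
    rng = random.Random(hash((prompt, len(context))) % (2 ** 32))
    return rng.randrange(VOCAB_SIZE)

def generate_video(prompt, num_frames):
    """Temporal autoregression: frame t conditions on frames 0..t-1."""
    tokens = []                       # flat spacetime token sequence
    frames = []
    for _ in range(num_frames):
        frame = []
        for _ in range(TOKENS_PER_FRAME):
            tok = predict_next(tokens, prompt)
            tokens.append(tok)        # context grows across space AND time
            frame.append(tok)
        frames.append(frame)
    return frames

video = generate_video("a cat surfing", num_frames=3)
print(len(video), len(video[0]))      # 3 frames, 4 tokens per frame
```

Because earlier frames are ordinary prefix context, the same loop extends a clip indefinitely, which is how straightforward temporal autoregression supports long-duration synthesis.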

Key Achievements

  • Scores 83.74 on VBench, outperforming all autoregressive models by significant margins
  • Surpasses diffusion-based competitors like HunyuanVideo in benchmark performance
  • Generates 5-second 720p videos approximately 10 times faster than leading diffusion-based methods
  • First discrete autoregressive video generator capable of producing industrial-level 720p videos
  • Accepted as NeurIPS 2025 Oral presentation

Technical Architecture

InfinityStar is an 8-billion-parameter model that processes visual information in a unified manner. It uses Flan-T5-XL as its text encoder, encoding natural language prompts into embeddings that condition the visual generation process. The model architecture combines:

  • Unified spacetime modeling that processes spatial and temporal information together
  • Discrete autoregressive approach that treats visual content as sequences
  • FlexAttention mechanism for efficient training and inference
  • Support for multiple generation modes within a single architecture
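One way unified spacetime modeling and causal generation can coexist is a block-causal attention pattern: tokens attend bidirectionally within their own frame (full spatial attention) but only causally to earlier frames. The sketch below builds such a mask as plain Python lists; the frame layout and sizes are illustrative assumptions, not InfinityStar's actual attention mask:

```python
# Sketch of a block-causal spacetime attention mask: a query token may
# attend to any token in its own frame (bidirectional spatial attention)
# and to all tokens in earlier frames (causal temporal attention).

def spacetime_mask(num_frames, tokens_per_frame):
    """mask[q][k] is True where query token q may attend to key token k."""
    n = num_frames * tokens_per_frame
    frame_of = [i // tokens_per_frame for i in range(n)]
    # Query in frame i may see key in frame j iff j <= i.
    return [[frame_of[q] >= frame_of[k] for k in range(n)] for q in range(n)]

mask = spacetime_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
```

In PyTorch's FlexAttention, a pattern like this would typically be expressed as a predicate over (query, key) indices rather than materialized densely, which is what makes training and inference with such structured masks efficient.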

Research and Development

InfinityStar was developed by FoundationVision and represents years of research into autoregressive modeling for visual content. The project combines insights from natural language processing with computer vision techniques, demonstrating that discrete autoregressive approaches can compete with and exceed the performance of continuous diffusion models in video generation tasks.

Open Source Commitment

The InfinityStar project is committed to open source development and research. The project includes:

  • Complete training code for reproducibility
  • Inference code for generating images and videos
  • Model checkpoints for both 480p and 720p resolutions
  • Web demo for interactive exploration
  • Comprehensive documentation and guides

All code and models are released to foster further research in efficient, high-quality video generation.

Applications

InfinityStar's capabilities make it suitable for various applications:

  • Content creation and social media
  • Film and animation production
  • Educational content development
  • Prototyping and design visualization
  • Research and development in visual generation

Note: This is an unofficial about page for InfinityStar. For the most accurate and up-to-date information, please refer to the official repository and research paper.