Midjourney Launches V1, Its First Video Model: Images That Come to Life in Seconds

The image animation AI joins the global race for dominance in AI-generated video, competing with Sora, Veo, Kling, Runway, and Pika.

AI-generated video is advancing at a remarkable pace, and Midjourney, until now focused on AI-generated images, has just announced its entry into the field with V1, its first video model. With this new tool, the company lets any user turn static images into animated clips of up to 20 seconds, with no advanced technical knowledge required.

The process is simple: start with an image (either generated in Midjourney or uploaded from a computer), click the Animate button, and the system generates up to four animated versions of five seconds each. These can then be extended in blocks of four seconds, up to a maximum total of 20 seconds.
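To put those figures in perspective, here is a minimal sketch of the duration arithmetic described above (plain Python written for this article, not Midjourney code; the exact capping behavior at the 20-second ceiling is an assumption):

```python
# Illustrative arithmetic only, based on the figures quoted in this article:
# 5-second base clips, 4-second extension blocks, ~20-second maximum.
# Not Midjourney code; capping at exactly 20 s is an assumption.
BASE_SECONDS = 5
EXTENSION_SECONDS = 4
MAX_TOTAL_SECONDS = 20

def clip_length(extensions: int) -> int:
    """Approximate clip length after a given number of 4-second extensions."""
    return min(BASE_SECONDS + extensions * EXTENSION_SECONDS, MAX_TOTAL_SECONDS)

for n in range(5):
    print(f"{n} extension(s): ~{clip_length(n)} s")
# 0 -> 5 s, 1 -> 9 s, 2 -> 13 s, 3 -> 17 s, 4 -> ~20 s (ceiling reached)
```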

But beyond its simplicity, V1 represents the first step towards a more ambitious project for Midjourney: creating a unified generative platform that combines video, images, 3D models, and real-time simulation, with the promise that one day interactive “open worlds” can be created from scratch.


Expanded Comparison: The Most Relevant AI Video Models

The launch of V1 takes place against a backdrop of great enthusiasm in the field of generative video models. Here’s a comparative review of the most notable solutions available today:

| Model | Company | Main Input | Maximum Duration | Output Type | User Control | Current Access Status | Highlighted Features |
|---|---|---|---|---|---|---|---|
| V1 | Midjourney | Image + motion prompt | Up to 20 s | 4 animated clips | Low / medium | Available via web (June 2025) | Fast, visual, accessible |
| Sora | OpenAI | Text | Up to 60 s | Coherent videos | High | Restricted access (researchers) | Advanced storytelling, physics simulation, and camera |
| Veo | Google DeepMind | Text | Up to 60 s | High-quality video | Medium / high | Closed, in pre-release phase | Cinematic quality, natural language |
| Kling | Kuaishou | Text + image | 2–4 s | Realistic facial animations | Medium | Limited to users in China | Precise movement, facial expressiveness |
| Runway Gen-3 | Runway | Text + image | 15–30 s (variable) | Creative clips | Medium | Available for registered users | Artistic control, integration into creative workflow |
| Pika 1.0 | Pika Labs | Text + image | Up to 30 s | Creative video | Medium | Available (with registration) | Easy editing, quick cinematic effects |

What Does V1 Bring Compared to Others?

While models like Sora and Veo focus on generating long, narrative, and highly realistic videos from scratch based on text, V1 emphasizes immediacy and visual control, making it an excellent choice for artists, designers, and content creators already working with images.

Advantages of V1:

  • Simplicity and speed: does not require complex prompt knowledge.
  • Direct interface: image-based, no scripts needed.
  • Accessibility: available from a web browser without waiting or access requests.
  • Reasonable price: each video job costs about eight times as much as an image, the equivalent of roughly one image's worth of consumption per second of video.

Limitations:

  • Less narrative control.
  • Limited quality compared to more advanced models.
  • Risk of visual errors in “high-motion” scenes.

Two Modes, Two Levels of Movement

V1 offers two ways to generate animations:

  1. Automatic: the AI decides how to move the scene.
  2. Manual: the user can write a brief text describing the desired movement.

And two levels of dynamism (see the illustrative sketch after this list):

  • Low movement: useful for subtle or ambient scenes.
  • High movement: for more energetic and dynamic animations, although with a greater margin for error.
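Since Midjourney exposes these choices only through its web interface and has no public API, the following is purely a conceptual sketch: a hypothetical data structure (all names invented for illustration) that captures the two prompt modes and two motion levels described above.

```python
# Hypothetical representation of V1's animation options, for illustration only.
# Midjourney offers these choices in its web UI; this is not its API.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class PromptMode(Enum):
    AUTOMATIC = "automatic"  # the AI decides how to move the scene
    MANUAL = "manual"        # the user describes the desired movement in text

class MotionLevel(Enum):
    LOW = "low"    # subtle or ambient movement
    HIGH = "high"  # energetic movement, with a greater margin for error

@dataclass
class AnimationSettings:
    mode: PromptMode = PromptMode.AUTOMATIC
    motion: MotionLevel = MotionLevel.LOW
    motion_prompt: Optional[str] = None  # only meaningful when mode is MANUAL

# Example: a manual, high-motion request
settings = AnimationSettings(
    mode=PromptMode.MANUAL,
    motion=MotionLevel.HIGH,
    motion_prompt="the camera pans slowly to the right while clouds drift past",
)
```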

In the Midst of Innovation, a Legal Warning

This launch comes at a delicate time for Midjourney. Disney and Universal have sued the company for alleged copyright infringement, claiming it trained its models on protected images. The evidence presented includes AI-generated content that reproduces characters such as Homer Simpson and Darth Vader with striking fidelity.

Midjourney has responded by urging its community to use the tool ethically and responsibly. “When used correctly, this technology can be fun, useful, and even profound,” the company noted.


A Long-Term Vision: Generative and Interactive Worlds

Midjourney has made it clear that V1 is just one piece of a much larger puzzle. Its vision is ambitious: to build a platform capable of generating images, videos, 3D scenes, and entire worlds that respond in real time, all through a visual, fast, and collaborative interface.

While it is still far from achieving the complexity of models like Sora or Veo, V1 demonstrates that Midjourney does not want to be left behind in the race for AI-generated video. On the contrary: it has chosen to do what it does best—provide powerful and accessible tools to visual creators—and make them available to everyone.

On the horizon, a future emerges where users will not only generate static images but complete universes that move, evolve, and respond instantly, all from a creative interface powered by artificial intelligence.

Source: AI News
