AI video has gone from party trick to production tool in what feels like a single breath. Runway has been sitting front row for the whole thing, and its CEO Cristóbal Valenzuela thinks the show hasn’t even really started yet.
The company has raised close to $860 million at a $5.3 billion valuation, which puts it in the ring with far larger labs like Google and OpenAI. That’s real money for a company that started out making tools for video editors, not competing with the biggest research orgs on the planet. But here we are.
Valenzuela’s point is simple: generating a clip of a cat walking through a neon hallway is cool, but it’s not the endgame. The real target is what he calls “world models” — systems that don’t just predict the next pixel but understand how objects behave, how light moves, how cause and effect work in physical space.
This isn’t just a semantic shift. A world model that grasps physics could let you simulate a car crash for safety testing, or render a building’s structural response to wind loads, without writing a single line of traditional simulation code. That’s a fundamentally different kind of system from the frame-predicting video generators we have now.
The gap between current video models and proper world models is still enormous. Today’s systems hallucinate constantly — limbs bend wrong, reflections don’t match, objects appear and disappear between frames. They’re pattern matchers, not reasoners. Valenzuela acknowledges this but argues that the trajectory is clear. Each generation of models gets better at consistency, and at some point, the system crosses a threshold where it’s no longer just mimicking video training data but actually modeling the underlying rules.
Runway has been pushing in this direction for a while. Their Gen-3 and Gen-4 models already show better temporal coherence than earlier versions, and the company has been investing heavily in data that captures physical interactions — not just people talking to camera but objects colliding, liquids flowing, materials deforming. That kind of data is expensive to collect and harder to train on, but it’s exactly what you need if you want a model that understands the world rather than just parroting it.
I’ve been skeptical of the “world model” hype for a while, mostly because it’s become a buzzword that gets slapped onto any model that can generate a halfway coherent 10-second clip. But Valenzuela’s framing is more grounded than most. He’s not claiming Runway has solved physics or that we’re months away from a Simulacra-level simulation engine. He’s saying that the video generation race is really a stepping stone toward something bigger, and that companies that treat it as just a better editing tool are missing the point.
Whether Runway can actually get there is another question. The competition is fierce, and the technical challenges are brutal. But the direction makes sense. If you can build a model that understands how the world works well enough to generate convincing video, you’ve essentially built a simulation engine. And that’s a much bigger market than video editing will ever be.