
Meta creates a new AI platform that makes Videos from Text called ‘Make-a-Video’

Artificial Intelligence has opened an easy pathway to polished artwork: recreating ancient sculptures, giving statues lifelike human faces, and conjuring images on demand, whether realistic or not.

AI has already made it possible to generate images from text, most famously with 'DALL-E', and now Meta has taken the idea a step further, producing a video from text alone.

Generating images from text seems comparatively straightforward, but constructing a video from a text prompt changes the game in how we perceive and interact with the technology.

As Meta puts it, “It’s much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they’ll change over time.” Make-A-Video tackles this by learning to “understand motion in the physical world and apply it to traditional text-to-image generation.”

Meta explains, in brief, that Make-A-Video is an extension of 'images from text': the generated image is carried across continuous frames, ultimately bringing the photo to life.

Make-A-Video by Meta

The text-to-video tool generates short, soundless video snippets from the same type of text prompts used by DALL-E, a text-to-image algorithm.

Meta accomplished this by training the AI with a generative, unsupervised approach, one that can learn without human labeling.

The system learns what the world looks like from paired text-image data, and how the world moves from video footage that has no associated text. Feeding the AI unlabeled photos and video lets it work out how to bring a still image to life as motion.
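As a rough illustration of that split, training can alternate between the two data streams: captioned images teach appearance, raw video teaches motion. The following is a minimal Python sketch with assumed file names and stubbed-out steps, not Meta's actual pipeline:

```python
# Illustrative sketch only: names and files are hypothetical, not Meta's code.
# Appearance is learned from captioned images; motion from unlabeled video.

paired_text_images = [
    ("a dog catching a frisbee", "img_001.jpg"),
    ("a sunset over the ocean", "img_002.jpg"),
]
unlabeled_videos = ["clip_001.mp4", "clip_002.mp4"]

def appearance_step(caption, image_path):
    """Fit the text-to-image backbone on one caption/image pair."""
    ...  # stub: update the spatial (per-frame) weights

def motion_step(video_path):
    """Fit the temporal layers on raw frames; no caption required."""
    ...  # stub: update the temporal weights

for (caption, image_path), video_path in zip(paired_text_images, unlabeled_videos):
    appearance_step(caption, image_path)  # what the world looks like
    motion_step(video_path)               # how the world moves
```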

The process involves three main components: (i) a base text-to-image model trained on text-image pairs; (ii) spatiotemporal convolution and attention layers that extend the picture into the temporal dimension (moving through time); and (iii) spatiotemporal networks comprising both of those layer types, plus frame interpolation for high-frame-rate generation.
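Read as a pipeline, the three components chain together roughly like this; the function names below are assumptions for illustration, not an actual API:

```python
def text_to_image(prompt):
    """(i) Base model: produce keyframe imagery from the text prompt."""
    ...

def extend_through_time(keyframes):
    """(ii) Spatiotemporal conv/attention layers: evolve the picture
    across a handful of low-frame-rate frames."""
    ...

def interpolate_frames(low_fps_clip):
    """(iii) Frame interpolation: fill in-between frames so the final
    clip plays back at a high frame rate."""
    ...

def make_a_video(prompt):
    keyframes = text_to_image(prompt)
    rough_clip = extend_through_time(keyframes)
    return interpolate_frames(rough_clip)
```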

In effect, the AI takes an image, identifies the elements and objects in it, and processes it through two kinds of layers, one that operates across time and one that does not, correlating the result frame to frame to produce motion.
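One common way to realize that "two layers" idea is to factorize processing into a spatial pass (each frame on its own) and a temporal pass (each pixel position across frames). Below is a minimal, runnable PyTorch sketch of such a factorized block; it illustrates the general technique, not Meta's implementation:

```python
import torch
import torch.nn as nn

class FactorizedSpatioTemporal(nn.Module):
    """Assumed sketch: a spatial layer per frame, a temporal layer across frames."""

    def __init__(self, ch):
        super().__init__()
        self.spatial = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.temporal = nn.Conv1d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (batch, channels, time, height, width)
        b, c, t, h, w = x.shape

        # Spatial pass: fold time into the batch, convolve each frame alone.
        y = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        y = self.spatial(y)
        y = y.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)

        # Temporal pass: fold space into the batch, convolve across frames.
        z = y.permute(0, 3, 4, 1, 2).reshape(b * h * w, c, t)
        z = self.temporal(z)
        z = z.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)
        return z

frames = torch.randn(1, 8, 16, 64, 64)    # 16 frames of 64x64 feature maps
out = FactorizedSpatioTemporal(8)(frames)
print(out.shape)                           # torch.Size([1, 8, 16, 64, 64])
```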

The next step of AI

Generative AI research is pushing creative expression forward by giving people tools to quickly and easily create new content. With just a few words or lines of text, Make-A-Video can bring imagination to life and create one-of-a-kind videos full of vivid colors, characters and landscapes. The system can also create videos from images, or take existing videos and create new ones that are similar.

On the official Make-A-Video website, Meta showcases a number of short clips created by the algorithm, alongside the published research paper.


The paper details how the model was trained, along with the tool's technical limitations, which include an inability to generate clips longer than five seconds or at resolutions higher than 768 by 768 pixels at 16 frames per second.
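Taken together, those published limits bound the output size; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope from the stated limits: 5 s at 16 fps, 768x768.
max_seconds, fps = 5, 16
width = height = 768

max_frames = max_seconds * fps
print(f"at most {max_frames} frames of {width}x{height} video")  # 80 frames
```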

This approach is a step toward technology that can create on its own, and generative AI is expected to yield many new tools in the future, even as critics warn it could do humanity harm.

Meta's Make-A-Video is not available to the public yet, but you can sign up here to join the waitlist and try the feature in the future.
