The Mona Lisa gangsta rapping is something I never needed to hear. Leonardo da Vinci might not have wanted to hear it either, had the technology been available in Renaissance times. It certainly makes her less mysterious. Yet here is Microsoft’s VASA-1, the latest AI offering that I neither wanted nor needed.

VASA-1 generates lifelike, audio-driven talking faces in real time. In simple terms, it can take a photo of a person and an audio recording and produce a video that is incredibly convincing. Microsoft has opened up a portal to deepfake hell.

A talking head can be created from any photo

You may wonder why yet another AI is so exciting and novel. Microsoft’s TL;DR summary states that VASA-1 is able to “generate naturalistic head movements and lifelike facial behavior in real-time.”

Honestly, it’s the little details that make you shiver. The output is not only impeccably accurate, but it can also show different expressions and deliveries in a flash. By uploading a photo online, you could be letting someone put words into your mouth.

Realistic and controlled output

Microsoft’s examples show a wide range of options for eye gaze, head movement and expressions. The generated faces display a variety of natural head movements and facial expressions, which is impressive, especially when combined with the lip-syncing.

The rapping Mona Lisa is an excellent example of what’s possible. For it, Microsoft used an audio recording of Anne Hathaway performing a rap that went viral in 2011.