In 2016, at its MAX conference, Adobe demonstrated a version of its Audition audio editing software with a feature that let users edit audio as if it were text. Dubbed VoCo, this new software and its related algorithms gave anyone the ability to generate speech in a specific individual’s voice that the person never actually spoke, virtually putting words into the mouths of others, provided a relatively small sample of that person’s existing speech audio.
Today, audio deepfakes are nearly as prevalent as their video and photo counterparts. Though VoCo never saw release (it was meant as a research prototype), off-the-shelf audio deepfaking algorithms exist in the wild, letting anyone with even a minimal amount of media editing savvy create speeches that their purported speakers never gave.
How Widespread Are Audio Deepfakes?
Video deepfakes are now convincing enough that millions have had trouble distinguishing doctored videos from real clips. For these videos to actually work, the sound coming from the speaker’s lips needs to be convincing as well. Thus, those creating video deepfakes have vastly improved the quality and accuracy of their audio components over the last several years.
Though not all deepfakes are malicious, the ones used to spread misinformation are disruptive and destructive enough to have real, tangible, negative impacts on their viewers. Recently, a deepfake video of Ukrainian President Volodymyr Zelenskyy asking soldiers to surrender to Russia nearly fooled viewers into believing it was, in fact, the real leader of Ukraine. The video component of Zelenskyy was believable enough, but the audio deepfake of Zelenskyy’s voice made it even more convincing. Had the fake Zelenskyy not sounded like the real one, the deepfaked video would have had far less impact.
How Does Reality Defender Detect Audio Deepfakes?
To separate real audio from doctored clips, we convert raw audio signals into image representations called spectrograms. These images are then fed into Reality Defender’s in-house neural network model, which learns correlations across the time and frequency domains to extract high-level features that drive the real-or-fake classification. To achieve robust performance across diverse data samples, our audio deepfake detection models employ a variety of data augmentation strategies.
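To make that pipeline concrete, here is a minimal, illustrative sketch in PyTorch/torchaudio: a waveform is converted to a log-mel spectrogram, optionally augmented with SpecAugment-style time and frequency masking, and passed through a small convolutional network that outputs a real-or-fake score. Every detail here (the sample rate, the number of mel bands, the layer sizes, and the toy AudioDeepfakeClassifier itself) is an assumption chosen for illustration, not Reality Defender’s actual model.

```python
# Illustrative sketch of a spectrogram-based audio deepfake detector.
# All architecture and hyperparameter choices below are assumptions,
# not Reality Defender's in-house model.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 16_000  # assumed sample rate for incoming audio

# Step 1: convert a raw waveform into a log-mel spectrogram "image".
to_spectrogram = nn.Sequential(
    torchaudio.transforms.MelSpectrogram(sample_rate=SAMPLE_RATE, n_mels=64),
    torchaudio.transforms.AmplitudeToDB(),
)

# Step 2 (training only): SpecAugment-style data augmentation, masking
# random frequency bands and time steps for robustness across samples.
augment = nn.Sequential(
    torchaudio.transforms.FrequencyMasking(freq_mask_param=8),
    torchaudio.transforms.TimeMasking(time_mask_param=16),
)

class AudioDeepfakeClassifier(nn.Module):
    """Toy CNN mapping a spectrogram to a single real-vs-fake logit."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # 2D convolutions see both axes at once, capturing
            # correlations across the time and frequency domains.
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool to a fixed-size feature vector
        )
        self.classifier = nn.Linear(32, 1)  # one logit: fake vs. real

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(spec).flatten(1))

# Usage: one 4-second mono clip -> spectrogram -> fake probability.
waveform = torch.randn(1, SAMPLE_RATE * 4)    # stand-in for a loaded clip
spec = to_spectrogram(waveform).unsqueeze(0)  # (batch, channel, n_mels, time)
spec = augment(spec)                          # applied during training only
prob_fake = torch.sigmoid(AudioDeepfakeClassifier()(spec))
print(f"P(fake) = {prob_fake.item():.3f}")
```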
As with other deepfakes and deepfake detection methods, how we detect audio deepfakes is not static; it is always improving. New deepfaking methods are discovered, used, and abused every day, and we incorporate these and other methods (including some not yet seen in the wild) into our detection processing to ensure it is not only up to date, but ahead of the curve.
Start Detecting Audio Deepfakes Today
Make sure the audio on your platform or service is authentic with Reality Defender’s best-in-class audio deepfake detection. Register with us today and detect audio deepfakes before they become your problem.