Enhanced Audio Detection, Now on Reality Defender

Scott Steinhardt


Reality Defender's audio detection platform is used by entities around the world to detect voice fraud in real time and thwart state-sponsored political disinformation. Since the launch of our voice deepfake detection models, clients have always received a simple score indicating the likelihood of AI generation or manipulation in each file they upload. This score, shown as a percentage from 1% to 99% (never 0 or 100, as we do not have the ground truth), gives clients actionable data that they, and their own platforms and systems, can use to decide what to do next.

In recent months, as we improved our detection capabilities across all other media types, our team increased the amount of information provided to users after each scan. This meant showing which parts of an image weighed most heavily in our model's decision, or highlighting and scanning every face in every scene of a video. This goes a step beyond our original question, "is this likely a deepfake?", and changes it to "is this likely a deepfake, and if so, what helped Reality Defender come to that conclusion?"

Today, we're proud to launch our enhanced Audio Deepfake Detection features, which take this approach and bring it into the world of deepfake audio detection. Now that deepfake voice attacks are becoming all too common, more info, more data, and greater robustness for detection are all absolutely essential in the fight against weaponized generative AI.

Audio Timelines and Preprocessing

Instead of scanning an audio file and providing a blanket score for the entire file, audio files are now visualized on a timeline (displayed as a waveform), with each six-second segment of audio analyzed individually for deepfakes. This gives users not only better accuracy for each scan, but also a view of which parts of a file are likely fake and which are likely real.

For instance, if a file has a deepfaked voice at the beginning and end but not in the middle, our new system will highlight exactly where the deepfaked voice likely is or is not. The accuracy is greater than that of our previous hyper-accurate detection models, while also providing far more information on each detected file.
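To make the per-segment idea concrete, here is a minimal Python sketch of scoring a file in six-second windows. It is an illustration under stated assumptions, not Reality Defender's implementation: the function names, the 50% flagging threshold, and the toy scoring function are all hypothetical, and a real detector would stand in for `toy_score`. Only the six-second window and the 1-99% score range come from the description above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

WINDOW_SECONDS = 6  # segment length described in the post


def split_into_windows(
    samples: Sequence[float], sample_rate: int, window_seconds: int = WINDOW_SECONDS
) -> List[Tuple[float, float, Sequence[float]]]:
    """Return (start_s, end_s, chunk) tuples that cover the whole signal."""
    step = sample_rate * window_seconds
    windows = []
    for start in range(0, len(samples), step):
        chunk = samples[start:start + step]  # the last chunk may be shorter
        windows.append((start / sample_rate, (start + len(chunk)) / sample_rate, chunk))
    return windows


def score_timeline(
    samples: Sequence[float],
    sample_rate: int,
    score_fn: Callable[[Sequence[float]], float],
    threshold: int = 50,  # hypothetical cut-off, chosen only for illustration
) -> List[Dict[str, object]]:
    """Score each window separately and flag the ones at or above the threshold."""
    timeline = []
    for start_s, end_s, chunk in split_into_windows(samples, sample_rate):
        # Clamp to 1-99: the score is never reported as 0 or 100.
        pct = min(99, max(1, round(score_fn(chunk))))
        timeline.append(
            {"start": start_s, "end": end_s, "score": pct, "likely_fake": pct >= threshold}
        )
    return timeline


def toy_score(chunk: Sequence[float]) -> float:
    """Stand-in for a real detector: treats 'loud' chunks as fake (illustration only)."""
    return 100.0 if sum(chunk) / len(chunk) > 0.5 else 0.0


# 18 seconds of toy audio at 10 samples per second: "fake", "real", "fake".
samples = [1.0] * 60 + [0.0] * 60 + [1.0] * 60
timeline = score_timeline(samples, 10, toy_score)
for window in timeline:
    print(window)
```

With the toy input, the first and last windows are flagged and the middle one is not, which mirrors the beginning-and-end scenario described above.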

Files also now proceed through a new processing pipeline prior to detection, focusing our detection models only on the parts of a file that contain voice. This allows for even greater accuracy and robustness, while allowing users to focus on the validity (or lack thereof) of the voices being detected.
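As a rough illustration of what "only the parts of a file that contain voice" can mean, the sketch below finds the non-silent regions of a signal with a simple energy gate. This is an assumption made for the sake of example: an energy gate is a much cruder technique than real voice activity detection, the post does not say how Reality Defender's pipeline works, and the function names, frame size, and threshold here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def active_regions(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,             # hypothetical frame size
    energy_threshold: float = 0.05  # hypothetical energy cut-off
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames above the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    regions = []
    region_start = None
    for i in range(0, len(samples), frame_len):
        is_active = rms(samples[i:i + frame_len]) >= energy_threshold
        if is_active and region_start is None:
            region_start = i                   # a new active run begins
        elif not is_active and region_start is not None:
            regions.append((region_start, i))  # the run ended at this frame
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(samples)))  # run reaches end of file
    return regions


# Toy signal at 1000 samples per second: silence, a loud stretch, silence.
signal = [0.0] * 100 + [0.5] * 200 + [0.0] * 100
regions = active_regions(signal, 1000)
print(regions)
```

Only the samples inside the returned regions would then be handed to a detector, so silence and dead air do not dilute the result.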

Enhanced Audio Deepfake Detection is now available to all users via the Reality Defender web application and API, as well as to clients using real-time deepfake call detection. Whether you need to detect one audio file at a time or an endless stream of content, this update helps catch deepfakes faster, with greater accuracy, and with pinpoint precision, empowering organizations to confidently identify and take action against damaging audio deepfakes.

