There are a million and one wrong ways to approach detecting deepfakes and AI-generated content.
Relying on provenance watermarking does not work, and may cause unintended harm in the process. The same goes for relying on humans to detect deepfakes, especially at a time when creation tools are advancing rapidly and churning out increasingly believable content. (Even the experts on the Reality Defender team have an incredibly difficult time identifying deepfakes simply by looking at them, unassisted by our tools.)
Several years ago, when Reality Defender started as a small non-profit, we took a flawed approach to deepfake detection ourselves. Our founders created a single detection model, ran media through it (including what were then considered state-of-the-art deepfakes, though they are primitive by today's standards), and generated a result from these tests. This made sense at a time when deepfakes were a mere technological marvel that took clusters of computers to generate, and it worked perfectly in a lab setting. In the real world, where the technology's capabilities grew well beyond the pace of Moore's Law, this single-model approach did absolutely nothing.
The Kitchen Sink Approach
Our founders quickly realized that as more technologies and AI models appeared to create deepfakes and AI-generated content, a one-and-done approach to detection would not work in the real world. This is why Reality Defender created the "ensemble of models" approach we use to this day.
Reality Defender uses proprietary detection models to examine an individual video, audio, or image file from thousands of different angles all at once, and all in milliseconds. Instead of using one model, our platform uses many in concert, each looking for different details in the same file. One model may find nothing, while others will find something and flag the file as suspicious or likely manipulated. The models themselves are ensembles of sub-models, resulting in the most comprehensive and robust system of its kind, with a wealth of data to help clients make judgments based on the many results returned per scan.
For instance, there are tens of thousands of models used to generate voice deepfakes, most built on a handful of similar underlying technologies (albeit with different approaches). Reality Defender's models look at audio files not only for the use of popular "text to speech" generation methods, but for "voice synthesis" methods as well, along with the many other ways deepfake creation models can produce synthetic voices. All of these detections run concurrently the moment a file is uploaded to our platform via API or Web App, with all findings reported immediately after.
Simply put, one scan on Reality Defender is actually many scans at once. If one model finds nothing but another flags an anomaly, the file is marked as likely manipulated. This lets us examine a file from every angle, creating a "smothered" approach that catches manipulations a single model would miss outright.
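The ensemble idea above can be sketched in a few lines of Python. Everything here is an illustrative assumption: the detector names, the placeholder scores, and the simple "flag the file if any one detector is confident" aggregation rule are ours for the example, not Reality Defender's proprietary models or logic.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ModelResult:
    model_name: str
    score: float  # 0.0 = no evidence of manipulation, 1.0 = strong evidence

# Hypothetical stand-in detectors; each inspects a different signal.
def tts_artifact_detector(data: bytes) -> float:
    return 0.1  # placeholder score

def voice_synthesis_detector(data: bytes) -> float:
    return 0.8  # placeholder score

def run_ensemble(data: bytes, models: List[Callable[[bytes], float]]) -> List[ModelResult]:
    """Run every detector on the same file concurrently."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda m: m(data), models))
    return [ModelResult(m.__name__, s) for m, s in zip(models, scores)]

def verdict(results: List[ModelResult], threshold: float = 0.5) -> str:
    # One confident detector is enough to flag the file,
    # even if every other model found nothing.
    if max(r.score for r in results) >= threshold:
        return "likely manipulated"
    return "no manipulation detected"

results = run_ensemble(b"...audio bytes...",
                       [tts_artifact_detector, voice_synthesis_detector])
print(verdict(results))  # prints "likely manipulated": one flagged model drives the verdict
```

The key design choice the sketch illustrates: the detectors disagree by design, because each hunts for different artifacts, and the aggregation step treats a single confident hit as more informative than many silent models.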
Modeling for the Future
Reality Defender does not simply detect the deepfakes that exist today. We also build detection models to scan for manipulated content that could exist in the future.
Instead of being reactive and playing catch-up with bad actors and the new models creating deepfakes and AI-generated content, the Reality Defender team looks at where the technology is going and could go. We create detection models and methods based on this research, which allows us to detect many new technologies and methods on day one. We have taken this approach for quite some time, and media created with many of today's most popular creation tools (both open source and proprietary) was easily detected by our platform the day those tools launched.
Our detection models are constantly retrained, updated, iterated upon, and improved. Just as AI is advancing at an unprecedented clip, so too are our models and methods for detection. This ensures that everyone using Reality Defender's platform has the most up-to-date detection capabilities at all times, always staying several steps ahead of bad actors with the best-in-class detection solutions on the market.