One of the biggest concerns among AI developers and users is the inherent bias in datasets used to train AI models, and the risk of AI models mirroring these biases. Reality Defender’s AI and Data Engineering teams regularly test our deepfake detection models for evidence of bias, adjusting our datasets and the processes by which our models learn to compensate for dataset skews and ensure balanced results.
An AI model's performance is considered biased if its accuracy on certain subgroups of the data diverges significantly from its overall average accuracy across the dataset. This kind of bias in AI can have serious consequences, perpetuating discrimination and leading to unfair outcomes in areas like loan approvals, job applications, or even criminal justice.
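To make that definition concrete, here is a minimal sketch (purely illustrative, not our production tooling) of how per-subgroup accuracy can be compared against the overall average; the array names and sample values are hypothetical.

```python
import numpy as np

def subgroup_accuracy_gaps(correct: np.ndarray, subgroup_ids: np.ndarray) -> dict[str, float]:
    """Gap between each subgroup's accuracy and the overall accuracy.

    `correct` is a 0/1 array marking whether each prediction matched the label,
    and `subgroup_ids` labels each sample with its subgroup.
    """
    overall = correct.mean()
    return {
        str(g): float(correct[subgroup_ids == g].mean() - overall)
        for g in np.unique(subgroup_ids)
    }

# A gap far below zero for any subgroup is a sign of biased performance.
gaps = subgroup_accuracy_gaps(
    correct=np.array([1, 1, 1, 1, 0, 1, 0, 0]),
    subgroup_ids=np.array(["a", "a", "a", "a", "a", "b", "b", "b"]),
)
```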
In Reality Defender’s case, bias could negatively affect our models’ ability to distinguish between authentic content and AI-generated deepfakes. This is why we regularly analyze our models to ensure that even if any of the datasets we use show bias, the learning algorithms can compensate and mitigate its effect on our detection models.
How Reality Defender Mitigates Bias in Deepfake Detection Models
Bias within AI models emerges in one of two ways. In the first scenario, the training dataset is not sufficiently diverse, resulting in some subgroups being underrepresented. Standard training algorithms optimize for overall average performance, so performance on small, underrepresented subgroups can end up substandard due to this asymmetry in the data, even though the overall average performance on the dataset is high.
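One common remedy for this first scenario, sketched below under simple assumptions rather than as a description of our exact pipeline, is to weight training samples inversely to their subgroup frequency so that small subgroups contribute comparably to the training loss.

```python
import numpy as np

def inverse_frequency_weights(subgroup_ids: np.ndarray) -> np.ndarray:
    """Return per-sample weights inversely proportional to subgroup size.

    Samples from rare subgroups receive larger weights, so a weighted loss
    (or a weighted sampler) no longer optimizes only for the majority groups.
    """
    groups, counts = np.unique(subgroup_ids, return_counts=True)
    weight_per_group = {g: len(subgroup_ids) / (len(groups) * c) for g, c in zip(groups, counts)}
    return np.array([weight_per_group[g] for g in subgroup_ids])

# Example: 8 samples from group "a" and 2 from group "b" -> "b" samples are weighted 4x heavier.
weights = inverse_frequency_weights(np.array(["a"] * 8 + ["b"] * 2))
```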
In the second scenario, biased model performance can be induced by the presence of spurious correlations, or relationships in which two or more features of the data are associated but not causally related. In this case, a model might base its decision on the presence of the spuriously correlated feature instead of the desired target feature, resulting in data being mislabeled as authentic or fake. For example, a model trained to classify types of animals might rely on the background of the image instead of features of the animals themselves. As a result, the model would show biased performance on animals photographed against uncommon backgrounds.
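A simple diagnostic for this kind of shortcut learning, again an illustrative sketch with hypothetical column names, is to slice the evaluation set by the crossing of the true label and the suspected confound and look for combinations where accuracy collapses.

```python
import pandas as pd

def spurious_correlation_check(df: pd.DataFrame, label_col: str, confound_col: str) -> pd.DataFrame:
    """Accuracy on every (label, confound) combination.

    `df` has one row per evaluated sample with a 0/1 `correct` column. If accuracy
    collapses on combinations where the confound breaks its usual pairing (e.g. an
    animal photographed against an unusual background), the model is likely relying
    on the confound rather than the target feature.
    """
    return (
        df.groupby([label_col, confound_col])["correct"]
        .agg(accuracy="mean", n_samples="size")
        .reset_index()
        .sort_values("accuracy")
    )
```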
At Reality Defender, we work to mitigate these sources of bias through careful curation of our in-house datasets to ensure our deepfake detection models train on the most balanced, highest-quality data possible. After training, we measure and test the performance of our models across as many subgroups as possible, defined by labels for attributes such as accent, skin tone, age, and facial pose.
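In practice, that kind of post-training audit can be as simple as looping over every annotated attribute and reporting accuracy per attribute value. The attribute names below mirror the ones mentioned above, but the code itself is an illustrative sketch rather than our internal tooling.

```python
import pandas as pd

ATTRIBUTES = ["accent", "skin_tone", "age_bracket", "facial_pose"]  # illustrative attribute labels

def slice_evaluation(df: pd.DataFrame, attributes=ATTRIBUTES) -> pd.DataFrame:
    """Per-attribute-value accuracy for a model's predictions.

    `df` holds one row per evaluated sample with a 0/1 `correct` column plus one
    column per annotated attribute.
    """
    frames = []
    for attr in attributes:
        report = (
            df.groupby(attr)["correct"]
            .agg(accuracy="mean", n_samples="size")
            .reset_index()
            .rename(columns={attr: "value"})
        )
        report.insert(0, "attribute", attr)
        frames.append(report)
    return pd.concat(frames, ignore_index=True)
```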
Uncovering and Addressing Sources of Bias in AI Models
We also apply state-of-the-art bias discovery and mitigation methods, including techniques involving unsupervised learning in data embedding spaces and more robust training algorithms. By proactively addressing bias, balancing our datasets in-house, and investing in the newest testing techniques, we will continue to ensure that Reality Defender's AI solutions deliver fair and accurate deepfake detection results for everyone.
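As one generic illustration of what unsupervised bias discovery in an embedding space can look like (a sketch of a common approach, not our proprietary method), the evaluation set's embeddings can be clustered and the clusters ranked by error rate; clusters with unusually high error point to subgroups that the labeled attributes may have missed.

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_high_error_clusters(embeddings: np.ndarray,
                                 correct: np.ndarray,
                                 n_clusters: int = 20) -> list[tuple[int, float, int]]:
    """Cluster sample embeddings and rank the clusters by error rate.

    `embeddings` is (n_samples, dim); `correct` is a 0/1 array of per-sample
    correctness. Clusters with much higher error than average are candidate
    blind spots to inspect and rebalance.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    results = []
    for c in range(n_clusters):
        mask = labels == c
        error_rate = 1.0 - correct[mask].mean()
        results.append((c, float(error_rate), int(mask.sum())))
    return sorted(results, key=lambda r: r[1], reverse=True)
```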