Preprint: Common-Sense Bias Discovery and Mitigation for Classification Tasks

By Miao Zhang, Zee Fryer and Ben Colman

Published on arXiv

Summary

Bias in machine learning models can arise from dataset composition: sensitive features that correlate with the learning target distort the model's decision rule and lead to performance differences across those features. Existing de-biasing work captures prominent and delicate image features that are traceable in a model's latent space, such as the colors of digits or the backgrounds of animals. However, the latent space alone is not sufficient to understand all feature correlations in a dataset. In this work, we propose a framework that extracts feature clusters from a dataset based on image descriptions, allowing us to capture both subtle and coarse features of the images. We formulate the feature co-occurrence pattern and measure correlation, with a human in the loop for examination. Because the analyzed features and correlations are human-interpretable, we name the method Common-Sense Bias Discovery (CSBD). Having exposed sensitive correlations in a dataset, we demonstrate that downstream model bias can be mitigated by adjusting image sampling weights, without requiring sensitive group label supervision. Experiments show that our method discovers novel biases in multiple classification tasks on two benchmark image datasets, and that the intervention outperforms state-of-the-art unsupervised bias mitigation methods.
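To make the two core ideas in the summary concrete (measuring feature correlation, then adjusting sampling weights), here is a minimal sketch in Python. It is an illustration under stated assumptions, not the CSBD implementation: this page does not specify the correlation measure or the weighting formula, so the sketch assumes binary feature indicators, uses the phi coefficient as the correlation measure, and uses weights of the form P(t)P(s)/P(t,s). The function names `phi_coefficient` and `decorrelating_weights` are hypothetical.

```python
from collections import Counter
from math import sqrt


def phi_coefficient(a, b):
    """Phi (Pearson) correlation between two binary indicator lists.

    Assumption: each feature is encoded as 0/1 per image. The paper's
    actual correlation measure is not given on this page.
    """
    n = len(a)
    n11 = sum(1 for x, y in zip(a, b) if x and y)  # both features present
    n1_ = sum(1 for x in a if x)                   # feature a present
    n_1 = sum(1 for y in b if y)                   # feature b present
    denom = sqrt(n1_ * (n - n1_) * n_1 * (n - n_1))
    if denom == 0:
        return 0.0  # a constant feature carries no correlation signal
    return (n * n11 - n1_ * n_1) / denom


def decorrelating_weights(target, sensitive):
    """Per-sample weights w = P(t) * P(s) / P(t, s).

    Under these weights the weighted joint distribution of (target,
    sensitive) factorizes, so the two features become uncorrelated.
    This is one standard reweighting choice, assumed here for illustration.
    """
    n = len(target)
    joint = Counter(zip(target, sensitive))
    t_marg = Counter(target)
    s_marg = Counter(sensitive)
    return [
        (t_marg[t] * s_marg[s]) / (n * joint[(t, s)])
        for t, s in zip(target, sensitive)
    ]


# Toy data: the target co-occurs with the sensitive feature in 3 of 4 cases.
target = [1, 1, 1, 1, 0, 0, 0, 0]
sensitive = [1, 1, 1, 0, 0, 0, 0, 1]

print(phi_coefficient(target, sensitive))  # 0.5
weights = decorrelating_weights(target, sensitive)
print(weights)  # rare (target, sensitive) combinations get larger weights
```

In a training pipeline, weights like these could be passed to a weighted sampler so that under-represented feature combinations are drawn more often. Note that the sketch needs only the discovered feature indicators, not a separately annotated sensitive group label.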

Read More of Our Peer-Reviewed Research, Published in Top Journals

PolyJuice Makes It Real: Black-Box, Universal Red Teaming for Synthetic Image Detectors

A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations

X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models