Unmasking Intersectional Bias in Medical AI: Beyond Data Imbalance in Fetal Ultrasound
Explore a novel framework to detect and disentangle complex biases in medical AI, particularly in fetal ultrasound for weight estimation. Learn how acquisition factors like pixel spacing impact performance, even more than demographic representation.
Beyond Simple Data Imbalance in Medical AI
The conversation around bias in Artificial Intelligence (AI) often centers on representation imbalance within datasets: if a particular demographic group is underrepresented in training data, the AI model may underperform when encountering that group in the real world. While this perspective is undeniably crucial, it doesn't fully capture the complex ways bias can manifest, especially in image-based medical applications like fetal ultrasound. Here, predictive accuracy depends not only on who is in the dataset but also on the quality of the images themselves, and variations in that quality can silently introduce systematic biases that complicate accurate diagnosis and treatment.
Ultrasound image quality is influenced by a multitude of factors, ranging from the specific acquisition conditions and the expertise of the sonographer to patient-dependent variables such as maternal body mass index (BMI). These factors can, in turn, systematically correlate with sensitive demographic features, creating a web of "intersectional bias." Such intertwined influences can obscure the true causes of performance disparities, leading to flawed conclusions if bias detection focuses only on demographic representation. For instance, in fetal weight estimation, a critical task for monitoring fetal health, inaccurate predictions caused by overlooked biases can have serious clinical consequences.
The Nuance of Bias: Acquisition, Clinical, and Demographic Factors
Medical imaging, particularly ultrasound, presents unique challenges for AI fairness. Unlike many other imaging modalities, ultrasound image quality varies strongly from scan to scan. A sonographer's skill, the type of equipment used, and the established workflow all contribute to systematic variability in image quality. Furthermore, maternal BMI directly affects acoustic attenuation, and therefore how clearly fetal structures can be visualized. When these acquisition-related factors are demographically structured (for example, when certain demographic groups have a higher average BMI or are scanned at institutions with different equipment), performance disparities can arise regardless of how well those groups are represented in the dataset.
This means that simply balancing the demographic composition of a training dataset may not be enough to resolve these biases. Instead, a deeper understanding of how acquisition, clinical, and demographic factors interact is required. Addressing these interactions is essential for building truly robust and equitable AI systems in healthcare, and it is why solution providers like ARSA Technology apply advanced AI Video Analytics capabilities to process and interpret visual data with high precision, so that such nuances are accounted for in real-world deployments.
A Structured Framework for Detecting Intersectional Bias
To effectively explore and disentangle these complex biases, researchers at the Technical University of Denmark and Rigshospitalet proposed a structured framework in their paper, "A Framework for Exploring and Disentangling Intersectional Bias: A Case Study in Fetal Ultrasound," published on arXiv (arXiv:2605.02942). The framework integrates three key components designed to go beyond surface-level analysis:
First, Unsupervised Slice Discovery identifies latent, or hidden, subgroups within the dataset that exhibit distinct error profiles. This is achieved by analyzing the internal "embeddings" (numerical representations) generated by the AI model. By clustering these embeddings, the framework can automatically surface underperforming groups without prior assumptions about what those groups might be, thereby enabling hypothesis generation for deeper investigation.
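The paper's exact clustering method is not described in this article, so the following is only a minimal sketch of the idea, assuming plain k-means (Lloyd's algorithm) over per-image embeddings and absolute error as the per-image error measure. The function names, the deterministic farthest-first initialisation, and the toy two-dimensional embeddings are illustrative choices, not the authors' implementation.

```python
import math
from statistics import mean


def kmeans(points, k, iters=50):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length vectors.

    Uses deterministic farthest-first initialisation (an assumption made here
    for reproducibility; k-means++ or random restarts are common alternatives).
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every embedding to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the embeddings assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels


def discover_slices(embeddings, abs_errors, k):
    """Cluster embeddings, then rank clusters (candidate slices) by mean error, worst first.

    Returns a list of (cluster_id, size, mean_error) tuples.
    """
    labels = kmeans(embeddings, k)
    slices = []
    for c in range(k):
        errs = [e for e, lab in zip(abs_errors, labels) if lab == c]
        if errs:
            slices.append((c, len(errs), mean(errs)))
    return sorted(slices, key=lambda s: s[2], reverse=True)


# Toy example: two well-separated groups of embeddings with different error levels.
embeddings = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
              (10.0, 10.0), (10.2, 9.9), (9.8, 10.1), (10.1, 10.3)]
abs_errors = [5, 6, 4, 5, 20, 22, 18, 20]  # made-up per-image absolute errors
for cluster_id, size, err in discover_slices(embeddings, abs_errors, k=2):
    print(f"cluster {cluster_id}: n={size}, mean abs error={err}")
```

The worst-ranked cluster is only a hypothesis for follow-up; inspecting which clinical or acquisition attributes its members share is what turns it into an explanation.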
Second, Factor-wise Analysis systematically quantifies performance variations across individual, predefined factors. These factors can include clinical aspects (like gestational age or parity), acquisition parameters (such as ultrasound device type or pixel spacing), and traditional demographic features (like ethnicity or maternal age/BMI). By analyzing each factor independently, researchers can pinpoint which specific variables correlate most strongly with performance differences.
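Factor-wise analysis can be sketched as a group-by over one predefined factor at a time. This minimal example assumes absolute percentage error (APE) against the reference weight as the per-image metric and reports the gap between the worst and best level of the factor; the field names (`device`, `ape`) and values are hypothetical, and the paper may use different metrics or statistical tests.

```python
from collections import defaultdict
from statistics import mean


def absolute_percentage_error(pred_g, ref_g):
    """Absolute percentage error of a predicted weight against the reference weight."""
    return 100.0 * abs(pred_g - ref_g) / ref_g


def factor_wise_error(records, factor, error_key="ape"):
    """Mean error for each level of a single factor, plus the worst-minus-best gap."""
    by_level = defaultdict(list)
    for r in records:
        by_level[r[factor]].append(r[error_key])
    per_level = {level: mean(errs) for level, errs in by_level.items()}
    gap = max(per_level.values()) - min(per_level.values())
    return per_level, gap


# Hypothetical records: one row per image, with the factor of interest and its APE.
records = [
    {"device": "A", "ape": absolute_percentage_error(3300, 3000)},  # 10 % error
    {"device": "A", "ape": absolute_percentage_error(2760, 3000)},  # 8 % error
    {"device": "B", "ape": absolute_percentage_error(3120, 3000)},  # 4 % error
    {"device": "B", "ape": absolute_percentage_error(2820, 3000)},  # 6 % error
]
per_level, gap = factor_wise_error(records, "device")
print(per_level, gap)
```

Running the same function once per factor (device, pixel-spacing bin, gestational-age bin, and so on) gives a ranked list of which single variables track the largest error differences.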
Finally, Intersectional Bias Analysis examines the joint effects of multiple variables. This stage aims to disentangle how clinical and acquisition-related factors interact with demographic attributes to influence model behavior. It helps identify the primary drivers of error and separate them from confounded factors: attributes that appear to cause a disparity only because they are correlated with a deeper, underlying factor. Understanding these intertwined relationships is paramount for developing Custom AI Solutions that are both accurate and fair, addressing the actual roots of bias rather than just its symptoms.
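One simple way to look at joint effects is to tabulate the error for every combination of two factors. The sketch below, on made-up data, shows how a group-level gap can vanish once a second factor is held fixed: group `X` looks worse overall only because more of its scans fall in the `low` pixel-spacing bin. The factor names, bins, and values are hypothetical, and the paper's intersectional analysis may rely on other methods.

```python
from collections import defaultdict
from statistics import mean


def mean_error_by(records, *factors, error_key="ape"):
    """Mean error for every observed combination of the given factors.

    Returns {(level_1, level_2, ...): (count, mean_error)}.
    """
    cells = defaultdict(list)
    for r in records:
        cells[tuple(r[f] for f in factors)].append(r[error_key])
    return {key: (len(errs), mean(errs)) for key, errs in cells.items()}


# Made-up data: group X has more low-PS scans than group Y.
rows = [("X", "low", 10), ("X", "low", 10), ("X", "low", 10), ("X", "high", 6),
        ("Y", "low", 10), ("Y", "high", 6), ("Y", "high", 6), ("Y", "high", 6)]
records = [{"group": g, "ps_bin": ps, "ape": e} for g, ps, e in rows]

marginal = mean_error_by(records, "group")          # one factor at a time
joint = mean_error_by(records, "group", "ps_bin")   # both factors jointly
print(marginal)  # X looks worse than Y ...
print(joint)     # ... but within each PS bin the two groups have the same error
```

With real data the cell counts matter as much as the means: sparse intersections produce noisy estimates, so small cells are usually reported with their size or merged.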
Case Study: Fetal Weight Estimation and the Role of Pixel Spacing
The research applied this framework to a large-scale case study involving over 94,000 fetal ultrasound images used for fetal weight estimation. The study evaluated bias in both a state-of-the-art deep learning (DL) model and the clinical standard Hadlock formula, a regression equation that uses biometric measurements from ultrasound. The reference fetal weight was determined from birth weight using a standard growth curve.
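For reference, the Hadlock approach estimates fetal weight with a log-linear regression on ultrasound biometry. The article does not say which Hadlock variant the study used; the sketch below assumes the commonly cited four-parameter version based on biparietal diameter (BPD), head circumference (HC), abdominal circumference (AC), and femur length (FL), all in centimetres, returning grams. The function name and example measurements are illustrative.

```python
def hadlock_efw_grams(bpd_cm, hc_cm, ac_cm, fl_cm):
    """Estimated fetal weight (g), assuming the four-parameter Hadlock formula:

    log10(EFW) = 1.3596 - 0.00386*AC*FL + 0.0064*HC + 0.00061*BPD*AC
                 + 0.0424*AC + 0.174*FL    (all lengths in cm)
    """
    log10_efw = (
        1.3596
        - 0.00386 * ac_cm * fl_cm
        + 0.0064 * hc_cm
        + 0.00061 * bpd_cm * ac_cm
        + 0.0424 * ac_cm
        + 0.174 * fl_cm
    )
    return 10 ** log10_efw


# Illustrative near-term biometry (made-up values): gives roughly 3,000 g.
print(round(hadlock_efw_grams(bpd_cm=9.0, hc_cm=33.0, ac_cm=33.0, fl_cm=7.0)))
```

Because the formula consumes caliper measurements rather than raw pixels, its accuracy still depends on how precisely those structures can be measured in the acquired image, which is why it can be evaluated for the same acquisition-related biases as the DL model.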
The analysis revealed a significant and consistent driver of performance variation: pixel spacing (PS). Pixel spacing is the physical distance between the centers of adjacent pixels, so it sets the spatial resolution of the image: a higher pixel spacing means each pixel covers more tissue and fine detail is lost. Perhaps counterintuitively, the study found that higher PS was consistently associated with substantial improvements in predictive accuracy, of up to 24% in selected subgroups, for both the DL model and the Hadlock formula.
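The definition is easy to make concrete. Assuming pixel spacing is expressed in millimetres per pixel, the same physical structure covers fewer pixels at a higher pixel spacing. The helper names and the 70 mm femur length below are illustrative only.

```python
def pixels_spanned(length_mm, pixel_spacing_mm):
    """Number of pixels a structure of the given physical length covers."""
    return length_mm / pixel_spacing_mm


def physical_length_mm(length_px, pixel_spacing_mm):
    """Physical length (mm) of a structure that spans length_px pixels."""
    return length_px * pixel_spacing_mm


femur_mm = 70.0  # illustrative femur length
for ps in (0.125, 0.25):  # finer vs coarser pixel spacing, mm per pixel
    px = pixels_spanned(femur_mm, ps)
    # At coarser spacing the femur covers fewer pixels, so each pixel represents more tissue.
    print(f"PS={ps} mm/px: femur spans {px:.0f} px; 1 px = {physical_length_mm(1, ps)} mm")
```

Doubling the pixel spacing halves the number of pixels across the femur, from 560 to 280 in this example.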
This finding carries a considerable risk of confounding, because pixel spacing is often adjusted in clinical practice for patients with high maternal BMI (where tissue attenuation makes high-resolution imaging challenging) or at low gestational age (GA). The intersectional analysis proved invaluable here: it showed that while part of the PS-associated signal could be explained by gestational age, the improvements related to pixel spacing persisted across different BMI strata. This suggests that acquisition resolution plays an independent role, with effects that depend on gestational age but less so on BMI. Such detailed insight into complex data interactions is a hallmark of organizations like ARSA Technology, which has been developing robust, real-world AI systems since 2018.
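The logic of this kind of check can be sketched with simple stratification: compare the crude (pooled) PS effect with the PS effect measured inside each gestational-age stratum, then average the within-stratum effects weighted by stratum size. The article does not state which statistical method the authors used, so treat this as one illustrative approach; the field names, two-level PS and GA bins, and error values are made up so that GA explains half of the crude effect while a residual PS effect remains.

```python
from collections import defaultdict
from statistics import mean


def ps_effect(records):
    """Mean APE at low PS minus mean APE at high PS (positive: high PS is more accurate)."""
    low = [r["ape"] for r in records if r["ps"] == "low"]
    high = [r["ape"] for r in records if r["ps"] == "high"]
    return mean(low) - mean(high)


def stratified_ps_effect(records, stratum_key):
    """PS effect within each stratum and its size-weighted average.

    Strata that do not contain both PS levels are skipped, since no
    within-stratum comparison is possible there; returns None as the
    average if no stratum qualifies.
    """
    strata = defaultdict(list)
    for r in records:
        strata[r[stratum_key]].append(r)
    per_stratum, sizes = {}, {}
    for stratum, rows in strata.items():
        if {r["ps"] for r in rows} == {"low", "high"}:
            per_stratum[stratum] = ps_effect(rows)
            sizes[stratum] = len(rows)
    if not per_stratum:
        return per_stratum, None
    total = sum(sizes.values())
    adjusted = sum(per_stratum[s] * sizes[s] for s in per_stratum) / total
    return per_stratum, adjusted


# Made-up data: early-GA scans have higher error overall and are mostly low PS here.
rows = [("early", "low", 12), ("early", "low", 12), ("early", "low", 12), ("early", "high", 10),
        ("late", "low", 8), ("late", "high", 6), ("late", "high", 6), ("late", "high", 6)]
records = [{"ga": ga, "ps": ps, "ape": e} for ga, ps, e in rows]

crude = ps_effect(records)
per_stratum, adjusted = stratified_ps_effect(records, "ga")
print(crude, per_stratum, adjusted)  # GA explains part of the crude effect, not all of it
```

Repeating the same comparison with BMI strata instead of GA strata mirrors the article's second check: if the within-stratum effect stays close to the crude one, BMI is unlikely to be what drives the PS signal.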
Implications for Ethical AI Development and Clinical Practice
The findings of this research have profound implications for the development and deployment of medical AI. They underscore that achieving fairness and robustness in AI systems, especially in image-based diagnostics, often requires more than just refining algorithms or balancing training data. Instead, a crucial focus must shift towards protocol optimization and acquisition-aware evaluation. Understanding how factors like pixel spacing, influenced by clinical protocols and patient conditions, directly impact AI performance is key to developing more equitable and reliable tools.
For healthcare providers and technology developers alike, this means:
- Rethinking data collection: Implementing standardized acquisition protocols and carefully documenting metadata related to acquisition conditions.
- Holistic bias analysis: Employing comprehensive frameworks that consider demographic, clinical, and acquisition factors to identify and disentangle complex biases.
- Informed deployment: Recognizing that AI models might perform differently based on the characteristics of the acquired images, necessitating adaptive strategies.
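As a concrete starting point for documenting acquisition metadata, one might attach a small structured record to every image and track how complete that documentation is. The record below is a hypothetical schema built from the factors this article mentions (device, pixel spacing, sonographer, gestational age, maternal BMI); it is not a standard format, and real deployments would likely need more fields.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AcquisitionRecord:
    """Hypothetical per-image metadata schema for acquisition-aware evaluation."""
    image_id: str
    device_model: Optional[str] = None
    pixel_spacing_mm: Optional[float] = None
    sonographer_id: Optional[str] = None
    gestational_age_weeks: Optional[float] = None
    maternal_bmi: Optional[float] = None


def missing_fields(record):
    """Names of metadata fields that were not documented for this image."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def completeness(records):
    """Fraction of records (0 to 1) with every metadata field documented."""
    if not records:
        return 0.0
    return sum(1 for r in records if not missing_fields(r)) / len(records)


full = AcquisitionRecord("img-001", "device-A", 0.125, "s-17", 32.0, 24.5)
partial = AcquisitionRecord("img-002", "device-B", None, "s-04", 20.0, 31.2)
print(missing_fields(partial), completeness([full, partial]))
```

Tracking completeness matters because factor-wise and intersectional analyses can only examine variables that were actually recorded at acquisition time.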
By systematically addressing these often-overlooked sources of bias, we can build medical AI tools that are not only highly accurate but also equitable, ensuring that technological advancements benefit all patients consistently. For example, AI-powered health screening solutions such as the Self-Check Health Kiosk must incorporate an understanding of these multi-faceted influences to ensure accurate and reliable health assessments for diverse user populations. This ensures trust, supports regulatory compliance, and ultimately drives better patient outcomes.
To explore how advanced AI and IoT solutions can transform your operations with precision and fairness, we invite you to contact ARSA for a free consultation. Our team is ready to discuss your unique challenges and engineer intelligent solutions tailored to your needs.
Source: Elgebaly, A., Fournel, J., Jørgensen, B. L. J., Mikolaj, K., Christensen, A., Tolsgaard, M., Ladefoged, C., & Feragen, A. (2026). A Framework for Exploring and Disentangling Intersectional Bias: A Case Study in Fetal Ultrasound. arXiv. https://arxiv.org/abs/2605.02942