Advancing AI Fairness: How Neural Networks Improve Skin Tone Estimation for Dermatoscopic Analysis

Discover how neural networks, supervised by colorimeter data, are revolutionizing skin tone estimation from dermatoscopic images to enhance AI fairness in dermatology and beyond.


The Critical Need for Fair AI in Dermatology

      Artificial intelligence (AI) is rapidly transforming clinical decision support in dermatology, particularly for diagnosing skin conditions from dermatoscopic images. These high-magnification images of skin lesions help clinicians identify conditions like melanoma. However, numerous studies have highlighted a concerning issue: AI models often exhibit performance disparities across different skin tones. This means an AI might diagnose less accurately for individuals with darker skin tones compared to those with lighter ones, leading to potential inequities in healthcare. Addressing these biases is not just an ethical imperative but also crucial for developing trustworthy and universally effective AI systems.

      A major hurdle in auditing and mitigating this bias is the lack of reliable skin tone annotations in most publicly available dermatoscopy datasets. Without accurate labels indicating skin tone, it’s challenging to quantify how well an AI performs for various demographic groups. Previous attempts to estimate skin tone from images relied on simple methods, like averaging pixel colors in a representative skin area. However, the relationship between physical skin color and raw image pixel values is surprisingly complex and non-linear, making these pixel-based estimations unreliable. This unreliability can lead to flawed fairness audits, masking the true extent of performance gaps.
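
      To make the pixel-averaging baseline concrete, here is a minimal sketch of how such an estimate is typically computed. It assumes 8-bit sRGB pixels and a D65 white point, and the function names are illustrative rather than taken from the study. The fixed color conversion is exactly what makes the approach fragile: it knows nothing about the camera, the illumination, or the dermatoscope optics.

```python
import math

# D65 reference white (2-degree observer), the white point sRGB is defined against.
XN, YN, ZN = 0.95047, 1.0, 1.08883

def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB colour to CIE L*a*b* (D65)."""
    def linearize(c):
        # Undo the sRGB transfer curve to get linear light.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Standard linear sRGB -> CIE XYZ matrix.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / XN), f(y / YN), f(z / ZN)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def naive_ita(pixels):
    """Average the (r, g, b) pixels of a skin patch, then compute ITA in degrees."""
    n = len(pixels)
    mean_rgb = [sum(p[i] for p in pixels) / n for i in range(3)]
    lightness, _, b_star = srgb_to_lab(*mean_rgb)
    # atan2 equals arctan((L* - 50) / b*) when b* > 0, which is the usual case
    # for skin, and it avoids dividing by zero.
    return math.degrees(math.atan2(lightness - 50, b_star))

# Hypothetical patches, not data from the study.
light_patch = [(230, 200, 170)] * 4
dark_patch = [(90, 60, 40)] * 4
print(naive_ita(light_patch) > naive_ita(dark_patch))  # True
```

      The estimate depends entirely on the recorded pixel values, so any change in exposure or white balance shifts the result even when the skin itself is unchanged.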

Bridging the Gap: Neural Networks and Objective Measurements

      To overcome the limitations of traditional pixel-based methods, recent research has introduced an innovative approach: training neural networks to directly learn the intricate mapping between dermatoscopic images and physical skin color. This research, detailed in "Colorimeter-Supervised Skin Tone Estimation from Dermatoscopic Images for Fairness Auditing," utilizes objective, real-world data to create a robust ground truth for AI training. Instead of guessing skin tone from pixels, these advanced AI systems are taught to understand the visual cues in an image by correlating them with precise, physically measured color values.

      The study operationalizes skin tone using two widely accepted dermatological scales:

  • Fitzpatrick (FP) Skin Type: A categorical scale (types I-VI) commonly used by clinicians, though its assessment can be subjective.
  • Individual Typology Angle (ITA): An objective, continuous measure of skin pigmentation derived from CIE L*a*b* color space data, providing a more granular and consistent assessment.
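
      For reference, the standard definition of ITA uses the CIE L*a*b* lightness L* and the yellow-blue component b*, and is expressed in degrees. Higher angles correspond to lighter skin. This is the textbook formula, not a description of the authors' own processing pipeline.

```latex
\mathrm{ITA} = \arctan\!\left(\frac{L^{*} - 50}{b^{*}}\right) \times \frac{180^{\circ}}{\pi}
```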

      This groundbreaking method uses a colorimeter—a specialized device that accurately measures physical color—to obtain precise skin color values. These physical measurements, along with in-person Fitzpatrick type annotations from dermatologists, serve as the "ground truth" for training the neural networks. This supervised learning approach allows the AI to develop a nuanced understanding of skin tone, moving beyond the superficial pixel values to grasp the underlying physical properties.

How the AI Models Work

      The research involved training two distinct neural network models to predict skin tone:

  • Fitzpatrick Type Prediction: An ordinal regression model, designed to predict categories with a natural order, classified skin into one of the six Fitzpatrick types. While human annotations for Fitzpatrick type can vary, this model aims for consistency and objectivity.
  • Individual Typology Angle (ITA) Prediction: A color regression model, trained to predict continuous color values, accurately estimated the ITA. This is particularly innovative as it directly correlates image data with objective colorimeter readings.
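
      The article does not say which ordinal regression formulation the authors used. As an illustration only, one common approach replaces the six-way label with five binary "is the type greater than k?" targets, so that mistakes between neighboring types cost less than mistakes between distant ones. The sketch below shows that encoding and its decoding; the function names are hypothetical.

```python
NUM_TYPES = 6  # Fitzpatrick types I-VI, written here as integers 1..6

def encode_ordinal(fp_type):
    """Encode a type (1..6) as 5 binary targets: is the type greater than k, for k = 1..5?"""
    return [1 if fp_type > k else 0 for k in range(1, NUM_TYPES)]

def decode_ordinal(probs, threshold=0.5):
    """Turn 5 predicted threshold probabilities back into a type by counting those above the threshold."""
    return 1 + sum(1 for p in probs if p > threshold)

print(encode_ordinal(3))                               # [1, 1, 0, 0, 0]
print(decode_ordinal([0.97, 0.81, 0.42, 0.10, 0.02]))  # 3
```

      A network trained this way would output five sigmoid probabilities per image instead of six softmax scores.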


      Both models were pretrained extensively on a diverse mix of synthetic and real dermatoscopic and clinical images, a regime intended to help them generalize across image qualities and lighting conditions. Evaluation was performed on a balanced cohort of 64 subjects with representation across all skin tones. ITA predictions in particular showed very high concordance with colorimeter-derived values and outperformed previous pixel-averaging methods. This represents a substantial step forward in reliable, scalable skin tone estimation directly from images.

Key Findings and Their Impact

      The study's findings are pivotal for the future of AI in healthcare:

  • Superior ITA Estimation: The neural network-based ITA predictions achieved an Intraclass Correlation (ICC3) of 93.88% with colorimeter measurements. This is a considerable improvement over earlier computer vision techniques, which suffered from the unreliable relationship between image pixels and actual skin tone. This accuracy highlights the power of training AI with objective physical measurements.
  • Fitzpatrick Type Agreement: Fitzpatrick type predictions showed lower agreement than human annotators, with a linear-weighted Cohen’s kappa of 52.98% compared to 66.08% for crowdsourced annotations. They still offer a standardized, automated approach to a traditionally subjective classification.
  • Revealing Dataset Bias: Applying these new estimators to widely used public dermatoscopy benchmarks, specifically ISIC 2020 and MILK10k, revealed a severe underrepresentation of darker skin tones. Less than 1% of subjects in these datasets were estimated as Fitzpatrick types V and VI. This quantitative evidence underscores previous concerns about the lack of diversity in medical AI datasets, which directly contributes to performance disparities.
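
      For readers who want to reproduce this kind of agreement analysis on their own data, the sketch below implements the two statistics named above in plain Python. It assumes that "ICC3" refers to the two-way mixed, consistency, single-measure form, often written ICC(3,1); the article does not spell this out. The inputs in the example are toy values, not results from the study.

```python
def icc3(ratings):
    """ICC(3,1) for a table with one row per subject and one column per measurement method."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, methods, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

def linear_weighted_kappa(a, b):
    """Linear-weighted Cohen's kappa for two lists of integer ordinal labels."""
    n = len(a)
    # Mean absolute disagreement actually observed between paired labels.
    observed = sum(abs(x - y) for x, y in zip(a, b)) / n
    # Mean absolute disagreement expected if the two label sets were independent.
    expected = sum(abs(x - y) for x in a for y in b) / (n * n)
    return 1 - observed / expected

# A constant offset between methods does not reduce a consistency ICC.
print(icc3([[1, 2], [2, 3], [3, 4]]))                  # 1.0
print(linear_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

      Linear weighting means that confusing type II with type III is penalized half as much as confusing type II with type IV, which suits an ordered scale like Fitzpatrick.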


      The ability of neural networks to provide fast, scalable, and reliable skin tone estimation from dermatoscopic images opens new avenues for rigorous bias auditing. By accurately annotating large-scale public datasets, researchers can more effectively evaluate and improve the fairness of dermatoscopic classifiers. This approach sets a precedent for developing more equitable and effective AI systems, not just in dermatology but in any field where visual data needs sensitive and unbiased interpretation. For instance, in industrial settings, accurate AI Video Analytics can monitor safety compliance or detect defects, where ensuring fairness in detection across different object variations is equally critical.
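
      Once every image carries an estimated skin type, a basic audit reduces to grouping by that type. The sketch below is one simple way to do it, reporting each group's share of the dataset and the classifier's sensitivity within the group. The record layout and the toy data are assumptions made for illustration and do not come from the study.

```python
from collections import defaultdict

def audit_by_skin_type(records):
    """records: iterable of (fitzpatrick_type, true_label, predicted_label) with 0/1 labels.

    Returns {type: {"share": fraction of all records, "sensitivity": TP / positives}}.
    """
    counts = defaultdict(lambda: {"n": 0, "pos": 0, "tp": 0})
    total = 0
    for fp_type, y_true, y_pred in records:
        group = counts[fp_type]
        group["n"] += 1
        total += 1
        if y_true == 1:
            group["pos"] += 1
            if y_pred == 1:
                group["tp"] += 1
    report = {}
    for fp_type, group in sorted(counts.items()):
        report[fp_type] = {
            "share": group["n"] / total,
            # Sensitivity is undefined for a group with no positive cases.
            "sensitivity": group["tp"] / group["pos"] if group["pos"] else None,
        }
    return report

# Toy records: (estimated Fitzpatrick type, true label, predicted label).
records = [(2, 1, 1), (2, 1, 1), (2, 0, 0), (2, 1, 0), (5, 1, 0), (5, 0, 0)]
report = audit_by_skin_type(records)
for fp_type, stats in report.items():
    print(fp_type, stats)
```

      Groups with very few positive cases produce unstable sensitivity estimates, which is one practical reason underrepresentation makes fairness audits hard to interpret.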

Beyond Dermatology: Implications for Robust AI Development

      The methodology presented in this research extends beyond the confines of dermatological imaging. The core principle of grounding AI vision models with objective physical measurements is universally applicable to any industry requiring highly accurate and unbiased visual analysis. Whether it's for quality control in manufacturing, identifying security threats, or monitoring environmental changes, ensuring the AI's interpretations are tied to real-world physical properties can significantly enhance reliability and reduce inherent biases.

      For enterprises looking to implement sophisticated AI solutions, especially those demanding high accuracy, privacy-by-design, and fairness, ARSA Technology offers advanced AI Box Series devices that bring powerful computer vision and deep learning capabilities to the edge. This enables real-time processing and immediate insights, ensuring that critical decisions are made based on reliable, bias-audited data. Such solutions can be critical in industries like healthcare, manufacturing, and transportation, where the consequences of AI bias can be severe.

      This study was conducted by Marin Benčević et al. from the Josip Juraj Strossmayer University of Osijek, Croatia, and is available on arXiv (Source: arXiv:2602.10265).

      Discover how ARSA Technology leverages AI and IoT to build intelligent, fair, and impactful solutions for your industry. For a deeper discussion on implementing robust AI vision systems and ensuring fairness in your operations, contact ARSA for a free consultation.