

We must tackle bias in AI (with a data-centric approach)




Bias in AI is arguably the biggest challenge for innovators. Ulrik Stig Hansen and Eric Landau, co-founders of Encord, share their insights with Health Tech World…

Until recently, the machine learning engineers and academic researchers who built medical AI had one main focus: getting the technology to work.


Now, however, algorithms are no longer just academic puzzles: clinicians can apply and realise the value of medical AI in multiple use cases, and the results have real-world implications for patients. As such, there's an urgent need to answer the questions that those first-generation technologists couldn't.

Specifically: how well is this technology going to work, and who is it going to work for?

In the past few years, research has shown that medical AI is working better for some people than others. Algorithmic bias is disproportionately harming minority patients: those same patients who have historically struggled to obtain equitable access to quality healthcare.


Bias in AI – taking a data-centric approach

Some researchers are attempting to take a model-centric approach to solving the problem of algorithmic bias (e.g. introducing algorithmic fairness methods), but a data-centric approach can help practitioners prevent racial and gender bias by ensuring that models are trained on data that is diverse and representative.

Machine learning engineers must look at “quality of training”

An AI system's most relevant component is the data it's trained on, not the model or set of models that it uses. If a model isn't predicting fairly or accurately, machine learning engineers should first look to improve the quality of their training data rather than the model's algorithms.

When it comes to algorithmic bias, the problem often stems from biased or unrepresentative training data. Models learn to make predictions after training and retraining on a variety of data.

If this data isn’t representative of the patient population that the medical AI is going to serve, then the AI will make mistakes when it’s put into practice and encounters never-before-seen cases. 
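One simple, data-centric check this suggests is comparing the demographic mix of a training set against the mix of the patient population the model will serve. The sketch below is illustrative only: the group labels, target shares, and the `representation_gaps` helper are hypothetical, and real audits would use proper demographic categories and statistical tests.

```python
from collections import Counter

def representation_gaps(train_groups, target_shares, tolerance=0.05):
    """Flag demographic groups whose share of the training data falls
    short of their share of the target patient population."""
    counts = Counter(train_groups)
    total = sum(counts.values())
    gaps = {}
    for group, target in target_shares.items():
        actual = counts.get(group, 0) / total
        if target - actual > tolerance:  # underrepresented beyond tolerance
            gaps[group] = {"target": target, "actual": round(actual, 3)}
    return gaps

# Hypothetical training set vs. the hospital's actual patient mix
train = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
population = {"A": 0.50, "B": 0.30, "C": 0.20}
print(representation_gaps(train, population))
# Groups B and C are flagged: they make up far less of the training
# data than of the population the model will serve.
```

A check like this catches representation problems before training, when they are cheapest to fix by collecting more data.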

For instance, if medical AI is going to be deployed in a hospital that predominantly serves patients from minority communities, then the model needs to be trained on a vast amount of data collected from patients of similar demographics. 

Then, it must be validated with similar but never-before-seen data to ensure that it will perform as expected when put to work in the real world. For a model to deliver real value for the business, for patients and for clinicians, it needs to be localised to specific patient populations.

Another benefit of a data-centric approach is that it can help keep implicit biases in check during model development.

For instance, if a non-diverse machine learning team inadvertently introduced subconscious bias into a medical AI model, training and validating on a diverse dataset would provide a safeguard against those biases.

The model would show lower-than-expected performance on underrepresented populations, allowing machine learning engineers to catch the problem before the model is deployed in real-world scenarios.
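Surfacing that lower-than-expected performance means reporting validation metrics per subgroup rather than as a single aggregate number. A minimal sketch, with made-up labels and predictions purely for illustration:

```python
def subgroup_accuracy(y_true, y_pred, groups):
    """Compute accuracy separately for each demographic group, so a
    performance gap is visible instead of being averaged away."""
    stats = {}
    for t, p, g in zip(y_true, y_pred, groups):
        correct, total = stats.get(g, (0, 0))
        stats[g] = (correct + int(t == p), total + 1)
    return {g: correct / total for g, (correct, total) in stats.items()}

# Hypothetical validation run: aggregate accuracy is 50%, but that
# average hides a model that fails entirely on group "B".
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]
print(subgroup_accuracy(y_true, y_pred, groups))  # {'A': 1.0, 'B': 0.0}
```

In practice the same breakdown would be applied to clinically relevant metrics such as sensitivity and specificity, not just accuracy.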

Obtaining data from diverse sources

For a data-centric approach to work, however, practitioners have to think more carefully about where and how they source medical data.

After all, bias in medical data existed long before machine learning. Historically, racist and sexist beliefs have resulted in clinical research being performed largely on white males with the results then generalised to the rest of the population.

Even now, an over-reliance on sourcing data from well-off research institutions can result in unrepresentative datasets. 

These institutions often lead the way in bleeding-edge research, but they generally cater to rich and predominantly white patients. As a result, racial bias begins with the selection of patients used in the research study and extends to the models trained on the data resulting from that study.  

When building medical AI, machine learning teams need to widen their nets for sourcing data. Rather than relying on hospitals and research facilities associated with top-tier academic institutions, they should forge new, creative partnerships, such as working with community health centres or walk-in clinics in urban areas, to obtain patient data that's more reflective of the general population.

Balancing regulation with evolving technologies

While some experts argue for the creation of regulatory frameworks as a means of preventing the discriminatory outcomes that result from biased datasets, these arguments tend to ignore how such frameworks would interact with the realities of producing medical AI.

Medical AI isn't static

Unlike other medical products, such as drugs and vaccines, medical AI isn't static.

After passing through a regulatory framework, the models will continue to learn from their new environment and adjust their predictions accordingly.

As they run “in the wild,” they need to be continuously validated to account for environmental and demographic changes. For instance, if the population of a hospital using an AI system begins to change, then the model is at risk of making mistakes and misdiagnosing patients. It needs to be immediately calibrated to the data that it is now seeing.
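That continuous validation can include a simple monitoring check: compare the demographic mix the model was validated on against the mix it is currently seeing, and flag when the gap grows large. The sketch below uses total-variation distance; the threshold, group labels, and `population_drift` helper are all illustrative assumptions, not a prescribed method.

```python
from collections import Counter

def population_drift(baseline_shares, live_groups, threshold=0.1):
    """Total-variation distance between the demographic mix the model
    was validated on and the mix it now sees; flags when the shift
    exceeds a chosen threshold and recalibration may be needed."""
    counts = Counter(live_groups)
    total = sum(counts.values())
    all_groups = set(baseline_shares) | set(counts)
    tvd = 0.5 * sum(
        abs(baseline_shares.get(g, 0.0) - counts.get(g, 0) / total)
        for g in all_groups
    )
    return tvd, tvd > threshold

# Hypothetical scenario: the hospital's patient mix has shifted since validation
baseline = {"A": 0.6, "B": 0.4}
live = ["A"] * 30 + ["B"] * 70
print(population_drift(baseline, live))  # (0.3, True) -> recalibration flagged
```

When the flag fires, the team can recalibrate or retrain on recent data rather than waiting for misdiagnoses to reveal the drift.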

In such cases, going through the entire regulatory framework again would inhibit iteration and work against improving the AI’s performance in practice. 

Diversity in STEM

Of course, the best way to tackle bias is to increase diversity within the medical and STEM professions. With greater diversity, teams can build more inclusive approaches to research and help reduce blind spots during data collection and AI development.

As we work towards that goal, however, shifting to a data-centric approach for building medical AI can help us cure the source of algorithm bias rather than merely treat its symptoms. First and foremost, that shift requires thinking carefully about how we source and manage training data.
