Data from UK Biobank leaked dozens of times, investigation reveals

Published: March 16, 2026. Last updated: April 15, 2026

UK Biobank data was exposed online dozens of times, a Guardian investigation has found, raising questions over how patient records were safeguarded.

The investigation found that files from UK Biobank, which holds the medical records of 500,000 British volunteers, appear to have been posted online by researchers who were given access to the confidential data.

Founded in 2003 by the Department of Health and medical research charities, UK Biobank holds genome sequences, scans, blood samples and lifestyle information for 500,000 volunteers.

It is one of the world’s most comprehensive stores of health information and is credited with helping to drive breakthroughs in cancer, dementia and diabetes research.

But scientists approved to access Biobank’s sensitive data appear at times to have been cavalier about its security.

The files, which seem to have been inadvertently posted online by researchers using the data, do not include names or addresses, but they may still pose privacy concerns.

One dataset found by the Guardian contained millions of hospital diagnoses and associated dates for more than 400,000 participants.

With the consent of a Biobank volunteer, the Guardian was able to pinpoint what appeared to be extensive hospital diagnosis records for the volunteer, using only their month and year of birth and details of a major surgery they had undergone.

One data expert said the scale and persistence of the problem was “shocking” at a time when AI and social media were making it ever easier to cross-reference information online.

UK Biobank rejected the concerns, saying that no identifying data, such as names and addresses, were provided to researchers.

In a statement, Prof Sir Rory Collins, the chief executive of UK Biobank, said: “We have never seen any evidence of any UK Biobank participant being re-identified by others.”

Last month, the government extended Biobank’s access to volunteers’ GP records.

Scientists at universities and private companies across the world apply for access and, until late 2024, were free to download data directly on to their own computer systems.

Before that change, data had been inadvertently published online, and Biobank still appears to be grappling with the problem.

The issue emerged because journals and funders increasingly require researchers to publish the code they have used to analyse large datasets.

In uploading their code, some researchers have accidentally published partial or entire Biobank datasets to GitHub, a popular online code-sharing platform.

UK Biobank prohibits researchers from sharing data outside their systems and says it has introduced further training for all researchers.

In the past year, the data leaks appear to have become a more urgent concern for UK Biobank. Between July and December 2025, it issued 80 legal notices to GitHub, which has complied with requests to remove data from the internet. Yet much still remains available.

Some of the data files contain just patient IDs, or test results for small numbers of participants, while others are more extensive.

One dataset found online by the Guardian in January contained hospital diagnoses and associated diagnosis dates for about 413,000 participants, along with their sex and month and year of birth.

A data expert, who reviewed the file, said: “It sent shivers down my spine to even open. I deleted the file immediately.

“It was very detailed and felt like a gross invasion of privacy even to glance at.”

To test the risk of re-identification, the Guardian approached several Biobank volunteers, two of whom had undergone medical procedures within the timeframe covered by the data and agreed to share these details with an external data scientist.
