The genetic sequences, medical scans, and lifestyle records of half a million British volunteers spent days listed for sale on Alibaba before anyone at UK Biobank noticed.
Three academic institutions, since banned from the platform, had quietly walked the data out through a research system that was supposed to keep it under lock and key.
At least one of the three Alibaba listings appeared to contain the full dataset covering every one of the 500,000 participants who handed over their blood, their DNA, and decades of personal health information on the understanding it would be used for medical research.
The UK government confirmed the breach on Thursday. Technology minister Ian Murray told the House of Commons that Biobank had flagged the incident on Monday, and that the Chinese government and Alibaba had cooperated to pull the listings down before any purchases went through. Murray thanked Beijing directly for its “speed and seriousness” in taking down the data, a sentence that carries some weight given the three research institutions identified as the source are Chinese, though officials have declined to draw conclusions about intent.
Professor Rory Collins, Biobank’s chief executive and principal investigator, issued a statement saying the listings “were swiftly removed before any purchases were made.” He apologized to participants and confirmed that access to the research platform had been suspended while the organization installs file size limits designed to stop researchers from walking off with bulk datasets.
An automated checking system to vet outgoing files is not expected to be ready until late 2026.
The sales listing is not the scandal. The scandal is what the sales listing reveals about how often Biobank’s data has already been exposed and where it now sits.
Prof Luc Rocher of the Oxford Internet Institute has been tracking the problem and maintains a public record of known incidents. By his count, the Alibaba posting is “the 198th known exposure of UK Biobank data since last summer.” Rocher added that the data “is not just available for sale, it also remains available online for anyone to download today.” Researchers have repeatedly uploaded the dataset to code-sharing platforms by accident, and copies have since been replicated across the web. Taking down one Alibaba listing does nothing about the other 197.
Biobank’s response to this pattern has been to emphasize that the data is “de-identified” and that no participant has been knowingly re-identified. The reassurance rests on a technical claim that does not survive contact with the evidence.