Some online sleuthing has unearthed 13 genetic sequences of SARS-CoV-2, the virus that causes COVID-19, that were detected in the very early days of the outbreak in Wuhan but have since mysteriously disappeared from databases. While many questions remain, the freshly recovered sequences could shed some light on the initial spread of the virus in Wuhan back in late 2019.
The non-peer-reviewed report by Dr Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center, was recently posted on the pre-print server bioRxiv. Some independent experts commenting on the paper have said it's “intriguing” while others say it borders on "supposition and conjecture." Either way, most agree the work should be taken with a pinch of salt until further research shows how solid the findings really are.
Bloom is the first name to appear on the much-discussed open letter in Science on the investigations of COVID-19. The letter essentially called for scientists to keep an open mind on the virus’s origins, noting “theories of accidental release from a lab and zoonotic spillover both remain viable,” which has since helped to bolster the recent renewed interest in the “COVID lab leak theory.” Bloom’s new report doesn’t add anything to this debate, but it suggests there may be more raw data on the early outbreak than previously thought.
Bloom’s investigation started after he noticed some SARS-CoV-2 sequences from early in the Wuhan epidemic had been deleted from the National Institutes of Health database. The sequences were reportedly collected by scientists at Wuhan University before March 30 2020 and uploaded onto the online database. However, according to the paper, he came across evidence that certain entries had been removed around June 2020.
Through further digging around online databases and scientific literature, Bloom even managed to find another study posted online in March 2020 by scientists at Renmin Hospital of Wuhan University that appears to contain some information from the missing database entries in its supplementary material. This, however, only contained information on specific mutations in the viruses, not the full sequence data. Using these clues and further sleuthing around files left on Google Cloud, Bloom managed to recover 13 sequences that were removed last summer.
So, what can we make of this?
The recovered genetic sequences were obtained in Wuhan early on in the outbreak. However, they contain some mutations that separate them from previously known early sequences gathered from Wuhan’s Huanan Seafood Market, once proposed as the original site where the virus made a zoonotic jump from an animal to humans. Instead, these newly recovered sequences are more similar to SARS-CoV-2’s bat coronavirus relatives.
This suggests the virus was circulating in Wuhan before it was identified at the market. In other words, the Huanan Seafood Market is not “ground zero”. This is not exactly new information – many other researchers and the Chinese health authorities have also suggested this in the past – but this new report indicates that more raw data about the initial outbreak of SARS-CoV-2 could be out there, which could potentially reveal the origins of the outbreak.
"This study does not provide any additional strong evidence favoring either natural zoonosis or lab accident. Rather, it shows that there are additional sequences from relatively early in the outbreak that are still unknown, and in some cases have mutations that suggest they are probably evolutionarily older than the viruses from the Huanan Seafood Market," Bloom said in an email to CNN.
As for why the sequences were deleted, it remains unclear. In the paper, Bloom writes it “seems likely the sequences were deleted to obscure their existence.” However, other scientists reject this idea and say the report “veers into non-scientific areas such as cover-ups” when it discusses possible reasons for the deletions.
Commenting on the paper, Andrew Preston, Professor of Microbial Pathogenesis at the University of Bath, said: “The language of the paper is unusual, it contains a significant degree of supposition and conjecture, cites blog posts and appears to be pointing towards a deliberate cover-up by Chinese authorities of early sequence data from Wuhan. However, this is an entirely subjective appraisal of the situation, which will be very difficult to confirm or disprove.”