One In Five Genetics Studies Contain Mistakes Due To Microsoft Excel


Robin Andrews

Science & Policy Writer



Much of the world relies on Microsoft Excel, and there are certainly good reasons for this. However, due to an overlooked auto-formatting feature of the venerable computer program, it appears this reliance may have caused a significant problem for scientists using it to compile data.

According to a review published in the journal Genome Biology, one in five genetics studies published in leading scientific journals, including Nature, Science, and PLOS ONE, contain errors brought about by the use of Excel. As it turns out, Excel automatically converts gene names, when represented as symbols, to calendar dates or other number formats, which corrupts the record of which gene the researchers were describing.


The Washington Post uses the example of the gene Septin 2, which is shortened to SEPT2. Excel sees this as “September 2” and will convert this automatically to 2-Sep, storing it as the date, not the gene. There’s no way to easily undo this, as the Edit -> Undo function will delete everything from the cell the information has been entered into.
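To make the conversion concrete, here is a minimal sketch, in Python, of how a column of gene symbols could be screened for entries that look like "2-Sep". It is not the method used in the review: the function name, the pattern, and the sample list are all invented for illustration, and the pattern only covers the day-month display format described above.

```python
import re

# Month abbreviations as they appear in a date displayed like "2-Sep".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Hypothetical pattern: one or two digits, a hyphen, then a month abbreviation.
# It deliberately ignores other formats a converted cell might be shown in.
DATE_LIKE = re.compile(rf"^\d{{1,2}}-(?:{MONTHS})$", re.IGNORECASE)

def find_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]

# Made-up column: two entries have already been turned into dates.
gene_column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "SEPT2"]
print(find_date_like(gene_column))  # ['2-Sep', '1-Mar']
```

A check like this can only flag suspicious entries. It cannot say for certain which gene symbol was originally typed, so flagged cells would still need to be compared against the source data.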

There is no formatting feature that effortlessly allows researchers to keep nomenclature like SEPT2 from morphing beyond recognition, and there is no method of permanently disabling the automatic date formatting within the program. Although you can force Excel to bend to your will, it seems that the authors of around 720 of 3,600 genetics papers – some of which were published way back in 2004 – failed to do this.

All of the researchers on the team were from the Baker IDI Heart & Diabetes Institute. Of the 18 journals they investigated, some fared worse than others. Molecular Biology and Evolution, Bioinformatics, DNA Research, and Genome Biology and Evolution had the lowest proportion of papers containing errors, whereas the journals Nucleic Acids Research, Genome Biology, Nature Genetics, Genome Research, Genes and Development, and Nature contained the highest proportion.

It’s not yet clear how widely the field of genetics has been affected by these mistakes. Although mislabeled genes may be correctly identified by those in the know, anyone who isn’t an expert on those particular gene sequences may interpret or refer to them incorrectly in follow-up studies, or even during the course of scientific journalism.


The authors of the review point out that Google Sheets doesn’t present the same irritating obstacle to genetics researchers. In the meantime, anyone using Excel will have to remain particularly vigilant to its authoritarian formatting ways.


