A million-row limit on Microsoft’s Excel spreadsheet software may have led to Public Health England misplacing nearly 16,000 Covid test results, it is understood.
The data error, which led to 15,841 positive tests being left off the official daily figures, means that 50,000 potentially infectious people may have been missed by contact tracers and not told to self-isolate.
PHE was responsible for collating the test results from public and private labs, and publishing the daily updates on case counts and tests performed.
But the rapid development of the testing programme has meant that much of the work is still done manually, with individual labs sending PHE spreadsheets containing their results. Although the system has improved from the early days of the pandemic, when some of the work was performed with phone calls, pens and paper, it is still far from automated.
In this case, the Guardian understands, one lab had sent its daily test report to PHE in the form of a CSV file – the simplest possible data format, just a list of values separated by commas. That report was then loaded into Microsoft Excel, and the new tests at the bottom were added to the main database.
But while CSV files can be any size, a Microsoft Excel worksheet can hold at most 1,048,576 rows – or, in older versions which PHE may still have been using, a mere 65,536. When a CSV file longer than that is opened, the rows beyond the limit are cut off and are no longer displayed. That means that, once the lab’s cumulative report grew past the row limit, it was only a matter of time before its newest results failed to be read by PHE.
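To make the failure concrete, here is a minimal Python sketch of the truncation described above. It is a toy model, not PHE’s actual pipeline: the function names `truncate_like_spreadsheet` and `rows_lost` are hypothetical, and a tiny 5-row limit stands in for Excel’s real ones so the effect is easy to see.

```python
import csv
import io

# Maximum rows per worksheet: .xlsx (Excel 2007 onwards) and legacy .xls
XLSX_MAX_ROWS = 1_048_576
XLS_MAX_ROWS = 65_536

def truncate_like_spreadsheet(csv_text: str, max_rows: int) -> list[list[str]]:
    """Toy model of the import described above: rows past max_rows are dropped."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[:max_rows]

def rows_lost(csv_text: str, max_rows: int) -> int:
    """Count rows (header included) that would not survive the import."""
    total = sum(1 for _ in csv.reader(io.StringIO(csv_text)))
    return max(0, total - max_rows)

# A report with a header line and 10 results, checked against a tiny 5-row limit
report = "lab_id,result\n" + "".join(f"L{i},positive\n" for i in range(10))
print(len(truncate_like_spreadsheet(report, 5)))  # 5
print(rows_lost(report, 5))                        # 6
print(rows_lost(report, XLS_MAX_ROWS))             # 0
```

Checking a report’s row count against the limit before loading it, or loading CSVs with a tool that has no such cap, are two ways a pipeline like this could catch the silent loss.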
Microsoft’s spreadsheet software is one of the world’s most popular business tools, but it is regularly implicated in errors which can be costly, or even dangerous, because of the ease with which it can be used in situations it was not designed for.
In 2013, an Excel error at JPMorgan masked the loss of almost $6bn (£4.6bn), after a cell mistakenly divided by the sum of two interest rates, rather than the average. The news led James Kwak, a professor of law at the University of Connecticut, to warn that Excel is “incredibly fragile”.
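The arithmetic behind that kind of slip is easy to reproduce. The sketch below uses made-up rates, not anything from JPMorgan’s actual model, and the function names are hypothetical. It compares dividing a change in rates by their average with dividing it by their sum; because the sum is twice the average, the mistaken figure always comes out at exactly half the intended one.

```python
def change_over_average(old_rate: float, new_rate: float) -> float:
    """Intended: the change in rate divided by the average of the two rates."""
    return (new_rate - old_rate) / ((old_rate + new_rate) / 2)

def change_over_sum(old_rate: float, new_rate: float) -> float:
    """The error described above: the same change divided by the sum of the rates."""
    return (new_rate - old_rate) / (old_rate + new_rate)

# Illustrative rates only: 4% moving to 5%
intended = change_over_average(0.04, 0.05)
mistaken = change_over_sum(0.04, 0.05)
print(f"intended {intended:.4f}, mistaken {mistaken:.4f}")  # intended 0.2222, mistaken 0.1111
print(intended / mistaken)  # 2.0
```

A measure of how much rates are moving is therefore understated by half, which is the kind of quiet distortion that can flow through every downstream cell without raising an error.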
“There is no way to trace where your data comes from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets – badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way,” Kwak wrote.
Errors from the spreadsheet software have even changed the very foundations of human genetics. The names of 27 genes have been changed over the past year by the HUGO Gene Nomenclature Committee, after Microsoft’s program routinely misformatted them. The genes SEPT1 and MARCH1, for instance, have been renamed SEPTIN1 and MARCHF1 after they were repeatedly turned into dates, while symbols that were also common words have been altered so that grammar tools would not autocorrect them: WARS, for example, is now WARS1.
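A toy Python model shows how an over-helpful import turns a gene symbol into a date. The `naive_autoformat` and `import_column` functions below are hypothetical and far cruder than Excel’s real date detection, and the default year of 2020 is an arbitrary assumption, but they reproduce the pattern: a month abbreviation followed by a number becomes a date, while the renamed symbols pass through untouched.

```python
import re
from datetime import date

# Month names and abbreviations the toy parser recognises (an assumption, not Excel's list)
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "MARCH": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "SEPT": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# Letters followed by one or two digits, e.g. "SEPT1"
MONTH_DAY = re.compile(r"([A-Za-z]+)(\d{1,2})")

def naive_autoformat(cell: str, year: int = 2020):
    """Toy 'smart' import: a month name followed by a day number becomes a date."""
    match = MONTH_DAY.fullmatch(cell)
    if match and match.group(1).upper() in MONTHS:
        try:
            return date(year, MONTHS[match.group(1).upper()], int(match.group(2)))
        except ValueError:  # e.g. "FEB30" is not a real date, so leave it alone
            return cell
    return cell

def import_column(cells, as_text: bool):
    """Hypothetical import step: identifier columns should be kept as plain text."""
    return list(cells) if as_text else [naive_autoformat(c) for c in cells]

genes = ["SEPT1", "MARCH1", "SEPTIN1", "MARCHF1", "BRCA1"]
print(import_column(genes, as_text=False))
# [datetime.date(2020, 9, 1), datetime.date(2020, 3, 1), 'SEPTIN1', 'MARCHF1', 'BRCA1']
print(import_column(genes, as_text=True))
# ['SEPT1', 'MARCH1', 'SEPTIN1', 'MARCHF1', 'BRCA1']
```

Renaming the genes sidesteps the problem for those symbols; treating identifier columns as plain text on import, which the `as_text` flag stands in for here, avoids the conversion for any symbol.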