Statistics and Covid-19

As enthusiastic statisticians, we live and breathe statistics. Measuring the impact of Covid-19 is essential for monitoring its development and for implementing precautionary measures that reduce the harm caused by such a deadly virus. However, governments across the world are using different types of measurement to determine the virus's true impact. This variation in reporting can lead to unfair comparisons between neighbouring countries, which can be, and have been, used to severely mislead the public. Official reports are also not telling the full story, whether by relying on outdated data sources or by using partial statistics to exaggerate and scaremonger. It is therefore essential to trust statisticians with what they know best.

The importance of like-for-like comparisons

For the global tracking of any data, consistent methodologies are fundamental to accurate reporting. Unfortunately, countries disagree on the best way to track Covid-19 data, which leaves the global figures with many holes in their reliability. If every country measures the impact in a different way, how can we truly trust the global data? Take Belgium, for example. It has the highest number of Covid-19 deaths per capita in Europe, but because its approach to measuring the outbreak differs from that of other European countries, its statistics are potentially misleading. Belgium has taken a wholeheartedly transparent approach, counting deaths suspected of being linked to Covid-19 as Covid-19 deaths irrespective of whether the person was tested. It is therefore impossible to compare Belgium's statistics directly with those of countries like the UK, which only recently started investigating the number of deaths in care homes, whereas Belgium already includes care home deaths related to Covid-19 in its data. The effect is to overstate the impact of Covid-19 in Belgium relative to the UK, where it is understated. When you fail to compare like for like, the overall picture becomes very misleading.
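To make the like-for-like point concrete, here is a minimal sketch of how the choice of counting rule alone can reverse a per-capita comparison. Everything in it is an assumption made for illustration: the two countries, their populations and their death counts are hypothetical and are not figures for Belgium or the UK.

```python
from dataclasses import dataclass


@dataclass
class CountryCounts:
    """Hypothetical counts for one country (illustrative only, not real data)."""
    name: str
    population: int
    confirmed_hospital_deaths: int
    suspected_care_home_deaths: int


def deaths_per_100k(deaths: int, population: int) -> float:
    """Deaths per 100,000 people."""
    return deaths * 100_000 / population


def rate(country: CountryCounts, include_suspected: bool) -> float:
    """Per-capita death rate under a chosen counting rule."""
    deaths = country.confirmed_hospital_deaths
    if include_suspected:
        deaths += country.suspected_care_home_deaths
    return deaths_per_100k(deaths, country.population)


# HYPOTHETICAL countries: A reports suspected care home deaths, B does not.
country_a = CountryCounts("Country A", 10_000_000, 4_000, 4_000)
country_b = CountryCounts("Country B", 60_000_000, 30_000, 15_000)

# As officially reported: each country uses its own definition.
a_reported = rate(country_a, include_suspected=True)
b_reported = rate(country_b, include_suspected=False)

# Like for like: both countries measured with the same definition.
a_confirmed_only = rate(country_a, include_suspected=False)
b_confirmed_only = rate(country_b, include_suspected=False)

print(f"As reported:    A={a_reported:.0f}, B={b_reported:.0f} per 100k")
print(f"Confirmed only: A={a_confirmed_only:.0f}, B={b_confirmed_only:.0f} per 100k")
```

With these made-up numbers, Country A looks far worse as reported, yet has the lower rate once both countries are counted the same way. The ranking depends on the definition, not just on the outbreak.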

Up-to-date data matters!

Furthermore, misleading data from reliable sources can create problems for the integrity of Covid-19 statistics. A recent study found that BAME individuals are highly over-represented in the Covid-19 statistics: it noted that while 14% of the UK population are BAME, 34% of critically ill patients belong to BAME groups. However, the population figure came from the 2011 census. Using census data from nine years ago seriously undermines the reliability of any estimate of how disproportionately Covid-19 is currently affecting BAME individuals, because the demographics of the population have changed as UK cities have become more diverse. BAME groups accounted for 6.8% of the UK population in the 2001 census, which grew to 13% in 2011, and diversity in UK cities has continued to increase since. Furthermore, because the UK outbreak began in London, where a large proportion of BAME individuals live, UK-wide figures can over-represent BAME individuals for geographic reasons alone. None of this should stand in the way of a study designed to determine how to protect individuals who may be at higher risk from Covid-19. However, up-to-date data must be used to provide the most accurate statistics and to tackle the underlying problems that are causing the disproportionality, without misleading the public to sell media.
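The sensitivity of this conclusion to the baseline can be sketched in a few lines. Only the 34% and 14% figures below come from the study quoted above; the updated population share and the regional breakdown are hypothetical values we have assumed to show the direction of the effect, not estimates of the real UK population.

```python
def representation_ratio(observed_share: float, baseline_share: float) -> float:
    """How many times more common a group is among patients than in the baseline."""
    return observed_share / baseline_share


def expected_patient_share(regions: list[tuple[float, float]]) -> float:
    """Share of patients expected from a group if everyone faced equal risk.

    Each entry is (region's share of all cases, group's share of that region).
    """
    return sum(case_share * group_share for case_share, group_share in regions)


# Figures quoted in the text: 34% of critically ill patients, 14% of the population.
ratio_2011_baseline = representation_ratio(0.34, 0.14)

# HYPOTHETICAL: suppose the current population share were 18% rather than 14%.
ratio_updated_baseline = representation_ratio(0.34, 0.18)

# HYPOTHETICAL: 40% of cases in a region where the group is 40% of residents,
# and 60% of cases elsewhere, where the group is 10% of residents.
hypothetical_regions = [(0.40, 0.40), (0.60, 0.10)]
expected_share = expected_patient_share(hypothetical_regions)
ratio_region_adjusted = representation_ratio(0.34, expected_share)

print(f"Ratio against 2011 baseline:         {ratio_2011_baseline:.2f}")
print(f"Ratio against hypothetical baseline: {ratio_updated_baseline:.2f}")
print(f"Ratio after regional adjustment:     {ratio_region_adjusted:.2f}")
```

Under these assumptions the group is still over-represented in every case, but the size of the gap shrinks as the baseline is brought up to date and adjusted for where the cases actually occurred. That is why the choice of baseline deserves as much scrutiny as the headline figure.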

Does correlation equal causation?

A recent article in New Scientist highlighted a correlation between air pollution and the impact of Covid-19. As statisticians, it is essential that we do not mistake correlation for causation: just because two data sets correlate does not mean that one affects the other. For example, a large proportion of early cases lived near a London Underground station, but it would be a stretch to conclude that living near a station caused them to be affected by Covid-19. It is therefore essential to carry out multivariate analysis to understand the full dimensions of the virus, without jumping to rash conclusions about sensitive issues.