Corona Virus and the labyrinth of statistics!





As we sit at home, relatively secure but largely idle, we all have many questions. When will things stabilize so that the lockdown can be lifted? What is the probability of my contracting the infection or, worse, kicking the bucket on the scaffold of Covid-19?

A whole lot of statistics are being thrown at us by TV channels and by the faculty, eggheads and scholars of the WhatsApp Open University for the Mentally-Negligible (WOUM). Amid the mirage of colourful graphs and Covid buzzwords, you can see that the anchor, or the imbecile forwarder, has very little understanding of what they parrot: terms like flattening of the curve, mortality rate, fatality rate, recovery rate, incremental rate and so on.

Doctors don’t understand these numbers that well either; it is not their job. For example, the same information, that of the number of Covid-19 cases in India going up to 20000 in a month, was presented last month in different ways, mostly without any reference to statistical significance: near-exponential growth, a very low rate of growth, erratic growth and so on. One simple X-Y graph claimed that the spread in India was slower. True. But when the same comparative data was plotted with the Y axis on a logarithmic scale, the presenter said that we were not much better off. Quite obviously, these graphs and numbers are not painting the full picture correctly.
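Why can the same numbers look so different on a linear and a logarithmic Y axis? Steady exponential growth bends sharply upward on a linear chart but becomes a straight line on a log chart, and the slope of that line is the growth rate. A minimal Python sketch, assuming a made-up case series (not real Indian data) and hypothetical helper names, computes the daily growth rate and the doubling time from that log slope:

```python
import math

def daily_growth_rates(cumulative):
    """Day-over-day growth of a cumulative series (0.10 means 10% per day); assumes no zero days."""
    return [today / yesterday - 1 for yesterday, today in zip(cumulative, cumulative[1:])]

def doubling_time(cumulative, window=7):
    """Days to double, from the average slope of ln(cases) over the last `window` days."""
    start, end = cumulative[-window - 1], cumulative[-1]
    slope = math.log(end / start) / window  # the slope of the straight line on a log-scale chart
    return math.log(2) / slope

# Hypothetical series, NOT real data: 1000 cases doubling every 5 days, for 15 days.
cases = [1000 * 2 ** (day / 5) for day in range(15)]

print("latest daily growth:", f"{daily_growth_rates(cases)[-1]:.1%}")
print("doubling time:", f"{doubling_time(cases):.1f} days")

# On a linear axis the daily additions keep getting bigger...
print("linear steps, first vs last:", round(cases[1] - cases[0]), round(cases[-1] - cases[-2]))
# ...but on a log axis every step has the same height.
log_steps = [math.log(b) - math.log(a) for a, b in zip(cases, cases[1:])]
print("spread of log steps:", max(log_steps) - min(log_steps))
```

The point is that a "slower" curve on a linear chart and a "not much better" curve on a log chart can both be honest views of one series; the log chart simply makes the growth rate visible.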

Are you confused? I am. “Facts are stubborn things, but statistics are pliable,” said Mark Twain. I am not charging that Corona statistics are being manipulated. No. What I am saying is that the statistics being put out are based on factual data from reliable sites, but their processing and presentation are either too esoteric or simply stupid. A devilish Stalin could say, “A single death is a tragedy; a million deaths is a statistic.” But for us, the lesser mortals in India, a better appreciation of the numbers would help us endure, as the news from the world over is increasingly mournful and depressing.


So I decided to decipher at least some of these mystery terms. Let me pick a couple of aspects out of all this gobbledygook: those which I think could impact us directly, and which, once better understood, may make us either more relaxed or more circumspect.

Disclaimers:

There are significant imponderables in any statistical study, and the Covid-19 data has many. All of them can significantly affect the analysis, but in this study I have made a simple analysis, treating the data from a reliable source, https://ourworldindata.org/coronavirus, as fully reliable. A correlation analysis that accounts for these imponderables would be a complex exercise, beyond the scope of my limited study.

To mention a few:

· No one really knows the total number of people infected with COVID-19. The count of confirmed cases depends on the extent of testing, since it reflects the infection status only of those who were actually tested. Ideally, the analysis should collate the case data with testing data and inflate the number by a statistically derived multiplier. For example, a study in LA claims that the actual number of infected cases was in lacs and not thousands, as testing was simply inadequate. This also means that a large number of cases in India are going undetected, and the number of infected cases may actually be much larger. At the same time, I would assume that Indians have higher immunity against the disease than people in the West, and that a large percentage actually got infected and recovered without being counted.
· One would think that, by and large, the number of deaths reported would be accurate, as a death is not easily hidden. There may, however, be many deaths caused by Covid-19 that were not attributed to it due to lack of testing. The gap between reported Covid-19 deaths and actual deaths can be significant in India, particularly in rural areas, where many deaths take place at home and may get misreported.
· There are some variations in the data put out by various government agencies. It is difficult to judge which one is more accurate.
· India is a large country with region-specific peculiarities and biases that are not captured in the data; the analysis assumes that all regions of India behave as one.
· Mortality among the elderly is decidedly higher, but the data I have used has no age-wise breakdown, so the result may wrongly give more hope to the aged.

OK. So what is this flattening of the curve, bunged at us every day, ad nauseam? Flattening of the curve is talked about with respect to whether the health infrastructure can cope with Covid-19 cases and their requirements over time. So we obviously have time on the X axis. Whatever we have on the Y axis should not peak so high that the system is overwhelmed and collapses under the sheer count of infected cases.
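To see what "flattening" means in numbers, here is a minimal Python sketch using the textbook SIR (susceptible, infected, recovered) model. This is a swapped-in illustration, not the method used in this post, and every parameter is hypothetical: a population of 10 lacs, a contact rate `beta` of 0.30 without distancing and 0.15 under a lockdown, a recovery rate `gamma` of 0.1 per day, and a made-up `CAPACITY` of 1 lac patients.

```python
def sir_peak(beta, gamma=0.1, population=1_000_000, infected0=100, days=365):
    """Simple daily-step SIR model; returns (peak number infected, day of the peak)."""
    s, i, r = population - infected0, float(infected0), 0.0
    peak, peak_day = i, 0
    for day in range(1, days + 1):
        new_infections = beta * s * i / population  # contacts between susceptible and infected
        new_recoveries = gamma * i                    # illness lasts about 1/gamma = 10 days
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if i > peak:
            peak, peak_day = i, day
    return peak, peak_day

CAPACITY = 100_000  # hypothetical: patients the health system can handle at one time

for label, beta in [("no distancing", 0.30), ("lockdown", 0.15)]:
    peak, day = sir_peak(beta)
    verdict = "overwhelmed" if peak > CAPACITY else "within capacity"
    print(f"{label}: peak of {peak:,.0f} infected on day {day} ({verdict})")
```

Halving the contact rate both lowers the peak and pushes it later, which is exactly the "flatter curve" that keeps the Y axis under the health system's capacity line.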



So let us examine the last curve. Based on simple statistical principles, this number shows a steep rise and the trendline would not flatten at all. But that is a clear misjudgement. We have to consider known lags and imponderables. The raw data does not take into account the influence of the lockdown, or the fact that outcomes may lag behind the occurrence of fresh cases by 20 to 35 days. A confirmed infection may not yet show the symptoms of Covid-19 and may take some time to incubate and attack; after symptoms appear, there is a recovery period of approx. 15 to 20 days before an outcome (fingers crossed, hoping that most of the outcomes will be recoveries!). Today we are adding close to 2000 fresh cases per day, whereas outcomes are around 550 per day. As time passes, and if the lockdown keeps fresh cases from rising much beyond 2000 per day, outcomes should surely rise beyond 1000 per day. The curve would, therefore, start flattening. While a purely statistical trend may show upwards of 1.5 lac cases on the 100th day, I would stick my neck out and project the number to be below one lac. Since the lockdown has given us time to regroup and organize, I would think the health infrastructure should be good enough for these one lac patients; the imponderables (the untested cases) are immaterial here, as only tested patients would need hospitalization. Keep watching and hoping for the best!
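The lag argument above can be sketched in a few lines of Python. Active cases grow by fresh cases minus outcomes, and outcomes only catch up with fresh cases after the lag. This sketch assumes, as a simplification, that every case reaches an outcome exactly 20 days after confirmation (the post gives 20 to 35 days), and the case series is hypothetical: fresh cases rising by 50 a day until a lockdown holds them at 2000 a day.

```python
def active_cases(daily_new, lag=20):
    """Active cases each day, assuming every case reaches an outcome exactly `lag` days later."""
    active, history = 0, []
    for day, new in enumerate(daily_new):
        outcomes = daily_new[day - lag] if day >= lag else 0  # recoveries + deaths today
        active += new - outcomes
        history.append(active)
    return history

# Hypothetical series, NOT real data: fresh cases rise by 50 a day, then hold at 2000 a day.
daily_new = [min(50 * day, 2000) for day in range(1, 101)]
active = active_cases(daily_new, lag=20)

for day in (30, 50, 70, 90):
    change = active[day - 1] - active[day - 2]
    print(f"day {day}: {active[day - 1]:,} active, change {change:+,}")
```

Once daily outcomes catch up with daily fresh cases, the active count stops rising even though confirmed cases keep piling up, which is the flattening the post anticipates. With a spread of lags instead of a single one, the same thing happens, only more gradually.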

Let us now talk of the fatality, mortality or death rate. The WOUM professors and their moronic students, and the mainstream media, belt out mortality rates that vary from less than 1 to 20 percent; you are none the wiser, and you get confused by the various jargon used so casually. Most people, especially those above 60 like me, want an informed answer to this question: “If I get infected, what is the probability that I will die?” Period. Not multi-coloured charts and graphs and inscrutable numbers.

There are many important terms in use, like “case fatality rate”, “crude mortality rate” and “infection fatality rate”, but unfortunately these worthies bundle them all together as the mortality or fatality rate.

All we want to know about is the risk of dying. The “case fatality rate”, or CFR, is the one commonly used in the media as the mortality rate. The CFR has the number of people who have died in the numerator and the total number of confirmed cases in the denominator. So if today approx. 700 people have died in India and the number of confirmed cases is 27000, the CFR would be about 2.6%. Now this appears rather high, doesn’t it? Don’t worry. It relies on the number of confirmed cases, whereas the actual number of cases is likely to be much higher. At the same time, it counts only the deaths so far and leaves out the large number who are sick now and may die soon. In my opinion, this so-called mortality rate is meaningless to someone trying to determine the risk factor in the present.
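The CFR arithmetic, as a minimal Python sketch using the approximate figures quoted above (the function name is just for illustration):

```python
def case_fatality_rate(deaths, confirmed_cases):
    """CFR: reported deaths divided by confirmed (tested-positive) cases."""
    return deaths / confirmed_cases

cfr = case_fatality_rate(deaths=700, confirmed_cases=27_000)
print(f"CFR = {cfr:.1%}")  # prints "CFR = 2.6%"
```

Both inputs are biased in opposite directions: the denominator misses untested infections (pushing the true rate down), while the numerator misses patients who are still sick and may yet die (pushing it up).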

Then we have the “crude mortality rate”, or CMR, another simple indicator. It also measures the probability that any individual would die from the disease, but it is calculated by dividing the number of deaths from the disease by the total population. So if today approx. 700 people have died in India and our population is 12500 lacs, the CMR would be an insignificantly small number (about 0.56 deaths per million people). This number may have some significance once the disease is eradicated (say, the Spanish Flu of 1918), but with the numbers mounting every day, it gives no answer for assessing the risk of death in a fast-changing situation like Covid-19.
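The same 700 deaths divided by the whole population give the CMR. A minimal Python sketch, converting the population from lacs (1 lac = 100,000); the names are illustrative only:

```python
LAKH = 100_000  # Indian numbering: 1 lac (lakh) = 100,000

def crude_mortality_rate(deaths, population):
    """CMR: deaths from the disease divided by the whole population."""
    return deaths / population

cmr = crude_mortality_rate(deaths=700, population=12_500 * LAKH)
print(f"CMR = {cmr:.6%}")                               # prints "CMR = 0.000056%"
print(f"= {cmr * 1_000_000:.2f} deaths per million people")  # prints "= 0.56 deaths per million people"
```

Because the denominator counts everyone, infected or not, this figure says almost nothing about the risk to someone who has actually caught the virus.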

The answer to this is best captured by what is called the “infection fatality rate”, or IFR. The IFR is the number of deaths from a disease divided by the actual number of cases. Now, the number of deaths today is 700, and we can inflate it to include those among the infected who may die in the next 15 days or so. Using the trendline of the curve of daily deaths, we can estimate the number of actual deaths at around 1600. Further assuming that deaths in India are under-reported only to the extent of approx. 25%, we can put the number of deaths at approx. 2000; this is the numerator. The actual number of COVID-19 cases, however, is unknown, and the number of confirmed cases falls woefully short of it. Competent researchers are putting out various data projecting very high multipliers, but let us not get lost in that. Assuming the actual cases to be only about 5 to 10 times the number of confirmed cases, say 7.5 times, we can fix the total number of cases at approx. 2 lacs (27000 × 7.5 ≈ 202500). The IFR then would be about 1%.
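The IFR estimate above chains a few assumptions together, so it helps to see how the result moves as they change. A minimal Python sketch: the ~1600 projected deaths from the trendline are taken as a given input, the factor 1.25 encodes the assumed 25% under-reporting, and the case multiplier is tried across the 5 to 10 range mentioned above. The function and parameter names are hypothetical.

```python
def infection_fatality_rate(projected_deaths, underreporting_factor,
                            confirmed_cases, case_multiplier):
    """IFR sketch: estimated true deaths divided by estimated true infections."""
    deaths = projected_deaths * underreporting_factor  # 1600 * 1.25 = 2000
    infections = confirmed_cases * case_multiplier     # e.g. 27000 * 7.5 = 202500
    return deaths / infections

for multiplier in (5, 7.5, 10):
    ifr = infection_fatality_rate(1600, 1.25, 27_000, multiplier)
    print(f"{multiplier}x confirmed cases: IFR = {ifr:.1%}")
# prints 1.5%, 1.0% and 0.7% for multipliers 5, 7.5 and 10
```

The headline "about 1%" therefore rests heavily on the multiplier: anywhere in the assumed 5 to 10 range gives roughly 0.7% to 1.5%, and the higher multipliers some studies suggest would push it lower still.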

I would like to believe this number. So even if I contract the ruddy Corona virus, I would have to be the one unlucky person among a hundred to die. Take solace from that, buddies, but take all precautions and stay at home. In any case, deal with life without a thought about death; remember Firaq:

Maut ka bhi ilaaj ho shayad,
Zindagi ka koi ilaaj nahin

(There may well be a cure for death but for this life, there is no remedy)

Adios, till I round up enough motivation to compare our data with that of the USA and Italy, or to dissect the studies which claim that our problems will be over by such and such a date.


____

Comments

  1. Sir
    You are considering the IFR as the dependable parameter. I second you, because the number of asymptomatic patients is too high to calculate the CMR and CFR correctly. A few patients who are found positive do not present with the classical symptoms; they may present with viral enteritis and skip the specific COVID tests. Moreover, the sensitivity and specificity of the tests should be kept in mind while doing tiring, lengthy and complicated calculations. Your post is very positive.

  2. Thanks... let us see. My IFR assumes the actual number of cases to be in the range of 5 to 10 times the confirmed cases. Studies show that it is actually much higher. Hoping against hope...

  3. Mani, the profession of guessing is hazardous. Intelligent guessing is no different, just better sounding. Remember, we had tried to guess the expected figure for March 25th around March 15th, when it was still in the hundreds. A month down the line, we have crossed 30,000 cases and 1000 deaths. I personally think it cannot be a statistical game. It is a medical crisis. Finally, its size will depend on how effectively we deal with it. The infrastructure in districts in the interior, whether or not it reaches the poor (with poor body reserves, even if with immunity) in these districts, and access to doctors will decide the final size of the epidemic in India. For me, these are still early days, and I would just pray and keep my fingers crossed.

    So far the best, short and sweet article I have read on COVID is by Karan Thapar a few days back. Please see it at https://www.hindustantimes.com/columns/covid-19-decoding-the-low-death-rate-in-india-opinion/story-KWY96OvN2aJvly1Vo4NMsN.html
