Corona Virus and the labyrinth of statistics!
As we sit at home, relatively secure, but largely idle, we all
have many questions. When will things stabilize so the lockdown is lifted? What
is the probability of my contracting the infection, or worse, kicking the
bucket at the scaffold of Covid-19?
A whole lot of statistics are being thrown at us by TV channels and by the faculty, eggheads and scholars of the WhatsApp Open University for the Mentally-Negligible (WOUM). Amid the mirage of colourful graphs and Covid buzzwords, you can see that the anchor, or the imbecile forwarder, has very little understanding of what they parrot: terms like flattening of the curve, mortality rate, fatality rate, recovery rate, incremental rate and so on.
Even doctors don't understand statistics that much. It is not their job. For example, the same information, that the number of Covid-19 cases in India would reach 20000 in a month, was being put out last month in different ways, mostly without any reference to statistical significance: near-exponential growth, very low rate of growth, erratic growth and so on. One presenter showed a simple X-Y graph claiming that the spread in India was slower. True. But another showed the same comparative data with the Y axis on a logarithmic scale and said that we were not that much better off. Quite obviously, these graphs and numbers are not painting the full picture correctly.
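Why do the two graphs tell different stories? On a linear scale a small outbreak always looks flat next to a big one, while on a log scale any constant growth rate is a straight line and only the pace of growth shows. The sketch below uses purely hypothetical numbers (1000 cases growing to 20000 over 30 days, and two made-up outbreaks of different sizes) to compute an average daily growth rate and doubling time, and to show that two outbreaks doubling every week climb by identical steps on a log scale. The helper names are illustrative, not from any source.

```python
import math


def daily_growth_rate(start_cases, end_cases, days):
    """Average daily growth rate implied by exponential growth between two counts."""
    return (end_cases / start_cases) ** (1 / days) - 1


def doubling_time(start_cases, end_cases, days):
    """Days for the count to double at that same exponential pace."""
    return days * math.log(2) / math.log(end_cases / start_cases)


# Hypothetical illustration: 1000 cases growing to 20000 over 30 days.
rate = daily_growth_rate(1000, 20000, 30)
dbl = doubling_time(1000, 20000, 30)
print(f"daily growth ~{rate:.1%}, doubling every ~{dbl:.1f} days")

# On a log scale, equal ratios are equal vertical steps: two outbreaks that both
# double every week climb by the same log10 step, however different their sizes.
small = [100 * 2 ** week for week in range(5)]
large = [10000 * 2 ** week for week in range(5)]
small_steps = [round(math.log10(b / a), 3) for a, b in zip(small, small[1:])]
large_steps = [round(math.log10(b / a), 3) for a, b in zip(large, large[1:])]
print(small_steps == large_steps, small_steps[0])
```

On a linear plot the large series would dwarf the small one; on a log plot they run parallel, which is why a presenter using the log scale could say we were "not that much better off".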
Are you confused? I am. “Facts are stubborn things, but statistics are pliable”, said Mark Twain. I am not alleging that Corona statistics are being manipulated. No. What I am saying is that the statistics being put out are based on factual data from reliable sites, but their processing and presentation are being done in a manner that is either too esoteric or simply stupid. A devilish Stalin could say, “A single death is a tragedy; a million deaths is a statistic”. But for us, the lesser mortals in India, a better appreciation of the numbers would help us endure, as the news from all over the world is increasingly mournful and depressing.
So I decided to decipher at least some of these mystery terms. Let me pick out a couple of aspects from all this drivel and gobbledygook: those which I think could affect us directly, and which, once understood better, may make us more relaxed or more circumspect.
Disclaimers:
There are significant imponderables in any statistical study, and the Covid-19 data has many. All of them can have a very significant effect on the analysis, but in this study I have made a simple analysis, assuming the data from a reliable source, https://ourworldindata.org/coronavirus, to be accurate. An analysis that accounts for these imponderables would be a complex exercise beyond the scope of my limited study. To mention a few:
· No one really knows the total number of people infected with COVID-19. The number of confirmed cases depends on the extent of testing, as it reflects only the infection status of those who were actually tested. Ideally, the analysis should collate the case data with testing data as well and inflate the number using a statistically-derived multiplier. For example, a study in LA claims that the actual number of infected cases was in lacs and not thousands, as the testing was simply inadequate. By the same logic, a large number of cases in India are probably going undetected, and the number of infected cases may actually be much larger. At the same time, I would assume that Indians have higher immunity against the disease than people in the West, and that a large percentage actually got infected and recovered without being counted.
· One would think that, by and large, the number of deaths reported would be accurate, as a death is not easily hidden. There may, however, be many deaths which were caused by Covid-19 but were not attributed to it due to lack of testing. The gap between reported Covid-19 deaths and the actual total can be significant in India, particularly in rural areas, where many deaths take place at home and may be misreported.
· There are some variations in the data put out by various government agencies. It is difficult to judge which is more accurate.
· India is a large country with region-specific peculiarities and biases which are not captured in the data; the analysis assumes that all regions in India behave as one.
· Mortality among the elderly is decidedly higher, but the data I have used has no age-wise breakdown, so the result may wrongly give more hope to the aged.
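The first point above mentions inflating confirmed cases with a statistically-derived multiplier. One common way to derive such a multiplier, assumed here purely for illustration, is an antibody (serology) survey: the share of a random sample that tests positive, scaled to the whole population, compared with the confirmed count. The helper name and all figures below are hypothetical.

```python
def undercount_multiplier(seroprevalence, population, confirmed_cases):
    """Hypothetical helper: estimated true infections per confirmed case.

    seroprevalence: fraction of a random sample with antibodies (0 to 1).
    """
    estimated_infections = seroprevalence * population
    return estimated_infections / confirmed_cases


# Purely hypothetical figures: 4% of a random sample from a region of
# 1,000,000 people test positive for antibodies, while only 1000 cases
# have been confirmed there.
multiplier = undercount_multiplier(0.04, 1_000_000, 1000)
print(f"true infections ~{multiplier:.0f} times the confirmed count")
```

A real estimate would also have to correct for the antibody test's sensitivity and specificity and for any bias in how the sample was drawn.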
OK. So what is this flattening of the curve, bunged at us every day, ad nauseam? Flattening of the curve is talked about with respect to the adequacy of the health infrastructure to cope with Covid-19 cases and their requirements over time. So we obviously have time on the X axis. Whatever we have on the Y axis should not peak so high that the system is overwhelmed and collapses under the sheer weight of the number of infected cases.
So let us examine the last curve. Based on simple statistical principles, this number shows a steep rise and the trendline would not flatten at all. But that is a clear misjudgement. We have to consider known lags and imponderables. The raw data does not take into account the influence of the lockdown, or the fact that outcomes (recoveries or deaths) may lag behind the occurrence of fresh cases by 20 to 35 days. A confirmed infection may not yet show symptoms of Covid-19 and may take some time to incubate and attack, and after symptoms appear it takes approx. 15 to 20 days to reach an outcome (fingers crossed, hoping that most of the outcomes will be recoveries!). Today we are adding close to 2000 fresh cases per day, whereas outcomes are around 550 per day. As time passes, if the lockdown keeps fresh cases from going much beyond 2000 per day, outcomes should surely rise beyond 1000 per day. The curve would, therefore, start flattening. While a pure statistical trend may show upwards of 1.5 lac cases on the 100th day, I would stick my neck out and project the number to be below one lac. Since the lockdown has given us time to regroup and organize, I would think that the health infrastructure should be good enough for these one lac patients, given that the uncounted cases are immaterial here, as only the tested patients would need hospitalization. Keep watching and hoping for the best!
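The lag argument can be made concrete with a toy model. The sketch below assumes, purely for illustration, that fresh cases ramp up to 2000 per day and then hold steady under lockdown, and that every case reaches an outcome exactly 25 days after confirmation (a single value picked from the 20 to 35 day range above). Under those assumptions active cases keep rising only until outcomes catch up with fresh cases, and then the curve flattens. The series and the `active_cases` helper are hypothetical, not fitted to Indian data.

```python
def active_cases(fresh, lag):
    """Running count of active cases, assuming every case reaches an outcome
    (recovery or death) exactly `lag` days after it is confirmed."""
    active, current = [], 0
    for day, new in enumerate(fresh):
        # Outcomes today are the cases that were confirmed `lag` days ago.
        outcomes = fresh[day - lag] if day >= lag else 0
        current += new - outcomes
        active.append(current)
    return active


# Hypothetical series: fresh cases ramp up to 2000/day over 30 days, then hold
# steady because of the lockdown; outcomes lag by an assumed 25 days.
fresh = [round(100 + 1900 * d / 29) for d in range(30)] + [2000] * 60
active = active_cases(fresh, lag=25)
daily_change = [b - a for a, b in zip(active, active[1:])]
print(daily_change[35], daily_change[50], daily_change[-1])
```

The daily increase in active cases shrinks toward zero once outcomes catch up, and active cases level off at 25 days' worth of fresh cases. A real outcome lag is spread over a range of days rather than fixed, which smooths the bend, but the flattening logic is the same.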
Let us now talk of fatality, mortality or death rate. The WOUM professors and moronish students, and the mainstream media, belt out mortality rates which vary from less than 1 to 20 percent; you are none the wiser and get confused by all the jargon used so casually. Most people, especially those above 60 like me, are looking for an informed answer to this question: “If I get infected, what is the probability that I would die?” Period. Not multi-coloured charts and graphs and inscrutable numbers.
There are many important terms in use, like “case fatality rate”, “crude mortality rate” and “infection fatality rate”, but unfortunately these worthies lump them all together as mortality or fatality rate.
All we want to know about is the risk of dying. The “case fatality rate”, or CFR, is the one commonly used in the media as the mortality rate. The CFR has the number of people who have died in the numerator and the total number of confirmed cases in the denominator. So if, today, approx. 700 people have died in India and the number of confirmed cases is 27000, the CFR would be about 2.6%. Now this appears to be rather high, doesn't it? Don't worry. It relies on the number of confirmed cases, whereas the actual number is likely to be much higher. At the same time, it counts only the deaths so far, and leaves out the large number who are sick and may die soon. In my opinion, this so-called mortality rate is meaningless information to someone trying to determine the risk factor in the present.
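The CFR arithmetic with the figures above is a one-line division. The sketch also shows, with a hypothetical 5x undercount of infections (an assumption, not data), how sharply the same deaths shrink as a rate once the denominator grows; strictly speaking that second number is no longer a CFR.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """CFR: deaths so far divided by confirmed cases so far."""
    return deaths / confirmed_cases


cfr = case_fatality_rate(700, 27000)
print(f"CFR = {cfr:.1%}")

# Same deaths over a hypothetical true case count 5 times the confirmed one.
rate_if_5x = case_fatality_rate(700, 27000 * 5)
print(f"deaths per estimated infection = {rate_if_5x:.1%}")
```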
Then we have the “crude mortality rate”, or CMR, another simple indicator. It measures the probability that any individual in the population would die from the disease, and is calculated by dividing the number of deaths from the disease by the total population. So if, today, approx. 700 people have died in India and our population is 12500 lacs, the CMR would be an insignificantly small number, well under one death per million. This number may have some significance once the epidemic is over (say, the Spanish Flu of 1918), but with the numbers mounting every day, it does not give any answer in respect of assessing the risk of death in a fast-changing situation like Covid-19.
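To see just how small the CMR is with these figures, here is a minimal sketch. The only conversion needed is lacs to a plain count (1 lac = 100,000), so 12500 lacs is 1.25 billion; the function name is illustrative.

```python
def crude_mortality_rate(deaths, population):
    """CMR: deaths from the disease divided by the whole population."""
    return deaths / population


POPULATION = 12500 * 100_000  # 12500 lacs, i.e. 1.25 billion (1 lac = 100,000)
cmr = crude_mortality_rate(700, POPULATION)
print(f"CMR = {cmr:.1e}, i.e. {cmr * 1_000_000:.2f} deaths per million people")
```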
The answer to this is best captured by what is called the “infection fatality rate”, or IFR. The IFR is the number of deaths from a disease divided by the actual number of cases. Now, the number of deaths today is 700, and we can inflate it to include those among the infected who may die in the next 15 days or so. Using the trendline of the curve of daily deaths, we can estimate the number of actual deaths at around 1600. Further assuming that deaths in India are under-reported by only approx. 25%, we can take the number of deaths to be approx. 2000 (1600 × 1.25); this is the numerator. The number of confirmed cases, however, falls woefully short of the total number of COVID-19 cases, which is unknown. Competent researchers are putting out various data which project very high multipliers, but let us not get lost in that. Assuming the actual cases to be only about 5 to 10 times the number of confirmed cases, say 7.5 times, we can fix the total number of cases at approx. 202500 (27000 × 7.5). The IFR then would be about 1% (2000 / 202500).
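The IFR estimate above chains three assumptions together, so a small sketch helps to see how sensitive the answer is. It takes the post's figures as given (1600 projected deaths from the trendline, 25% under-reporting, 27000 confirmed cases) and runs the 5x, 7.5x and 10x multipliers mentioned in the text. The function name is hypothetical, and the 1600 is an input, not something the code computes.

```python
def estimated_ifr(projected_deaths, underreporting, confirmed_cases, case_multiplier):
    """IFR estimate under the post's assumptions.

    projected_deaths: deaths including those expected among current patients
    underreporting: fraction by which deaths are assumed to be under-counted
    case_multiplier: assumed ratio of true infections to confirmed cases
    """
    deaths = projected_deaths * (1 + underreporting)
    infections = confirmed_cases * case_multiplier
    return deaths / infections


# Figures from the text: 1600 projected deaths, 25% under-reporting,
# 27000 confirmed cases, true infections assumed 5 to 10 times that.
for multiplier in (5, 7.5, 10):
    ifr = estimated_ifr(1600, 0.25, 27000, multiplier)
    print(f"{multiplier:>4}x confirmed cases -> IFR {ifr:.2%}")
```

The spread, from roughly 0.7% to 1.5%, shows how much the answer hinges on the assumed multiplier; if the true multiplier is higher, as some studies suggest, the IFR falls further.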
I would like to believe this number. So even if I contract the ruddy Corona virus, I would have to be the one unlucky person among a hundred to die. Take solace from that, buddies, but take all precautions and stay at home. In any case, deal with life without a thought about death, and remember Firaq:
Maut ka bhi ilaaj ho shayad,
Zindagi ka koi ilaaj nahin
(There may well be a cure for death, but for this life there is no remedy)
Adios, till I round up enough motivation to compare our data with that of the USA and Italy, or to dissect the studies which claim that our problems would be over by such and such a date.
____



 
 
 
Sir,
You are considering the IFR as the dependable parameter. I second you, because the number of asymptomatic patients is too high to calculate the CMR and CFR correctly. A few patients who are found positive do not present with the classical symptoms. They may present with viral enteritis and skip the specific COVID tests. Moreover, the sensitivity and specificity of the tests should be kept in mind while doing tiring, lengthy and complicated calculations. Your post is very positive.
Thanks... let us see. My IFR is based on undetected cases being in the range of 5 to 10 times the confirmed cases. Studies show that the multiplier is actually much higher. Hoping against hope....
Mani, the profession of guessing is hazardous. Intelligent guessing is no different, just better sounding. Remember we had tried to guess the expected figure for March 25th around March 15th, when it was still in the hundreds. A month down, we have crossed 30,000 cases and 1000 deaths. I personally think it cannot be a statistical game. It is a medical crisis. Finally, its size will depend on our efficacy in dealing with it. The infrastructure in districts in the interiors, whether or not it reaches the poor (with poor body reserves, even if with immunity) in these districts, and access to doctors will decide the final size of the epidemic in India. For me, these are still early days, and I would just pray and keep my fingers crossed.
So far the best, short and sweet article I have read on COVID is by Karan Thapar a few days back. Please see it at https://www.hindustantimes.com/columns/covid-19-decoding-the-low-death-rate-in-india-opinion/story-KWY96OvN2aJvly1Vo4NMsN.html
Your words ring so true today...