The year 2021 is almost over, which means that our new closest friend - COVID-19 - has already been with us for roughly two years.

One year ago, our situation was quite different - we (at least many of us) were keeping our fingers crossed for the final stages of vaccine development. We believed that vaccines were the only way of pacifying the pandemic. At that point, we were not quite sure how this would happen:

  • will the forthcoming vaccines reduce the infection rate (make the pandemic spread more slowly)?
  • or maybe they will alleviate the symptoms and course of the disease (reduce the mortality rate and utilization of patient beds in specialized hospitals)?

Fast forward 12 months

A lot has happened since then. Vaccines are available on the market (in practice, not just in theory), and vaccination rates have reached 50%+ in many countries, but they've stopped rising - surprisingly many skeptics have no plans to get vaccinated (for many reasons, some better articulated than others).

Nevertheless, the last twelve months have provided us with a LOT of data. And I can't shake the impression that we don't use that data properly - at least not in global, everyday communication. We do report various metrics (cases, deaths, tests), but DO NOT correlate them. That sounds like a waste (no direct cause-and-effect analysis) and ... is a bit worrisome (is there anything to ... hide?).

Flat metrics, nearly useless info; src:

What do I mean by "correlation"? Well, it's simple - two new cases are not necessarily alike, because:

  • the virus has new variants (Delta, Omicron, etc.) - with presumably different characteristics (severity, infection rate, mortality, resistance to vaccines)
  • each case means a person who was either vaccinated or not (prior to the infection) ...
  • ... and if (s)he was vaccinated, it was vaccine X, Y, or Z (one could be more effective than the other)
  • not to mention that a vaccinated person may have taken 1, 2, or 3 doses
  • and lives in a particular geo, where a certain %-tage of the overall population is vaccinated

If we knew more about each case (vaccination status, severity), we could take a scientific approach to topics such as:

  1. What is the general effectiveness of vaccination (how does it impact the numbers of infections? hospitalizations? fatalities?) in the population of a given location XYZ?
  2. Are there substantial, practical differences between (the different) vaccines?
  3. What's the actual (proven "in the trenches") impact of additional (vaccine) doses applied?
  4. Is the vaccination equally effective against different virus variants (Delta, Omicron, etc.)? What are the quantified (e.g., as probability %-tages) differences?

Where's the data?

That is not rocket science; all we need is some more metadata for each reported case: the severity of the case (registered, hospitalized, required the use of a respirator, fatality), whether the person was vaccinated (which vaccine, how many doses), and which virus variant was detected. This information should be available w/o additional fuss. There's no need to reveal any PII; aggregation at the geo level (voivodship/state) sounds good enough.
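To make the idea concrete, here is a minimal sketch of what one such aggregated, PII-free row could look like, plus a tiny helper for slicing the counts. Everything in it is my assumption for illustration: the `CaseGroup` type, its field names, the `tally` helper, the choice to record only the worst outcome reached per case, and the numbers (which are made up). No real dataset is known to be published in this shape.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: each case is counted once, under the worst outcome it reached.
SEVERITIES = ("registered", "hospitalized", "respirator", "fatality")


@dataclass(frozen=True)
class CaseGroup:
    """One pre-aggregated row: no PII, just a count per combination of attributes."""
    region: str    # voivodship/state
    variant: str   # e.g. "Delta", "Omicron"
    vaccine: str   # "none" for unvaccinated people
    doses: int     # 0 for unvaccinated people
    severity: str  # one of SEVERITIES
    count: int     # number of cases in this group


def tally(rows, *fields):
    """Sum case counts grouped by the given CaseGroup field names."""
    totals = Counter()
    for row in rows:
        key = tuple(getattr(row, field) for field in fields)
        totals[key] += row.count
    return totals


# Made-up numbers, for illustration only.
rows = [
    CaseGroup("Region A", "Delta", "none", 0, "registered", 120),
    CaseGroup("Region A", "Delta", "none", 0, "hospitalized", 30),
    CaseGroup("Region A", "Delta", "X", 2, "registered", 80),
    CaseGroup("Region A", "Delta", "X", 2, "hospitalized", 5),
]

by_vaccine_and_severity = tally(rows, "vaccine", "severity")
print(by_vaccine_and_severity)
```

The point of keeping all attributes on the same row is that any split (by variant, by vaccine, by number of doses) becomes a one-line `tally` call, which is exactly what separately aggregated datasets cannot offer.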

However ... I've looked for it, and there was no single source (raw or processed) with such information available freely. Maybe I was looking in all the wrong places. Perhaps I've given up too quickly (some important public datasets are surprisingly poorly documented). If you're aware of any credible sources of such information, please don't hesitate to let me know!

Some sources I've checked:

These are all very rich, impressive sets, but they do not correlate the data! All you can get is separate stats on cases, vaccinations, deaths, but sadly there's no joint info :( (and no way to join it, as all sets are already aggregated).

Good starting point

What would I do with such data? Nothing really fancy. The most basic metrics/visualizations I'd start with are:

  1. The number of hospitalizations/respirators utilized/fatalities per N of unvaccinated/vaccinated people - does vaccination help reduce the severity of COVID-19 (once you get infected)?
  2. Drill-down (of the above), but split across different COVID-19 variants (mutations - like Delta or Omicron) - to compare severity/mortality across variants
  3. Drill-down (of the above), but split across different vaccines (Pfizer, Moderna, etc.) - to compare actual, field-level effectiveness of vaccines
  4. How does the %-tage of local society vaccinated impact the number of all cases in the vicinity (correlation of vaccination %-tage with the number of new daily cases) - to find out if and when we reach so-called "herd immunity"
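As a rough sketch of the first and the last metric from the list above: a severity rate per N infected people for each vaccination status, and a plain Pearson correlation between the vaccinated %-tage and new daily cases across regions. The helper names (`rate_per`, `pearson`) are mine, the numbers are invented, and a correlation coefficient alone obviously proves no causation, so treat this as a starting point rather than an analysis.

```python
from math import sqrt


def rate_per(events, population, per=1_000):
    """Events per `per` people; returns None when the population is empty."""
    if population <= 0:
        return None
    return events / population * per


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up numbers, for illustration only.
# Metric 1: hospitalizations per 1,000 infected people, by vaccination status.
infected = {"unvaccinated": 150, "vaccinated": 85}
hospitalized = {"unvaccinated": 30, "vaccinated": 5}

hospitalization_rate = {
    status: rate_per(hospitalized[status], infected[status])
    for status in infected
}
print(hospitalization_rate)

# Metric 4: vaccinated %-tage vs. new daily cases per 100k, one pair per region.
vaccinated_pct = [35.0, 50.0, 65.0, 80.0]
daily_cases_per_100k = [90.0, 70.0, 45.0, 30.0]
r = pearson(vaccinated_pct, daily_cases_per_100k)
print(round(r, 3))
```

The drill-downs (metrics 2 and 3) would be the same `rate_per` calculation, repeated per variant or per vaccine instead of per vaccination status.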

We honestly need some hard data to validate/debunk facts, factoids, and total myths like:

  • do the vaccinations reduce COVID-19 severity down to the level of flu?
  • do restrictions for vaccinated people make an actual difference (save lives, unclog hospitals) only for unvaccinated people? (as only those would suffer)
  • should the vaccinated people worry about new mutations popping up? if so - about which ones?

IMHO the answers to these questions are absolutely essential if we're to shape an effective strategy for this stage of the pandemic. I don't know about you, but I don't see a clear plan (at the level of any country, let alone globally) for the future: what are we betting on? what are we aiming for? It seems that we just hope the pandemic will eventually disappear by itself, or that people will simply get used to the situation.

Realistically, that is not going to happen. And hope seems to be an embarrassingly poor strategy.