The beauty of (open) data will save global public health

By guest contributor Francesco Branda

How many of us can remember our friends’ phone numbers? How many can keep in mind all the passwords for each of our devices? If we cannot remember everything, it is not just for lack of training but because we live in a highly data-driven world. Every day we are immersed in information in the form of data, and we ourselves are turned into information when we browse online to buy something, express our preferences, or search for our next vacation destination. The world has changed since it became possible to translate it into bits and transfer it onto hardware.

Data has become one of the most powerful tools for fighting epidemics

The first line of defense against infectious disease outbreaks is data. In the mid-nineteenth century, Londoners were able to track cholera trends almost in real time, thanks largely to the work of one man: a physician and statistician named William Farr. For most of the Victorian era, Farr oversaw the collection of public health statistics in England and Wales. It could be said, without exaggeration, that the news environment around us today is the one William Farr invented: a world in which the numbers that track the spread of a virus (How many intubations today? What is the growth rate of hospitalizations?) have become the most important data stream available. Farr was among the first to think systematically about how data on epidemics, including their distribution in space and time, could be used to minimize the burden of current epidemics and prevent future ones. He was a pioneer not just in collecting data but also in devising ingenious new ways of representing it. In particular, he took raw data and made it meaningful: discovering interesting trends in the numbers, comparing health outcomes across population subgroups, and inventing new forms of visualization.

With the coronavirus disease (COVID-19) pandemic, we found ourselves in a situation not very different from that of the Victorians, despite the great gap in scientific, technological, and medical expertise that separates us from them. Without a vaccine or widely available treatments, our main protection was what Farr began to build almost two centuries ago: collecting and analyzing data to study the characteristics and spread of the virus, monitor societies’ behavioral responses, help formulate public policy, and more. What was unique about the response to COVID-19 is that we were able to leverage a tool that was far less developed in past pandemics: the “quiet open data revolution”. One use of open data has been to combine data from different sources into interactive maps that provide daily updates on the number of cases in each country and around the world. For example, researchers at Johns Hopkins University synthesized publicly available data from across the world into a COVID-19 data dashboard showing national and global trends in cases, deaths, tests, hospitalizations, and vaccinations. The Our World in Data project has been widely recognized for collecting, organizing, and presenting data on the COVID-19 pandemic through a range of visualizations, interactive charts, and informative articles covering case numbers, testing rates, vaccination progress, and policy responses in different countries. These examples illustrate how non-governmental actors, such as academics, journalists, and the civic tech community, have effectively used open data to show the pandemic’s impact, communicate risks to the public, and enable rapid analysis by researchers.

Lessons from COVID-19: Data challenges and opportunities

The coronavirus pandemic has revealed crucial gaps in how we collect data during an emerging epidemic. Many public agencies lack technical skills, data infrastructure, efficient mechanisms for sharing and integrating information, and the means to release data effectively in open, machine-readable formats. Most public health data during outbreaks are still managed with pen and paper or published as PDFs containing figures and analyses that do not give users access to the underlying data, which prevents them from examining the data, epidemiological models, and behavioral predictions used for decision-making. COVID-19 can be an opportunity to renew attention to the quality, timeliness, and completeness of government-produced health data by strengthening existing information management practices. For example, building on experience from past outbreaks, the Open COVID-19 Data Working Group adopted a crowdsourcing methodology. It assembled an international team of volunteers who manually curate data sources and organize the information in a standardized format. This format represents epidemiological data at the level of individual cases, which makes it possible to extract remarkably detailed information on case demographics, travel histories, and high-resolution geographical distribution. An especially significant aspect of this approach is the continuous, real-time screening of various information sources, which enhances its value and establishes it as a crucial tool for disease surveillance. To ensure its effectiveness, the initiative places strong emphasis on sharing data through Google Sheets and the software development platform GitHub.
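To make the idea of a standardized, case-level format more concrete, here is a minimal sketch in Python of what one record in such a line list might look like. The field names, the set of geographic resolution levels, and the validation rules are illustrative assumptions, not the Working Group's actual schema. The point is that a fixed structure, with explicit fields for demographics, travel history, and how precisely a location is known, lets volunteers' entries be checked automatically and exported in an open, machine-readable form such as CSV.

```python
import csv
import io
from dataclasses import asdict, dataclass, field, fields
from datetime import date
from typing import List, Optional

# Hypothetical geographic resolution levels, from most to least precise.
GEO_RESOLUTIONS = ("point", "admin2", "admin1", "admin0")


@dataclass
class CaseRecord:
    """One row of a hypothetical individual-level line list."""
    case_id: str
    age: Optional[int]              # None when the age was not reported
    sex: Optional[str]
    country: str
    admin1: Optional[str]           # first-level subdivision, e.g. region or province
    geo_resolution: str             # level at which the location is actually known
    date_confirmation: str          # ISO 8601 date, e.g. "2020-03-01"
    travel_history_location: List[str] = field(default_factory=list)
    source: str = ""                # URL or citation of the curated source

    def __post_init__(self) -> None:
        # Reject values a volunteer curator could easily mistype.
        if self.geo_resolution not in GEO_RESOLUTIONS:
            raise ValueError(f"unknown geo_resolution: {self.geo_resolution!r}")
        if self.age is not None and not 0 <= self.age <= 120:
            raise ValueError(f"implausible age: {self.age}")
        date.fromisoformat(self.date_confirmation)  # raises ValueError if malformed


def to_csv(records: List[CaseRecord]) -> str:
    """Serialize records to CSV, a simple open and machine-readable format."""
    buf = io.StringIO()
    names = [f.name for f in fields(CaseRecord)]
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for r in records:
        row = asdict(r)
        # Flatten the list field into one semicolon-separated cell.
        row["travel_history_location"] = ";".join(r.travel_history_location)
        writer.writerow(row)  # None values are written as empty cells
    return buf.getvalue()


if __name__ == "__main__":
    record = CaseRecord("1", 34, "female", "Italy", "Lombardia", "admin1",
                        "2020-02-21", travel_history_location=["China"])
    print(to_csv([record]))
```

Recording the geographic resolution as its own field, rather than implying it from which location columns happen to be filled, lets analysts filter for records precise enough for a given spatial analysis.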

Several challenges related to data ingestion and curation still stand in the way of successful open data initiatives, and addressing them is crucial to ensure that open data can be leveraged during future public health emergencies.

First and foremost, it is necessary to guarantee data quality, timeliness, completeness, and availability by providing sufficient context and metadata (information about the data, such as a codebook, data collection methods, coverage, limitations, and so forth) so that end users interpret the data correctly. Moreover, better principles for data use and data sharing are essential, including guidelines that prevent the reinforcement of existing biases or discrimination against specific populations based on factors such as gender, age, or location. For example, cases have been under-reported, particularly where testing was limited. Reporting has also been biased: in the US, racial and ethnic minorities were less likely to be tested, and in Italy, testing rates were lower among undocumented persons from Africa and the Middle East. For countries such as South Korea and Singapore that disclosed granular data, additional challenges have been privacy concerns and increased social stigma, which can in turn discourage community members from getting tested.

Another priority is developing techniques and rules for data de-identification. Granular case data carry a significant risk of reidentification, which has prompted concerns from academic researchers and human rights activists about privacy and civil liberties. There have been several examples of such data harming individuals. In Mexico, Uber suspended the accounts of two drivers who had unknowingly given a ride to an infected patient, as well as the accounts of their recent passengers. In South Korea, the identification of cases associated with the Shincheonji Church of Jesus resulted in discriminatory treatment of its members. Similarly, when an outbreak was traced back to a ‘super spreader’ who had visited an LGBTQ nightclub, it led to stigmatization of the LGBTQ community.
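One widely used de-identification criterion is k-anonymity: every combination of quasi-identifiers (attributes such as age, sex, and district that are not names but could be linked with other data to single someone out) must be shared by at least k records. The Python sketch below assumes that records are plain dictionaries with hypothetical field names and uses toy data only. It checks a dataset against that rule and shows how generalizing one attribute, exact age into ten-year bands, can bring the data into compliance. k-anonymity is only one technique among several and does not by itself rule out reidentification.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A case record is assumed to be a plain dict, e.g. {"age": 34, "sex": "F", "district": "A"}.
Record = Dict[str, object]


def equivalence_classes(records: List[Record], quasi_ids: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_ids) for r in records)


def is_k_anonymous(records: List[Record], quasi_ids: Sequence[str], k: int) -> bool:
    """True if every combination of quasi-identifiers is shared by at least k records."""
    return all(n >= k for n in equivalence_classes(records, quasi_ids).values())


def risky_groups(records: List[Record], quasi_ids: Sequence[str], k: int) -> List[Tuple[object, ...]]:
    """Return the combinations shared by fewer than k records (the easiest to reidentify)."""
    classes = equivalence_classes(records, quasi_ids)
    return [combo for combo, n in classes.items() if n < k]


def generalize_age(records: List[Record], width: int = 10) -> List[Record]:
    """Replace exact ages with bands such as '30-39', a simple generalization step."""
    out = []
    for r in records:
        low = (int(r["age"]) // width) * width
        out.append({**r, "age": f"{low}-{low + width - 1}"})
    return out


if __name__ == "__main__":
    # Toy records with hypothetical fields; no real patient data.
    cases = [
        {"age": 31, "sex": "M", "district": "A"},
        {"age": 34, "sex": "M", "district": "A"},
        {"age": 38, "sex": "M", "district": "A"},
        {"age": 52, "sex": "F", "district": "B"},
        {"age": 55, "sex": "F", "district": "B"},
        {"age": 57, "sex": "F", "district": "B"},
    ]
    qi = ["age", "sex", "district"]
    print(is_k_anonymous(cases, qi, k=3))                  # False: every exact age is unique
    print(is_k_anonymous(generalize_age(cases), qi, k=3))  # True: two bands of three records each
```

Generalization trades precision for privacy: coarser bands protect individuals better but leave less detail for analysis, so the choice of k and of band width is a policy decision as much as a technical one.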

Finally, realizing the full potential of open data requires developing an open data ecosystem that actively involves diverse data users in harnessing data for innovative purposes. As the COVID-19 examples show, journalists, academics, and software developers have been able to use open data in creative ways to generate value. However, simply publishing data on the web is not enough; proactive policies are essential to foster a collaborative, interdependent network of actors with diverse roles and functions. Future pandemics are inevitable, but we can reduce the risk: true open-mindedness can help users overcome existing geographic, organizational, and social barriers to accessing information and enable greater accountability and democratization of public health.

PLOS Global Public Health addresses deeply entrenched global inequities in public health and makes impactful research visible and accessible to health professionals, policy-makers, and local communities without barriers. Submit your original research today!

About the author:

Francesco Branda is a data scientist investigating questions related to the spatial spread of infectious diseases. His work spans a range of topics, including infectious disease modeling, social network analysis, and decision making under uncertainty. He earned a Ph.D. in Information and Communication Technologies (ICT) at the Department of Computer Science, Modeling, Electronics and Systems Engineering (DIMES), University of Calabria, Rende, Italy. For more information see: He can be found on Twitter at @fbranda94

Disclaimer: Views expressed by contributors are solely those of individual contributors, and not necessarily those of PLOS.
