Data: Friend or Foe?
Does anyone actually know what big data is? It has become as ubiquitous as kale; you can’t go a block without seeing an advertisement for the leafy vegetable. All of a sudden, it’s in every smoothie and salad. Everyone says it’s healthy, so we go along with it. Big data is like kale: it’s supposed to be great for you, but has anyone stopped to think about where it comes from or how it works? And is it actually working?
Big data is a broad term for extremely large data sets that can be analyzed to reveal patterns or trends. Its use has been shaped by the increasing digitization of our every interaction, and it is closely associated with machine learning, algorithms, and data science. Together, these tools provide quantifiable insight into aspects of human behavior that were once considered immeasurable.
How does this work in public health?
Public health’s goal is to achieve a healthier population. Like other fields, public health has become reliant on models and data science to advance this mission. These models and other tools draw on millions of existing data points to make predictions. However, we need to acknowledge that the underlying data sets are often biased and occasionally racist, reflecting our inequitable society. This issue became apparent when a study found that a widely used algorithm for determining which patients would benefit from extra care had a large racial bias against black people. ((Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453. doi: 10.1126/science.aax2342)) The algorithm cut the number of black patients flagged for extra care by half, because the data it draws on reflects more than 400 years of systemic racism. Without intense scrutiny, the bias in the algorithm might never have been discovered, and countless black patients would have continued to suffer the consequences.
Bias in clinical medicine and public health is often brought to light through examination of the electronic medical record (EMR). As the repository for most individuals’ health information, the EMR is a treasure trove of data. Analysis and proper use of this data could lead to breakthroughs in clinical medicine. Unfortunately, that promise has not yet been realized. A recent study reported on the creation of a clinical decision support tool that draws on a large data set to track disease patterns by race.((Glicksberg, B. S., et al. (2016). Comparative analyses of population-scale phenomic data in electronic medical records reveal race-specific disease networks. Bioinformatics (Oxford, England), 32(12), i101–i110. https://doi.org/10.1093/bioinformatics/btw282)) By these metrics, Hispanic and Latino patients appeared to have less recorded disease than patients identified as European American or African American. However, this data set does not paint an accurate picture of disease burden, because it does not account for structural inequities such as access to health care, language barriers, and other socioeconomic factors.((Gianfrancesco, M. A., Tamang, S., Yazdany, J., & Schmajuk, G. (2018). Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA Internal Medicine, 178(11), 1544–1547. https://doi.org/10.1001/jamainternmed.2018.3763)) As a result, if this algorithm or machine learning tool were used to inform decisions in a clinical setting, it could misclassify Hispanic and Latino individuals as healthier than they really are and divert scarce resources away from them. These leaps of logic can have severe ramifications in clinical practice, as people who need medical care are deemed “not sick” by an algorithm. Such results could deepen bias against already marginalized populations, further perpetuating structural inequities.
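To make the measurement problem concrete, here is a minimal sketch using entirely hypothetical numbers. It assumes two groups with the same true disease prevalence, but one group faces more barriers to care, so fewer of its cases are diagnosed and recorded in the EMR. The group names, prevalence, and recording probabilities are illustrative assumptions, not figures from the studies cited above.

```python
# Hypothetical illustration only: all figures are invented to show the
# mechanism, not drawn from any cited study.


def recorded_prevalence(true_prevalence: float, p_recorded: float) -> float:
    """Share of a group that appears 'sick' in EMR data.

    A case only shows up if the person is seen, diagnosed, and coded,
    which we model (as a simplifying assumption) with one probability.
    """
    return true_prevalence * p_recorded


def prioritized_group(true_prevalence: dict[str, float],
                      p_recorded: dict[str, float]) -> str:
    """Group a naive tool would rank as sickest, using recorded data only."""
    return max(p_recorded,
               key=lambda g: recorded_prevalence(true_prevalence[g], p_recorded[g]))


if __name__ == "__main__":
    # Same true disease burden in both groups (assumed).
    true_prev = {"group_a": 0.20, "group_b": 0.20}
    # Assumed chance that a true case is recorded; group_b has less access to care.
    p_rec = {"group_a": 0.90, "group_b": 0.60}

    for g in true_prev:
        obs = recorded_prevalence(true_prev[g], p_rec[g])
        print(f"{g}: true {true_prev[g]:.0%}, recorded {obs:.0%}")
    # The tool treats group_a as sicker even though the true burden is identical.
    print("prioritized:", prioritized_group(true_prev, p_rec))
```

Running it prints a recorded prevalence of 18% for group_a and 12% for group_b, so a tool that ranks by recorded disease prioritizes group_a. A single recording probability is a deliberate simplification; in practice, under-ascertainment varies by condition, care setting, and coding practice.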
What do we do?
Big data provides a clear path toward ending the inequities that have plagued our country since its inception. However, if we want to rely on data, and I believe we should, we need more transparency about how it may be reinforcing racism in our society. We need federal policy that clarifies how big data is used in our day-to-day lives and provides oversight, so that we can use big data to promote a just society rather than reinforce existing biases and the health disparities that result.
Photo by Helloquence on Unsplash
Is it Feasible?
Some people in the U.S. might argue that transparent algorithms are impossible and that no government could impose such prescriptive regulations on companies. I would direct their attention to the European Union, which recently enacted a data protection law that allows individuals to obtain an explanation of an algorithmic decision and to challenge that decision.((Art. 22, General Data Protection Regulation, https://gdpr-info.eu/art-22-gdpr/)) Going further, leaders in the United Kingdom have called for the entire public sector to be transparent about the role of big data, directly citing the “social implications of the data and algorithms used.”((Data Ethics Framework. UK Department for Digital, Culture, Media and Sport (2018, August 30). Retrieved from https://www.gov.uk/government/publications/data-ethics-framework/data-ethics-framework)) These steps and calls to action are the beginning of a visionary approach to examining and understanding big data more closely, an approach the United States can use as a model.
Looking Ahead
Data has the potential to do so much good. In health, startups are beginning to use data analytics to reduce prescription errors, cutting costs by some $20 billion a year and saving 7,000 lives a year that would otherwise be lost to adverse drug reactions. Dignity Health, a hospital provider in California, has been using a new tool to detect sepsis earlier in critically ill patients and has seen a 5% reduction in mortality thanks to earlier intervention.[1][2][3] These are immense victories, and big data is playing a pivotal role. But to wield these tools effectively, data science must not blindly perpetuate existing biases and the health disparities that result from them. We need federal policies that provide regulation and transparency about the role of big data in health.
- Regulating Data in Public Health - May 4, 2020
- Healthcare Big Data and the Promise of Value-Based Care. (2018). NEJM Catalyst. Retrieved from https://catalyst.nejm.org/doi/full/10.1056/CAT.18.0290
- Using Analytics to Prevent Deadly Infections. Retrieved from https://www.sas.com/en_us/insights/articles/analytics/using-analytics-to-prevent-sepsis.html
- MedAware. Retrieved from https://www.medaware.com/