The continued protests over the last two months have served as a wake-up call to examine various forms of systemic racism across the U.S. Data scientists and analysts are in a unique position to understand and communicate how racial inequity and bias show up across business, government and policing, and to identify opportunities to use data for public good.
"Data can be used to surface and point to inequities in our society, particularly when we look at data disaggregated by race in America," said Amanda Makulec, senior data visualization lead at Excella, an Agile consultancy. "While anecdotes and stories connect with us as people and bring a face to issues of racial justice and equality, seeing data on disparate outcomes for different races -- as we've seen in data showing the disproportionate impact of COVID-19 on Black Americans -- reinforces that these are systemic issues."
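To make "disaggregated" concrete: an overall rate can hide large differences between groups that only appear when the same outcome is broken out per group. The following is a minimal sketch in plain Python, assuming made-up records with the hypothetical labels "Group A" and "Group B"; the values are illustrative only and are not real COVID-19 or census figures.

```python
from collections import defaultdict

# Hypothetical toy records of (group, outcome) pairs. These values are
# invented for illustration and do not come from any real data set.
records = [
    ("Group A", True), ("Group A", False), ("Group A", False), ("Group A", False),
    ("Group B", True), ("Group B", True), ("Group B", False), ("Group B", False),
]

def rates_by_group(rows):
    """Return the share of rows with a positive outcome for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in rows:
        totals[group] += 1
        if outcome:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}

# The single aggregate rate treats everyone as one population...
overall = sum(1 for _, outcome in records if outcome) / len(records)
# ...while the disaggregated rates show how outcomes differ by group.
by_group = rates_by_group(records)

print(f"overall: {overall:.2f}")
for group, rate in sorted(by_group.items()):
    print(f"{group}: {rate:.2f}")
```

In this toy data, the overall figure gives no hint that Group B's rate is twice Group A's; only the per-group breakdown surfaces that gap.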
Using data for public good can involve raising awareness and acting as a tool for advocacy. Makulec said advocates use data in the form of analyses of large surveys about Americans' lived experiences. But others are raising awareness using small data related to specific events. Mona Chalabi, a data journalist from the U.K., has long created data visualizations of social justice issues for the public through her hand-drawn illustrations.
At the same time, data scientists are in a position to speak up when they see data misused. When initial reports of a disproportionate impact of COVID-19 on the Black community emerged, data scientists were able to understand and communicate through data how the correlation was a product of systemic racism.
"Data has frequently been weaponized against Black Americans and other marginalized communities, either in the ways algorithms are trained or in how data is collected and the stories told," Makulec said.
Get involved in your community
Makulec recommends data scientists seek out organizations already involved in anti-racism work, whether at a community level or nationally. This can include identifying Black data analysts whose voices and work you can amplify.
The Algorithmic Justice League is working to raise awareness about the impacts of AI and to encourage researchers, policymakers and community members to mitigate the harms and biases caused by AI. Black in AI fosters partnerships and collaborations for initiatives to increase the presence of Black individuals in AI research. Other efforts, like DataKind, bring together teams of people for focused projects on a variety of humanitarian causes.
Data scientists don't need to build algorithms to help. Organizations such as Black Girls Code, which teaches young women of color how to code and do data science, need volunteers to teach data science skills as they expand programming.
Data scientists can help outreach programs by building analytics tools to help ensure the programs are succeeding. Julie Kae, executive director of Qlik.org, a corporate responsibility program operated by analytics software vendor Qlik, said the company partnered with New Leaders and the Harlem Educational Activities Fund to build analytics tools to measure and track the quality of each organization's programming.
Understand the issues
There are a variety of nuances related to racial injustice and bias. Data scientists interested in using data for public good should take the time to learn how bias shows up in data.
We All Count was specifically designed to provide tools to data scientists who want to embed equity into their work, said Heather Krause, CEO of Datassist, which provides data science storytelling for nonprofits and data journalists. Data for Black Lives digs into the nuances and complexity of making sense of data on racial justice.
Data scientists can play a fundamental role in vetting data sources and ensuring data integrity. The old adage of "garbage in, garbage out" is especially relevant to social justice organizations working with data.
"All it takes is one mistake for more reactionary corners of the public to pounce, forcing organizations to divert resources away from other efforts [that] can be advancing the cause of equality," said Charles Caldwell, vice president of product management at Logi Analytics. "Data scientists are absolutely critical to minimizing these scenarios, bringing the discipline to dispassionately assess data sources and ensuring integrity."
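One practical form this discipline takes is running basic integrity checks before any analysis is published. The following is a minimal sketch in plain Python, assuming a hypothetical schema in which every record should carry "id", "date" and "source" fields; real data sets will need their own checks, such as valid date ranges or consistent category codes.

```python
from collections import Counter

# Hypothetical schema: each record is a dict that should carry these fields.
REQUIRED_FIELDS = ("id", "date", "source")

def is_blank(value):
    """Treat None and empty strings as missing values."""
    return value is None or value == ""

def integrity_report(records):
    """Summarize basic data-quality problems before any analysis is run."""
    missing = Counter()
    for rec in records:
        for field in REQUIRED_FIELDS:
            if is_blank(rec.get(field)):
                missing[field] += 1
    # Count how often each non-blank ID appears, then keep the repeats.
    id_counts = Counter(rec["id"] for rec in records if not is_blank(rec.get("id")))
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    return {"rows": len(records), "missing": dict(missing), "duplicate_ids": duplicates}

# Hypothetical sample with one blank date, one missing source and a repeated ID.
sample = [
    {"id": "1", "date": "2020-06-01", "source": "agency"},
    {"id": "2", "date": "", "source": "news report"},
    {"id": "2", "date": "2020-06-03", "source": None},
]
print(integrity_report(sample))
```

A report like this does not decide whether a record is wrong; it flags rows a person should check against the original source before the data is used.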
Find something you care about and get involved
There are tons of public data sets available, and often no one has had the skills or the interest to combine them.
"A lot of data science comes down to data cleaning and collection, and we know how to do that," said Alicia Frame, lead product manager of data science at Neo4j, a graph database vendor.
There isn't a strong public data source on police violence, but comprehensive, crowdsourced efforts such as MappingPoliceViolence.org aim to assemble reliable, verifiable data. Data scientists can identify the gaps in data collected by government agencies and use their skills in natural language processing, survey design, sampling or data aggregation to uncover what is missing and correct it.
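Finding those gaps often starts with cross-referencing two sources and listing the incidents that appear in one but not the other. The following is a minimal sketch in plain Python, assuming hypothetical records that share only a date and a city name; the field names and the matching rule are illustrative assumptions. Real record linkage usually needs fuzzier matching (name variants, nearby dates, geocoded locations) than the exact key used here.

```python
def normalize(record):
    """Build a simple matching key: the date plus a trimmed, lowercased city name."""
    return (record["date"], record["city"].strip().lower())

def find_gaps(official, crowdsourced):
    """Return crowdsourced records that have no matching key in the official data."""
    official_keys = {normalize(r) for r in official}
    return [r for r in crowdsourced if normalize(r) not in official_keys]

# Hypothetical records; the place names and dates are invented.
official = [{"date": "2020-05-25", "city": "Springfield"}]
crowdsourced = [
    {"date": "2020-05-25", "city": " springfield "},  # same incident, messier formatting
    {"date": "2020-06-02", "city": "Shelbyville"},    # not in the official data
]
print(find_gaps(official, crowdsourced))
```

Normalizing before comparing keeps formatting differences from being mistaken for missing incidents, so the records that remain are better candidates for genuine gaps.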
Projects like Solve for Good can connect data scientists with organizations that need them. But Frame believes you can make the most impact by working with local organizations that don't have splashy websites.
"Find something you care about and reach out since smaller organizations are often strapped for resources and glad to have your skills," Frame said.
Where to find public data sets
Makulec said that resources like Data.world and Kaggle have terabytes of open data linked through their catalogs. For a more curated experience, she recommends subscribing to Data is Plural, a weekly newsletter highlighting interesting open data sets.
Tableau has announced that it is investing $10 million in a racial data justice initiative that will curate open data for promoting equity. Caldwell said that Awesome-public-datasets is a great open source list that can help any organization focused on social justice in its own efforts to use data for public good. Frame recommends the American Community Survey for finding reliable demographic data relevant to social justice questions.
Tarak Shah, data scientist at the Human Rights Data Analysis Group, a nonprofit organization that analyzes global human rights violations, recommends checking out the Invisible Institute's Citizens Police Data Project, which provides a public database of complaints against the Chicago Police Department. In addition to the data, the processing code for that database is on GitHub.
But data scientists need to tread carefully when using this data for public good.
"Unlike the publicly available data for physical sciences, the publicly available data for social sciences are often flawed or contain biases that are of no fault of the data but rather the nature of how the data is collected," said Leah Butler, a data specialist at TOP Agency, a global marketing agency network. "It is one thing to know these biases and account for them when drawing conclusions, but I don't think that general data scientists will be familiar with these biases."