2020 census data collection being affected by COVID-19

The COVID-19 crisis could hurt the 2020 census by making it impossible for the Census Bureau to collect data by going door-to-door.

With the COVID-19 crisis making it all but impossible for people to go door-to-door to collect data for the 2020 census, the United States Census Bureau is more reliant than ever on the internet for collecting vital information.

Census data is used to determine everything from the makeup of the House of Representatives to the apportionment of federal funds for social programs, local public schools and fire departments. It's also used in the private sector to help organizations make data-driven decisions.

And without the ability to send people into neighborhoods to knock on doors and count those who may not have responded to telephone or mail inquiries, and who may not have internet access, parts of the population are at risk of going uncounted. That could weaken their representation in Congress and result in lower federal funding for states, cities and towns.

At the same time, however, reliance on the internet could also improve the accuracy of the data in the 2020 census by making it easier for people with internet access to respond.

Trifacta, founded in 2012 and based in San Francisco, is an analytics vendor specializing in data wrangling software that helps self-service analysts find and prepare the right data for their analytics needs.

Tom Beck is Trifacta's federal account executive. He recently discussed the effect COVID-19 will have on the 2020 census data, who might be most vulnerable to going uncounted without door-to-door data collection, and how the Census Bureau can still try to find and count people who don't respond to the survey so the data reflects the U.S. population as accurately as possible.

Due to the COVID-19 crisis, how is 2020 census data being collected differently than in the past?

Tom Beck: One big difference is that because of COVID-19, the Census Bureau will not be able to deploy employees to do face-to-face interviews and house-to-house calls. They have three mechanisms that they use now: telephone responses to the survey, written responses to the survey and, the most popular one this year, web-based responses to the survey. In the past, they would typically supplement those with folks out in the field doing face-to-face, door-to-door surveys to make sure they cover as many people as possible.

What are the risks in terms of the accuracy and quality of the 2020 census data that might arise because the census cannot be conducted door-to-door for those who don't respond via other methods?

Beck: One risk would be that they're relying heavily on the data from the website. That's going to be the big difference. They're pushing as many folks to the website as possible. A second risk associated with that would be that with COVID-19 coming along so fast and right on top of when they were starting the survey, they may not have had a chance to test all of their systems to be able to ingest all the data, collect all the data, sort all the data and do it at volume and at scale. There are a couple of risks there -- not being able to test it, and not having done this before without human supplemental teams.

What are the potential consequences of the combination of no door-to-door data collection and any technological issues the website may face?

Beck: In my opinion, it all boils down to data-driven decision-making. That's essentially what the census is all about. The main objective is to count the U.S. population and to apportion the House of Representatives, but as we know, the data that comes from the census every 10 years is used in many other ways and touches many facets of our country (political, socioeconomic, both public sector and private sector), so the impact is very high. That, to me, is what it boils down to: all the data-driven decisions that come from the data being collected through this census.

Are there any subsets of the population in particular who are at greater risk of not being included in the 2020 census data due to the lack of door-to-door interviews?

Beck: Certainly rural areas that don't have access to the internet. Those are the obvious ones at risk, and people there are going to have to use either the telephone response or the written response, or maybe go to a public facility that has internet access.

What about the elderly and people in poorer communities who may not have internet access in their homes -- are they going to be able to be found and counted?

Beck: Elderly folks may or may not have access to the internet, and then a lot of them are in nursing facilities that are locked down because of COVID-19, so that's another potential risk area.

Are there ways to account for people who may not respond, perhaps by going back to a past census and extrapolating from it? Or if a large portion of a given population doesn't respond, are they just not going to get some of the funding they may have received in the past?

Beck: I believe they would extrapolate. They also do ongoing data collection in between the decennial years -- they do ongoing surveys and recalibration of their data each year. I think they probably have some sort of a backup plan for the rural areas and the elderly and other areas you mentioned.

You're looking at data quality at that point, so that's where a strong data quality posture comes into play for an organization like the Census Bureau. They're going to need to be able to take all of these disparate data sources from past and present data collection and blend them together to be able to do their analysis, and then of course all of their consumers that need their data will need to do the same.

So if large swaths of the country go uncounted in the 2020 census, there is a way to rectify that in 2021 without waiting until 2030, is that correct?

Beck: Yes, but in the early aftermath of this census, the data that they're going to be using to make decisions will be affected, and that matters for folks who can't wait for any updates.

Who might someone like that be?

Beck: Going back to the idea of data-driven decision-making, you're collecting many data points and then doing analytics to find patterns, anomalies or clusters. Folks are doing this not only in government but also in the private sector, so those who might not be able to wait would be private sector businesses that need to take the census data and blend it with other data sources for some kind of downstream analytic application. If the data isn't good, then the outcome isn't going to be good. Another example is what's happening right now with COVID-19. Folks are analyzing how to reopen their businesses, and I could see that process continuing not only through the end of this year but also into 2021 when the census data comes out. So I think census data is going to be important for COVID-19 response, not just this year but on an ongoing basis.

While the absence of door-to-door data collection could have a negative impact on 2020 census data, could the reliance on the internet have a positive impact?

Beck: I'm hopeful that with the heavy reliance on the internet they'll actually have better data collection. Hopefully, people will find it more convenient to respond via the web-based collection method. I know that my family did that. We wanted to be responsive; we wanted to do our part as good citizens. There are probably a lot of other folks outside the risk areas we talked about earlier, like rural areas or the elderly, who normally have plenty of access to the internet but for whatever reason didn't get around to responding. I'm hoping that this time there's going to be a better response from that part of the population because of the heavy emphasis on the internet.

I'm hopeful that we're going to get better data from those non-risk areas, which will lead to better overall data quality.

Editor's note: This Q&A has been edited for clarity and conciseness.