What is public data?
Public data is information that can be shared, used, reused and redistributed without restriction. It encompasses a range of formats and sizes such as data sets and statistics, as well as both processed structured data and raw unstructured data. Public data is typically kept and accessed on corporate or government websites, and is also held by businesses and other data providers.
There are many reasons to share data publicly. These include protecting the public, as with criminal records; providing transparency, as with government entities that serve the general populace; and advancing new technologies, as with artificial intelligence (AI) and machine learning (ML).
Ideally, industries can use public data that's relevant to their needs for purposes such as better targeting customers. For example, in the tech sector, if relevant public data is easily accessible, enterprises can use it to train AI and ML models to analyze information and glean insights.
Examples of public data providers and repositories
Providers of public data sets and statistics include both government-affiliated and nongovernment sources. In the U.S., the Freedom of Information Act gives the public the right to request many types of records from federal agencies, including environmental information, while state public records laws generally govern access to data such as real estate and driving records. Some providers or repositories of public data include the following:
- Data.gov. This online database allows federal, state and city government agencies in the U.S. to transparently present data to the public. Its catalog is pertinent to many vertical markets. As one example, a data set on registered electric vehicles in the state of Washington is useful to the automotive industry.
- HealthData.gov. This website catalogs healthcare-specific data sets. Public data from the U.S. Department of Health and Human Services, for example, can be found here.
- World Bank. Thousands of global development data sets are publicly accessible in the website's catalog.
- U.S. Bureau of Labor Statistics. Economic data from the U.S. federal government found here covers a range of categories, from employment and unemployment to inflation and workplace injuries.
- Kaggle. This community is geared toward the tech industry -- and data scientists in particular -- without any government affiliation. It aggregates a massive number of data sets and makes them public for the purpose of advancing the field of data science.
How public data is different from open data
The terms public data and open data are often used interchangeably. However, open data is more accessible than public data in general. Only a small percentage of all public data in existence is considered open data.
Open data is typically prepared and presented in structured formats and available to anyone on government websites. For example, the World Bank's website touts its data sets as open data that's preformatted, structured and lacking restrictions. Meanwhile, public data encompasses both open data and data that's unstructured -- or public yet less accessible.
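To illustrate why structured formats make open data easier to use, here is a minimal Python sketch that parses a small CSV sample and counts rows by one column. The sample rows, the column names and the count_by_column helper are all made up for illustration; they are not taken from any real data set or provider.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for a downloaded open data file.
# The column names are hypothetical, not from any real catalog.
SAMPLE_CSV = """county,make,model_year
King,Tesla,2022
King,Nissan,2019
Pierce,Tesla,2021
"""


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


counts = count_by_column(SAMPLE_CSV, "county")
print(counts)
```

Because every row follows the same layout, a few lines of code are enough to summarize it. Unstructured public data, such as scanned documents or free text, would usually need extraction and cleanup before a step like this is possible.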
How public data is different from private data
Private data is information, or whole data sets, made available only to designated individuals. Private data often contains information about people or businesses that would be too sensitive to share openly or downright detrimental in the wrong hands.
Private data about individuals can include medical information, financial and bank records, Social Security numbers, and other forms of government identification. For businesses, private data regarding customers or employees can be shared only with specific individuals.
In certain cases, aspects of individuals' private data can be made public as long as personally identifiable information remains private. For example, transcripts of phone calls and text messages can be available to government entities, especially if they pertain to government business. These calls can be anonymized, and their metadata can be used in public data sets if necessary.
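As a rough sketch of what anonymizing call metadata might look like, the Python example below drops fields that directly identify a person and replaces phone numbers with a keyed hash. The field names, the sample record and the key are hypothetical. Note that keyed hashing is strictly pseudonymization rather than full anonymization, so a real release would likely need further safeguards.

```python
import hashlib
import hmac

# Hypothetical field names; real call records will differ.
PII_FIELDS = {"caller_name", "transcript"}
PSEUDONYMIZED_FIELDS = {"caller_number", "recipient_number"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a truncated keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with PII dropped and identifiers hashed."""
    cleaned = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            continue  # drop direct identifiers and free text entirely
        if field in PSEUDONYMIZED_FIELDS:
            cleaned[field] = pseudonymize(value, secret_key)
        else:
            cleaned[field] = value  # keep non-identifying metadata as-is
    return cleaned


# Made-up example record, for illustration only.
record = {
    "caller_name": "Jane Doe",
    "caller_number": "555-0100",
    "recipient_number": "555-0199",
    "duration_seconds": 182,
    "transcript": "(omitted)",
}
key = b"replace-with-a-secret-key"
print(anonymize_record(record, key))
```

Using a keyed hash instead of a plain hash means the same number always maps to the same token, so patterns in the data survive, while anyone without the key cannot simply hash a list of known phone numbers to reverse the mapping.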
Data privacy has become a prominent concern, and laws are being implemented to protect private data and govern its use.