Data and data management
Terms related to data, including definitions about data warehousing and words and phrases about data management.
- 3 V's (volume, velocity and variety) - The 3 V's (volume, velocity and variety) are three defining properties or dimensions of big data.
- 5V's of big data - The 5 V's of big data -- velocity, volume, value, variety and veracity -- are the five main and innate characteristics of big data.
- 99.999 (Five nines or Five 9s) - In computers, 99.999 (five nines or five 9s) refers to a desired availability percentage for a system or service -- uptime of 99.999%, which works out to roughly five minutes of downtime per year.
- address space - Address space is the amount of memory allocated for all possible addresses for a computational entity -- for example, a device, a file, a server or a networked computer.
- Allscripts - Allscripts is a vendor of electronic health record systems for physician practices, hospitals and healthcare systems.
- alternate data stream (ADS) - An alternate data stream (ADS) is a feature of Windows New Technology File System (NTFS) that contains metadata for locating a specific file by author or title.
- Anaplan - Anaplan is a web-based enterprise platform for business planning.
- Apache Solr - Apache Solr is an open source search platform built upon a Java library called Lucene.
- Apple User Enrollment - Apple User Enrollment (UE) is a form of mobile device management (MDM) for Apple products that supports iOS 13 and macOS Catalina.
- atomic data - In a data warehouse, atomic data is the lowest level of detail.
- Azure Data Studio (formerly SQL Operations Studio) - Azure Data Studio is a Microsoft tool, originally named SQL Operations Studio, for managing SQL Server databases and cloud-based Azure SQL Database and Azure SQL Data Warehouse systems.
- big data - Big data is a combination of structured, semi-structured and unstructured data that organizations collect, analyze and mine for information and insights.
- big data as a service (BDaaS) - Big data as a service (BDaaS) is the delivery of data platforms and tools by a cloud provider to help organizations process, manage and analyze large data sets so they can generate insights to improve business operations and gain a competitive advantage.
- big data management - Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.
- big data storage - Big data storage is a compute-and-storage architecture that collects and manages large data sets and enables real-time data analytics.
- block diagram - A block diagram is a visual representation of a system that uses simple, labeled blocks that represent single or multiple items, entities or concepts, connected by lines to show relationships between them.
- blockchain storage - Blockchain storage is a way of saving data in a decentralized network, which utilizes the unused hard disk space of users across the world to store files.
- brontobyte - A brontobyte is an unofficial measure of memory or data storage that is equal to 10 to the 27th power (10^27) bytes.
- business continuity - Business continuity is an organization's ability to maintain critical business functions during and after a disaster has occurred.
- capacity management - Capacity management is the broad term describing a variety of IT monitoring, administration and planning actions that ensure that a computing infrastructure has adequate resources to handle current data processing requirements, as well as the capacity to accommodate future loads.
- chatbot - A chatbot is a software or computer program that simulates human conversation or "chatter" through text or voice interactions.
- CICS (Customer Information Control System) - CICS (Customer Information Control System) is middleware that sits between the z/OS IBM mainframe operating system and business applications.
- clickstream data (clickstream analytics) - Clickstream data and clickstream analytics are the processes involved in collecting, analyzing and reporting aggregate data about which pages a website visitor visits -- and in what order.
- clinical data analyst - A clinical data analyst -- also referred to as a "healthcare data analyst" -- is a healthcare information professional who verifies the validity of scientific experiments and data gathered from research.
- clinical decision support system (CDSS) - A clinical decision support system (CDSS) is an application that analyzes data to help healthcare providers make decisions and improve patient care.
- Cloud Data Management Interface (CDMI) - The Cloud Data Management Interface (CDMI) is an international standard that defines a functional interface that applications use to create, retrieve, update and delete data elements from cloud storage.
- cloud SLA (cloud service-level agreement) - A cloud SLA (cloud service-level agreement) is an agreement between a cloud service provider and a customer that ensures a minimum level of service is maintained.
- cloud storage - Cloud storage is a service model in which data is transmitted and stored on remote storage systems, where it is maintained, managed, backed up and made available to users over a network (typically the internet).
- cloud storage API - A cloud storage API is an application programming interface that connects a locally based application to a cloud-based storage system so that a user can send data to it and access and work with data stored in it.
- cloud storage service - A cloud storage service is a business that maintains and manages its customers' data and makes that data accessible over a network, usually the internet.
- cluster quorum disk - A cluster quorum disk is the storage medium on which the configuration database is stored for a cluster computing network.
- cold backup (offline backup) - A cold backup is a backup of an offline database.
- complex event processing (CEP) - Complex event processing (CEP) is the use of technology to predict high-level events.
- compliance as a service (CaaS) - Compliance as a service (CaaS) is a cloud service that specifies how a managed service provider (MSP) helps an organization meet its regulatory compliance mandates.
- conflict-free replicated data type (CRDT) - A conflict-free replicated data type (CRDT) is a data structure that lets multiple people or applications make changes to the same piece of data.
- consumer data - Consumer data is the information that organizations collect from individuals who use internet-connected platforms, including websites, social media networks, mobile apps, text messaging apps or email systems.
- containers (container-based virtualization or containerization) - Containers are a type of software that can virtually package and isolate applications for deployment.
- content personalization - Content personalization is a branding and marketing strategy in which webpages, email and other forms of content are tailored to match the characteristics, preferences or behaviors of individual users.
- Continuity of Care Document (CCD) - A Continuity of Care Document (CCD) is an electronic, patient-specific document detailing a patient's medical history.
- Continuity of Care Record (CCR) - The Continuity of Care Record, or CCR, provides a standardized way to create electronic snapshots about a patient's health information.
- core banking system - A core banking system is the software that banks use to manage their most critical processes, such as customer accounts, transactions and risk management.
- correlation - Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate in relation to each other.
- correlation coefficient - A correlation coefficient is a statistical measure of the degree to which changes to the value of one variable predict change to the value of another.
- CRM (customer relationship management) analytics - CRM (customer relationship management) analytics comprises all of the programming that analyzes data about customers and presents it to an organization to help facilitate and streamline better business decisions.
- cryptographic nonce - A nonce is a random or semi-random number that is generated for a specific use.
- curation - Curation is a field of endeavor involved with assembling, managing and presenting some type of collection.
- customer data integration (CDI) - Customer data integration (CDI) is the process of defining, consolidating and managing customer information across an organization's business units and systems to achieve a "single version of the truth" for customer data.
- data abstraction - Data abstraction is the reduction of a particular body of data to a simplified representation of the whole.
- data anonymization - Data anonymization describes various techniques to remove or block data containing personally identifiable information (PII).
- data availability - Data availability is a term used by computer storage manufacturers and storage service providers to describe how readily data can be accessed at a required level of performance, in situations ranging from normal operations through disasters.
- data breach - A data breach is a cyber attack in which sensitive, confidential or otherwise protected data has been accessed or disclosed in an unauthorized fashion.
- data catalog - A data catalog is a software application that creates an inventory of an organization's data assets to help data professionals and business users find relevant data for analytics uses.
- data center chiller - A data center chiller is a cooling system used in a data center to remove heat from one element and deposit it into another element.
- data center services - Data center services provide the supporting components necessary to the proper operation of a data center.
- data citizen - A data citizen is an employee who relies on data to make decisions and perform job responsibilities.
- data classification - Data classification is the process of organizing data into categories that make it easy to retrieve, sort and store for future use.
- data clean room - A data clean room is a technology service that helps content platforms keep first-party user data private when interacting with advertising providers.
- data cleansing (data cleaning, data scrubbing) - Data cleansing, also referred to as data cleaning or data scrubbing, is the process of fixing incorrect, incomplete, duplicate or otherwise erroneous data in a data set.
- data collection - Data collection is the process of gathering data for use in business decision-making, strategic planning, research and other purposes.
- data curation - Data curation is the process of creating, organizing and maintaining data sets so they can be accessed and used by people looking for information.
- data de-identification - Data de-identification is the decoupling or masking of data to prevent certain data elements from being associated with an individual.
- data destruction - Data destruction is the process of destroying data stored on tapes, hard disks and other forms of electronic media so that it's completely unreadable and can't be accessed or used for unauthorized purposes.
- data dignity - Data dignity, also known as data as labor, is a theory positing that people should be compensated for the data they have created.
- data dredging (data fishing) - Data dredging -- sometimes referred to as data fishing -- is a data mining practice in which large data volumes are analyzed to find any possible relationships between them.
- data engineer - A data engineer is an IT professional whose primary job is to prepare data for analytical or operational uses.
- data feed - A data feed is an ongoing stream of structured data that provides users with updates of current information from one or more sources.
- data governance policy - A data governance policy is a documented set of guidelines for ensuring that an organization's data and information assets are managed consistently and used properly.
- data gravity - Data gravity is the ability of a body of data to attract applications, services and other data.
- data in motion - Data in motion, also referred to as data in transit or data in flight, is a process in which digital information is transported between locations either within or between computer systems.
- data in use - Data in use is data that is currently being updated, processed, accessed and read by a system.
- data integration - Data integration is the process of combining data from multiple source systems to create unified sets of information for both operational and analytical uses.
- data integrity - Data integrity is the assurance that digital information is uncorrupted and can only be accessed or modified by those authorized to do so.
- data lake - A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed for analytics applications.
- data lakehouse - A data lakehouse is a data management architecture that combines the key features and the benefits of a data lake and a data warehouse.
- data lifecycle management (DLM) - Data lifecycle management (DLM) is a policy-based approach to managing the flow of an information system's data throughout its lifecycle: from creation and initial storage to when it becomes obsolete and is deleted.
- data loss - Data loss is the intentional or unintentional destruction of information.
- data management platform (DMP) - A data management platform (DMP), also referred to as a unified data management platform (UDMP), is a centralized system for collecting and analyzing large sets of data originating from disparate sources.
- data marketplace (data market) - A data marketplace, or data market, is an online store where people can buy data.
- data masking - Data masking is a method of creating a structurally similar but inauthentic version of an organization's data that can be used for purposes such as software testing and user training.
- data mesh - Data mesh is a decentralized data management architecture for analytics and data science.
- data migration - Data migration is the process of transferring data between data storage systems, data formats or computer systems.
- data minimization - Data minimization aims to reduce the amount of collected data to only include necessary information for a specific purpose.
- data mining - Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis.
- data modeling - Data modeling is the process of creating a simplified visual diagram of a software system and the data elements it contains, using text and symbols to represent the data and how it flows.
- data observability - Data observability is a process and set of practices that aim to help data teams understand the overall health of the data in their organization's IT systems.
- data pipeline - A data pipeline is a set of network connections and processing steps that moves data from a source system to a target location and transforms it for planned business uses.
- data preprocessing - Data preprocessing, a component of data preparation, describes any type of processing performed on raw data to prepare it for another data processing procedure.
- data processing - Data processing refers to essential operations executed on raw data to transform the information into a useful format or structure that provides valuable insights to a user or organization.
- data profiling - Data profiling refers to the process of examining, analyzing, reviewing and summarizing data sets to gain insight into the quality of data.
- data protection as a service (DPaaS) - Data protection as a service (DPaaS) involves managed services that safeguard an organization's data.
- data protection authorities - Data protection authorities (DPAs) are public authorities responsible for enforcing data protection laws and regulations within a specific jurisdiction.
- data protection management (DPM) - Data protection management (DPM) is the administration, monitoring and management of backup processes to ensure backup tasks run on schedule and data is securely backed up and recoverable.
- data retention policy - In business settings, a data retention policy encompasses all the processes for storing and preserving data, as well as the specific time periods and rules that determine how, and for how long, data should be retained.
- data scientist - A data scientist is an analytics professional who is responsible for collecting, analyzing and interpreting data to help drive decision-making in an organization.
- data source name (DSN) - A data source name (DSN) is a data structure containing information about a specific database to which an Open Database Connectivity (ODBC) driver needs to connect.
- data splitting - Data splitting is the practice of dividing a data set into two or more subsets -- commonly a training set and a test set for building and evaluating machine learning models.
- data stewardship - Data stewardship is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.
- data structure - A data structure is a specialized format for organizing, processing, retrieving and storing data.
- Data Transfer Project (DTP) - Data Transfer Project (DTP) is an open source initiative to facilitate customer-controlled data transfers between two online services.
- data transformation - Data transformation is the process of converting data from one format, such as a database file, XML document or Excel spreadsheet, into another.