Data and data management
Terms related to data, including definitions about data warehousing and words and phrases about data management.

- database management system (DBMS) - A database management system (DBMS) is a software system for creating and managing databases.
- database marketing - Database marketing is a systematic approach to the gathering, consolidation and processing of consumer data.
- database replication - Database replication is the frequent electronic copying of data from a database in one computer or server to a database in another -- so that all users share the same level of information.
- DataOps - DataOps is an Agile approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production.
- Db2 - Db2 is a family of database management system (DBMS) products from IBM that serve a number of different operating system (OS) platforms.
- decision-making process - A decision-making process is a series of steps one or more individuals take to determine the best option or course of action to address a specific problem or situation.
- deep analytics - Deep analytics is the application of sophisticated data processing techniques to yield information from large and typically multi-source data sets composed of both unstructured and semi-structured data.
- digital wallet - In general, a digital wallet is a software application, usually for a smartphone, that serves as an electronic version of a physical wallet.
- dimension - In data warehousing, a dimension is a collection of reference information that supports a measurable event, such as a customer transaction.
- dimension table - In data warehousing, a dimension table is a database table that stores attributes describing the facts in a fact table.
- disambiguation - Disambiguation is the process of determining a word's meaning -- or sense -- within its specific context.
- disaster recovery (DR) - Disaster recovery (DR) is an organization's ability to respond to and recover from an event that negatively affects business operations.
- distributed ledger technology (DLT) - Distributed ledger technology (DLT) is a digital system for recording the transaction of assets in which the transactions and their details are recorded in multiple places at the same time.
- Dublin Core - Dublin Core is an international metadata standard formally known as the Dublin Core Metadata Element Set and includes 15 metadata (data that describes data) terms.
- ebXML (Electronic Business XML) - EbXML (Electronic Business XML or e-business XML) is a project to use the Extensible Markup Language (XML) to standardize the secure exchange of business data.
- Eclipse (Eclipse Foundation) - Eclipse is a free, Java-based development platform known for its plugins that allow developers to develop and test code written in other programming languages.
- edge analytics - Edge analytics is an approach to data collection and analysis in which an automated analytical computation is performed on data at a sensor, network switch or other device instead of waiting for the data to be sent back to a centralized data store.
- empirical analysis - Empirical analysis is an evidence-based approach to the study and interpretation of information.
- encoding and decoding - Encoding and decoding are used in many forms of communications, including computing, data communications, programming, digital electronics and human communications.
- encryption key management - Encryption key management is the practice of generating, organizing, protecting, storing, backing up and distributing encryption keys.
- enterprise search - Enterprise search is a type of software that lets users find data spread across organizations' internal repositories, such as content management systems, knowledge bases and customer relationship management (CRM) systems.
- entity - An entity is a single thing with a distinct separate existence.
- Epic Systems - Epic Systems, also known simply as Epic, is one of the largest providers of health information technology, used primarily by large U.S. hospitals and health systems.
- erasure coding (EC) - Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media.
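The erasure coding idea above can be illustrated with a toy single-parity scheme (RAID-5 style): k data fragments plus one XOR parity fragment, so any one lost fragment can be rebuilt from the survivors. Production EC schemes such as Reed-Solomon tolerate multiple losses; this sketch and its fragment contents are purely illustrative.

```python
# A toy sketch of erasure coding with single XOR parity: losing any one
# fragment is recoverable by XOR-ing the remaining fragments together.
from functools import reduce

def xor_fragments(fragments: list) -> bytes:
    # Byte-wise XOR across equal-length fragments.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*fragments))

data = [b"frag1", b"frag2", b"frag3"]
parity = xor_fragments(data)

# Simulate losing the second fragment and rebuilding it.
rebuilt = xor_fragments([data[0], data[2], parity])
assert rebuilt == data[1]
```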
- exabyte (EB) - An exabyte (EB) is a large unit of computer data storage equal to 10 to the 18th power (1 quintillion) bytes; the closely related binary unit, the exbibyte, is 2 to the 60th power bytes.
- Excel - Excel is a spreadsheet program from Microsoft and a component of its Office product group for business applications.
- exponential function - An exponential function is a mathematical function used to calculate the exponential growth or decay of a given set of data.
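The exponential function entry above follows the familiar form N(t) = N0 * e^(r*t); the starting value and rate below are invented for illustration.

```python
# A minimal sketch of exponential growth and decay, N(t) = N0 * e^(r*t).
import math

def exponential(n0: float, rate: float, t: float) -> float:
    return n0 * math.exp(rate * t)

# 5% growth per period for 10 periods, then decay at the same rate.
grown = exponential(100.0, 0.05, 10)     # positive rate: growth
decayed = exponential(100.0, -0.05, 10)  # negative rate: decay
```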
- extension - An extension typically refers to a file name extension.
- facial recognition - Facial recognition is a category of biometric software that maps an individual's facial features to confirm their identity.
- fact table - In data warehousing, a fact table is a database table in a dimensional model.
- file extension (file format) - In computing, a file extension is a suffix added to the name of a file to indicate the file's layout, in terms of how the data within the file is organized.
- file synchronization (file sync) - File synchronization (file sync) is a method of keeping files that are stored in several different physical locations up to date.
- FIX protocol (Financial Information Exchange protocol) - The Financial Information Exchange (FIX) protocol is an open specification intended to streamline electronic communications in the financial securities industry.
- foreign key - A foreign key is a column or columns of data in one table that refers to the unique data values -- often the primary key data -- in another table.
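The foreign key entry above can be sketched with Python's built-in sqlite3 module: `orders.customer_id` refers to the primary key of `customers`, and a row pointing at a nonexistent customer is rejected. The table and column names are illustrative; note that SQLite enforces foreign keys only when the pragma is enabled.

```python
# A minimal sketch of a foreign key constraint using sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite requires opt-in enforcement
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id))""")
con.execute("INSERT INTO customers VALUES (1, 'Ada')")
con.execute("INSERT INTO orders VALUES (10, 1)")       # valid reference
try:
    con.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99
except sqlite3.IntegrityError:
    pass  # the foreign key constraint rejects the orphaned row
```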
- garbage in, garbage out (GIGO) - Garbage in, garbage out, or GIGO, refers to the idea that in any system, the quality of output is determined by the quality of the input.
- Google BigQuery - Google BigQuery is a cloud-based big data analytics web service for processing very large read-only data sets.
- GPS coordinates - GPS coordinates are a unique identifier of a precise geographic location on the earth, usually expressed as a combination of latitude and longitude.
- gradient descent - Gradient descent is an optimization algorithm that refines a machine learning (ML) model's parameters to create a more accurate model.
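The gradient descent entry above can be sketched on a one-parameter toy problem: minimizing f(w) = (w - 3)^2, whose gradient is 2(w - 3). The starting point, step size and iteration count are arbitrary assumptions.

```python
# A minimal sketch of gradient descent: repeatedly step the parameter
# opposite the gradient until it settles near the minimum at w = 3.
w = 0.0
learning_rate = 0.1
for _ in range(100):
    grad = 2 * (w - 3)        # gradient of f(w) = (w - 3)^2
    w -= learning_rate * grad  # step downhill
# w has converged to (approximately) the minimizer, 3
```

In ML practice the same loop runs over a loss function of many parameters, with the gradient computed from training data.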
- gzip (GNU zip) - Gzip (GNU zip) is a free and open source file compression utility and format based on the DEFLATE algorithm.
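The gzip entry above can be demonstrated with Python's standard gzip module: repetitive data compresses well, and decompression restores the original exactly (the sample payload is made up).

```python
# A minimal sketch of gzip compression: a lossless round trip.
import gzip

original = b"data " * 1000          # highly repetitive, so it shrinks a lot
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)
assert restored == original
assert len(compressed) < len(original)
```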
- Hadoop - Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers.
- Hadoop Distributed File System (HDFS) - The Hadoop Distributed File System (HDFS) is the primary data storage system Hadoop applications use.
- hashing - Hashing is the process of transforming any given key or a string of characters into another value.
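The hashing entry above can be sketched with the standard hashlib library: any input maps to a fixed-length digest, and a tiny change to the input yields a completely different value (the sample strings are illustrative).

```python
# A minimal sketch of hashing: SHA-256 digests of two near-identical inputs.
import hashlib

digest1 = hashlib.sha256(b"data management").hexdigest()
digest2 = hashlib.sha256(b"data management!").hexdigest()
assert len(digest1) == 64   # SHA-256 always yields 64 hex characters
assert digest1 != digest2   # one extra byte changes the whole digest
```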
- heartbeat (computing) - In computing, a heartbeat is a periodic signal that one program, process or hardware component sends to another to indicate it is operating normally, commonly used to detect failures in clusters and networks.
- heat map (heatmap) - A heat map is a two-dimensional representation of data in which various values are represented by colors.
- histogram - A histogram is a type of chart that shows the frequency distribution of data points across a continuous range of numerical values.
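The histogram entry above rests on a simple computation: bucketing continuous values into equal-width bins and counting the frequency in each. The sample values and bin width below are invented.

```python
# A minimal sketch of the bucketing behind a histogram: map each value to
# its bin's lower edge and count how many values fall in each bin.
values = [1.2, 1.9, 2.3, 2.7, 3.1, 3.4, 3.8, 4.5]
bin_width = 1.0
counts = {}
for v in values:
    bin_start = int(v // bin_width) * bin_width
    counts[bin_start] = counts.get(bin_start, 0) + 1
# counts maps each bin's lower edge to its frequency
```

A charting tool then draws one bar per bin with height proportional to its count.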
- historical data - Historical data, in a broad context, is data collected about past events and circumstances pertaining to a particular subject.
- IBM IMS (Information Management System) - IBM IMS (Information Management System) is a database and transaction management system that was first introduced by IBM in 1968.
- ICD-10-CM (Clinical Modification) - The ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) is a system used by physicians and other healthcare providers to classify and code all diagnoses, symptoms and procedures related to inpatient and outpatient medical care in the United States.
- in-memory analytics - In-memory analytics is an approach to querying data residing in a computer's random access memory (RAM) as opposed to querying data stored on physical drives.
- information - Information is the output that results from analyzing, contextualizing, structuring, interpreting or in other ways processing data.
- information asset - An information asset is a collection of knowledge or data that is organized, managed and valuable.
- information assurance (IA) - Information assurance (IA) is the practice of protecting physical and digital information and the systems that support the information.
- information governance - Information governance is a holistic approach to managing corporate information by implementing processes, roles, controls and metrics that treat information as a valuable business asset.
- information lifecycle management (ILM) - Information lifecycle management (ILM) is a comprehensive approach to managing an organization's data and associated metadata, starting with its creation and acquisition through when it becomes obsolete and is deleted.
- IT incident management - IT incident management is a component of IT service management (ITSM) that aims to rapidly restore services to normal following an incident while minimizing adverse effects on the business.
- Java Database Connectivity (JDBC) - Java Database Connectivity (JDBC) is an API packaged with the Java SE edition that makes it possible to connect from a Java Runtime Environment (JRE) to external, relational database systems.
- job - In certain computer operating systems, a job is the unit of work that a computer operator -- or a program called a job scheduler -- gives to the OS.
- job scheduler - A job scheduler is a computer program that enables an enterprise to schedule and, in some cases, monitor computer 'batch' jobs (units of work).
- key-value pair (KVP) - A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data.
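The key-value pair entry above is exactly the model behind a Python dict: each unique key identifies one value, and writing to an existing key overwrites its value (the keys and values here are illustrative).

```python
# A minimal sketch of key-value pairs: unique keys, one value per key.
settings = {"retention_days": 30, "region": "us-east"}
settings["region"] = "eu-west"   # keys are unique, so this overwrites
assert settings["retention_days"] == 30
assert len(settings) == 2        # still two pairs, not three
```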
- knowledge base - In general, a knowledge base is a centralized repository of information.
- knowledge management (KM) - Knowledge management is the process an enterprise uses to gather, organize, share and analyze its knowledge in a way that's easily accessible to employees.
- laboratory information system (LIS) - A laboratory information system (LIS) is computer software that processes, stores and manages data from patient medical processes and tests.
- legal health record (LHR) - A legal health record (LHR) refers to documentation about a patient's personal health information that is created by a healthcare organization or provider.
- Lisp (programming language) - Lisp, an acronym for list processing, is a functional programming language that was designed for easy manipulation of data strings.
- LTO-8 (Linear Tape-Open 8) - LTO-8, or Linear Tape-Open 8, is a tape format from the Linear Tape-Open Consortium released in late 2017.
- medical scribe - A medical scribe is a professional who specializes in documenting patient encounters in real time under the direction of a physician.
- metadata - Often referred to as data that describes other data, metadata is structured reference data that helps to sort and identify attributes of the information it describes.
- Microsoft Azure Data Lake - Microsoft Azure Data Lake is a highly scalable public cloud service that allows developers, scientists, business professionals and other Microsoft customers to gain insight from large, complex data sets.
- Microsoft System Center - Microsoft System Center is a suite of software products designed to simplify the deployment, configuration and management of IT infrastructure and virtualized software-defined data centers.
- middleware - Middleware is software that bridges the gap between applications and operating systems by providing a method for communication and data management.
- MPP database (massively parallel processing database) - An MPP database is a database optimized for parallel processing, so that many operations can be performed by many processing units at the same time.
- multidimensional database (MDB) - A multidimensional database (MDB) is a type of database that is optimized for data warehouse and online analytical processing (OLAP) applications.
- national identity card - A national identity card is a portable document, typically a plasticized card with digitally embedded information, that is used to verify aspects of a person's identity.
- noisy data - Noisy data is a data set that contains extra meaningless data.
- object-oriented database management system (OODBMS) - An object-oriented database management system (OODBMS), sometimes shortened to ODBMS for object database management system, is a database management system (DBMS) that supports the modeling and creation of data as objects.
- OLAP (online analytical processing) - OLAP (online analytical processing) is a computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view.
- Open Database Connectivity (ODBC) - Open Database Connectivity (ODBC) is an open standard application programming interface (API) that allows application programmers to easily access data stored in a database.
- operational data store (ODS) - An operational data store (ODS) is a type of database that's often used as an interim logical area for a data warehouse.
- operational efficiency - Operational efficiency refers to an organization's ability to reduce waste of time, effort and material while still producing a high-quality service or product.
- operational intelligence (OI) - Operational intelligence (OI) is an approach to data analysis that enables decisions and actions in business operations to be based on real-time data as it's generated or collected by companies.
- parallel file system - A parallel file system is a software component designed to store data across multiple networked servers.
- pebibyte (PiB) - A pebibyte (PiB) is a unit of measure for data capacity equal to 2 to the 50th power (1,125,899,906,842,624) bytes.
- performance and accountability reporting (PAR) - Performance and accountability reporting (PAR) is the process of compiling and documenting factors that quantify an organization's achievements, efficiency and adherence to budget, comparing actual results against previously articulated goals.
- personal health record (PHR) - A personal health record (PHR) is an electronic summary of health information that a patient maintains control of themselves, as opposed to their healthcare provider.
- precision agriculture - Precision agriculture (PA) is a farming management concept based on observing, measuring and responding to inter- and intra-field variability in crops.
- predictive modeling - Predictive modeling is a mathematical process used to predict future events or outcomes by analyzing patterns in a given set of input data.
- primary key (primary keyword) - A primary key, also called a primary keyword, is a column in a relational database table that's distinctive for each record.
- product data management (PDM) - Product data management (PDM) is the process of capturing and managing the electronic information related to a product so it can be reused in business processes such as design, production, distribution and marketing.
- public data - Public data is information that can be shared, used, reused and redistributed without restriction.
- radiology information system (RIS) - A radiology information system (RIS) is a networked software system for managing medical imagery and associated data.
- raw data (source data or atomic data) - Raw data is the data originally generated by a system, device or operation, and has not been processed or changed in any way.
- RDBMS (relational database management system) - A relational database management system (RDBMS) is a collection of programs and capabilities that enable IT teams and others to create, update, administer and otherwise interact with a relational database.
- real-time analytics - Real-time analytics is the use of data and related resources for analysis as soon as it enters the system.
- record - In computer data processing, a record is a collection of data items arranged for processing by a program.
- records information management (RIM) - Records information management (RIM) is a corporate area of endeavor involving the administration of all business records through their life cycle.
- refactoring - Refactoring is the process of restructuring code without changing its original functionality.
- relational database - A relational database is a type of database that organizes data points with defined relationships for easy access.
- Report on Compliance (ROC) - A Report on Compliance (ROC) is a form that must be completed by all Level 1 Visa merchants undergoing a PCI DSS (Payment Card Industry Data Security Standard) audit.
- restore point - A system restore point is a backup copy of important Windows operating system (OS) files and settings that can be used to recover the system to an earlier point of time in the event of system failure or instability.
- RFM analysis (recency, frequency, monetary) - RFM analysis is a marketing technique used to quantitatively rank and group customers based on the recency, frequency and monetary total of their recent transactions to identify the best customers and perform targeted marketing campaigns.
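The RFM analysis entry above can be sketched as a toy scoring routine: each customer earns 1 to 3 points per dimension (recency in days, frequency of purchases, monetary total), and the sum ranks customers. The customer data and threshold values here are entirely invented assumptions.

```python
# A toy sketch of RFM scoring: 1-3 points per dimension, summed to rank
# customers from best to worst. Thresholds are arbitrary for illustration.
customers = {
    "alice": {"recency": 5,  "frequency": 12, "monetary": 900.0},
    "bob":   {"recency": 90, "frequency": 2,  "monetary": 150.0},
}

def rfm_score(c: dict) -> int:
    r = 3 if c["recency"] <= 30 else (2 if c["recency"] <= 60 else 1)
    f = 3 if c["frequency"] >= 10 else (2 if c["frequency"] >= 5 else 1)
    m = 3 if c["monetary"] >= 500 else (2 if c["monetary"] >= 200 else 1)
    return r + f + m

ranked = sorted(customers, key=lambda name: rfm_score(customers[name]),
                reverse=True)
# ranked lists the best customers first
```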