Glossary
Short definitions with deeper context and cross-links to sibling terms.
Data Processing
Data Annotation
The process of labeling data for training machine learning models.
Data Preparation
Data Cleaning
The process of identifying and correcting errors or inconsistencies in data to improve its quality.
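A minimal sketch of common cleaning steps on a list of records. The `id`, `name`, and `email` fields are hypothetical. The sketch trims whitespace, lower-cases emails, drops rows that lack an identifier, and removes duplicates that differ only in formatting. Real cleaning rules depend on the dataset.

```python
def clean_records(records):
    """Trim text fields, lower-case emails, drop id-less rows, and de-duplicate."""
    seen = set()
    cleaned = []
    for rec in records:
        rec_id = rec.get("id")
        if rec_id is None:
            continue  # no identifier to repair against, so the row is dropped
        name = (rec.get("name") or "").strip()
        email = (rec.get("email") or "").strip().lower()
        key = (rec_id, email)
        if key in seen:
            continue  # duplicate once formatting differences are removed
        seen.add(key)
        cleaned.append({"id": rec_id, "name": name, "email": email})
    return cleaned


raw = [
    {"id": 1, "name": "  Ada ", "email": "ADA@example.com"},
    {"id": 1, "name": "Ada", "email": "ada@example.com "},
    {"id": None, "name": "Ghost", "email": "ghost@example.com"},
    {"id": 2, "name": "Alan", "email": "alan@example.com"},
]
print(clean_records(raw))
```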
Product
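Data Contamination
The unintended overlap between a model's training data and the data used to evaluate it, which inflates reported performance. In generative AI it most often arises when benchmark or test items end up in web-scale pretraining corpora.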
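One common screening approach looks for long word n-grams shared between test items and the training corpus. This sketch assumes whitespace tokenization and a hypothetical default of `n=8`. Published checks differ in tokenization and n-gram length, so treat the threshold as a tunable assumption.

```python
def word_ngrams(text, n):
    """Set of lower-cased, whitespace-tokenized word n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(train_docs, test_docs, n=8):
    """Fraction of test documents sharing at least one word n-gram with the training docs."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= word_ngrams(doc, n)
    flagged = sum(1 for doc in test_docs if word_ngrams(doc, n) & train_grams)
    return flagged / len(test_docs) if test_docs else 0.0


train = ["the quick brown fox jumps over the lazy dog"]
test = ["a quick brown fox appeared", "a completely unrelated sentence"]
print(contamination_rate(train, test, n=3))  # 0.5: only the first test item overlaps
```

Flagged items are usually removed from the benchmark or reported separately.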
Model Maintenance
Data Drift
Changes in data distribution that can impact model performance over time.
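Drift is often detected by comparing a feature's distribution at training time with its distribution in production. This sketch computes the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical CDFs, using only the standard library. The alert threshold is a hypothetical choice.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: max |F_ref(x) - F_cur(x)| over all observed x."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The empirical CDFs only change at sample points, so checking those is enough.
    for x in set(ref) | set(cur):
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


reference = [1, 2, 3, 4, 5]  # feature values seen at training time
current = [3, 4, 5, 6, 7]    # the same feature in recent production traffic
DRIFT_THRESHOLD = 0.3        # hypothetical alert threshold
if ks_statistic(reference, current) > DRIFT_THRESHOLD:  # statistic is 0.4 here
    print("possible drift")
```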
Data Management
Data Governance
Frameworks and policies for managing data integrity and usage.
Data Preprocessing
Data Imbalance
A situation in which classes in a dataset are not represented equally.
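A minimal sketch that quantifies imbalance and derives per-class weights that can counteract it during training. The weighting formula `n_samples / (n_classes * count)` is the same heuristic scikit-learn uses for `class_weight="balanced"`. Resampling, such as oversampling the minority class, is a common alternative.

```python
from collections import Counter


def imbalance_summary(labels):
    """Return class counts, the majority/minority ratio, and inverse-frequency class weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    ratio = max(counts.values()) / min(counts.values())
    # n_samples / (n_classes * count): rarer classes receive proportionally larger weights
    weights = {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}
    return counts, ratio, weights


labels = ["legit"] * 90 + ["fraud"] * 10
counts, ratio, weights = imbalance_summary(labels)
print(ratio)    # 9.0
print(weights)  # fraud weight 5.0, legit weight about 0.556
```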
Data Preprocessing
Data Imputation
The process of replacing missing data with substituted values.
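A minimal sketch of mean imputation, one of the simplest substitution strategies. It assumes missing entries are marked as `None`. Median, most-frequent-value, and model-based imputation follow the same pattern with a different fill value.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


print(impute_mean([2.0, None, 4.0, None]))  # [2.0, 3.0, 4.0, 3.0]
```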
Data Management
Data Lake
A centralized repository that stores large volumes of raw data in its native format.
Data Management
Data Lineage
The tracking of data movement and transformation throughout its lifecycle.
Data Preparation
Data Normalization
The process of rescaling numeric features to a common scale, such as the range [0, 1] or zero mean and unit variance, often to improve the performance of machine learning models.
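A sketch of the two most common schemes, min-max scaling and z-score standardization, for a single feature. In practice the minimum and maximum, or the mean and standard deviation, are computed on the training split and reused for validation and test data to avoid leakage. This sketch simply computes them from the values it is given.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(z_score([1, 2, 3]))
```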
Data Management
Data Pipeline
A series of automated steps that move data from collection through transformation to storage.
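A minimal sketch of a pipeline as an ordered list of functions, where each step's output feeds the next. The step names and the in-memory `STORE` list are hypothetical stand-ins for a real source and database. Production pipelines typically add scheduling, retries, and monitoring on top of this basic shape.

```python
from functools import reduce


def run_pipeline(data, steps):
    """Feed `data` through each step in order; each step's output is the next step's input."""
    return reduce(lambda acc, step: step(acc), steps, data)


def parse(lines):
    # collection output (raw CSV-like text) -> list of fields
    return [line.split(",") for line in lines]


def to_records(rows):
    # processing: name the fields and convert types
    return [{"user": user, "amount": float(amount)} for user, amount in rows]


STORE = []  # in-memory stand-in for a database or warehouse table


def store(records):
    # storage: persist the records and report how many were written
    STORE.extend(records)
    return len(records)


written = run_pipeline(["alice,12.5", "bob,3"], [parse, to_records, store])
print(written, STORE)
```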
Data Preparation
Data Preprocessing
The steps taken to clean and prepare raw data for analysis and modeling.
AI Ethics
Data Privacy
The protection of personal data from unauthorized access and misuse.
Data Governance
Data Sovereignty
The concept that data is subject to the laws and governance structures of the nation where it is collected.
Data Processing
Data Wrangling
The process of cleaning and organizing raw data into a desired format.
Data Processing
Data Augmentation
The process of increasing the size and diversity of a training dataset by applying transformations.
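A minimal sketch for small images stored as lists of pixel rows. Each example gets a horizontally flipped copy and a copy with small Gaussian noise, and the label is unchanged. The noise level `sigma=0.05` is an assumption. Transformations must preserve the label: a horizontal flip suits most natural photos but not, for example, images of text.

```python
import random


def hflip(image):
    """Mirror a 2-D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def jitter(image, sigma, rng):
    """Add small Gaussian noise to every pixel."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]


def augment(dataset, rng, sigma=0.05):
    """Return the originals plus a flipped and a noisy copy of each (image, label) pair."""
    augmented = list(dataset)
    for image, label in dataset:
        augmented.append((hflip(image), label))
        augmented.append((jitter(image, sigma, rng), label))
    return augmented


rng = random.Random(0)  # fixed seed so the noise is reproducible
data = [([[0.0, 1.0], [0.5, 0.2]], "cat")]
bigger = augment(data, rng)
print(len(bigger))   # 3
print(bigger[1][0])  # [[1.0, 0.0], [0.2, 0.5]]
```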
Data Security
Data Encryption
The process of converting data into a coded format to prevent unauthorized access.
Data Management
Data Integration
The process of combining data from different sources to provide a unified view.
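A minimal sketch of integrating two sources whose schemas name the same key differently. A hypothetical CRM table uses `id`, and a hypothetical billing table uses `customer_id`. Billing rows are aggregated per customer and left-joined onto the CRM records to produce one unified view.

```python
def integrate(crm, billing):
    """Left-join total billed amounts onto CRM customers, mapping customer_id -> id."""
    totals = {}
    for row in billing:
        cid = row["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + row["amount"]
    # Customers with no billing rows still appear, with a total of 0.0.
    return [
        {"id": c["id"], "name": c["name"], "total_billed": totals.get(c["id"], 0.0)}
        for c in crm
    ]


crm = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
billing = [{"customer_id": 1, "amount": 10.0}, {"customer_id": 1, "amount": 5.0}]
print(integrate(crm, billing))
```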
Data Preparation
Data Labelling
The process of assigning meaningful labels to data points to facilitate supervised learning.
Data Management
Data Migration
The process of transferring data between storage types or formats.
Data Analysis
Data Mining
The process of discovering patterns and knowledge from large amounts of data.
Data Management
Data Quality
A measure of the condition and reliability of data used in analysis and decision-making processes.
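Data quality is usually tracked as several dimensions rather than one number. This sketch measures two of them per field: completeness, the share of rows with a value, and validity, the share of present values that pass a rule. The field names and validation rules are hypothetical.

```python
def quality_report(rows, fields, validators):
    """Per-field completeness and validity; fields without a validator accept any value."""
    total = len(rows)
    report = {}
    for field in fields:
        present = [r[field] for r in rows if r.get(field) not in (None, "")]
        check = validators.get(field, lambda v: True)
        valid = sum(1 for v in present if check(v))
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "validity": valid / len(present) if present else 0.0,
        }
    return report


rows = [
    {"age": 34, "email": "a@example.com"},
    {"age": -2, "email": ""},
    {"age": None, "email": "b@example.com"},
]
validators = {"age": lambda v: 0 <= v <= 130, "email": lambda v: "@" in v}
print(quality_report(rows, ["age", "email"], validators))
```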
Data Processing
Data Sampling
The process of selecting a subset of data from a larger dataset.
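A sketch contrasting simple random sampling with stratified sampling, which draws the same fraction from each group so that group proportions are preserved. The `region` field and the 80/20 split are hypothetical. The sketch assumes `0 < fraction <= 1`.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, rng):
    """Sample roughly `fraction` of each stratum; every non-empty stratum yields at least one row."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(42)
rows = [{"id": i, "region": "north" if i < 80 else "south"} for i in range(100)]
simple = rng.sample(rows, 10)  # simple random sample: the region mix can vary by chance
strat = stratified_sample(rows, key=lambda r: r["region"], fraction=0.1, rng=rng)
print(sum(r["region"] == "south" for r in strat))  # 2
```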
Data Processing
Data Sourcing
The process of obtaining and collecting data from various sources.