Concept graph
Glossary
Short definitions with deeper context and cross-links to sibling terms.
Data Preprocessing
Data Imbalance
A situation in which classes in a dataset are not represented equally.
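One way to make imbalance concrete is to measure each class's share of the labels and the ratio between the largest and smallest class. The sketch below is illustrative only: the function names and the `ok`/`fraud` labels are assumptions, not part of any particular library or dataset.

```python
from collections import Counter

def class_distribution(labels):
    """Return each class's share of the dataset as a fraction of all labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

def imbalance_ratio(labels):
    """Ratio of the most common class count to the least common class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

# Hypothetical labels: 95 majority-class examples and 5 minority-class examples.
labels = ["ok"] * 95 + ["fraud"] * 5
shares = class_distribution(labels)
ratio = imbalance_ratio(labels)
```

A ratio near 1 suggests roughly balanced classes; a large ratio is a hint that resampling or class weighting may be worth considering.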
Data Preprocessing
Data Imputation
The process of replacing missing data with substituted values.
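Mean imputation is one of the simplest substitution strategies. The following is a minimal sketch, assuming missing values are represented as `None`; the function name `impute_mean` and the sample data are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [30, None, 40, 50, None]
imputed = impute_mean(ages)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so median or model-based imputation may suit skewed data better.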
Data Management
Data Lake
A centralized repository that stores large volumes of raw data in its native format.
Data Management
Data Lineage
The tracking of data movement and transformation throughout its lifecycle.
Data Preparation
Data Normalization
The process of rescaling feature values to a common range, often to improve the performance of machine learning models.
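Min-max scaling is one common form of normalization: each value is mapped to the range 0 to 1 via (x - min) / (max - min). This is a sketch of that single technique, not the only way to normalize; the function name is an assumption.

```python
def min_max_scale(values):
    """Map values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30, 50])
```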
Data Management
Data Pipeline
A series of steps that collect, transform, and store data, with the output of each step feeding the next.
Data Preparation
Data Preprocessing
The steps taken to clean and prepare raw data for analysis and modeling.
AI Ethics
Data Privacy
The protection of personal data from unauthorized access and misuse.
Data Governance
Data Sovereignty
The concept that data is subject to the laws and governance structures of the nation in which it is collected.
Data Processing
Data Wrangling
The process of cleaning and organizing raw data into a desired format.
Data Processing
Data Augmentation
The process of increasing the size and diversity of a training dataset by applying transformations.
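As a small illustration, the sketch below doubles an image dataset by adding a horizontally flipped copy of every example. It assumes images are nested lists of pixel values and that flipping does not change the label, which holds for many tasks but not all (text in images, for instance). All names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror an image left to right; image is a list of pixel rows."""
    return [list(reversed(row)) for row in image]

def augment(dataset):
    """Return every original (image, label) pair plus a flipped copy of it."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((flip_horizontal(image), label))
    return augmented

# Hypothetical 2x2 "image" with a single label.
dataset = [([[1, 2], [3, 4]], "a")]
augmented = augment(dataset)
```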
Data Security
Data Encryption
The process of converting data into a coded format to prevent unauthorized access.
Data Management
Data Integration
The process of combining data from different sources to provide a unified view.
Data Preparation
Data Labelling
The process of assigning meaningful labels to data points to facilitate supervised learning.
Data Management
Data Migration
The process of transferring data between systems, storage types, or formats.
Data Analysis
Data Mining
The process of discovering patterns and knowledge from large amounts of data.
Data Management
Data Quality
A measure of the condition and reliability of data used in analysis and decision-making processes.
Data Processing
Data Sampling
The process of selecting a subset of data from a larger dataset.
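Simple random sampling without replacement is the most basic way to select such a subset. The sketch below uses Python's standard `random` module; the function name and the seed parameter are assumptions added for reproducibility.

```python
import random

def simple_random_sample(records, k, seed=0):
    """Draw k distinct records uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, leaves global random state alone
    return rng.sample(records, k)

# Hypothetical population of 1000 record IDs.
population = list(range(1000))
sample = simple_random_sample(population, 10, seed=42)
```

Other schemes, such as stratified sampling, draw separately from each group so that small groups are not missed.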
Data Processing
Data Sourcing
The process of obtaining and collecting data from various sources.
Data Presentation
Data Visualization
The graphical representation of information and data to communicate insights clearly.
Data Management
Dataset
A structured collection of data used for analysis and training machine learning models.
Data Management
Dataset Splitting
The process of dividing a dataset into training, validation, and test sets.
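A minimal sketch of a three-way split follows, assuming an 80/10/10 division after shuffling. The fractions, the function name, and the seed are illustrative choices, not a recommendation for every dataset.

```python
import random

def split_dataset(records, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle records, then cut them into train, validation, and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

train, val, test = split_dataset(range(100))
```

Shuffling before cutting avoids ordering effects, but time-ordered data usually calls for a chronological split instead.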
Machine Learning
Decision Tree
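A decision support tool that uses a tree-like model of decisions; in machine learning, a model that predicts an outcome by following feature-based rules from the root to a leaf.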
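The tree-like structure can be sketched as nested dictionaries, where internal nodes test a feature against a threshold and leaves hold a label. The tree below is hand-written and entirely hypothetical (features, thresholds, and labels are made up); real trees are usually learned from data.

```python
# Hypothetical hand-built tree: internal nodes test one feature, leaves carry a label.
tree = {
    "feature": "income", "threshold": 50000,
    "left": {"label": "decline"},
    "right": {
        "feature": "debt", "threshold": 10000,
        "left": {"label": "approve"},
        "right": {"label": "review"},
    },
}

def predict(node, sample):
    """Walk from the root to a leaf: go left if the feature value is below the threshold."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]

decision = predict(tree, {"income": 60000, "debt": 5000})
```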
Safety
Decoder-Only
A transformer architecture that uses only the decoder stack and generates output autoregressively, predicting each token from the tokens that precede it.