Concept graph
Glossary
Short definitions with deeper context and cross-links to sibling terms.
Data Augmentation
augment-data
A technique to artificially increase the size of a training dataset by creating modified copies of existing data.
Data Preparation
Augmented Data
Synthetic data created to enhance the diversity and quantity of training datasets.
Machine Learning
batch-learning
A learning method in which the model is trained offline on the complete, fixed dataset, rather than being updated incrementally as new data arrives.
Machine Learning
batch-training
A method of training machine learning models in which each iteration updates the model using a subset (a batch) of the dataset rather than the whole dataset.
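As a rough illustration of the idea, the sketch below yields shuffled subsets of a dataset, one per training iteration. The function name `iter_minibatches` and its parameters are hypothetical, and the model update itself is left out.

```python
import random

def iter_minibatches(data, batch_size, seed=0):
    """Yield shuffled batches of at most `batch_size` items (hypothetical helper)."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)  # new order so batches are not tied to storage order
    for start in range(0, len(indices), batch_size):
        # The final batch may be smaller when len(data) is not a multiple of batch_size.
        yield [data[i] for i in indices[start:start + batch_size]]

batches = list(iter_minibatches(list(range(10)), batch_size=4))
```

Each yielded batch would be passed to one update step of the model.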
AI Ethics
Bias Mitigation
Techniques and strategies aimed at reducing bias in AI models and datasets.
Data Preprocessing
Data Imbalance
A situation in which classes in a dataset are not represented equally.
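One simple way to quantify imbalance is the ratio between the largest and smallest class counts. The sketch below assumes labels are given as a flat list; `imbalance_ratio` is a hypothetical name, not a standard API.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Return (largest class count) / (smallest class count); 1.0 means perfectly balanced."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

labels = ["spam"] * 10 + ["ham"] * 90
ratio = imbalance_ratio(labels)  # 90 / 10
```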
Data Processing
data-augmentation
The process of increasing the size and diversity of a training dataset by applying transformations.
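A minimal sketch of the idea, assuming each sample is a list of numbers: every sample is kept and two transformed copies are added, one reversed and one with small random jitter. The function `augment` and the choice of transformations are illustrative assumptions; real pipelines pick transformations that preserve the label for their data type.

```python
import random

def augment(samples, noise=0.05, seed=0):
    """Return each sample plus a reversed copy and a jittered copy (hypothetical helper)."""
    rng = random.Random(seed)
    out = []
    for x in samples:
        out.append(list(x))                                       # original
        out.append(list(reversed(x)))                             # flipped copy
        out.append([v + rng.uniform(-noise, noise) for v in x])   # jittered copy
    return out

augmented = augment([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Here two input samples become six, tripling the dataset size.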
Data Processing
data-sampling
The process of selecting a subset of data from a larger dataset.
Data Management
Dataset
A structured collection of data used for analysis and training machine learning models.
Data Management
Dataset Splitting
The process of dividing a dataset into training, validation, and test sets.
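A minimal sketch of a three-way split, assuming a simple random shuffle is acceptable for the data at hand. The function name `split_dataset` and the default 80/10/10 proportions are illustrative assumptions.

```python
import random

def split_dataset(data, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle `data` and return (train, validation, test) lists with no overlap."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]  # everything left over is used for training
    return train, val, test

train, val, test = split_dataset(range(100))
```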
Privacy Technology
differential-privacy
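A mathematical framework that limits how much the output of an analysis can reveal about any single individual, typically by adding calibrated random noise so that results are nearly the same whether or not that individual's data is included.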
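One common way to achieve this is the Laplace mechanism, sketched below for a counting query. This is an illustration under stated assumptions, not a production implementation: `private_count` is a hypothetical name, the query has sensitivity 1 (adding or removing one record changes the count by at most 1), and `epsilon` is the privacy budget.

```python
import random

def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate` (Laplace mechanism sketch)."""
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon  # sensitivity (1 for a count) divided by epsilon
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with mean 0 and that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise

ages = [23, 35, 41, 52, 67]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
```

A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy.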
Data Processing
dimensionality-reduction
The process of reducing the number of features in a dataset while preserving important information.
Machine Learning
fine-tuning
The process of adjusting a pre-trained model on a new, often smaller dataset to improve performance on a specific task.
Training
fine-tuning dataset
The task-specific, often smaller set of examples on which a pre-trained model is fine-tuned.
Data Quality
noisy-labels
Labels in a dataset that are inaccurate or wrong, which can mislead model training.
Data Analysis
outlier
An observation that lies far from the other observations in the dataset.
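What counts as distant depends on the rule chosen. The sketch below uses one common convention, the 1.5 × IQR rule, on a list of numbers; `iqr_outliers` is a hypothetical helper and the factor 1.5 is a conventional default rather than a fixed standard.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 11, 12, 13, 12, 11, 10, 100]
flagged = iqr_outliers(readings)
```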
Data Science
Principal Component Analysis (PCA)
A dimensionality reduction technique that projects data onto the orthogonal directions of greatest variance, simplifying the dataset while preserving as much variance as possible.
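To make the idea concrete, the sketch below estimates only the first principal component by power iteration on the sample covariance matrix. It is a simplified illustration, not how libraries typically implement PCA: `first_principal_component` is a hypothetical name, and the code assumes the data has nonzero variance and that the random starting vector is not orthogonal to the leading direction.

```python
import math
import random

def first_principal_component(rows, iterations=100, seed=0):
    """Return a unit vector along the direction of greatest variance (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random start (assumed not orthogonal)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

points = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
component = first_principal_component(points)
```

Projecting each centered row onto `component` would reduce this two-feature dataset to a single feature. The sign of the returned vector is arbitrary.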
Machine Learning
scalable-dot-product-attention
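More commonly called scaled dot-product attention: an attention mechanism that takes the dot products of queries with keys, divides them by the square root of the key dimension, and applies a softmax to obtain the weights placed on the values.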
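Assuming this entry refers to scaled dot-product attention as it is commonly defined, softmax(QK^T / sqrt(d_k)) V, the computation can be sketched in plain Python as follows. The function names are illustrative, matrices are lists of rows, and masking and batching are omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """For each query row, return the attention-weighted average of the value rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = scaled_dot_product_attention(Q, K, V)
```

With an all-zero query every key scores the same, so the output is the plain average of the value rows.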
Data Sampling
stratified-sampling
A sampling method that ensures representation from different subgroups in a dataset.
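A minimal sketch of proportional stratified sampling: items are grouped by a subgroup key and the same fraction is drawn from each group. The function name `stratified_sample` and the choice to take at least one item per group are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(items, key, fraction, seed=0):
    """Draw `fraction` of the items from every subgroup defined by `key`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))  # keep at least one item per group
        sample.extend(rng.sample(members, k))
    return sample

population = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = stratified_sample(population, key=lambda item: item[0], fraction=0.1)
```

The 80/20 ratio between the two groups is preserved in the sample, which a plain random draw of ten items would not guarantee.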