GENAIWIKI


Glossary

Short definitions with deeper context and cross-links to sibling terms.

Data Augmentation

augment-data

A technique to artificially increase the size of a training dataset by creating modified copies of existing data.
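A minimal sketch of the idea, assuming images are small 2-D lists of pixel values. It uses two example transformations, a horizontal flip and small Gaussian jitter; both are illustrative choices, not a prescribed set. In supervised learning, each modified copy keeps the label of the original it came from.

```python
import random

def augment(image, rng):
    """Return modified copies of one image given as a list of pixel rows."""
    flipped = [row[::-1] for row in image]            # horizontal flip
    noisy = [[px + rng.gauss(0, 0.05) for px in row]  # small Gaussian jitter
             for row in image]
    return [flipped, noisy]

def augment_dataset(images, rng):
    """Keep the originals and append augmented copies, enlarging the dataset."""
    augmented = list(images)
    for img in images:
        augmented.extend(augment(img, rng))
    return augmented

rng = random.Random(0)
data = [[[0.0, 1.0], [0.5, 0.2]]]
bigger = augment_dataset(data, rng)
print(len(bigger))  # 3: the original plus two modified copies
```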

Data Preparation

Augmented Data

Synthetic data created to enhance the diversity and quantity of training datasets.

Machine Learning

batch-learning

A learning method in which the model is trained on the entire available dataset at once, rather than being updated incrementally as new data arrives.
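As an illustration, a one-variable linear regression fitted by ordinary least squares uses every example in a single computation. This is a minimal sketch; the function name `fit_batch` is hypothetical. Incorporating new data means refitting on the combined dataset.

```python
def fit_batch(xs, ys):
    """Fit y = w*x + b by ordinary least squares using the whole dataset at once."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(fit_batch(xs, ys))  # (2.0, 1.0)
```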

Machine Learning

batch-training

A method of training machine learning models that uses a subset (batch) of the dataset in each iteration, updating the model parameters after each batch.
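A minimal sketch of mini-batch gradient descent for a one-variable linear model, assuming mean squared error as the loss. The data is reshuffled each epoch, and the parameters are updated after every batch instead of after a full pass. The learning rate, batch size, and epoch count are illustrative values.

```python
import random

def minibatches(data, batch_size, rng):
    """Shuffle once per epoch and yield consecutive subsets of the data."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        yield [data[i] for i in idx[start:start + batch_size]]

def train(data, epochs=200, batch_size=2, lr=0.05, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            # Gradients of the mean squared error over this batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
```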

AI Ethics

Bias Mitigation

Techniques and strategies aimed at reducing bias in AI models and datasets.
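One pre-processing approach is reweighing. Each example gets the weight P(group) · P(label) / P(group, label), so that in the weighted data the label becomes statistically independent of the sensitive group. This is a minimal sketch; the group names and labels are made up, and the weights would be passed to any learner that accepts per-sample weights.

```python
from collections import Counter

def reweigh(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    g_count = Counter(groups)
    y_count = Counter(labels)
    gy_count = Counter(zip(groups, labels))
    return [
        (g_count[g] / n) * (y_count[y] / n) / (gy_count[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweigh(groups, labels)

def weighted_positive_rate(group):
    """Share of weight on positive labels within one group."""
    pos = sum(w for w, g, y in zip(weights, groups, labels) if g == group and y == 1)
    tot = sum(w for w, g in zip(weights, groups) if g == group)
    return pos / tot
```

Before reweighing, group "a" has a positive rate of 0.75 and group "b" has 0.25. After reweighing, both weighted rates are 0.5.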

Data Preprocessing

Data Imbalance

A situation in which classes in a dataset are not represented equally.
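A quick way to quantify imbalance is to count the classes and compute the ratio of the largest to the smallest class. A common heuristic then assigns each class the weight n / (k · count), where n is the number of examples and k is the number of classes, so that rare classes count more in the loss. This is a minimal sketch with made-up labels.

```python
from collections import Counter

def imbalance_report(labels):
    """Return class counts, the max/min imbalance ratio, and inverse-frequency weights."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    n, k = len(labels), len(counts)
    weights = {c: n / (k * m) for c, m in counts.items()}
    return counts, ratio, weights

labels = ["ok"] * 90 + ["fraud"] * 10
counts, ratio, weights = imbalance_report(labels)
print(ratio)  # 9.0
```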

Data Processing

data-augmentation

The process of increasing the size and diversity of a training dataset by applying transformations.

Data Processing

data-sampling

The process of selecting a subset of data from a larger dataset.
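The simplest form is simple random sampling without replacement, which Python's standard library provides through `random.Random.sample`. The population and sample size below are illustrative. A fixed seed makes the sample reproducible.

```python
import random

population = list(range(1, 1001))
rng = random.Random(42)
sample = rng.sample(population, k=50)  # 50 distinct items, each equally likely
```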

Data Management

Dataset

A structured collection of data used for analysis and training machine learning models.

Data Management

Dataset Splitting

The process of dividing a dataset into training, validation, and test sets.
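A minimal sketch of a shuffled three-way split. The 80/10/10 proportions and the function name `split_dataset` are illustrative assumptions. Shuffling before slicing avoids splits that follow any ordering in the source data, and the three parts never overlap.

```python
import random

def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of the items, then slice off test, validation, and training sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```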

privacy technology

differential-privacy

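A mathematical framework guaranteeing that the output of an analysis changes very little whether or not any single individual's data is included, so individual records cannot be reliably identified from released results. It is typically achieved by adding carefully calibrated random noise.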
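One standard way to achieve differential privacy for a counting query is the Laplace mechanism. Adding or removing one person changes a count by at most 1 (its sensitivity), so adding Laplace noise with scale 1/ε gives ε-differential privacy. This is a minimal sketch: it draws Laplace noise as the difference of two exponential samples and omits the privacy-budget accounting that real deployments need.

```python
import random

def laplace_noise(scale, rng):
    """The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-DP; counting queries have sensitivity 1."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

ages = [23, 35, 41, 29, 62, 57, 33]
rng = random.Random(7)
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

Smaller ε means more noise and stronger privacy.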

Data Processing

dimensionality-reduction

The process of reducing the number of features in a dataset while preserving important information.

Machine Learning

fine-tuning

The process of adjusting a pre-trained model on a new, often smaller dataset to improve performance on a specific task.

Training

fine-tuning dataset

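A curated, usually task-specific collection of examples (for example, labeled inputs or prompt-response pairs) used to fine-tune a pre-trained model. Its quality and composition strongly shape the fine-tuned model's behavior.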

data-quality

noisy-labels

Labels in a dataset that are inaccurate or wrong, which can mislead model training and degrade performance.
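A common way to study label noise is to inject it deliberately. The sketch below applies symmetric noise: with probability `noise_rate`, each label is replaced by a different class chosen uniformly at random. The function name and the 20% rate are illustrative.

```python
import random

def flip_labels(labels, classes, noise_rate, rng):
    """Replace each label with a different random class with probability noise_rate."""
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy

rng = random.Random(0)
clean = [rng.choice([0, 1, 2]) for _ in range(10_000)]
noisy = flip_labels(clean, [0, 1, 2], noise_rate=0.2, rng=rng)
observed = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
```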

data-analysis

outlier

An observation that lies far from the other observations in a dataset.
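One simple detection rule is Tukey's fences. A value is flagged if it falls more than k · IQR below the first quartile or above the third quartile, with k = 1.5 by convention. This is a minimal sketch using the standard library's `statistics.quantiles`; other rules, such as z-scores, are equally common.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(data))  # [95]
```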

Data Science

Principal Component Analysis (PCA)

A dimensionality reduction technique that projects data onto a small number of orthogonal directions (principal components) chosen to capture as much of the variance as possible.
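A minimal NumPy sketch: center the data, take its singular value decomposition, and project onto the top right-singular vectors. The toy data is an assumption chosen so that almost all variance lies along one direction. The returned ratio is the share of total variance each kept component explains.

```python
import numpy as np

def pca(X, n_components):
    """Return projected data, principal axes, and explained-variance ratios."""
    X_centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]               # rows are unit-length principal axes
    explained_var = S**2 / (X.shape[0] - 1)      # variance along each axis
    ratio = explained_var[:n_components] / explained_var.sum()
    return X_centered @ components.T, components, ratio

rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), rng.normal(scale=0.1, size=200)])
Z, comps, ratio = pca(X, n_components=1)
```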

Machine Learning

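scaled-dot-product-attention

The attention mechanism used in Transformers: query-key dot products are divided by √d_k (the key dimension), passed through a softmax, and used as weights to average the values. The scaling keeps the dot products from growing large enough to push the softmax into regions with vanishingly small gradients.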
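A minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)V, for a single head without masking. Subtracting the row maximum before exponentiating is a standard numerical-stability step and does not change the softmax result. The shapes below are arbitrary.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v). Returns output and attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                         # (n_q, n_k) scaled similarities
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))
out, w = scaled_dot_product_attention(Q, K, V)
```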

Data Sampling

stratified-sampling

A sampling method that divides the dataset into subgroups (strata) and samples from each one, so that every subgroup is represented, often in proportion to its size.
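A minimal sketch of proportional stratified sampling. Items are grouped by a stratum key, and the same fraction is drawn at random from each group. The `max(1, ...)` guard, which keeps at least one item from every stratum, is an illustrative design choice rather than part of the definition.

```python
import random
from collections import defaultdict

def stratified_sample(items, key, frac, rng):
    """Sample roughly `frac` of each stratum defined by key(item)."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * frac))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(1)
people = [("north", i) for i in range(80)] + [("south", i) for i in range(20)]
s = stratified_sample(people, key=lambda p: p[0], frac=0.1, rng=rng)
```

With a 10% fraction, the 80/20 split between regions is preserved: 8 "north" and 2 "south" items.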