GENAIWIKI

Glossary

Short definitions with deeper context and cross-links to sibling terms.

Data Processing

Data Annotation

The process of labeling data for training machine learning models.

Data Preparation

Data Cleaning

The process of identifying and correcting errors or inconsistencies in data to improve its quality.
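As a minimal sketch of what cleaning can look like in practice, the snippet below trims whitespace, normalizes capitalization, drops rows with an implausible value, and removes duplicates. The record layout (`name`, `age`) and the 0 to 120 age range are assumptions made purely for illustration.

```python
def clean_records(records):
    """Normalize names, drop rows with an invalid age, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Fix an inconsistency: stray whitespace and mixed capitalization.
        name = rec["name"].strip().title()
        age = rec["age"]
        # Treat non-integer or out-of-range ages as errors and drop the row.
        if not isinstance(age, int) or not 0 <= age <= 120:
            continue
        key = (name, age)
        if key in seen:  # skip exact duplicates after normalization
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": "  alice ", "age": 30},
    {"name": "Alice", "age": 30},   # duplicate once normalized
    {"name": "bob", "age": -5},     # invalid age
    {"name": "carol", "age": 41},
]
cleaned = clean_records(raw)
```

Real cleaning rules depend on the dataset; the point here is only that each rule is explicit and testable.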

Product

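Data Contamination

Data contamination is the unintended presence of evaluation or benchmark data in a model's training data, which inflates measured performance. The term comes up across modeling, product, and governance discussions.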
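One common sense of the term is benchmark or test data leaking into a training corpus. A rough way to probe for that is to measure n-gram overlap between a benchmark item and training text. The sketch below assumes whitespace tokenization and 3-grams; the function names are hypothetical, and production checks typically use more robust normalization.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item, training_text, n=3):
    """Fraction of the benchmark item's n-grams that also occur in the training text."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0  # item shorter than n tokens: nothing to compare
    return len(item_grams & ngrams(training_text, n)) / len(item_grams)


training_text = "the quick brown fox jumps over the lazy dog"
leaked = overlap_ratio("quick brown fox jumps", training_text)
unrelated = overlap_ratio("a completely different sentence here", training_text)
```

A ratio near 1.0 suggests the item appears almost verbatim in the training text; a ratio near 0.0 suggests it does not.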

Model Maintenance

Data Drift

Changes in data distribution that can impact model performance over time.
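One way to quantify such a change for a single numeric feature is the Population Stability Index (PSI), which compares binned proportions between a reference sample and a current sample. The sketch below is illustrative only: the bin edges, the sample values, and the `eps` floor for empty bins are assumptions, not recommended settings.

```python
import math
from bisect import bisect_right


def bin_proportions(values, edges, eps=1e-6):
    """Share of values in each bin; empty bins are floored at `eps` to avoid log(0)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Clamp the index so out-of-range values land in the first or last bin.
        i = max(0, min(bisect_right(edges, v) - 1, len(counts) - 1))
        counts[i] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(reference, current, edges):
    """Population Stability Index: sum over bins of (cur - ref) * ln(cur / ref)."""
    ref = bin_proportions(reference, edges)
    cur = bin_proportions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
current = [0.6, 0.7, 0.8, 0.9, 0.9, 0.95, 0.85, 0.75, 0.65]

no_drift = psi(reference, reference, edges)
drift = psi(reference, current, edges)
```

Identical samples give a PSI of zero, and the value grows as the two distributions move apart.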

Data Management

Data Governance

Frameworks and policies for managing data integrity and usage.

Data Preprocessing

Data Imbalance

A situation in which classes in a dataset are not represented equally.
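A quick way to see imbalance, and one common way to compensate for it, is to count labels and derive inverse-frequency class weights. The sketch below uses the weighting `n_samples / (n_classes * class_count)`; the label names and the 90/10 split are made up for illustration.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


labels = ["neg"] * 90 + ["pos"] * 10
weights = class_weights(labels)
```

Here the minority class `pos` receives a weight of 5.0 while `neg` receives roughly 0.56, so each class contributes equally in aggregate to a weighted loss.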

Data Preprocessing

Data Imputation

The process of replacing missing data with substituted values.
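Mean imputation is one of the simplest strategies: every missing entry is replaced by the mean of the observed values. The sketch below assumes missing values are represented as `None` and that at least one value is observed; median or model-based imputation would follow the same shape.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # raises StatisticsError if nothing is observed
    return [fill if v is None else v for v in values]


filled = impute_mean([1.0, None, 3.0])
```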

Data Management

Data Lake

A centralized repository that stores large volumes of raw data in its native format.

Data Management

Data Lineage

The tracking of data movement and transformation throughout its lifecycle.

Data Preparation

Data Normalization

The process of rescaling feature values to a common range, often to improve the performance of machine learning models.
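Min-max scaling is one widely used form of normalization: each value is mapped to the range [0, 1] via (x - min) / (max - min). The sketch below is a minimal illustration; returning zeros for a constant column is an assumption made here to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to scale, so return zeros (an arbitrary choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 30])
```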

Data Management

Data Pipeline

An ordered series of steps that collect, process, and store data.
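The three stages named in the definition can be sketched as plain functions chained together. Everything here is hypothetical: the hard-coded input stands in for a real source, and a Python list stands in for real storage.

```python
def collect():
    """Collection stage: stands in for reading from a file, API, or queue."""
    return [" 3", "4 ", "x", "5"]


def process(raw):
    """Processing stage: trim whitespace, drop non-numeric entries, convert to int."""
    stripped = (r.strip() for r in raw)
    return [int(s) for s in stripped if s.isdigit()]


def store(rows, sink):
    """Storage stage: append the processed rows to the sink and return it."""
    sink.extend(rows)
    return sink


def run_pipeline(sink):
    """Run the stages in order, passing each stage's output to the next."""
    return store(process(collect()), sink)


result = run_pipeline([])
```

Keeping each stage a separate function makes it possible to test or replace one stage without touching the others.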

Data Preparation

Data Preprocessing

The steps taken to clean and prepare raw data for analysis and modeling.

AI Ethics

Data Privacy

The protection of personal data from unauthorized access and misuse.

Data Governance

Data Sovereignty

The concept that data is subject to the laws and governance structures of the nation in which it is collected.

Data Processing

Data Wrangling

The process of cleaning and organizing raw data into a desired format.

Data Processing

Data Augmentation

The process of increasing the size and diversity of a training dataset by applying transformations.
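For image-like data, flips are a typical label-preserving transformation. The sketch below represents an image as a nested list of pixel values and triples a dataset by adding horizontally and vertically flipped copies. The representation and helper names are assumptions for illustration, and whether a flip preserves the label depends on the task.

```python
def hflip(image):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror the image top to bottom by reversing the row order."""
    return image[::-1]


def augment(dataset):
    """Return each (image, label) pair plus its two flipped variants."""
    out = []
    for image, label in dataset:
        out.append((image, label))
        out.append((hflip(image), label))
        out.append((vflip(image), label))
    return out


dataset = [([[1, 2], [3, 4]], "cat")]
augmented = augment(dataset)
```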

Data Security

Data Encryption

The process of converting data into a coded format to prevent unauthorized access.

Data Management

Data Integration

The process of combining data from different sources to provide a unified view.
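A minimal sketch of the idea: two sources that share a key are merged into one record per key. The source names (`crm_rows`, `billing_rows`) and the `id` key are hypothetical, and a later source overwriting an earlier one on conflicting fields is a simplifying assumption.

```python
def integrate(crm_rows, billing_rows):
    """Merge rows from both sources into one record per id (outer-join style)."""
    merged = {}
    for row in crm_rows:
        merged.setdefault(row["id"], {}).update(row)
    for row in billing_rows:
        # Fields from the second source overwrite same-named fields from the first.
        merged.setdefault(row["id"], {}).update(row)
    return [merged[key] for key in sorted(merged)]


crm_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
billing_rows = [{"id": 1, "plan": "pro"}, {"id": 3, "plan": "free"}]
unified = integrate(crm_rows, billing_rows)
```

Records present in only one source are kept with whatever fields that source provides.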

Data Preparation

Data Labelling

The process of assigning meaningful labels to data points to facilitate supervised learning.

Data Management

Data Migration

The process of transferring data between storage types or formats.

Data Analysis

Data Mining

The process of discovering patterns and knowledge from large amounts of data.

Data Management

Data Quality

A measure of the condition and reliability of data used in analysis and decision-making processes.

Data Processing

Data Sampling

The process of selecting a subset of data from a larger dataset.
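Simple random sampling without replacement is the most basic case. The sketch below uses a seeded generator so the subset is reproducible; the seed value and the helper name are arbitrary choices for illustration.

```python
import random


def sample_subset(data, k, seed=0):
    """Draw k distinct items uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    return rng.sample(data, k)


population = list(range(100))
subset = sample_subset(population, 10)
```

Other schemes, such as stratified sampling, follow the same pattern but draw separately within each group.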

Data Processing

Data Sourcing

The process of obtaining and collecting data from various sources.