GENAIWIKI

Data Preprocessing

Data Imbalance

A situation in which classes in a dataset are not represented equally.

Expanded definition

Data imbalance occurs when one class in a classification problem has significantly more samples than another. A model trained on such data tends to be biased toward the majority class: it can perform well on majority-class examples but poorly on the minority class, which is often the class of greater interest (for example, fraudulent transactions among legitimate ones). Techniques to address data imbalance include resampling (oversampling the minority class or undersampling the majority class), synthetic data generation, and specialized algorithms designed to handle skewed class distributions.
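
As a rough illustration of the resampling idea, the following minimal sketch measures imbalance on a toy label set and then balances it by random oversampling. The function names (`imbalance_ratio`, `random_oversample`) and the 95/5 toy data are assumptions made for this example and do not come from any particular library.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return (largest class count) / (smallest class count)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw extra samples with replacement to reach the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy data (made up): 95 majority samples (label 0), 5 minority (label 1).
labels = [0] * 95 + [1] * 5
samples = list(range(100))

# A model that always predicts the majority class looks accurate overall
# while never detecting a single minority sample.
baseline_accuracy = sum(1 for y in labels if y == 0) / len(labels)

balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
```

In this sketch the always-majority baseline reaches 95% accuracy despite missing every minority sample, which is why accuracy alone can hide the problem. Oversampling is normally applied to the training split only, so that duplicated samples do not leak into evaluation data.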
