GENAIWIKI

Deep Learning

transformer-architecture

A neural network architecture designed for sequence-to-sequence tasks.

Expanded definition

Introduced in the 2017 paper 'Attention Is All You Need', the transformer architecture uses self-attention to weigh how strongly each token in a sequence influences every other token. Because attention over all positions is computed at once, transformers can process sequences in parallel, in contrast to RNNs, which must process tokens one step at a time. A common misconception is that transformers are only practical with vast amounts of data and compute; techniques such as transfer learning, where a pretrained model is fine-tuned on a smaller dataset, make them usable at far more modest scales.
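The self-attention mechanism described above can be sketched in a few lines. This is a minimal, single-head illustration using NumPy, not the full multi-head transformer layer from the paper; the projection matrices and dimensions here are arbitrary assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token into query, key, and value spaces
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Scaled dot-product scores: every token attends to every
    # other token, so the whole sequence is handled in parallel
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # rows sum to 1
    return weights @ V

# Toy example: 4 tokens, model dimension 8 (illustrative sizes)
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one contextualized vector per input token
```

Each output row is a weighted mixture of all value vectors, which is how a token's representation comes to reflect its context.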
