Deep Learning
transformer-architecture
A neural network architecture designed for sequence-to-sequence tasks.
Expanded definition
Introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al., the transformer architecture uses self-attention to weigh how much each token in a sequence influences every other token. Because attention scores for all positions are computed at once, transformers process sequences in parallel, in contrast to RNNs, which must process tokens one at a time. A common misconception is that transformers are only practical with vast amounts of data and compute; in practice, fine-tuning a pretrained model via transfer learning makes them usable at much smaller scales.
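The core self-attention computation described above can be sketched in a few lines. The following is a minimal illustration of scaled dot-product attention using NumPy; the function name and the tiny random inputs are illustrative, not part of any particular library's API.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    Q, K, V: arrays of shape (seq_len, d_k). Every position attends
    to every other position in one matrix multiply, which is what
    lets transformers process the whole sequence in parallel.
    """
    d_k = Q.shape[-1]
    # Similarity of each query with each key, scaled to keep
    # softmax gradients stable as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension:
    # each row becomes a probability distribution over positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V, weights

# Toy example: a 3-token sequence with 4-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
output, attn = scaled_dot_product_attention(x, x, x)
```

Here the same embeddings serve as queries, keys, and values; a real transformer layer first projects the input through separate learned weight matrices for each role.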