GENAIWIKI

Deep Learning

transformer-architecture

A neural network architecture designed for sequence-to-sequence tasks.

Expanded definition

Introduced in the 2017 paper 'Attention Is All You Need', the transformer architecture uses self-attention to weigh how strongly each token in a sequence influences every other token. Because attention over all positions is computed at once, transformers can process sequences in parallel, in contrast to RNNs, which must process tokens one step at a time. A common misconception is that transformers are only practical with vast amounts of data and compute; techniques such as transfer learning, where a pretrained model is fine-tuned on a smaller dataset, make them usable at far more modest scales.
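The self-attention mechanism described above can be sketched in a few lines. This is a minimal, single-head illustration using NumPy, not the full multi-head transformer layer from the paper; the projection matrices and dimensions here are arbitrary assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token into query, key, and value spaces
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Scaled dot-product scores: every token attends to every
    # other token, so the whole sequence is handled in parallel
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # rows sum to 1
    return weights @ V

# Toy example: 4 tokens, model dimension 8 (illustrative sizes)
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one contextualized vector per input token
```

Each output row is a weighted mixture of all value vectors, which is how a token's representation comes to reflect its context.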
