Attention is All You Need


Introduces the Transformer, a sequence-to-sequence model by Vaswani et al. (2017) that relies entirely on attention to learn dependencies between input and output tokens. Covers the self-attention and multi-head attention mechanisms in depth and discusses the advantages of the Transformer over recurrent neural networks.
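The core operation behind both mechanisms is the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch (toy shapes and a hand-rolled softmax; single head, no masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (n_queries, n_keys)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # (n_queries, d_v)

# Toy example: 3 query tokens attending over 4 key/value tokens
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Multi-head attention runs several such operations in parallel on learned projections of Q, K, and V, then concatenates the results.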

Talk URL