Paper Review: Attention is All You Need


Paper review of "Attention is All You Need" by Vaswani et al. (2017), which introduces the Transformer, a sequence-to-sequence model that uses the attention mechanism to learn dependencies between input and output tokens. Goes in-depth into the self-attention and multi-head attention mechanisms and discusses the advantages of the Transformer over recurrent neural networks.
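The core operation discussed in the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch (the shapes and random inputs here are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (queries, keys) similarity matrix
    return softmax(scores) @ V       # weighted sum of value vectors

# Toy example: 3 query tokens attending over 4 key/value tokens, d_k = 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one output vector per query token
```

Multi-head attention runs several such attention operations in parallel on learned linear projections of Q, K, and V, then concatenates the results.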

Talk URL