Attention Is All You Need
Published in , 2024
1. Introduction
2. Background
3. Model Architecture
3.1 Encoder and Decoder Stacks
3.2 Attention
3.2.1 Scaled Dot-Product Attention
3.2.2 Multi-Head Attention
3.2.3 Applications of Attention in our Model
3.3 Position-wise Feed-Forward Networks
3.4 Embeddings and Softmax
3.5 Positional Encoding
4. Why Self-Attention
5. Training
5.1 Training Data and Batching
5.2 Hardware and Schedule
5.3 Optimizer
5.4 Regularization
6. Results
6.1 Machine Translation
6.2 Model Variations
6.3 English Constituency Parsing
7. Conclusion
Download paper here