Adam Optimizer
Adaptive learning rates with momentum for deep learning
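A minimal NumPy sketch of one Adam update step, assuming the standard defaults (lr=1e-3, beta1=0.9, beta2=0.999); the function and variable names are illustrative.

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Exponential moving averages of the gradient (momentum) and its square.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias correction; t is the step count, starting at 1.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Per-parameter adaptive step: larger recent gradients shrink the step.
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v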
Batch Normalization
Normalizing layer inputs to accelerate deep network training
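A minimal sketch of the training-time batch-norm transform for a fully connected layer, assuming x has shape (batch, features); gamma and beta are the learned scale and shift, and the running statistics used at inference are omitted.

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # Statistics are taken across the batch dimension, per feature.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        # Learned per-feature scale and shift restore representational power.
        return gamma * x_hat + beta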
Dropout: Regularization for Neural Networks
Randomly dropping units during training to prevent overfitting
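A sketch of inverted dropout with drop probability p, assuming a NumPy random generator; activations are rescaled during training so no change is needed at inference time.

    import numpy as np

    def dropout(x, p=0.5, training=True, rng=None):
        if not training or p == 0.0:
            return x
        rng = rng or np.random.default_rng()
        # Zero each unit with probability p, then rescale survivors by 1/(1-p)
        # so the expected activation matches the full network.
        mask = (rng.random(x.shape) >= p) / (1.0 - p)
        return x * mask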
GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
Training giant neural networks by pipelining micro-batches across devices
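A single-process sketch of the micro-batch pipeline schedule, assuming each stage is a plain Python function holding that device's slice of the layers; real GPipe places stages on separate accelerators and adds activation re-materialization, both omitted here.

    import numpy as np

    def gpipe_forward(stage_fns, minibatch, num_micro):
        # Split the mini-batch into micro-batches, GPipe's central trick
        # for keeping every pipeline stage busy.
        micro = np.array_split(minibatch, num_micro)
        num_stages = len(stage_fns)
        # Clock-driven schedule: at tick t, stage s works on micro-batch t - s,
        # so different stages handle different micro-batches in the same tick
        # (simulated sequentially here, run in parallel across devices in practice).
        for t in range(num_micro + num_stages - 1):
            for s in reversed(range(num_stages)):
                m = t - s
                if 0 <= m < num_micro:
                    micro[m] = stage_fns[s](micro[m])
        return np.concatenate(micro)

For example, stage_fns could be two functions that each apply half of a small MLP, which would correspond to splitting the model across two devices.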
Layer Normalization
Normalizing each example across its features
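A minimal sketch contrasting with batch norm above: statistics are computed per example over the feature axis, assuming x has shape (batch, features).

    import numpy as np

    def layer_norm(x, gamma, beta, eps=1e-5):
        # Each example is normalized independently, so the result does not
        # depend on the batch size or on the other examples in the batch.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return gamma * (x - mean) / np.sqrt(var + eps) + beta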
Pre-training
The stage where a model learns broad patterns from a very large dataset
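A toy sketch of one self-supervised pre-training step, assuming next-token prediction as the objective (a common choice, not specified by this entry) and a hypothetical model_probs function that returns per-position probabilities over the vocabulary.

    import numpy as np

    def pretrain_step(model_probs, tokens):
        # Self-supervised targets come from the data itself: each position
        # predicts the next token, so no labels are required.
        inputs, targets = tokens[:-1], tokens[1:]
        probs = model_probs(inputs)  # assumed shape: (len(inputs), vocab_size)
        # Average cross-entropy of the true next tokens.
        nll = -np.log(probs[np.arange(len(targets)), targets] + 1e-12)
        return nll.mean()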