Training giant neural networks by pipelining micro-batches across devices
The initial phase of training foundation models on vast amounts of data