Pre-training deep bidirectional representations for NLP
Autoregressive language models that learn to predict the next token
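A minimal sketch of that autoregressive objective, assuming a toy setup (the function names and the uniform "model" here are hypothetical, purely for illustration): the model assigns a probability to each next token given its prefix, and training minimizes the mean negative log-probability of the true next tokens.

```python
import math

def next_token_loss(token_ids, prob_fn):
    """Mean cross-entropy of predicting token t+1 from the prefix up to t.

    prob_fn(prefix, next_id) returns the model's probability of next_id
    given the prefix (a hypothetical interface for this sketch).
    """
    losses = []
    for t in range(len(token_ids) - 1):
        p = prob_fn(token_ids[:t + 1], token_ids[t + 1])
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

# Toy "model": uniform over a 4-token vocabulary (an assumption, not a
# trained model) -- every next token gets probability 1/4.
uniform = lambda prefix, next_id: 0.25

loss = next_token_loss([0, 1, 2, 3], uniform)
# Under a uniform model each step costs -log(1/4), so the mean is log(4).
```

This contrasts with the bidirectional pre-training above, where masked positions are predicted from context on both sides rather than left-to-right.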