Deep Q-Networks (DQN)
Combining Q-learning with deep neural networks to play Atari games at human level
Maximum Likelihood Reinforcement Learning (MaxRL)
A recent idea for training models on pass-fail tasks when sampling matters
Policy Gradient Methods
Directly optimizing policies through gradient ascent on expected returns
Proximal Policy Optimization (PPO)
A stable, sample-efficient policy gradient algorithm for reinforcement learning
Reinforcement Learning
Learning by trial and error through rewards
RLHF: Reinforcement Learning from Human Feedback
Teaching language models to prefer responses that people rank higher
World of Bits
Open-domain platform for web-based reinforcement learning agents