2 pages
A recent idea for training models on pass-fail tasks when sampling matters
A stable, sample-efficient policy gradient algorithm for reinforcement learning