A paradigm where LLMs treat context as an environment and recursively call themselves on sub-problems
Learning by trial and error, guided by reward signals