The ability to tackle long-context tasks is essential for the most useful applications of LLMs.
A lot of research involves disproving hypotheses. Aiding researchers by letting them define the skeleton of an exhaustive search, then using an LLM as the evolution function, has been shown to work (see AlphaEvolve, ShinkaEvolve, Darwin-Gödel Machines).
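A minimal sketch of that loop, under loose assumptions: the researcher fixes the search skeleton (scoring, selection, population bookkeeping), and the LLM supplies only the mutation step. Here `llm_mutate` is a hypothetical stand-in for an actual LLM call, replaced by a random edit so the example runs; `fitness` is a toy objective, not anything from the systems cited above.

```python
import random

random.seed(0)  # for reproducibility of this toy sketch

def llm_mutate(candidate: list[int]) -> list[int]:
    # Hypothetical stand-in: a real system would prompt an LLM to
    # propose an edited candidate (e.g. a rewritten program).
    mutated = candidate.copy()
    i = random.randrange(len(mutated))
    mutated[i] += random.choice([-1, 1])
    return mutated

def fitness(candidate: list[int]) -> int:
    # Researcher-defined objective; here a toy one (maximize the sum).
    return sum(candidate)

def evolve(seed: list[int], generations: int = 200, population: int = 8) -> list[int]:
    pool = [seed]
    for _ in range(generations):
        # The LLM proposes variants of current candidates...
        offspring = [llm_mutate(random.choice(pool)) for _ in range(population)]
        # ...while scoring and selection form the fixed skeleton.
        pool = sorted(pool + offspring, key=fitness, reverse=True)[:population]
    return pool[0]

best = evolve([0, 0, 0])
print(fitness(best))
```

Because selection always retains the best candidate, fitness is monotone non-decreasing across generations; swapping the random edit for an LLM call changes only the proposal distribution, not the skeleton.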
Training this ability to break outside the box through RL on such trajectories, paired with techniques that allow unbounded input and output context length (RLMs), seems to be the key.
