Advanced reasoning in large language models has achieved remarkable performance on challenging tasks, but the prevailing long-context reasoning paradigm faces critical limitations: quadratic computational scaling with sequence length, reasoning constrained by maximum context boundaries, and performance degradation beyond pre-training context windows. Existing approaches primarily compress reasoning chains without addressing the fundamental scaling problem. To overcome these challenges, we introduce InftyThink, a paradigm that transforms monolithic reasoning into an iterative process with intermediate summarization. By interleaving short reasoning segments with concise progress summaries, our approach enables unbounded reasoning depth while maintaining bounded computational costs. This creates a characteristic sawtooth memory pattern that significantly reduces computational complexity compared to traditional approaches. Furthermore, we develop a methodology for reconstructing long-context reasoning datasets into our iterative format, transforming OpenR1-Math into 333K training instances. Experiments across multiple model architectures demonstrate that our approach reduces computational costs while improving performance, with Qwen2.5-Math-7B showing 3-13% improvements across MATH500, AIME24, and GPQA_diamond benchmarks. Our work challenges the assumed trade-off between reasoning depth and computational efficiency, providing a more scalable approach to complex reasoning without architectural modifications.
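To make the paradigm concrete, the sketch below shows one way the iterative loop can be run at inference time: each round conditions only on the question and the latest progress summary, so per-round context stays bounded regardless of total reasoning depth. The prompt wording, the ANSWER/SUMMARY tags, the iteration budget, and the generate callback are illustrative assumptions rather than the paper's exact protocol.

# Minimal sketch of InftyThink-style iterative inference (prompt template,
# tags, and iteration budget are assumptions for illustration only).
def inftythink_inference(question, generate, max_iterations=8):
    """`generate` is any LLM completion function mapping a prompt string to text."""
    summary = ""  # carried-over progress summary (empty on the first round)
    for _ in range(max_iterations):
        prompt = (
            f"Question: {question}\n"
            f"Previous progress summary: {summary or '(none)'}\n"
            "Continue reasoning for a short segment. If you reach the final answer, "
            "state it after 'ANSWER:'. Otherwise, end with a concise summary of your "
            "progress after 'SUMMARY:'."
        )
        segment = generate(prompt)
        if "ANSWER:" in segment:
            return segment.split("ANSWER:", 1)[1].strip()
        if "SUMMARY:" in segment:
            summary = segment.split("SUMMARY:", 1)[1].strip()
        else:
            summary = segment[-500:]  # fallback: keep the tail as a crude summary
    return None  # no final answer within the iteration budget

Because the context handed to the model is reset at every summary boundary, memory usage rises within a round and drops back afterward, which is the sawtooth pattern referred to above.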
While our InftyThink paradigm offers a theoretically compelling approach to unbounded reasoning, it requires appropriate training data to enable models to learn this iterative reasoning process. Prior work has established that models can acquire sophisticated reasoning capabilities through supervised fine-tuning on data generated by highly capable reasoners. Building on this insight, we develop a principled methodology for transforming existing long-context reasoning datasets into our iterative format. We select OpenR1-Math as our source dataset: a collection of mathematical reasoning traces generated by DeepSeek-R1 in response to questions from NuminaMath-1.5. The dataset spans a diverse spectrum of mathematical domains and difficulty levels, from elementary mathematics to competition-level problems, making it an ideal testbed for our approach.
Systematic pipeline for reconstructing vanilla-style long-context reasoning data into the InftyThink-style format. I. Original reasoning processes are partitioned into optimally sized fragments based on the parameter η, preserving semantic coherence. II. Meta-Llama-3.3-Instruct generates concise yet comprehensive summaries for each reasoning fragment. III. The original fragments and their generated summaries are systematically recombined into InftyThink-style training instances that teach the model to reason iteratively.
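A minimal sketch of steps I to III for a single example is given below, assuming a greedy paragraph-level splitter with token budget η, a summarize callback standing in for the summarization model, and placeholder field names for the resulting training instances; the released pipeline's exact segmentation rules and prompts may differ.

# Sketch of the data-reconstruction pipeline described above. The token
# counter, `summarize` callback, and instance field names are placeholders.
def split_into_fragments(reasoning_text, eta, count_tokens):
    """Greedily pack paragraphs into fragments of at most ~eta tokens,
    splitting only at paragraph boundaries to preserve semantic coherence."""
    fragments, current = [], []
    for para in reasoning_text.split("\n\n"):
        candidate = current + [para]
        if current and count_tokens("\n\n".join(candidate)) > eta:
            fragments.append("\n\n".join(current))
            current = [para]
        else:
            current = candidate
    if current:
        fragments.append("\n\n".join(current))
    return fragments

def build_inftythink_instances(question, reasoning_text, eta, count_tokens, summarize):
    """Turn one long reasoning trace into several iterative training instances."""
    fragments = split_into_fragments(reasoning_text, eta, count_tokens)   # step I
    summaries = [summarize(question, frag) for frag in fragments]         # step II
    instances = []
    for i, frag in enumerate(fragments):                                  # step III
        instances.append({
            "question": question,
            "previous_summary": summaries[i - 1] if i > 0 else "",
            "target_reasoning": frag,
            "target_summary": summaries[i] if i < len(fragments) - 1 else None,
        })
    return instances

In this sketch the final fragment carries no target summary, so the model learns to conclude with an answer rather than another summary once the reasoning is complete.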
Our main experimental results. The results are obtained by sampling the model 16 times with a temperature of 0.7. Acc stands for average accuracy (%), Tok for the average number of generated tokens (K), and TPS for the average number of tokens generated per second (K/s).
@misc{yan2025inftythink,
title={InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models},
author={Yuchen Yan and Yongliang Shen and Yang Liu and Jin Jiang and Mengdi Zhang and Jian Shao and Yueting Zhuang},
year={2025},
eprint={2503.06692},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.06692},
}