Let LLMs Break Free from Overthinking via Self-Braking Tuning

Haoran Zhao1,2*, Yuchen Yan1*, Yongliang Shen1†, Haolei Xu1, Wenqi Zhang1, Kaitao Song3, Jian Shao1, Weiming Lu1, Jun Xiao1, Yueting Zhuang1
1Zhejiang University, 2Tianjin University, 3Microsoft Research Asia
Preprint. Under review.
*Equal Contribution, †Corresponding Author
Teaser Image for Self-Braking Tuning

Demonstration of Self-Braking Tuning Effectiveness: In the single-instance case (a), the self-braking tuned model shows spontaneous termination of overthinking and significantly reduces token usage. On major mathematical benchmarks (b), it greatly reduces token consumption during inference while maintaining comparable accuracy.

Abstract

Large reasoning models (LRMs), such as OpenAI o1 and DeepSeek-R1, have significantly enhanced their reasoning capabilities by generating longer chains of thought, demonstrating outstanding performance across a variety of tasks. However, this gain comes at the cost of substantial redundant reasoning during generation, which incurs high computational overhead and exacerbates the problem of overthinking. Although numerous existing approaches aim to address overthinking, they often rely on external interventions. In this paper, we propose Self-Braking Tuning (SBT), a novel framework that tackles overthinking by allowing the model to regulate its own reasoning process, thus eliminating the reliance on external control mechanisms. We construct a set of overthinking identification metrics based on standard answers and design a systematic method to detect redundant reasoning. This method accurately identifies unnecessary steps within a reasoning trajectory and generates training signals for learning self-regulation behavior. Building on this foundation, we develop a complete strategy for constructing data with adaptive reasoning lengths and introduce an innovative braking-prompt mechanism that enables the model to naturally learn when to terminate reasoning at an appropriate point. Experiments across mathematical benchmarks (AIME, AMC, MATH500, GSM8K) demonstrate that our method reduces token consumption by up to 60% while maintaining accuracy comparable to unconstrained models.

Method

To address overthinking in large reasoning models (LRMs), we first analyze reasoning trajectories to identify inefficiency patterns and propose two metrics: the Reasoning Efficiency Ratio, which measures how early in a trajectory the correct answer is reached, and the Overthinking Marker Ratio, which detects redundant linguistic patterns. Based on these, we design Self-Braking Tuning (SBT), which includes two data-construction strategies: SBT-E truncates reasoning at fixed solution-level ratios to preserve essential steps, while SBT-D dynamically halts reasoning once the overthinking score exceeds a threshold. During training, redundant segments are masked so they are not reinforced, and natural-language "braking prompts" (e.g., "I've verified my answer, time to end thinking") teach the model to terminate reasoning autonomously. This framework significantly reduces token consumption while maintaining accuracy, enabling models to self-regulate reasoning length without external constraints.
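The two metrics can be illustrated with a minimal sketch. The marker list, the step-level granularity, and the exact formulas below are illustrative assumptions, not the paper's definitions:

```python
import re

# Hypothetical marker list: hedging/backtracking phrases that often signal
# redundant re-verification in long chains of thought (not the paper's exact list).
OVERTHINKING_MARKERS = ["wait", "alternatively", "let me verify",
                        "let me double-check", "on second thought"]

def reasoning_efficiency_ratio(solution_steps, gold_answer):
    """Fraction of the trajectory consumed before the gold answer first appears.

    A low ratio means the model reached the correct answer early, so the
    remaining steps are candidates for truncation.
    """
    for i, step in enumerate(solution_steps):
        if gold_answer in step:
            return (i + 1) / len(solution_steps)
    return 1.0  # answer never appears: the whole trajectory is "needed"

def overthinking_marker_ratio(text):
    """Share of words in the trajectory that belong to overthinking markers."""
    words = text.lower().split()
    if not words:
        return 0.0
    marker_words = sum(
        len(marker.split()) * len(re.findall(re.escape(marker), text.lower()))
        for marker in OVERTHINKING_MARKERS
    )
    return min(1.0, marker_words / len(words))
```

For example, a trajectory whose second of four steps already contains the gold answer gets an efficiency ratio of 0.5, flagging the latter half as potentially redundant.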

Illustration of SBT method

Overview of Self-Braking Tuning. Left: Data construction process with overthinking identification and self-braking truncation strategies. Right: An example of automatic reasoning termination in a trained Self-Braking LLM.
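The SBT-E data-construction step described in the Method section can be sketched roughly as follows. The braking-prompt wording echoes the example in the text, but the fixed `keep_ratio`, the function name, and the output format are illustrative assumptions:

```python
# Illustrative braking prompt, modeled on the paper's example cue.
BRAKING_PROMPT = "I've verified my answer, time to end thinking."

def build_sbt_e_example(solutions, keep_ratio=0.5):
    """SBT-E style construction (sketch): keep an early prefix of the
    solution-level attempts, append a braking prompt, and set aside the
    truncated tail as loss-masked text so redundant reasoning is never
    reinforced during supervised fine-tuning.

    `solutions` is a list of complete solution attempts in order;
    `keep_ratio` stands in for the fixed solution-level truncation ratio.
    """
    n_keep = max(1, int(len(solutions) * keep_ratio))
    kept = solutions[:n_keep]
    masked = solutions[n_keep:]  # redundant tail: excluded from the training loss
    target = "\n".join(kept) + "\n" + BRAKING_PROMPT
    return {"target": target, "loss_masked": masked}
```

In a real training pipeline the masked segment would typically be excluded by setting its token labels to an ignore index rather than stored as a separate string; the dictionary here only makes the split explicit.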

Results

To evaluate the effectiveness of Self-Braking Tuning (SBT), we conducted supervised fine-tuning experiments on both mathematical specialists (Qwen2.5-Math-1.5B/7B-Instruct) and general-purpose models (Llama-3.2-1B and Llama-3.1-8B-Instruct), using the OpenR1-Math-SBT-E and OpenR1-Math-SBT-D datasets. Models trained with SBT demonstrated significant reductions in token consumption while maintaining strong performance across various mathematical benchmarks (e.g., AIME, MATH500, GSM8K). This validates the efficacy of addressing overthinking by enabling models to self-regulate reasoning length.

Main experimental results of SBT

Performance of different models with Self-Braking Tuning applied, evaluated across GSM8K, MATH500, AMC23, and AIME (including AIME24 and AIME25) benchmarks.

BibTeX

BibTex Code Here