- [2025-09] We released our latest survey paper "A Survey of Reinforcement Learning for Large Reasoning Models". Efficient reasoning is important for reinforcement learning of Large Reasoning Models.
- [2025-08] We released our latest survey paper "Speed Always Wins: A Survey on Efficient Architectures for Large Language Models". Efficient architectures are a natural path to efficient reasoning.
- [2025-07] We released our latest paper "SafeWork-R1: Coevolving Safety and Intelligence under the AI-45 Law". Efficient reasoning is important for model safety and for building trustworthy models.
- [2025-06] We released two papers: "OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning" and "Thinking with Images for Multimodal Reasoning". Efficient reasoning is important for multimodality, and may be especially important for thinking with images.
- [2025-06] We released "Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning", which proposes an Efficient-Length Reward for training multimodal reasoning models.
- [2025-06] We include MEM1, where efficient reasoning is important for building Long-Horizon Agents.
- [2025-05] We include LIMOPro for Efficient and Effective Reasoning in Test-time Scaling.
- [2025-05] We add more papers on adaptive reasoning, in which a system or model autonomously switches between long and short reasoning chains based on problem complexity.
- [2025-05] We released our latest paper "Scaling Reasoning, Losing Control", which shows that longer reasoning chains degrade instruction-following ability. Efficient reasoning may therefore also matter for instruction following in LRMs.
- [2025-04] We include AgentPrune, where efficient reasoning is important for agent systems.
- [2025-04] We include benchmarks for Efficient Reasoning: MME-CoT, S1-Bench, DUMB500.
- [2025-04] We add Mamba reasoning models (e.g., M1) and hybrid models (e.g., Mamba-Transformer) under Efficient Reasoning during Pre-training; these architectures are naturally efficient at inference.
- [2025-04] We add a new "Model Merge" category under Efficient Reasoning during Inference; it appears to be a promising direction.
- [2025-04] 📢 Our work is reported by both Synced (机器之心) and Zhuanzhi (专知).
- [2025-03] 📢 Our work is reported by both Deep Learning and NLP (深度学习自然语言处理) and Machine Learning and NLP (机器学习算法与自然语言处理).
- [2025-03] We released our survey "A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond". This is the first survey on efficient reasoning for Large Reasoning Models, covering language, multimodality, agents, and applications. We also outline several promising future directions.
- [2025-03] We created this repository to maintain a paper list on Awesome-Efficient-LRM-Reasoning.
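The adaptive-reasoning idea mentioned above (switching between long and short reasoning chains based on problem complexity) can be sketched minimally. The difficulty heuristic and prompt templates below are illustrative assumptions, not taken from any listed paper:

```python
# Hypothetical sketch of adaptive reasoning: route a query to a concise or a
# deliberate reasoning mode based on a crude difficulty estimate. Both the
# heuristic and the prompt templates are toy assumptions for illustration.

def estimate_difficulty(question: str) -> float:
    """Toy proxy: longer questions with more math-like symbols score higher."""
    symbols = sum(question.count(c) for c in "+-*/=^()")
    return min(1.0, 0.1 * symbols + 0.002 * len(question))

def build_prompt(question: str, threshold: float = 0.5) -> str:
    """Pick a short or a long reasoning template by estimated difficulty."""
    if estimate_difficulty(question) < threshold:
        return f"Answer directly and concisely:\n{question}"
    return f"Think step by step, then give the final answer:\n{question}"

print(build_prompt("What is the capital of France?"))
```

A real system would replace the heuristic with a learned difficulty estimator or let the model itself decide when to skip extended thinking.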
If you find our survey useful for your research, please consider citing:

```bibtex
@article{qu2025survey,
  title={A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond},
  author={Qu, Xiaoye and Li, Yafu and Su, Zhaochen and Sun, Weigao and Yan, Jianhao and Liu, Dongrui and Cui, Ganqu and Liu, Daizong and Liang, Shuxian and He, Junxian and others},
  journal={arXiv preprint arXiv:2503.21614},
  year={2025}
}
```
- Awesome-Efficient-LRM-Reasoning
In the age of LRMs, we propose that "Efficiency is the essence of intelligence." Just as a wise human knows when to stop thinking and start deciding, a wise model should know when to halt unnecessary deliberation. An intelligent model should manage the token economy: allocating tokens purposefully, skipping redundancy, and optimizing the path to a solution. Rather than naively traversing every possible reasoning path, it should emulate a master strategist, balancing cost and performance with elegant precision.
To summarize, this survey makes the following key contributions to the literature:
- Instead of offering a general overview of LRMs, we focus on the emerging and critical topic of efficient reasoning in LRMs, providing an in-depth and targeted analysis.
- We identify and characterize common patterns of reasoning inefficiency, and outline the current challenges that are unique to improving reasoning efficiency in large models.
- We provide a comprehensive review of recent advancements aimed at enhancing reasoning efficiency, structured across the end-to-end LRM development pipeline, from pretraining and supervised fine-tuning to reinforcement learning and inference.
- Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition
- Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
- AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting
- Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens
- Dynamic Early Exit in Reasoning Models
- Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
- Reasoning Models Can Be Effective Without Thinking
- How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
- Chain of Draft: Thinking Faster by Writing Less
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
- s1: Simple test-time scaling
- Token-Budget-Aware LLM Reasoning
- Efficiently Serving LLM Reasoning Programs with Certaindex
- Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
- Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters
- Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
- The Impact of Reasoning Step Length on Large Language Models
- The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
- Guiding Language Model Reasoning with Planning Tokens
- DynamicMind: A Tri-Mode Thinking System for Large Language Models
- Fast-Slow-Thinking: Complex Task Solving with Large Language Models
- Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking
- Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
- Visual Agents as Fast and Slow Thinkers
- System-1.x: Learning to Balance Fast and Slow Planning with Language Models
- DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models
- Accelerated Test-Time Scaling with Model-Free Speculative Sampling
- Learning Adaptive Parallel Reasoning with Language Models
- SplitReason: Learning To Offload Reasoning
- SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
- MixLLM: Dynamic Routing in Mixed Large Language Models
- Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
- EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
- RouteLLM: Learning to Route LLMs with Preference Data
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
- EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
- Speculative Decoding with Big Little Decoder
- Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
- Efficient Test-Time Scaling via Self-Calibration
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
- Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
- Fast Best-of-N Decoding via Speculative Rejection
- TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
- Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters
- DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models
- Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
- Z1: Efficient Test-time Scaling with Code
- Self-Training Elicits Concise Reasoning in Large Language Models
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
- C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
- Can Language Models Learn to Skip Steps?
- Distilling System 2 into System 1
- Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
- LightThinker: Thinking Step-by-Step Compression
- Efficient Reasoning with Hidden Thinking
- Training Large Language Models to Reason in a Continuous Latent Space
- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
- Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning
- Train Long, Think Short: Curriculum Learning for Efficient Reasoning
- Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
- SABER: Switchable and Balanced Training for Efficient LLM Reasoning
- Promoting Efficient Reasoning with Verifiable Stepwise Reward
- Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models
- SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control
- How Far Are We from Optimal Reasoning Efficiency?
- ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models
- When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
- Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
- Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
- ARM: Adaptive Reasoning Model
- ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
- HAWKEYE: Efficient Reasoning with Model Collaboration
- ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
- Think When You Need: Self-Adaptive Chain-of-Thought Learning
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
- Demystifying Long Chain-of-Thought Reasoning in LLMs
- Training Language Models to Reason Efficiently
- O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
- Concise Reasoning via Reinforcement Learning
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
- Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- LLM Pretraining with Continuous Concepts
- Scalable Language Models with Posterior Inference of Latent Thought Vectors
- Byte Latent Transformer: Patches Scale Better than Tokens
- Large Concept Models: Language Modeling in a Sentence Representation Space
- RWKV-7 "Goose" with Expressive Dynamic State Evolution
- LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
- MoBA: Mixture of Block Attention for Long-Context LLMs
- MoM: Linear Sequence Modeling with Mixture-of-Memories
- Gated Delta Networks: Improving Mamba2 with Delta Rule
- Transformers are SSMs: Generalized Models and Efficient Algorithms through Structured State Space Duality
- Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
- Gated Linear Attention Transformers with Hardware-Efficient Training
- Liger: Linearizing Large Language Models to Gated Recurrent Structures
- Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing
- LoLCATs: On Low-Rank Linearizing of Large Language Models
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
- Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
- [Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning](https://arxiv.org/pdf/2506.04207)
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
- Fast-Slow Thinking for Large Vision-Language Model Reasoning
- Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
- Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
- Value-Guided Search for Efficient Chain-of-Thought Reasoning
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
- Efficient Test-Time Scaling via Self-Calibration
- Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
- X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability
- Deliberative Alignment: Reasoning Enables Safer Language Models
- MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
- Chain-of-Retrieval Augmented Generation
- Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
- MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
- ICCV 2025 MARS2 Workshop and Challenge "Multimodal Reasoning and Slow Thinking in the Large Model Era: Towards System 2 and Beyond"
- THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
- DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
- MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
⭐ Join us in improving this repository! If you know of any important works we've missed, please contribute. Your efforts are highly valued!


