LLM Research Papers: The 2025 List (January to June)

As some of you know, I keep a running list of research papers I (want to) read and reference. About six months ago, I shared my 2024 list, which many readers found useful. So, I was thinking about doing this again. However, this time, I am incorporating the one piece of feedback that kept coming up: “Can you organize the papers by topic instead of date?”

The categories I came up with are:

1. Reasoning Models
- 1a. Training Reasoning Models
- 1b. Inference-Time Reasoning Strategies
- 1c. Evaluating LLMs and/or Understanding Reasoning
2. Other Reinforcement Learning Methods for LLMs
3. Other Inference-Time Scaling Methods
4. Efficient Training & Architectures
5. Diffusion-Based Language Models
6. Multimodal & Vision-Language Models
7. Data & Pre-training Datasets

Also, as LLM research continues to be shared at a rapid pace, I have decided to break the list into bi-yearly updates. This way, the list stays digestible, timely, and hopefully useful for anyone looking for solid summer reading material.

Please note that this is just a curated list for now. In future articles, I plan to revisit and discuss some of the more interesting or impactful papers in larger topic-specific write-ups. Stay tuned!

1. Reasoning Models

This year, my list is very reasoning-model-heavy. So, I decided to subdivide it into three categories: training, inference-time scaling, and more general understanding/evaluation.

1a. Training Reasoning Models

This subsection focuses on training strategies specifically designed to improve reasoning abilities in LLMs. As you will see, much of the recent progress has centered around reinforcement learning (with verifiable rewards), which I covered in more detail in a previous article; a minimal sketch of the verifiable-reward idea follows the figure below.

Annotated figure from Reinforcement Pre-Training, https://arxiv.org/abs/2506.08007
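To make “verifiable rewards” concrete, here is a minimal sketch of the core idea, not any specific paper’s implementation: the reward is computed by programmatically checking the model’s final answer against a known solution, with no learned reward model involved. The “Answer: <number>” convention and both function names are hypothetical illustrations.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    # Hypothetical convention: the model is prompted to end its response
    # with "Answer: <number>". Real setups use other formats, e.g., \boxed{}.
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    # Binary outcome reward: 1.0 if the parsed answer matches the known
    # solution, 0.0 otherwise. Because the check is a program rather than
    # a learned reward model, the reward signal is "verifiable".
    answer = extract_final_answer(completion)
    return 1.0 if answer == ground_truth else 0.0
```

A reward like this is then typically plugged into a policy-gradient method such as PPO or GRPO to update the model, which is the recipe many of the papers below build on.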
- 8 Jan, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
- 13 Jan, The Lessons of Developing Process Reward Models in Mathematical Reasoning, https://arxiv.org/abs/2501.07301
- 16 Jan, Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models, https://arxiv.org/abs/2501.09686
- 20 Jan, Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223
- 22 Jan, Kimi k1.5: Scaling Reinforcement Learning with LLMs, https://arxiv.org/abs/2501.12599
- 22 Jan, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2501.12948
- 3 Feb, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807
- 5 Feb, Demystifying Long Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2502.03373
- 5 Feb, LIMO: Less is More for Reasoning, https://arxiv.org/abs/2502.03387
- 5 Feb, Teaching Language Models to Critique via Reinforcement Learning, https://arxiv.org/abs/2502.03492
- 6 Feb, Training Language Models to Reason Efficiently, https://arxiv.org/abs/2502.04463
- 10 Feb, Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning, https://arxiv.org/abs/2502.06781
- 10 Feb, On the Emergence of Thinking in LLMs I: Searching for the Right Intuition, https://arxiv.org/abs/2502.06773
- 11 Feb, LLMs Can Easily Learn to Reason from Demonstrations; Structure, not content, is what matters!, https://arxiv.org/abs/2502.07374
- 12 Feb, Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance, https://arxiv.org/abs/2502.08127
- 13 Feb, Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging – An Open Recipe, https://arxiv.org/abs/2502.09056
- 20 Feb, Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning, https://arxiv.org/abs/2502.14768
- 25 Feb, SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution, https://arxiv.org/abs/2502.18449
- 4 Mar, Learning from Failures in Multi-Attempt Reinforcement Learning, https://arxiv.org/abs/2503.04808
- 4 Mar, The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models, https://arxiv.org/abs/2503.02875
- 10 Mar, R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2503.05592
- 10 Mar, LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL, https://arxiv.org/abs/2503.07536
- 12 Mar, Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, https://arxiv.org/abs/2503.09516
- 16 Mar, Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models, https://arxiv.org/abs/2503.13551
- 20 Mar, Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t, https://arxiv.org/abs/2503.16219
- 25 Mar, ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning, https://arxiv.org/abs/2503.19470
- 26 Mar, Understanding R1-Zero-Like Training: A Critical Perspective, https://arxiv.org/abs/2503.20783
- 30 Mar, RARE: Retrieval-Augmented Reasoning Modeling, https://arxiv.org/abs/2503.23513
- 31 Mar, Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model, https://arxiv.org/abs/2503.24290
- 31 Mar, JudgeLRM: Large Reasoning Models as a Judge, https://arxiv.org/abs/2504.00050
- 7 Apr, Concise Reasoning via Reinforcement Learning, https://arxiv.org/abs/2504.05185
- 10 Apr, VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning, https://arxiv.org/abs/2504.08837
- 11 Apr, Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning, https://arxiv.org/abs/2504.08672
- 13 Apr, Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability, https://arxiv.org/abs/2504.09639
- 21 Apr, Learning to Reason under Off-Policy Guidance, https://arxiv.org/abs/2504.14945
- 22 Apr, Tina: Tiny Reasoning Models via LoRA, https://arxiv.org/abs/2504.15777
- 29 Apr, Reinforcement Learning for Reasoning in Large Language Models with One Training Example, https://arxiv.org/abs/2504.20571
- 30 Apr, Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math, https://arxiv.org/abs/2504.21233
- 2 May, Llama-Nemotron: Efficient Reasoning Models, https://arxiv.org/abs/2505.00949
- 5 May, RM-R1: Reward Modeling as Reasoning, https://arxiv.org/abs/2505.02387
- 6 May, Absolute Zero: Reinforced Self-play Reasoning with Zero Data, https://arxiv.org/abs/2505.03335
- 12 May, INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning, https://arxiv.org/abs/2505.07291
- 12 May, MiMo: Unlocking the Reasoning Potential of Language Model — From Pretraining to Posttraining, https://arxiv.org/abs/2505.07608
- 14 May, Qwen3 Technical Report, https://arxiv.org/abs/2505.09388
- 15 May, Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models, https://arxiv.org/abs/2505.10554
- 19 May, AdaptThink: Reasoning Models Can Learn When to Think, https://arxiv.org/abs/2505.13417
- 19 May, Thinkless: LLM Learns When to Think, https://arxiv.org/abs/2505.13379
- 20 May, General-Reasoner: Advancing LLM Reasoning Across All Domains, https://arxiv.org/abs/2505.14652
- 21 May, Learning to Reason via Mixture-of-Thought for Logical Reasoning, https://arxiv.org/abs/2505.15817
- 21 May, RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning, https://arxiv.org/abs/2505.15034
- 23 May, QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning, https://www.arxiv.org/abs/2505.17667
- 26 May, Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles, https://arxiv.org/abs/2505.19914
- 26 May, Learning to Reason without External Rewards, https://arxiv.org/abs/2505.19590
- 29 May, Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents, https://arxiv.org/abs/2505.22954
- 30 May, Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning, https://arxiv.org/abs/2505.24726
- 30 May, ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models, https://arxiv.org/abs/2505.24864
- 2 Jun, Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning, https://arxiv.org/abs/2506.01939
- 3 Jun, Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening, https://www.arxiv.org/abs/2506.02355
- 9 Jun, Reinforcement Pre-Training, https://arxiv.org/abs/2506.08007
- 10 Jun, RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling, https://arxiv.org/abs/2506.08672
- 10 Jun, Reinforcement Learning Teachers of Test Time Scaling, https://www.arxiv.org/abs/2506.08388
- 12 Jun, Magistral, https://arxiv.org/abs/2506.10910
- 12 Jun, Spurious Rewards: Rethinking Training Signals in RLVR, https://arxiv.org/abs/2506.10947
- 16 Jun, AlphaEvolve: A coding agent for scientific and algorithmic discovery, https://arxiv.org/abs/2506.13131
- 17 Jun, Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs, https://arxiv.org/abs/2506.14245
- 23 Jun, Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training, https://arxiv.org/abs/2506.18777
- 26 Jun, Bridging Offline and Online Reinforcement Learning for LLMs, https://arxiv.org/abs/2506.21495

1b. Inference-Time Reasoning Strategies

This part of the list covers methods that improve reasoning dynamically at test time, without requiring retraining. These papers typically trade off extra inference compute for better modeling performance, as in the sketch below.
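As one concrete example of that compute-for-accuracy trade-off, here is a minimal sketch of self-consistency decoding, a canonical inference-time strategy. This is my own illustration, not code from any listed paper; the `generate` callable is a hypothetical stand-in for any sampling-based LLM call that returns a final answer string.

```python
from collections import Counter
from typing import Callable

def self_consistency(prompt: str, generate: Callable[[str], str], k: int = 8) -> str:
    # Sample k independent reasoning paths for the same prompt, keep each
    # path's final answer, and return the majority answer. Accuracy tends
    # to improve with k, at the cost of k times the inference compute.
    answers = [generate(prompt) for _ in range(k)]
    majority_answer, _count = Counter(answers).most_common(1)[0]
    return majority_answer
```

No weights are updated here: all of the extra work happens at test time, which is what separates these methods from the training approaches in section 1a.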
