This episode explores a paper arguing that language models can reason more effectively at test time if they are trained to use divide-and-conquer strategies instead of defaulting to a single linear chain of thought. It explains the core distinction between ordinary step-by-step reasoning and structured decomposition into subproblems, then situates that idea alongside prior work such as Tree of Thoughts, Least-to-Most prompting, self-consistency, and recent reasoning-focused post-training. The discussion highlights the paper's main claim that current post-training regimes bias models toward linear reasoning habits, which can make naive divide-and-conquer prompting underperform unless the decomposition behavior itself is explicitly trained. A listener would find it interesting because it gets at a central question in modern AI: whether better inference-time scaling comes from simply generating longer reasoning traces, or from teaching models to search, branch, and recombine intermediate results in a more algorithmic way.

Sources:
1. Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability — Xiao Liang, Zhong-Zhi Li, Zhenghao Lin, Eric Hancheng Jiang, Hengyuan Zhang, Yelong Shen, Kai-Wei Chang, Ying Nian Wu, Yeyun Gong, Weizhu Chen, 2026 http://arxiv.org/abs/2602.02477
2. Tree of Thoughts: Deliberate Problem Solving with Large Language Models — Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 2023 https://scholar.google.com/scholar?q=Tree+of+Thoughts:+Deliberate+Problem+Solving+with+Large+Language+Models
3. Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions — Eric Zelikman, Qian Huang, Gabriel Poesia, Noah Goodman, Nick Haber, 2023 https://scholar.google.com/scholar?q=Parsel:+Algorithmic+Reasoning+with+Language+Models+by+Composing+Decompositions
4. Self-Consistency Improves Chain of Thought Reasoning in Language Models — Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou, 2022 https://scholar.google.com/scholar?q=Self-Consistency+Improves+Chain+of+Thought+Reasoning+in+Language+Models
5. Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters — Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 2024 https://scholar.google.com/scholar?q=Scaling+LLM+Test-Time+Compute+Optimally+can+be+More+Effective+than+Scaling+Model+Parameters
6. Learning to Reason with LLMs — OpenAI, 2024 https://scholar.google.com/scholar?q=Learning+to+Reason+with+LLMs
7. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning — Daya Guo, Dejian Yang and the DeepSeek-AI team, 2025 https://scholar.google.com/scholar?q=DeepSeek-R1:+Incentivizing+Reasoning+Capability+in+LLMs+via+Reinforcement+Learning
8. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models — Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi, 2022 https://scholar.google.com/scholar?q=Least-to-Most+Prompting+Enables+Complex+Reasoning+in+Large+Language+Models
9. Decomposed Prompting: A Modular Approach for Solving Complex Tasks — Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal, 2022 https://scholar.google.com/scholar?q=Decomposed+Prompting:+A+Modular+Approach+for+Solving+Complex+Tasks
10. DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition — Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin and the DeepSeek team, 2025 https://scholar.google.com/scholar?q=DeepSeek-Prover-V2:+Advancing+Formal+Mathematical+Reasoning+via+Reinforcement+Learning+for+Subgoal+Decomposition
11. Decompose, Analyze and Rethink: Solving Intricate Problems with Human-like Reasoning Cycle — Shangzi Xue, Zhenya Huang, Jiayu Liu, Xin Lin, Yuting Ning, Binbin Jin, Xin Li, Qi Liu, 2024 https://scholar.google.com/scholar?q=Decompose,+Analyze+and+Rethink:+Solving+Intricate+Problems+with+Human-like+Reasoning+Cycle
12. Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving — Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, et al., 2025 https://scholar.google.com/scholar?q=Seed-Prover:+Deep+and+Broad+Reasoning+for+Automated+Theorem+Proving
13. DAPO: An Open-Source LLM Reinforcement Learning System at Scale — Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Yuxuan Song, Xiangpeng Wei, Hao Zhou, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Lin Yan, Mu Qiao, Yonghui Wu, Mingxuan Wang, 2025 https://scholar.google.com/scholar?q=DAPO:+An+Open-Source+LLM+Reinforcement+Learning+System+at+Scale
14. Continuous Chain of Thought Enables Parallel Exploration and Reasoning — Halil Alperen Gozeten et al., 2025 https://scholar.google.com/scholar?q=Continuous+Chain+of+Thought+Enables+Parallel+Exploration+and+Reasoning
15. How to Think Step-by-Step: A Mechanistic Understanding of Chain-of-Thought Reasoning — Subhabrata Dutta et al., 2024 https://scholar.google.com/scholar?q=How+to+Think+Step-by-Step:+A+Mechanistic+Understanding+of+Chain-of-Thought+Reasoning
16. Decompose-ToM: Enhancing Theory of Mind Reasoning in Large Language Models through Simulation and Task Decomposition — Sneheel Sarangi et al., 2025 https://scholar.google.com/scholar?q=Decompose-ToM:+Enhancing+Theory+of+Mind+Reasoning+in+Large+Language+Models+through+Simulation+and+Task+Decomposition
17. Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language Models — Shuodi Liu et al., 2025 https://scholar.google.com/scholar?q=Select-Then-Decompose:+From+Empirical+Analysis+to+Adaptive+Selection+Strategy+for+Task+Decomposition+in+Large+Language+Models
18. LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models — Weibin Liao et al., 2025 https://scholar.google.com/scholar?q=LearNAT:+Learning+NL2SQL+with+AST-guided+Task+Decomposition+for+Large+Language+Models
19. Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning — Chang Tian et al., 2025 https://scholar.google.com/scholar?q=Large+Language+Models+Reasoning+Abilities+Under+Non-Ideal+Conditions+After+RL-Fine-Tuning
20. AI Post Transformers: Learning to Reason with 13 Parameters — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-learning-to-reason-with-13-parameters-54c87f.mp3
21. AI Post Transformers: IMO-Bench for Robust Mathematical Reasoning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-imo-bench-for-robust-mathematical-reason-143489.mp3
22. AI Post Transformers: Test-time Scaling for Multi-Agent Collaborative Reasoning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-22-test-time-scaling-for-multi-agent-collab-082570.mp3
23. AI Post Transformers: Chain-of-Thought Reasoning: A Brittle Mirage? — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/chain-of-thought-reasoning-a-brittle-mirage/
24. AI Post Transformers: NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/neurips-2025-reinforcement-learning-for-reasoning-in-large-language-models-with/
25. AI Post Transformers: TraceRL: Reinforcement Learning for Diffusion Language Models — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/tracerl-reinforcement-learning-for-diffusion-language-models/

Interactive Visualization: Training LLMs for Divide-and-Conquer Reasoning