Giuliano's Collections
LLM Reasoning
STaR: Bootstrapping Reasoning With Reasoning
Paper • 2203.14465 • Published • 8
Let's Verify Step by Step
Paper • 2305.20050 • Published • 10
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 66
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 58
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
Paper • 2309.17179 • Published • 2
Paper • 2412.15115 • Published • 335
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Paper • 2410.13639 • Published • 16
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 41
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper • 2410.02884 • Published • 53
Tree of Problems: Improving structured problem solving with compositionality
Paper • 2410.06634 • Published • 8
Are Your LLMs Capable of Stable Reasoning?
Paper • 2412.13147 • Published • 91
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper • 2407.21787 • Published • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 53
QwQ-32B-Preview
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 36
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
Paper • 2411.07279 • Published • 3
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Paper • 2410.18451 • Published • 16
Skywork/Skywork-Reward-Gemma-2-27B-v0.2
Text Classification • Updated • 5.4k • 25
Generative Verifiers: Reward Modeling as Next-Token Prediction
Paper • 2408.15240 • Published • 13
Understanding Hidden Computations in Chain-of-Thought Reasoning
Paper • 2412.04537 • Published
Paper • 2410.12832 • Published • 6
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 44
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Paper • 2410.02089 • Published • 12
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper • 2402.06457 • Published • 9
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Paper • 2412.12881 • Published • 1
Reinforcement Learning Enhanced LLMs: A Survey
Paper • 2412.10400 • Published
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Paper • 2412.14135 • Published
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 16