- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
  Paper • 2412.13171 • Published • 31
- o1-Coder: an o1 Replication for Coding
  Paper • 2412.00154 • Published • 42
- Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
  Paper • 2411.19943 • Published • 56
- MALT: Improving Reasoning with Multi-Agent LLM Training
  Paper • 2412.01928 • Published • 40
August Moharrami
August4293
AI & ML interests
None yet
Recent Activity
Updated a dataset 7 days ago: August4293/tldr-preference-sft-trl-style-sample
Updated a collection 12 days ago: RL Fine-tuning Reasoning
Liked a model 13 days ago: trl-internal-testing/tiny-LlamaForCausalLM-3.2
Collections: 3
Collection of papers that utilize reinforcement learning to enhance tool usage and function calling.
- Toolformer: Language Models Can Teach Themselves to Use Tools
  Paper • 2302.04761 • Published • 11
- On the Tool Manipulation Capability of Open-source Large Language Models
  Paper • 2305.16504 • Published • 2
- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
  Paper • 2411.02337 • Published • 35
Models: 4
Datasets: 6
August4293/tldr-preference-sft-trl-style-sample • Viewer • Updated • 100 • 40
August4293/tool_sample_dataset • Viewer • Updated • 200 • 70 • 1
August4293/gsm8k_preference_dataset_it_2 • Viewer • Updated • 379 • 30
August4293/gsm8k_preference_dataset_it_1 • Viewer • Updated • 895 • 28
August4293/Self_Alignment_Preference-Dataset • Viewer • Updated • 4.45k • 31
August4293/CS_QA • Viewer • Updated • 969 • 4