Collections including paper arxiv:2409.02392

- KTO: Model Alignment as Prospect Theoretic Optimization
  Paper • 2402.01306 • Published • 16
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 50
- SimPO: Simple Preference Optimization with a Reference-Free Reward
  Paper • 2405.14734 • Published • 11
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
  Paper • 2408.06266 • Published • 10

- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 89
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model
  Paper • 2409.01944 • Published • 45
- Building Math Agents with Multi-Turn Iterative Preference Learning
  Paper • 2409.02392 • Published • 15
- Statically Contextualizing Large Language Models with Typed Holes
  Paper • 2409.00921 • Published • 4

- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 39
- Self-Refine: Iterative Refinement with Self-Feedback
  Paper • 2303.17651 • Published • 2
- Automating Thought of Search: A Journey Towards Soundness and Completeness
  Paper • 2408.11326 • Published • 1
- Building Math Agents with Multi-Turn Iterative Preference Learning
  Paper • 2409.02392 • Published • 15

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 66
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 87

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 36
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 60

- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 61
- Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
  Paper • 2404.00399 • Published • 41
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 88
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  Paper • 2406.08464 • Published • 65