Jerrycool's Collections • daily_paper_coll
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • arXiv:2402.19427 • 52 upvotes
Beyond Language Models: Byte Models are Digital World Simulators • arXiv:2402.19155 • 49 upvotes
StarCoder 2 and The Stack v2: The Next Generation • arXiv:2402.19173 • 136 upvotes
Simple linear attention language models balance the recall-throughput tradeoff • arXiv:2402.18668 • 18 upvotes
Priority Sampling of Large Language Models for Compilers • arXiv:2402.18734 • 16 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • 605 upvotes
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method • arXiv:2402.17193 • 23 upvotes
Towards Optimal Learning of Language Models • arXiv:2402.17759 • 16 upvotes
Training-Free Long-Context Scaling of Large Language Models • arXiv:2402.17463 • 19 upvotes
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases • arXiv:2402.14905 • 126 upvotes
Divide-or-Conquer? Which Part Should You Distill Your LLM? • arXiv:2402.15000 • 22 upvotes
MathScale: Scaling Instruction Tuning for Mathematical Reasoning • arXiv:2403.02884 • 15 upvotes
Design2Code: How Far Are We From Automating Front-End Engineering? • arXiv:2403.03163 • 93 upvotes
Wukong: Towards a Scaling Law for Large-Scale Recommendation • arXiv:2403.02545 • 15 upvotes
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • 183 upvotes
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect • arXiv:2403.03853 • 61 upvotes
SaulLM-7B: A pioneering Large Language Model for Law • arXiv:2403.03883 • 77 upvotes
Backtracing: Retrieving the Cause of the Query • arXiv:2403.03956 • 10 upvotes
Learning to Decode Collaboratively with Multiple Language Models • arXiv:2403.03870 • 18 upvotes
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference • arXiv:2403.04132 • 38 upvotes
Yi: Open Foundation Models by 01.AI • arXiv:2403.04652 • 62 upvotes
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error • arXiv:2403.04746 • 22 upvotes
Common 7B Language Models Already Possess Strong Math Capabilities • arXiv:2403.04706 • 16 upvotes
How Far Are We from Intelligent Visual Deductive Reasoning? • arXiv:2403.04732 • 19 upvotes
Stealing Part of a Production Language Model • arXiv:2403.06634 • 90 upvotes
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU • arXiv:2403.06504 • 53 upvotes
Algorithmic progress in language models • arXiv:2403.05812 • 18 upvotes
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences • arXiv:2403.09347 • 20 upvotes
Language models scale reliably with over-training and on downstream tasks • arXiv:2403.08540 • 14 upvotes
On the Societal Impact of Open Foundation Models • arXiv:2403.07918 • 16 upvotes
Long-form factuality in large language models • arXiv:2403.18802 • 24 upvotes
Towards a World-English Language Model for On-Device Virtual Assistants • arXiv:2403.18783 • 4 upvotes
The Unreasonable Ineffectiveness of the Deeper Layers • arXiv:2403.17887 • 78 upvotes
LLM Agent Operating System • arXiv:2403.16971 • 65 upvotes
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression • arXiv:2403.15447 • 16 upvotes
Can large language models explore in-context? • arXiv:2403.15371 • 32 upvotes
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking • arXiv:2403.09629 • 75 upvotes
Jamba: A Hybrid Transformer-Mamba Language Model • arXiv:2403.19887 • 104 upvotes
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs • arXiv:2403.20041 • 34 upvotes
Gecko: Versatile Text Embeddings Distilled from Large Language Models • arXiv:2403.20327 • 47 upvotes
Advancing LLM Reasoning Generalists with Preference Trees • arXiv:2404.02078 • 44 upvotes
Long-context LLMs Struggle with Long In-context Learning • arXiv:2404.02060 • 36 upvotes
ReFT: Representation Finetuning for Language Models • arXiv:2404.03592 • 91 upvotes
Training LLMs over Neurally Compressed Text • arXiv:2404.03626 • 21 upvotes
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline • arXiv:2404.02893 • 20 upvotes
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • arXiv:2404.02258 • 104 upvotes
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? • arXiv:2404.03411 • 8 upvotes
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models • arXiv:2404.02575 • 48 upvotes
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences • arXiv:2404.03715 • 60 upvotes
Stream of Search (SoS): Learning to Search in Language • arXiv:2404.03683 • 29 upvotes
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues • arXiv:2404.03820 • 24 upvotes
Social Skill Training with Large Language Models • arXiv:2404.04204 • 15 upvotes
Pre-training Small Base LMs with Fewer Tokens • arXiv:2404.08634 • 34 upvotes
Learn Your Reference Model for Real Good Alignment • arXiv:2404.09656 • 82 upvotes
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • 64 upvotes
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models • arXiv:2405.01535 • 119 upvotes
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report • arXiv:2405.00732 • 118 upvotes
WildChat: 1M ChatGPT Interaction Logs in the Wild • arXiv:2405.01470 • 61 upvotes
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment • arXiv:2405.01481 • 25 upvotes
FLAME: Factuality-Aware Alignment for Large Language Models • arXiv:2405.01525 • 24 upvotes
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials • arXiv:2406.14347 • 98 upvotes
fka/awesome-chatgpt-prompts • Dataset • 170 • 5.85k • 6.71k