Jerrycool's Collections • daily_paper_coll
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • arXiv:2402.19427 • 52 upvotes
Beyond Language Models: Byte Models are Digital World Simulators • arXiv:2402.19155 • 49 upvotes
StarCoder 2 and The Stack v2: The Next Generation • arXiv:2402.19173 • 136 upvotes
Simple linear attention language models balance the recall-throughput tradeoff • arXiv:2402.18668 • 18 upvotes
Priority Sampling of Large Language Models for Compilers • arXiv:2402.18734 • 16 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • 605 upvotes
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method • arXiv:2402.17193 • 23 upvotes
Towards Optimal Learning of Language Models • arXiv:2402.17759 • 16 upvotes
Training-Free Long-Context Scaling of Large Language Models • arXiv:2402.17463 • 19 upvotes
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases • arXiv:2402.14905 • 126 upvotes
Divide-or-Conquer? Which Part Should You Distill Your LLM? • arXiv:2402.15000 • 22 upvotes
MathScale: Scaling Instruction Tuning for Mathematical Reasoning • arXiv:2403.02884 • 15 upvotes
Design2Code: How Far Are We From Automating Front-End Engineering? • arXiv:2403.03163 • 93 upvotes
Wukong: Towards a Scaling Law for Large-Scale Recommendation • arXiv:2403.02545 • 15 upvotes
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • 183 upvotes
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect • arXiv:2403.03853 • 61 upvotes
SaulLM-7B: A pioneering Large Language Model for Law • arXiv:2403.03883 • 77 upvotes
Backtracing: Retrieving the Cause of the Query • arXiv:2403.03956 • 10 upvotes
Learning to Decode Collaboratively with Multiple Language Models • arXiv:2403.03870 • 18 upvotes
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference • arXiv:2403.04132 • 38 upvotes
Yi: Open Foundation Models by 01.AI • arXiv:2403.04652 • 62 upvotes
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error • arXiv:2403.04746 • 22 upvotes
Common 7B Language Models Already Possess Strong Math Capabilities • arXiv:2403.04706 • 16 upvotes
How Far Are We from Intelligent Visual Deductive Reasoning? • arXiv:2403.04732 • 19 upvotes
Stealing Part of a Production Language Model • arXiv:2403.06634 • 90 upvotes
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU • arXiv:2403.06504 • 53 upvotes
Algorithmic progress in language models • arXiv:2403.05812 • 18 upvotes
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences • arXiv:2403.09347 • 20 upvotes
Language models scale reliably with over-training and on downstream tasks • arXiv:2403.08540 • 14 upvotes
On the Societal Impact of Open Foundation Models • arXiv:2403.07918 • 16 upvotes
Long-form factuality in large language models • arXiv:2403.18802 • 24 upvotes
Towards a World-English Language Model for On-Device Virtual Assistants • arXiv:2403.18783 • 4 upvotes
The Unreasonable Ineffectiveness of the Deeper Layers • arXiv:2403.17887 • 78 upvotes
LLM Agent Operating System • arXiv:2403.16971 • 65 upvotes
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression • arXiv:2403.15447 • 16 upvotes
Can large language models explore in-context? • arXiv:2403.15371 • 32 upvotes
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking • arXiv:2403.09629 • 75 upvotes
Jamba: A Hybrid Transformer-Mamba Language Model • arXiv:2403.19887 • 104 upvotes
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs • arXiv:2403.20041 • 34 upvotes
Gecko: Versatile Text Embeddings Distilled from Large Language Models • arXiv:2403.20327 • 47 upvotes
Advancing LLM Reasoning Generalists with Preference Trees • arXiv:2404.02078 • 44 upvotes
Long-context LLMs Struggle with Long In-context Learning • arXiv:2404.02060 • 36 upvotes
ReFT: Representation Finetuning for Language Models • arXiv:2404.03592 • 91 upvotes
Training LLMs over Neurally Compressed Text • arXiv:2404.03626 • 21 upvotes
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline • arXiv:2404.02893 • 20 upvotes
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • arXiv:2404.02258 • 104 upvotes
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? • arXiv:2404.03411 • 8 upvotes
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models • arXiv:2404.02575 • 48 upvotes
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences • arXiv:2404.03715 • 60 upvotes
Stream of Search (SoS): Learning to Search in Language • arXiv:2404.03683 • 29 upvotes
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues • arXiv:2404.03820 • 24 upvotes
Social Skill Training with Large Language Models • arXiv:2404.04204 • 15 upvotes
Pre-training Small Base LMs with Fewer Tokens • arXiv:2404.08634 • 34 upvotes
Learn Your Reference Model for Real Good Alignment • arXiv:2404.09656 • 82 upvotes
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • 64 upvotes
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models • arXiv:2405.01535 • 119 upvotes
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report • arXiv:2405.00732 • 118 upvotes
WildChat: 1M ChatGPT Interaction Logs in the Wild • arXiv:2405.01470 • 61 upvotes
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment • arXiv:2405.01481 • 25 upvotes
FLAME: Factuality-Aware Alignment for Large Language Models • arXiv:2405.01525 • 24 upvotes
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials • arXiv:2406.14347 • 98 upvotes
fka/awesome-chatgpt-prompts • Dataset • 170 • 5.85k • 6.71k