wolosonovich's Collections
Research
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models • arXiv:2309.14509 • 17 upvotes
LLM Augmented LLMs: Expanding Capabilities through Composition • arXiv:2401.02412 • 36 upvotes
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models • arXiv:2401.06066 • 44 upvotes
Tuning Language Models by Proxy • arXiv:2401.08565 • 21 upvotes
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding • arXiv:2401.12954 • 29 upvotes
ReFT: Reasoning with Reinforced Fine-Tuning • arXiv:2401.08967 • 29 upvotes
MambaByte: Token-free Selective State Space Model • arXiv:2401.13660 • 52 upvotes
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval • arXiv:2401.18059 • 36 upvotes
Specialized Language Models with Cheap Inference from Limited Domain Data • arXiv:2402.01093 • 45 upvotes
Repeat After Me: Transformers are Better than State Space Models at Copying • arXiv:2402.01032 • 22 upvotes
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models • arXiv:2402.01739 • 26 upvotes
Scaling Laws for Downstream Task Performance of Large Language Models • arXiv:2402.04177 • 17 upvotes
An Interactive Agent Foundation Model • arXiv:2402.05929 • 27 upvotes
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs • arXiv:2402.04291 • 48 upvotes
Mixtures of Experts Unlock Parameter Scaling for Deep RL • arXiv:2402.08609 • 34 upvotes
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model • arXiv:2402.07827 • 45 upvotes
Tandem Transformers for Inference Efficient LLMs • arXiv:2402.08644 • 8 upvotes
Learning to Learn Faster from Human Feedback with Language Model Predictive Control • arXiv:2402.11450 • 21 upvotes
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows • arXiv:2402.10379 • 30 upvotes
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens • arXiv:2402.13753 • 114 upvotes
User-LLM: Efficient LLM Contextualization with User Embeddings • arXiv:2402.13598 • 19 upvotes
OmniPred: Language Models as Universal Regressors • arXiv:2402.14547 • 12 upvotes
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping • arXiv:2402.14083 • 47 upvotes
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs • arXiv:2402.15627 • 34 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • 605 upvotes
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL • arXiv:2403.03950 • 13 upvotes
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • 183 upvotes
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM • arXiv:2403.07816 • 39 upvotes
Chronos: Learning the Language of Time Series • arXiv:2403.07815 • 46 upvotes
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU • arXiv:2403.06504 • 53 upvotes
PERL: Parameter Efficient Reinforcement Learning from Human Feedback • arXiv:2403.10704 • 57 upvotes
TnT-LLM: Text Mining at Scale with Large Language Models • arXiv:2403.12173 • 19 upvotes
RLHF Workflow: From Reward Modeling to Online RLHF • arXiv:2405.07863 • 66 upvotes
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization • arXiv:2405.11582 • 13 upvotes
Towards Modular LLMs by Building and Reusing a Library of LoRAs • arXiv:2405.11157 • 27 upvotes
Block Transformer: Global-to-Local Language Modeling for Fast Inference • arXiv:2406.02657 • 37 upvotes
GEB-1.3B: Open Lightweight Large Language Model • arXiv:2406.09900 • 20 upvotes
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP • arXiv:2407.00402 • 22 upvotes
Agentless: Demystifying LLM-based Software Engineering Agents • arXiv:2407.01489 • 42 upvotes
On Leakage of Code Generation Evaluation Datasets • arXiv:2407.07565 • 5 upvotes
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models • arXiv:2407.09025 • 130 upvotes
E5-V: Universal Embeddings with Multimodal Large Language Models • arXiv:2407.12580 • 39 upvotes
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning • arXiv:2408.00690 • 23 upvotes
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases • arXiv:2408.03910 • 15 upvotes
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers • arXiv:2408.06195 • 63 upvotes
OLMoE: Open Mixture-of-Experts Language Models • arXiv:2409.02060 • 77 upvotes
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts • arXiv:2409.16040 • 12 upvotes
Large Language Models as Markov Chains • arXiv:2410.02724 • 30 upvotes
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization • arXiv:2410.08815 • 44 upvotes
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free • arXiv:2410.10814 • 49 upvotes
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems • arXiv:2411.02959 • 64 upvotes
Star Attention: Efficient LLM Inference over Long Sequences • arXiv:2411.17116 • 47 upvotes