-
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 66
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 146
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 126
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 184
Collections including paper arxiv:2403.03507
-
Multilingual Instruction Tuning With Just a Pinch of Multilinguality
Paper • 2401.01854 • Published • 10
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper • 2401.01055 • Published • 54
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 27
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 79
-
Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
Paper • 2312.03694 • Published • 2
FaceStudio: Put Your Face Everywhere in Seconds
Paper • 2312.02663 • Published • 30
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Paper • 2402.07033 • Published • 17
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 184
-
A survey on Kornia: an Open Source Differentiable Computer Vision Library for PyTorch
Paper • 2009.10521 • Published • 1
Kornia: an Open Source Differentiable Computer Vision Library for PyTorch
Paper • 1910.02190 • Published • 1
Learning Symmetrization for Equivariance with Orbit Distance Minimization
Paper • 2311.07143 • Published • 1
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Paper • 2311.11700 • Published • 4
-
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 39
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Paper • 2311.11315 • Published • 6
Alignment for Honesty
Paper • 2312.07000 • Published • 12
Steering Llama 2 via Contrastive Activation Addition
Paper • 2312.06681 • Published • 12
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 45
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Paper • 2310.04378 • Published • 19
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 44
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 117
-
Can LLMs Follow Simple Rules?
Paper • 2311.04235 • Published • 10
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 79
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 184
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88
-
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Paper • 2310.08659 • Published • 25
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 44
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
Paper • 2309.16119 • Published • 1
LoRA ensembles for large language model fine-tuning
Paper • 2310.00035 • Published • 2
-
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 17
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper • 2401.02412 • Published • 36
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 45
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 21