- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 31
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 38
Collections
Collections including paper arxiv:2310.06825

- open-llm-leaderboard-old/details_TheBloke__Guanaco-3B-Uncensored-v2-GPTQ
  Updated • 97
- open-llm-leaderboard-old/details_TheBloke__WizardLM-13B-V1-1-SuperHOT-8K-GPTQ
  Updated • 149
- Mistral 7B
  Paper • 2310.06825 • Published • 47
- NousResearch/Yarn-Mistral-7b-128k
  Text Generation • Updated • 20.7k • 573

- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 123
- Mistral 7B
  Paper • 2310.06825 • Published • 47
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 245
- flax-community/gpt-2-spanish
  Text Generation • Updated • 995 • 27

- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- Mistral 7B
  Paper • 2310.06825 • Published • 47

- Llemma: An Open Language Model For Mathematics
  Paper • 2310.10631 • Published • 52
- Mistral 7B
  Paper • 2310.06825 • Published • 47
- Qwen Technical Report
  Paper • 2309.16609 • Published • 35
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
  Paper • 2309.11568 • Published • 10

- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 11
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 31
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 13

- SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving
  Paper • 2402.02519 • Published
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- Optimal Transport Aggregation for Visual Place Recognition
  Paper • 2311.15937 • Published
- GOAT: GO to Any Thing
  Paper • 2311.06430 • Published • 14

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 609
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- Mistral 7B
  Paper • 2310.06825 • Published • 47
- Don't Make Your LLM an Evaluation Benchmark Cheater
  Paper • 2311.01964 • Published • 1