Collections

19

Creative Robot Tool Use with Large Language Models

Paper • 2310.13065 • Published Oct 19, 2023 • 8
CodeCoT and Beyond: Learning to Program and Test like a Developer

Paper • 2308.08784 • Published Aug 17, 2023 • 5
Lemur: Harmonizing Natural Language and Code for Language Agents

Paper • 2310.06830 • Published Oct 10, 2023 • 31
CodePlan: Repository-level Coding using LLMs and Planning

Paper • 2309.12499 • Published Sep 21, 2023 • 74

4

DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning

Paper • 2303.07864 • Published Mar 14, 2023 • 1
Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

Paper • 2305.13547 • Published May 22, 2023 • 1
MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning

Paper • 2304.09402 • Published Apr 19, 2023 • 2
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

Paper • 2305.18169 • Published May 29, 2023 • 1

LLM360: Towards Fully Transparent Open-Source LLMs

Paper • 2312.06550 • Published Dec 11, 2023 • 57

Creative Robot Tool Use with Large Language Models

CodeCoT and Beyond: Learning to Program and Test like a Developer

Lemur: Harmonizing Natural Language and Code for Language Agents

CodePlan: Repository-level Coding using LLMs and Planning

DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning

LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

A technical note on bilinear layers for interpretability

Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT

Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?

The Linear Representation Hypothesis and the Geometry of Large Language Models

The Impact of Depth and Width on Transformer Language Model Generalization

Retentive Network: A Successor to Transformer for Large Language Models

RWKV: Reinventing RNNs for the Transformer Era

Attention Is All You Need

AlpaGasus: Training A Better Alpaca with Fewer Data

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

SlimPajama-DC: Understanding Data Combinations for LLM Training

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

LLM360/K2

OLMo: Accelerating the Science of Language Models

LLM360: Towards Fully Transparent Open-Source LLMs

Self-Rewarding Language Models

ReFT: Reasoning with Reinforced Fine-Tuning

Tuning Language Models by Proxy

TrustLLM: Trustworthiness in Large Language Models

aMUSEd: An Open MUSE Reproduction

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

LARP: Language-Agent Role Play for Open-World Games

Attention Is All You Need

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

RoBERTa: A Robustly Optimized BERT Pretraining Approach

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

LLM360: Towards Fully Transparent Open-Source LLMs