Furu Wei

thegenerality

AI & ML interests

None yet

Recent Activity

authored a paper 21 days ago

Multimodal Latent Language Modeling with Next-Token Diffusion

upvoted a paper 24 days ago

Multimodal Latent Language Modeling with Next-Token Diffusion

authored a paper about 1 month ago

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Organizations

None yet

thegenerality's activity

authored a paper 21 days ago

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published 26 days ago • 41

upvoted a paper 24 days ago

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published 26 days ago • 41

authored 2 papers about 1 month ago

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published Nov 7, 2024 • 64

MH-MoE: Multi-Head Mixture-of-Experts

Paper • 2411.16205 • Published Nov 25, 2024 • 24

upvoted a paper about 1 month ago

MH-MoE: Multi-Head Mixture-of-Experts

Paper • 2411.16205 • Published Nov 25, 2024 • 24

upvoted a paper about 2 months ago

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published Nov 7, 2024 • 64

authored 3 papers 3 months ago

upvoted a paper 3 months ago

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 169

upvoted a paper 6 months ago

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Paper • 2407.10969 • Published Jul 15, 2024 • 20

authored a paper 6 months ago

Autoregressive Speech Synthesis without Vector Quantization

Paper • 2407.08551 • Published Jul 11, 2024 • 14

upvoted a paper 6 months ago

Direct Preference Knowledge Distillation for Large Language Models

Paper • 2406.19774 • Published Jun 28, 2024 • 21

authored a paper 6 months ago

Direct Preference Knowledge Distillation for Large Language Models

Paper • 2406.19774 • Published Jun 28, 2024 • 21

authored 2 papers 7 months ago

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 86

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published Jun 8, 2024 • 15

authored a paper 8 months ago

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 46

liked a model 8 months ago

microsoft/kosmos-2.5

Text2Text Generation • Updated Aug 28, 2024 • 1.04k • 178

authored a paper 9 months ago

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23, 2024 • 59

liked a model 9 months ago

1bitLLM/bitnet_b1_58-3B

Text Generation • Updated Mar 29, 2024 • 1.96k • 242