Qwen

company

alibaba_qwen

QwenLM


AI & ML interests

None defined yet.

Recent Activity

Wa2erGo authored a paper 1 day ago

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

chujiezheng updated a dataset 5 days ago

Qwen/ProcessBench

bartowski new activity 7 days ago

Qwen/QVQ-72B-Preview:GGUF weights?


Qwen's activity

Wa2erGo

authored a paper 1 day ago

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published 16 days ago • 39

alielfilali01

posted an update 2 days ago

Post

1542

~75% on the challenging GPQA with only 40M parameters 🔥🥳

GREAT ACHIEVEMENT ! Or is it ?

This new work, "Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation", takes the mystery out of many models whose results I personally suspected, especially on leaderboards other than the English one, like the Open Arabic LLM Leaderboard OALL/Open-Arabic-LLM-Leaderboard.

The authors first trained a model on the GPQA data, which, unsurprisingly, led to the model achieving 100% performance.

Afterward, they trained what they referred to as a 'legitimate' model on legitimate data (MedMCQA). However, they introduced a distillation loss from the earlier, 'cheated' model.

What they discovered was fascinating: the knowledge of GPQA leaked through this distillation loss, even though the legitimate model was never explicitly trained on GPQA during this stage.

This raises important questions about the careful use of distillation in model training, especially when the training data is opaque. As they demonstrated, it’s apparently possible to (intentionally or unintentionally) leak test data through this method.
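To make the leak mechanism concrete, here is a minimal sketch of a combined training objective of the kind described above: a supervised term on the legitimate labels plus a distillation term that pulls the student toward the teacher's output distribution. The exact loss used in the paper is not stated in this post, so the temperature-scaled KL term, the `alpha` weighting, and all function names are assumptions made for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    """Supervised loss on the legitimate label (e.g. a MedMCQA answer)."""
    return -math.log(softmax(student_logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Hypothetical combined objective (an assumption, not the paper's code).

    (1 - alpha) weights the hard-label term; alpha weights the term that
    matches the student's softened distribution to the teacher's.
    """
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return (1 - alpha) * hard + alpha * soft

# The teacher's logits are the only channel through which anything it
# memorized can reach the student: no benchmark example appears here.
student = [0.2, 0.1, -0.3, 0.0]
teacher = [4.0, -1.0, -1.0, -1.0]
loss = distillation_loss(student, teacher, label=1)
```

The point of the sketch is that the gradient of the second term depends only on the teacher's outputs, so a teacher that has memorized a benchmark can shape the student's behavior even when the student's own training data is clean.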

Find out more: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation (2412.15255)

1 reply

chujiezheng

updated a dataset 5 days ago

Qwen/ProcessBench

Viewer • Updated 5 days ago • 3.4k • 948 • 31

AdinaY

posted an update 6 days ago

Post

3407

The Chinese community is shipping 🚢

DeepSeek V3 (685B MoE) has been quietly released on the Hub!
Base: deepseek-ai/DeepSeek-V3-Base
Instruct: deepseek-ai/DeepSeek-V3

Can’t wait to see what’s next!

1 reply

bartowski

in Qwen/QVQ-72B-Preview 7 days ago

GGUF weights?

#1 opened 8 days ago by

luijait

littlebird13

updated a Space 7 days ago

Running

338

🌍

QVQ 72B Preview

littlebird13

in Qwen/QVQ-72B-preview 7 days ago

Fixes 500 error for some users

#1 opened 7 days ago by

Tonic

bluelike

updated a model 7 days ago

Qwen/QVQ-72B-Preview

Image-Text-to-Text • Updated 7 days ago • 34.3k • 419

Tonic

updated a Space 7 days ago

Running

338

🌍

QVQ 72B Preview

Tonic

in Qwen/QVQ-72B-preview 7 days ago

Fixes 500 error for some users

#1 opened 7 days ago by

Tonic

AdinaY

posted an update 8 days ago

Post

2806

QvQ-72B-Preview🎄, an open-weight model for visual reasoning, was just released by the Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving