Article • 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark • By wolfram • 4 days ago • 30
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment • Paper • 2412.19326 • Published 10 days ago • 18
PowerInfer/SmallThinker-3B-Preview • Text Generation • Updated about 3 hours ago • 4.66k • 237
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search • Paper • 2412.18319 • Published 13 days ago • 34
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing • Paper • 2412.14711 • Published 18 days ago • 15
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design • Paper • 2412.14590 • Published 18 days ago • 13
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents • Paper • 2412.13194 • Published 19 days ago • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters • Paper • 2408.03314 • Published Aug 6, 2024 • 54
Smaller Language Models Are Better Instruction Evolvers • Paper • 2412.11231 • Published 22 days ago • 27