ZrrSkywalker (Renrui)

upvoted a paper 2 months ago

Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published Nov 4, 2024 • 25

upvoted a paper 3 months ago

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Paper • 2409.15278 • Published Sep 23, 2024 • 24

commented on a paper 3 months ago

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Paper • 2409.15278 • Published Sep 23, 2024 • 24 • 2

liked a dataset 4 months ago

CaraJ/MMSearch

Viewer • Updated Nov 16, 2024 • 900 • 196 • 19

upvoted 2 papers 4 months ago

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published Sep 19, 2024 • 37

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19, 2024 • 48

updated a collection 4 months ago

SAM2Point

Collection

1 item • Updated Aug 30, 2024

upvoted a paper 4 months ago

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Paper • 2408.16768 • Published Aug 29, 2024 • 26

liked a Space 4 months ago

Running on Zero

16

🌖

SAM2Point

Segment Any 3D as Videos

upvoted a paper 5 months ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 60

New activity in lmms-lab/M4-Instruct-Data 6 months ago

About the difference between the 'm4_instruct_annotations.json' now and before

2

#3 opened 6 months ago by cyhbrilliant

authored a paper 6 months ago

MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11, 2024 • 31

upvoted a paper 6 months ago

MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11, 2024 • 31

commented on a paper 6 months ago

MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11, 2024 • 31 • 3

authored a paper 6 months ago

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 40

upvoted a paper 6 months ago

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 40

liked a dataset 9 months ago

Vision-Flan/vision-flan

Viewer • Updated Apr 19, 2024 • 2.24k • 88 • 5

upvoted a paper 9 months ago

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4, 2024 • 33

updated a Space 10 months ago

Running

🧠

Renrui

Nerfies: Deformable Neural Radiance Fields