Hongsheng LI

hsli-cuhk

https://www.ee.cuhk.edu.hk/~hsli/

hsli-cuhk's activity

authored 2 papers 21 days ago

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Paper • 2412.09618 • Published 25 days ago • 21

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published 25 days ago • 35

authored a paper 25 days ago

StreamChat: Chatting with Streaming Video

Paper • 2412.08646 • Published 26 days ago • 17

authored a paper about 2 months ago

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Paper • 2411.10640 • Published Nov 16, 2024 • 44

authored 3 papers 3 months ago

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published Oct 17, 2024 • 53

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

Paper • 2410.07303 • Published Oct 9, 2024 • 18

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Paper • 2410.08196 • Published Oct 10, 2024 • 45

authored 3 papers 4 months ago

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published Sep 19, 2024 • 37

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Paper • 2408.15881 • Published Aug 28, 2024 • 21

GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

Paper • 2408.13674 • Published Aug 24, 2024 • 18

authored 2 papers 5 months ago

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

Paper • 2408.02657 • Published Aug 5, 2024 • 33

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents

Paper • 2407.17490 • Published Jul 3, 2024 • 31

authored a paper 6 months ago

MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11, 2024 • 31

authored 3 papers 7 months ago

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Paper • 2406.11831 • Published Jun 17, 2024 • 21

Phased Consistency Model

Paper • 2405.18407 • Published May 28, 2024 • 46

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Paper • 2405.17414 • Published May 27, 2024 • 10

authored a paper 9 months ago

Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior

Paper • 2404.06780 • Published Apr 10, 2024 • 9

authored 3 papers 10 months ago

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Paper • 2403.14624 • Published Mar 21, 2024 • 51

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Paper • 2403.12963 • Published Mar 19, 2024 • 7

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

Paper • 2403.13745 • Published Mar 20, 2024 • 11