hllj's Collections
Vision-Language Model
Visual Instruction Tuning (arXiv:2304.08485)
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities (arXiv:2308.12966)
Improved Baselines with Visual Instruction Tuning (arXiv:2310.03744)
SILC: Improving Vision Language Pretraining with Self-Distillation (arXiv:2310.13355)
CogVLM: Visual Expert for Pretrained Language Models (arXiv:2311.03079)
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions (arXiv:2311.12793)
DeepSeek-VL: Towards Real-World Vision-Language Understanding (arXiv:2403.05525)
OmniFusion Technical Report (arXiv:2404.06212)
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models (arXiv:2404.12387)
Pegasus-v1 Technical Report (arXiv:2404.14687)
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites (arXiv:2404.16821)
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (arXiv:2404.16994)
What matters when building vision-language models? (arXiv:2405.02246)
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output (arXiv:2407.03320)
Understanding Alignment in Multimodal LLMs: A Comprehensive Study (arXiv:2407.02477)
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding (arXiv:2407.01791)