linxi's Collections
Adapting Large Language Models via Reading Comprehension
Paper • arXiv:2309.09530 • 77 upvotes
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Paper • arXiv:2309.09958 • 18 upvotes
Noise-Aware Training of Layout-Aware Language Models
Paper • arXiv:2404.00488 • 8 upvotes
Streaming Dense Video Captioning
Paper • arXiv:2404.01297 • 11 upvotes
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Paper • arXiv:2404.00399 • 41 upvotes
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper • arXiv:2404.03413 • 25 upvotes
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
Paper • arXiv:2404.03118 • 23 upvotes
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • arXiv:2404.03411 • 8 upvotes
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • arXiv:2404.02258 • 104 upvotes
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • arXiv:2404.02905 • 65 upvotes
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • arXiv:2404.02733 • 20 upvotes
FlowMind: Automatic Workflow Generation with LLMs
Paper • arXiv:2404.13050 • 33 upvotes
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper • arXiv:2404.13686 • 27 upvotes
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Paper • arXiv:2404.14396 • 18 upvotes
LAMBDA: A Large Model Based Data Agent
Paper • arXiv:2407.17535 • 35 upvotes
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
Paper • arXiv:2407.17490 • 31 upvotes
Very Large-Scale Multi-Agent Simulation in AgentScope
Paper • arXiv:2407.17789 • 32 upvotes
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Paper • arXiv:2407.18121 • 17 upvotes
VILA^2: VILA Augmented VILA
Paper • arXiv:2407.17453 • 39 upvotes
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
Paper • arXiv:2407.16224 • 27 upvotes
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Paper • arXiv:2407.15841 • 40 upvotes
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
Paper • arXiv:2408.03178 • 38 upvotes
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
Paper • arXiv:2408.02629 • 13 upvotes
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Paper • arXiv:2408.01800 • 79 upvotes
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts
Paper • arXiv:2409.00447 • 2 upvotes