VILA / Molmo

vila-molmo's activity

Ligeng-Zhu

authored 6 papers 6 days ago

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 51

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Paper • 2409.04429 • Published Sep 6, 2024

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

Paper • 2410.10629 • Published Oct 14, 2024 • 9

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training

Paper • 2410.19313 • Published Oct 25, 2024 • 19

TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning

Paper • 2007.11622 • Published Jul 22, 2020

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 57

cydhsieh01

authored a paper 26 days ago

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Paper • 2412.03548 • Published Dec 4, 2024 • 17

t1101675

authored a paper 28 days ago

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 57

cydhsieh01

updated a model about 1 month ago

vila-molmo/molmo-dense-captioner-v22-qwen2

Updated Nov 25, 2024 • 18

t1101675

authored a paper 2 months ago

MiniPLM: Knowledge Distillation for Pre-Training Language Models

Paper • 2410.17215 • Published Oct 22, 2024 • 14

t1101675

authored a paper 3 months ago

Data Selection via Optimal Control for Language Models

Paper • 2410.07064 • Published Oct 9, 2024 • 8

Ligeng-Zhu

authored 2 papers 5 months ago

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32

$VILA^2$: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 39

cydhsieh01

authored a paper 6 months ago

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Paper • 2406.16008 • Published Jun 23, 2024 • 6

t1101675

authored a paper 6 months ago

Direct Preference Knowledge Distillation for Large Language Models

Paper • 2406.19774 • Published Jun 28, 2024 • 21

cydhsieh01

authored 5 papers 6 months ago

On the (In)fidelity and Sensitivity for Explanations

Paper • 1901.09392 • Published Jan 27, 2019

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

Paper • 2306.14610 • Published Jun 26, 2023

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Paper • 2305.02301 • Published May 3, 2023 • 2

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

Paper • 2308.00675 • Published Aug 1, 2023 • 36

A Survey on Programmatic Weak Supervision

Paper • 2202.05433 • Published Feb 11, 2022