Daniil Vodolazsky

s231644

AI & ML interests

None yet

Recent Activity

upvoted a paper 5 days ago

Search-o1: Agentic Search-Enhanced Large Reasoning Models

upvoted a paper 5 days ago

MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

upvoted a paper 5 days ago

MiniMax-01: Scaling Foundation Models with Lightning Attention

Organizations

None yet

s231644's activity

upvoted 3 papers 5 days ago

upvoted 5 papers about 1 month ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 128

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 22

POINTS1.5: Building a Vision-Language Model towards Real World Applications

Paper • 2412.08443 • Published Dec 11, 2024 • 38

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 93

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 104

upvoted 4 papers 3 months ago

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11, 2024 • 44

Toward General Instruction-Following Alignment for Retrieval-Augmented Generation

Paper • 2410.09584 • Published Oct 12, 2024 • 47

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14, 2024 • 38

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Paper • 2410.10594 • Published Oct 14, 2024 • 24

upvoted a collection 4 months ago

Molmo

Collection

Artifacts for open multimodal language models. • 5 items • Updated 14 days ago • 293

upvoted 4 papers 4 months ago

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Paper • 2409.00509 • Published Aug 31, 2024 • 38

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Paper • 2409.02897 • Published Sep 4, 2024 • 45

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Paper • 2409.03420 • Published Sep 5, 2024 • 26

upvoted 3 papers 5 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 124

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published Aug 22, 2024 • 31