- Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
  Paper • 2310.16527 • Published • 2
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 182
- Unifying Vision, Text, and Layout for Universal Document Processing
  Paper • 2212.02623 • Published • 10
Collections including paper arxiv:2310.16527
- Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
  Paper • 2310.16527 • Published • 2
- CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
  Paper • 2310.02960 • Published • 1
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 126
- Veagle: Advancements in Multimodal Representation Learning
  Paper • 2403.08773 • Published • 9

- Representing Online Handwriting for Recognition in Large Vision-Language Models
  Paper • 2402.15307 • Published • 3
- Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
  Paper • 1903.07377 • Published • 2
- Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
  Paper • 2310.16527 • Published • 2
- Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR
  Paper • 2401.12513 • Published • 1

- Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
  Paper • 2110.07040 • Published • 2
- A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
  Paper • 1811.00056 • Published • 2
- Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
  Paper • 2311.17128 • Published • 2
- Data Generation for Post-OCR correction of Cyrillic handwriting
  Paper • 2311.15896 • Published • 3