Collections
Discover the best community collections!
Collections including paper arxiv:2404.07972
-
Octopus v2: On-device language model for super agent
Paper • 2404.01744 • Published • 56 -
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 82 -
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 46 -
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 54
-
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1 -
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 46 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 24 -
Vision language models are blind
Paper • 2407.06581 • Published • 83
-
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 19 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 7 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 17
-
Communicative Agents for Software Development
Paper • 2307.07924 • Published • 4 -
Self-Refine: Iterative Refinement with Self-Feedback
Paper • 2303.17651 • Published • 2 -
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Paper • 2312.10003 • Published • 37 -
ReAct: Synergizing Reasoning and Acting in Language Models
Paper • 2210.03629 • Published • 15
-
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 36 -
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Paper • 2211.12588 • Published • 3 -
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper • 2402.16671 • Published • 26 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 12
-
Non-Autoregressive Neural Machine Translation
Paper • 1711.02281 • Published • 1 -
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Paper • 2107.07651 • Published • 1 -
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 46 -
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 14
-
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 17 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 36 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Paper • 1804.07461 • Published • 4
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 32 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 36 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 2 -
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1