Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.07972

Papers - Operating Systems

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11, 2024 • 46

daily paper selected

Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2, 2024 • 56
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8, 2024 • 82
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11, 2024 • 46
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 54

Papers - Benchmarks - Image

AQuA: A Benchmarking Tool for Label Quality Assessment

Paper • 2306.09467 • Published Jun 15, 2023 • 1
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11, 2024 • 46
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 24
Vision language models are blind

Paper • 2407.06581 • Published Jul 9, 2024 • 83

Papers - University - Hong Kong University of Science and Te

Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

Paper • 2404.02731 • Published Apr 3, 2024 • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Paper • 2309.12284 • Published Sep 21, 2023 • 19
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4, 2024 • 7
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 17

Collection of resources related to Agents.

Communicative Agents for Software Development

Paper • 2307.07924 • Published Jul 16, 2023 • 4
Self-Refine: Iterative Refinement with Self-Feedback

Paper • 2303.17651 • Published Mar 30, 2023 • 2
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Paper • 2312.10003 • Published Dec 15, 2023 • 37
ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 15

Papers - University - University of Waterloo

Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 36
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

Paper • 2211.12588 • Published Nov 22, 2022 • 3
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Paper • 2402.16671 • Published Feb 26, 2024 • 26
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Paper • 2404.04167 • Published Apr 5, 2024 • 12

Papers - Coding - Operating Systems - Memory

Enabling Memory Safety of C Programs using LLMs

Paper • 2404.01096 • Published Apr 1, 2024 • 1
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11, 2024 • 46

Papers - Salesforce

Non-Autoregressive Neural Machine Translation

Paper • 1711.02281 • Published Nov 7, 2017 • 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Paper • 2107.07651 • Published Jul 16, 2021 • 1
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11, 2024 • 46
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Paper • 2402.15506 • Published Feb 23, 2024 • 14

Papers - Benchmarks

The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20, 2024 • 17
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 36
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs

Paper • 2312.17080 • Published Dec 28, 2023 • 1
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Paper • 1804.07461 • Published Apr 20, 2018 • 4

Papers - University - Carnegie Mellon University

Can large language models explore in-context?

Paper • 2403.15371 • Published Mar 22, 2024 • 32
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 36
PIQA: Reasoning about Physical Commonsense in Natural Language

Paper • 1911.11641 • Published Nov 26, 2019 • 2
AQuA: A Benchmarking Tool for Label Quality Assessment

Paper • 2306.09467 • Published Jun 15, 2023 • 1

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs