Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Abstract
The rapid development of vision language models (VLMs) demands rigorous and reliable evaluation. However, current visual question answering (VQA) benchmarks often rely on open-ended questions, which make accurate evaluation difficult because natural-language responses vary widely. To address this, we introduce AutoConverter, an agentic framework that automatically converts open-ended questions into multiple-choice format, enabling objective evaluation while reducing the cost of creating questions from scratch. Our experiments demonstrate that AutoConverter generates correct and challenging multiple-choice questions: VLMs consistently achieve similar or lower accuracy on these questions than on human-created ones. Using AutoConverter, we construct VMCBench, a benchmark of 9,018 questions created by transforming 20 existing VQA datasets into a unified multiple-choice format. We comprehensively evaluate 33 state-of-the-art VLMs on VMCBench, setting a new standard for scalable, consistent, and reproducible VLM evaluation.
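The abstract describes converting open-ended VQA items into multiple-choice ones. As a rough illustration of what such a conversion step could look like, here is a minimal sketch assuming an OpenAI-style chat API: the `propose_distractors` and `to_multiple_choice` helpers, the prompt wording, and the model choice are all illustrative assumptions, not the paper's actual implementation (which is agentic and includes correctness checks beyond this sketch).

```python
# Minimal sketch of one conversion step: given an open-ended VQA question
# and its reference answer, ask an LLM for plausible distractors, then
# assemble a shuffled multiple-choice item. Names and prompts here are
# hypothetical, not taken from the AutoConverter paper.
import random
from openai import OpenAI  # assumes the OpenAI Python SDK (>=1.0) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def propose_distractors(question: str, answer: str, n: int = 3) -> list[str]:
    """Ask an LLM for n incorrect-but-plausible options (hypothetical prompt)."""
    prompt = (
        f"Question: {question}\n"
        f"Correct answer: {answer}\n"
        f"Write {n} plausible but incorrect answer options, one per line."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # model choice is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    lines = [l.strip("- ").strip() for l in resp.choices[0].message.content.splitlines()]
    # Drop empty lines and anything that duplicates the correct answer.
    return [l for l in lines if l and l.lower() != answer.lower()][:n]


def to_multiple_choice(question: str, answer: str) -> dict:
    """Assemble a shuffled multiple-choice item with exactly one correct option."""
    options = propose_distractors(question, answer) + [answer]
    random.shuffle(options)
    letters = "ABCD"
    return {
        "question": question,
        "choices": dict(zip(letters, options)),
        "answer": letters[options.index(answer)],
    }


if __name__ == "__main__":
    print(to_multiple_choice("What color is the bus in the image?", "yellow"))
```

Scoring against such items reduces evaluation to exact letter matching, which is the objectivity argument the abstract makes; the paper's framework additionally verifies that generated questions remain correct and challenging, which this sketch omits.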
Community
The following similar papers were recommended by the Semantic Scholar API:
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents (2024)
- VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models (2024)
- Large Vision-Language Models for Remote Sensing Visual Question Answering (2024)
- VideoSAVi: Self-Aligned Video Language Models without Human Supervision (2024)
- LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering (2024)
- FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering (2024)
- Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation (2024)