Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents • Paper • 2305.13455 • Published May 22, 2023
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics • Paper • 2406.14051 • Published Jun 20, 2024
Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models • Paper • 2406.14035 • Published Jun 20, 2024