🔥 New LLM leaderboard on the Hub: an Enterprise Scenarios Leaderboard!
This work evaluates LLMs on several real-world use cases (finance documents, legal confidentiality, customer support, ...), which makes it grounded and relevant for companies! 🏢 Bonus: the test set is private, so it's hard to game 🔥 PatronusAI/enterprise_scenarios_leaderboard
Side note: this benchmark showed me that you can evaluate the "Engagingness" of an LLM, which could also interest the LLM fine-tuning community out there.