BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18, 2024 • 43
Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages May 24, 2024 • 25
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models May 24, 2024 • 21
The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare Apr 19, 2024 • 126
Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs Apr 16, 2024 • 14
Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes? Mar 5, 2024 • 4
Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem Feb 20, 2024 • 3
NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates Feb 2, 2024 • 3
Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases Jan 31, 2024 • 3
The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models Jan 29, 2024 • 17
A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard Jan 12, 2024 • 6
Bridging the Gap Between Physical Numerical Simulations and Machine Learning: Introducing The Well By rubenohana • Dec 2, 2024 • 17
Halo: Open Source Health Tracking with Wearables By cyrilzakka • Nov 19, 2024 • 99
Releasing Outlines-core 0.1.0: structured generation in Rust and Python Oct 22, 2024 • 44
Democratization of AI, Open Source, and AI Auditing: Thoughts from the DisinfoCon Panel in Berlin By frimelle • Oct 8, 2024 • 5
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Nov 27, 2024 • 290
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures Paper • 2406.06565 • Published Jun 3, 2024 • 9
🎭 Avatars Collection The latest AI-powered technologies usher in a new era of realistic avatars! 🚀 • 70 items • Updated 11 days ago • 79
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 87
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Apr 22, 2024 • 80
LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24, 2024 • 60
Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated 17 days ago • 181
Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face May 3, 2024 • 13