Article • ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use • By Ziyang • 3 days ago • 9
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 4 days ago • 40
Article • Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK • By davidberenstein1957 • Nov 21, 2024 • 35
Article • LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark • By wolfram • 4 days ago • 30
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 32
Open LLM Leaderboard best models Collection • A daily-updated list of the models with the best evaluations on the LLM leaderboard • 64 items • Updated about 1 hour ago • 497
GTE models Collection • General Text Embedding (GTE) models released by Tongyi Lab of Alibaba Group • 19 items • Updated 16 days ago • 19
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published 20 days ago • 41
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023 • 32
Sailor2 Language Models Collection • Sailing in South-East Asia with Inclusive Multilingual LLMs • 9 items • Updated Dec 3, 2024 • 22
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 33
Qwen2.5 Collection • Qwen2.5 language models, both pretrained and instruction-tuned, in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B • 45 items • Updated Nov 28, 2024 • 453
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 48
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 112
Article • Releasing the largest multilingual open pretraining dataset • By Pclanglais • Nov 13, 2024 • 98