Yi Cui (onekq)
AI & ML interests: Benchmark, Code Generation Model
The October version of Claude 3.5 raises the state of the art (previously set by its own June version) by 7 points.
onekq-ai/WebApp1K-models-leaderboard

Closed-source models are widening the gap again.

Note: our frontier leaderboard now uses double-scenario tests, because the single-scenario test suite has been saturated.