Yi Cui PRO

onekq

AI & ML interests

Benchmark, Code Generation Model

Organizations

MLX Community, ONEKQ AI

Posts 7

The October version of Claude 3.5 lifts the SOTA (set by its June version) by 7 points.
onekq-ai/WebApp1K-models-leaderboard

Closed-source models are widening the gap again.

Note: Our frontier leaderboard now uses double-scenario tests because the single-scenario test suite has been saturated.
