- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
  Paper • 2407.10457 • Published • 22
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
  Paper • 2411.00640 • Published • 3
- Law of the Weakest Link: Cross Capabilities of Large Language Models
  Paper • 2409.19951 • Published • 54
Vignesh
Vigneshwaran
AI & ML interests
None yet
Recent Activity
updated a dataset 9 days ago: tau/scrolls
new activity 9 days ago: tau/scrolls: Calculate metric using evaluate instead of datasets
updated a collection 17 days ago: evaluation
Organizations
Collections
5
models
None public yet
datasets
None public yet