task,metric,value,err,version
anli_r1,acc,0.337,0.014955087918653603,0
anli_r2,acc,0.349,0.015080663991563102,0
anli_r3,acc,0.36666666666666664,0.013916893275819938,0
arc_challenge,acc,0.2790102389078498,0.013106784883601346,0
arc_challenge,acc_norm,0.3165529010238908,0.013592431519068077,0
arc_easy,acc,0.6039562289562289,0.010035580962097942,0
arc_easy,acc_norm,0.5702861952861953,0.010157908005763674,0
boolq,acc,0.5636085626911315,0.008674000467432068,1
cb,acc,0.44642857142857145,0.067031892279424,1
cb,f1,0.3176100628930817,,1
copa,acc,0.8,0.040201512610368445,0
hellaswag,acc,0.4722166899024099,0.004982072108448081,0
hellaswag,acc_norm,0.6184027086237801,0.004847857546957481,0
piqa,acc,0.7431991294885746,0.010192864802278045,0
piqa,acc_norm,0.7568008705114254,0.010009611953858915,0
rte,acc,0.5379061371841155,0.03000984891252911,0
sciq,acc,0.842,0.011539894677559568,0
sciq,acc_norm,0.789,0.012909130321042092,0
storycloze_2016,acc,0.7194013896312133,0.010389809647288821,0
winogrande,acc,0.56353591160221,0.013938569465677023,0