task,metric,value,err,version
anli_r1,acc,0.345,0.015039986742055235,0
anli_r2,acc,0.325,0.014818724459095526,0
anli_r3,acc,0.31416666666666665,0.013405399314984096,0
arc_challenge,acc,0.30204778156996587,0.01341751914471642,0
arc_challenge,acc_norm,0.32764505119453924,0.013715847940719344,0
arc_easy,acc,0.6405723905723906,0.009845958893373766,0
arc_easy,acc_norm,0.6212121212121212,0.00995373765654204,0
boolq,acc,0.6275229357798165,0.008455846866956085,1
cb,acc,0.39285714285714285,0.0658538889806635,1
cb,f1,0.3647495361781076,,1
copa,acc,0.82,0.038612291966536955,0
hellaswag,acc,0.4819757020513842,0.004986538243846636,0
hellaswag,acc_norm,0.6387173869747063,0.004793904922401888,0
piqa,acc,0.7551686615886833,0.01003230910556879,0
piqa,acc_norm,0.76550598476605,0.00988520314324054,0
rte,acc,0.48736462093862815,0.030086851767188564,0
sciq,acc,0.92,0.008583336977753653,0
sciq,acc_norm,0.907,0.009188875634996702,0
storycloze_2016,acc,0.7386424371993586,0.010160471460690485,0
winogrande,acc,0.5832675611681136,0.013856250072796322,0