fedric95 committed on
Commit d263549 · verified · 1 Parent(s): b1a6865

Upload ./Qwen2-7B-Q8_0.mmlu.pro.txt with huggingface_hub

Files changed (1)
  1. Qwen2-7B-Q8_0.mmlu.pro.txt +2 -80
Qwen2-7B-Q8_0.mmlu.pro.txt CHANGED
@@ -1,80 +1,2 @@
- multiple_choice_score: there are 70 tasks in prompt
- multiple_choice_score: reading tasks......................................................................done
- multiple_choice_score: preparing task data......................................................................done
- multiple_choice_score : calculating TruthfulQA score over 70 tasks.
-
- task acc_norm
- 1 0.00000000
- 2 50.00000000
- 3 33.33333333
- 4 25.00000000
- 5 40.00000000
- 6 50.00000000
- 7 57.14285714
- 8 50.00000000
- 9 44.44444444
- 10 50.00000000
- 11 54.54545455
- 12 50.00000000
- 13 46.15384615
- 14 50.00000000
- 15 53.33333333
- 16 50.00000000
- 17 47.05882353
- 18 44.44444444
- 19 42.10526316
- 20 40.00000000
- 21 38.09523810
- 22 36.36363636
- 23 34.78260870
- 24 33.33333333
- 25 32.00000000
- 26 30.76923077
- 27 29.62962963
- 28 28.57142857
- 29 27.58620690
- 30 26.66666667
- 31 25.80645161
- 32 25.00000000
- 33 27.27272727
- 34 26.47058824
- 35 25.71428571
- 36 25.00000000
- 37 24.32432432
- 38 23.68421053
- 39 23.07692308
- 40 25.00000000
- 41 24.39024390
- 42 23.80952381
- 43 23.25581395
- 44 22.72727273
- 45 22.22222222
- 46 23.91304348
- 47 23.40425532
- 48 25.00000000
- 49 24.48979592
- 50 24.00000000
- 51 23.52941176
- 52 23.07692308
- 53 24.52830189
- 54 24.07407407
- 55 25.45454545
- 56 25.00000000
- 57 24.56140351
- 58 24.13793103
- 59 23.72881356
- 60 25.00000000
- 61 24.59016393
- 62 24.19354839
- 63 23.80952381
- 64 25.00000000
- 65 24.61538462
- 66 24.24242424
- 67 23.88059701
- 68 23.52941176
- 69 23.18840580
- 70 22.85714286
-
- Final result: 22.8571 +/- 5.0552
- Random chance: 10.0000 +/- 3.6116
-
+ multiple_choice_score: there are 12032 tasks in prompt
+ multiple_choice_score: reading tasksmultiple_choice_score: failed to read task 1 of 12032
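
The summary figures in the removed log can be checked by hand. The sketch below is not taken from the tool that produced the log. It assumes the `+/-` value is the standard error of a proportion with an `n - 1` denominator, that 16 of the 70 tasks were answered correctly (inferred from 22.8571%), and that "Random chance" corresponds to a 1-in-10 guess. The function name `score_summary` is hypothetical. Under those assumptions the numbers agree with the log to four decimal places.

```python
import math


def score_summary(p: float, n_tasks: int) -> tuple[float, float]:
    """Return (percentage, standard error in percentage points).

    Assumption: the error is sqrt(p * (1 - p) / (n_tasks - 1)),
    which matches the removed log but is not confirmed by the source.
    """
    std_err = math.sqrt(p * (1.0 - p) / (n_tasks - 1))
    return 100.0 * p, 100.0 * std_err


# 16 correct out of 70 is inferred from the logged 22.8571%.
final_acc, final_err = score_summary(16 / 70, 70)

# A 10% chance level assumes ten answer options per task.
chance_acc, chance_err = score_summary(0.10, 70)

print(f"Final result: {final_acc:.4f} +/- {final_err:.4f}")
print(f"Random chance: {chance_acc:.4f} +/- {chance_err:.4f}")
```

The `acc_norm` column in the removed log is a running percentage (correct answers so far divided by tasks so far), so only its last row, task 70, corresponds to the final result.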