czczup committed · Commit 23ac785 · verified · 1 Parent(s): a5038f9

Delete 20241205_033944

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. 20241205_033944/configs/20241205_033944_98865.py +0 -0
  2. 20241205_033944/logs/eval/internvl-chat-20b/C3.out +0 -6
  3. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2013_English_MCQs.out +0 -6
  4. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Biology_MCQs.out +0 -6
  5. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chemistry_MCQs.out +0 -6
  6. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs.out +0 -6
  7. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Language_Famous_Passages_and_Sentences_Dictation.out +0 -6
  8. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Modern_Lit.out +0 -6
  9. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_English_Fill_in_Blanks.out +0 -6
  10. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_English_Reading_Comp.out +0 -6
  11. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Geography_MCQs.out +0 -6
  12. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_History_MCQs.out +0 -6
  13. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_II_Fill-in-the-Blank.out +0 -6
  14. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_II_MCQs.out +0 -6
  15. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_I_Fill-in-the-Blank.out +0 -6
  16. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_I_MCQs.out +0 -6
  17. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Physics_MCQs.out +0 -6
  18. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Political_Science_MCQs.out +0 -6
  19. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2012-2022_English_Cloze_Test.out +0 -6
  20. 20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2014-2022_English_Language_Cloze_Passage.out +0 -6
  21. 20241205_033944/logs/eval/internvl-chat-20b/IFEval.out +0 -6
  22. 20241205_033944/logs/eval/internvl-chat-20b/TheoremQA.out +0 -0
  23. 20241205_033944/logs/eval/internvl-chat-20b/bbh-boolean_expressions.out +0 -6
  24. 20241205_033944/logs/eval/internvl-chat-20b/bbh-causal_judgement.out +0 -6
  25. 20241205_033944/logs/eval/internvl-chat-20b/bbh-date_understanding.out +0 -8
  26. 20241205_033944/logs/eval/internvl-chat-20b/bbh-disambiguation_qa.out +0 -8
  27. 20241205_033944/logs/eval/internvl-chat-20b/bbh-dyck_languages.out +0 -6
  28. 20241205_033944/logs/eval/internvl-chat-20b/bbh-formal_fallacies.out +0 -6
  29. 20241205_033944/logs/eval/internvl-chat-20b/bbh-geometric_shapes.out +0 -8
  30. 20241205_033944/logs/eval/internvl-chat-20b/bbh-hyperbaton.out +0 -8
  31. 20241205_033944/logs/eval/internvl-chat-20b/bbh-logical_deduction_five_objects.out +0 -8
  32. 20241205_033944/logs/eval/internvl-chat-20b/bbh-logical_deduction_seven_objects.out +0 -8
  33. 20241205_033944/logs/eval/internvl-chat-20b/bbh-logical_deduction_three_objects.out +0 -8
  34. 20241205_033944/logs/eval/internvl-chat-20b/bbh-movie_recommendation.out +0 -8
  35. 20241205_033944/logs/eval/internvl-chat-20b/bbh-multistep_arithmetic_two.out +0 -6
  36. 20241205_033944/logs/eval/internvl-chat-20b/bbh-navigate.out +0 -6
  37. 20241205_033944/logs/eval/internvl-chat-20b/bbh-object_counting.out +0 -6
  38. 20241205_033944/logs/eval/internvl-chat-20b/bbh-penguins_in_a_table.out +0 -8
  39. 20241205_033944/logs/eval/internvl-chat-20b/bbh-reasoning_about_colored_objects.out +0 -8
  40. 20241205_033944/logs/eval/internvl-chat-20b/bbh-ruin_names.out +0 -8
  41. 20241205_033944/logs/eval/internvl-chat-20b/bbh-salient_translation_error_detection.out +0 -8
  42. 20241205_033944/logs/eval/internvl-chat-20b/bbh-snarks.out +0 -8
  43. 20241205_033944/logs/eval/internvl-chat-20b/bbh-sports_understanding.out +0 -6
  44. 20241205_033944/logs/eval/internvl-chat-20b/bbh-temporal_sequences.out +0 -8
  45. 20241205_033944/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_five_objects.out +0 -8
  46. 20241205_033944/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_seven_objects.out +0 -8
  47. 20241205_033944/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_three_objects.out +0 -8
  48. 20241205_033944/logs/eval/internvl-chat-20b/bbh-web_of_lies.out +0 -6
  49. 20241205_033944/logs/eval/internvl-chat-20b/bbh-word_sorting.out +0 -6
  50. 20241205_033944/logs/eval/internvl-chat-20b/ceval-accountant.out +0 -6
20241205_033944/configs/20241205_033944_98865.py DELETED
The diff for this file is too large to render. See raw diff
 
20241205_033944/logs/eval/internvl-chat-20b/C3.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090296 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:11 - OpenCompass - INFO - Task [internvl-chat-20b/C3]: {'accuracy': 76.87671232876713}
- 12/05 03:49:11 - OpenCompass - INFO - time elapsed: 15.30s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2013_English_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090278 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:05 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2013_English_MCQs]: {'score': 55.23809523809524}
- 12/05 03:49:05 - OpenCompass - INFO - time elapsed: 11.80s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Biology_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090429 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:21 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Biology_MCQs]: {'score': 50.0}
- 12/05 03:49:21 - OpenCompass - INFO - time elapsed: 18.30s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chemistry_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090456 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:22 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Chemistry_MCQs]: {'score': 30.64516129032258}
- 12/05 03:49:22 - OpenCompass - INFO - time elapsed: 18.23s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090416 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:21 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs]: {'score': 23.75}
- 12/05 03:49:21 - OpenCompass - INFO - time elapsed: 18.76s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Language_Famous_Passages_and_Sentences_Dictation.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090340 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Language_Famous_Passages_and_Sentences_Dictation]: {'score': 0}
- 12/05 03:49:10 - OpenCompass - INFO - time elapsed: 12.39s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Modern_Lit.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090279 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:05 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Modern_Lit]: {'score': 9.195402298850574}
- 12/05 03:49:05 - OpenCompass - INFO - time elapsed: 11.22s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_English_Fill_in_Blanks.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090339 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_English_Fill_in_Blanks]: {'score': 0.0}
- 12/05 03:49:10 - OpenCompass - INFO - time elapsed: 12.32s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_English_Reading_Comp.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090346 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_English_Reading_Comp]: {'score': 12.553191489361701}
- 12/05 03:49:10 - OpenCompass - INFO - time elapsed: 12.31s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Geography_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090436 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:22 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Geography_MCQs]: {'score': 44.21052631578947}
- 12/05 03:49:22 - OpenCompass - INFO - time elapsed: 18.57s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_History_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090415 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:22 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_History_MCQs]: {'score': 68.6411149825784}
- 12/05 03:49:22 - OpenCompass - INFO - time elapsed: 18.77s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_II_Fill-in-the-Blank.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090312 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:14 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Math_II_Fill-in-the-Blank]: {'score': 0}
- 12/05 03:49:14 - OpenCompass - INFO - time elapsed: 15.85s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_II_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090323 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:14 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Math_II_MCQs]: {'score': 13.302752293577983}
- 12/05 03:49:14 - OpenCompass - INFO - time elapsed: 15.92s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_I_Fill-in-the-Blank.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090394 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:17 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Math_I_Fill-in-the-Blank]: {'score': 0}
- 12/05 03:49:17 - OpenCompass - INFO - time elapsed: 16.37s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_I_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090419 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:22 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Math_I_MCQs]: {'score': 10.2803738317757}
- 12/05 03:49:22 - OpenCompass - INFO - time elapsed: 18.55s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Physics_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090320 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:14 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Physics_MCQs]: {'score': 10.9375}
- 12/05 03:49:14 - OpenCompass - INFO - time elapsed: 16.43s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Political_Science_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090420 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:21 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Political_Science_MCQs]: {'score': 73.75}
- 12/05 03:49:21 - OpenCompass - INFO - time elapsed: 17.63s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2012-2022_English_Cloze_Test.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090317 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:14 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2012-2022_English_Cloze_Test]: {'score': 0.0}
- 12/05 03:49:14 - OpenCompass - INFO - time elapsed: 15.85s
 
20241205_033944/logs/eval/internvl-chat-20b/GaokaoBench_2014-2022_English_Language_Cloze_Passage.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090287 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2014-2022_English_Language_Cloze_Passage]: {'score': 0}
- 12/05 03:49:10 - OpenCompass - INFO - time elapsed: 15.13s
 
20241205_033944/logs/eval/internvl-chat-20b/IFEval.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090535 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:35 - OpenCompass - INFO - Task [internvl-chat-20b/IFEval]: {'Prompt-level-strict-accuracy': 19.77818853974122, 'Inst-level-strict-accuracy': 31.894484412470025, 'Prompt-level-loose-accuracy': 22.920517560073936, 'Inst-level-loose-accuracy': 35.13189448441247}
- 12/05 03:49:35 - OpenCompass - INFO - time elapsed: 14.98s
 
20241205_033944/logs/eval/internvl-chat-20b/TheoremQA.out DELETED
The diff for this file is too large to render. See raw diff
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-boolean_expressions.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090516 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:24 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-boolean_expressions]: {'score': 56.39999999999999}
- 12/05 03:49:24 - OpenCompass - INFO - time elapsed: 11.15s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-causal_judgement.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090523 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:27 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-causal_judgement]: {'score': 48.1283422459893}
- 12/05 03:49:27 - OpenCompass - INFO - time elapsed: 10.68s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-date_understanding.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090337 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f230a482950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:10 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-date_understanding]: {'score': 54.800000000000004}
- 12/05 03:49:10 - OpenCompass - INFO - time elapsed: 12.54s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-disambiguation_qa.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090338 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f443bc32950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:10 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-disambiguation_qa]: {'score': 51.6}
- 12/05 03:49:10 - OpenCompass - INFO - time elapsed: 12.59s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-dyck_languages.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090539 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:33 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-dyck_languages]: {'score': 0.0}
- 12/05 03:49:33 - OpenCompass - INFO - time elapsed: 12.01s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-formal_fallacies.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090542 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:32 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-formal_fallacies]: {'score': 48.0}
- 12/05 03:49:32 - OpenCompass - INFO - time elapsed: 9.92s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-geometric_shapes.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090513 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fc13523a950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:20 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-geometric_shapes]: {'score': 24.8}
- 12/05 03:49:20 - OpenCompass - INFO - time elapsed: 10.89s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-hyperbaton.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090518 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fda1fcaa950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:25 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-hyperbaton]: {'score': 58.8}
- 12/05 03:49:25 - OpenCompass - INFO - time elapsed: 10.79s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-logical_deduction_five_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090529 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f4280c6a950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:29 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-logical_deduction_five_objects]: {'score': 21.6}
- 12/05 03:49:29 - OpenCompass - INFO - time elapsed: 10.53s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-logical_deduction_seven_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090541 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fafc6daa950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:32 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-logical_deduction_seven_objects]: {'score': 17.2}
- 12/05 03:49:32 - OpenCompass - INFO - time elapsed: 10.53s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-logical_deduction_three_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090514 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f1d0a64a950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:22 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-logical_deduction_three_objects]: {'score': 38.0}
- 12/05 03:49:22 - OpenCompass - INFO - time elapsed: 10.64s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-movie_recommendation.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090528 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fae0592e950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:29 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-movie_recommendation]: {'score': 53.2}
- 12/05 03:49:29 - OpenCompass - INFO - time elapsed: 10.87s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-multistep_arithmetic_two.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090520 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:26 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-multistep_arithmetic_two]: {'score': 9.6}
- 12/05 03:49:26 - OpenCompass - INFO - time elapsed: 11.22s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-navigate.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090530 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:29 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-navigate]: {'score': 58.8}
- 12/05 03:49:29 - OpenCompass - INFO - time elapsed: 10.34s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-object_counting.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090531 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:29 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-object_counting]: {'score': 51.2}
- 12/05 03:49:29 - OpenCompass - INFO - time elapsed: 10.39s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-penguins_in_a_table.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090517 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f38bccc28c0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:24 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-penguins_in_a_table]: {'score': 32.87671232876712}
- 12/05 03:49:24 - OpenCompass - INFO - time elapsed: 11.15s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-reasoning_about_colored_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090515 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f8aa82f2950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:22 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-reasoning_about_colored_objects]: {'score': 52.0}
- 12/05 03:49:22 - OpenCompass - INFO - time elapsed: 10.69s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-ruin_names.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090525 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f9b772fa950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:28 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-ruin_names]: {'score': 26.8}
- 12/05 03:49:28 - OpenCompass - INFO - time elapsed: 10.89s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-salient_translation_error_detection.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090540 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f0cda62a950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:32 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-salient_translation_error_detection]: {'score': 16.8}
- 12/05 03:49:32 - OpenCompass - INFO - time elapsed: 10.55s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-snarks.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090522 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fa12dc0e8c0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:27 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-snarks]: {'score': 48.87640449438202}
- 12/05 03:49:27 - OpenCompass - INFO - time elapsed: 10.78s
 
20241205_033944/logs/eval/internvl-chat-20b/bbh-sports_understanding.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090526 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:28 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-sports_understanding]: {'score': 56.00000000000001}
- 12/05 03:49:28 - OpenCompass - INFO - time elapsed: 10.37s

20241205_033944/logs/eval/internvl-chat-20b/bbh-temporal_sequences.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090401 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f499b756950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:22 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-temporal_sequences]: {'score': 40.8}
- 12/05 03:49:22 - OpenCompass - INFO - time elapsed: 19.32s

20241205_033944/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_five_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090534 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f8c55906950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:32 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-tracking_shuffled_objects_five_objects]: {'score': 19.6}
- 12/05 03:49:32 - OpenCompass - INFO - time elapsed: 12.71s

20241205_033944/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_seven_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090519 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fdeda40e950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:25 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-tracking_shuffled_objects_seven_objects]: {'score': 12.8}
- 12/05 03:49:25 - OpenCompass - INFO - time elapsed: 10.71s

20241205_033944/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_three_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4090509 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f2d74d22950> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 12/05 03:49:19 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-tracking_shuffled_objects_three_objects]: {'score': 30.8}
- 12/05 03:49:19 - OpenCompass - INFO - time elapsed: 12.49s

20241205_033944/logs/eval/internvl-chat-20b/bbh-web_of_lies.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090527 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:29 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-web_of_lies]: {'score': 48.4}
- 12/05 03:49:29 - OpenCompass - INFO - time elapsed: 10.90s

20241205_033944/logs/eval/internvl-chat-20b/bbh-word_sorting.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090521 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:26 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-word_sorting]: {'score': 2.8000000000000003}
- 12/05 03:49:26 - OpenCompass - INFO - time elapsed: 11.19s

20241205_033944/logs/eval/internvl-chat-20b/ceval-accountant.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4090414 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 12/05 03:49:20 - OpenCompass - INFO - Task [internvl-chat-20b/ceval-accountant]: {'accuracy': 30.612244897959183}
- 12/05 03:49:20 - OpenCompass - INFO - time elapsed: 16.82s