metadata
license: apache-2.0
library_name: transformers
tags:
  - mergekit
  - merge
base_model:
  - nbeerbower/mistral-nemo-gutenberg-12B-v4
  - Gryphe/Pantheon-RP-1.6.1-12b-Nemo
  - anthracite-org/magnum-v4-12b
model-index:
  - name: Pans_Gutenbergum_V0.1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 30.97
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Triangle104/Pans_Gutenbergum_V0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 36.08
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Triangle104/Pans_Gutenbergum_V0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 10.27
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Triangle104/Pans_Gutenbergum_V0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 9.73
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Triangle104/Pans_Gutenbergum_V0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 16.33
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Triangle104/Pans_Gutenbergum_V0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 29.96
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Triangle104/Pans_Gutenbergum_V0.1
          name: Open LLM Leaderboard

This merge seems to work a bit better than my previous attempts. Sample output:

USER: Write a story about a Hobbit called Bilbo Baggins.

ASSISTANT:

Bildo Baggins lived in the quiet Shire village of Hobbiton. He was known far and wide for his love of the simple things in life - good food, fine ale, and quality time spent with friends.

One day, as he sat by the fireside in the Green Dragon Inn, Bildo overheard a group of hobbits discussing their latest hobby: collecting unique and interesting names. Intrigued, Bildo decided to embark on this new pastime himself.

Over the next few weeks, Bildo made it his mission to meet as many interesting characters as possible. He traveled far and wide across Middle-earth, chatting with elves, dwarves, men, and even a few peculiar creatures along the way. Each encounter yielded a new name for his growing collection.

As word of Bildo's quest spread, hobbits from all over began to seek him out, eager to share their own unusual names or to hear about the ones he had gathered. Before long, Bildo had amassed an impressive list - from the whimsical (Pippa Pincherly) to the downright peculiar (Thorn Toecracker).

But Bildo's true passion was not just collecting names, but understanding the stories behind them. He would spend hours listening intently as his new acquaintances recounted tales of adventure and misadventure, weaving a tapestry of Middle-earth's history through the lens of its inhabitants' names.

As time went on, Bildo became something of a local celebrity in Hobbiton. Hobbits would gather at his doorstep to hear him recount stories and share the names he had collected. And so, Bildo Baggins found his true calling - not as an adventurer or scholar, but as a keeper of Middle-earth's oral history, one name at a time.

And that, my dear reader, is the tale of how Bildo Baggins became known far and wide for his love of unusual names and the stories behind them. May your own journeys be filled with wonder and curiosity, just like those of our friend Bildo!


merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged with the TIES merge method, using anthracite-org/magnum-v4-12b as the base.
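For readers unfamiliar with TIES, here is a rough sketch of the idea on toy parameter lists: take each model's difference from the base, keep only the largest-magnitude fraction of entries (`density`), pick a sign per parameter from the weighted sum, and add up only the entries that agree with that sign. This is a simplified illustration, not mergekit's actual implementation; the function names `trim` and `ties_merge` and the toy numbers are made up for the example, and only the `density`, `weight`, and `normalize` values mirror this card's configuration.

```python
def trim(delta, density):
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = int(round(density * len(delta)))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, models, densities, weights, normalize=False):
    """Toy TIES merge over flat lists of floats (hypothetical helper, not a mergekit API)."""
    # 1. Task vectors: how each fine-tuned model differs from the base.
    deltas = [[m - b for m, b in zip(model, base)] for model in models]
    # 2. Trim each task vector, then scale it by its weight.
    weighted = [
        [w * x for x in trim(delta, dens)]
        for delta, dens, w in zip(deltas, densities, weights)
    ]
    merged = []
    for j, b in enumerate(base):
        column = [wt[j] for wt in weighted]
        # 3. Elect a sign from the weighted sum (a zero sum defaults to positive here).
        sign = 1.0 if sum(column) >= 0 else -1.0
        # 4. Keep only the contributions that agree with the elected sign.
        agree = [(x, w) for x, w in zip(column, weights) if x * sign > 0]
        update = sum(x for x, _ in agree)
        if normalize and agree:
            # Optional rescaling by the total weight of the agreeing models.
            update /= sum(w for _, w in agree)
        merged.append(b + update)
    return merged


# Toy example: a 4-parameter "base" and two "fine-tuned" models (invented values).
base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.2, 0.6, 0.1]
model_b = [-0.4, 0.8, 0.2, -0.1]

merged = ties_merge(
    base,
    [model_a, model_b],
    densities=[0.5, 0.5],  # same density as in this card's config
    weights=[0.5, 0.5],    # same weight as in this card's config
    normalize=False,       # same as in this card's config
)
print(merged)
```

In the first parameter the two models disagree in sign, so only the contribution matching the elected (positive) sign survives. That is the part of TIES meant to reduce interference between merged models.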

Models Merged

The following models were included in the merge: Gryphe/Pantheon-RP-1.6.1-12b-Nemo and nbeerbower/mistral-nemo-gutenberg-12B-v4.

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: anthracite-org/magnum-v4-12b
    #no parameters necessary for base model
  - model: Gryphe/Pantheon-RP-1.6.1-12b-Nemo
    parameters:
      density: 0.5
      weight: 0.5
  - model: nbeerbower/mistral-nemo-gutenberg-12B-v4
    parameters:
      density: 0.5
      weight: 0.5

merge_method: ties
base_model: anthracite-org/magnum-v4-12b
parameters:
  normalize: false
  int8_mask: true
dtype: float16

Open LLM Leaderboard Evaluation Results

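Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Triangle104/Pans_Gutenbergum_V0.1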

| Metric              | Value |
|---------------------|------:|
| Avg.                | 22.23 |
| IFEval (0-Shot)     | 30.97 |
| BBH (3-Shot)        | 36.08 |
| MATH Lvl 5 (4-Shot) | 10.27 |
| GPQA (0-shot)       |  9.73 |
| MuSR (0-shot)       | 16.33 |
| MMLU-PRO (5-shot)   | 29.96 |
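As a quick sanity check on the Avg. row, the mean of the six scores listed above can be recomputed. This sketch assumes Avg. is a plain unweighted mean of the six benchmarks; the variable names are illustrative only. With the rounded values shown here the mean comes out slightly below the reported 22.23, presumably because the leaderboard averages unrounded scores.

```python
# Scores copied from the results listed above (already rounded to two decimals).
scores = {
    "IFEval (0-Shot)": 30.97,
    "BBH (3-Shot)": 36.08,
    "MATH Lvl 5 (4-Shot)": 10.27,
    "GPQA (0-shot)": 9.73,
    "MuSR (0-shot)": 16.33,
    "MMLU-PRO (5-shot)": 29.96,
}

# Assumption: the reported average is an unweighted mean over the six benchmarks.
mean_score = sum(scores.values()) / len(scores)
reported_avg = 22.23

print(f"mean of rounded scores: {mean_score:.4f}")
print(f"difference from reported Avg.: {reported_avg - mean_score:.4f}")
```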