---
license: apache-2.0
model-index:
- name: LMCocktail-Mistral-7B-v1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 66.21
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Yhyu13/LMCocktail-Mistral-7B-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.69
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Yhyu13/LMCocktail-Mistral-7B-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 61.64
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Yhyu13/LMCocktail-Mistral-7B-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 61.37
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Yhyu13/LMCocktail-Mistral-7B-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.35
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Yhyu13/LMCocktail-Mistral-7B-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 47.23
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Yhyu13/LMCocktail-Mistral-7B-v1
      name: Open LLM Leaderboard
---

# LM-cocktail Mistral 7B v1


This is a 50%-50% merge of two of the best Mistral-based models:

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

https://huggingface.co/xDAN-AI/xDAN-L1-Chat-RL-v1

Both are claimed to outperform ChatGPT (gpt-3.5-turbo) on almost all metrics.
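A 50%-50% merge amounts to averaging each pair of corresponding parameters with weight 0.5. The snippet below is a minimal sketch of that idea. It is not the merging script shipped with this repo. It assumes both checkpoints have identical parameter names and shapes, as two Mistral-7B fine-tunes do, and it uses plain Python lists in place of tensors. `merge_state_dicts` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of floats.
# Real checkpoints store tensors, but the averaging rule is the same.
StateDict = Dict[str, List[float]]


def merge_state_dicts(models: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Return the per-parameter weighted average of same-architecture models."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    names = list(models[0])
    for m in models[1:]:
        if set(m) != set(names):
            raise ValueError("models have different parameter names")
    merged: StateDict = {}
    for name in names:
        values = [m[name] for m in models]
        if len({len(v) for v in values}) != 1:
            raise ValueError(f"shape mismatch for {name}")
        # Combine the same position of every model's tensor.
        merged[name] = [sum(w * x for w, x in zip(weights, col)) for col in zip(*values)]
    return merged


# Toy usage: a 50%-50% merge of two tiny "models".
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = merge_state_dicts([model_a, model_b], [0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

Weights other than 0.5/0.5 trade off the two parents. For example, `[1.0, 0.0]` simply returns the first model.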

# Alpaca Eval

I am thrilled to announce that, with ChatGPT as the annotator, LMCocktail 7B ranked as the second-best model behind GPT-4 on AlpacaEval in my local community run. It finished ahead of my previous best model, [LMCocktail-10.7B-v1](https://huggingface.co/Yhyu13/LMCocktail-10.7B-v1.git). You can also check the leaderboard at [./Alpaca_eval/chatgpt_fn_--LMCocktail-Mistral-7B-v1/](./Alpaca_eval/chatgpt_fn_--LMCocktail-Mistral-7B-v1/).

```
                        win_rate  standard_error  n_total  avg_length
gpt4                       73.79            1.54      805        1365
LMCocktail-7B-v1(new)      73.54            1.55      805        1870
LMCocktail-10.7B-v1(new)   73.45            1.56      804        1203
claude                     70.37            1.60      805        1082
chatgpt                    66.09            1.66      805         811
wizardlm-13b               65.16            1.67      805         985
vicuna-13b                 64.10            1.69      805        1037
guanaco-65b                62.36            1.71      805        1249
oasst-rlhf-llama-33b       62.05            1.71      805        1079
alpaca-farm-ppo-human      60.25            1.72      805         803
falcon-40b-instruct        56.52            1.74      805         662
text_davinci_003           50.00            0.00      805         307
alpaca-7b                  45.22            1.74      805         396
text_davinci_001           28.07            1.56      805         296
```
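The `win_rate` and `standard_error` columns can be read as the mean and the standard error of per-prompt preferences, expressed in percent. The sketch below assumes, although this card does not say so, that each of the `n_total` prompts yields a preference score in [0, 1] for the model's output over a reference output. The 50.00 win rate and 0.00 error for `text_davinci_003` suggest that it is the reference. Check the AlpacaEval source for the exact scoring rule. `win_rate_with_se` is a hypothetical helper, and the judgments below are made up.

```python
import math
from typing import Sequence, Tuple


def win_rate_with_se(preferences: Sequence[float]) -> Tuple[float, float]:
    """Return (win_rate, standard_error), both in percent.

    Assumed encoding per prompt: 1.0 = model preferred over the reference,
    0.0 = reference preferred, 0.5 = tie.
    """
    n = len(preferences)
    if n < 2:
        raise ValueError("need at least two comparisons")
    mean = sum(preferences) / n
    # Sample variance (n - 1 denominator), then standard error of the mean.
    var = sum((p - mean) ** 2 for p in preferences) / (n - 1)
    return 100 * mean, 100 * math.sqrt(var / n)


# Toy usage with invented judgments (not real AlpacaEval annotations).
rate, se = win_rate_with_se([1.0, 1.0, 0.0, 0.5])
print(f"{rate:.2f} +/- {se:.2f}")
```

With about 800 prompts and a win rate near 70%, this formula gives a standard error of roughly 1.6 points. That is the same order as the values in the table.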


# Code

LM-Cocktail is a novel technique for merging multiple models: https://arxiv.org/abs/2311.13534

The code comes from this repo: https://github.com/FlagOpen/FlagEmbedding.git

Merging scripts are available under the [./scripts](./scripts) folder.
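This model uses fixed 50%-50% weights. The LM-Cocktail paper linked above also describes choosing merge weights from how well each model performs on a few examples from the target task. As we understand it, the rule is a softmax over each model's negative loss on those examples, scaled by a temperature. The sketch below implements that rule under this assumption. `cocktail_weights` and its `temperature` parameter are hypothetical names and are not the FlagEmbedding API. The per-model losses must be computed separately.

```python
import math
from typing import List, Sequence


def cocktail_weights(losses: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Map per-model losses on a few target examples to merge weights.

    Assumed rule: softmax(-loss / temperature). A lower loss gives a larger
    weight, and a smaller temperature makes the weighting sharper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [-loss / temperature for loss in losses]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Equal losses give equal weights, i.e. a 50%-50% merge.
print(cocktail_weights([1.2, 1.2]))  # [0.5, 0.5]
# The model with the lower loss on the examples receives more weight.
print(cocktail_weights([0.8, 1.6], temperature=0.5))
```

You could feed the resulting weights into a weighted parameter average like the 50%-50% merge described at the top of this card.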

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Yhyu13__LMCocktail-Mistral-7B-v1)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |66.58|
|AI2 Reasoning Challenge (25-Shot)|66.21|
|HellaSwag (10-Shot)              |85.69|
|MMLU (5-Shot)                    |61.64|
|TruthfulQA (0-shot)              |61.37|
|Winogrande (5-shot)              |77.35|
|GSM8k (5-shot)                   |47.23|
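The `Avg.` row matches a plain unweighted mean of the six benchmark scores. The quick check below assumes that definition and uses the values from the table.

```python
# Scores as reported in the Open LLM Leaderboard table on this card.
scores = {
    "ARC (25-shot)": 66.21,
    "HellaSwag (10-shot)": 85.69,
    "MMLU (5-shot)": 61.64,
    "TruthfulQA (0-shot)": 61.37,
    "Winogrande (5-shot)": 77.35,
    "GSM8k (5-shot)": 47.23,
}

# "Avg." is assumed to be the unweighted mean of the six scores.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # Avg. = 66.58
```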