---
language:
- en
library_name: transformers
pipeline_tag: text-generation
datasets:
- jondurbin/airoboros-2.2.1
- Open-Orca/OpenOrca
- garage-bAInd/Open-Platypus
- ehartford/samantha-data
- stingning/ultrachat
tags:
- llama-2
- code
license: llama2
model-index:
- name: SpeechlessCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 34.146
      verified: false
---

<p><h1> speechless-mistral-six-in-one-7b </h1></p>

# JUST for TEST!

This model is a merge of 6 SOTA Mistral-7B based models:

- ehartford/dolphin-2.1-mistral-7b
- Open-Orca/Mistral-7B-OpenOrca
- bhenrym14/mistral-7b-platypus-fp16
- ehartford/samantha-1.2-mistral-7b
- teknium/CollectiveCognition-v1.1-Mistral-7B
  - dataset: CollectiveCognition/chats-data-2023-09-27
- HuggingFaceH4/zephyr-7b-alpha

## HumanEval

| Metric | Value |
| --- | --- |
| humaneval-python | |

[Big Code Models Leaderboard](https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard)

CodeLlama-34B-Python: 53.29

CodeLlama-34B-Instruct: 50.79

CodeLlama-13B-Instruct: 50.6

CodeLlama-34B: 45.11

CodeLlama-13B-Python: 42.89

CodeLlama-13B: 35.07

Mistral-7B-v0.1: 30.488
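
For readers unfamiliar with the metric, the scores above are HumanEval pass@1 percentages. The snippet below is a minimal sketch of the commonly used unbiased pass@k estimator, `1 - C(n - c, k) / C(n, k)`, where `n` samples are generated per problem and `c` of them pass the unit tests. The function names and the sample counts in the demo are illustrative assumptions; this card does not state which sampling settings produced the numbers listed here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the tests
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # C(n - c, k) / C(n, k) is the probability that a random subset of k
    # samples contains no passing sample. math.comb returns 0 when
    # k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a percentage.

    results holds one (n, c) pair per problem.
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical run: three problems, ten samples each.
    print(benchmark_score([(10, 10), (10, 5), (10, 0)], k=1))
```

With a single greedy sample per problem (`n = 1`, `k = 1`), the estimator reduces to the fraction of problems solved.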

## LM-Evaluation-Harness

[Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Metric | Value |
| --- | --- |
| ARC | |
| HellaSwag | |
| MMLU | |
| TruthfulQA | |
| Average | |

# Model Card for Mistral-7B-v0.1

The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters.
Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.

For full details of this model please read our [paper](https://arxiv.org/abs/2310.06825) and [release blog post](https://mistral.ai/news/announcing-mistral-7b/).

## Model Architecture

Mistral-7B-v0.1 is a transformer model, with the following architecture choices:

- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer
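
To make the two attention choices listed above more concrete, here is a small pure-Python sketch. It builds a causal sliding-window mask and maps query heads to shared key/value heads, which is the core idea of grouped-query attention. The function names, the toy sizes, and the convention that the window counts the current token are assumptions made for illustration; they are not taken from the model's configuration or from any library API.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j], True when query position i may attend to key j.

    Assumed convention: a token sees itself and the (window - 1) tokens
    immediately before it. Implementations may count the window differently.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("require seq_len >= 0 and window >= 1")
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(
    query_head: int, num_query_heads: int, num_kv_heads: int
) -> int:
    """Grouped-query attention: consecutive query heads share one KV head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    if not 0 <= query_head < num_query_heads:
        raise ValueError("query_head out of range")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    # Toy sizes chosen for readability, not the model's real dimensions.
    for row in sliding_window_causal_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

The mask keeps attention cost per token bounded by the window size, and sharing key/value heads across a group of query heads shrinks the key/value cache.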

## Troubleshooting

- If you see the following error:
```
KeyError: 'mistral'
```
- Or:
```
NotImplementedError: Cannot copy out of meta tensor; no data!
```

Ensure you are using a stable release of Transformers, version 4.34.0 or newer.
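
One way to check this requirement before loading the model is sketched below. It reads the installed version with the standard library's `importlib.metadata` and compares it against 4.34.0. The helper names are hypothetical, and the rule that any version string with a suffix (such as `.dev0` or `rc1`) counts as not stable is a simplifying assumption of this sketch.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (4, 34, 0)
_RELEASE = r"\d+(?:\.\d+)*"


def release_tuple(version_string: str) -> tuple[int, ...]:
    """Leading numeric components, e.g. '4.34.0.dev0' -> (4, 34, 0)."""
    match = re.match(_RELEASE, version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_supported(version_string: str, minimum: tuple[int, ...] = MINIMUM) -> bool:
    """True for a plain release number that is at least `minimum`.

    Assumption: anything after the numeric part (dev, rc, local tags)
    marks the version as not stable, so it is rejected.
    """
    if re.fullmatch(_RELEASE, version_string) is None:
        return False
    parts = release_tuple(version_string)
    # Pad with zeros so that '4.34' compares equal to (4, 34, 0).
    width = max(len(parts), len(minimum))
    padded = parts + (0,) * (width - len(parts))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded >= padded_minimum


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if is_supported(installed) else "upgrade needed"
        print(f"transformers {installed}: {status}")
```

If the check reports that an upgrade is needed, `pip install --upgrade transformers` installs the latest release.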

## Notice

Mistral 7B is a pretrained base model and therefore does not have any moderation mechanisms.

## The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.