Update README.md
README.md
CHANGED
@@ -27,21 +27,21 @@ NexoNimbus-MoE-2x7B is the 10th best-performing 13B LLM on the Open LLM Leaderboard

| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
-|arc_challenge| 0|acc |
-| | |acc_norm|
-|hellaswag | 0|acc |
-| | |acc_norm|
-|gsm8k | 0|acc |
-|winogrande | 0|acc |
-|mmlu | 0|acc |64.
+|arc_challenge| 0|acc |62.28|± | 1.41|
+| | |acc_norm|66.80|± | 1.37|
+|hellaswag | 0|acc |66.83|± | 0.46|
+| | |acc_norm|85.66|± | 0.34|
+|gsm8k | 0|acc |53.52|± | 1.37|
+|winogrande | 0|acc |81.53|± | 1.09|
+|mmlu | 0|acc |64.51|± | 1.00|

-Average:
+Average: 67.51%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
-|truthfulqa_mc| 1|mc1 |
-| | |mc2 |
+|truthfulqa_mc| 1|mc1 |35.98|± | 1.68|
+| | |mc2 |53.05|± | 1.53|


## 🧩 Configuration
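As a quick sanity check on the new figure (assuming, as the Open LLM Leaderboard does, that the average is the mean of the six headline metrics: ARC acc_norm, HellaSwag acc_norm, MMLU acc, TruthfulQA mc2, Winogrande acc, and GSM8K acc):

```python
# Sanity check: mean of the six Open LLM Leaderboard metrics above.
# Which metrics feed the average is an assumption, but the arithmetic
# reproduces the 67.51% reported in the diff.
scores = [66.80, 85.66, 64.51, 53.05, 81.53, 53.52]
print(f"{sum(scores) / len(scores):.2f}%")  # -> 67.51%
```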
@@ -93,7 +93,7 @@ experts:

## 💻 Usage

-Here's a [Colab notebook](https://colab.research.google.com/drive/
+Here's a [Colab notebook](https://colab.research.google.com/drive/1B1Q7vO95cDkEJbKIPhOWr6exB9-Q_lr-?usp=sharing) to run NexoNimbus-MoE-2x7B in 4-bit precision on a free T4 GPU.

```python
!pip install -qU transformers bitsandbytes accelerate
@@ -111,7 +111,7 @@ pipeline = transformers.pipeline(
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

-messages = [{"role": "user", "content": "Explain what
+messages = [{"role": "user", "content": "Explain what is machine learning."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
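Since the diff elides README lines 100-110, here is a self-contained sketch of the full updated snippet. The imports, the Hugging Face repo id, and the pipeline construction are assumptions based on the standard transformers text-generation pattern; only the lines visible in the diff are confirmed.

```python
# Hedged reconstruction of the full usage snippet. The model id and
# the pipeline setup are assumptions; the diff only shows the lines around them.
import torch
import transformers

model = "abideen/NexoNimbus-MoE-2x7B"  # assumed Hugging Face repo id

# 4-bit loading via bitsandbytes is why the README installs it above.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Build a chat-formatted prompt and generate, as in the diff.
messages = [{"role": "user", "content": "Explain what is machine learning."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```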