Meta-Llama-3-120B-Instruct
Meta-Llama-3-120B-Instruct is a self-merge of meta-llama/Meta-Llama-3-70B-Instruct.
It was inspired by large merges like:
- alpindale/goliath-120b
- nsfwthrowitaway69/Venus-120b-v1.0
- cognitivecomputations/MegaDolphin-120b
- wolfram/miquliz-120b-v2.0
Applications
I recommend using this model for creative writing. It uses the Llama 3 chat template and has a default context window of 8K tokens, which can be extended by increasing the RoPE theta (rope_theta).
Check the examples in the evaluation section to get an idea of its performance.
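To make the rope theta remark concrete, here is a minimal sketch of how the RoPE base changes the rotation wavelengths. It assumes a head dimension of 128 and a default base of 500,000 for Llama 3 70B; the larger base of 2,000,000 is purely hypothetical, and none of these values are stated in this card. The sketch shows only the arithmetic and says nothing about output quality at longer contexts.

```python
import math


def rope_wavelengths(theta: float, head_dim: int = 128) -> list[float]:
    """Wavelength, in tokens, of each rotary frequency pair for a given base."""
    # RoPE rotates pair i by position * theta ** (-2i / head_dim), so one full
    # turn takes 2 * pi * theta ** (2i / head_dim) positions.
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


base = rope_wavelengths(500_000.0)      # assumed Llama 3 default rope_theta
scaled = rope_wavelengths(2_000_000.0)  # hypothetical larger base

print(f"slowest pair, theta=5e5: {base[-1]:,.0f} tokens per rotation")
print(f"slowest pair, theta=2e6: {scaled[-1]:,.0f} tokens per rotation")
```

A larger base stretches the slow-rotating pairs, so positions beyond the original window map to angles closer to those seen during training. Whether the model stays coherent at a given length still has to be checked empirically.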
Quantized models
Thanks to Eric Hartford, elinas, and the mlx-community for providing these models.
- GGUF: https://huggingface.co/cognitivecomputations/Meta-Llama-3-120B-Instruct-gguf
- EXL2: https://huggingface.co/elinas/Meta-Llama-3-120B-Instruct-4.0bpw-exl2
- mlx: https://huggingface.co/mlx-community/Meta-Llama-3-120B-Instruct-4bit
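As a rough guide to why the quantized builds matter, the sketch below estimates the size of the weights alone at different precisions. The parameter count is the nominal 120B from the model name, not a measured figure, and the helper weight_size_gb is hypothetical. Real files also carry quantization scales and metadata, and inference needs extra memory for the KV cache and activations.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 120e9  # nominal count taken from the model name; an approximation

for label, bpw in [("float16", 16.0), ("4.0 bpw", 4.0)]:
    print(f"{label:>8}: ~{weight_size_gb(N_PARAMS, bpw):.0f} GB")
```

Under these assumptions the float16 weights come to roughly 240 GB and a 4.0 bpw quantization to roughly 60 GB.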
Evaluation
The model looks excellent for creative writing tasks, outperforming GPT-4 in the informal comparisons linked below. Thanks again to Eric Hartford for noticing this.
- X thread by Eric Hartford (creative writing): https://twitter.com/erhartford/status/1787050962114207886
- X thread by Daniel Kaiser (creative writing): https://twitter.com/spectate_or/status/1787257261309518101
- X thread by Simon (reasoning): https://twitter.com/NewDigitalEdu/status/1787403266894020893
- r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/1cl525q/goliath_lovers_where_is_the_feedback_about/
𧩠Configuration
slices:
- sources:
  - layer_range: [0, 20]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [10, 30]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [20, 40]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [30, 50]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [40, 60]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [50, 70]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [60, 80]
    model: meta-llama/Meta-Llama-3-70B-Instruct
merge_method: passthrough
dtype: float16
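The configuration stacks seven overlapping 20-layer slices of the same model. The sketch below works out what that implies for the merged layer stack. It assumes each layer_range is half-open (start included, end excluded) and that the passthrough method simply concatenates the slices in order; both are my reading of the config rather than something this card states.

```python
from collections import Counter

# Layer ranges copied from the configuration above.
LAYER_RANGES = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]

# Assumption: each range is half-open, so (start, end) selects end - start layers.
merged_layers = [src for start, end in LAYER_RANGES for src in range(start, end)]

# How often each source layer appears in the merged stack.
usage = Counter(merged_layers)

print(len(merged_layers))           # total decoder layers in the merge -> 140
print(merged_layers[15:25])         # source indices around the first slice boundary
print(sorted(set(usage.values())))  # distinct repeat counts per source layer
```

Under these assumptions the merge has 140 decoder layers, built from source layers 0 to 79. Source layers 10 to 69 each appear twice, while the first and last ten appear once.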
Usage
!pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch
model = "mlabonne/Meta-Llama-3-120B-Instruct"
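# A single-turn conversation in the chat-message format
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the conversation into a prompt string with the Llama 3 chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and spread it across the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion and print the prompt together with the generated text
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])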
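For readers curious about what apply_chat_template produces here, this is a hand-rolled approximation of the Llama 3 Instruct prompt layout. The function render_llama3_prompt is hypothetical and the special-token strings reflect my understanding of the published format. The template shipped with the tokenizer remains the authoritative version and should be used in practice.

```python
def render_llama3_prompt(messages: list[dict[str, str]]) -> str:
    """Approximate the Llama 3 Instruct prompt layout for a list of chat messages."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the trimmed content, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    # Counterpart of add_generation_prompt=True: open an assistant turn for the model.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_llama3_prompt(
    [{"role": "user", "content": "What is a large language model?"}]
)
print(prompt)
```

Seeing the layout spelled out helps when debugging backends that take a raw prompt string instead of a message list.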