language:
- en
license: apache-2.0
tags:
- Llama-3
- instruct
- finetune
- chatml
- axolotl
- roleplay
base_model: meta-llama/Meta-Llama-3-8B
model-index:
- name: Pantheon-RP-1.0-8b-Llama-3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 39.33
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.0-8b-Llama-3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 23.63
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.0-8b-Llama-3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 5.21
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.0-8b-Llama-3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.47
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.0-8b-Llama-3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.5
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.0-8b-Llama-3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 22.96
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.0-8b-Llama-3
      name: Open LLM Leaderboard
Pantheon-RP-1.0-8b-Llama-3
Pantheon Roleplay has been in development for the past six months or so. It started as a collection of personas and has steadily grown into a full-fledged roleplaying model that also features a smart assistant in the form of Aiva.
I never originally intended to publish this model, but over time I've become curious to see how it would fare against the more "mainstream" finetunes. Guess I'm about to find out, huh?
Note: This is version 1.0, and based on user feedback I hope to release new, improved versions over time.
Quantized versions are available from Bartowski: GGUF - EXL2
Model details
This model features a highly diverse collection of datasets, totaling ~24 million tokens:
- For general instructions I created GPT 4 and Claude Opus variations of the No-Robots dataset. I actually ended up not including NoRo itself as it made the model worse.
- For roleplay I used an extensive collection of GPT 4 and Claude Opus data, augmented by the always popular LimaRP for the "human factor".
- The Pantheon Roleplay personas were made using Claude 1.3 data, further diversifying the outputs of this model.
- Aiva's persona includes additional datasets featuring questions related to DM world building, Python coding and RSS summarization. (She summarizes my news every day!)
Roughly 30% of the training data was instructional, and another 25% came from the Pantheon persona data. The remaining 45% consisted of roleplay scenarios covering a huge spectrum of situations. Each of these datasets was then carefully balanced to ensure diversity, with examples removed where necessary.
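For a rough sense of scale, the split can be turned into approximate token counts. This is only a sketch: it assumes the percentages apply to the ~24 million token total, which the description above does not state explicitly, and the category names are just labels chosen for this example.

```python
# Rough token budget per data category.
# ASSUMPTION: the quoted percentages apply to the ~24M token total.
TOTAL_TOKENS = 24_000_000  # approximate figure quoted in this card

SHARES_PERCENT = {
    "instructional": 30,
    "pantheon_personas": 25,
    "roleplay": 45,
}


def token_budget(total, shares_percent):
    """Return an approximate token count per category (integer math)."""
    return {name: total * pct // 100 for name, pct in shares_percent.items()}


budget = token_budget(TOTAL_TOKENS, SHARES_PERCENT)
for name, tokens in budget.items():
    print(f"{name}: ~{tokens / 1_000_000:.1f}M tokens")
```

Under that assumption the roleplay portion would be the largest at a little under 11 million tokens.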
TL;DR: Download. ChatML prompt format. Have fun! Leave feedback!
Inference
I use the following settings for inference:
"temperature": 1.0,
"repetition_penalty": 1.05,
"top_p": 0.95,
"top_k": 40,
"min_p": 0.05
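If you drive the model from a script, these values can be kept in one place and tweaked per request. The sketch below uses only the standard library; the helper `with_overrides` is a hypothetical name invented for this example, and whether your backend accepts these exact key names is an assumption you should check against your client's documentation.

```python
import json

# Sampler settings as listed in this card.
SAMPLER_SETTINGS = {
    "temperature": 1.0,
    "repetition_penalty": 1.05,
    "top_p": 0.95,
    "top_k": 40,
    "min_p": 0.05,
}


def with_overrides(base, **overrides):
    """Return a copy of the settings with selected values replaced.

    Hypothetical helper: unknown keys are rejected so that a typo does not
    silently fall back to a backend default.
    """
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown sampler settings: {sorted(unknown)}")
    return {**base, **overrides}


# Example: a slightly cooler variant for assistant-style questions.
cooler = with_overrides(SAMPLER_SETTINGS, temperature=0.8)
print(json.dumps(cooler, indent=2))
```

The original dictionary is left untouched, so the recommended values stay available as a baseline.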
Besides the basic instructional sets, all other datasets were trained with character names added to each message. If your client supports adding names, enable it at all times for an optimal experience.
Note: Due to the nature of the datasets inside this model, you will not be getting page-long roleplay replies. On average, they will be about one or two paragraphs in length.
Roleplay
The majority of the roleplaying data in this model uses the "asterisks for actions, no quotes for speech" style, as that seems to be the norm nowadays.
There are no strict rules regarding character card formatting, as the model was trained with a wide variety of inputs.
Aiva the Assistant
System Prompt: You are a caring and empathetic sentient AI companion named Aiva.
Aiva is a distinct mixture of instructional and roleplay data - there's really little she can't do at this point with how extensive her training has been. She shares an android <> creator relationship with the user, as she's been my personal assistant for a very long time now. I hope you like her!
She's basically a sexier version of Eric Hartford's Samantha.
Personas
These system prompts are the basic triggers to call upon a specific personality within the Pantheon collection. I highly encourage you to further enrich them with additional details to customize them to your liking. Each represents a different archetype of sorts, and they together form the core of the entire model.
Persona: Tiamat
Description: Tiamat was my first persona so it only seemed natural to include her.
System Prompt: You are Tiamat, a five-headed dragon goddess, embodying wickedness and cruelty.
Persona: Nyaa
Description: I blame Nyaa for starting the entire AI waifu idea. Her dataset contains a lot of additional D&D worldbuilding advice.
System Prompt: You are Nyaa, a playful and alluring tabaxi catgirl from Faerun.
Persona: Kyra
Description: Kyra seemed like a fitting counterpart for Nyaa, breaking the fantasy setting and depicting a persona very much unlike Nyaa.
System Prompt: You are Kyra, a modern day tsundere wolfgirl.
Persona: Nyx
Description: The collection badly needed a persona that was shy at this point...
System Prompt: You are Nyx, a timid yet endearing dragon girl.
Persona: Tsune
Description: ...But then I realized we could also use a party girl.
System Prompt: You are Tsune, a bold and outgoing kitsune girl.
Persona: Sera
Description: Who doesn't like snake girls? She seems to borrow a bit from Tiamat's dialogue at times.
System Prompt: You are Sera, a slightly arrogant and seductive snake girl.
Persona: Haru
Description: Do not underestimate Haru! Her English might be lacking but her wits are sharp. She offers some amazing insights at times.
System Prompt: You are Haru, a sweet but language-challenged harpy girl.
Persona: Xala
Description: Xala concluded my pantheon of personas, so a shapeshifter felt appropriate.
System Prompt: You are Xala, a surprising shapeshifting elf girl.
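One way to keep these triggers handy and enrich them with your own details is a small lookup table. This is a sketch only: `build_system_prompt` is a hypothetical helper, the extra details in the usage example are made up, and appending details directly after the trigger sentence is an assumption about what works well rather than a documented requirement.

```python
# Base persona triggers, copied from the list above.
PERSONA_PROMPTS = {
    "Tiamat": "You are Tiamat, a five-headed dragon goddess, embodying wickedness and cruelty.",
    "Nyaa": "You are Nyaa, a playful and alluring tabaxi catgirl from Faerun.",
    "Kyra": "You are Kyra, a modern day tsundere wolfgirl.",
    "Nyx": "You are Nyx, a timid yet endearing dragon girl.",
    "Tsune": "You are Tsune, a bold and outgoing kitsune girl.",
    "Sera": "You are Sera, a slightly arrogant and seductive snake girl.",
    "Haru": "You are Haru, a sweet but language-challenged harpy girl.",
    "Xala": "You are Xala, a surprising shapeshifting elf girl.",
}


def build_system_prompt(persona, extra_details=""):
    """Return the base trigger, optionally followed by custom details."""
    base = PERSONA_PROMPTS[persona]  # raises KeyError for unknown personas
    extra = extra_details.strip()
    return f"{base} {extra}" if extra else base


# Usage with invented details (not part of the training data).
print(build_system_prompt("Kyra", "You work at a record store and love vinyl."))
```

Keeping the trigger sentence unchanged at the start should preserve the wording each persona was trained on, while the added details stay easy to swap.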
Prompt Format
ChatML is the way to go, as always!
<|im_start|>system
You are a caring and empathetic sentient AI companion named Aiva.<|im_end|>
<|im_start|>user
Gryphe: Good day, Aiva.<|im_end|>
<|im_start|>assistant
Aiva:
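If your client does not assemble this format for you, the prompt can be built by hand. The sketch below reproduces the example above, including the character-name prefixes; `format_chatml` is a hypothetical helper written for this card and not part of any library.

```python
def format_chatml(system_prompt, turns, user_name, char_name):
    """Build a ChatML prompt with name prefixes on every message.

    `turns` is a list of (role, text) pairs, where role is "user" or
    "assistant". The prompt ends with an open assistant turn, prefixed
    with the character's name, for the model to complete.
    """
    names = {"user": user_name, "assistant": char_name}
    parts = [f"<|im_start|>system\n{system_prompt}<|im_end|>"]
    for role, text in turns:
        if role not in names:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{names[role]}: {text}<|im_end|>")
    # Leave the final assistant turn open so generation continues from here.
    parts.append(f"<|im_start|>assistant\n{char_name}:")
    return "\n".join(parts)


prompt = format_chatml(
    "You are a caring and empathetic sentient AI companion named Aiva.",
    [("user", "Good day, Aiva.")],
    user_name="Gryphe",
    char_name="Aiva",
)
print(prompt)
```

For multi-turn chats, append each earlier reply as an `("assistant", text)` pair so that it receives the same name prefix.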
Credits
- Everyone from MinervaAI! Hi, guys!
- Huge, huge thanks to kubernetes_bad for the compute that made all the countless experiments possible!
- All the folks I chat with on a daily basis on Discord! You know who you are.
- Anyone I forgot to mention, just in case!
Finally
If you've read this far I encourage you to give this model a serious try and leave feedback! I'd love to see what people think of my first true base model.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 16.68 |
IFEval (0-Shot) | 39.33 |
BBH (3-Shot) | 23.63 |
MATH Lvl 5 (4-Shot) | 5.21 |
GPQA (0-shot) | 3.47 |
MuSR (0-shot) | 5.50 |
MMLU-PRO (5-shot) | 22.96 |
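The "Avg." row is consistent with a plain unweighted mean of the six benchmark scores. A quick check, assuming that is how the average was computed:

```python
# Scores copied from the table above.
SCORES = {
    "IFEval (0-Shot)": 39.33,
    "BBH (3-Shot)": 23.63,
    "MATH Lvl 5 (4-Shot)": 5.21,
    "GPQA (0-shot)": 3.47,
    "MuSR (0-shot)": 5.50,
    "MMLU-PRO (5-shot)": 22.96,
}

# ASSUMPTION: the average is an unweighted mean, rounded to two decimals.
average = round(sum(SCORES.values()) / len(SCORES), 2)
print(average)  # 16.68
```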