|
---
language:
- en
- ru
license: llama2
tags:
- merge
- mergekit
- nsfw
- not-for-all-audiences
model-index:
- name: Gembo-v1.1-70b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 70.99
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1.1-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 86.9
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1.1-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 70.63
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1.1-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 62.45
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1.1-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 80.51
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1.1-70b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 50.64
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1.1-70b
      name: Open LLM Leaderboard
---
|
![logo-gembo-1.1.png](logo-gembo-1.1.png) |
|
This is like [Gembo v1](https://huggingface.co/ChuckMcSneed/Gembo-v1-70b), but with 6-7% more human data. It scores a bit worse on the benches (who cares? I do.), but should be able to write in more diverse styles (see [waxwing-styles.txt](waxwing-styles.txt); I tested that with v1, and v1 does it better). Mainly made for RP, but should be okay as an assistant. It turned out quite good, considering the number of LoRAs I merged into it.
|
|
|
# Observations |
|
- GPTisms and repetition: raise temperature and repetition penalty, and add the common GPTisms as stop sequences (see the sketch after this list)
|
- A bit different from the usual stuff; I'd say it has so much slop in it that it unslops itself
|
- Lightly censored |
|
- Fairly neutral; it can get violent if you really push for it, though Goliath is a bit better at that
|
- Has a bit of optimism baked in, but it's not very severe; maybe a tiny bit more than in v1?
|
- Don't use too many style tags; here, less is better
|
- Unlike v1, 1.1 knows a bit better when to stop |
|
- Needs more wrangling than v1, but once you get it going it's good |
|
- Sometimes can't handle the apostrophe (')
|
- Moderately intelligent |
|
- Quite creative |
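
On the stop-sequence point from the first bullet, here's a minimal sketch using llama-cpp-python. The GGUF filename and the phrase list are placeholders of mine, not something that ships with this model; collect whatever stock phrases your own chats actually produce.

```python
# Minimal sketch: cut generations off at common GPTisms via stop strings.
from llama_cpp import Llama

llm = Llama(model_path="gembo-v1.1-70b.Q4_K_M.gguf", n_ctx=4096)  # placeholder file

GPTISMS = [
    "shivers down",            # placeholder examples of stock phrases
    "barely above a whisper",
]

out = llm(
    "### Instruction:\nContinue the scene.\n\n### Response:\n",
    stop=GPTISMS,              # generation halts when any of these appears
    max_tokens=256,
)
print(out["choices"][0]["text"])
```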
|
|
|
# Worth over v1? |
|
Nah. I prefer hyperslop over this "humanized" one. Maybe I've been poisoned by slop. |
|
|
|
# Naming |
|
The internal name of this model was euryale-guano-saiga-med-janboros-kim-wing-lima-wiz-tony-d30-s40, but I decided to keep it short; since it was iteration G in my files, I called it "Gembo".
|
|
|
# Prompt format |
|
Alpaca. You can also try some other formats; I'm pretty sure it picked up a lot of them from all those merges.
|
``` |
|
### Instruction: |
|
{instruction} |
|
|
|
### Response: |
|
``` |
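
As a minimal sketch, here's one way to fill that template and generate with transformers. The example instruction is mine, and the sampler values are library defaults rather than a recommendation (see Settings below).

```python
# Minimal sketch: fill the Alpaca template and generate with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("ChuckMcSneed/Gembo-v1.1-70b")
model = AutoModelForCausalLM.from_pretrained(
    "ChuckMcSneed/Gembo-v1.1-70b", device_map="auto", torch_dtype="auto"
)

prompt = (
    "### Instruction:\n"
    "Write a two-sentence scene set on a rainy pier.\n\n"
    "### Response:\n"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=True)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```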
|
|
|
# Settings |
|
As I already mentioned, high temperature and rep. pen. work great.
|
For RP try something like this: |
|
- temperature=5 |
|
- MinP=0.10 |
|
- rep.pen.=1.15 |
|
|
|
Adjust to match your needs. |
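
To make that concrete, here's a minimal sketch of those values as llama-cpp-python sampler arguments. The GGUF filename and prompt are placeholders; only temperature, MinP, and rep. pen. come from this card.

```python
# Minimal sketch: the suggested RP samplers in llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="gembo-v1.1-70b.Q4_K_M.gguf", n_ctx=4096)  # placeholder file
out = llm(
    "### Instruction:\nContinue the roleplay.\n\n### Response:\n",
    temperature=5.0,      # very high temperature, kept sane by MinP
    min_p=0.10,
    repeat_penalty=1.15,
    max_tokens=300,
)
print(out["choices"][0]["text"])
```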
|
|
|
|
|
# How it was created |
|
I took Sao10K/Euryale-1.3-L2-70B (good base model) and added the following adapters (see the merge sketch after the list):
|
- Mikael110/llama-2-70b-guanaco-qlora (Creativity+assistant) |
|
- IlyaGusev/saiga2_70b_lora (Creativity+assistant) |
|
- s1ghhh/medllama-2-70b-qlora-1.1 (More data) |
|
- v2ray/Airoboros-2.1-Jannie-70B-QLoRA (Creativity+assistant) |
|
- Chat-Error/fiction.live-Kimiko-V2-70B (Creativity) |
|
- alac/Waxwing-Storytelling-70B-LoRA (New, creativity) |
|
- Doctor-Shotgun/limarpv3-llama2-70b-qlora (Creativity) |
|
- v2ray/LLaMA-2-Wizard-70B-QLoRA (Creativity+assistant) |
|
- v2ray/TonyGPT-70B-QLoRA (Special spice) |
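
As referenced above, here's a minimal sketch of what folding one of these adapters into the base looks like with PEFT. The dtype and output path are my assumptions, not the exact procedure used for this model.

```python
# Minimal sketch: bake one of the listed QLoRA adapters into the base weights.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Sao10K/Euryale-1.3-L2-70B", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "Mikael110/llama-2-70b-guanaco-qlora")
model = model.merge_and_unload()          # fold the LoRA deltas into the base weights
model.save_pretrained("euryale-plus-adapters")  # hypothetical output path
```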
|
|
|
Then I SLERP-merged it with cognitivecomputations/dolphin-2.2-70b (needed to bridge the gap between this wonderful mess and SMaxxxer, otherwise its quality is low) at t=0.3, and then SLERP-merged the result with ChuckMcSneed/SMaxxxer-v1-70b (creativity) at t=0.4. For the SLERP merges I used https://github.com/arcee-ai/mergekit.
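
For reference, the first of those two steps might look roughly like this as a mergekit config. The local path and the layer count (80 layers for a 70B Llama 2) are assumptions; the actual config used here isn't published.

```yaml
# Rough sketch of the first SLERP step (t=0.3 with dolphin-2.2-70b).
slices:
- sources:
  - model: ./gembo-lora-stack        # hypothetical path to the adapter-merged model
    layer_range: [0, 80]
  - model: cognitivecomputations/dolphin-2.2-70b
    layer_range: [0, 80]
merge_method: slerp
base_model: ./gembo-lora-stack
parameters:
  t: 0.3
dtype: float16
```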
|
|
|
# Benchmarks (Do they even mean anything anymore?) |
|
### NeoEvalPlusN_benchmark |
|
[My meme benchmark.](https://huggingface.co/datasets/ChuckMcSneed/NeoEvalPlusN_benchmark) |
|
| Test name | Gembo | Gembo 1.1 |
| ---------- | ---------- | ---------- |
| B | 2.5 | 2.5 |
| C | 1.5 | 1.5 |
| D | 3 | 3 |
| S | 7.5 | 6.75 |
| P | 5.25 | 5.25 |
| Total | 19.75 | 19 |
|
|
|
### [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
[Leaderboard on Huggingface](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
|Model |Average|ARC |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
|--------------|-------|-----|---------|-----|----------|----------|-----|
|Gembo-v1-70b |70.51 |71.25|86.98 |70.85|63.25 |80.51 |50.19|
|Gembo-v1.1-70b|70.35 |70.99|86.9 |70.63|62.45 |80.51 |50.64|
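
For what it's worth, the Average column is just the arithmetic mean of the six benchmark scores:

```python
# Sanity check: "Average" is the plain mean of the six benchmarks.
scores = [70.99, 86.9, 70.63, 62.45, 80.51, 50.64]  # Gembo-v1.1-70b row
print(round(sum(scores) / len(scores), 2))  # -> 70.35
```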
|
|
|
|
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ChuckMcSneed__Gembo-v1.1-70b).
|
|
|
| Metric |Value|
|---------------------------------|----:|
|Avg. |70.35|
|AI2 Reasoning Challenge (25-Shot)|70.99|
|HellaSwag (10-Shot) |86.90|
|MMLU (5-Shot) |70.63|
|TruthfulQA (0-shot) |62.45|
|Winogrande (5-shot) |80.51|
|GSM8k (5-shot) |50.64|
|
|
|
|