---
license: openrail
datasets:
- teknium/OpenHermes-2.5
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# Model Card for neoncortex/mini-mistral-openhermes-2.5-chatml-test

A tiny Mistral model trained as an experiment on teknium/OpenHermes-2.5.
## Model Details

A 63M-parameter auto-regressive LM using the Mistral architecture as a base, with a few tweaks (see the config sketch after this list):

- Multi-query Attention instead of Grouped-query Attention.
- Sliding window attention is disabled.
- Modified ChatML instead of the Mistral chat template - TL;DR: I used '<|im_start|>human' instead of '<|im_start|>user'.
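As a rough illustration, here's how the first two tweaks look in a `transformers` config. This is a sketch, not the shipped config: the dimensions below are placeholder assumptions sized vaguely toward 63M parameters, and only the MQA and sliding-window settings are the point.

```
from transformers import MistralConfig

# Sketch only: sizes are assumptions, not the real 63M config.
config = MistralConfig(
    hidden_size=512,            # assumption
    intermediate_size=1536,     # assumption
    num_hidden_layers=8,        # assumption
    num_attention_heads=8,      # assumption
    num_key_value_heads=1,      # one KV head = multi-query attention
    sliding_window=None,        # sliding window attention disabled
)
```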
### Model Description

Just doing it to see what happens.

It'll take about 40 to 45 hours to train on two Nvidia RTX 3060 12GB cards.

It uses ChatML for the chat template, but I messed up the template in the dataset, using '<|im_start|>human' instead of '<|im_start|>user'. ¯\_(ツ)_/¯

So, here's the bits:
```
{%- set ns = namespace(found=false) -%}
{#- First pass: note whether a system message exists. (ns.found is set but never used below.) -#}
{%- for message in messages -%}
    {%- if message['role'] == 'system' -%}
        {%- set ns.found = true -%}
    {%- endif -%}
{%- endfor -%}
{#- Second pass: render each turn in the modified ChatML format. -#}
{%- for message in messages %}
    {%- if message['role'] == 'system' -%}
        {{- '<|im_start|>system\n' + message['content'].rstrip() + '<|im_end|>\n' -}}
    {%- elif message['role'] == 'human' -%}
        {{- '<|im_start|>human\n' + message['content'].rstrip() + '<|im_end|>\n' -}}
    {%- else -%}
        {{- '<|im_start|>assistant\n' + message['content'] + '<|im_end|>\n' -}}
    {%- endif -%}
{%- endfor -%}
{#- Open an assistant turn for the model to complete. -#}
{%- if add_generation_prompt -%}
    {{- '<|im_start|>assistant\n' -}}
{%- endif -%}
```
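If the repo's tokenizer ships this template, applying it should look something like the following. This is a quick sketch; the exact rendered output depends on the tokenizer config.

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "neoncortex/mini-mistral-openhermes-2.5-chatml-test"
)

# Note the 'human' role, matching the modified template above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "human", "content": "Why is the sky blue?"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```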
- **Developed by:** RoboApocalypse
- **Funded by:** RoboApocalypse
- **Shared by:** RoboApocalypse
- **Model type:** Mistral
- **Language(s) (NLP):** English, mostly; the dataset may contain bits of other languages.
- **License:** OpenRAIL
### Model Sources

Exclusively available right here on Hugging Face!

- **Repository:** https://huggingface.co/neoncortex/mini-mistral-openhermes-2.5-chatml-test
## Uses

None.

### Out-of-Scope Use

This model probably won't work well for much of anything.
#### Preprocessing

Format the OpenHermes 2.5 dataset with the modified ChatML template described above.
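Something along these lines would do it. This is a hedged sketch: it assumes the dataset's ShareGPT-style `conversations` field with `from`/`value` keys, and maps `gpt` to `assistant` while deliberately keeping `human` as `human`.

```
from datasets import load_dataset

# Assumed role mapping; 'human' is intentionally NOT renamed to 'user'.
ROLE_MAP = {"system": "system", "human": "human", "gpt": "assistant"}

def to_chatml(example):
    text = ""
    for turn in example["conversations"]:  # assumed field layout
        role = ROLE_MAP.get(turn["from"], "assistant")
        text += f"<|im_start|>{role}\n{turn['value'].rstrip()}<|im_end|>\n"
    return {"text": text}

dataset = load_dataset("teknium/OpenHermes-2.5", split="train")
dataset = dataset.map(to_chatml)
```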
#### Training Hyperparameters

- **Training regime:** bf16 mixed precision (see the sketch below)
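In `transformers` terms that boils down to one flag. Everything else in this sketch is a placeholder assumption, since the actual hyperparameters aren't recorded in this card:

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mini-mistral-openhermes-2.5-chatml-test",
    bf16=True,                      # the one setting stated above
    per_device_train_batch_size=8,  # assumption
    num_train_epochs=1,             # assumption
)
# For the two-GPU setup, launched with something like:
#   torchrun --nproc_per_node=2 train.py
```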
## Evaluation

I tried to run evals but the eval suite just laughed at me.

## Model Examination

Don't be rude.
## Environmental Impact

- **Hardware Type:** 2 x NVIDIA RTX 3060 12GB
- **Hours used:** ~45 hours x 2 GPUs (~90 GPU-hours)
- **Carbon Emitted:** [TBA]
### Compute Infrastructure

I trained it on my PC with the side panel off because I like to watch the GPUs do their work.

#### Hardware

2 x NVIDIA RTX 3060 12GB
#### Software

The wonderful free stuff from [Hugging Face](https://huggingface.co): transformers, datasets, and trl.