neoncortex/mini-mistral-openhermes-2.5-chatml-test

@@ -9,35 +9,67 @@ pipeline_tag: text-generation
 ---
 # Model Card for neoncortex/mini-mistral-openhermes-2.5-chatml-test
-A tiny Mistral model trained on teknium/OpenHermes-2.5. This is epoch 3/9, so it's early in training.
 ## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
 ## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
@@ -53,7 +85,7 @@ A tiny Mistral model trained on teknium/OpenHermes-2.5. This is epoch 3/9, so it
 ### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 [More Information Needed]
@@ -87,16 +119,20 @@ Use the code below to get started with the model.
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
 #### Training Hyperparameters
 - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
@@ -104,41 +140,11 @@ Use the code below to get started with the model.
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
 ## Environmental Impact
@@ -146,13 +152,13 @@ Use the code below to get started with the model.
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
 ### Model Architecture and Objective
@@ -160,42 +166,28 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 ### Compute Infrastructure
-[More Information Needed]
 #### Hardware
-[More Information Needed]
 #### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
 [More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
 ## Model Card Contact
-[More Information Needed]

 ---
 # Model Card for neoncortex/mini-mistral-openhermes-2.5-chatml-test
+A tiny Mistral model trained on teknium/OpenHermes-2.5.
+This is epoch 5/9, so still some training to go.
 ## Model Details
+A 63M-parameter auto-regressive LM using the Mistral architecture as a base.
+- Multi-query Attention instead of Grouped-query Attention.
+- Sliding window is disabled.
+- Modified ChatML instead of Mistral chat template - TL;DR I used '<|im_start|>human' instead of '<|im_start|>user'
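For anyone wondering how the first two bullets would show up in a Hugging Face `transformers` config, here is a minimal sketch. It assumes the standard `MistralConfig` field names `num_key_value_heads` and `sliding_window`; the card does not show the actual config, and the other hyperparameters behind the 63M size (hidden size, layer count, and so on) are not stated, so they are left out rather than guessed.

```python
import json

# Only these two settings are implied by the bullets above. Everything else
# that determines the 63M parameter count is not given in the card.
attention_overrides = {
    # One shared key/value head for all query heads = multi-query attention.
    "num_key_value_heads": 1,
    # None (null in config.json) turns sliding-window attention off.
    "sliding_window": None,
}

# Print the fragment the way it would appear inside a config.json file.
print(json.dumps(attention_overrides, indent=2))
```

The same two keys could also be passed as keyword arguments when building a `MistralConfig`, alongside whatever sizes were actually used.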
+### Model Description
+Just doing it to see what happens.
+It'll take about 40 to 45 hours to train on two Nvidia RTX 3060 12GB.
+It uses ChatML for the chat template, but I fucked up the template in the dataset,
+using '<|im_start|>human' instead of '<|im_start|>user'. ¯\_(ツ)_/¯
+So, here's the bits:
+```
+{%- set ns = namespace(found=false) -%}
+{%- for message in messages -%}
+    {%- if message['role'] == 'system' -%}
+        {%- set ns.found = true -%}
+    {%- endif -%}
+{%- endfor -%}
+{%- for message in messages %}
+    {%- if message['role'] == 'system' -%}
+        {{- '<|im_start|>system\n' + message['content'].rstrip() + '<|im_end|>\n' -}}
+    {%- else -%}
+        {%- if message['role'] == 'user' -%}
+            {{-'<|im_start|>human\n' + message['content'].rstrip() + '<|im_end|>\n'-}}
+        {%- else -%}
+            {{-'<|im_start|>assistant\n' + message['content'] + '<|im_end|>\n' -}}
+        {%- endif -%}
+    {%- endif -%}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{-'<|im_start|>assistant\n'-}}
+{%- endif -%}
+```
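If you are not running the Jinja template through a tokenizer, the same prompt string can be built by hand. The function below is a plain-Python sketch of what the template above appears to produce; `format_modified_chatml` is a hypothetical name, not something shipped with the model, and the sketch skips the template's `ns.found` bookkeeping because that flag is never read.

```python
def format_modified_chatml(messages, add_generation_prompt=False):
    """Build a prompt in the modified ChatML format ('human' instead of 'user')."""
    parts = []
    for message in messages:
        role = message["role"]
        content = message["content"]
        if role == "system":
            # System and user content is right-stripped, as in the template.
            parts.append("<|im_start|>system\n" + content.rstrip() + "<|im_end|>\n")
        elif role == "user":
            # The quirk: user turns are tagged 'human', matching the training data.
            parts.append("<|im_start|>human\n" + content.rstrip() + "<|im_end|>\n")
        else:
            # Any other role is treated as the assistant; content is left as-is.
            parts.append("<|im_start|>assistant\n" + content + "<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are a tiny model."},
        {"role": "user", "content": "Say something."},
    ]
    print(format_modified_chatml(demo, add_generation_prompt=True))
```

Sending `<|im_start|>user` instead would most likely confuse the model, since it only ever saw `human` during training.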
+- **Developed by:** gronkomatic
+- **Funded by:** gronkomatic
+- **Shared by:** gronkomatic
+- **Model type:** Mistral
+- **Language(s) (NLP):** English, maybe others I dunno
+- **License:** OpenRAIL, IDGAF
+### Model Sources
+Exclusively available right here on HuggingFace!
+- **Repository:** https://huggingface.co/neoncortex/mini-mistral-openhermes-2.5-chatml-test
+- **Paper:** LoL
+- **Demo:** Just download it in Oobabooga and use the modified ChatML template above. Maybe I'll throw together a Space or something.
 ## Uses
+If you wanna have a laugh at how bad it is then go ahead, but I wouldn't expect much from it.
 ### Direct Use
 ### Out-of-Scope Use
+This model won't work well for pretty much anything, probably.
 [More Information Needed]
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing
+I took the OpenHermes 2.5 dataset and formatted it with ChatML.
 #### Training Hyperparameters
 - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times
+- epochs: 9
+- steps: 140976
+- batches per device: 6
+- throughput: 1.04 it/s
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
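As a rough sanity check on the numbers in this section, the step count and throughput can be turned into a wall-clock estimate. This is back-of-the-envelope arithmetic only: it assumes the 1.04 it/s rate holds for the whole run and that the 140976 steps cover all 9 epochs, neither of which the card states outright.

```python
# Figures copied from the model card.
TOTAL_STEPS = 140976
EPOCHS = 9
ITERATIONS_PER_SECOND = 1.04  # assumed constant for the whole run

# Steps in one epoch, assuming the total is spread evenly over all epochs.
steps_per_epoch = TOTAL_STEPS // EPOCHS

# Pure compute time, ignoring checkpointing, evaluation, and restarts.
total_hours = TOTAL_STEPS / ITERATIONS_PER_SECOND / 3600
hours_per_epoch = total_hours / EPOCHS

print(f"steps per epoch: {steps_per_epoch}")
print(f"estimated total: {total_hours:.1f} h")
print(f"estimated per epoch: {hours_per_epoch:.1f} h")
```

That works out to roughly 37.7 hours of pure stepping, a little under the 40 to 45 hours quoted in the model description. The gap is plausibly overhead the it/s figure does not capture, though the card does not say.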
 ## Evaluation
+I tried to run evals but the eval suite just laughed at me.
+## Model Examination
+Don't be rude.
 ## Environmental Impact
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** I already told you. Try and keep up.
+- **Hours used:** ~45 x 2 I guess.
+- **Cloud Provider:** gronkomatic
+- **Compute Region:** myob
+- **Carbon Emitted:** Probably
+## Technical Specifications
 ### Model Architecture and Objective
 ### Compute Infrastructure
+I trained it on my PC with no side on it because I like to watch the GPUs do their work.
 #### Hardware
+2 x Nvidia RTX 3060 12GB
 #### Software
+The wonderful free stuff at [HuggingFace](https://huggingface.co): transformers, datasets, trl
+## Glossary
+IDGAF - I don't give a fuck
+## More Information
 [More Information Needed]
+## Model Card Authors
+gronkomatic, unless you're offended by something, in which case it was hacked by hackers.
 ## Model Card Contact
+If you want to send me insults, come find me on Reddit, I guess: u/gronkomatic.