nomadicsynth committed
Update README.md

README.md CHANGED
@@ -24,7 +24,7 @@ Just doing it to see what happens.
 
 It'll take about 40 to 45 hours to train on two Nvidia RTX 3060 12GB.
 
-It uses ChatML for the chat template, but I
+It uses ChatML for the chat template, but I messed up the template in the dataset,
 using '<|im_start|>human' instead of '<|im_start|>user'. ¯\_(ツ)_/¯
 So, here's the bits:
 
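The role quirk described in this hunk ('<|im_start|>human' where standard ChatML uses '<|im_start|>user') can be sketched as a small prompt builder. The special tokens come from the README; the function name and message schema are illustrative, not the author's code:

```python
# Build a prompt in this model's modified ChatML, which uses
# '<|im_start|>human' where standard ChatML uses '<|im_start|>user'.
def build_chatml_prompt(messages):
    parts = []
    for msg in messages:
        # Rename the standard 'user' role to this model's 'human' role.
        role = "human" if msg["role"] == "user" else msg["role"]
        parts.append(f"<|im_start|>{role}\n{msg['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
```

A prompt built any other way (e.g. with the stock ChatML 'user' tag) would be out of distribution for this model.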
@@ -56,57 +56,30 @@ So, here's the bits:
 - **Shared by:** RoboApocalypse
 - **Model type:** Mistral
 - **Language(s) (NLP):** English, maybe others I dunno
-- **License:** OpenRAIL
+- **License:** OpenRAIL
 
 ### Model Sources
 
 Exclusively available right here on HuggingFace!
 
 - **Repository:** https://huggingface.co/neoncortex/mini-mistral-openhermes-2.5-chatml-test
-- **Paper:** LoL
-- **Demo:** Just download it in Oobabooga and use the modified chatML template above. Maybe I'll throw together a Space or something.
 
 ## Uses
 
-
+None
 
 ### Out-of-Scope Use
 
 This model won't work well for pretty much everything, probably.
 
-## How to Get Started with the Model
-
-Use the code below to get started with the model.
-
-[More Information Needed]
-
-## Training Details
-
-### Training Data
-
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
-
-### Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
 #### Preprocessing
 
-
+Format the OpenHermes 2.5 dataset with ChatML.
 
 #### Training Hyperparameters
 
 - **Training regime:** bf16 mixed precision
 
-#### Speeds, Sizes, Times
-
-epochs: 9
-steps: 140976
-batches per device: 6
-1.04it/s
-
 ## Evaluation
 
 I tried to run evals but the eval suite just laughed at me.
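The one-line preprocessing note added in this hunk ("Format the OpenHermes 2.5 dataset with ChatML") is terse; a rough sketch of what that formatting could look like is below. The `conversations`/`from`/`value` field names follow the ShareGPT-style layout OpenHermes 2.5 uses, and keeping the `human` role unmapped reproduces the template quirk the card describes, but this is a guess at the pipeline, not the author's script:

```python
# Convert one OpenHermes-style record (ShareGPT 'conversations' layout)
# into a single ChatML training string. Per the model card, the 'human'
# role is kept as-is instead of being renamed to the standard 'user'.
ROLE_MAP = {"system": "system", "human": "human", "gpt": "assistant"}

def to_chatml(record):
    lines = []
    for turn in record["conversations"]:
        role = ROLE_MAP[turn["from"]]
        lines.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(lines) + "\n"

sample = {"conversations": [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "4"},
]}
text = to_chatml(sample)
```

Mapping each record through a function like this (e.g. with `datasets.map`) would yield the plain-text ChatML corpus the Preprocessing section implies.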
@@ -117,11 +90,9 @@ Don't be rude.
 
 ## Environmental Impact
 
-- **Hardware Type:**
-- **Hours used:** ~45 x 2
-- **
-- **Compute Region:** myob
-- **Carbon Emitted:** Yes, definitely
+- **Hardware Type:** 2 x NVIDIA RTX 3060 12GB
+- **Hours used:** ~45 x 2.
+- **Carbon Emitted:** [TBA]
 
 ### Compute Infrastructure
 
@@ -134,11 +105,3 @@ I trained it on my PC with no side on it because I like to watch the GPUs do the
 #### Software
 
 The wonderful free stuff at HuggingFace [https://huggingface.co](https://huggingface.co): transformers, datasets, trl
-
-## Model Card Authors
-
-RoboApocalypse, unless you're offended by something, in which case it was hacked by hackers.
-
-## Model Card Contact
-
-If you want to send me insults come find me on Reddit I guess.
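The card names its stack (transformers, datasets, trl) and its training regime (bf16 mixed precision), and the old "Speeds, Sizes, Times" section gave epochs and per-device batch size. A sketch of how those pieces could fit together follows; the model/tokenizer path is a placeholder, the exact trl API varies by version, and none of this is the author's actual training script:

```python
# Hypothetical trl SFT setup matching the card's stated stack and
# bf16 regime. Batch size and epochs echo the old card's numbers;
# everything else is an illustrative guess.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("teknium/OpenHermes-2.5", split="train")
model = AutoModelForCausalLM.from_pretrained("path/to/mini-mistral-config")

args = TrainingArguments(
    output_dir="mini-mistral-openhermes-2.5-chatml-test",
    bf16=True,                      # "Training regime: bf16 mixed precision"
    per_device_train_batch_size=6,  # "batches per device: 6" (old card)
    num_train_epochs=9,             # "epochs: 9" (old card)
)

trainer = SFTTrainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```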