nomadicsynth committed
Commit dde2182 · verified · 1 Parent(s): 281ce79

Update README.md

Files changed (1)
  1. README.md +7 -44

README.md CHANGED
@@ -24,7 +24,7 @@ Just doing it to see what happens.
 
 It'll take about 40 to 45 hours to train on two Nvidia RTX 3060 12GB.
 
-It uses ChatML for the chat template, but I fucked up the template in the dataset,
+It uses ChatML for the chat template, but I messed up the template in the dataset,
 using '<|im_start|>human' instead of '<|im_start|>user'. ¯\_(ツ)_/¯
 So, here's the bits:
 
@@ -56,57 +56,30 @@ So, here's the bits:
 - **Shared by:** RoboApocalypse
 - **Model type:** Mistral
 - **Language(s) (NLP):** English, maybe others I dunno
-- **License:** OpenRAIL, IDGAF
+- **License:** OpenRAIL
 
 ### Model Sources
 
 Exclusively available right here on HuggingFace!
 
 - **Repository:** https://huggingface.co/neoncortex/mini-mistral-openhermes-2.5-chatml-test
-- **Paper:** LoL
-- **Demo:** Just download it in Oobabooga and use the modified chatML template above. Maybe I'll throw together a Space or something.
 
 ## Uses
 
-If you wanna have a laugh at how bad it is then go ahead, but I wouldn't expect much from it.
+None
 
 ### Out-of-Scope Use
 
 This model won't work well for pretty much everything, probably.
 
-## How to Get Started with the Model
-
-Use the code below to get started with the model.
-
-[More Information Needed]
-
-## Training Details
-
-### Training Data
-
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
-
-### Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
 #### Preprocessing
 
-I took the OpenHermes 2.5 dataset and formatted it with ChatML.
+Format the OpenHermes 2.5 dataset with ChatML.
 
 #### Training Hyperparameters
 
 - **Training regime:** bf16 mixed precision
 
-#### Speeds, Sizes, Times
-
-epochs: 9
-steps: 140976
-batches per device: 6
-1.04it/s
-
 ## Evaluation
 
 I tried to run evals but the eval suite just laughed at me.
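For reference, the role-name quirk described in the first hunk can be sketched in Python. `format_chatml` is a hypothetical helper (not part of the model's code) that maps the standard ChatML `user` role to the `human` role the dataset was accidentally formatted with, so prompts match the format the model actually saw during training.

```python
def format_chatml(messages):
    """Render messages as ChatML, using the model's nonstandard roles."""
    # The dataset used '<|im_start|>human' where standard ChatML
    # uses '<|im_start|>user', so map that role accordingly.
    role_map = {"user": "human"}
    parts = []
    for m in messages:
        role = role_map.get(m["role"], m["role"])
        parts.append(f"<|im_start|>{role}\n{m['content']}<|im_end|>")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The same substitution would apply to a tokenizer chat template: wherever a standard ChatML template writes `user`, this checkpoint expects `human`.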
@@ -117,11 +90,9 @@ Don't be rude.
 
 ## Environmental Impact
 
-- **Hardware Type:** I already told you. Try and keep up.
-- **Hours used:** ~45 x 2 I guess.
-- **Cloud Provider:** RoboApocalypse
-- **Compute Region:** myob
-- **Carbon Emitted:** Yes, definitely
+- **Hardware Type:** 2 x NVIDIA RTX 3060 12GB
+- **Hours used:** ~45 x 2.
+- **Carbon Emitted:** [TBA]
 
 ### Compute Infrastructure
 
@@ -134,11 +105,3 @@ I trained it on my PC with no side on it because I like to watch the GPUs do the
 #### Software
 
 The wonderful free stuff at HuggingFace (https://huggingface.co)[https://huggingface.co]: transformers, datasets, trl
-
-## Model Card Authors
-
-RoboApocalypse, unless you're offended by something, in which case it was hacked by hackers.
-
-## Model Card Contact
-
-If you want to send me insults come find me on Reddit I guess.
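The Environmental Impact figures in the hunk above ("2 x NVIDIA RTX 3060 12GB", "~45 x 2" hours) can be turned into a rough energy estimate. This is a minimal sketch assuming each card draws its nominal 170 W board power for the full run; actual draw varies with load, and the figure is an upper-bound guess, not a measurement.

```python
# Back-of-envelope energy estimate for the training run:
# two RTX 3060s for ~45 hours, assuming full 170 W board power each.
TDP_WATTS = 170   # assumed per-GPU board power (nominal RTX 3060 TDP)
NUM_GPUS = 2
HOURS = 45

# watt-hours -> kilowatt-hours
kwh = TDP_WATTS * NUM_GPUS * HOURS / 1000
print(f"~{kwh:.1f} kWh")  # -> ~15.3 kWh
```

A carbon figure would then depend on the local grid's emission factor, which the card leaves as [TBA].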