RichardErkhov committed on
Commit a39c90c · verified · 1 Parent(s): 892972b

uploaded readme

Files changed (1): README.md (+90 lines)

Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

starchat2-15b-sft-v0.1 - bnb 8bits
- Model creator: https://huggingface.co/HuggingFaceH4/
- Original model: https://huggingface.co/HuggingFaceH4/starchat2-15b-sft-v0.1/
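
Here, "bnb 8bits" means the original weights were re-quantized to 8-bit with the bitsandbytes backend of Hugging Face Transformers. A minimal loading sketch follows, assuming a CUDA GPU with `bitsandbytes` installed; the repo id below is a placeholder, not a confirmed path:

```python
# Minimal sketch: load a checkpoint in 8-bit via bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "RichardErkhov/starchat2-15b-sft-v0.1-8bits"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights
    device_map="auto",  # place layers on available GPUs automatically
)
```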

Original model description:
---
license: bigcode-openrail-m
base_model: bigcode/starcoder2-15b
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/airoboros-3.2
- HuggingFaceH4/Code-Feedback
- HuggingFaceH4/orca-math-word-problems-200k
- HuggingFaceH4/SystemChat
- HuggingFaceH4/capybara
model-index:
- name: starcoder2-15b-sft-v5.0
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Model Card for starchat2-15b-sft-v0.1

This model is a fine-tuned version of [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) on the HuggingFaceH4/airoboros-3.2, HuggingFaceH4/Code-Feedback, HuggingFaceH4/orca-math-word-problems-200k, HuggingFaceH4/SystemChat, and HuggingFaceH4/capybara datasets. It achieves the following results on the evaluation set:
- Loss: 0.6614
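
Since this is a chat-tuned (SFT) checkpoint, a short generation sketch may help; it assumes the tokenizer ships a chat template, and the prompt and sampling settings are purely illustrative:

```python
# Illustrative chat generation, assuming the tokenizer defines a chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/starchat2-15b-sft-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```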

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the per-device batch size of 8 across 16 GPUs yields the total train batch size of 8 × 16 = 128; a `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
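
For illustration only, here is the configuration above expressed as Hugging Face `TrainingArguments`. This is a reconstruction from the reported list, not the authors' training script; `output_dir` and the precision flag are assumptions:

```python
# Reconstruction of the reported hyperparameters as TrainingArguments.
# Not the authors' script: output_dir is a placeholder, and train_batch_size
# is read as the per-device size (8 per GPU x 16 GPUs = 128 total).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="starchat2-15b-sft",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,        # assumption: mixed precision; not stated in the card
)
```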

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.6422        | 1.0   | 910  | 0.6910          |
| 0.5701        | 2.0   | 1820 | 0.6639          |
| 0.5227        | 3.0   | 2730 | 0.6614          |

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1