agi-css committed on
Commit
6c06597
·
1 Parent(s): 173680c

Create README.md

Files changed (1)
  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - alexl83/AlpacaDataCleaned
+ - sahil2801/CodeAlpaca-20k
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - rlhf
+ - alignment
+ - simulation
+ - computational social science
+ ---
+
+
+ # Model Card for So(cially)-Good LM
+
+ ![model image](https://agwarbliu.s3.amazonaws.com/logo.png)
+
+ ![model image](https://agwarbliu.s3.amazonaws.com/model_select_base.png)
+
+
+ **A fast, effective, and stable alternative to RLHF!**
+
+ **Instead of training an additional reward model that is likely to be gamed, we train the model directly on social games!** 🕹️ 🎲 🎮
+
+ Full details on simulation and training can be found [here](https://github.com/agi-templar/Stable-Alignment).
+
+ # Training Procedure
+
+ This model is the starting point of the Stable Alignment project: an enhanced instruction-tuned model based on LLaMA.
+
+ We improve:
+
+ - Instruction tuning data quality, by using [AlpacaDataCleaned](https://github.com/gururise/AlpacaDataCleaned), which fixes many errors in the original Alpaca dataset.
+
+ - Code ability, by additionally training on [codealpaca](https://github.com/sahil280114/codealpaca).
+
+ We use the [Alpaca fine-tuning script](https://github.com/tatsu-lab/stanford_alpaca) to train this model.
+
+
+ # Bias, Risks, and Limitations
+
+ Although this project aims to better align current LMs with social norms, inappropriate content and inherent biases in the training data will still impair the alignment of the model.
+
+ The model should not be used directly in any application without a prior assessment of the safety and fairness concerns specific to that application.
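
The Training Procedure section of the README names the Alpaca fine-tuning script but does not show how a record from the two instruction datasets becomes training text. The sketch below illustrates that step. It is not part of this commit. It assumes the prompt templates used by the Stanford Alpaca repository and records with `instruction`, `input`, and `output` fields. The helper names `build_prompt` and `build_training_text` are hypothetical, and the `</s>` end-of-sequence token is an assumption that should be checked against the tokenizer actually used.

```python
# Prompt templates assumed to match those in the Stanford Alpaca repository;
# verify against the fine-tuning script before relying on them.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(record: dict) -> str:
    """Return the prompt for one record (hypothetical helper).

    Records with a non-empty `input` field use the longer template.
    """
    if (record.get("input") or "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


def build_training_text(record: dict, eos_token: str = "</s>") -> str:
    """Concatenate prompt, target output, and an assumed end-of-sequence token."""
    return build_prompt(record) + record["output"] + eos_token


if __name__ == "__main__":
    example = {
        "instruction": "Write a function that adds two numbers.",
        "input": "",
        "output": "def add(a, b):\n    return a + b",
    }
    print(build_training_text(example))
```

In this style of supervised fine-tuning, the loss is typically computed only on the tokens that follow `### Response:`, so the model learns to produce the output rather than to reproduce the prompt.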