# Training
This is an English supervised fine-tuning (SFT) checkpoint of GPT-J, trained for 10k steps on the SODA dataset for the Chai Competition.
- **Language:** English
- **Finetuned from:** [EleutherAI / GPT-J](https://huggingface.co/EleutherAI/gpt-j-6b)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
- **Dataset:** 10% of the [SODA dataset](https://huggingface.co/datasets/allenai/soda)
# Why the OpenAssistant framework:
- Training is easy to set up: changing the dataset and model entries in the config is all you need.
- Data processing is already available for most popular conversation datasets: SODA, Vicuna, OpenAssistant, ...
# Configuration:
Add the following two sections to the default config file `configs/config.yaml`:
```
# Dataset section, selected with `soda-only` on the command line
soda-only:
  datasets:
    - soda:
        fraction: 0.1
        input_max_length: 1024
```
```
gptj-chai:
  dtype: fp16
  log_dir: gptj-soda
  model_name: EleutherAI/gpt-j-6b
  output_dir: output/gptj-soda-chai
  max_length: 1024
  warmup_steps: 100
  gradient_checkpointing: true
  gradient_accumulation_steps: 1
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  eval_steps: 5000
  save_steps: 5000
  num_train_epochs: 1
  save_total_limit: 1
  use_flash_attention: false
```
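The batch settings in the `gptj-chai` section determine how many sequences contribute to each optimizer step. The arithmetic can be sketched as below. This card does not say how many GPUs were used, so `num_gpus=8` is purely an assumed example value, and `effective_batch_size` is a hypothetical helper rather than part of the Open-Assistant code.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    """Sequences consumed per optimizer step under data-parallel training."""
    return per_device_batch_size * gradient_accumulation_steps * num_gpus


# per_device_batch_size and gradient_accumulation_steps come from the config above;
# num_gpus=8 is an assumption for illustration, not the author's actual setup.
per_step = effective_batch_size(per_device_batch_size=8,
                                gradient_accumulation_steps=1,
                                num_gpus=8)
print(per_step)  # 64
```

With a different GPU count, only `num_gpus` changes; the per-device values stay as configured.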
# Command to train:
```bash
deepspeed trainer_sft.py --local_rank=0 --configs defaults gptj-chai soda-only --cache_dir data_cache --deepspeed
```
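The `--configs defaults gptj-chai soda-only` argument names three sections of `configs/config.yaml`. A rough sketch of how such sections could combine, assuming they are applied in the order given and later sections override earlier keys (check `trainer_sft.py` for the exact behaviour). `merge_configs` is a hypothetical helper, and the values under `defaults` are placeholders, not the real defaults.

```python
from typing import Any, Dict, List


def merge_configs(sections: Dict[str, Dict[str, Any]], names: List[str]) -> Dict[str, Any]:
    """Shallow-merge the named sections in order; later names win on key clashes."""
    merged: Dict[str, Any] = {}
    for name in names:
        merged.update(sections[name])
    return merged


sections = {
    # Placeholder values standing in for the real `defaults` section.
    "defaults": {"max_length": 512, "warmup_steps": 600},
    # Subset of the gptj-chai section from this README.
    "gptj-chai": {"model_name": "EleutherAI/gpt-j-6b", "max_length": 1024, "warmup_steps": 100},
    # The soda-only dataset section from this README.
    "soda-only": {"datasets": [{"soda": {"fraction": 0.1, "input_max_length": 1024}}]},
}

conf = merge_configs(sections, ["defaults", "gptj-chai", "soda-only"])
print(conf["max_length"])    # 1024, overridden by gptj-chai
print(conf["warmup_steps"])  # 100, overridden by gptj-chai
```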
# Interactive Demo Code:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM


class ChatBot:
    def __init__(self, path="/mnt/hdd/duyphung/gptj-soda-chai/checkpoint-10000/"):
        # `path` is a local checkpoint directory; point it at your own copy.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(path).half().cuda().eval()
        # Use EOS as the padding token for both the model config and the tokenizer.
        self.model.config.pad_token_id = self.tokenizer.eos_token_id
        self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

    def chat(self, message):
        # Tokenize the full prompt and move the tensors to the model's device.
        enc_dict = self.tokenizer(message, return_tensors="pt").to(self.model.device)
        chat_history_ids = self.model.generate(
            input_ids=enc_dict["input_ids"],
            attention_mask=enc_dict["attention_mask"],
            max_new_tokens=64,
            temperature=0.7,
            do_sample=True,
            top_k=0,  # 0 disables top-k filtering; only nucleus (top-p) sampling applies
            top_p=0.95,
            pad_token_id=self.tokenizer.eos_token_id,
        )
        # Keep only the newly generated tokens, dropping the echoed prompt.
        out = chat_history_ids[:, enc_dict["input_ids"].shape[-1]:][0]
        return self.tokenizer.decode(out, skip_special_tokens=True)


if __name__ == "__main__":
    bot_name = "Bot:"
    prompt = "<|prompter|>"
    bot = ChatBot()
    while True:
        message = input("Me: ")
        # Close the user turn and open the assistant turn.
        prompt = prompt + message + "<|endoftext|><|assistant|>"
        response = bot.chat(prompt)
        print(f"{bot_name} {response}")
        # Close the assistant turn and open the next user turn.
        prompt = prompt + response + "<|endoftext|><|prompter|>"
```
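The demo builds its prompt by string concatenation, alternating `<|prompter|>` and `<|assistant|>` turns and closing each with `<|endoftext|>`. The same layout can be sketched as a standalone function, which makes the format easier to inspect without loading the model. `build_prompt` and its `history` argument are hypothetical names introduced here for illustration.

```python
from typing import List, Tuple

PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
EOS = "<|endoftext|>"


def build_prompt(history: List[Tuple[str, str]], message: str) -> str:
    """Format past (user, bot) turns plus the new user message in the demo's layout."""
    parts = []
    for user_msg, bot_msg in history:
        parts.append(f"{PROMPTER}{user_msg}{EOS}{ASSISTANT}{bot_msg}{EOS}")
    # The prompt ends with an open assistant turn for the model to complete.
    parts.append(f"{PROMPTER}{message}{EOS}{ASSISTANT}")
    return "".join(parts)


print(build_prompt([("Hi", "Hello!")], "How are you?"))
# <|prompter|>Hi<|endoftext|><|assistant|>Hello!<|endoftext|><|prompter|>How are you?<|endoftext|><|assistant|>
```

Keeping the history as a list of turns also makes it straightforward to drop the oldest turns once the prompt approaches the 1024-token `max_length` used in training.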