File size: 5,984 Bytes
ee75241 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 |
---
license: apache-2.0
language:
- en
- de
- es
- fr
tags:
- ctranslate2
- int8
- float16
- sft
pipeline_tag: text-generation
widget:
- text: >-
<|prompter|>What is a meme, and what's the history behind this
word?<|endoftext|><|assistant|>
- text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
- text: >-
<|prompter|>Write a story about future of AI
development<|endoftext|><|assistant|>
datasets:
- OpenAssistant/oasst1
library_name: transformers
---
# # Fast-Inference with Ctranslate2
Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
quantized version of [OpenAssistant/falcon-7b-sft-top1-696](https://huggingface.co/OpenAssistant/falcon-7b-sft-top1-696)
```bash
pip install hf-hub-ctranslate2>=2.10.0 ctranslate2>=3.16.0
```
```python
# from transformers import AutoTokenizer
model_name = "michaelfeil/ct2fast-falcon-7b-sft-top1-696"
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub
model = GeneratorCT2fromHfHub(
# load in int8 on CUDA
model_name_or_path=model_name,
device="cuda",
compute_type="int8_float16",
# tokenizer=AutoTokenizer.from_pretrained("{ORG}/{NAME}")
)
outputs = model.generate(
text=["def fibonnaci(", "User: How are you doing? Bot:"],
max_length=64,
include_prompt_in_result=False
)
print(outputs)
```
Checkpoint compatible to [ctranslate2>=3.16.0](https://github.com/OpenNMT/CTranslate2)
and [hf-hub-ctranslate2>=2.10.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`
Converted on 2023-06-16 using
```
ct2-transformers-converter --model OpenAssistant/falcon-7b-sft-top1-696 --output_dir ~/tmp-ct2fast-falcon-7b-sft-top1-696 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code
```
# Licence and other remarks:
This is just a quantized version. Licence conditions are intended to be idential to original huggingface repo.
# Original description
# Open-Assistant Falcon 7B SFT OASST-TOP1 Model
This model is a fine-tuning of TII's [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) LLM.
It was trained with 11,123 top-1 (high-quality) demonstrations of the OASST data set (exported on June 2, 2023) with a batch size of 128 for 8 epochs with LIMA style dropout (p=0.2) and a context-length of 2048 tokens.
## Model Details
- **Finetuned from:** [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b)
- **Model type:** Causal decoder-only transformer language model
- **Language:** English, German, Spanish, French (and limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish);
- **Weights & Biases:** [Training log](https://wandb.ai/open-assistant/public-sft/runs/25apbcld) (Checkpoint: 696 steps)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
- **Demo:** [Continuations for 250 random prompts](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Fchat-gpt%2F2023-04-11_gpt-3.5-turbo_lottery.json%0Ahttps%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-06-05_OpenAssistant_falcon-7b-sft-top1-696_sampling_noprefix2.json)
- **License:** Apache 2.0
- **Contact:** [Open-Assistant Discord](https://ykilcher.com/open-assistant-discord)
## Prompting
Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with a `<|endoftext|>` token.
Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
## Sample Code
```python
from transformers import AutoTokenizer
import transformers
import torch
model = "OpenAssistant/falcon-7b-sft-top1-696"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto",
)
input_text="<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>"
sequences = pipeline(
input_text,
max_length=500,
do_sample=True,
return_full_text=False,
top_k=10,
num_return_sequences=1,
eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
print(f"Result: {seq['generated_text']}")
```
## Configuration Details
Model:
```
falcon-7b:
dtype: bf16
log_dir: "falcon_log_7b"
learning_rate: 1e-5
model_name: "tiiuae/falcon-7b"
deepspeed_config: configs/zero_config.json
output_dir: falcon
weight_decay: 0.0
max_length: 2048
save_strategy: steps
eval_steps: 80
save_steps: 80
warmup_steps: 20
gradient_checkpointing: true
gradient_accumulation_steps: 4
per_device_train_batch_size: 4
per_device_eval_batch_size: 8
num_train_epochs: 8
save_total_limit: 4
residual_dropout: 0.2
residual_dropout_lima: true
```
Dataset:
```
oasst-top1:
# oasst_export: 11123 (100.00%)
datasets:
- oasst_export:
lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
input_file_path: 2023-06-02_oasst_all_labels.jsonl.gz
val_split: 0.05
top_k: 1
```
Train command:
```
deepspeed trainer_sft.py --configs defaults falcon-7b oasst-top1 --cache_dir <data_cache_dir> --output_dir <output_path> --deepspeed
```
Export command:
```
python export_model.py --dtype bf16 --hf_repo_name OpenAssistant/falcon-7b-sft-top1 --trust_remote_code --auth_token <auth_token> <output_path> --max_shard_size 2GB
``` |