---
license: mit
datasets:
- openai/webgpt_comparisons
- openai/summarize_from_feedback
- Anthropic/hh-rlhf
language:
- en
---
|
|
|
# Reward model on deberta-v2-xxlarge (1.5B) |
|
|
|
A reward model for RLHF, trained on the WebGPT comparisons, Summarize from Human Feedback, Anthropic HH-RLHF, and Open Assistant user-ranked datasets.
|
|
|
# Model Details |
|
|
|
## Model Description |
|
|
|
- **Developed by:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Reward model (DeBERTa-v2 sequence classifier with a single scalar output)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model [optional]:** `microsoft/deberta-v2-xxlarge`
|
|
|
## Model Sources [optional] |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** [Open Assistant](https://github.com/LAION-AI/Open-Assistant) |
|
- **Paper:** [InstructGPT](https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf). We try to replicate it as closely as we can with our hardware and existing datasets.
|
- **Demo [optional]:** [More Information Needed] |
|
|
|
# Uses |
|
|
|
This model was trained on human-feedback comparison examples, so it assigns lower scores to bad or rude responses.
|
|
|
## Direct Use |
|
|
|
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = 'theblackcat102/deberta-v2-xxlarge-rm'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "I just got out of prison, any suggestion?"
good_helpful = "I am sorry to hear about it, it must be a hard time inside"
bad_text = "Stay away from me, you scumbag convict"

# Encode each (prompt, reply) pair as one two-segment input
pos = tokenizer(prompt, good_helpful, return_tensors='pt')
neg = tokenizer(prompt, bad_text, return_tensors='pt')

# The model emits one logit per input; a higher value means a better reply
pos_score = model(**pos).logits[0]
neg_score = model(**neg).logits[0]
print(pos_score, neg_score)
# >> tensor([-1.3449], grad_fn=<SelectBackward0>) tensor([-2.0942], grad_fn=<SelectBackward0>)
```
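
The two scores can also be read as a preference probability. The sketch below assumes the reward model follows the pairwise comparison setup described in the InstructGPT paper (this model card does not state that explicitly), in which the sigmoid of the score difference approximates the probability that one reply is preferred over the other. The helper name `preference_probability` is hypothetical, and the inputs are the scores printed in the example output above.

```python
import math

def preference_probability(score_a: float, score_b: float) -> float:
    """Probability that reply A is preferred over reply B under a
    Bradley-Terry-style model: sigmoid(score_a - score_b)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

# Scores copied from the example output above
pos_score, neg_score = -1.3449, -2.0942
p = preference_probability(pos_score, neg_score)
print(f"P(helpful reply preferred) = {p:.3f}")
```

Only the difference between two scores matters under this reading, so the absolute values (both negative here) should not be interpreted on their own.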
|
|
|
|
|
|
|
## Downstream Use [optional] |
|
|
|
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
|
|
|
[More Information Needed] |
|
|
|
## Out-of-Scope Use |
|
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
|
|
[More Information Needed] |
|
|
|
# Bias, Risks, and Limitations |
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
|
[More Information Needed] |
|
|
|
## Recommendations |
|
|
|
To use the model as a ranking function:
|
|
|
```python
import torch

# Assumes `rank_model` and `rank_tokenizer` are already loaded (for example,
# as `model` and `tokenizer` in Direct Use) and `rank_model` is on the target device.

def divide_chunks(l, n):
    # Yield successive chunks of at most n items from l
    for i in range(0, len(l), n):
        yield l[i:i + n]

@torch.no_grad()
def rank_model_fn(samples, **kwargs):
    # Each sample is a string of the form "<prefix>[SEP]<postfix>"
    output_scores = []
    for chunk_samples in divide_chunks(samples, 16):
        is_empty = []
        prefixes, postfixes = [], []
        for sample in chunk_samples:
            prefix, postfix = sample.split('[SEP]')
            postfix = postfix.strip()
            # Flag empty or degenerate replies (three or fewer distinct characters)
            is_empty.append(len(postfix) == 0 or len(set(postfix)) <= 3)
            postfixes.append(postfix)
            prefixes.append(prefix)
        inputs = rank_tokenizer(prefixes, postfixes, return_tensors="pt", padding=True)
        inputs.pop("token_type_ids", None)
        inputs = {key: tensor.to(rank_model.device) for key, tensor in inputs.items()}
        scores = rank_model(**inputs).logits[:, 0].detach().cpu()
        # Give flagged replies a fixed low score
        scores[torch.tensor(is_empty, dtype=torch.bool)] = -4
        output_scores.append(scores)
    if not output_scores:
        return torch.empty(0)
    return torch.cat(output_scores)
```
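
A minimal sketch of how a rank function like the one above might be used for best-of-n selection. The helpers `build_samples` and `pick_best` are hypothetical names, not part of the Open Assistant code, and `length_score` is a toy stand-in so the example runs without downloading the model; in practice you would pass `rank_model_fn` as `score_fn`.

```python
from typing import Callable, List, Sequence

def build_samples(prompt: str, candidates: Sequence[str]) -> List[str]:
    # Join prompt and reply with the '[SEP]' marker that the rank function splits on
    return [f"{prompt}[SEP]{candidate}" for candidate in candidates]

def pick_best(prompt: str, candidates: Sequence[str],
              score_fn: Callable[[List[str]], Sequence[float]]) -> str:
    # Score every candidate and return the one with the highest score
    samples = build_samples(prompt, candidates)
    scores = [float(s) for s in score_fn(samples)]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]

def length_score(samples: List[str]) -> List[float]:
    # Toy stand-in, NOT the reward model: longer replies score higher
    return [float(len(s.split('[SEP]')[1])) for s in samples]

best = pick_best("Hi", ["a", "abcd", "ab"], length_score)
print(best)
```

Because the rank function splits each sample on `'[SEP]'` and expects exactly two parts, prompts or replies that themselves contain that marker would need to be escaped or filtered first.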
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
[More Information Needed] |
|
|
|
# Training Details |
|
|
|
|
|
## Training Procedure |
|
|
|
Check out our training repo [here](https://github.com/LAION-AI/Open-Assistant/tree/main/model/reward/instructor).
|
|
|
|
|
### Preprocessing [optional] |
|
|
|
[More Information Needed] |
|
|
|
|
|
### Training Hyperparameters |
|
|
|
```yaml
model_name: microsoft/deberta-v2-xxlarge
learning_rate: 2e-6
scheduler: cosine
gradient_checkpointing: false
gradient_accumulation_steps: 12
per_device_train_batch_size: 1
per_device_eval_batch_size: 4
warmup_steps: 600
eval_steps: 1000000
save_steps: 1000
max_length: 512
num_train_epochs: 2
datasets:
- webgpt
- hfsummary
- anthropic_rlhf
- oa_private
```
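
To make these settings concrete, here is a rough sketch of two quantities they imply. It assumes plain data-parallel training on the 8 GPUs mentioned under Speeds, Sizes, Times, and that `scheduler: cosine` means linear warmup followed by a single half-cosine decay to zero; neither is confirmed by this card. The function names and `total_steps` are hypothetical, since the total number of optimizer steps is not given.

```python
import math

def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int) -> int:
    # Prompts (each with its ranked replies) consumed per optimizer step
    return per_device_batch_size * gradient_accumulation_steps * num_devices

def cosine_lr(step: int, base_lr: float = 2e-6, warmup_steps: int = 600,
              total_steps: int = 10_000) -> float:
    # total_steps is a placeholder assumption, not a value from the config
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch_size(1, 12, 8))  # 1 * 12 * 8 = 96
print(cosine_lr(300), cosine_lr(600))
```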
|
|
|
### Speeds, Sizes, Times [optional] |
|
|
|
Trained on 8 A100 80GB GPUs. Because we use the same batching strategy as InstructGPT, a batch_size of 1 is actually equivalent to a batch of (N-1), where N refers to the number of negative examples. This is why I recommend training this model on the GPU with the largest VRAM you can find.
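
For context, the InstructGPT paper trains its reward model by scoring all ranked replies to one prompt together and averaging a pairwise log-sigmoid loss over every (better, worse) pair. The plain-Python sketch below illustrates that loss on made-up scores; it assumes the training repo uses the same form, which this card does not confirm, and `pairwise_ranking_loss` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Sequence

def neg_log_sigmoid(x: float) -> float:
    # Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_ranking_loss(scores_best_to_worst: Sequence[float]) -> float:
    """Mean of -log(sigmoid(better - worse)) over all pairs of one prompt's
    replies, with scores ordered from most to least preferred."""
    pairs = list(combinations(scores_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked replies")
    return sum(neg_log_sigmoid(better - worse) for better, worse in pairs) / len(pairs)

# Made-up scores for four replies to one prompt, ordered best to worst
well_ordered = pairwise_ranking_loss([3.0, 1.0, 0.0, -2.0])
mis_ordered = pairwise_ranking_loss([-2.0, 0.0, 1.0, 3.0])
print(well_ordered, mis_ordered)
```

Since every reply to a prompt goes through the model in the same step, memory use grows with the number of ranked replies even at a per-device batch size of 1.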
|
|
|
# Evaluation |
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
## Testing Data, Factors & Metrics |
|
|
|
### Testing Data |
|
|
|
<!-- This should link to a Data Card if possible. --> |
|
|
|
[More Information Needed] |
|
|
|
### Factors |
|
|
|
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> |
|
|
|
[More Information Needed] |
|
|
|
### Metrics |
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
[More Information Needed] |
|
|
|
## Results |
|
|
|
[More Information Needed] |
|
|
|
### Summary |
|
|
|
|
|
|
|
# Model Examination [optional] |
|
|
|
<!-- Relevant interpretability work for the model goes here --> |
|
|
|
[More Information Needed] |
|
|
|
|
|
# Technical Specifications [optional] |
|
|
|
## Model Architecture and Objective |
|
|
|
[More Information Needed] |
|
|
|
## Compute Infrastructure |
|
|
|
[More Information Needed] |
|
|
|
### Hardware |
|
|
|
[More Information Needed] |
|
|
|
### Software |
|
|
|
[More Information Needed] |
|
|
|
# Citation [optional] |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
**BibTeX:** |
|
|
|
[More Information Needed] |
|
|
|
**APA:** |
|
|
|
[More Information Needed] |
|
|
|
# Glossary [optional] |
|
|
|
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> |
|
|
|
[More Information Needed] |
|
|
|
# More Information [optional] |
|
|
|
[More Information Needed] |
|
|
|
# Model Card Authors [optional] |
|
|
|
[More Information Needed] |
|
|
|
# Model Card Contact |
|
|
|
[More Information Needed] |