File size: 6,070 Bytes

---
license: mit
datasets:
- openai/webgpt_comparisons
- openai/summarize_from_feedback
- Anthropic/hh-rlhf
language:
- en
---

# Reward model on deberta-v2-xxlarge (1.5B)

Reward model used in RLHF which is trained on webgpt, summarize from human feedback and Open Assistant user ranked dataset

# Model Details

## Model Description

- **Developed by:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

## Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [Open Assistant](https://github.com/LAION-AI/Open-Assistant)
- **Paper :** [Instruct GPT](https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf) : We try to replicate as close as we can on our hardware and existing datasets
- **Demo [optional]:** [More Information Needed]

# Uses

This model was trained with human feedback comparison examples, which penalize bad or rude sentence with lower scores.

## Direct Use

```
model_name = 'theblackcat102/deberta-v2-xxlarge-rm'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "I just got out of prison, any suggestion?"
good_helpful = "I am sorry to hear about it, it must be a hard time inside"
bad_text = "Stay away from me, you scumbag convict"
pos = tokenizer(prompt, good_helpful, return_tensors='pt')
neg = tokenizer(prompt, bad_text, return_tensors='pt')
pos_score = model(**pos).logits[0]
neg_score = model(**neg).logits[0]
print(pos_score, neg_score)
>> tensor([-1.3449], grad_fn=<SelectBackward0>) tensor([-2.0942], grad_fn=<SelectBackward0>)
```



## Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

## Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

# Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

## Recommendations

How to use it as a rank function

```python
def divide_chunks(l, n):    
    # looping till length l
    for i in range(0, len(l), n):
        yield l[i:i + n]
 
@torch.no_grad()
def rank_model_fn(samples, **kwargs):
    output_scores = []
    for chunk_samples in divide_chunks(samples, 16):
        is_empty = []
        prefixes, postfixes = [], []
        for sample in chunk_samples:
            prefix, postfix = sample.split('[SEP]')
            postfix = postfix.strip()
            if len(postfix) == 0 or len(set(postfix)) <= 3:
                is_empty.append(True)
            else:
                is_empty.append(False)
            postfixes.append(postfix)
            prefixes.append(prefix)
        is_empty = np.array(is_empty)
        inputs = rank_tokenizer(prefixes, postfixes, return_tensors="pt", padding=True)
        inputs.pop("token_type_ids", None)
        inputs =  { key: tensor.cuda() for key, tensor in inputs.items() }
        scores = rank_model(**inputs).logits[:, 0].detach().cpu()
        scores[is_empty] = -4
        output_scores += [ s for s in scores ]
    return torch.from_numpy(np.array(output_scores))
```

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

# Training Details


## Training Procedure 

checkout our training repo [here](https://github.com/LAION-AI/Open-Assistant/tree/main/model/reward/instructor)


### Preprocessing [optional]

[More Information Needed]


### Training Hyperparameters

```yaml
model_name: microsoft/deberta-v2-xxlarge
learning_rate: 2e-6
scheduler: cosine
gradient_checkpointing: false
gradient_accumulation_steps: 12
per_device_train_batch_size: 1
per_device_eval_batch_size: 4
warmup_steps: 600
eval_steps: 1000000
save_steps: 1000
max_length: 512
num_train_epochs: 2
datasets:
  - webgpt
  - hfsummary
  - anthropic_rlhf
  - oa_private
```

### Speeds, Sizes, Times [optional]

Trained on 8 A100 80G model, since we are using the same batch strategy as InstructGPT, using a batch_size of 1 actually equals to (N-1) batch where N refers to number of negative examples. Which is why I recommend using the largest VRAM GPU you can find to train this model.

# Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

## Testing Data, Factors & Metrics

### Testing Data

<!-- This should link to a Data Card if possible. -->

[More Information Needed]

### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

## Results

[More Information Needed]

### Summary



# Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]


# Technical Specifications [optional]

## Model Architecture and Objective

[More Information Needed]

## Compute Infrastructure

[More Information Needed]

### Hardware

[More Information Needed]

### Software

[More Information Needed]

# Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

# Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

# More Information [optional]

[More Information Needed]

# Model Card Authors [optional]

[More Information Needed]

# Model Card Contact

[More Information Needed]