---
license: other
license_name: deepseek
license_link: https://github.com/deepseek-ai/DeepSeek-Math/blob/main/LICENSE-MODEL
---
v0.1
PRM Model adapted from: https://huggingface.co/deepseek-ai/deepseek-math-7b-rl
This is a process reward model (PRM) trained mostly on a flattened version of PRM800K using LoRA, with the adapter weights merged back into the full model.
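For readers unfamiliar with what merging a LoRA adapter means: the low-rank update is added into the frozen weight matrix, so no separate adapter is needed at inference time. The toy sketch below shows that arithmetic in plain Python, assuming the standard LoRA formulation `W' = W + (alpha / r) * B @ A`. The matrices, rank, and `alpha` value are made up for illustration and are not this model's training settings.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), i.e. the merged weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 weight with a rank-1 adapter (all values hypothetical).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (2, rank)
lora_a = [[0.5, -0.5]]    # shape (rank, 2)
merged = merge_lora(w, lora_b, lora_a, alpha=2.0, rank=1)
```

After merging, `merged` has the same shape as `w`, which is why the published checkpoint can be loaded as an ordinary full model.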
### 1. How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

prm_tokenizer = AutoTokenizer.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1")
prm_tokenizer.pad_token = prm_tokenizer.eos_token
# Move the model to the same device as the inputs below.
prm_model = AutoModelForSequenceClassification.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1").eval().to("cuda")

# batch_candidates: list of strings to score, supplied by the caller.
encoded_inputs = [prm_tokenizer.encode(candidate, return_tensors="pt") for candidate in batch_candidates]
max_length = max(input_id.shape[1] for input_id in encoded_inputs)  # Longest sequence in the batch

# Right-pad every sequence to max_length with the pad token.
padded_inputs = [
    torch.nn.functional.pad(input_id, (0, max_length - input_id.size(1)), value=prm_tokenizer.pad_token_id)
    for input_id in encoded_inputs
]
input_ids = torch.cat(padded_inputs, dim=0).to("cuda")

with torch.no_grad():
    outputs = prm_model(input_ids)

logits = outputs.logits[0]  # Logits for the first candidate; use outputs.logits to keep the whole batch
scores = logits.softmax(dim=-1)
log_probs = scores.log()
```
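The last two lines of the snippet turn classifier logits into probabilities and log-probabilities. As a dependency-free illustration of that step, the sketch below applies a numerically stable softmax to made-up logits and ranks candidates by the log-probability of one class. The logit values, the two-class layout, and the choice of index 1 as the "good step" class are assumptions for illustration only; check `id2label` in the model's `config.json` for the actual label mapping.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(batch_logits, positive_index=1):
    """Sort candidate indices by log-probability of the assumed positive class, highest first."""
    log_probs = [math.log(softmax(row)[positive_index]) for row in batch_logits]
    order = sorted(range(len(batch_logits)), key=lambda i: log_probs[i], reverse=True)
    return order, log_probs


# Hypothetical logits for three candidates (one row per candidate).
example_logits = [[2.0, 0.5], [0.1, 1.9], [1.0, 1.0]]
order, log_probs = rank_candidates(example_logits)
```

Ranking by log-probability gives the same order as ranking by probability; log-probabilities are mainly convenient when scores from several steps need to be summed.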
### 2. License
This code repository is licensed under the MIT License. The use of DeepSeekMath models is subject to the Model License. DeepSeekMath supports commercial use.
See the [LICENSE-MODEL](https://github.com/deepseek-ai/DeepSeek-Math/blob/main/LICENSE-MODEL) for more details.
### 3. Contact
If you have any questions, please raise an issue or contact the original team at [[email protected]](mailto:[email protected]).