gonzalo-santamaria-iic committed · Commit 91f7ce1 · verified · 1 Parent(s): 47c08d8

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -130,15 +130,11 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 ### Training Data
 
- A combination of both public and private datasets, the latter designed in the IIC. The dataset consists of 21975 conversations in Spanish, with the format `chatml`. Each conversation has two variants: `chosen` and `rejected`, where the only thing that changes is the last answer of the assistant. The last answer in the `chosen` variant is considered a better answer than the one in the `rejected` variant. Different techniques have been used to generate the dataset, which we explain in depth in the paper (**coming soon**).
+ A combination of both public and private datasets designed in the IIC. The dataset consists of 21975 conversations in Spanish, with the format `chatml` and has the same structure as the [Anthropic/hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf). Each conversation has two variants: `chosen` and `rejected`, where the only thing that changes is the last answer of the assistant. The last answer in the `chosen` variant is considered a better answer than the one in the `rejected` variant. Different techniques have been used to generate the dataset, which we explain in depth in the paper (**coming soon**).
 
 ### Training Procedure
 
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
+ We use the [Transformer Reinforcement Learning](https://huggingface.co/docs/trl/index) (TRL) library. Specifically, we have applied [the script they have published](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) as an example for using DPO to the dataset we have generated.
 
 
 #### Training Hyperparameters
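
The Training Data entry added above describes preference pairs in the hh-rlhf layout: each record carries a `chosen` and a `rejected` version of the same conversation, differing only in the assistant's final answer and serialized with `chatml` markers. The record below is a minimal sketch of that structure; the field names follow the linked Anthropic/hh-rlhf dataset and the Spanish content is invented for illustration, since the actual IIC data is not published here.

```python
# Hypothetical preference record in the hh-rlhf-style layout described above.
# Field names ("chosen", "rejected") follow Anthropic/hh-rlhf; the chatml-formatted
# conversation text is invented Spanish content, not taken from the IIC dataset.
example_record = {
    "chosen": (
        "<|im_start|>user\n¿Cuál es la capital de España?<|im_end|>\n"
        "<|im_start|>assistant\nLa capital de España es Madrid.<|im_end|>\n"
    ),
    "rejected": (
        "<|im_start|>user\n¿Cuál es la capital de España?<|im_end|>\n"
        "<|im_start|>assistant\nNo estoy seguro, quizás Barcelona.<|im_end|>\n"
    ),
}
```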
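
The Training Procedure entry points to TRL's example DPO script. As a rough sketch of what running DPO with TRL on such a preference dataset involves, the snippet below uses `DPOTrainer` with placeholder model and file names and arbitrary hyperparameters; the exact keyword arguments vary across TRL versions (newer releases move `beta` into a `DPOConfig`), so this is not the authors' actual setup.

```python
# Illustrative DPO fine-tuning with TRL, in the spirit of the
# examples/scripts/dpo.py script linked in the diff above. Model name, data
# file and hyperparameters are placeholders, not the values used by IIC.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "Qwen/Qwen1.5-7B-Chat"  # assumed base checkpoint, for illustration only

model = AutoModelForCausalLM.from_pretrained(base_model)
ref_model = AutoModelForCausalLM.from_pretrained(base_model)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Preference data with "prompt", "chosen" and "rejected" text columns.
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

args = TrainingArguments(
    output_dir="dpo-output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,  # weight of the KL penalty keeping the policy close to ref_model
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```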