gonzalo-santamaria-iic committed
Update README.md

README.md CHANGED
@@ -130,15 +130,11 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 ### Training Data
 
-A combination of both public and private datasets
+A combination of both public and private datasets designed in the IIC. The dataset consists of 21975 conversations in Spanish, in `chatml` format, with the same structure as the [Anthropic/hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf). Each conversation has two variants, `chosen` and `rejected`, in which the only difference is the assistant's last answer: the answer in the `chosen` variant is considered better than the one in the `rejected` variant. Several techniques were used to generate the dataset, which we explain in depth in the paper (**coming soon**).
 
 ### Training Procedure
 
-
-
-#### Preprocessing [optional]
-
-[More Information Needed]
+We use the [Transformer Reinforcement Learning](https://huggingface.co/docs/trl/index) (TRL) library. Specifically, we applied [the DPO example script they publish](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) to the dataset we generated.
 
 
 #### Training Hyperparameters
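The added Training Data text says the preference pairs follow the same `chosen`/`rejected` layout as Anthropic/hh-rlhf. A quick way to see that record structure is to inspect the public hh-rlhf dataset; this is an illustration only, since the IIC dataset itself is `chatml`-formatted Spanish conversations and is not loaded here.

```python
from datasets import load_dataset

# Illustration only: Anthropic/hh-rlhf shares the chosen/rejected column
# layout described in the card (its turns use a "Human:/Assistant:" style
# rather than chatml).
ds = load_dataset("Anthropic/hh-rlhf", split="train")

sample = ds[0]
print(sample["chosen"])    # conversation ending in the preferred assistant answer
print(sample["rejected"])  # same conversation ending in the less-preferred answer
```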
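The added Training Procedure points to TRL's DPO example script. Below is a rough sketch of what such a setup can look like; the model name, the toy dataset, and the hyperparameters are placeholders rather than the values used for this model, and the `DPOTrainer` signature varies across TRL versions, so the linked script remains the reference.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Placeholder base model; a real run starts from the SFT checkpoint being aligned.
base_model = "your-sft-model"
model = AutoModelForCausalLM.from_pretrained(base_model)
ref_model = AutoModelForCausalLM.from_pretrained(base_model)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Tiny stand-in preference set: DPOTrainer expects "prompt", "chosen" and
# "rejected" text columns (the real dataset holds 21975 Spanish conversations).
train_dataset = Dataset.from_dict({
    "prompt": ["¿Qué es el aprendizaje por refuerzo?"],
    "chosen": ["Es un paradigma en el que un agente aprende maximizando una recompensa."],
    "rejected": ["No lo sé."],
})

args = TrainingArguments(
    output_dir="dpo-output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    remove_unused_columns=False,  # keep the raw text columns for DPOTrainer
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,                 # weight of the penalty that keeps the policy near the reference model
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
```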