antoinelouis/biencoder-distilcamembert-mmarcoFR

antoinelouis committed on Mar 22, 2024

Commit 0114c8e · verified · 1 Parent(s): fd63314

Update README.md

Files changed (1)

README.md +2 -1

@@ -9,6 +9,7 @@ metrics:
 tags:
 - passage-retrieval
 library_name: sentence-transformers
+base_model: cmarkea/distilcamembert-base
 model-index:
 - name: biencoder-distilcamembert-mmarcoFR
   results:
@@ -148,7 +149,7 @@ We use the French training samples from the [mMARCO](https://huggingface.co/data
 #### Implementation
-The model is initialized from the [distilcamembert-base](https://huggingface.co/cmarkea/distilcamembert-base) checkpoint and optimized via the cross-entropy loss (as in [DPR](https://doi.org/10.48550/arXiv.2004.04906)) with a temperature of 0.05. It is fine-tuned on one 32GB NVIDIA V100 GPU for 20 epochs (i.e., 65.7k steps) using the AdamW optimizer with a batch size of 152, a peak learning rate of 2e-5 with warm up along the first 500 steps and linear scheduling. We set the maximum sequence lengths for both the questions and passages to 128 tokens. We use the cosine similarity to compute relevance scores.
+The model is initialized from the [cmarkea/distilcamembert-base](https://huggingface.co/cmarkea/distilcamembert-base) checkpoint and optimized via the cross-entropy loss (as in [DPR](https://doi.org/10.48550/arXiv.2004.04906)) with a temperature of 0.05. It is fine-tuned on one 32GB NVIDIA V100 GPU for 20 epochs (i.e., 65.7k steps) using the AdamW optimizer with a batch size of 152, a peak learning rate of 2e-5 with warm up along the first 500 steps and linear scheduling. We set the maximum sequence lengths for both the questions and passages to 128 tokens. We use the cosine similarity to compute relevance scores.
 ***
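
The training setup described in the Implementation paragraph can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's training code. It assumes in-batch negatives (each question's paired passage is the positive and the other passages in the batch act as negatives, as in DPR), assumes that "linear scheduling" means linear decay to zero after warm-up, and rounds the reported 65.7k steps to 65,700. The function names `cosine`, `in_batch_contrastive_loss`, and `learning_rate` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(questions, passages, temperature=0.05):
    """Mean cross-entropy over a batch of (question, passage) embedding pairs.

    Assumption: passages[i] is the positive for questions[i]; every other
    passage in the batch is treated as a negative.
    """
    losses = []
    for i, q in enumerate(questions):
        # Relevance scores: cosine similarity scaled by the temperature.
        logits = [cosine(q, p) / temperature for p in passages]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the i-th passage as the target class.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


def learning_rate(step, peak=2e-5, warmup_steps=500, total_steps=65_700):
    """Linear warm-up to `peak`, then linear decay to zero (assumed shape)."""
    if step < warmup_steps:
        return peak * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return peak * max(0.0, remaining)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings standing in for model outputs.
    questions = [[1.0, 0.0], [0.0, 1.0]]
    passages = [[1.0, 0.0], [0.0, 1.0]]
    print(in_batch_contrastive_loss(questions, passages))
    print(learning_rate(250), learning_rate(500), learning_rate(65_700))
```

Because cosine similarity lies in [-1, 1], dividing by a temperature of 0.05 spreads the logits over [-20, 20], which sharpens the softmax. With the reported batch size of 152, each question would be scored against 152 passages under the in-batch assumption.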