This model was fine-tuned on The Latin Library corpus (15M tokens).
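
Assuming the checkpoint follows standard RoBERTa conventions, it can be queried with the `transformers` fill-mask pipeline. The snippet below is a minimal sketch: the example sentence is illustrative, and the input is lowercased because the model is uncased.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint with the standard fill-mask pipeline
# (assumes a RoBERTa-style <mask> token).
fill_mask = pipeline("fill-mask", model="Cicciokr/Roberta-Base-Latin-Uncased")

# Input is lowercased to match the uncased training data.
for prediction in fill_mask("gallia est omnis divisa in partes <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```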

The dataset was cleaned before training (see the sketch after this list):

  • Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
  • Sentence splitting and normalisation with CLTK.
  • Deduplication of the corpus.
  • Lowercasing of all text.
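
A minimal sketch of these cleaning steps is shown below. The `split_sentences` helper is a hypothetical stand-in for CLTK's Latin sentence tokenizer, and the filtering and deduplication logic is illustrative rather than the exact procedure used.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Hypothetical stand-in for CLTK's Latin sentence tokenizer:
    # split on sentence-final punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def clean_corpus(documents: list[str]) -> list[str]:
    seen = set()
    cleaned = []
    for doc in documents:
        # Drop "pseudo-Latin" filler text.
        if "lorem ipsum" in doc.lower():
            continue
        for sentence in split_sentences(doc):
            normalized = sentence.lower()   # lowercase all text
            if normalized not in seen:      # deduplicate the corpus
                seen.add(normalized)
                cleaned.append(normalized)
    return cleaned
```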
The fine-tuned model has 124M parameters (F32 tensors) and is published in Safetensors format.