--- datasets: - armvectores/hy_wikipedia_2023 library_name: fasttext pipeline_tag: feature-extraction --- 414M tokens 1) 73M hy wikipedia 2) 341M arlis database 74951 unique words 3-5 ngrams 5 window length minimum number of words 150 100 epochs, 0.05 start lr 26 hours on 20 xeon gold cores