Fine-tuning the titanet-large model
Hi, I am working on a project that uses the TitaNet model to compute embeddings for audio WAV files, but the results aren't that good. How can I fine-tune the model to get better results on my dataset?
Some notes on fine-tuning can be found in this tutorial notebook: https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb
Hi @nithinraok,
Hope you are doing great. I have fine-tuned the model on the Common Voice dataset (Turkish): https://huggingface.co/pgwi/en_tr_titanet_large. However, I am struggling to work out how to evaluate the EER and WER. Do you have any references I could learn from? Please let me know. Thank you very much.
You may refer to the VoxCeleb EER evaluation in this script: https://github.com/NVIDIA/NeMo/blob/main/examples/speaker_tasks/recognition/voxceleb_eval.py
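For anyone who wants to see what an EER computation involves, here is a minimal NumPy sketch. It is not the code from the script above, only an illustration. It assumes you have already extracted one embedding per audio file and built a list of trial pairs labelled 1 (same speaker) or 0 (different speaker). The function names `cosine_score` and `compute_eer` are hypothetical and chosen just for this example. WER is a speech recognition metric computed on transcripts, so it is not covered here.

```python
import numpy as np


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def compute_eer(scores, labels):
    """Return (eer, threshold) for a set of verification trials.

    scores: higher means "more likely the same speaker".
    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = int(labels.sum())
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Use every observed score as a candidate threshold (accept if score >= t).
    thresholds = np.unique(scores)
    # False acceptance rate: different-speaker trials that were accepted.
    far = np.array(
        [np.sum((scores >= t) & (labels == 0)) / n_nontarget for t in thresholds]
    )
    # False rejection rate: same-speaker trials that were rejected.
    frr = np.array(
        [np.sum((scores < t) & (labels == 1)) / n_target for t in thresholds]
    )

    # The EER is where the two rates cross. Pick the threshold with the
    # smallest gap between them and average the two rates at that point.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores only, to show the call; replace with scores from your trials.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
    eer, thr = compute_eer(toy_scores, toy_labels)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

In practice you would call `cosine_score` on each trial pair of embeddings to build `scores`, then pass them with the trial labels to `compute_eer`. Sweeping only the observed scores keeps the sketch short; implementations that interpolate the ROC curve can give slightly different values on small trial lists.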