Using the Coqui-AI TTS repo (https://github.com/coqui-ai/TTS), the default Spanish VITS model is distributed in .tar format, so I used the fine-tuning functionality but with only 1 epoch and all learning rates set to 0.0:

- Clone the repo and follow the installation instructions, currently: cd /repo/path && pip install -e .[all,dev,notebooks]
- Generate any text with the Spanish VITS model so that it gets downloaded, currently: tts --model_name tts_models/es/css10/vits --text "hola hola hola" --out_path /anywhere/
- Go to the directory of the downloaded model, normally under ~/.local/share/tts/, and copy config.json and model.pth.tar
- Customize config.json to train only 1 epoch with all learning rates at 0.0 (including lr_gen and lr_disc), and fill in the settings for your dataset. Even though you don't actually want to train, you need some dataset to act as if it were going to train.
- Fine-tune, currently: CUDA_VISIBLE_DEVICES="0" python /path/to/repo/TTS/TTS/bin/train_tts.py --config_path /path/to/custom/config/config.json --restore_path /path/to/model/model_file.pth.tar --use_cuda True

Some troubles I found and how I solved them:

- If training throws an error like "Vits has no disc", set initialize_disc to true in config.json.
- Training always needs at least one file for evaluation, so set eval_split_size so that it yields at least one file; for me it crashed when using no evaluation set at all.
- If it throws an error about running out of data, check the data filters in config.json, such as the min and max length settings. It could be that all your audio and/or text data is being filtered out.
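To make the config edits concrete, here is a minimal Python sketch that patches a downloaded config.json in place. The field names used (epochs, lr, lr_gen, lr_disc, eval_split_size) are assumptions based on the Coqui VITS config; check your own config.json for the exact keys your version uses.

```python
import json

def patch_config(path):
    """Patch a Coqui VITS config.json for a zero-learning, 1-epoch 'fine-tune'.

    Key names are assumptions based on the Coqui VITS config; verify them
    against the actual config.json shipped with your model.
    """
    with open(path) as f:
        cfg = json.load(f)

    cfg["epochs"] = 1           # run a single epoch only
    cfg["lr"] = 0.0             # base learning rate: no weight updates
    cfg["lr_gen"] = 0.0         # generator learning rate
    cfg["lr_disc"] = 0.0        # discriminator learning rate
    # Ensure at least one evaluation sample, since training crashes with an
    # empty eval split; check whether your TTS version treats this value as
    # a fraction of the dataset or an absolute sample count.
    cfg["eval_split_size"] = 1

    with open(path, "w") as f:
        json.dump(cfg, f, indent=4)
    return cfg
```

Run it as patch_config("/path/to/custom/config/config.json") before launching train_tts.py; the dataset-specific fields still have to be filled in by hand.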