Can't encode wav file
The first script runs fine, but when I run the second script (with a reference voice), I get this error:
RuntimeError: Calculated padded input size per channel: (0). Kernel size: (1). Kernel size can't be greater than actual input size
The error happens on this line:
vq_code_prompt = Codec_model.encode_code(input_waveform=prompt_wav)
Can anyone help me?
It seems like the issue might be related to the size of prompt_wav. Can you check that the prompt wav loads correctly and that prompt_wav has the expected shape and length?
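One way to act on that suggestion is to validate the array before calling encode_code. This is a hypothetical sketch: check_prompt_wav and its 0.5 s minimum are illustrative and not part of Llasa or xcodec. It assumes the prompt should be a mono NumPy array of shape [1, time] at 16 kHz, inferred from the xcodec encoder calling wav.unsqueeze(1) before conv1d. Convert a PyTorch tensor with `.cpu().numpy()` first.

```python
import numpy as np

# Hypothetical pre-flight check, not part of Llasa or xcodec.
# Assumption: encode_code expects a [batch=1, time] mono waveform, since the
# xcodec encoder calls wav.unsqueeze(1) to build conv1d's [batch, 1, time] input.
def check_prompt_wav(wav: np.ndarray, sample_rate: int = 16000,
                     min_seconds: float = 0.5) -> None:
    """Raise ValueError if wav does not look like a usable [1, time] prompt."""
    if wav.ndim != 2 or wav.shape[0] != 1:
        raise ValueError(f"expected shape [1, time], got {list(wav.shape)}")
    # min_seconds is an illustrative threshold, not a documented limit.
    min_samples = int(min_seconds * sample_rate)
    if wav.shape[1] < min_samples:
        raise ValueError(
            f"prompt has {wav.shape[1]} samples; expected at least {min_samples}"
        )
    if not np.isfinite(wav).all():
        raise ValueError("prompt contains NaN or inf samples")
```

An empty or mis-shaped array fails here with a readable message instead of surfacing later as a conv1d padding or dimension error.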
Thanks for the response! I'm still stuck on the same line when generating with a reference voice file. I tried several different reference wav files and couldn't get anything working except a Llasa-generated wav file (one I generated locally without a reference voice).
I made sure to convert my wavs to a 16 kHz sample rate, but now I get this error from the same line: RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 1, 189523, 239]
Am I getting closer?
In the xcodec code, this is the line that throws the error:
vq_emb = self.CodecEnc(wav.unsqueeze(1)) # [batch, time//down, 1024] example only
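The 4D shape in that error can be traced with NumPy, using np.expand_dims as a stand-in for torch's unsqueeze(1). This sketch assumes a mono prompt reaches CodecEnc as [batch, time], and that a multichannel file loaded without downmixing carries a trailing channel axis (soundfile.read, for example, returns multichannel audio as (frames, channels)). encoder_input_shape is a hypothetical helper for illustration.

```python
import numpy as np

# Hypothetical helper: mimics torch's wav.unsqueeze(1) with NumPy so the
# resulting conv1d input shape can be checked without loading a model.
def encoder_input_shape(wav: np.ndarray) -> tuple:
    """Shape the encoder's conv1d would receive after wav.unsqueeze(1)."""
    return np.expand_dims(wav, 1).shape

mono = np.zeros((1, 48000), dtype=np.float32)       # [batch, time]
stereo = np.zeros((1, 48000, 2), dtype=np.float32)  # [batch, time, channels]

print(encoder_input_shape(mono))    # (1, 1, 48000): 3D, accepted by conv1d
print(encoder_input_shape(stereo))  # (1, 1, 48000, 2): 4D, rejected by conv1d
```

Any extra axis on the prompt pushes the conv1d input to 4D, which matches the "Expected 2D (unbatched) or 3D (batched)" message.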
Oh wait, I think it might be because I'm using a stereo wav file, not mono. I'll experiment more.
I've got it working with mono. Thanks!
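For anyone hitting the same error, a minimal sketch of the fix is to downmix to mono and resample to 16 kHz before building the prompt. to_mono_16k is a hypothetical helper. Its linear-interpolation resampling is a simplification without anti-aliasing, so a dedicated resampler such as torchaudio.functional.resample is a better choice for real audio. The [1, time] output shape is an assumption based on the xcodec line quoted above.

```python
import numpy as np

TARGET_SR = 16000  # the 16 kHz rate mentioned in this thread

# Hypothetical helper, not part of Llasa or xcodec.
def to_mono_16k(samples: np.ndarray, sr: int) -> np.ndarray:
    """Return a float32 [1, time] mono waveform at 16 kHz.

    samples: shape (frames,) or (frames, channels), e.g. from soundfile.read.
    """
    samples = np.asarray(samples, dtype=np.float32)
    if samples.ndim == 2:
        # Downmix by averaging the channels (stereo -> mono).
        samples = samples.mean(axis=1)
    elif samples.ndim != 1:
        raise ValueError(f"expected 1D or 2D audio, got shape {samples.shape}")
    if sr != TARGET_SR:
        # Simple linear interpolation onto a 16 kHz time grid (no anti-aliasing).
        n_out = int(round(len(samples) * TARGET_SR / sr))
        t_in = np.arange(len(samples)) / sr
        t_out = np.arange(n_out) / TARGET_SR
        samples = np.interp(t_out, t_in, samples).astype(np.float32)
    return samples[np.newaxis, :]  # add the batch axis -> [1, time]
```

Typical use would be `data, sr = soundfile.read(path)` followed by `prompt = to_mono_16k(data, sr)`, wrapped with `torch.from_numpy(prompt)` if encode_code expects a tensor.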