HKUSTAudio/Llasa-1B · Can't encode wav file

5 days ago

The first script runs fine, but when I run the second script (with reference voice), I run into this error:

RuntimeError: Calculated padded input size per channel: (0). Kernel size: (1). Kernel size can't be greater than actual input size

the error happens on this line:
vq_code_prompt = Codec_model.encode_code(input_waveform=prompt_wav)

can anyone help me?

HKUST-Audio

HKUST Audio org 5 days ago

It seems like the issue might be related to the size of prompt_wav. Can you check if the prompt wav is correct?

seannam

4 days ago

Thanks for the response! I'm still stuck on the same line when generating with a reference voice file. I tried several reference wav files, and the only one that worked was a llasa-generated wav (one I generated locally with no reference voice).
I made sure to convert my wavs to a 16 kHz sample rate, but now I get a different error from the same line:
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 1, 189523, 239]
Am I getting closer?

seannam

4 days ago

in the xcodec code, it's this line that throws the error:

vq_emb = self.CodecEnc(wav.unsqueeze(1)) # [batch, time//down, 1024], just an example

seannam

4 days ago

Oh wait, I think it might be because I'm using a stereo wav file, not mono. I'll experiment more.

seannam

4 days ago

I've got it working with Mono. Thanks!
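
For anyone hitting the same conv1d error, here is a minimal sketch of the downmix step. It assumes the wav was loaded as a NumPy array shaped (frames,) for mono or (frames, channels) for stereo, which is what soundfile's read returns by default. The helper name to_mono is hypothetical and not part of the Llasa or xcodec code, and resampling to 16 kHz is a separate step that is not shown here.

```python
import numpy as np


def to_mono(samples: np.ndarray) -> np.ndarray:
    """Downmix audio shaped (frames,) or (frames, channels) to mono (frames,).

    Hypothetical helper for illustration; not part of the Llasa/xcodec code.
    """
    if samples.ndim == 1:
        # Already mono: only normalise the dtype.
        return samples.astype(np.float32)
    if samples.ndim == 2:
        # Average across the channel axis so one sample remains per frame.
        return samples.mean(axis=1).astype(np.float32)
    raise ValueError(f"expected 1D or 2D audio, got shape {samples.shape}")
```

After the downmix, the array is 1D, so converting it with torch.from_numpy(...).unsqueeze(0) should give a [1, time] tensor rather than one with an extra channel dimension, which is presumably what encode_code expects given that mono input worked.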