TTS, VC - a heiscold Collection

updated Sep 3, 2024

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Paper • 2402.07383 • Published Feb 12, 2024 • 13

Matcha-TTS: A fast TTS architecture with conditional flow matching
Paper • 2309.03199 • Published Sep 6, 2023 • 11

Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Paper • 2402.01912 • Published Feb 2, 2024 • 11

Fast Timing-Conditioned Latent Audio Diffusion
Paper • 2402.04825 • Published Feb 7, 2024 • 7

FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published Apr 23, 2024 • 29

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Paper • 2406.02430 • Published Jun 4, 2024 • 31

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Paper • 2406.02897 • Published Jun 5, 2024 • 13

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Paper • 2406.05370 • Published Jun 8, 2024 • 15

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Paper • 2406.18009 • Published Jun 26, 2024 • 20

Towards Robust Speech Representation Learning for Thousands of Languages
Paper • 2407.00837 • Published Jun 30, 2024 • 10

Autoregressive Speech Synthesis without Vector Quantization
Paper • 2407.08551 • Published Jul 11, 2024 • 14

Stable Audio Open
Paper • 2407.14358 • Published Jul 19, 2024 • 24

Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Paper • 2407.14329 • Published Jul 19, 2024 • 4

Discrete Flow Matching
Paper • 2407.15595 • Published Jul 22, 2024 • 13

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
Paper • 2407.09732 • Published Jul 13, 2024 • 8

Qwen2-Audio Technical Report
Paper • 2407.10759 • Published Jul 15, 2024 • 55

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Paper • 2308.06873 • Published Aug 14, 2023 • 25

MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Paper • 2408.04708 • Published Aug 8, 2024 • 6

Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
Paper • 2408.10998 • Published Aug 20, 2024 • 8

Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
Paper • 2408.08019 • Published Aug 15, 2024 • 10

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Paper • 2408.07547 • Published Aug 14, 2024 • 7

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Paper • 2408.14608 • Published Aug 26, 2024 • 7

No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
Paper • 2407.02687 • Published Jul 2, 2024 • 22

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Paper • 2408.16725 • Published Aug 29, 2024 • 52