GEONTT's Collections
audio
updated
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper
•
2402.17485
•
Published
•
190
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Paper
•
2403.10493
•
Published
•
15
Paper
•
2404.13358
•
Published
•
12
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Paper
•
2406.02430
•
Published
•
31
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Paper
•
2406.03344
•
Published
•
18
VideoTetris: Towards Compositional Text-to-Video Generation
Paper
•
2406.04277
•
Published
•
23
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Paper
•
2406.18009
•
Published
•
20
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper
•
2407.02869
•
Published
•
18
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper
•
2407.04051
•
Published
•
35
Video-to-Audio Generation with Hidden Alignment
Paper
•
2407.07464
•
Published
•
16
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Paper
•
2407.10387
•
Published
•
6
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
Paper
•
2407.15060
•
Published
•
9
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Paper
•
2408.04708
•
Published
•
6
Presto! Distilling Steps and Layers for Accelerating Music Generation
Paper
•
2410.05167
•
Published
•
15