audio - a GEONTT Collection

GEONTT 's Collections

base

3D

LLM

audio

video

image

RAG

audio

updated Oct 9, 2024

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 190
MusicHiFi: Fast High-Fidelity Stereo Vocoding

Paper • 2403.10493 • Published Mar 15, 2024 • 15
Music Consistency Models

Paper • 2404.13358 • Published Apr 20, 2024 • 12
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4, 2024 • 31
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 18
VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6, 2024 • 23
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Paper • 2406.18009 • Published Jun 26, 2024 • 20
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3, 2024 • 18
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Paper • 2407.04051 • Published Jul 4, 2024 • 35
Video-to-Audio Generation with Hidden Alignment

Paper • 2407.07464 • Published Jul 10, 2024 • 16
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Paper • 2407.10387 • Published Jul 15, 2024 • 6
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Paper • 2407.15060 • Published Jul 21, 2024 • 9
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency

Paper • 2408.04708 • Published Aug 8, 2024 • 6
Presto! Distilling Steps and Layers for Accelerating Music Generation

Paper • 2410.05167 • Published Oct 7, 2024 • 15