matlok's Collections
Papers - Audio
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704
Structural Similarities Between Language Models and Neural Response Measurements
Paper • 2306.01930
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper • 2006.14941
NU-GAN: High resolution neural upsampling with GAN
Paper • 2010.11362
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Paper • 2403.10493
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Paper • 2403.16973
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
Paper • 2311.14836
MuPT: A Generative Symbolic Music Pretrained Transformer
Paper • 2404.06393
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper • 2404.07616
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956
Long-form music generation with latent diffusion
Paper • 2404.10301
Paper • 2404.13358
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Paper • 2405.00233
LLM-AD: Large Language Model based Audio Description System
Paper • 2405.00983
Images that Sound: Composing Images and Sounds on a Single Canvas
Paper • 2405.12221
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper • 2406.08407
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Paper • 2407.01494
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper • 2407.02869
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper • 2407.04051
Qwen2-Audio Technical Report
Paper • 2407.10759
Audio Conditioning for Music Generation via Discrete Bottleneck Features
Paper • 2407.12563
Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation
Paper • 2408.03588
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper • 2408.11048
Foundation Models for Music: A Survey
Paper • 2408.14340
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Paper • 2408.16532
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Paper • 2006.11477