---
license: mit
pipeline_tag: audio-to-audio
library_name: transformers
---

# VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a speech restoration model designed to improve the quality of degraded voice recordings. It uses flow-matching transformers to address a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.

It is based on this [repo](https://github.com/skirdey/voicerestore). A demo of audio restorations is available at [VoiceRestore](https://sparkling-rabanadas-3082be.netlify.app/).

## Usage - using Transformers 🤗

```bash
# Run these in a Colab/Jupyter notebook cell (the ! and % prefixes are notebook syntax)
!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt
```

```python
from transformers import AutoModel

# Path to the cloned model folder (on Colab it is as follows)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)

# Restore the degraded input file and write the result to the output file
model("test_input.wav", "test_output.wav")
```

## Example

### Degraded Input

Degraded input audio

---

### Restored (steps=32, cfg=1.0)

Restored audio - 16 steps, strength 0.5

---

## Key Features

- **Universal Restoration**: The model handles many types and levels of voice recording degradation, including noise, reverberation, distortion, and signal loss.
- **Easy to Use**: Simple interface for processing degraded audio files.
- **Pretrained Model**: Includes a 301 million parameter transformer model with pre-trained weights. The model is still being trained, so further checkpoint updates will follow.

---

## Model Details

- **Architecture**: Flow-matching transformer
- **Parameters**: 301M parameters
- **Input**: Degraded speech audio (various formats supported)
- **Output**: Restored speech

## Limitations and Future Work

- The current model is optimized for speech and may not perform well on music or other audio types.
- Ongoing research to improve performance on extreme degradations.
- Future updates may include real-time processing capabilities.

## Citation

If you use VoiceRestore in your research, please cite our paper:

```
@article{kirdey2024voicerestore,
  title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
  author={Kirdey, Stanislav},
  journal={arXiv},
  year={2024}
}
```

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Based on the [E2-TTS implementation by Lucidrains](https://github.com/lucidrains/e2-tts-pytorch)
- Special thanks to the open-source community for their invaluable contributions.
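
## Appendix: Batch Restoration Sketch

The Usage section restores a single file with `model(input_path, output_path)`. The sketch below shows one way that call might be applied to a whole folder of recordings. The helper name `restore_directory`, the `_restored` filename suffix, and the example folder names are hypothetical and not part of VoiceRestore. The only assumption about the model is the two-argument call shown in Usage.

```python
from pathlib import Path
from typing import Callable, List


def restore_directory(
    restore_fn: Callable[[str, str], object],
    input_dir: str,
    output_dir: str,
) -> List[Path]:
    """Call restore_fn(input_path, output_path) for every .wav file in input_dir.

    restore_fn is assumed to behave like the `model(...)` call in the Usage
    section: it reads the first path and writes restored audio to the second.
    Returns the list of output paths that were passed to restore_fn.
    """
    in_root = Path(input_dir)
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)

    written: List[Path] = []
    # Sort so files are processed in a deterministic order.
    for wav_path in sorted(in_root.glob("*.wav")):
        # Hypothetical naming scheme: "<name>.wav" -> "<name>_restored.wav".
        out_path = out_root / f"{wav_path.stem}_restored.wav"
        restore_fn(str(wav_path), str(out_path))
        written.append(out_path)
    return written


# Example usage (requires the model loaded as in the Usage section):
# model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
# restore_directory(model, "degraded_wavs", "restored_wavs")
```

Passing the model in as a plain callable keeps the helper independent of how the checkpoint is loaded, so it can be tried with a stand-in function before running the full model.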