metadata
title: Amphion Vevo Voice Conversion
emoji: 🎤
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
python_version: '3.10'

Amphion's Vevo - Voice Conversion & TTS

This is a Gradio web interface for the Vevo voice conversion model from the Amphion toolkit. It supports:

  • Voice conversion (transferring both style and timbre)
  • Timbre-only conversion
  • Text-to-Speech with voice cloning

Usage

  1. Select mode:

    • Voice: Convert voice with both style and timbre transfer
    • Timbre: Convert only the timbre of the voice
    • TTS: Generate speech from text with voice cloning
  2. Upload audio files based on mode:

    • Source Audio: Your input audio (for voice and timbre modes)
    • Reference Style: Style reference (for voice and TTS modes)
    • Reference Timbre: Voice reference (required for all modes)
  3. For TTS mode:

    • Enter the text you want to convert to speech
    • Optionally provide reference text
    • Select source and reference languages
  4. Adjust Flow Matching Steps (1-64, default: 32)

    • Higher values generally improve quality but increase inference time
    • Lower values run faster but may reduce quality
  5. Click "Generate" to create the converted audio
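
The input requirements in the steps above can be summarized as a small validation helper. This is only a sketch: the names `REQUIRED_INPUTS`, `missing_inputs`, and `clamp_steps` are hypothetical and are not taken from `app.py`. The mode-to-input mapping and the 1-64 range with a default of 32 follow the list above.

```python
from typing import Optional

# Inputs each mode needs, as described in the usage steps.
# The key and field names are illustrative, not the app's actual identifiers.
REQUIRED_INPUTS = {
    "voice": ("source_audio", "reference_style", "reference_timbre"),
    "timbre": ("source_audio", "reference_timbre"),
    "tts": ("text", "reference_style", "reference_timbre"),
}

MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


def missing_inputs(mode: str, **inputs: Optional[str]) -> list[str]:
    """Return the required inputs that are absent or empty for `mode`."""
    key = mode.lower()
    if key not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in REQUIRED_INPUTS[key] if not inputs.get(name)]


def clamp_steps(steps: Optional[int] = None) -> int:
    """Keep the flow matching step count inside the documented 1-64 range."""
    if steps is None:
        return DEFAULT_STEPS
    return max(MIN_STEPS, min(MAX_STEPS, int(steps)))


if __name__ == "__main__":
    # Timbre mode with only a source file: the timbre reference is still needed.
    print(missing_inputs("timbre", source_audio="source.wav"))
    print(clamp_steps(100))
```

A check like this could run before inference so that a missing reference file is reported up front rather than partway through generation.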

Sample Files

Sample audio files are available in the Amphion/models/vc/vevo/wav/ directory:

  • arabic_male.wav
  • source.wav

Technical Requirements

  • Python 3.10+
  • CUDA-capable GPU recommended for faster inference
  • At least 12 GB of free storage space for the models
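
The requirements above can be checked before launching the app. The sketch below is an assumption about how such a check might look and is not part of the Space itself: the function names are hypothetical, the free-space check uses the working directory as a stand-in for wherever the models are cached, and the GPU check assumes PyTorch is the backend.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from "Python 3.10+"
MIN_FREE_GB = 12      # from the storage requirement


def python_ok(version=sys.version_info) -> bool:
    """True if the interpreter's major.minor version is at least 3.10."""
    return tuple(version[:2]) >= MIN_PYTHON


def free_space_gb(path: str = ".") -> float:
    """Free disk space at `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def cuda_available() -> bool:
    """True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # assumed backend; imported lazily so the check works without it
    except ImportError:
        return False
    return torch.cuda.is_available()


def preflight(path: str = ".") -> list[str]:
    """Return human-readable warnings; an empty list means all checks passed."""
    problems = []
    if not python_ok():
        problems.append("Python 3.10 or newer is required")
    if free_space_gb(path) < MIN_FREE_GB:
        problems.append(f"less than {MIN_FREE_GB} GB free at {path!r}")
    if not cuda_available():
        problems.append("no CUDA GPU detected; inference will run slower on CPU")
    return problems


if __name__ == "__main__":
    for line in preflight():
        print("warning:", line)
```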

Models

The application automatically downloads the required models from Hugging Face on first run:

  • Content Tokenizer (vq32)
  • Content-Style Tokenizer (vq8192)
  • Autoregressive Transformer
  • Flow Matching Transformer
  • Vocoder
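
For readers who want to fetch the models ahead of time, the download could be sketched with `huggingface_hub.snapshot_download`. Everything specific here is an assumption to verify against the model repository: the repo id `amphion/Vevo`, the folder patterns, the cache directory, and the helper name `download_component` are illustrative and are not confirmed by this README.

```python
from pathlib import Path

# ASSUMPTION: repository id and folder layout are not stated in this README.
REPO_ID = "amphion/Vevo"

# ASSUMPTION: one download pattern per component listed above.
COMPONENT_PATTERNS = {
    "content_tokenizer": "tokenizer/vq32/*",
    "content_style_tokenizer": "tokenizer/vq8192/*",
    "ar_transformer": "contentstyle_modeling/*",
    "fm_transformer": "acoustic_modeling/Vq8192ToMels/*",
    "vocoder": "acoustic_modeling/Vocoder/*",
}


def download_component(name: str, cache_dir: str = "./ckpts/Vevo") -> Path:
    """Download the files for one component and return the local snapshot path."""
    # Imported lazily so the mapping above can be inspected without the package.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id=REPO_ID,
        repo_type="model",
        cache_dir=cache_dir,
        allow_patterns=[COMPONENT_PATTERNS[name]],
    )
    return Path(local_dir)


if __name__ == "__main__":
    # Needs network access and the huggingface_hub package.
    for component in COMPONENT_PATTERNS:
        print(component, "->", download_component(component))
```

Restricting each call with `allow_patterns` keeps the download limited to one component, which helps when storage is close to the 12 GB mentioned under Technical Requirements.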