metadata

license: mit
language:
  - en
pipeline_tag: text-to-speech
tags:
  - audiocraft
  - audiogen
  - styletts2
  - audio
  - synthesis
  - shift
  - audeering
  - dkounadis
  - sound
  - scene
  - acoustic-scene
  - audio-generation

Affective TTS / SoundScape

SHIFT TTS tool with Affective voices
Analysis of emotionality #1
Soundscape generation (e.g. trees, water, hills) via AudioGen
landscape2soundscape.py shows how to overlay TTS and a soundscape onto images to create videos

Available Voices

Listen to available voices!

API

Install

virtualenv --python=python3 ~/.envs/.my_env
source ~/.envs/.my_env/bin/activate
cd shift/
pip install -r requirements.txt

Flask

CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py

The commands below require api.py to be running, e.g. in a tmux session.
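Since the client scripts depend on the server being up, it can help to wait for it before invoking them. The snippet below is a minimal sketch of such a check, not part of this repository: `wait_for_server` is a hypothetical helper, and the host and port (Flask's default 5000) are assumptions, so adjust them to whatever api.py actually binds to.

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=5000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: host and port are assumptions, not values read from api.py.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_server():
        print("server is reachable, tts.py can be run now")
    else:
        print("server did not come up in time")
```

The check only confirms that the port accepts connections. It does not verify that the models have finished loading.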

Text 2 Speech

# Basic TTS - See Available Voices
python tts.py --text sample.txt --voice "en_US/m-ailabs_low#mary_ann" --affective

# voice cloning
python tts.py --text sample.txt --native assets/native_voice.wav

Image 2 Video

# Make a video narrating an image - all TTS args above also apply here!
python tts.py --text sample.txt --image assets/image_from_T31.jpg

Video 2 Video

# Video Dubbing - from time-stamped subtitles (.srt)
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4

# Video narration - from text description (.txt)
python tts.py --text assets/head_of_fortuna_GPT.txt --video assets/head_of_fortuna.mp4
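Dubbing from an .srt file relies on each subtitle cue carrying a start and end timestamp, which is what lets the synthesized speech be placed at the right point in the video. The following is a minimal sketch of reading such cues in Python. It is not the parser used by tts.py, and `Cue` and `parse_srt` are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(content):
    """Parse SRT text into a list of cues (hypothetical helper, not part of tts.py)."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:04,500\nFirst line of narration.\n\n"
        "2\n00:00:05,250 --> 00:00:08,000\nSecond line,\nwrapped over two rows.\n"
    )
    for cue in parse_srt(sample):
        print(f"{cue.start:.2f}-{cue.end:.2f}: {cue.text}")
```

A plain .txt description carries no timestamps, which is why the narration mode above takes free text rather than cues.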

Landscape 2 Soundscape

# TTS & soundscape - overlay to .mp4
python landscape2soundscape.py
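To illustrate what overlaying TTS and a soundscape means at the sample level, here is a small sketch using plain Python lists. It assumes both signals share the same sample rate and are normalized to [-1.0, 1.0]. The function name `overlay` and the `background_gain` parameter are hypothetical and do not describe how landscape2soundscape.py actually mixes audio.

```python
def overlay(speech, soundscape, background_gain=0.3):
    """Mix a soundscape under speech; both are lists of floats in [-1.0, 1.0].

    Hypothetical helper for illustration only, assuming equal sample rates.
    """
    if not soundscape:
        return list(speech)
    mixed = []
    for i, sample in enumerate(speech):
        # Loop the soundscape so it covers the full speech duration.
        background = soundscape[i % len(soundscape)] * background_gain
        # Clip the sum so the result stays within the valid range.
        mixed.append(max(-1.0, min(1.0, sample + background)))
    return mixed


if __name__ == "__main__":
    speech = [0.5, 0.5, 0.5]
    soundscape = [1.0, -1.0]
    print(overlay(speech, soundscape, background_gain=0.5))
```

Keeping the background gain well below 1.0 is a common way to keep the narration intelligible over the ambient sound.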

Examples

Substitute Native voice via TTS

The same video, with the native voice replaced by an English TTS voice of similar emotion

Video dubbing from subtitles .srt

Video Dubbing

Generate dubbed video:

python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4

Joint Application of D3.1 & D3.2

From an image and text create a video:


python tts.py --text sample.txt --image assets/image_from_T31.jpg

Landscape 2 Soundscape

YouTube Videos

# Loads image & text & sound-scene text and creates .mp4
python landscape2soundscape.py

Live Demo - Paplay

Flask

CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=/data/dkounadis/.hf7/ CUDA_VISIBLE_DEVICES=4 python live_api.py

Client (Ubuntu)

python live_demo.py  # will ask text input & play soundscape