
ai-audio-books / readme.md

Aliaksandr

Update readme.md

84211a6 unverified 4 months ago


Action items

move speaker split to new pipeline
env template
move from AI/ML API to LangChain
bugfix w/ ElevenLabs API
async synthesis
map characters to voices
emotion enrichment: add intonation markers, auto-set TTS params
generate good enough sound effects for background
mix effects with narration
allow file upload (.txt)
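
The "async synthesis" and "map characters to voices" items could be sketched roughly as follows. This is only an illustration under stated assumptions: `synthesize` is a hypothetical placeholder for the real TTS request, and the voice names in `VOICE_BY_CHARACTER` are made up, not actual provider voice IDs.

```python
import asyncio

# Hypothetical character-to-voice table; real voice IDs would come from the TTS provider.
VOICE_BY_CHARACTER = {"narrator": "voice_a", "Alice": "voice_b"}
DEFAULT_VOICE = "voice_a"  # fallback for characters without a mapping


async def synthesize(text: str, voice: str) -> bytes:
    """Placeholder for a real TTS request; returns fake audio bytes."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"[{voice}] {text}".encode("utf-8")


async def synthesize_all(phrases, max_concurrency: int = 4) -> list[bytes]:
    """Synthesize (character, text) phrases concurrently, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)  # caps in-flight requests

    async def worker(character: str, text: str) -> bytes:
        voice = VOICE_BY_CHARACTER.get(character, DEFAULT_VOICE)
        async with semaphore:
            return await synthesize(text, voice)

    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(worker(c, t) for c, t in phrases))
```

Calling `asyncio.run(synthesize_all([("narrator", "Once upon a time."), ("Alice", "Hello!")]))` returns one audio chunk per phrase, in the original order, so the chunks can be concatenated directly into the narration. The semaphore is one way to stay under a provider's rate limit; the limit of 4 is an arbitrary choice.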

Backlog

prepare text for TTS
- prepare prompt to split text into character phrases
- split large text in batches, process each batch separately, concat batches
- try to identify unknown characters
select voices for TTS
- map characters to available voices
- use LLM to recognize characters for a given text and provide descriptions detailed enough to select an appropriate voice
preprocess text phrases for TTS: add intonation markers, auto-set TTS params
run TTS to create narration
add effects and mix them with the created narration
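
The "split large text in batches" step above could look something like this. It is a minimal sketch, assuming paragraphs are separated by blank lines and that batch size is measured in characters; `process_batch` is a hypothetical stand-in for the LLM call that splits a batch into (character, phrase) pairs.

```python
def split_into_batches(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into batches of at most max_chars characters each.

    A single paragraph longer than max_chars becomes its own oversized batch.
    """
    batches, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            batches.append(current)  # current batch is full, start a new one
            current = paragraph
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def process_in_batches(text, process_batch, max_chars=2000):
    """Run process_batch on each batch and concatenate the resulting phrase lists."""
    phrases = []
    for batch in split_into_batches(text, max_chars):
        phrases.extend(process_batch(batch))
    return phrases
```

Splitting on paragraph boundaries keeps each character's line intact within a batch. A speaker who is only named in an earlier batch may still show up as unknown in a later one, which is where the "try to identify unknown characters" item would come in.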