ai-audio-books / readme.md

navalnica

upd readme

c20a6d7 4 months ago

metadata

title: ai-audio-books
emoji: 📕
colorFrom: blue
colorTo: white
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false

Action items

move speaker split to new pipeline
env template
move from AI/ML api to langchain
bugfix w/ 11labs api
async synthesis
map characters to voices
[] emotion enrichment: add intonation markers, auto-set TTS params
generate good enough sound effects for background
mix effects with narration
allow file upload (.txt)
optimizations
- combine consecutive phrases of the same character into a single phrase
- support large texts via batching. problem: how to keep characters consistent across batches? option: detect characters in a first prompt, then split the text of each batch into character phrases
- probably split large phrases into smaller ones
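
The first two optimizations above could look roughly like the sketch below. It is only an illustration: the `Phrase` dataclass, its field names, and the `max_chars` limit are assumptions, not names from the project code, and paragraphs separated by a blank line are assumed to be a reasonable batch boundary.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Phrase:
    # Hypothetical structure; the project may represent phrases differently.
    character: str
    text: str


def merge_consecutive(phrases: Iterable[Phrase]) -> list[Phrase]:
    """Join neighbouring phrases spoken by the same character."""
    merged: list[Phrase] = []
    for phrase in phrases:
        if merged and merged[-1].character == phrase.character:
            # Same speaker as the previous phrase: extend it.
            merged[-1] = Phrase(phrase.character, merged[-1].text + " " + phrase.text)
        else:
            merged.append(Phrase(phrase.character, phrase.text))
    return merged


def split_into_batches(text: str, max_chars: int) -> list[str]:
    """Greedily pack paragraphs into batches of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole in its own batch.
    """
    batches: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current batch.
            batches.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Merging before synthesis reduces the number of TTS requests. Splitting on paragraph boundaries keeps a line of dialogue from being cut in the middle, though it does not by itself solve the character consistency problem noted above.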

Backlog

prepare text for TTS
- prepare prompt to split text into character phrases
- split large text into batches, process each batch separately, concat batches
- try to identify unknown characters
select voices for TTS
- map characters to available voices
- use LLM to recognize characters for a given text and provide descriptions detailed enough to select appropriate voice
preprocess text phrases for TTS: add intonation markers, auto-set TTS params
run TTS to create narration
add effects. mix them with created narration
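
As a rough sketch of the voice mapping and narration steps listed above, the snippet below assigns a voice to each character and synthesizes phrases concurrently. The function names are hypothetical, and `synthesize_stub` is a placeholder that stands in for a real TTS request. It does not reflect any actual TTS provider API.

```python
import asyncio
from itertools import cycle


def assign_voices(characters: list[str], voices: list[str]) -> dict[str, str]:
    """Give each distinct character a voice, reusing voices if they run out."""
    if not voices:
        raise ValueError("at least one voice is required")
    unique = list(dict.fromkeys(characters))  # deduplicate, keep first-seen order
    return dict(zip(unique, cycle(voices)))


async def synthesize_stub(text: str, voice: str) -> bytes:
    """Placeholder for a real TTS request; returns fake audio bytes."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{voice}] {text}".encode("utf-8")


async def narrate(phrases: list[tuple[str, str]], voices: list[str]) -> list[bytes]:
    """Synthesize all phrases concurrently, preserving their original order."""
    voice_of = assign_voices([character for character, _ in phrases], voices)
    tasks = [synthesize_stub(text, voice_of[character]) for character, text in phrases]
    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*tasks)
```

A possible call is `asyncio.run(narrate([("narrator", "Once upon a time"), ("Alice", "Hi")], ["v1", "v2"]))`. Round-robin assignment is the simplest choice here; selecting voices from LLM-generated character descriptions, as the backlog suggests, would replace `assign_voices`.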