metadata
title: ai-audio-books
emoji: π
colorFrom: blue
colorTo: white
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
Action items
- move speaker split to new pipeline
- env template
- move from AI/ML api to langchain
- fix bug with ElevenLabs API
- async synthesis
- map characters to voices
- [ ] emotion enrichment: add intonation markers, auto-set TTS params
- generate good enough sound effects for background
- mix effects with narration
- allow file upload (.txt)
- optimizations
- combine consecutive phrases of the same character into a single phrase
- support large texts via batching. open problem: how to keep characters consistent across batches? one option: detect characters in the first prompt, then split the text of each batch into character phrases
- probably split large phrases into smaller ones
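
The two optimization items above (merging consecutive phrases of one character, and batching large texts) could be sketched roughly as follows. This is only an illustration, not the project's code: the `Phrase` dataclass, its field names, the function names, and the `max_chars` limit are all assumptions.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Phrase:
    # Hypothetical structure; the real pipeline may represent phrases differently.
    character: str
    text: str


def merge_consecutive(phrases: list[Phrase]) -> list[Phrase]:
    """Join adjacent phrases spoken by the same character into one phrase."""
    merged = []
    # groupby only groups *adjacent* items with an equal key, which is what we want.
    for character, group in groupby(phrases, key=lambda p: p.character):
        merged.append(Phrase(character, " ".join(p.text for p in group)))
    return merged


def split_into_batches(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack whole paragraphs into batches of at most max_chars.

    A single paragraph longer than max_chars becomes its own oversized batch.
    """
    batches, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current batch.
            batches.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Splitting on paragraph boundaries keeps each batch readable for the LLM. The character list detected in the first batch would still need to be passed into later prompts to keep names consistent.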
Backlog
- prepare text for TTS
- prepare prompt to split text into character phrases
- split large text into batches, process each batch separately, concat batches
- try to identify unknown characters
- select voices for TTS
- map characters to available voices
- use an LLM to recognize the characters in a given text and provide descriptions detailed enough to select an appropriate voice
- preprocess text phrases for TTS: add intonation markers, auto-set TTS params
- run TTS to create narration
- add effects. mix them with created narration
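
A minimal sketch of the voice-selection and narration steps from the backlog, assuming characters arrive with a gender label (for example from the LLM step). The `VOICES` catalog, the voice IDs, and the `synthesize` stub are hypothetical placeholders. No real TTS API is called here.

```python
import asyncio

# Hypothetical voice catalog; real voice IDs would come from the TTS provider.
VOICES = {
    "voice_a": {"gender": "female"},
    "voice_b": {"gender": "male"},
    "voice_c": {"gender": "male"},
}


def map_characters_to_voices(characters: dict[str, str]) -> dict[str, str]:
    """Give each character the least-used voice of matching gender.

    `characters` maps a character name to a gender label.
    Voices are reused once every matching voice has been assigned.
    """
    mapping: dict[str, str] = {}
    usage: dict[str, int] = {}
    for name, gender in characters.items():
        candidates = [v for v, meta in VOICES.items() if meta["gender"] == gender]
        if not candidates:
            candidates = list(VOICES)  # fall back to any voice
        voice = min(candidates, key=lambda v: usage.get(v, 0))
        usage[voice] = usage.get(voice, 0) + 1
        mapping[name] = voice
    return mapping


async def synthesize(text: str, voice: str) -> bytes:
    """Stand-in for a real TTS request; returns fake audio bytes."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"[{voice}] {text}".encode()


async def narrate(phrases: list[tuple[str, str]], voices: dict[str, str]) -> bytes:
    """Synthesize all phrases concurrently, then join them in original order."""
    # asyncio.gather returns results in the order the awaitables were passed.
    chunks = await asyncio.gather(
        *(synthesize(text, voices[character]) for character, text in phrases)
    )
    # Joining bytes works for this stub only; real audio needs proper concatenation.
    return b"".join(chunks)
```

Usage, under the same assumptions:

```python
voices = map_characters_to_voices({"Alice": "female", "Bob": "male"})
audio = asyncio.run(narrate([("Alice", "Hi"), ("Bob", "Hello")], voices))
```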