ales committed on Oct 11, 2024

Commit fccc76e (verified) · 1 Parent(s): bcc9601

Create README.md

Files changed (1)

README.md ADDED

+---
+title: ai-audio-books
+emoji: 📕👨‍💻🎧
+colorFrom: blue
+colorTo: gray
+sdk: gradio
+sdk_version: 4.44.1
+app_file: app.py
+pinned: false
+---
+### Action items
+- [ ] move speaker split to new pipeline
+- [ ] env template
+- [ ] move from AI/ML API to LangChain
+- [ ] fix bug with ElevenLabs API
+- [ ] async synthesis
+- [ ] map characters to voices
+- [ ] emotion enrichment: add intonation markers, auto-set TTS params
+- [x] generate good enough sound effects for background
+- [ ] mix effects with narration
+- [x] allow file upload (.txt)
+- optimizations
+    - [ ] combine sequential phrases of the same character into a single phrase
+    - [ ] support large texts via batching. problem: how to keep characters consistent across batches?
+      idea: detect characters in the first prompt, then split the text in each batch into character phrases
+    - [ ] probably split large phrases into smaller ones
+### Backlog
+- [ ] prepare text for TTS
+    - [x] prepare prompt to split text into character phrases
+    - [ ] split large text into batches, process each batch separately, concatenate the results
+    - [ ] try to identify unknown characters
+- [ ] select voices for TTS
+    - [ ] map characters to available voices
+    - [ ] use an LLM to recognize the characters in a given text and provide descriptions
+      detailed enough to select an appropriate voice
+- [ ] preprocess text phrases for TTS: add intonation markers, auto-set TTS params
+- [ ] run TTS to create narration
+- [ ] add effects. mix them with created narration
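
The "combine sequential phrases of same character" optimization from the action items could look roughly like the sketch below. This is an illustration only, not code from the repository: the `Phrase` dataclass, its field names, and `merge_consecutive` are hypothetical, and joining with a single space is an assumption about how merged text should read.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Phrase:
    # Hypothetical structure: the real pipeline may represent phrases differently.
    character: str
    text: str


def merge_consecutive(phrases: list[Phrase]) -> list[Phrase]:
    """Merge runs of adjacent phrases spoken by the same character.

    Only neighbouring phrases are merged; a character who speaks again
    later, after someone else, keeps a separate phrase.
    """
    merged = []
    # groupby yields one group per run of consecutive equal keys.
    for character, run in groupby(phrases, key=lambda p: p.character):
        merged.append(Phrase(character, " ".join(p.text for p in run)))
    return merged
```

Merging this way reduces the number of TTS requests without changing speaker order, which matters when each phrase is synthesized with a separate API call.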