---
title: ai-audio-books
emoji: 📕
colorFrom: blue
colorTo: white
sdk: gradio
sdk_version: "4.44.1"
app_file: app.py
pinned: false
---

### Action items
- [ ] move speaker split to new pipeline
- [ ] env template
- [ ] move from AI/ML api to langchain
- [ ] bugfix w/ 11labs api
- [ ] async synthesis
- [ ] map characters to voices
- [ ] emotion enrichment: add intonation markers, auto-set TTS params
- [x] generate good enough sound effects for background
- [ ] mix effects with narration
- [x] allow file upload (.txt)
- optimizations
    - [ ] combine sequential phrases of the same character into a single phrase
    - [ ] support large texts via batching. problem: how to keep characters consistent across batches? idea: detect characters in the first prompt, then split the text of each batch into character phrases
    - [ ] probably split large phrases into smaller ones
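
The "combine sequential phrases" optimization above could look roughly like the sketch below. It assumes phrases are already available as `(character, text)` pairs; the `Phrase` class and `merge_consecutive` function are hypothetical names for illustration, not part of the current code.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Phrase:
    """Hypothetical container for one spoken phrase (an assumption)."""
    character: str
    text: str


def merge_consecutive(phrases: list[Phrase]) -> list[Phrase]:
    """Join adjacent phrases spoken by the same character into one phrase."""
    merged = []
    # groupby only groups *adjacent* items, so the speaker order is preserved.
    for character, group in groupby(phrases, key=lambda p: p.character):
        text = " ".join(p.text.strip() for p in group)
        merged.append(Phrase(character, text))
    return merged
```

Merging before synthesis would reduce the number of TTS calls, at the cost of longer phrases, which is why the "split large phrases" item may still be needed afterwards.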

### Backlog
- [ ] prepare text for TTS
    - [x] prepare prompt to split text into character phrases
    - [ ] split large text into batches, process each batch separately, concat batches
    - [ ] try to identify unknown characters
- [ ] select voices for TTS
    - [ ] map characters to available voices
    - [ ] use LLM to recognize characters in a given text and provide descriptions detailed enough to select an appropriate voice
- [ ] preprocess text phrases for TTS: add intonation markers, auto-set TTS params
- [ ] run TTS to create narration
- [ ] add effects. mix them with created narration
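
A minimal sketch of the batching idea from the backlog: split the text on paragraph boundaries, process each batch with the same list of known characters, and concatenate the results. `split_into_batches`, `process_in_batches`, the `split_fn` callback (standing in for the LLM call) and the 4000-character default are all assumptions for illustration.

```python
from typing import Callable


def split_into_batches(text: str, max_chars: int = 4000) -> list[str]:
    """Greedily pack whole paragraphs into batches of at most max_chars.

    A single paragraph longer than max_chars is kept as its own batch.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    batches: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            batches.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def process_in_batches(
    text: str,
    split_fn: Callable[[str, list[str]], list[tuple[str, str]]],
    known_characters: list[str],
    max_chars: int = 4000,
) -> list[tuple[str, str]]:
    """Run split_fn on every batch with the same character list, then concat."""
    phrases: list[tuple[str, str]] = []
    for batch in split_into_batches(text, max_chars):
        # Passing known_characters to every call is one way to keep
        # character names consistent across batches.
        phrases.extend(split_fn(batch, known_characters))
    return phrases
```

Splitting on paragraphs rather than at a fixed character offset avoids cutting a phrase in half, though dialogue that spans a batch boundary may still need special handling.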