ales committed on
Commit fccc76e · verified
1 Parent(s): bcc9601

Create README.md

README.md ADDED
@@ -0,0 +1,41 @@
+ ---
+ title: ai-audio-books
+ emoji: 📕👨‍💻🎧
+ colorFrom: blue
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 4.44.1
+ app_file: app.py
+ pinned: false
+ ---
+
+ ### Action items
+ - [ ] move speaker split to new pipeline
+ - [ ] env template
+ - [ ] move from AI/ML API to LangChain
+ - [ ] fix bug with ElevenLabs API
+ - [ ] async synthesis
+ - [ ] map characters to voices
+ - [ ] emotion enrichment: add intonation markers, auto-set TTS params
+ - [x] generate good-enough sound effects for background
+ - [ ] mix effects with narration
+ - [x] allow file upload (.txt)
+ - optimizations
+   - [ ] combine sequential phrases of the same character into a single phrase
+   - [ ] support large texts using batching. Problem: how to ensure the same characters across batches?
+     One option: detect characters in the first prompt, then split the text in each batch into character phrases
+   - [ ] probably split large phrases into smaller ones
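The "combine sequential phrases" optimization could look roughly like the sketch below. The `Phrase` dataclass and its field names are hypothetical; the repository's own data structures are not shown in this commit, so treat this as one possible shape rather than the project's implementation.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Phrase:
    # Hypothetical structure: one chunk of text attributed to one speaker.
    character: str
    text: str


def merge_consecutive(phrases: list[Phrase]) -> list[Phrase]:
    """Join runs of adjacent phrases spoken by the same character."""
    merged: list[Phrase] = []
    # groupby only groups *adjacent* items with the same key, which is
    # what is needed here: non-adjacent phrases stay separate.
    for character, run in groupby(phrases, key=lambda p: p.character):
        merged.append(Phrase(character, " ".join(p.text for p in run)))
    return merged
```

Merging this way would reduce the number of TTS calls without changing the order of speakers.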
+
+ ### Backlog
+ - [ ] prepare text for TTS
+   - [x] prepare prompt to split text into character phrases
+   - [ ] split large text into batches, process each batch separately, concatenate the results
+   - [ ] try to identify unknown characters
+ - [ ] select voices for TTS
+   - [ ] map characters to available voices
+   - [ ] use an LLM to recognize the characters in a given text and provide descriptions
+     detailed enough to select an appropriate voice
+ - [ ] preprocess text phrases for TTS: add intonation markers, auto-set TTS params
+ - [ ] run TTS to create narration
+ - [ ] add effects and mix them with the created narration
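The "split large text into batches, process each batch, concatenate" item might be sketched as follows. This assumes batches are cut at blank-line paragraph boundaries and limited by character count; `max_chars` and the `process_batch` callback are illustrative placeholders, not names from the project.

```python
from typing import Callable


def split_into_batches(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack paragraphs into batches of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized batch.
    """
    batches: list[str] = []
    current: list[str] = []
    size = 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # +2 accounts for the blank-line separator added when joining.
        added = len(paragraph) + (2 if current else 0)
        if current and size + added > max_chars:
            batches.append("\n\n".join(current))
            current, size = [paragraph], len(paragraph)
        else:
            current.append(paragraph)
            size += added
    if current:
        batches.append("\n\n".join(current))
    return batches


def process_in_batches(
    text: str,
    process_batch: Callable[[str], list],
    max_chars: int = 2000,
) -> list:
    """Run a (hypothetical) per-batch step and concatenate the results in order."""
    results: list = []
    for batch in split_into_batches(text, max_chars):
        results.extend(process_batch(batch))
    return results
```

In practice `process_batch` would be the LLM call that splits a batch into character phrases; passing the character list detected in the first batch into later calls is one way to keep speakers consistent.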
+