	---
	title: ai-audio-books
	emoji: 📕👨‍💻🎧
	colorFrom: blue
	colorTo: gray
	sdk: gradio
	sdk_version: 4.44.1
	app_file: app.py
	pinned: false
	---

	### Action items
	- [ ] move speaker split to new pipeline
	- [ ] env template
	- [ ] move from AI/ML api to langchain
	- [ ] bugfix w/ 11labs api
	- [ ] async synthesis
	- [ ] map characters to voices
	- [ ] emotion enrichment: add intonation markers, auto-set TTS params
	- [x] generate good enough sound effects for background
	- [ ] mix effects with narration
	- [x] allow file upload (.txt)
	- optimizations
	  - [ ] combine sequential phrases of the same character into a single phrase
	  - [ ] support large texts with batching. problem: how to keep characters consistent across batches?
	    could detect characters in the first prompt, then split the text of each batch into character phrases
	  - [ ] probably split large phrases into smaller ones
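
The "combine sequential phrases of same character" optimization could look roughly like the sketch below. It assumes phrases are held as `(character, text)` pairs; that representation and the name `merge_consecutive_phrases` are hypothetical and not taken from the project's code.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical representation: (character name, phrase text).
Phrase = Tuple[str, str]


def merge_consecutive_phrases(phrases: List[Phrase]) -> List[Phrase]:
    """Join adjacent phrases spoken by the same character into one phrase."""
    merged: List[Phrase] = []
    # groupby only groups *adjacent* items with the same key, so phrases
    # from the same character that are separated by another speaker stay apart.
    for character, group in groupby(phrases, key=lambda phrase: phrase[0]):
        merged.append((character, " ".join(text for _, text in group)))
    return merged
```

Merging before synthesis would mean fewer TTS requests per character, at the cost of longer individual phrases, which is where the "split large phrases" item comes in.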

	### Backlog
	- [ ] prepare text for TTS
	- [x] prepare prompt to split text into character phrases
	- [ ] split large text into batches, process each batch separately, concat batches
	- [ ] try to identify unknown characters
	- [ ] select voices for TTS
	- [ ] map characters to available voices
	- [ ] use LLM to recognize characters for a given text and provide descriptions
	detailed enough to select appropriate voice
	- [ ] preprocess text phrases for TTS: add intonation markers, auto-set TTS params
	- [ ] run TTS to create narration
	- [ ] add effects. mix them with created narration
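
One possible shape for the batching item, as a minimal sketch: pack paragraphs into batches, then pass the same character list to every batch so the names stay consistent. The function names, the `max_chars` limit, and the `split_batch` callback (standing in for the LLM call that splits a batch into character phrases) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (character name, phrase text).
Phrase = Tuple[str, str]


def split_into_batches(text: str, max_chars: int = 2000) -> List[str]:
    """Greedily pack paragraphs into batches of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized batch.
    """
    batches: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current batch.
            batches.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def process_batches(
    text: str,
    characters: List[str],
    split_batch: Callable[[str, List[str]], List[Phrase]],
    max_chars: int = 2000,
) -> List[Phrase]:
    """Split each batch with the same known character list, then concatenate."""
    phrases: List[Phrase] = []
    for batch in split_into_batches(text, max_chars):
        phrases.extend(split_batch(batch, characters))
    return phrases
```

Here `characters` would come from a first pass over the text, and `split_batch` would wrap whichever prompt is used to split a batch into character phrases.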