ai-audio-books / readme.md
Aliaksandr
Update readme.md
84211a6 unverified
|
raw
history blame
991 Bytes

Action items

  • move speaker split to new pipeline
  • env template
  • move from AI/ML api to langchain
  • bugfix w/ 11labs api
  • async synthesis
  • map characters to voices
  • emotion enrichment: add intonation markers, auto-set TTS params
  • generate good enough sound effects for background
  • mix effects with narrration
  • allow files uplaod (.txt)

Backlog

  • prepare text for TTS
    • prepare prompt to split text into character phrases
    • split large text in batches, process each batch separatelly, concat batches
    • try to identify unknown characters
  • select voices for TTS
    • map characters to available voices
    • use LLM to recognize characters for a given text and provide descriptions detailed enough to select appropriate voice
  • preprocess text phrases for TTS: add intonation markers, auto-set TTS params
  • run TTS to create narration
  • add effects. mix them with created narration