navalnica
metadata
title: ai-audio-books
emoji: 📕
colorFrom: blue
colorTo: white
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false

Action items

  • move speaker split to new pipeline
  • env template
  • move from AI/ML api to langchain
  • bugfix w/ 11labs api
  • async synthesis
  • map characters to voices
  • [] emotion enrichment: add intonation markers, auto-set TTS params
  • generate good enough sound effects for background
  • mix effects with narration
  • allow file upload (.txt)
  • optimizations
    • combine sequential phrases of the same character into a single phrase
    • support large texts via batching. open problem: how to keep characters consistent across batches? one option: detect characters in the first prompt, then split the text of each batch into character phrases
    • probably split large phrases into smaller ones
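
The two phrase-level optimizations above (merging consecutive phrases of one character, and splitting phrases that are too long) could look roughly like the sketch below. The `Phrase` type, its field names, and the `max_chars` limit are assumptions made for illustration; they are not taken from the project code.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    # Hypothetical structure: who speaks, and what they say.
    character: str
    text: str


def merge_consecutive(phrases: list[Phrase]) -> list[Phrase]:
    """Join neighbouring phrases spoken by the same character."""
    merged: list[Phrase] = []
    for phrase in phrases:
        if merged and merged[-1].character == phrase.character:
            # Same speaker as the previous phrase: extend it.
            merged[-1] = Phrase(phrase.character, f"{merged[-1].text} {phrase.text}")
        else:
            merged.append(Phrase(phrase.character, phrase.text))
    return merged


def split_long(phrase: Phrase, max_chars: int = 200) -> list[Phrase]:
    """Split a phrase on word boundaries into pieces of at most max_chars.

    A single word longer than max_chars is kept whole as its own piece.
    """
    pieces: list[Phrase] = []
    current = ""
    for word in phrase.text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            pieces.append(Phrase(phrase.character, current))
            current = word
        else:
            current = candidate
    if current:
        pieces.append(Phrase(phrase.character, current))
    return pieces
```

Merging first and splitting afterwards keeps the number of TTS requests low while still bounding the length of each request; a sentence-aware splitter would likely sound better than the word-boundary split used here.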

Backlog

  • prepare text for TTS
    • prepare prompt to split text into character phrases
    • split large text into batches, process each batch separately, concatenate the results
    • try to identify unknown characters
  • select voices for TTS
    • map characters to available voices
    • use LLM to recognize characters for a given text and provide descriptions detailed enough to select appropriate voice
  • preprocess text phrases for TTS: add intonation markers, auto-set TTS params
  • run TTS to create narration
  • add effects. mix them with created narration
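
As a rough illustration of two backlog steps, the sketch below splits a text into batches at paragraph boundaries and maps characters to voices. All names here (`split_into_batches`, `assign_voices`, the `"narrator"` key, the voice ids) are hypothetical; a real mapping would presumably use the LLM-generated character descriptions rather than the simple round-robin shown.

```python
def split_into_batches(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into batches of at most max_chars where possible.

    A single paragraph longer than max_chars becomes its own batch.
    """
    batches: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            batches.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def assign_voices(
    characters: list[str], voices: list[str], narrator_voice: str
) -> dict[str, str]:
    """Assign a voice id to each character, reusing voices if they run out."""
    if not voices:
        raise ValueError("at least one voice is required")
    mapping = {"narrator": narrator_voice}
    others = [c for c in characters if c != "narrator"]
    for i, character in enumerate(others):
        # Round-robin placeholder for a description-based choice.
        mapping[character] = voices[i % len(voices)]
    return mapping
```

Building the character-to-voice mapping once, before the batches are processed, is one way to keep the same character on the same voice across batches.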