A unified multimodal understanding and generation model.
High-quality speech synthesis powered by Kokoro TTS.