Anachrovox V0.1 Amber (Bugged)
Hands-Free AI Voice Chat with a Retro Vibe
Hello again @JLouisBiz!
I've updated the Spaces, and they now use Kokoro instead of XTTS. It's drastically faster. Additionally, since the TTS is so much faster, I felt comfortable extending the output to 1024 tokens.
Hello! It's currently clipped at 512 tokens for output, so yes, it won't be suitable for very long generation. It's also a very tiny model - Llama 3.2 3B - so it's definitely more for conversation and less for completing tasks.
I'm going to try swapping in Kokoro TTS, which should be faster on these small machines. Thanks for taking the time to test.
I'm sorry that it's not working for you - can you make sure you've given it permission to use your microphone and that you're using the correct one (if you have multiple)? In Chrome, there should be an icon in the corner that you can click to select a microphone and check levels. Whenever I've had trouble activating it, I've always found I was using the wrong microphone or my input volume was far too low.
If you're using a browser other than Chrome, please let me know. I've tested it in others, but there could always be something I'm missing.
Regarding the indicators in the bottom right,
So in short, if you say "Hex Vox, what's the news?" and you don't see the recording light turn on, then it didn't catch the wake phrase and you have to try again.
If instead you just want to speak your command without relying on wake phrase recognition, you can just click the "Call" button - that will start recording immediately and always send the audio for transcription.
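To make the two ways of starting a recording concrete, here is a minimal sketch of the trigger logic. It is not the project's actual code: `VoiceSession`, `WAKE_THRESHOLD`, and the score values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real wake-word model's cutoff is not stated here.
WAKE_THRESHOLD = 0.5


@dataclass
class VoiceSession:
    # Mirrors the recording light: True means audio is being captured.
    recording: bool = False

    def on_wake_score(self, score: float) -> bool:
        """Start recording only if the wake-phrase score clears the threshold."""
        if not self.recording and score >= WAKE_THRESHOLD:
            self.recording = True
        return self.recording

    def on_call_button(self) -> bool:
        """The Call button bypasses wake-phrase recognition entirely."""
        self.recording = True
        return self.recording


# A weak wake-phrase match leaves the light off, so you would have to try again.
missed = VoiceSession()
missed.on_wake_score(0.2)
print(missed.recording)  # False

# Clicking Call starts recording immediately, whatever the wake score was.
called = VoiceSession()
called.on_call_button()
print(called.recording)  # True
```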
This project was the one that set me off on making the wake phrase model in the first place. At first I didn't have it and relied instead on voice activity detection and transcription. However, that approach performs extremely poorly in noisy environments or with any kind of muted speech, with near-constant accidental activation. The only efficient way to be always-on AND hands-free was to use a front-end wake-word model to gate the rest of the audio workflow.
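The difference between the two approaches can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation used here: a plain energy threshold stands in for voice activity detection, and the per-frame energies and wake scores are made-up values.

```python
from typing import List, Tuple

# Each frame is (energy, wake_score); both are hypothetical values in 0..1.
Frame = Tuple[float, float]


def vad_only_triggers(frames: List[Frame], energy_threshold: float = 0.3) -> int:
    """Count activations when any loud segment starts a transcription."""
    triggers = 0
    active = False
    for energy, _ in frames:
        loud = energy >= energy_threshold
        if loud and not active:
            triggers += 1  # a new loud segment begins
        active = loud
    return triggers


def wake_gated_triggers(
    frames: List[Frame],
    wake_threshold: float = 0.5,
    energy_threshold: float = 0.3,
) -> int:
    """Count activations when only a wake-word hit opens the gate."""
    triggers = 0
    active = False
    for energy, wake_score in frames:
        if not active and wake_score >= wake_threshold:
            triggers += 1  # wake word detected, gate opens
            active = True
        elif active and energy < energy_threshold:
            active = False  # silence closes the gate
    return triggers


frames = [
    (0.6, 0.05),  # door slam
    (0.1, 0.00),
    (0.5, 0.10),  # background chatter
    (0.1, 0.00),
    (0.7, 0.90),  # wake phrase
    (0.6, 0.10),  # spoken command
    (0.1, 0.00),  # silence
]
print(vad_only_triggers(frames))    # 3
print(wake_gated_triggers(frames))  # 1
```

With these made-up frames, the energy-only detector fires on every loud noise, while the gated version sends audio onward only once, after the wake phrase.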
You're very welcome! Just so it's clear, the code is licensed under Apache, and the wake-word models are licensed under CC-BY-4.0 (to match the licenses of the audio they were trained on). More info on the models here: https://huggingface.co/benjamin-paine/anachrovox