
Benjamin Paine PRO

benjamin-paine

AI & ML interests

A software engineer with an AI habit

Recent Activity

updated a Space 2 days ago
benjamin-paine/anachrovox-v0.1-amber

Organizations

Taproot AI

benjamin-paine's activity

replied to their post 3 days ago

Hello again @JLouisBiz!

I've updated the Spaces; they now use Kokoro instead of XTTS, and it's drastically faster. Since the TTS is so much quicker, I also felt comfortable extending the output to 1024 tokens.

replied to their post 5 days ago

Hello! Output is currently clipped at 512 tokens, so yes, it won't be suitable for very long generations. It's also a very small model - Llama 3.2 3B - so it's definitely more for conversation and less for completing tasks.

I'm going to try to swap in Kokoro TTS, which should be faster on these small machines. Thanks for taking the time to test.
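For a sense of those tradeoffs, here's a minimal sketch of what such a generation cap and TTS choice might look like as configuration. The field names and model identifier here are illustrative stand-ins, not the actual Taproot schema:

```typescript
// Hypothetical sketch only: these field names are not the real
// Taproot/Anachrovox configuration schema.
interface AssistantConfig {
  model: string;            // a small conversational model
  maxOutputTokens: number;  // the hosted Spaces clip output here
  ttsBackend: "xtts" | "kokoro";
}

const hostedDefaults: AssistantConfig = {
  model: "llama-3.2-3b-instruct", // illustrative identifier
  maxOutputTokens: 512,
  ttsBackend: "kokoro",
};
```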

replied to their post 5 days ago

I'm sorry it's not working for you - can you make sure you've given the page permission to use your microphone, and that you're using the correct one if you have multiple? There should be an icon in the corner (in Chrome) you can click to select a microphone and check levels. Whenever I've had trouble activating it, it's always turned out I was using the wrong microphone or my input volume was turned way down.

[screenshot: Chrome's microphone icon in the address bar]

If you're using a browser other than Chrome, please let me know - I've tested it in others, but there could always be something I'm missing.

replied to their post 5 days ago

Regarding the indicators in the bottom right,

  • If the "recording" light (the top one) doesn't turn on, then it did not hear you utter a wake phrase.
  • If the "listening" light does turn on, it has detected voice activity, but unless you utter a wake phrase it will not send the recording for transcription and completion.

So in short, if you say "Hey Vox, what's the news?" and you don't see the recording light turn on, then it didn't catch the wake phrase and you have to try again.

If instead you just want to speak your command without relying on wake phrase recognition, you can just click the "Call" button - that will start recording immediately and always send the audio for transcription.

This project was the one that set me off on making the wake phrase model in the first place. At first I didn't have one and relied instead on voice activity detection and transcription; however, that performs extremely poorly in noisy environments or with any kind of muted speech, with near-constant accidental activation. The only efficient way to be always-on AND hands-free was to use a front-end wake-word model to gate the rest of the audio workflow.
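The gating described above can be sketched roughly like this. The class names and the trivial "detector" callbacks are stand-ins for illustration - they are not the actual wake-word library's API:

```typescript
// Illustrative sketch of a wake-word-gated audio pipeline.
// All names here are hypothetical, not the real taproot.js API.
type AudioFrame = Float32Array;

// Stage 1 stand-in: cheap voice-activity detection via an energy threshold.
// On its own this mis-fires constantly in noisy environments.
class EnergyVAD {
  constructor(private threshold = 0.01) {}
  isVoice(frame: AudioFrame): boolean {
    let energy = 0;
    for (const s of frame) energy += s * s;
    return energy / frame.length > this.threshold;
  }
}

// Stage 2: the wake-word gate. Audio only passes downstream (to
// transcription + completion) for a fixed window after a wake phrase
// is detected; otherwise the gate stays closed.
class WakeWordGate {
  private openFrames = 0;
  constructor(
    private detector: (frame: AudioFrame) => boolean, // wake-word model stand-in
    private windowFrames = 100,
  ) {}
  process(frame: AudioFrame): boolean {
    if (this.detector(frame)) this.openFrames = this.windowFrames;
    if (this.openFrames > 0) {
      this.openFrames--;
      return true;  // forward for transcription
    }
    return false;   // gate closed: no accidental activation
  }
}
```

The key design point is that the (cheap, always-on) gate sits in front of the expensive stages, so background chatter never reaches transcription unless a wake phrase opened the window.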

New activity in benjamin-paine/anachrovox-v0.1-azure 5 days ago

Discussion #1, "uninterrupted view", opened 5 days ago by prithivMLmods
posted an update 6 days ago
Hello HuggingFace 🤗, and happy new year! 🎆

I'm thrilled to be releasing the first iteration of a project I've been working on for quite a while now. It's called Taproot: a seamlessly scalable, open-source AI/ML inference engine designed to let developers build real-time experiences on a small-to-mid-sized cluster, without the burden of hyperscale infrastructure.

Along with the server and task framework is a client library for Node and the browser. And what good is a server and client without an app to go alongside them? To that end, I'm also releasing Anachrovox, a fun, real-time, hands-free voice assistant that can run on mid-level devices in under 12 GB of VRAM, with web search, weather, and other tools. It uses my real-time browser wake-word library to detect utterances of 'Hey Vox', 'Hi Vox', 'Okay Vox', 'Anachrovox', or just 'Vox' (alongside some others).

Releasing this many things at once will definitely result in bugs, so please report them when sighted! Thank you all!

Taproot: https://github.com/painebenjamin/taproot
Taproot JS Client: https://github.com/painebenjamin/taproot.js
Anachrovox: https://github.com/painebenjamin/anachrovox

The Anachrovox Spaces are networked together, balancing load across them to keep all front-ends responsive. You only have to choose what color you like the most!

https://huggingface.co/spaces/benjamin-paine/anachrovox
https://huggingface.co/spaces/benjamin-paine/anachrovox-amber
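A rough sketch of the kind of balancing this implies: a client picks whichever mirrored front-end reports the lightest load. The `activeSessions` field and the selection function are invented for illustration - the real Spaces coordinate load among themselves:

```typescript
// Hypothetical sketch of choosing among mirrored front-ends.
// The load-reporting shape here is illustrative, not a real API.
interface FrontEnd {
  url: string;
  activeSessions: number;
}

function pickLeastLoaded(frontEnds: FrontEnd[]): FrontEnd {
  if (frontEnds.length === 0) throw new Error("no front-ends available");
  // Linear scan for the front-end with the fewest active sessions.
  return frontEnds.reduce((best, fe) =>
    fe.activeSessions < best.activeSessions ? fe : best
  );
}
```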
  • 12 replies