Recent Activity

Xenova updated a Space about 1 year ago
nerfies/paper-template

nerfies's activity

Xenova posted an update 6 days ago
First project of 2025: Vision Transformer Explorer

I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! 🤯

Try it out yourself! 👇
webml-community/attention-visualization

Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization
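
For readers new to attention maps, here is a minimal sketch of how one row of such a map is typically computed with scaled dot-product attention. It is an illustration only, not code from the app: the function name `attentionWeights` and the toy vectors are hypothetical, and a real ViT computes this per head over learned patch embeddings.

```javascript
// Scaled dot-product attention weights for one query over a set of keys.
// Hypothetical helper for illustration; not taken from the demo's source.
function attentionWeights(query, keys) {
  const scale = Math.sqrt(query.length);
  // Dot product of the query with each key, divided by sqrt(dimension).
  const scores = keys.map(
    (key) => key.reduce((sum, k, i) => sum + k * query[i], 0) / scale
  );
  // Numerically stable softmax over the scores.
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Toy example: three "patches" with 2-dimensional keys.
const weights = attentionWeights([1, 0], [[1, 0], [0, 1], [1, 0]]);
console.log(weights);
```

Each returned weight says how strongly the query patch attends to the corresponding key patch; reshaping one such row onto the patch grid gives the kind of heatmap an attention visualizer displays.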

Xenova posted an update 20 days ago
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
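
The "WebGPU with WASM fallback" idea can be sketched as a simple feature check. This is an assumption about how such a fallback might be wired up, not the demo's actual code: `pickDevice` is a hypothetical helper, and the returned strings match the `device` values that Transformers.js v3 accepts.

```javascript
// Hypothetical helper: prefer WebGPU when the browser exposes it, else use WASM.
// `nav` defaults to the global navigator (undefined in some non-browser runtimes).
function pickDevice(nav = globalThis.navigator) {
  return nav && nav.gpu ? 'webgpu' : 'wasm';
}

console.log(pickDevice());
```

Note that the presence of `navigator.gpu` does not guarantee a usable adapter (`navigator.gpu.requestAdapter()` can resolve to `null`), so a production app may want to check that as well before committing to WebGPU.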

Xenova posted an update 30 days ago
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)

Xenova posted an update about 1 month ago
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from DeepSeek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU

Xenova posted an update about 2 months ago
Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔢 New quantization formats (dtypes)
🏛 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
🏡 A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3
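
To see why the choice of quantization format (dtype) matters in the browser, here is a rough back-of-the-envelope sketch of weight size per dtype. The helper `estimateWeightBytes` is hypothetical and only multiplies parameter count by nominal bits per weight; real quantized files also store scales and other metadata, so actual downloads will be somewhat larger.

```javascript
// Nominal bits per weight for a few dtype names used by Transformers.js v3.
const BITS_PER_WEIGHT = { fp32: 32, fp16: 16, q8: 8, q4: 4 };

// Hypothetical helper: approximate size of the raw weights in bytes.
function estimateWeightBytes(numParams, dtype) {
  const bits = BITS_PER_WEIGHT[dtype];
  if (bits === undefined) {
    throw new Error(`Unknown dtype: ${dtype}`);
  }
  return (numParams * bits) / 8;
}

// Example: a 500M-parameter model at each precision, in megabytes.
for (const dtype of Object.keys(BITS_PER_WEIGHT)) {
  const megabytes = estimateWeightBytes(500e6, dtype) / 1e6;
  console.log(`${dtype}: ~${megabytes} MB`);
}
```

In Transformers.js v3 the dtype is chosen when loading a model, for example by passing `{ dtype: 'q4', device: 'webgpu' }` as the options argument to `pipeline(...)`.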

Xenova posted an update 5 months ago
I can't believe this... Phi-3.5-mini (3.8B) running in-browser at ~90 tokens/second on WebGPU w/ Transformers.js and ONNX Runtime Web! 🤯 Since everything runs 100% locally, no messages are sent to a server, which is a huge win for privacy!
- 🤗 Demo: webml-community/phi-3.5-webgpu
- 🧑‍💻 Source code: https://github.com/huggingface/transformers.js-examples/tree/main/phi-3.5-webgpu

Xenova posted an update 5 months ago
I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️

Install it from NPM with:
npm i @huggingface/transformers

or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q

Segment Anything demo: webml-community/segment-anything-webgpu

Xenova posted an update 6 months ago
Introducing Whisper Diarization: Multilingual speech recognition with word-level timestamps and speaker segmentation, running 100% locally in your browser thanks to 🤗 Transformers.js!

Tested on this iconic Letterman interview w/ Grace Hopper from 1983!
- Demo: Xenova/whisper-speaker-diarization
- Source code: Xenova/whisper-speaker-diarization
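
Combining word-level timestamps with speaker segmentation amounts to labelling each word with the speaker whose segment contains it. The sketch below is one simple way to do that and is not taken from the demo: `assignSpeakers` is a hypothetical helper, the word shape `{ text, timestamp: [start, end] }` follows the chunk format Transformers.js returns for word-level timestamps, and the segment shape `{ label, start, end }` is an assumed format for the diarization output.

```javascript
// Hypothetical helper: tag each word with the speaker segment that
// contains the midpoint of the word's time span (or null if none does).
function assignSpeakers(words, segments) {
  return words.map((word) => {
    const [start, end] = word.timestamp;
    const midpoint = (start + end) / 2;
    const segment = segments.find(
      (s) => midpoint >= s.start && midpoint < s.end
    );
    return { ...word, speaker: segment ? segment.label : null };
  });
}

// Toy data (made up for illustration).
const words = [
  { text: ' Hello', timestamp: [0.0, 0.5] },
  { text: ' there', timestamp: [1.2, 1.6] },
  { text: ' bye', timestamp: [5.0, 5.4] },
];
const segments = [
  { label: 'SPEAKER_00', start: 0, end: 1 },
  { label: 'SPEAKER_01', start: 1, end: 2 },
];

const labelled = assignSpeakers(words, segments);
console.log(labelled);
```

Using the word midpoint keeps the assignment unambiguous when a word straddles a segment boundary; other policies (such as maximum overlap) are equally plausible.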

Xenova posted an update 6 months ago
Introducing Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running 100% locally in your browser thanks to 🤗 Transformers.js! Check it out!
👉 Xenova/whisper-word-level-timestamps 👈

This unlocks a world of possibilities for in-browser video editing! 🤯 What will you build? 😍

Source code: https://github.com/xenova/transformers.js/tree/v3/examples/whisper-word-timestamps

Xenova posted an update 6 months ago
Florence-2, the new vision foundation model by Microsoft, can now run 100% locally in your browser on WebGPU, thanks to Transformers.js! 🤗🤯

It supports tasks like image captioning, optical character recognition, object detection, and many more! 😍 WOW!
- Demo: Xenova/florence2-webgpu
- Models: https://huggingface.co/models?library=transformers.js&other=florence2
- Source code: https://github.com/xenova/transformers.js/tree/v3/examples/florence2-webgpu

Xenova posted an update 7 months ago
Introducing Whisper WebGPU: Blazingly-fast ML-powered speech recognition directly in your browser! 🚀 It supports multilingual transcription and translation across 100 languages! 🤯

The model runs locally, meaning no data leaves your device! 😍

Check it out! 👇
- Demo: Xenova/whisper-webgpu
- Source code: https://github.com/xenova/whisper-web/tree/experimental-webgpu

Xenova posted an update 8 months ago
Introducing Phi-3 WebGPU, a private and powerful AI chatbot that runs 100% locally in your browser, powered by 🤗 Transformers.js and onnxruntime-web!

🔒 On-device inference: no data sent to a server
⚡️ WebGPU-accelerated (> 20 t/s)
📥 Model downloaded once and cached

Try it out: Xenova/experimental-phi3-webgpu

Xenova posted an update 9 months ago
Introducing MusicGen Web: AI-powered music generation directly in your browser, built with 🤗 Transformers.js! 🎵

Everything runs 100% locally, meaning there are no calls to an API! 🤯 Since it's served as a static HF space, it costs $0 to host and run! 🔥

We also added the ability to share your generated music to the discussion tab, so give it a try! 👇
Xenova/musicgen-web

Xenova posted an update 10 months ago
Introducing the 🤗 Transformers.js WebGPU Embedding Benchmark! ⚡️
👉 Xenova/webgpu-embedding-benchmark 👈

On my device, I was able to achieve a 64.04x speedup over WASM! 🤯 How much does WebGPU speed up ML models running locally in your browser? Try it out and share your results! 🚀
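
A speedup figure like this is the ratio of the baseline time to the accelerated time. The sketch below shows one way such a number could be derived from repeated timings; it is an assumption about the method, not the benchmark's actual code, and `median` and `speedup` are hypothetical helpers with made-up sample timings.

```javascript
// Hypothetical helper: median of a non-empty list of timings (ms).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: how many times faster the accelerated runs are.
function speedup(baselineTimesMs, acceleratedTimesMs) {
  return median(baselineTimesMs) / median(acceleratedTimesMs);
}

// Made-up timings for illustration only.
const wasmTimesMs = [6400, 6410, 6404];
const webgpuTimesMs = [101, 100, 99];
console.log(`${speedup(wasmTimesMs, webgpuTimesMs).toFixed(2)}x`);
```

Taking the median over several runs makes the ratio less sensitive to one-off stalls such as shader compilation on the first WebGPU run.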

Xenova posted an update 10 months ago
Real-time object detection w/ 🤗 Transformers.js, running YOLOv9 locally in your browser! 🤯

Try it out yourself: Xenova/video-object-detection
(Model used + example code: Xenova/gelan-c_all)

This demo shows why on-device ML is so important:
1. Privacy - local inference means no user data is sent to the cloud
2. No server latency - empowers developers to build real-time applications
3. Lower costs - no need to pay for bandwidth and processing of streamed video

I can't wait to see what you build with it! 🔥

Xenova posted an update 11 months ago
Introducing Remove Background Web: In-browser background removal, powered by @briaai's new RMBG-v1.4 model and 🤗 Transformers.js!

Everything runs 100% locally, meaning none of your images are uploaded to a server! 🤯 At only ~45MB, the 8-bit quantized version of the model is perfect for in-browser usage (it even works on mobile).

Check it out! 👇
Demo: Xenova/remove-background-web
Model: briaai/RMBG-1.4

Xenova posted an update 12 months ago
Last week, we released 🤗 Transformers.js v2.14, which added support for SAM (Segment Anything Model).

This means you can now generate high-quality segmentation masks for objects in a scene, directly in your browser! 🤯

Demo (+ source code): Xenova/segment-anything-web
Model: Xenova/slimsam-77-uniform

But how does this differ from Meta's original demo? 🤔 Didn't that also run in-browser?

Well, in their demo, the image embeddings are computed server-side, then sent to the client for decoding. Trying to do this all client-side would be completely impractical: taking minutes per image! 😵‍💫

That's where SlimSAM comes to the rescue! SlimSAM is a novel SAM compression method, able to shrink the model over 100x (637M → 5.5M params), while still achieving remarkable results!

The best part? You can get started in a few lines of JavaScript code, thanks to Transformers.js! 🔥

// npm i @xenova/transformers
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';

// Load model and processor
const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');

// Prepare image and input points
const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
const input_points = [[[340, 250]]];

// Process inputs and perform mask generation
const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);

// Post-process masks
const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);

// Visualize the mask
const image = RawImage.fromTensor(masks[0][0].mul(255));
await image.save('mask.png'); // save() is async, so wait for the file to be written


I can't wait to see what you build with it! 🤗