---
base_model: CIDAS/clipseg-rd64
library_name: transformers.js
tags:
- vision
- image-segmentation
---

This repository contains https://huggingface.co/CIDAS/clipseg-rd64 with ONNX weights, for compatibility with Transformers.js.

## Usage (Transformers.js)

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@xenova/transformers) using:
```bash
npm i @xenova/transformers
```

**Example:** Perform zero-shot image segmentation with a `CLIPSegForImageSegmentation` model.

```js
import { AutoTokenizer, AutoProcessor, CLIPSegForImageSegmentation, RawImage } from '@xenova/transformers';

// Load tokenizer, processor, and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clipseg-rd64');
const processor = await AutoProcessor.from_pretrained('Xenova/clipseg-rd64');
const model = await CLIPSegForImageSegmentation.from_pretrained('Xenova/clipseg-rd64');

// Run tokenization
const texts = ['a glass', 'something to fill', 'wood', 'a jar'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Read image and run processor
const image = await RawImage.read('https://github.com/timojl/clipseg/blob/master/example_image.jpg?raw=true');
const image_inputs = await processor(image);

// Run model with both text and pixel inputs
const { logits } = await model({ ...text_inputs, ...image_inputs });
// logits: Tensor {
//   dims: [4, 352, 352],
//   type: 'float32',
//   data: Float32Array(495616)[ ... ],
//   size: 495616
// }
```
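
The `logits` tensor above stores all four predictions in one flat `Float32Array` of 4 × 352 × 352 = 495616 values. As a minimal sketch, assuming the buffer is laid out prompt by prompt in row-major order (as the `dims` suggest), the hypothetical helper `splitByPrompt` below shows how one prompt's values could be pulled out of that buffer. It is not part of Transformers.js, and a zero-filled array stands in for `logits.data`.

```js
// Hypothetical helper (not a Transformers.js API): split a flat [N, H, W]
// buffer into N views, one per text prompt. Assumes row-major layout.
function splitByPrompt(data, dims) {
  const [n, h, w] = dims;
  const plane = h * w; // number of values per prompt
  if (data.length !== n * plane) {
    throw new RangeError(`Expected ${n * plane} values, got ${data.length}`);
  }
  const views = [];
  for (let i = 0; i < n; ++i) {
    // subarray() creates a view on the same memory, so nothing is copied
    views.push(data.subarray(i * plane, (i + 1) * plane));
  }
  return views;
}

// Stand-in for `logits.data` and `logits.dims` from the example above
const demoData = new Float32Array(4 * 352 * 352);
const maps = splitByPrompt(demoData, [4, 352, 352]);
console.log(maps.length, maps[0].length); // 4 123904
```

With a real result, `splitByPrompt(logits.data, logits.dims)[0]` would correspond to the prompt `'a glass'`, the first entry of `texts`.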

You can visualize the predictions as follows:
```js
// Convert logits to 8-bit grayscale masks
// (methods ending in `_` modify `logits` in place)
const preds = logits
  .unsqueeze_(1)
  .sigmoid_()
  .mul_(255)
  .round_()
  .to('uint8');

for (let i = 0; i < preds.dims[0]; ++i) {
  const img = RawImage.fromTensor(preds[i]);
  await img.save(`prediction_${i}.png`);
}
```
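
To make the arithmetic in that chain concrete, here is a plain-JavaScript sketch of the same steps applied to a typed array: sigmoid, scale to the 0-255 range, round, and store as unsigned 8-bit values. The function name `logitsToGray` is hypothetical, and the library's own rounding and casting may differ in edge cases.

```js
// Hypothetical helper (not a Transformers.js API): map raw logits to
// 8-bit grayscale intensities, one value per pixel.
function logitsToGray(logits) {
  const gray = new Uint8Array(logits.length);
  for (let i = 0; i < logits.length; ++i) {
    const probability = 1 / (1 + Math.exp(-logits[i])); // sigmoid
    gray[i] = Math.round(probability * 255);            // scale and round
  }
  return gray;
}

// Strongly negative, neutral, and strongly positive logits
const gray = logitsToGray(Float32Array.from([-10, 0, 10]));
console.log(Array.from(gray)); // [ 0, 128, 255 ]
```

A logit of 0 corresponds to a probability of 0.5, so mid-gray pixels in the saved images mark regions where the model is undecided.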

| Original | `"a glass"` | `"something to fill"` | `"wood"` | `"a jar"` |
|--------|--------|--------|--------|--------|
| ![image](https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/B4wAIseP3SokRd7Flu1Y9.png) | ![prediction_0](https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/5NhLTZH2jM8k98n62dWRx.png) | ![prediction_1](https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9PjQB8QRE8IUN1OhEK30q.png) | ![prediction_2](https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/mAuKCId-iwMbKPi6ee3Zk.png) | ![prediction_3](https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/m0HP13L2lzalLsbXtrmbc.png) |

---

Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).