---
tags:
- text2text-generation
- image-text-to-text
library_name: transformers.js
---

https://huggingface.co/microsoft/Florence-2-large-ft with ONNX weights to be compatible with Transformers.js.

## Usage (Transformers.js)

> [!IMPORTANT]
> NOTE: Florence-2 support is experimental and requires you to install Transformers.js [v3](https://github.com/xenova/transformers.js/tree/v3) from source.

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [GitHub](https://github.com/xenova/transformers.js/tree/v3) using:
```bash
npm install xenova/transformers.js#v3
```
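
To check that the source build was installed correctly, one quick smoke test (illustrative, not from the original card) is to print the library version via the `env` export:

```js
// Sanity check: the v3 branch should report a 3.x version string.
import { env } from '@xenova/transformers';
console.log(env.version);
```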

**Example:** Perform image captioning with `onnx-community/Florence-2-large-ft`.
```js
import {
  Florence2ForConditionalGeneration,
  AutoProcessor,
  AutoTokenizer,
  RawImage,
} from '@xenova/transformers';

// Load model, processor, and tokenizer
const model_id = 'onnx-community/Florence-2-large-ft';
const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
  dtype: {
    embed_tokens: 'fp16', // or 'fp32'
    vision_encoder: 'fp16', // or 'fp32'
    encoder_model: 'q4',
    decoder_model_merged: 'q4',
  },
});
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// Load image and prepare vision inputs
const url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Specify task and prepare text inputs
const task = '<MORE_DETAILED_CAPTION>';
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);

// Generate text
const generated_ids = await model.generate({
  ...text_inputs,
  ...vision_inputs,
  max_new_tokens: 256,
});

// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, { skip_special_tokens: false })[0];

// Post-process the generated text
const result = processor.post_process_generation(generated_text, task, image.size);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'A car is parked on the street. The car is a light green color. The doors on the building are brown. The building is a yellow color. There are two doors on both sides of the car. The wheels on the car are very shiny. The ground is made of bricks. The sky is blue. The sun is shining on the top of the building.' }
```
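
Florence-2 is prompt-based, so other tasks only require swapping the task token before re-running generation. Below is a hedged sketch (not from the original card) of object detection with the `<OD>` task on the same image, assuming the experimental processor post-processes `<OD>` output the way the original Florence-2 release does:

```js
// Illustrative only: reuses `model`, `processor`, `tokenizer`, `image`,
// and `vision_inputs` from the example above.
const od_task = '<OD>';
const od_inputs = tokenizer(processor.construct_prompts(od_task));

const od_ids = await model.generate({
  ...od_inputs,
  ...vision_inputs,
  max_new_tokens: 256,
});

const od_text = tokenizer.batch_decode(od_ids, { skip_special_tokens: false })[0];
const od_result = processor.post_process_generation(od_text, od_task, image.size);
console.log(od_result);
// Expected shape (assumption): { '<OD>': { bboxes: [[x1, y1, x2, y2], ...], labels: [...] } }
```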

We also released an online demo, which you can try yourself: https://huggingface.co/spaces/Xenova/florence2-webgpu

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/BJj3jQXNqS_7Nt2MSb2ss.mp4"></video>

---

Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
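
As a rough sketch of that conversion flow (not from the original card; the exact command depends on your Optimum version, models with custom modeling code need `--trust-remote-code`, and not every architecture exports out of the box):

```bash
# Install the ONNX export extras, then export a model to a local folder.
pip install "optimum[exporters]"
optimum-cli export onnx --model microsoft/Florence-2-large-ft --trust-remote-code florence2_onnx/
```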