---
license: mit
---

<a id="readme-top"></a>

[![arXiv][paper-shield]][paper-url]
[![MIT License][license-shield]][license-url]

<!-- PROJECT LOGO -->
<br />
<div align="center">
  <!-- <a href="https://github.com/othneildrew/Best-README-Template">
    <img src="images/logo.png" alt="Logo" width="80" height="80">
  </a> -->
  <h3 align="center">TinyClick: Single-Turn Agent for Empowering GUI Automation</h3>
  <p align="center">
    Code for running the model from the paper "TinyClick: Single-Turn Agent for Empowering GUI Automation".
  </p>
</div>

<!-- ABOUT THE PROJECT -->
## About The Project

We present a single-turn agent for graphical user interface (GUI) interaction tasks, built on the Vision-Language Model Florence-2-Base. The agent's main goal is to click on the desired UI element, given a screenshot and a user command. It demonstrates strong performance on Screenspot and OmniAct while maintaining a compact size of 0.27B parameters and minimal latency.

<!-- USAGE EXAMPLES -->
## Usage

```python
import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = AutoProcessor.from_pretrained(
    "Samsung/TinyClick", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Samsung/TinyClick",
    trust_remote_code=True,
).to(device)

# Download the sample screenshot.
url = "https://huggingface.co/Samsung/TinyClick/resolve/main/sample.png"
img = Image.open(requests.get(url, stream=True).raw)

command = "click on accept and continue button"
image_size = img.size

input_text = ("What to do to execute the command? " + command.strip()).lower()

inputs = processor(
    images=img,
    text=input_text,
    return_tensors="pt",
    do_resize=True,
).to(device)  # move the input tensors to the same device as the model

outputs = model.generate(**inputs)
# Keep special tokens: the location tokens are needed for postprocessing.
generated_texts = processor.batch_decode(outputs, skip_special_tokens=False)
```

For the postprocessing function, see our GitHub repository: https://github.sec.samsung.net/MLLM/tinyclick

```python
from tinyclick_utils import postprocess

result = postprocess(generated_texts[0], image_size)
```
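
If you only need the predicted click point, here is a minimal sketch of the decoding idea, assuming the model emits Florence-2-style `<loc_XXX>` tokens quantized to a 0-999 grid; the helper `extract_click_point` is hypothetical and not part of the released code, so prefer the repository's `postprocess` in practice.

```python
import re

def extract_click_point(generated_text: str, image_size: tuple):
    """Hypothetical helper: map Florence-2-style <loc_XXX> tokens
    (quantized to a 0-999 grid) back to pixel coordinates."""
    width, height = image_size
    locs = [int(m) for m in re.findall(r"<loc_(\d+)>", generated_text)]
    if len(locs) < 2:
        return None  # no coordinates found in the output
    # Treat the first two location tokens as the (x, y) of the click point.
    x = locs[0] / 999 * width
    y = locs[1] / 999 * height
    return round(x), round(y)

point = extract_click_point(generated_texts[0], image_size)
print(point)  # (x, y) in pixels, or None
```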

<!-- CITATION -->
## Citation

```bibtex
@misc{pawlowski2024tinyclicksingleturnagentempowering,
      title={TinyClick: Single-Turn Agent for Empowering GUI Automation},
      author={Pawel Pawlowski and Krystian Zawistowski and Wojciech Lapacz and Marcin Skorupa and Adam Wiacek and Sebastien Postansque and Jakub Hoscilowicz},
      year={2024},
      eprint={2410.11871},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2410.11871},
}
```

<!-- LICENSE -->
## License

This project is distributed under the MIT License. See `LICENSE` for more information.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

<!-- MARKDOWN LINKS & IMAGES -->
[paper-shield]: https://img.shields.io/badge/2024-arXiv-red
[paper-url]: https://arxiv.org/abs/2410.11871
[license-shield]: https://img.shields.io/badge/License-MIT-yellow.svg
[license-url]: https://opensource.org/licenses/MIT