|
--- |
|
license: mit |
|
base_model: microsoft/Florence-2-base |
|
--- |
|
|
|
<a id="readme-top"></a> |
|
|
|
[![arXiv][paper-shield]][paper-url] |
|
[![MIT License][license-shield]][license-url] |
|
|
|
<!-- PROJECT LOGO --> |
|
<br /> |
|
<div align="center"> |
|
|
<h3 align="center">TinyClick: Single-Turn Agent for Empowering GUI Automation</h3> |
|
<p align="center"> |
|
Code for running the model from the paper: TinyClick: Single-Turn Agent for Empowering GUI Automation
|
</p> |
|
</div> |
|
|
|
|
|
<!-- ABOUT THE PROJECT --> |
|
## About The Project |
|
|
|
We present a single-turn agent for graphical user interface (GUI) interaction tasks, built on the Vision-Language Model Florence-2-Base. The agent's main goal is to click on the desired UI element based on a screenshot and a user command. It demonstrates strong performance on ScreenSpot and OmniACT while maintaining a compact size of 0.27B parameters and minimal latency.
|
|
|
|
|
<!-- USAGE EXAMPLES --> |
|
## Usage |
|
To set up the environment for running the code, please refer to the [GitHub repository](https://github.com/SamsungLabs/TinyClick). All necessary libraries and dependencies are listed in the `requirements.txt` file.
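As a quick sanity check before running the example, you can confirm that the core dependencies are importable and whether a GPU is visible (illustrative only; `requirements.txt` in the repository remains the authoritative dependency list):

```python
# Minimal environment check; requirements.txt in the TinyClick repository
# is the authoritative dependency list.
import torch
import transformers

print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```

The example below then runs the model end to end on a sample screenshot: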
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import requests
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Both the processor and the model ship custom code, hence trust_remote_code.
processor = AutoProcessor.from_pretrained(
    "Samsung/TinyClick", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Samsung/TinyClick",
    trust_remote_code=True,
).to(device)

# Download the sample screenshot bundled with the model repository.
url = "https://huggingface.co/Samsung/TinyClick/resolve/main/sample.png"
img = Image.open(requests.get(url, stream=True).raw)

command = "click on accept and continue button"
image_size = img.size  # needed later to map the prediction back to pixels

# Build the prompt the model was trained on: a fixed question plus the
# lower-cased user command.
input_text = ("What to do to execute the command? " + command.strip()).lower()

inputs = processor(
    images=img,
    text=input_text,
    return_tensors="pt",
    do_resize=True,
).to(device)  # move the input tensors to the same device as the model

outputs = model.generate(**inputs)
# Keep special tokens: the location tokens in the output are parsed by the
# postprocessing step below.
generated_texts = processor.batch_decode(outputs, skip_special_tokens=False)
```
|
|
|
For the postprocessing function, which maps the generated location tokens back to pixel coordinates on the original screenshot, see our GitHub repository: https://github.com/SamsungLabs/TinyClick

```python
from tinyclick_utils import postprocess

# Parse the raw generated text into a structured action, using the original
# image size to recover pixel coordinates.
result = postprocess(generated_texts[0], image_size)
```
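As a hypothetical follow-up, the predicted action could be dispatched to the desktop, for example with `pyautogui`. The `click_point` key and pixel-coordinate format below are assumptions for illustration; check `tinyclick_utils` in the repository for the exact structure of `result`.

```python
# Illustrative only: assumes `result` exposes the predicted click point as
# pixel coordinates under a "click_point" key (hypothetical; verify against
# tinyclick_utils). pyautogui is not a TinyClick dependency.
import pyautogui

x, y = result["click_point"]
pyautogui.click(x, y)
```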
|
|
|
<!-- CITATION --> |
|
## Citation |
|
|
|
``` |
|
@misc{pawlowski2024tinyclicksingleturnagentempowering, |
|
title={TinyClick: Single-Turn Agent for Empowering GUI Automation}, |
|
author={Pawel Pawlowski and Krystian Zawistowski and Wojciech Lapacz and Marcin Skorupa and Adam Wiacek and Sebastien Postansque and Jakub Hoscilowicz}, |
|
year={2024}, |
|
eprint={2410.11871}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.HC}, |
|
url={https://arxiv.org/abs/2410.11871}, |
|
} |
|
``` |
|
|
|
|
|
<!-- LICENSE --> |
|
## License |
|
|
|
Distributed under the MIT License. See `LICENSE` for more information.
|
|
|
<p align="right">(<a href="#readme-top">back to top</a>)</p> |
|
|
|
|
|
<!-- MARKDOWN LINKS & IMAGES --> |
|
[paper-shield]: https://img.shields.io/badge/2024-arXiv-red |
|
[paper-url]: https://arxiv.org/abs/2410.11871 |
|
[license-shield]: https://img.shields.io/badge/License-MIT-yellow.svg |
|
[license-url]: https://opensource.org/licenses/MIT |
|
|