TinyClick: Single-Turn Agent for Empowering GUI Automation

Code for running the model from the paper "TinyClick: Single-Turn Agent for Empowering GUI Automation" (arXiv:2410.11871).

About The Project

We present a single-turn agent for graphical user interface (GUI) interaction tasks, built on the vision-language model Florence-2-Base. The agent's main goal is to click on the desired UI element, given a screenshot and a user command. It demonstrates strong performance on the ScreenSpot and OmniAct benchmarks while keeping a compact size of 0.27B parameters and low latency.

Usage

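import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Run on GPU when available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# trust_remote_code=True lets transformers load the model's custom (Florence-2-based) code
processor = AutoProcessor.from_pretrained(
    "Samsung/TinyClick", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Samsung/TinyClick",
    trust_remote_code=True,
).to(device)

# Load the sample screenshot; convert to RGB in case the PNG has an alpha channel
url = "https://huggingface.co/Samsung/TinyClick/resolve/main/sample.png"
img = Image.open(requests.get(url, stream=True).raw).convert("RGB")

command = "click on accept and continue button"
image_size = img.size  # (width, height), passed to postprocessing below

# Prompt: fixed question followed by the command, all lowercased
input_text = ("What to do to execute the command? " + command.strip()).lower()

inputs = processor(
    images=img,
    text=input_text,
    return_tensors="pt",
    do_resize=True,
).to(device)  # move the input tensors to the same device as the model

outputs = model.generate(**inputs)
# Special tokens are kept in the decoded text (skip_special_tokens=False) for postprocessing
generated_texts = processor.batch_decode(outputs, skip_special_tokens=False)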
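
If you run several commands, the prompt construction above can be wrapped in a small helper. This is only a convenience sketch: `build_prompt` and `build_prompts` are hypothetical names, not part of the TinyClick code, and they reproduce exactly the prefix, whitespace stripping, and lowercasing used in the snippet above.

```python
from typing import List

# Same fixed question as in the Usage snippet
PROMPT_PREFIX = "What to do to execute the command? "


def build_prompt(command: str) -> str:
    """Hypothetical helper: build the TinyClick prompt for one command.

    Strips surrounding whitespace from the command, prepends the fixed
    question, then lowercases the whole string.
    """
    return (PROMPT_PREFIX + command.strip()).lower()


def build_prompts(commands: List[str]) -> List[str]:
    """Build one prompt per command, preserving the input order."""
    return [build_prompt(c) for c in commands]


if __name__ == "__main__":
    for prompt in build_prompts(["  Click on Accept and Continue button ", "Open Settings"]):
        print(prompt)
```

Each resulting string can be passed as `text=` to the processor call shown above.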

The postprocessing function (postprocess in tinyclick_utils) is available in our GitHub repository: https://github.sec.samsung.net/MLLM/tinyclick

from tinyclick_utils import postprocess

# image_size is the (width, height) tuple computed from the screenshot above
result = postprocess(generated_texts[0], image_size)
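
The repository's `postprocess` is the reference implementation. For intuition only, here is one way such a step could work, under an assumption not checked against TinyClick's actual output: that the model emits Florence-2-style location tokens `<loc_N>` (N from 0 to 999) for the x and y coordinates, quantized into 1000 bins along the image width and height. `parse_click_point` is a hypothetical name; use the official `postprocess` for real results.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: Florence-2-style location tokens such as "<loc_512>", with
# coordinates quantized into 1000 bins along each image axis.
LOC_TOKEN = re.compile(r"<loc_(\d+)>")
NUM_BINS = 1000


def dequantize(bin_index: int, size: int, num_bins: int = NUM_BINS) -> float:
    """Map a bin index back to a pixel coordinate at the centre of that bin."""
    return (bin_index + 0.5) * size / num_bins


def parse_click_point(
    generated_text: str, image_size: Tuple[int, int]
) -> Optional[Tuple[float, float]]:
    """Hypothetical parser: return (x, y) in pixels, or None if no point is found.

    Takes the first two location tokens as x and y; any other text or
    special tokens in the decoded output are ignored.
    """
    bins = [int(m) for m in LOC_TOKEN.findall(generated_text)]
    if len(bins) < 2:
        return None
    width, height = image_size  # PIL's img.size is (width, height)
    return dequantize(bins[0], width), dequantize(bins[1], height)


if __name__ == "__main__":
    text = "</s><s>click <loc_500><loc_250></s>"
    print(parse_click_point(text, (1000, 800)))
```

Mapping each bin to its centre, rather than its lower edge, keeps the dequantization error within half a bin on either side.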

Citation

@misc{pawlowski2024tinyclicksingleturnagentempowering,
    title={TinyClick: Single-Turn Agent for Empowering GUI Automation}, 
    author={Pawel Pawlowski and Krystian Zawistowski and Wojciech Lapacz and Marcin Skorupa and Adam Wiacek and Sebastien Postansque and Jakub Hoscilowicz},
    year={2024},
    eprint={2410.11871},
    archivePrefix={arXiv},
    primaryClass={cs.HC},
    url={https://arxiv.org/abs/2410.11871}, 
}

License

Distributed under the MIT License. See LICENSE for more information.
