
Centurio Aya

Model Details

Model Description

  • Model type: Centurio is an open-source multilingual large vision-language model.
  • Training Data: COMING SOON
  • Languages: The model was trained with the following 100 languages: af, am, ar, ar-eg, as, azb, be, bg, bm, bn, bo, bs, ca, ceb, cs, cy, da, de, du, el, en, eo, es, et, eu, fa, fi, fr, ga, gd, gl, ha, hi, hr, ht, hu, id, ig, is, it, iw, ja, jv, ka, ki, kk, km, ko, la, lb, ln, lo, lt, lv, mi, mr, ms, mt, my, no, oc, pa, pl, pt, qu, ro, ru, sa, sc, sd, sg, sk, sl, sm, so, sq, sr, ss, sv, sw, ta, te, th, ti, tl, tn, tpi, tr, ts, tw, uk, ur, uz, vi, war, wo, xh, yo, zh, zu
  • License: This work is released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license.

Model Sources

Uses

Direct Use

The model can be used directly through the transformers library; because the repository ships custom code, it must be loaded with trust_remote_code=True.

from transformers import AutoModelForCausalLM, AutoProcessor
import timm
from PIL import Image    
import requests

url = "https://upload.wikimedia.org/wikipedia/commons/b/bd/Golden_Retriever_Dukedestiny01_drvd.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model_name = "WueNLP/centurio_aya"

processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Images in the prompt are indicated with '<image_placeholder>'!
prompt = "<image_placeholder>\nBriefly describe the image in German."

messages = [
    {"role": "user", "content": prompt}
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True
)

model_inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

Multiple Images

We natively support multi-image inputs. You only have to 1) include one <image_placeholder> per image in the prompt, and 2) pass all images of the entire batch as a flat list:

[...]
# Variables reused from above.

image_multi_1, image_multi_2 = [...] # prepare additional images

prompt_multi = "What is the difference between the following images?\n<image_placeholder><image_placeholder>\nAnswer in German."

messages_multi = [
    {"role": "user", "content": prompt_multi}
]

text_multi = processor.apply_chat_template(
    messages_multi,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = processor(text=[text, text_multi], images=[image, image_multi_1, image_multi_2], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)

[...]
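The decode step elided above works exactly as in the single-image case: every sequence returned by generate still begins with its prompt tokens, so those are sliced off per sample before batch_decode. The slicing itself is plain Python; a minimal sketch with stand-in token lists (hypothetical values, no model required):

```python
# Stand-in token-id lists (hypothetical values) mimicking
# model_inputs.input_ids and the output of model.generate:
input_ids = [[1, 2, 3], [1, 2, 3, 4]]                  # prompt tokens per sample
generated_ids = [[1, 2, 3, 10, 11], [1, 2, 3, 4, 20]]  # prompt + new tokens

# Drop the echoed prompt tokens, keeping only the newly generated ones.
trimmed = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
print(trimmed)  # [[10, 11], [20]]
```

With the real tensors, the trimmed sequences are then passed to processor.batch_decode as in the single-image example, yielding one response string per prompt in the batch.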

Bias, Risks, and Limitations

  • The general biases, risks, and limitations of large vision-language models, such as hallucinations and biases inherited from the training data, apply.
  • This is a research project and not recommended for production use.
  • Multilingual: Performance and generation quality can differ widely between languages.
  • OCR: The model struggles both with small text and with writing in non-Latin scripts.

Citation

BibTeX:

@article{centurio2025,
  title={TODO},
  author={TODO},
  year={2024},
  journal={arXiv preprint arXiv:TODO},
  url={TODO}
}