EmojiLM

This is a T5 model pre-trained on the Text2Emoji dataset to translate sentences into a series of emojis.

For instance, "I love pizza" will be translated into "🍕😍".

An example implementation for translation:

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and the model weights from the Hugging Face Hub.
path = "KomeijiForce/flan-t5-xl-emojilm"
tokenizer = T5Tokenizer.from_pretrained(path)
generator = T5ForConditionalGeneration.from_pretrained(path)

# Task prefix that is prepended to the input sentence.
prefix = "translate into emojis:"
sentence = "I love the weather in Alaska!"
inputs = tokenizer(prefix+" "+sentence, return_tensors="pt")
# Beam search combined with sampling, so the output can vary between runs.
generated_ids = generator.generate(inputs["input_ids"], num_beams=4, do_sample=True, max_length=100)
# Decode the first sequence and remove the spaces between emoji tokens.
decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True).replace(" ", "")
print(decoded)

You will probably get some output like "❄️🏔️❤️".
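
The example above samples during generation (do_sample=True), so repeated runs can return different emojis. The following is a minimal sketch of a reusable wrapper that translates several sentences in one call and exposes sampling as a flag. The helper names build_prompt, clean_output, and translate_batch are hypothetical and are not part of this model card or of the transformers library; the sketch assumes the tokenizer and generator objects loaded in the example above.

```python
PREFIX = "translate into emojis:"  # task prefix used in the example above


def build_prompt(sentence, prefix=PREFIX):
    """Join the task prefix and the whitespace-stripped sentence with one space."""
    return prefix + " " + sentence.strip()


def clean_output(decoded):
    """Remove the spaces that separate decoded emoji tokens."""
    return decoded.replace(" ", "")


def translate_batch(sentences, tokenizer, generator, do_sample=False, max_length=100):
    """Translate a list of sentences into emoji strings (hypothetical helper).

    `tokenizer` and `generator` are assumed to be the T5Tokenizer and
    T5ForConditionalGeneration objects loaded in the example above.
    """
    prompts = [build_prompt(s) for s in sentences]
    # padding=True pads shorter prompts so the batch fits into one tensor.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    generated_ids = generator.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # ignore the padded positions
        num_beams=4,
        do_sample=do_sample,
        max_length=max_length,
    )
    decoded = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
    return [clean_output(d) for d in decoded]
```

Calling translate_batch(["I love pizza", "I love the weather in Alaska!"], tokenizer, generator) returns one emoji string per input sentence. With do_sample=False the call falls back to plain beam search, which should give the same output for the same input; pass do_sample=True to reproduce the sampling behavior of the example above.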

If you find this model and dataset resource useful, please consider citing our paper:

@article{DBLP:journals/corr/abs-2311-01751,
  author       = {Letian Peng and
                  Zilong Wang and
                  Hang Liu and
                  Zihan Wang and
                  Jingbo Shang},
  title        = {EmojiLM: Modeling the New Emoji Language},
  journal      = {CoRR},
  volume       = {abs/2311.01751},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2311.01751},
  doi          = {10.48550/ARXIV.2311.01751},
  eprinttype    = {arXiv},
  eprint       = {2311.01751},
  timestamp    = {Tue, 07 Nov 2023 18:17:14 +0100},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2311-01751.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}