About
This repository is a boilerplate to push a mask-filling model to the HuggingFace Model Hub.
Upload to huggingface
Download your tokenizer, model checkpoints, and optionally the training logs (events.out.*) to the ./ckpt directory (do not include any large files except pytorch_model.bin and the log files events.out.*).
Optionally, test the model on the MLM task:
pip install pya0 # for math token preprocessing
# testing local checkpoints:
python test.py ./ckpt/math-tokenizer ./ckpt/2-2-0/encoder.ckpt
# testing Model Hub checkpoints:
python test.py approach0/coco-mae-220 approach0/coco-mae-220
Note
Modify the test examples in test.txt to experiment with the model. The test file is tab-separated: the first column lists additional positions you want to mask in the right-side sentence (useful for masking tokens in math markup). A zero means no additional mask positions.
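To make the test file layout concrete, here is a minimal sketch of how one line could be split into its mask positions and its sentence. It is not taken from test.py: the function name parse_test_line is hypothetical, and the sketch assumes that multiple positions are written as comma-separated integers, which the text above does not specify.

```python
def parse_test_line(line):
    """Split one tab-separated test line into (extra_mask_positions, sentence).

    Hypothetical helper, not part of test.py.
    """
    # The first column holds the additional mask positions; everything
    # after the first tab is kept as the sentence text.
    first, _, sentence = line.rstrip("\n").partition("\t")
    # Assumption: positions are comma-separated integers.
    positions = [int(p) for p in first.split(",") if p.strip()]
    # A lone zero means "no additional mask positions".
    if positions == [0]:
        positions = []
    return positions, sentence
```

For example, a line whose first column is 0 yields an empty position list, so only the [MASK] tokens already present in the sentence would be predicted.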
To upload to huggingface, use the upload2hgf.sh script.
Before running this script, be sure to check:
- git-lfs is installed
- a git remote named hgf exists and refers to https://huggingface.co/your/repo
- the model contains all the files needed: config.json and pytorch_model.bin
- the tokenizer contains all the files needed: added_tokens.json, special_tokens_map.json, tokenizer_config.json, vocab.txt and tokenizer.json
- there is no tokenizer_file field in tokenizer_config.json (sometimes it is located locally at ~/.cache)
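The file-related items in the checklist above can be verified before running the upload script. The following is a minimal sketch, not part of this repository: the function name check_upload_dirs and the idea of passing the model and tokenizer directories as arguments are assumptions, and the git-lfs and git remote checks are left out.

```python
import json
from pathlib import Path

# File names taken from the checklist above.
MODEL_FILES = ["config.json", "pytorch_model.bin"]
TOKENIZER_FILES = [
    "added_tokens.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
    "tokenizer.json",
]


def check_upload_dirs(model_dir, tokenizer_dir):
    """Return a list of problems; an empty list means the file checks passed.

    Hypothetical helper, not part of upload2hgf.sh.
    """
    problems = []
    for name in MODEL_FILES:
        if not (Path(model_dir) / name).is_file():
            problems.append(f"model is missing {name}")
    for name in TOKENIZER_FILES:
        if not (Path(tokenizer_dir) / name).is_file():
            problems.append(f"tokenizer is missing {name}")
    # The tokenizer config must not carry a tokenizer_file field.
    cfg_path = Path(tokenizer_dir) / "tokenizer_config.json"
    if cfg_path.is_file():
        cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
        if "tokenizer_file" in cfg:
            problems.append("tokenizer_config.json has a tokenizer_file field")
    return problems
```

Calling check_upload_dirs with the two directories you intend to upload and printing the returned list gives a quick summary of what still needs fixing.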