claudios/cubert-20210711-Java-1024


claudios committed on May 7, 2024

Commit

2270990


1 Parent(s): c75b047

Create README.md

Files changed (1)

README.md +44 -0

README.md ADDED

	@@ -0,0 +1,44 @@

+---
+license: apache-2.0
+arxiv: 2001.00059
+pipeline_tag: fill-mask
+tags:
+- code
+- cubert
+---
+# CuBERT: Learning and Evaluating Contextual Embedding of Source Code
+## Overview
+This is an unofficial Hugging Face port of "[CuBERT](https://github.com/google-research/google-research/tree/master/cubert)". This particular version comes from [gs://cubert/20210711_Java/pre_trained_model_epochs_2__length_1024](https://console.cloud.google.com/storage/browser/cubert/20210711_Java/pre_trained_model_epochs_2__length_1024). It was trained on 2021-07-11 for 2 epochs with a 1024-token context window on the Java BigQuery dataset. I manually converted the TensorFlow checkpoint to PyTorch and uploaded it here. The [tokenizer](https://github.com/google-research/google-research/blob/master/cubert/python_tokenizer.py) has not been converted yet, so inputs must be tokenized with the original CuBERT code. All credit goes to Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi.
+
+All converted versions are available here:
+- [cubert-20210711-Python-512](https://huggingface.co/claudios/cubert-20210711-Python-512/)
+- [cubert-20210711-Python-1024](https://huggingface.co/claudios/cubert-20210711-Python-1024/)
+- [cubert-20210711-Python-2048](https://huggingface.co/claudios/cubert-20210711-Python-2048/)
+- [cubert-20210711-Java-512](https://huggingface.co/claudios/cubert-20210711-Java-512/)
+- [cubert-20210711-Java-1024](https://huggingface.co/claudios/cubert-20210711-Java-1024/) (this model)
+- [cubert-20210711-Java-2048](https://huggingface.co/claudios/cubert-20210711-Java-2048/)
+
+Citation:
+```bibtex
+@inproceedings{cubert,
+author    = {Aditya Kanade and
+             Petros Maniatis and
+             Gogul Balakrishnan and
+             Kensen Shi},
+title     = {Learning and evaluating contextual embedding of source code},
+booktitle = {Proceedings of the 37th International Conference on Machine Learning,
+               {ICML} 2020, 12-18 July 2020},
+series    = {Proceedings of Machine Learning Research},
+publisher = {{PMLR}},
+year      = {2020},
+}
+```
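
The README states that the weights were converted to PyTorch, that the context window is 1024 tokens, and that the tokenizer has not been converted. A minimal sketch of what that implies for a caller is shown below: token ids have to be produced elsewhere (for example with the original CuBERT tokenizer) and then fitted to the 1024-token window by hand. The helper names `prepare_inputs` and `load_model` are hypothetical, `pad_id=0` is an assumption based on the usual BERT convention, and loading through `AutoModelForMaskedLM` is assumed to work because the model card declares `pipeline_tag: fill-mask`; none of this is confirmed by the README itself.

```python
from typing import Dict, List

MODEL_ID = "claudios/cubert-20210711-Java-1024"  # repo id taken from the README links
MAX_LEN = 1024  # context window stated in the README


def prepare_inputs(
    token_ids: List[int], max_len: int = MAX_LEN, pad_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Truncate or pad one sequence of token ids to a fixed window.

    pad_id=0 is an assumption (common BERT convention); check the
    checkpoint's vocabulary before relying on it. Truncation simply
    drops ids past the window, so any special tokens are the caller's
    responsibility.
    """
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)          # 1 marks real tokens
    padding = max_len - len(ids)
    ids += [pad_id] * padding
    mask += [0] * padding          # 0 marks padding positions
    return {"input_ids": [ids], "attention_mask": [mask]}


def load_model():
    """Load the converted weights (needs `transformers`, `torch`, and network access)."""
    # Imported lazily so prepare_inputs stays dependency-free.
    from transformers import AutoModelForMaskedLM

    return AutoModelForMaskedLM.from_pretrained(MODEL_ID)
```

With both pieces in place, the lists returned by `prepare_inputs` can be wrapped in `torch.tensor(...)` and passed to the loaded model as keyword arguments to obtain masked-token logits.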