---
language:
- code
license: apache-2.0
widget:
- text: public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else {return "odd";}}
---

# Model Card for JavaBERT

A BERT-like model pretrained on Java software code.

# Model Details

## Model Description

A BERT-like model pretrained on Java software code.

- **Developed by:** Christian-Albrechts-University of Kiel (CAUKiel)
- **Shared by [Optional]:** Hugging Face
- **Model type:** Fill-Mask
- **Language(s) (NLP):** en
- **License:** Apache-2.0
- **Related Models:** A version of this model using an uncased tokenizer is available at [CAUKiel/JavaBERT-uncased](https://huggingface.co/CAUKiel/JavaBERT-uncased).
- **Parent Model:** BERT
- **Resources for more information:**
  - [Associated Paper](https://arxiv.org/pdf/2110.10404.pdf)

# Uses

## Direct Use

Fill-Mask

## Downstream Use [Optional]

More information needed.

## Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

# Training Details

## Training Data

The model was trained on 2,998,345 Java files retrieved from open source projects on GitHub. The model uses a `bert-base-cased` tokenizer.

## Training Procedure

### Training Objective

An MLM (masked language modeling) objective was used to train this model.

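To make the objective concrete, here is a minimal sketch of how masked language modeling corrupts an input before the model is asked to recover it. The card only states that an MLM objective was used; the 15% selection rate and the 80/10/10 replacement split below follow the original BERT recipe and are assumptions, not confirmed details of JavaBERT's training. The helper name `mask_tokens` is hypothetical, and the pre-split token list stands in for real WordPiece subword tokens.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted, labels) for one MLM training example.

    labels[i] holds the original token at positions selected for prediction
    and None elsewhere, so a loss would be computed only on selected positions.
    The 15% / 80-10-10 scheme is the standard BERT recipe (an assumption here).
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels


# Illustrative input: a Java signature split into tokens by hand.
java_tokens = ["public", "String", "isOdd", "(", "Integer", "num", ")", "{"]
corrupted, labels = mask_tokens(java_tokens, vocab=java_tokens, mask_prob=0.5, seed=0)
print(corrupted)
print(labels)
```

A higher `mask_prob` is used in the usage lines only so that a short input is likely to show some masked positions; during training the model would see the corrupted sequence and be scored on the positions where `labels` is not `None`.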
### Preprocessing

More information needed.

### Speeds, Sizes, Times

More information needed.

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

More information needed.

### Factors

### Metrics

More information needed.

## Results

More information needed.

# Model Examination

More information needed.

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** More information needed.
- **Hours used:** More information needed.
- **Cloud Provider:** More information needed.
- **Compute Region:** More information needed.
- **Carbon Emitted:** More information needed.

# Technical Specifications [optional]

## Model Architecture and Objective

More information needed.

## Compute Infrastructure

More information needed.

### Hardware

More information needed.

### Software

More information needed.

# Citation

**BibTeX:**

```
@inproceedings{De_Sousa_Hasselbring_2021,
  address={Melbourne, Australia},
  title={JavaBERT: Training a Transformer-Based Model for the Java Programming Language},
  rights={https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html},
  ISBN={9781665435833},
  url={https://ieeexplore.ieee.org/document/9680322/},
  DOI={10.1109/ASEW52652.2021.00028},
  booktitle={2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)},
  publisher={IEEE},
  author={Tavares de Sousa, Nelson and Hasselbring, Wilhelm},
  year={2021},
  month=nov,
  pages={90–95}
}
```

**APA:**

More information needed.

# Glossary [optional]

More information needed.

# More Information [optional]

More information needed.

# Model Card Authors [optional]

Christian-Albrechts-University of Kiel (CAUKiel) in collaboration with Ezi Ozoani and the team at Hugging Face

# Model Card Contact

More information needed.

# How to Get Started with the Model

Use the code below to get started with the model.

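```python
from transformers import pipeline

pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')

# Replace with your own Java code; use '[MASK]' to mask tokens/words in the code.
CODE = 'public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else {return "odd";}}'
output = pipe(CODE)  # list of candidate fills for the masked position
```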
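
For an input with a single `[MASK]`, the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below shows one way to pull out the highest-scoring candidates. The helper name `best_fills` is hypothetical, and the sample list holds made-up placeholder values shaped like pipeline output, not real JavaBERT predictions.

```python
def best_fills(predictions, top_k=3):
    """Return the top_k (token_str, score) pairs from single-mask fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 4)) for p in ranked[:top_k]]


# Placeholder data only: same shape as pipeline output, values are invented.
sample_output = [
    {"score": 0.3, "token": 1, "token_str": "tok_a", "sequence": "placeholder a"},
    {"score": 0.6, "token": 2, "token_str": "tok_b", "sequence": "placeholder b"},
    {"score": 0.1, "token": 3, "token_str": "tok_c", "sequence": "placeholder c"},
]

for token_str, score in best_fills(sample_output, top_k=2):
    print(token_str, score)
```

In practice the list returned by the pipeline call above would be passed in place of `sample_output`. Inputs with more than one `[MASK]` return one such list per mask, so the helper would need to be applied to each inner list.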