---
datasets:
- HuggingFaceFW/fineweb
language:
- en
---

# Encoder-Decoder model with DeBERTa encoder

## Pre-trained models

- Encoder: `microsoft/deberta-v3-small`
- Decoder: `deliciouscat/deberta-v3-base-decoder-v0.1` (6 transformer layers, 8 attention heads) -> 297,511,524 (~298M) params

## Data used

`HuggingFaceFW/fineweb` -> 124,800 sampled examples

## Training hparams

- optimizer: AdamW, lr=2.3e-5, betas=(0.875, 0.997)
- batch size: 12 (the maximum that fits on a Colab Pro A100)
- objective: BART-style denoising (see the single-step training sketch at the end of this card)

## How to use

```
from transformers import AutoTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
tokenizer = AutoTokenizer.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
```

## Future work!

- train on more scientific data
- fine-tune on a keyword extraction task
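
## Example: denoising generation

Since the checkpoint was trained with a BART-style denoising objective, a quick sanity check is to corrupt a sentence and let the model reconstruct it. This is a minimal sketch, not the documented setup: the mask-token corruption, the `decoder_start_token_id` choice, and the generation settings are illustrative assumptions.

```
from transformers import AutoTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
tokenizer = AutoTokenizer.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")

# Corrupt an input sentence with the tokenizer's mask token (assumed noising scheme)
text = f"The quick brown fox {tokenizer.mask_token} over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

# Generate a denoised reconstruction; decoder_start_token_id is assumed to be
# [CLS] in case the saved config does not define it, and the beam settings are
# illustrative defaults
outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    decoder_start_token_id=tokenizer.cls_token_id,
    max_length=32,
    num_beams=4,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```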
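
## Appendix: one denoising training step

For readers who want to continue training with the hyperparameters listed above, here is a minimal single-step sketch under stated assumptions: the noising function (masking one random word) stands in for BART's span infilling, and the special-token settings are guesses in case the saved config does not provide them.

```
import random
from torch.optim import AdamW
from transformers import AutoTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
tokenizer = AutoTokenizer.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")

# Label shifting inside EncoderDecoderModel requires these; the values are assumptions
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Optimizer with the hyperparameters from the card
optimizer = AdamW(model.parameters(), lr=2.3e-5, betas=(0.875, 0.997))

def corrupt(text):
    # Stand-in noising: mask one random word (BART proper uses span infilling)
    words = text.split()
    words[random.randrange(len(words))] = tokenizer.mask_token
    return " ".join(words)

clean = ["Encoder-decoder models learn to reconstruct the original text."]
inputs = tokenizer([corrupt(t) for t in clean], return_tensors="pt", padding=True)
labels = tokenizer(clean, return_tensors="pt", padding=True).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```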