Encoder-Decoder model with DeBERTa encoder
Pre-trained models
Encoder:
microsoft/deberta-v3-small
Decoder:
deliciouscat/deberta-v3-base-decoder-v0.1
(6 transformer layers, 8 attention heads)
-> 297,511,524 (298M) params
Data used
HuggingFaceFW/fineweb
-> sampled 124,800 examples
Training hparams
optimizer: AdamW, lr=2.3e-5, betas=(0.875, 0.997)
batch size: 12 (the largest that fit on a Colab Pro A100 environment)
-> trained with a denoising objective (as in BART)
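To make the denoising objective concrete, here is a minimal sketch of BART-style text infilling: random token spans are each collapsed into a single mask token, and the model is trained to reconstruct the original sequence. This is not the noising code used for this checkpoint. The function name `text_infilling`, the `mask_ratio` and `max_span` values, and the uniform span lengths are all assumptions for illustration (BART itself samples span lengths from a Poisson distribution).

```python
import random


def text_infilling(tokens, mask_token="[MASK]", mask_ratio=0.3, max_span=3, seed=None):
    """Replace random token spans with one mask token each (simplified infilling).

    All defaults are illustrative assumptions, not the values used in training.
    """
    rng = random.Random(seed)
    budget = int(len(tokens) * mask_ratio)  # total number of tokens we may hide
    noisy = []
    i = 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            # Uniform span length here; BART draws it from Poisson(lambda=3).
            span = min(rng.randint(1, max_span), budget, len(tokens) - i)
            noisy.append(mask_token)  # the whole span collapses into one mask
            i += span
            budget -= span
        else:
            noisy.append(tokens[i])
            i += 1
    return noisy


source = "the quick brown fox jumps over the lazy dog".split()
noisy = text_infilling(source, seed=0)
# `noisy` would be the encoder input; `source` is the decoder target.
```

Because a span of several tokens becomes one mask, the decoder also has to infer how many tokens are missing, which is what distinguishes infilling from plain token masking.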
How to use
from transformers import AutoTokenizer, EncoderDecoderModel
model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
tokenizer = AutoTokenizer.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
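Once `model` and `tokenizer` are loaded as above, generation might look like the sketch below. The helper name `reconstruct` and the `max_new_tokens` default are hypothetical, and the sketch assumes the checkpoint's config already sets `decoder_start_token_id` and `pad_token_id`, which `EncoderDecoderModel.generate` needs. Since the model was trained only on denoising, the output should be read as a reconstruction of the input, not as a summary or an answer.

```python
def reconstruct(text, model, tokenizer, max_new_tokens=64):
    """Encode `text`, let the decoder generate, and return the decoded string.

    Assumes `model` is the EncoderDecoderModel and `tokenizer` the AutoTokenizer
    loaded above; `max_new_tokens=64` is an arbitrary illustrative limit.
    """
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    # generate() returns a batch of token ids; decode the first (only) sequence.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Example call (downloads the checkpoint, so it is left commented out):
# print(reconstruct("The encoder reads [MASK] and the decoder rewrites it.", model, tokenizer))
```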
Future work!
train on more scientific data
fine-tune on keyword extraction task