StructBERT: Unofficial Copy

Official Repository Link: https://github.com/alibaba/AliceMind/tree/main/StructBERT

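Disclaimer: this is an unofficial copy of StructBERT. The official release lives in the AliceMind repository linked above.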

Reproducing the Hugging Face Hub models

Download the model config, tokenizer vocabulary, and weights:

wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/large_bert_config.json && mv large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model && mv en_model pytorch_model.bin
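
Then load the files with transformers and push them to the Hub:

from transformers import AutoModelForMaskedLM, AutoTokenizer, BertConfig

# The AliceMind file is a plain BERT config that may lack the "model_type" key
# AutoConfig relies on, so load it explicitly as a BertConfig.
config = BertConfig.from_json_file("./config.json")
# Weights are read from ./pytorch_model.bin.
model = AutoModelForMaskedLM.from_pretrained(".", config=config)
# The tokenizer is built from ./vocab.txt.
tokenizer = AutoTokenizer.from_pretrained(".", config=config)

# Pushing requires an authenticated Hugging Face account (e.g. `huggingface-cli login`).
model.push_to_hub("structbert-large")
tokenizer.push_to_hub("structbert-large")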

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Paper: https://arxiv.org/abs/1908.04577

Introduction

We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and of sentences, leveraging language structure at the word level and the sentence level, respectively.
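The two auxiliary tasks can be sketched as data-construction routines. In the paper, the word-level task shuffles a short span of tokens (a trigram) and asks the model to reconstruct the original order, and the sentence-level task asks whether a second sentence follows the first, precedes it, or comes from another document. The sketch below is a simplified illustration under those assumptions, not the authors' implementation: the function names, the single shuffled span per sequence, the label encoding, and the uniform choice among the three sentence cases are all hypothetical choices made here.

```python
import random
from typing import List, Optional, Tuple


def word_structural_example(
    tokens: List[str], span: int = 3, rng: Optional[random.Random] = None
) -> Tuple[List[str], List[int], List[str]]:
    """Shuffle one random span of `span` tokens (hypothetical helper).

    Returns the corrupted sequence, the shuffled positions, and the original
    tokens at those positions, which serve as reconstruction targets.
    """
    rng = rng or random.Random()
    if len(tokens) < span:
        return list(tokens), [], []  # too short to corrupt
    start = rng.randrange(len(tokens) - span + 1)
    positions = list(range(start, start + span))
    targets = tokens[start:start + span]
    shuffled = list(targets)
    rng.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + span:]
    return corrupted, positions, targets


# Sentence-level labels: a hypothetical encoding used only in this sketch.
NEXT, PREVIOUS, RANDOM = 0, 1, 2


def sentence_structural_example(
    doc: List[str], other_doc: List[str], rng: Optional[random.Random] = None
) -> Tuple[str, str, int]:
    """Build (s1, s2, label): s2 follows s1, precedes it, or is unrelated."""
    rng = rng or random.Random()
    i = rng.randrange(len(doc) - 1)  # doc needs at least two sentences
    case = rng.choice([NEXT, PREVIOUS, RANDOM])
    if case == NEXT:
        return doc[i], doc[i + 1], NEXT
    if case == PREVIOUS:
        return doc[i + 1], doc[i], PREVIOUS
    # Negative case: pair s1 with a sentence drawn from a different document.
    return doc[i], rng.choice(other_doc), RANDOM
```

For example, `word_structural_example("the cat sat on the mat".split(), rng=random.Random(0))` returns the sentence with one trigram permuted, together with the three original tokens the model would be trained to predict at those positions.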

Pre-trained models

| Model | Description | #params | Download |
| --- | --- | --- | --- |
| structbert.en.large | StructBERT using the BERT-large architecture | 340M | structbert.en.large |
| structroberta.en.large | StructRoBERTa, continued training from RoBERTa | 355M | Coming soon |
| structbert.ch.large | Chinese StructBERT; BERT-large architecture | 330M | structbert.ch.large |
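As a rough check on the #params column, the encoder size follows from the BERT-large shape (24 layers, hidden size 1024, feed-forward size 4096, 512 positions, 2 segment types). The sketch below counts the weights of a standard BERT encoder, including the pooler but excluding the masked-LM head. The vocabulary sizes are assumptions not stated in the table: 30,522 (the English uncased BERT vocabulary) and 21,128 (the Chinese BERT vocabulary).

```python
def bert_encoder_params(
    vocab_size: int,
    hidden: int = 1024,
    layers: int = 24,
    intermediate: int = 4096,
    max_positions: int = 512,
    type_vocab: int = 2,
) -> int:
    """Count parameters of a standard BERT encoder (embeddings + layers + pooler)."""

    def dense(n_in: int, n_out: int) -> int:
        return n_in * n_out + n_out  # weight matrix + bias

    layer_norm = 2 * hidden  # gain + bias

    # Word, position, and segment embeddings share one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Q, K, V projections, attention output projection, LayerNorm.
    attention = 3 * dense(hidden, hidden) + dense(hidden, hidden) + layer_norm
    # Two-layer feed-forward block with its LayerNorm.
    feed_forward = dense(hidden, intermediate) + dense(intermediate, hidden) + layer_norm
    pooler = dense(hidden, hidden)
    return embeddings + layers * (attention + feed_forward) + pooler


print(f"{bert_encoder_params(30522):,}")  # English, assumed vocabulary size
print(f"{bert_encoder_params(21128):,}")  # Chinese, assumed vocabulary size
```

Under these assumptions the counts come out at about 335M and 326M, close to the rounded 340M and 330M figures above; the remaining gap depends on what the original authors included in their totals.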

Results

The GLUE and CLUE results below can be reproduced with the hyperparameters listed in the "Example usage" section.

structbert.en.large

GLUE benchmark

| Model | MNLI | QNLIv2 | QQP | SST-2 | MRPC |
| --- | --- | --- | --- | --- | --- |
| structbert.en.large | 86.86% | 93.04% | 91.67% | 93.23% | 86.51% |

structbert.ch.large

CLUE benchmark

| Model | CMNLI | OCNLI | TNEWS | AFQMC |
| --- | --- | --- | --- | --- |
| structbert.ch.large | 84.47% | 81.28% | 68.67% | 76.11% |

Example usage

Requirements and Installation

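  • PyTorch version >= 1.0.1

  • Install the other libraries via:

pip install -r requirements.txt

  • For faster training, install NVIDIA's apex library.

The fine-tuning command uses paths such as config/vocab.txt that are relative to the StructBERT directory of the official repository, so run it from there.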

Fine-tune on MNLI

python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir path_to_glue_data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint path_to_pretrained_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir path_to_output_dir 
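To get a feel for the training budget these flags imply, the sketch below computes the effective batch size and the approximate number of optimizer updates over --num_train_epochs. It assumes a single training process, the standard MNLI training split of 392,702 examples, and that --train_batch_size is the per-step batch (some BERT fine-tuning scripts instead divide it by --gradient_accumulation_steps, in which case the effective batch simply equals --train_batch_size). Each epoch is rounded up to whole batches; the script's own rounding may differ. With --gradient_accumulation_steps 1, as above, both conventions agree.

```python
import math
from typing import Tuple


def training_budget(num_examples: int, train_batch_size: int,
                    grad_accum_steps: int, num_epochs: int) -> Tuple[int, int]:
    """Return (effective batch size, optimizer updates) for a single process.

    Assumes train_batch_size is the per-step batch before accumulation.
    """
    effective_batch = train_batch_size * grad_accum_steps
    batches_per_epoch = math.ceil(num_examples / train_batch_size)
    # One optimizer update per `grad_accum_steps` forward/backward batches.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return effective_batch, updates_per_epoch * num_epochs


MNLI_TRAIN_EXAMPLES = 392_702  # size of the GLUE MNLI training split

# Mirrors --train_batch_size 32, --gradient_accumulation_steps 1, --num_train_epochs 3.
effective_batch, total_updates = training_budget(MNLI_TRAIN_EXAMPLES, 32, 1, 3)
print(effective_batch, total_updates)
```

With the listed flags this gives an effective batch of 32 and roughly 36.8k optimizer updates, which is the horizon over which --learning_rate 2e-5 is scheduled.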

Citation

If you use our work, please cite:

@article{wang2019structbert,
  title={Structbert: Incorporating language structures into pre-training for deep language understanding},
  author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
  journal={arXiv preprint arXiv:1908.04577},
  year={2019}
}