Model description

This model is DistilBERT with additional custom layers, fine-tuned to classify financial documents. It was trained on the following labels:

  • ESG/Sustainability Report
  • Annual Report
  • Quarterly Report
  • Financial Report
  • Other Document

The output is a vector of probabilities, one per class. The output indices map to labels as follows:

  • 0 -> ESG/Sustainability Report
  • 1 -> Annual Report
  • 2 -> Other Document
  • 3 -> Quarterly Report
  • 4 -> Financial Report

Example use

Download model:

from huggingface_hub import from_pretrained_keras
from transformers import DistilBertTokenizer

model_name = "esg-x/distilbert-esg-documents-classifier"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = from_pretrained_keras(model_name)
model.compile()

Get model output:

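input_text = "Your input text"

# Tokenize, padding or truncating to the model's 512-token context size.
inputs = tokenizer(input_text,
                   return_tensors="tf",
                   padding="max_length",
                   truncation=True,
                   max_length=512)

# The model takes the token ids and returns class probabilities.
output = model(inputs["input_ids"])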

Convert output to a readable label:

import numpy as np

labels = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report"
}

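def get_label(probabilities):
    # Pick the index of the highest probability and look up its label.
    # Assumes the output holds a single document (batch size 1).
    return labels[int(np.argmax(probabilities))]

get_label(output)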
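
Sometimes the full ranking is more useful than the top label alone, for example when two classes score closely. The following is a minimal sketch in plain Python, not part of the original model card. The helper name rank_labels and the example probabilities are hypothetical, and it assumes you have already extracted one row of five probabilities for a single document from the model output.

```python
# Id-to-label mapping as listed in this model card.
LABELS = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report",
}


def rank_labels(probabilities):
    """Return (label, probability) pairs sorted from most to least likely.

    `probabilities` is assumed to be one row of five values for one document.
    """
    pairs = [(LABELS[i], float(p)) for i, p in enumerate(probabilities)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical probabilities for a single document (not real model output).
example = [0.05, 0.70, 0.10, 0.10, 0.05]
ranking = rank_labels(example)
for label, probability in ranking:
    print(f"{label}: {probability:.2f}")
```

The first entry of the returned list is the predicted label; the remaining entries show how close the alternatives were.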

Limitations

The model's maximum context size is 512 tokens, so longer inputs have to be truncated or split before classification.
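
One possible way to handle documents longer than 512 tokens is to split the token ids into windows, classify each window, and average the resulting probabilities. The sketch below illustrates that idea in plain Python. It is an assumption about how one might work around the limit, not the authors' method. The function names are hypothetical, and the special-token ids 101 and 102 are placeholders; in practice you would likely pass tokenizer.cls_token_id and tokenizer.sep_token_id.

```python
def split_into_windows(token_ids, cls_id, sep_id, max_length=512):
    """Split token ids (without special tokens) into windows that fit the model.

    Each window is wrapped in the classifier and separator tokens, so the
    usable body per window is max_length - 2 tokens.
    """
    body = max_length - 2
    if body <= 0:
        raise ValueError("max_length must be greater than 2")
    windows = []
    for start in range(0, len(token_ids), body):
        chunk = list(token_ids[start:start + body])
        windows.append([cls_id] + chunk + [sep_id])
    return windows


def average_probabilities(per_window):
    """Average class probabilities column-wise across windows."""
    count = len(per_window)
    return [sum(column) / count for column in zip(*per_window)]


# Hypothetical token ids for a long document (1200 tokens, no special tokens).
long_document_ids = list(range(1000, 2200))
windows = split_into_windows(long_document_ids, cls_id=101, sep_id=102)
print([len(window) for window in windows])

# Hypothetical per-window probabilities for a two-class toy case.
averaged = average_probabilities([[0.2, 0.8], [0.4, 0.6]])
print(averaged)
```

The last window is usually shorter than 512 tokens and would still need padding before it is passed to the model. Averaging is only one aggregation choice; taking the maximum per class or classifying just the first window are alternatives.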
