T5-for-information-extraction
This is an encoder-decoder model that was trained on various information extraction tasks, including text classification, named entity recognition, relation extraction and entity linking.
How to use:
First of all, initialize the model:
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
device = torch.device("cuda") if torch.cuda.is_available() else torch.device('cpu')
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained("knowledgator/t5-for-ie").to(device)
To run a task, pass the model a task prompt followed by the text. The examples below show the prompt for each task:
named entity recognition
input_text = "Extract entity types from the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
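The example above marks entities with numbered tags such as <e1>...</e1>. If you already have character offsets for the entities, a small helper along these lines could build the tagged text for you. This is only a sketch: `tag_entities` is a hypothetical name that is not part of the model or of transformers, and it assumes the tags are numbered in order of appearance, as in the example.

```python
def tag_entities(text, spans):
    """Wrap each (start, end) character span in numbered <eN>...</eN> tags."""
    pieces, cursor = [], 0
    # Number the spans by their order of appearance in the text.
    for index, (start, end) in enumerate(sorted(spans), start=1):
        if start < cursor or start >= end or end > len(text):
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        pieces.append(text[cursor:start])
        pieces.append(f"<e{index}>{text[start:end]}</e{index}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sentence = "Kyiv is the capital of Ukraine."
# Locate the entity mentions; in practice the offsets may come from another tool.
spans = [(sentence.index(w), sentence.index(w) + len(w)) for w in ("Kyiv", "Ukraine")]
tagged = tag_entities(sentence, spans)
print(tagged)
```

The resulting string can be appended to the task prompt in the same way as the hand-written input above.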
text classification
input_text = "Classify the following text into the most relevant categories: Kyiv is the capital of Ukraine"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
relation extraction
input_text = "Extract relations between entities in the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
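Since the three examples differ only in the prompt, the input can be built with one small function. The sketch below copies the prompt strings from the examples above; the task keys ("ner", "classification", "relation_extraction") and the name `build_input` are our own illustrative choices, not something the model requires.

```python
# Prompt prefixes taken from the examples above; the dictionary keys are hypothetical.
PROMPTS = {
    "ner": "Extract entity types from the text: ",
    "classification": "Classify the following text into the most relevant categories: ",
    "relation_extraction": "Extract relations between entities in the text: ",
}


def build_input(task, text):
    """Return the prompt for `task` followed by the text to analyze."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    return PROMPTS[task] + text.strip()


input_text = build_input("ner", "<e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>.")
print(input_text)
```

The returned string can then be tokenized and passed to model.generate exactly as in the examples above.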
Unlimited-classifier
With our unlimited-classifier you can use t5-for-ie to classify text into millions of categories. It applies constrained generation, which is very helpful when structured and deterministic outputs are needed.
To install it, run the following command:
pip install -U unlimited-classifier
Right now you can try it with the following example:
from unlimited_classifier import TextClassifier
labels = [
    "e1 - capital of Ukraine",
    "e1 - capital of Poland",
    "e1 - European city",
    "e1 - Asian city",
    "e1 - small country"
]
classifier = TextClassifier(
    labels=['default'],
    model=model,
    tokenizer=tokenizer,
    device=device  # if cuda
)
classifier.initialize_labels_trie(labels)
text = "<e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
output = classifier.invoke(text)
print(output)
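To give an intuition for how a labels trie can constrain generation, here is a minimal, library-independent sketch. It illustrates the general technique only and is not the unlimited-classifier implementation: the functions `build_trie` and `allowed_next` are hypothetical, and whitespace-separated words stand in for tokenizer ids. At each decoding step, only tokens that continue some valid label are allowed.

```python
END = "<end>"  # hypothetical marker for "a complete label ends here"


def build_trie(sequences):
    """Store token sequences in a nested-dict trie."""
    root = {}
    for seq in sequences:
        node = root
        for token in seq:
            node = node.setdefault(token, {})
        node[END] = {}
    return root


def allowed_next(trie, prefix):
    """Return the tokens that may follow `prefix` while staying on a valid label."""
    node = trie
    for token in prefix:
        node = node.get(token)
        if node is None:
            return []  # the prefix does not match any label
    return sorted(node)


labels = [
    "e1 - capital of Ukraine",
    "e1 - capital of Poland",
    "e1 - European city",
    "e1 - Asian city",
    "e1 - small country",
]
# Toy tokenization: split on whitespace instead of using a real tokenizer.
trie = build_trie(label.split() for label in labels)

print(allowed_next(trie, ["e1", "-", "capital", "of"]))
```

With a real tokenizer, the trie would hold token ids, and a function of this shape could be adapted to the `prefix_allowed_tokens_fn` argument of `generate` in transformers, which restricts the candidate tokens at each step.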
Turbo T5
We recommend using this model on a GPU with our TurboT5 package. It uses custom CUDA kernels that accelerate computation and allow much longer sequences.
First of all, you need to install the package:
pip install turbot5 -U
Then you can import different heads for various purposes. We released encoder heads for tasks such as token classification, question answering, and text classification, as well as encoder-decoder heads for conditional generation:
from turbot5 import T5ForConditionalGeneration
from turbot5 import T5Config
from transformers import T5Tokenizer
import torch
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained(
    "knowledgator/t5-for-ie",
    attention_type='flash',  # set the attention type you want to use
    use_triton=True
).to('cuda')
Feedback
We value your input! Share your feedback and suggestions to help us improve our models. Fill out the feedback form
Join Our Discord
Connect with our community on Discord for news, support, and discussion about our models. Join Discord