---
license: apache-2.0
datasets:
- yuntian-deng/im2latex-100k
metrics:
- bleu
- cer
pipeline_tag: image-to-text
tags:
- vision
- nougat
base_model:
- facebook/nougat-small
---
# Nougat for formula

<!-- Provide a quick summary of what the model is/does. -->

We fine-tuned the [small-sized Nougat model](https://huggingface.co/facebook/nougat-small) on data
from [IM2LATEX-100K](https://www.kaggle.com/datasets/shahrukhkhan/im2latex100k) to specialize it in
recognizing formulas in images.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

Nougat for formula recognizes formulas in images. It takes an image of a formula written in black on a white background
as input and returns the corresponding LaTeX code.

The Nougat model (Neural Optical Understanding for Academic Documents) was proposed by Meta AI in August 2023 as
a visual Transformer model for processing scientific documents. It converts PDF documents into a markup language
and is particularly good at recognizing mathematical expressions and tables. The goal of the model is to improve the accessibility
of scientific knowledge by bridging human-readable documents and machine-readable text.

- **Model type:** Vision Encoder Decoder
- **Finetuned from model:** [Nougat model, small-sized version](https://huggingface.co/facebook/nougat-small)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Nougat for formula can be used as a tool for converting complicated formulas to LaTeX code. It has the potential to be
a good substitute for other tools.

For example, when you are taking notes and tired of typing long LaTeX/Markdown formula code, just take a screenshot
of the formula and feed it to Nougat for formula. You then get the code for the formula, as long as the output does not exceed
the maximum length you set for the model.

You can also continue fine-tuning the model to make it better at recognizing formulas from specific subjects.

Nougat for formula may also be useful when developing tools or apps that generate LaTeX code.

## How to Get Started with the Model

The demo below shows how to feed an image to the model and generate LaTeX/Markdown formula code.

``` python
from transformers import NougatProcessor, VisionEncoderDecoderModel
from PIL import Image

max_length = 100  # maximum length of the generated output, in tokens
processor = NougatProcessor.from_pretrained(r".", max_length=max_length)  # Replace with your path
model = VisionEncoderDecoderModel.from_pretrained(r".")  # Replace with your path

image = Image.open(r"image_path")  # Replace with your path
# The processor resizes the image to match the model's expected input
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate the token id tensor, forbidding the unknown token in the output
result_tensor = model.generate(
    pixel_values,
    max_length=max_length,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

# Decode the ids to text, then apply Nougat's post-processing
result = processor.batch_decode(result_tensor, skip_special_tokens=True)
result = processor.post_process_generation(result, fix_markdown=False)

print(*result)
```

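Because generation stops at `max_length`, a long formula can come back cut off. The following is a minimal sketch of one way to detect this, assuming the model emits its end-of-sequence token whenever it finishes on its own. `is_truncated` is a hypothetical helper written for this card, not part of `transformers`.

```python
def is_truncated(token_ids, eos_token_id):
    """Return True if a generated sequence never reached end-of-sequence.

    Assumption: a sequence that finishes normally contains the EOS token
    (possibly followed by padding in a batch), while a sequence cut off
    at max_length does not.
    """
    return eos_token_id not in token_ids


# Usage with the demo above (needs the loaded processor and result_tensor):
# ids = result_tensor[0].tolist()
# if is_truncated(ids, processor.tokenizer.eos_token_id):
#     print("Output hit max_length; consider raising it.")
```

If the check fires, raising `max_length` in both the processor and the `generate` call is the first thing to try.
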
## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[IM2LATEX-100K](https://www.kaggle.com/datasets/shahrukhkhan/im2latex100k)

#### Preprocessing

The preprocessing of X (the image) is shown in the short demo above.

The preprocessing of Y (the formula) consists of two steps:

1. Remove the spaces in the formula string.
2. Tokenize the string with `processor`.

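The two label-preprocessing steps could look roughly like the sketch below. The function names are hypothetical, and padding and truncating to a fixed `max_length` is an assumption, not a setting stated in this card. The tokenizer is passed in as an argument so the snippet runs without downloading the model.

```python
def remove_spaces(formula):
    """Step 1: drop every space character from the LaTeX string."""
    return formula.replace(" ", "")


def encode_formula(formula, tokenizer, max_length=100):
    """Step 2: tokenize the space-free string into label ids.

    `tokenizer` is expected to behave like `processor.tokenizer`.
    Padding and truncation to `max_length` are assumptions.
    """
    encoded = tokenizer(
        remove_spaces(formula),
        max_length=max_length,
        padding="max_length",
        truncation=True,
    )
    return encoded["input_ids"]


# Usage (needs the processor from the demo above):
# labels = encode_formula(r"\frac { a } { b }", processor.tokenizer)
print(remove_spaces(r"\frac { a } { b }"))  # \frac{a}{b}
```
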
#### Training Hyperparameters

- **Training regime:** `torch.optim.AdamW(model.parameters(), lr=1e-4)` <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

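For readers who want to continue fine-tuning, a single optimization step with this optimizer might be sketched as follows. Only the optimizer and learning rate come from this card. `train_step` and `train_loader` are hypothetical names, and batch size, epochs and scheduling are not specified here.

```python
def train_step(model, optimizer, pixel_values, labels):
    """Run one optimization step and return the loss as a float."""
    optimizer.zero_grad()
    # Hugging Face encoder-decoder models return an output with `.loss`
    # when `labels` are passed to the forward call.
    loss = model(pixel_values=pixel_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage (assumes torch is installed and `model` is loaded as in the demo above):
# import torch
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# model.train()
# for pixel_values, labels in train_loader:  # hypothetical DataLoader
#     print(train_step(model, optimizer, pixel_values, labels))
```
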
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->
The testing data is also taken from [IM2LATEX-100K](https://www.kaggle.com/datasets/shahrukhkhan/im2latex100k).
Note that the dataset is already split into train, validation and test sets when downloaded.

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

BLEU and CER (character error rate).

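To make the CER figure easier to interpret, here is a small pure-Python sketch of the standard definition: the Levenshtein edit distance between prediction and reference, divided by the reference length. This card does not state how its score was computed or aggregated over the test set, so treat this as an illustration, not the evaluation script.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character in an 11-character reference
print(cer(r"\frac{a}{b}", r"\frac{a}{c}"))
```

A lower CER is better, and 0 means the predicted string matches the reference exactly.
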
### Results

On the test data, the model reaches a BLEU of 0.8157 and a CER of 0.1601.