---
license: apache-2.0
language:
- ko
metrics:
- cer
- wer
pipeline_tag: image-to-text
---
# trOCR-youtube-kor-OCR
A Korean OCR model fine-tuned as a `VisionEncoderDecoderModel` (encoder + decoder):

- encoder: `facebook/deit-base-distilled-patch16-384`
- decoder: `klue/roberta-base`
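
Such a pair is typically assembled with `VisionEncoderDecoderModel.from_encoder_decoder_pretrained`. The following is a minimal sketch of that setup, not the exact fine-tuning script used for this checkpoint:

```python
from transformers import VisionEncoderDecoderModel, AutoTokenizer

# Combine the vision encoder and the Korean language-model decoder
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/deit-base-distilled-patch16-384",  # vision encoder
    "klue/roberta-base",                         # Korean text decoder
)
tokenizer = AutoTokenizer.from_pretrained("klue/roberta-base")

# Copy generation-related special token ids from the decoder tokenizer
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.vocab_size = model.config.decoder.vocab_size
```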
## How to Get Started with the Model
```python
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel, AutoTokenizer, TrOCRProcessor

device = torch.device("cuda")  # change to "cpu" if no GPU is available

# The input image can be .jpg or .png
# Hugging Face download: https://huggingface.co/gg4ever/trOCR-final
image_path = "(your image path)"
image = Image.open(image_path).convert("RGB")

# Image processor from the base TrOCR checkpoint; model and tokenizer from this repo
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
trocr_model = "gg4ever/trOCR-youtube-kor-OCR"
model = VisionEncoderDecoderModel.from_pretrained(trocr_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(trocr_model)

# Preprocess the image, generate token ids, and decode the recognized text
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
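
The card reports CER and WER. A minimal sketch of scoring the model's output against a reference transcription with the `evaluate` library (the reference string below is a placeholder, not data from this card):

```python
import evaluate

cer = evaluate.load("cer")
wer = evaluate.load("wer")

predictions = [generated_text]                 # output from the snippet above
references = ["(ground-truth transcription)"]  # placeholder reference text

print("CER:", cer.compute(predictions=predictions, references=references))
print("WER:", wer.compute(predictions=predictions, references=references))
```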
## Training Details
### Training Data
- 100k word images generated with TextRecognitionDataGenerator (trdg): https://github.com/Belval/TextRecognitionDataGenerator/blob/master/trdg/run.py
- 120k word images from the AI-Hub OCR words dataset: https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=81
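
The training script itself is not published in this card. As a hedged sketch, (image path, text) pairs from such datasets are typically wrapped in a `torch.utils.data.Dataset` like the one below for `VisionEncoderDecoderModel` fine-tuning; the column names and `max_length` are assumptions, not taken from the original setup:

```python
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class OCRDataset(Dataset):
    """Turn (image path, text) rows into pixel_values / labels for seq2seq training."""

    def __init__(self, df, processor, tokenizer, max_length=64):
        self.df = df                # assumed columns: "file_name", "text"
        self.processor = processor  # TrOCRProcessor for the image side
        self.tokenizer = tokenizer  # klue/roberta-base tokenizer for the labels
        self.max_length = max_length

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(row["file_name"]).convert("RGB")
        pixel_values = self.processor(image, return_tensors="pt").pixel_values.squeeze(0)
        labels = self.tokenizer(
            row["text"],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        ).input_ids.squeeze(0)
        labels[labels == self.tokenizer.pad_token_id] = -100  # ignore padding in the loss
        return {"pixel_values": pixel_values, "labels": labels}

# Example usage with an assumed label file:
# df = pd.read_csv("labels.csv")   # columns: file_name, text
# train_dataset = OCRDataset(df, processor, tokenizer)
```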
### Training Hyperparameters
```python
training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    evaluation_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=2,
    fp16=True,
    learning_rate=4e-5,
    output_dir="./models",
    save_steps=2000,
    eval_steps=1000,
    warmup_steps=2000,
    weight_decay=0.01,
)
```
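
A hedged sketch of how these arguments could be plugged into `Seq2SeqTrainer`; it assumes the `model`, `tokenizer`, and dataset objects from the earlier sketches and is an illustration rather than the original training script:

```python
from transformers import Seq2SeqTrainer, default_data_collator

trainer = Seq2SeqTrainer(
    model=model,                      # the VisionEncoderDecoderModel
    tokenizer=tokenizer,              # decoder tokenizer (klue/roberta-base vocabulary)
    args=training_args,
    train_dataset=train_dataset,      # e.g. OCRDataset over the trdg / AI-Hub pairs
    eval_dataset=eval_dataset,
    data_collator=default_data_collator,
)
trainer.train()
```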