doc: update readme
README.md
CHANGED
@@ -7,7 +7,16 @@ sdk: docker
app_port: 7860
---

-

## Scene Text Font Dataset Generation

@@ -147,6 +156,43 @@ The generation is CPU bound, and the generation speed is highly dependent on the

Some fonts are problematic during the generation process. The script has a manual exclusion list in `config/fonts.yml` and also supports detecting unqualified fonts on the fly. It automatically skips the problematic fonts and logs them for future model training.

## Font Classification Experiment Results

On our synthesized dataset,

@@ -187,8 +233,53 @@ On our synthesized dataset,
* <sup>9</sup> Data Augmentation v3: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°] + Random Horizontal Flip + Random Downsample [1, 2]
* <sup>10</sup> Preserve Aspect Ratio by Random Cropping

## Related works and Resources

* Font Identification and Recommendations: https://mangahelpers.com/forum/threads/font-identification-and-recommendations.35672/
* Unconstrained Text Detection in Manga: a New Dataset and Baseline: https://arxiv.org/pdf/2009.04042.pdf
* SwordNet: Chinese Character Font Style Recognition Network: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9682683

@@ -7,7 +7,16 @@ sdk: docker
app_port: 7860
---

+<div align="center">
+<h1>✨YuzuMarker.FontDetection✨</h1>
+<p>First-ever CJK (Chinese, Japanese, Korean) font recognition model</p>
+<p>
+<a href="https://huggingface.co/spaces/gyrojeff/YuzuMarker.FontDetection"><img alt="Click here for Online Demo" src="https://img.shields.io/badge/🤗-Open%20In%20Spaces%20(Online Demo)-blue.svg"/></a>
+<img alt="Commit activity" src="https://img.shields.io/github/commit-activity/m/JeffersonQin/YuzuMarker.FontDetection"/>
+<img alt="License" src="https://img.shields.io/github/license/JeffersonQin/YuzuMarker.FontDetection"/>
+<img alt="Contributors" src="https://img.shields.io/github/contributors/JeffersonQin/YuzuMarker.FontDetection"/>
+</p>
+</div>

## Scene Text Font Dataset Generation

@@ -147,6 +156,43 @@ The generation is CPU bound, and the generation speed is highly dependent on the

Some fonts are problematic during the generation process. The script has a manual exclusion list in `config/fonts.yml` and also supports detecting unqualified fonts on the fly. It automatically skips the problematic fonts and logs them for future model training.

+## Model Training
+
+Once the dataset is ready under the `dataset` directory, you can start training the model. You can use more than one dataset folder; the script merges them automatically as long as you pass the path to each folder via command-line arguments.
+
+```bash
+$ python train.py -h
+usage: train.py [-h] [-d [DEVICES ...]] [-b SINGLE_BATCH_SIZE] [-c CHECKPOINT] [-m {resnet18,resnet34,resnet50,resnet101,deepfont}] [-p] [-i] [-a {v1,v2,v3}]
+                [-l LR] [-s [DATASETS ...]] [-n MODEL_NAME] [-f] [-z SIZE] [-t {medium,high,heighest}] [-r]
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -d [DEVICES ...], --devices [DEVICES ...]
+                        GPU devices to use (default: [0])
+  -b SINGLE_BATCH_SIZE, --single-batch-size SINGLE_BATCH_SIZE
+                        Batch size of single device (default: 64)
+  -c CHECKPOINT, --checkpoint CHECKPOINT
+                        Trainer checkpoint path (default: None)
+  -m {resnet18,resnet34,resnet50,resnet101,deepfont}, --model {resnet18,resnet34,resnet50,resnet101,deepfont}
+                        Model to use (default: resnet18)
+  -p, --pretrained      Use pretrained model for ResNet (default: False)
+  -i, --crop-roi-bbox   Crop ROI bounding box (default: False)
+  -a {v1,v2,v3}, --augmentation {v1,v2,v3}
+                        Augmentation strategy to use (default: None)
+  -l LR, --lr LR        Learning rate (default: 0.0001)
+  -s [DATASETS ...], --datasets [DATASETS ...]
+                        Datasets paths, seperated by space (default: ['./dataset/font_img'])
+  -n MODEL_NAME, --model-name MODEL_NAME
+                        Model name (default: current tag)
+  -f, --font-classification-only
+                        Font classification only (default: False)
+  -z SIZE, --size SIZE  Model feature image input size (default: 512)
+  -t {medium,high,heighest}, --tensor-core {medium,high,heighest}
+                        Tensor core precision (default: high)
+  -r, --preserve-aspect-ratio-by-random-crop
+                        Preserve aspect ratio (default: False)
+```
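To make the multi-dataset option in the help output above concrete, here is a minimal sketch of how a flag shaped like `-s [DATASETS ...]` behaves under Python's `argparse`. It mirrors only three of the listed flags and is not the actual `train.py`; the function name `build_parser` and the second dataset path `./dataset/font_img_extra` are hypothetical, chosen purely for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a few flags from the help text above."""
    parser = argparse.ArgumentParser(prog="train.py")
    # nargs="*" renders as "[DATASETS ...]" in the usage line and collects
    # every space-separated path that follows -s into a list.
    parser.add_argument(
        "-s", "--datasets", nargs="*", default=["./dataset/font_img"],
        help="dataset paths, separated by spaces",
    )
    parser.add_argument(
        "-m", "--model", default="resnet18",
        choices=["resnet18", "resnet34", "resnet50", "resnet101", "deepfont"],
    )
    parser.add_argument("-a", "--augmentation", choices=["v1", "v2", "v3"], default=None)
    return parser


if __name__ == "__main__":
    # The second path is a made-up example of an additional dataset folder.
    args = build_parser().parse_args(
        ["-m", "resnet50", "-a", "v3",
         "-s", "./dataset/font_img", "./dataset/font_img_extra"]
    )
    print(args.model, args.augmentation, args.datasets)
```

On the real command line, the same selection would presumably be written as `python train.py -m resnet50 -a v3 -s ./dataset/font_img ./dataset/font_img_extra`, using only flags that appear in the help output.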
+
## Font Classification Experiment Results

On our synthesized dataset,

@@ -187,8 +233,53 @@ On our synthesized dataset,
* <sup>9</sup> Data Augmentation v3: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°] + Random Horizontal Flip + Random Downsample [1, 2]
* <sup>10</sup> Preserve Aspect Ratio by Random Cropping

+## Pretrained Models
+
+Available at: https://huggingface.co/gyrojeff/YuzuMarker.FontDetection/tree/main
+
+Because everything was trained on PyTorch 2.0 with `torch.compile`, using a pretrained model requires installing PyTorch 2.0 and compiling the model with `torch.compile`, as in `demo.py`.
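
A side note on why compilation matters, offered as an assumption rather than something the README states: a module wrapped by `torch.compile` typically exposes its parameters under keys that contain an `_orig_mod` segment, so a checkpoint saved from a compiled model may not match an uncompiled model key for key. The hypothetical helper below (`strip_compile_prefix`, not part of this repository) sketches how such keys could be normalized on a plain dictionary. Whether the released checkpoints actually use this layout has not been verified here, and the documented route remains compiling the model as in `demo.py`.

```python
from typing import Any, Dict

# Name of the attribute under which torch.compile keeps the wrapped module.
COMPILE_MARKER = "_orig_mod"


def strip_compile_prefix(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of state_dict with every `_orig_mod` key segment removed.

    Hypothetical helper for illustration; it works on any mapping of
    dotted parameter names and does not need PyTorch to be installed.
    """
    cleaned: Dict[str, Any] = {}
    for key, value in state_dict.items():
        parts = [part for part in key.split(".") if part != COMPILE_MARKER]
        cleaned[".".join(parts)] = value
    return cleaned


if __name__ == "__main__":
    # Made-up keys shaped like those of a compiled sub-module.
    example = {
        "model._orig_mod.conv1.weight": "tensor-a",
        "model._orig_mod.fc.bias": "tensor-b",
        "head.weight": "tensor-c",
    }
    for name in strip_compile_prefix(example):
        print(name)
```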
+
+## Demo Deployment
+
+To deploy the demo, you need either the whole font dataset under `./dataset/fonts` or a cache file named `font_demo_cache.bin` that indicates the model's fonts. The cache file will be released later as a resource.
+
+To deploy, first run the following script to generate the demo font image (if you have the font dataset):
+
+```bash
+python generate_font_sample_image.py
+```
+
+Then run the following script to start the demo server:
+
+```bash
+$ python demo.py -h
+usage: demo.py [-h] [-d DEVICE] [-c CHECKPOINT] [-m {resnet18,resnet34,resnet50,resnet101,deepfont}] [-f] [-z SIZE] [-s] [-p PORT] [-a ADDRESS]
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -d DEVICE, --device DEVICE
+                        GPU devices to use (default: 0), -1 for CPU
+  -c CHECKPOINT, --checkpoint CHECKPOINT
+                        Trainer checkpoint path (default: None). Use link as huggingface://<user>/<repo>/<file> for huggingface.co models, currently only supports model file in the root
+                        directory.
+  -m {resnet18,resnet34,resnet50,resnet101,deepfont}, --model {resnet18,resnet34,resnet50,resnet101,deepfont}
+                        Model to use (default: resnet18)
+  -f, --font-classification-only
+                        Font classification only (default: False)
+  -z SIZE, --size SIZE  Model feature image input size (default: 512)
+  -s, --share           Get public link via Gradio (default: False)
+  -p PORT, --port PORT  Port to use for Gradio (default: 7860)
+  -a ADDRESS, --address ADDRESS
+                        Address to use for Gradio (default: 127.0.0.1)
+```
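The `huggingface://<user>/<repo>/<file>` form accepted by `--checkpoint` can be illustrated with a small parser. This is a sketch under stated assumptions, not the code in `demo.py`: the names `parse_hf_link` and `HubCheckpoint` are hypothetical, and the file name `example.ckpt` is made up. The sketch enforces the root-directory restriction mentioned in the help text by rejecting any link with more than three path segments.

```python
from typing import NamedTuple

HF_SCHEME = "huggingface://"


class HubCheckpoint(NamedTuple):
    repo_id: str   # "<user>/<repo>"
    filename: str  # file in the repository root


def parse_hf_link(link: str) -> HubCheckpoint:
    """Split huggingface://<user>/<repo>/<file> into repo id and file name."""
    if not link.startswith(HF_SCHEME):
        raise ValueError(f"not a {HF_SCHEME} link: {link!r}")
    parts = link[len(HF_SCHEME):].split("/")
    # Exactly three non-empty segments: nested paths are rejected because
    # only files in the repository root are described as supported.
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected huggingface://<user>/<repo>/<file>")
    user, repo, filename = parts
    return HubCheckpoint(repo_id=f"{user}/{repo}", filename=filename)


if __name__ == "__main__":
    # "example.ckpt" is a placeholder, not a real file in the repository.
    ckpt = parse_hf_link("huggingface://gyrojeff/YuzuMarker.FontDetection/example.ckpt")
    print(ckpt.repo_id, ckpt.filename)
```

The resulting pair matches the `repo_id` and `filename` arguments that `huggingface_hub.hf_hub_download` takes, so a downloader could be layered on top of it; how `demo.py` itself resolves the link is not shown in this change.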
+
+## Online Demo
+
+The project is also deployed on Hugging Face Spaces: https://huggingface.co/spaces/gyrojeff/YuzuMarker.FontDetection
+
## Related works and Resources

+* DeepFont: Identify Your Font from An Image: https://arxiv.org/abs/1507.03196
* Font Identification and Recommendations: https://mangahelpers.com/forum/threads/font-identification-and-recommendations.35672/
* Unconstrained Text Detection in Manga: a New Dataset and Baseline: https://arxiv.org/pdf/2009.04042.pdf
* SwordNet: Chinese Character Font Style Recognition Network: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9682683