gyrojeff committed on
Commit
0338832
·
1 Parent(s): 0d214da

doc: update readme

  1. README.md +92 -1
README.md CHANGED
@@ -7,7 +7,16 @@ sdk: docker
7
  app_port: 7860
8
  ---
9
 
10
- # YuzuMarker.FontDetection
11
 
12
  ## Scene Text Font Dataset Generation
13
 
@@ -147,6 +156,43 @@ The generation is CPU bound, and the generation speed is highly dependent on the
147
 
148
  Some fonts are problematic during the generation process. The script has a manual exclusion list in `config/fonts.yml` and also supports detecting unqualified fonts on the fly. The script will automatically skip the problematic fonts and log them for future model training.
149
 
150
  ## Font Classification Experiment Results
151
 
152
  On our synthesized dataset,
@@ -187,8 +233,53 @@ On our synthesized dataset,
187
  * <sup>9</sup> Data Augmentation v3: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°] + Random Horizontal Flip + Random Downsample [1, 2]
188
  * <sup>10</sup> Preserve Aspect Ratio by Random Cropping
189
 
190
  ## Related works and Resources
191
 
 
192
  * Font Identification and Recommendations: https://mangahelpers.com/forum/threads/font-identification-and-recommendations.35672/
193
  * Unconstrained Text Detection in Manga: a New Dataset and Baseline: https://arxiv.org/pdf/2009.04042.pdf
194
  * SwordNet: Chinese Character Font Style Recognition Network: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9682683
 
7
  app_port: 7860
8
  ---
9
 
10
+ <div align="center">
11
+ <h1>✨YuzuMarker.FontDetection✨</h1>
12
+ <p>First-ever CJK (Chinese, Japanese, Korean) font recognition model</p>
13
+ <p>
14
+ <a href="https://huggingface.co/spaces/gyrojeff/YuzuMarker.FontDetection"><img alt="Click here for Online Demo" src="https://img.shields.io/badge/🤗-Open%20In%20Spaces%20(Online Demo)-blue.svg"/></a>
15
+ <img alt="Commit activity" src="https://img.shields.io/github/commit-activity/m/JeffersonQin/YuzuMarker.FontDetection"/>
16
+ <img alt="License" src="https://img.shields.io/github/license/JeffersonQin/YuzuMarker.FontDetection"/>
17
+ <img alt="Contributors" src="https://img.shields.io/github/contributors/JeffersonQin/YuzuMarker.FontDetection"/>
18
+ </p>
19
+ </div>
20
 
21
  ## Scene Text Font Dataset Generation
22
 
 
156
 
157
  Some fonts are problematic during the generation process. The script has a manual exclusion list in `config/fonts.yml` and also supports detecting unqualified fonts on the fly. The script will automatically skip the problematic fonts and log them for future model training.
158
 
159
+ ## Model Training
160
+
161
+ Once the dataset is ready under the `dataset` directory, you can start training the model. You can use more than one dataset folder; the script merges them automatically as long as you pass the path of each folder via the command-line arguments.
162
+
163
+ ```bash
164
+ $ python train.py -h
165
+ usage: train.py [-h] [-d [DEVICES ...]] [-b SINGLE_BATCH_SIZE] [-c CHECKPOINT] [-m {resnet18,resnet34,resnet50,resnet101,deepfont}] [-p] [-i] [-a {v1,v2,v3}]
166
+ [-l LR] [-s [DATASETS ...]] [-n MODEL_NAME] [-f] [-z SIZE] [-t {medium,high,heighest}] [-r]
167
+
168
+ optional arguments:
169
+ -h, --help show this help message and exit
170
+ -d [DEVICES ...], --devices [DEVICES ...]
171
+ GPU devices to use (default: [0])
172
+ -b SINGLE_BATCH_SIZE, --single-batch-size SINGLE_BATCH_SIZE
173
+ Batch size of single device (default: 64)
174
+ -c CHECKPOINT, --checkpoint CHECKPOINT
175
+ Trainer checkpoint path (default: None)
176
+ -m {resnet18,resnet34,resnet50,resnet101,deepfont}, --model {resnet18,resnet34,resnet50,resnet101,deepfont}
177
+ Model to use (default: resnet18)
178
+ -p, --pretrained Use pretrained model for ResNet (default: False)
179
+ -i, --crop-roi-bbox Crop ROI bounding box (default: False)
180
+ -a {v1,v2,v3}, --augmentation {v1,v2,v3}
181
+ Augmentation strategy to use (default: None)
182
+ -l LR, --lr LR Learning rate (default: 0.0001)
183
+ -s [DATASETS ...], --datasets [DATASETS ...]
184
+ Datasets paths, seperated by space (default: ['./dataset/font_img'])
185
+ -n MODEL_NAME, --model-name MODEL_NAME
186
+ Model name (default: current tag)
187
+ -f, --font-classification-only
188
+ Font classification only (default: False)
189
+ -z SIZE, --size SIZE Model feature image input size (default: 512)
190
+ -t {medium,high,heighest}, --tensor-core {medium,high,heighest}
191
+ Tensor core precision (default: high)
192
+ -r, --preserve-aspect-ratio-by-random-crop
193
+ Preserve aspect ratio (default: False)
194
+ ```
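
As an illustration of the options listed above, here is a minimal Python sketch that assembles a `train.py` command line from documented flags only (`-d`, `-m`, `-p`, `-a`, `-s`). The helper `build_train_command` and the second dataset path are hypothetical and not part of the repository; the sketch only builds and prints the command, it does not start training.

```python
import shlex


def build_train_command(devices, model, datasets, augmentation=None, pretrained=False):
    """Assemble an argv list for train.py using only flags shown in its help text.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    cmd = ["python", "train.py", "-d", *[str(d) for d in devices], "-m", model]
    if pretrained:
        cmd.append("-p")  # use pretrained weights for ResNet
    if augmentation is not None:
        cmd += ["-a", augmentation]  # one of v1, v2, v3
    # Several dataset folders may follow -s, separated by spaces.
    cmd += ["-s", *datasets]
    return cmd


cmd = build_train_command(
    devices=[0, 1],
    model="resnet50",
    datasets=["./dataset/font_img", "./dataset/font_img_extra"],  # second path is made up
    augmentation="v3",
    pretrained=True,
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the training run, assuming it is executed from the repository root.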
195
+
196
  ## Font Classification Experiment Results
197
 
198
  On our synthesized dataset,
 
233
  * <sup>9</sup> Data Augmentation v3: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°] + Random Horizontal Flip + Random Downsample [1, 2]
234
  * <sup>10</sup> Preserve Aspect Ratio by Random Cropping
235
 
236
+ ## Pretrained Models
237
+
238
+ Available at: https://huggingface.co/gyrojeff/YuzuMarker.FontDetection/tree/main
239
+
240
+ Note that since I trained everything on PyTorch 2.0 with `torch.compile`, using a pretrained model requires installing PyTorch 2.0 and compiling the model with `torch.compile`, as done in `demo.py`.
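
The reason compilation matters is, as far as I know, that `torch.compile` wraps the module so that its state dict keys carry an extra `_orig_mod.` segment. The sketch below shows an alternative that this README does not describe: renaming the keys so they can be loaded into an uncompiled model. The function name and the example key names are hypothetical; the actual checkpoint layout may differ, so treat this as an assumption to verify against `demo.py`.

```python
def strip_compile_prefix(state_dict, marker="_orig_mod."):
    """Return a copy of state_dict with the first `marker` segment removed from each key.

    Assumption: the checkpoint keys contain `_orig_mod.` because the model was
    wrapped by torch.compile before saving. Keys without the marker are kept as-is.
    """
    return {key.replace(marker, "", 1): value for key, value in state_dict.items()}


# Hypothetical key names, with placeholder values standing in for tensors.
compiled_keys = {
    "_orig_mod.conv1.weight": "w0",
    "model._orig_mod.fc.bias": "b0",
    "plain.key": "x",
}
print(strip_compile_prefix(compiled_keys))
```

Following the README and compiling the model before loading the checkpoint remains the documented route; the renaming above is only a fallback idea.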
241
+
242
+ ## Demo Deployment
243
+
244
+ To deploy the demo, you need either the whole font dataset under `./dataset/fonts` or a cache file called `font_demo_cache.bin` that indicates the fonts of the model. The cache file will be released later as a resource.
245
+
246
+ To deploy, first run the following script to generate the demo font image (if you have the fonts dataset):
247
+
248
+ ```bash
249
+ python generate_font_sample_image.py
250
+ ```
251
+
252
+ Then run the following script to start the demo server:
253
+
254
+ ```bash
255
+ $ python demo.py -h
256
+ usage: demo.py [-h] [-d DEVICE] [-c CHECKPOINT] [-m {resnet18,resnet34,resnet50,resnet101,deepfont}] [-f] [-z SIZE] [-s] [-p PORT] [-a ADDRESS]
257
+
258
+ optional arguments:
259
+ -h, --help show this help message and exit
260
+ -d DEVICE, --device DEVICE
261
+ GPU devices to use (default: 0), -1 for CPU
262
+ -c CHECKPOINT, --checkpoint CHECKPOINT
263
+ Trainer checkpoint path (default: None). Use link as huggingface://<user>/<repo>/<file> for huggingface.co models, currently only supports model file in the root
264
+ directory.
265
+ -m {resnet18,resnet34,resnet50,resnet101,deepfont}, --model {resnet18,resnet34,resnet50,resnet101,deepfont}
266
+ Model to use (default: resnet18)
267
+ -f, --font-classification-only
268
+ Font classification only (default: False)
269
+ -z SIZE, --size SIZE Model feature image input size (default: 512)
270
+ -s, --share Get public link via Gradio (default: False)
271
+ -p PORT, --port PORT Port to use for Gradio (default: 7860)
272
+ -a ADDRESS, --address ADDRESS
273
+ Address to use for Gradio (default: 127.0.0.1)
274
+ ```
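
The `huggingface://<user>/<repo>/<file>` link format accepted by `--checkpoint` can be split into a repository id and a file name. The following is a minimal sketch of such a parser, not the code used in `demo.py`; the function name `parse_hf_link` and the file name `model.ckpt` are hypothetical.

```python
def parse_hf_link(link):
    """Split huggingface://<user>/<repo>/<file> into (repo_id, filename).

    Only files in the repository root are accepted, matching the restriction
    stated in the help text. Hypothetical helper for illustration.
    """
    scheme = "huggingface://"
    if not link.startswith(scheme):
        raise ValueError("link must start with huggingface://")
    parts = link[len(scheme):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected huggingface://<user>/<repo>/<file>")
    user, repo, filename = parts
    return f"{user}/{repo}", filename


repo_id, filename = parse_hf_link(
    "huggingface://gyrojeff/YuzuMarker.FontDetection/model.ckpt"  # file name is made up
)
print(repo_id, filename)
```

The resulting pair could then be passed to `huggingface_hub.hf_hub_download(repo_id=repo_id, filename=filename)` to fetch the file; whether `demo.py` does exactly this is an assumption.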
275
+
276
+ ## Online Demo
277
+
278
+ The project is also deployed on Hugging Face Spaces: https://huggingface.co/spaces/gyrojeff/YuzuMarker.FontDetection
279
+
280
  ## Related works and Resources
281
 
282
+ * DeepFont: Identify Your Font from An Image: https://arxiv.org/abs/1507.03196
283
  * Font Identification and Recommendations: https://mangahelpers.com/forum/threads/font-identification-and-recommendations.35672/
284
  * Unconstrained Text Detection in Manga: a New Dataset and Baseline: https://arxiv.org/pdf/2009.04042.pdf
285
  * SwordNet: Chinese Character Font Style Recognition Network: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9682683