OS Error - No pytorch_model.bin
Thanks for uploading this.
Unfortunately, I am getting the following error. I can see there is a model_final.pth but it doesn't seem to work with AutoModel.
OSError: HYPJUDY/layoutlmv3-base-finetuned-publaynet does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
I have the same problem. I also tried downloading the model and loading it locally, but that didn't help.
Sorry for the confusion. I uploaded these models to support the usage in https://github.com/microsoft/unilm/tree/master/layoutlmv3#document-layout-analysis-on-publaynet, which may not be compatible with AutoModel.
@HYPJUDY Could you please give an example of how to use the "HYPJUDY/layoutlmv3-base-finetuned-publaynet" pre-trained model for inference, i.e., detecting layouts in a custom document image? Since it is not compatible with AutoModel, I cannot use it.
I was trying the code below to use this pre-trained model, but I am getting an error.
from unilm.layoutlmv3.layoutlmft.models.layoutlmv3 import LayoutLMv3Model
from unilm.layoutlmv3.examples.object_detection.ditod.config import add_vit_config
import torch
from detectron2.config import CfgNode as CN
from detectron2.config import get_cfg
from detectron2.utils.visualizer import ColorMode, Visualizer
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor
# Step 1: instantiate config
cfg = get_cfg()
add_vit_config(cfg)
cfg.merge_from_file("/content/unilm/layoutlmv3/examples/object_detection/cascade_layoutlmv3.yaml")
# Step 2: add model weights path to config
cfg.MODEL.WEIGHTS = "./content/drive/MyDrive/model_final.pth"
# Step 3: set device
cfg.MODEL.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# Step 4: define model
predictor = DefaultPredictor(cfg)
-> This step raises the following error:
ValueError Traceback (most recent call last)
in
3
4 # Step 4: define model
----> 5 predictor = DefaultPredictor(cfg)
8 frames
/usr/local/lib/python3.8/dist-packages/detectron2/engine/defaults.py in __init__(self, cfg)
280 def __init__(self, cfg):
281 self.cfg = cfg.clone() # cfg can be modified by model
--> 282 self.model = build_model(self.cfg)
283 self.model.eval()
284 if len(cfg.DATASETS.TEST):
/usr/local/lib/python3.8/dist-packages/detectron2/modeling/meta_arch/build.py in build_model(cfg)
20 """
21 meta_arch = cfg.MODEL.META_ARCHITECTURE
---> 22 model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
23 model.to(torch.device(cfg.MODEL.DEVICE))
24 _log_api_usage("modeling.meta_arch." + meta_arch)
/usr/local/lib/python3.8/dist-packages/detectron2/config/config.py in wrapped(self, *args, **kwargs)
187
188 if _called_with_cfg(*args, **kwargs):
--> 189 explicit_args = _get_args_from_config(from_config_func, *args, **kwargs)
190 init_func(self, **explicit_args)
191 else:
/usr/local/lib/python3.8/dist-packages/detectron2/config/config.py in _get_args_from_config(from_config_func, *args, **kwargs)
243 if name not in supported_arg_names:
244 extra_kwargs[name] = kwargs.pop(name)
--> 245 ret = from_config_func(*args, **kwargs)
246 # forward the other arguments to __init__
247 ret.update(extra_kwargs)
/usr/local/lib/python3.8/dist-packages/detectron2/modeling/meta_arch/rcnn.py in from_config(cls, cfg)
70 @classmethod
71 def from_config(cls, cfg):
---> 72 backbone = build_backbone(cfg)
73 return {
74 "backbone": backbone,
/usr/local/lib/python3.8/dist-packages/detectron2/modeling/backbone/build.py in build_backbone(cfg, input_shape)
29
30 backbone_name = cfg.MODEL.BACKBONE.NAME
---> 31 backbone = BACKBONE_REGISTRY.get(backbone_name)(cfg, input_shape)
32 assert isinstance(backbone, Backbone)
33 return backbone
/content/unilm/dit/object_detection/ditod/backbone.py in build_vit_fpn_backbone(cfg, input_shape)
143 backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.
144 """
--> 145 bottom_up = build_VIT_backbone(cfg)
146 in_features = cfg.MODEL.FPN.IN_FEATURES
147 out_channels = cfg.MODEL.FPN.OUT_CHANNELS
/content/unilm/dit/object_detection/ditod/backbone.py in build_VIT_backbone(cfg)
129 model_kwargs = eval(str(cfg.MODEL.VIT.MODEL_KWARGS).replace("`", ""))
130
--> 131 return VIT_Backbone(name, out_features, drop_path, img_size, pos_type, model_kwargs)
132
133
/content/unilm/dit/object_detection/ditod/backbone.py in __init__(self, name, out_features, drop_path, img_size, pos_type, model_kwargs)
61 self._out_feature_channels = {"layer7": 1024, "layer11": 1024, "layer15": 1024, "layer23": 1024}
62 else:
---> 63 raise ValueError("Unsupported VIT name yet.")
64
65 if 'beit' in name or 'dit' in name:
ValueError: Unsupported VIT name yet.
Could you please help me resolve this?
I made it work by combining content from another example here: https://huggingface.co/spaces/nielsr/dit-document-layout-analysis/blob/main/app.py
You will need to adjust some code/settings in the example, as it was written for DiT and you want to use LayoutLMv3 here. Both are for document layout analysis tasks, so the steps are the same and you only need to swap in the LayoutLMv3 parts.
@superbean Could you please share how you changed the code to work with this LayoutLM model? I am struggling with this and it would be very helpful!
@dmacko232, you can probably have a look at this example: https://github.com/microsoft/unilm/blob/master/dit/object_detection/inference.py
I did pretty much the same thing.
And you will need the ENTIRE LayoutLMv3 implementation (model, configuration, tokenizer and more) from the official unilm repo here (https://github.com/microsoft/unilm/tree/master/layoutlmv3/layoutlmft/models/layoutlmv3), and NOT the one from the latest Hugging Face transformers library. (The implementations are slightly different even though the class names, etc. overlap completely.)
Downloading the repo and changing the name of the file worked for me:
# Load the model from HuggingFace
!git lfs install
!git clone https://huggingface.co/HYPJUDY/layoutlmv3-base-finetuned-publaynet
!mv layoutlmv3-base-finetuned-publaynet/model_final.pth layoutlmv3-base-finetuned-publaynet/pytorch_model.bin
And for processing, I used the regular LayoutLMv3 processor:
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=True)
@superbean @jiboncom @DGHOSH @HYPJUDY, can anyone share complete code for this that takes an image as input and returns labels with bounding box info as output?
@MLLife as @superbean said, you can use the script inference.py at https://github.com/microsoft/unilm/blob/master/dit/object_detection/inference.py and substitute the paths to the correct config and model weights. You DON'T need to change the script if you only want to visualize the results. You will have to modify the script if you need to save the bounding boxes, labels, and scores somewhere. All the results can be found in the output object at line 67 of the script linked above. You must add some lines to save them somewhere. Unfortunately, classes are represented with integers, not text labels, so if you know the mapping please let me know.
Here is an example with the raw inference script:
python ./dit/object_detection/inference.py \
    --image_path path/to/image.jpg \
    --output_file_name path/to/output/image.jpg \
    --config ./layoutlmv3/examples/object_detection/cascade_layoutlmv3.yaml \
    --opts MODEL.WEIGHTS path/to/model.pth
You can find the model weights at https://huggingface.co/HYPJUDY/layoutlmv3-base-finetuned-publaynet. I hope this is useful.
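To save the results rather than only visualize them, something like the following could be added after the prediction call. This is a minimal sketch, not code from the unilm repo: the helper names, the score threshold, and the output path are hypothetical. With detectron2, the three input lists would typically come from `output["instances"].to("cpu")` via `pred_boxes.tensor.tolist()`, `pred_classes.tolist()`, and `scores.tolist()`; plain lists are used here so the sketch runs without detectron2 installed.

```python
import json


def predictions_to_records(boxes, classes, scores, score_threshold=0.5):
    """Turn parallel lists of boxes, class IDs and scores into JSON-friendly dicts.

    boxes:   list of [x1, y1, x2, y2] in pixel coordinates
    classes: list of integer class IDs
    scores:  list of confidence scores in [0, 1]
    """
    records = []
    for box, class_id, score in zip(boxes, classes, scores):
        if score < score_threshold:
            continue  # skip low-confidence detections
        records.append(
            {
                "bbox": [round(float(v), 2) for v in box],
                "class_id": int(class_id),
                "score": round(float(score), 4),
            }
        )
    return records


def save_records(records, path):
    """Write the records to a JSON file at `path`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)


# Hypothetical values standing in for a detectron2 result.
example_boxes = [[10.0, 20.0, 200.0, 80.0], [15.0, 100.0, 300.0, 400.0]]
example_classes = [1, 0]
example_scores = [0.98, 0.42]

records = predictions_to_records(example_boxes, example_classes, example_scores)
print(json.dumps(records, indent=2))
```

Calling `save_records(records, "path/to/predictions.json")` would then write the kept detections to disk; the second example box is dropped because its score is below the assumed threshold of 0.5.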
@Blind2015-private, I am able to run the code now, using the code available in the Space's app.py.
Thankfully, I got the labels for the PubLayNet dataset:
id2label = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}
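With that mapping, the integer class IDs in the predictions can be turned into readable labels. A minimal sketch, assuming the class IDs arrive as plain integers (the helper name `label_predictions` is hypothetical):

```python
id2label = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}


def label_predictions(class_ids, mapping=id2label):
    """Map integer class IDs to PubLayNet label names.

    Unknown IDs are kept visible as "unknown(<id>)" instead of raising.
    """
    return [mapping.get(int(c), f"unknown({int(c)})") for c in class_ids]


print(label_predictions([1, 0, 3, 4]))
```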
But the output is still not what I want.
The LayoutLM model is only able to classify individual words instead of entire sections.
How can I fix that?
@MLLife if the model is classifying individual words, it is possible that it is not working properly on your data because it has overfit to the original training data. I am using a page of the original LayoutLMv3 paper, rescaled so that the longer axis is 1000 pixels long, and it works fine (meaning that it classifies sections).
If you want a model that classifies individual words, the weights I shared earlier are not appropriate. Unfortunately, I wouldn't know how to obtain a more appropriate model checkpoint for your use case.
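The rescaling mentioned above (longer axis set to 1000 pixels) can be sketched as follows. This is an illustration rather than the exact preprocessing used; the function name is hypothetical, and the resulting size would then be passed to an image library's resize call, for example Pillow's `Image.resize((new_width, new_height))`.

```python
def scale_to_longest_side(width, height, target=1000):
    """Return (new_width, new_height) so that the longer side equals `target`
    while the aspect ratio is preserved."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 2480x3508 scan (A4 at 300 dpi) becomes 707x1000.
print(scale_to_longest_side(2480, 3508))
```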
@Blind2015-private, can you tell me the exact image size you use to get section-wise box labels? Also, I found something on GitHub: https://github.com/microsoft/unilm/issues/906
But I don't see any such option in the Hugging Face code, @nielsr.
Edit: found a great article here: https://github.com/microsoft/unilm/issues/800
Can you share the exact code you used for document layout?
@MLLife, got anything on this?
@Vineetttt, it's not straightforward, but there is a DiT document layout app file here: https://huggingface.co/spaces/nielsr/dit-document-layout-analysis/blob/main/app.py. Modify just the model paths and the initial imports, and it will work.
Remember, LayoutLMv3 can't be used for commercial projects (it is released under the non-commercial CC BY-NC-SA 4.0 license).