yifeihu/TF-ID-base

@@ -6,14 +6,13 @@ tags:
 - vision
 - ocr
 - segmentation
-- coco
 ---
 # TF-ID: Table/Figure IDentifier for academic papers
 ## Model Summary
-TF-ID (Table/Figure IDentifier) is a family of object detection models finetuned to extract tables and figures in academic papers. They come in four versions:
 | Model   | Model size | Model Description |
 | ------- | ------------- |   ------------- |
 | TF-ID-base[[HF]](https://huggingface.co/yifeihu/TF-ID-base) | 0.23B  | Extract tables/figures and their caption text
@@ -22,8 +21,12 @@ TF-ID (Table/Figure IDentifier) is a family of object detection models finetuned
 | TF-ID-large-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-large-no-caption) | 0.77B  | Extract tables/figures without caption text
 All TF-ID models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints.
 TF-ID models take an image of a single paper page as the input, and return bounding boxes for all tables and figures in the given page.
 TF-ID-base and TF-ID-large draw bounding boxes around tables/figures and their caption text.
 TF-ID-base-no-caption and TF-ID-large-no-caption draw bounding boxes around tables/figures without their caption text.
 ![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/td-id-caption.png)
@@ -56,17 +59,15 @@ Use the code below to get started with the model.
 ```python
 import requests
 from PIL import Image
 from transformers import AutoProcessor, AutoModelForCausalLM
-model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)
-processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)
 prompt = "<OD>"
-url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
 image = Image.open(requests.get(url, stream=True).raw)
 inputs = processor(text=prompt, images=image, return_tensors="pt")
@@ -86,13 +87,18 @@ print(parsed_answer)
 ```
 ## BibTex and citation info
 ```
-@article{xiao2023florence,
-  title={Florence-2: Advancing a unified representation for a variety of vision tasks},
-  author={Xiao, Bin and Wu, Haiping and Xu, Weijian and Dai, Xiyang and Hu, Houdong and Lu, Yumao and Zeng, Michael and Liu, Ce and Yuan, Lu},
-  journal={arXiv preprint arXiv:2311.06242},
-  year={2023}
 }
 ```

 - vision
 - ocr
 - segmentation
 ---
 # TF-ID: Table/Figure IDentifier for academic papers
 ## Model Summary
+TF-ID (Table/Figure IDentifier) is a family of object detection models, created by [Yifei Hu](https://x.com/hu_yifei), finetuned to extract tables and figures from academic papers. They come in four versions:
 | Model   | Model size | Model Description |
 | ------- | ------------- |   ------------- |
 | TF-ID-base[[HF]](https://huggingface.co/yifeihu/TF-ID-base) | 0.23B  | Extract tables/figures and their caption text
 | TF-ID-large-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-large-no-caption) | 0.77B  | Extract tables/figures without caption text
 All TF-ID models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints.
+The models were finetuned with papers from Hugging Face Daily Papers. All bounding boxes are manually annotated and checked by humans.
 TF-ID models take an image of a single paper page as the input, and return bounding boxes for all tables and figures in the given page.
 TF-ID-base and TF-ID-large draw bounding boxes around tables/figures and their caption text.
 TF-ID-base-no-caption and TF-ID-large-no-caption draw bounding boxes around tables/figures without their caption text.
 ![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/td-id-caption.png)
 ```python
 import requests
 from PIL import Image
 from transformers import AutoProcessor, AutoModelForCausalLM
+model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
+processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
 prompt = "<OD>"
+url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
 image = Image.open(requests.get(url, stream=True).raw)
 inputs = processor(text=prompt, images=image, return_tensors="pt")
 ```
+To visualize the results, see [this tutorial notebook](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb).
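As a lighter-weight alternative to full visualization, the returned bounding boxes can be turned into crop regions. The sketch below is not part of the TF-ID repository. It assumes the parsed answer follows the Florence-2 object detection layout, a dict keyed by the task prompt with parallel `bboxes` (as `[x1, y1, x2, y2]` in pixels) and `labels` lists. The helper name `boxes_by_label`, the label strings `table` and `figure`, and the example coordinates are all hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def boxes_by_label(
    parsed_answer: dict,
    image_size: Tuple[int, int],
    task: str = "<OD>",
) -> Dict[str, List[Box]]:
    """Group detections by label as integer boxes clamped to the page.

    Assumes `parsed_answer` looks like
    {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}.
    """
    width, height = image_size
    detections = parsed_answer.get(task, {})
    grouped: Dict[str, List[Box]] = {}
    for bbox, label in zip(detections.get("bboxes", []), detections.get("labels", [])):
        x1, y1, x2, y2 = bbox
        # Round to whole pixels and keep every coordinate inside the page.
        left = max(0, min(width, int(round(x1))))
        top = max(0, min(height, int(round(y1))))
        right = max(0, min(width, int(round(x2))))
        bottom = max(0, min(height, int(round(y2))))
        # Skip boxes that collapse to zero area after clamping.
        if right <= left or bottom <= top:
            continue
        grouped.setdefault(label, []).append((left, top, right, bottom))
    return grouped


# Hypothetical parsed answer for a 612x792 page image (values are made up).
example = {
    "<OD>": {
        "bboxes": [[34.2, 50.7, 580.9, 310.1], [40.0, 400.0, 700.0, 900.4]],
        "labels": ["table", "figure"],
    }
}
crops = boxes_by_label(example, image_size=(612, 792))
print(crops)
```

Each resulting tuple uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so a detected region could be saved with something like `image.crop(box).save("table_0.png")`.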
+## Finetuning Code and Dataset
+Coming soon!
 ## BibTex and citation info
 ```
+@misc{TF-ID,
+      url={https://huggingface.co/yifeihu/TF-ID-base},
+      title={TF-ID: Table/Figure IDentifier for academic papers},
+      author={Yifei Hu}
 }
 ```