## Preparing Data for YOLO-World

### Overview
For pre-training YOLO-World, we adopt several datasets, as listed in the table below:

| Data | Samples | Type | Boxes |
| :-- | :-----: | :---: | :---: |
| Objects365v1 | 609k | detection | 9,621k |
| GQA | 621k | grounding | 3,681k |
| Flickr | 149k | grounding | 641k |
| CC3M-Lite | 245k | image-text | 821k |
### Dataset Directory

We put all data into the `data` directory, such as:
```bash
├── coco
│   ├── annotations
│   ├── lvis
│   ├── train2017
│   └── val2017
├── flickr
│   ├── annotations
│   └── images
├── mixed_grounding
│   ├── annotations
│   └── images
└── objects365v1
    ├── annotations
    ├── train
    └── val
```
**NOTE**: We strongly suggest that you check the directories or paths in the dataset part of the config file, especially for the values `ann_file`, `data_root`, and `data_prefix`.
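As a minimal sketch of how the directory layout above maps onto these fields (the dataset type name and exact paths here are assumptions and may differ from the official configs):

```python
# Minimal sketch, NOT copied from the official configs: how the
# `objects365v1` directory above maps onto `data_root`, `ann_file`,
# and `data_prefix`. The dataset type name is an assumption; use the
# one from your own config.
obj365v1_train_dataset = dict(
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5Objects365V1Dataset',              # assumed class name
        data_root='data/objects365v1/',                # dataset root shown above
        ann_file='annotations/objects365_train.json',  # annotation file
        data_prefix=dict(img='train/'),                # image sub-directory
        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
    class_text_path='data/texts/obj365v1_class_texts.json',
    pipeline=train_pipeline)
```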
We provide the annotations of the pre-training data in the table below:

| Data | Images | Annotation File |
| :--- | :----- | :-------------- |
| Objects365v1 | [`Objects365 train`](https://opendatalab.com/OpenDataLab/Objects365_v1) | [`objects365_train.json`](https://opendatalab.com/OpenDataLab/Objects365_v1) |
| MixedGrounding | [`GQA`](https://nlp.stanford.edu/data/gqa/images.zip) | [`final_mixed_train_no_coco.json`](https://huggingface.co/GLIPModel/GLIP/tree/main/mdetr_annotations/final_mixed_train_no_coco.json) |
| Flickr30k | [`Flickr30k`](https://shannon.cs.illinois.edu/DenotationGraph/) | [`final_flickr_separateGT_train.json`](https://huggingface.co/GLIPModel/GLIP/tree/main/mdetr_annotations/final_flickr_separateGT_train.json) |
| LVIS-minival | [`COCO val2017`](https://cocodataset.org/) | [`lvis_v1_minival_inserted_image_name.json`](https://huggingface.co/GLIPModel/GLIP/blob/main/lvis_v1_minival_inserted_image_name.json) |
**Acknowledgement:** We sincerely thank [GLIP](https://github.com/microsoft/GLIP) and [mdetr](https://github.com/ashkamath/mdetr) for providing the annotation files for pre-training.
### Dataset Class

> For fine-tuning YOLO-World on Close-set Object Detection, using `MultiModalDataset` is recommended.

#### Setting CLASSES/Categories
If you use `COCO-format` custom datasets, you **do not** need to define a dataset class for custom vocabularies/categories. Simply set the classes in the config file through `metainfo=dict(classes=your_classes)`:
```python
coco_train_dataset = dict(
    _delete_=True,
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5CocoDataset',
        metainfo=dict(classes=your_classes),
        data_root='data/your_data',
        ann_file='annotations/your_annotation.json',
        data_prefix=dict(img='images/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
    class_text_path='data/texts/your_class_texts.json',
    pipeline=train_pipeline)
```
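Note that `_delete_=True` tells the MMEngine config system to discard the dataset settings inherited from the base config rather than merging with them, and `class_text_path` points to a text JSON of the form described in the `MultiModalDataset` section below.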
For training YOLO-World, we mainly adopt two kinds of dataset classes:
#### 1. `MultiModalDataset`
`MultiModalDataset` is a simple wrapper for a pre-defined dataset class, such as `Objects365` or `COCO`, which adds the category texts to the dataset instance for formatting the input texts.
**Text JSON**

The JSON file is formatted as follows:
```json
[
    ["A_1", "A_2"],
    ["B"],
    ["C_1", "C_2", "C_3"],
    ...
]
```
We have provided the text JSON for [`LVIS`](./../data/texts/lvis_v1_class_texts.json), [`COCO`](../data/texts/coco_class_texts.json), and [`Objects365`](../data/texts/obj365v1_class_texts.json).
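If you need to build such a text JSON for your own categories, a minimal sketch (the file name and categories are placeholders):

```python
import json

# Hypothetical helper: write a text JSON for custom categories.
# Each entry is a list, so one category can carry several synonyms/phrases.
class_texts = [
    ['person'],
    ['traffic light', 'stoplight'],  # two phrases for one category
    ['fire hydrant'],
]

with open('data/texts/your_class_texts.json', 'w') as f:
    json.dump(class_texts, f)
```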
#### 2. `YOLOv5MixedGroundingDataset`
The `YOLOv5MixedGroundingDataset` extends the `COCO` dataset by supporting loading texts/captions from the JSON file. It is designed for `MixedGrounding` or `Flickr30K`, which provide text tokens for each object.
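A minimal configuration sketch for such a grounding dataset, assuming the directory layout shown above (paths are illustrative and should match your data; grounding datasets take their texts from the captions, so no `class_text_path` is needed):

```python
# Minimal sketch, not copied from the official configs: a grounding dataset
# built on `YOLOv5MixedGroundingDataset`. Paths are assumptions based on the
# `mixed_grounding` directory shown above.
mg_train_dataset = dict(
    type='YOLOv5MixedGroundingDataset',
    data_root='data/mixed_grounding/',
    ann_file='annotations/final_mixed_train_no_coco.json',
    data_prefix=dict(img='images/'),
    filter_cfg=dict(filter_empty_gt=False, min_size=32),
    pipeline=train_pipeline)
```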
### 🔥 Custom Datasets
For custom datasets, we suggest that users convert their annotation files according to the intended usage. Note that converting the annotations to the **standard COCO format** is basically required.
1. **Large vocabulary, grounding, referring:** you can follow the annotation format of the `MixedGrounding` dataset, which adds `caption` and `tokens_positive` to assign a text to each object. The texts can be category names or noun phrases (see the sketch after this list).
2. **Custom vocabulary (fixed):** you can adopt the `MultiModalDataset` wrapper, as with `Objects365`, and create a **text JSON** for your custom categories.
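For the grounding-style option (1), here is a minimal sketch of what one COCO-format image with `caption` and `tokens_positive` could look like; all values are made up for illustration, and the exact fields should follow the `MixedGrounding` annotation files:

```python
import json

# Illustrative MixedGrounding-style entry (values are hypothetical).
# `caption` is stored on the image; each annotation's `tokens_positive`
# lists [start, end) character spans of the caption describing that box.
grounding_ann = {
    "images": [{
        "id": 1, "file_name": "000001.jpg", "height": 480, "width": 640,
        "caption": "a dog next to a red car",
    }],
    "annotations": [
        # "dog" -> caption[2:5]
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 80],
         "area": 8000.0, "iscrowd": 0, "tokens_positive": [[2, 5]]},
        # "red car" -> caption[16:23]
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [200, 50, 120, 100],
         "area": 12000.0, "iscrowd": 0, "tokens_positive": [[16, 23]]},
    ],
    "categories": [{"id": 1, "name": "object"}],
}

with open("data/your_grounding_train.json", "w") as f:
    json.dump(grounding_ann, f)
```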
### CC3M Pseudo Annotations
The following annotations were generated by the automatic labeling process described in our paper, and we report results based on these annotations.
To use the CC3M annotations, you need to prepare the `CC3M` images first.

| Data | Images | Boxes | File |
| :--: | :----: | :---: | :---: |
| CC3M-246K | 246,363 | 820,629 | [Download 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/cc3m_pseudo_annotations.json) |
| CC3M-500K | 536,405 | 1,784,405 | [Download 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/cc3m_pseudo_500k_annotations.json) |
| CC3M-750K | 750,000 | 4,504,805 | [Download 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/cc3m_pseudo_750k_annotations.json) |