SpursgoZmy
committed on
Update README.md
README.md CHANGED

@@ -33,7 +33,7 @@ It was trained with a two-stage pipeline as LLaVA:
 2. Instruction tuning: train the vision-language connector and the base LLM with multimodal instruction following data of tabular and non-tabular tasks.
 
 **Code Base:** We use the official code of [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA) for model training and inference,
-and the saved model checkpoint is uploaded to this repository.
+and the saved model checkpoint is uploaded to this repository. Thus, Table LLaVA can be used in the same way as the normal LLaVA v1.5 model with its original code.
 
 **Model Date:** Table-LLaVA 13B was trained in January 2024.
 
@@ -73,9 +73,9 @@ Table LLaVA is based on LLaVA-1.5 and thus follows its license. Llama 2 is licen
 
 ## Limitations
 
-Though the proposed Table-LLaVA demonstrates
+Table LLaVA takes one table image as the model input. Digesting multiple table images would be valuable to support more application scenarios. Though the proposed Table-LLaVA demonstrates
 great performance on a wide range of table-based
 tasks, the resolution of input images (336*336) is relatively
 low and may limit the upper bound of its capacity. Luckily, with the emergence of MLLMs which
 possess higher input image resolution (e.g., Monkey (Li et al., 2023d), LLaVA-Next (Liu et al.,
-2024)), researchers can use MMTab to develop more powerful tabular MLLM in the future research.
+2024)), researchers can use MMTab to develop more powerful tabular MLLM in the future research.
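For context on the added sentence ("Table LLaVA can be used in the same way as the normal LLaVA v1.5 model with its original code"), below is a minimal inference sketch following the official LLaVA-v1.5 quick-start helpers (`load_pretrained_model` / `eval_model`). The checkpoint id `SpursgoZmy/table-llava-v1.5-13b`, the image path, and the prompt are placeholders not confirmed by this diff; adjust them to the actual checkpoint and your own table image.

```python
# Hedged sketch: running the Table-LLaVA checkpoint through the unmodified LLaVA-v1.5 code base.
# Assumes the official LLaVA repo (https://github.com/haotian-liu/LLaVA) is installed;
# the model path and image file below are placeholders for illustration only.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "SpursgoZmy/table-llava-v1.5-13b"  # placeholder checkpoint id

# Single-turn inference on one table image (the model takes one table image per query).
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Convert this table into a Markdown table.",  # example tabular instruction
    "conv_mode": None,
    "image_file": "table.png",                             # placeholder path to a table image
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```

This mirrors the quick-start example in the LLaVA repository; no Table-LLaVA-specific code paths are assumed, which is consistent with the README's claim that the checkpoint works with the original LLaVA v1.5 code.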