wangyueqian committed on
Commit 2366917 · verified · 1 Parent(s): b6413e7

add paper and video demo to README.md

Files changed (1): README.md (+11 -3)
README.md CHANGED
@@ -23,14 +23,22 @@ This is the model checkpoint of **MMDuet**, a VideoLLM you can interact with in
 
 
 ## Related Resources
+- **Paper:** [VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format](https://arxiv.org/abs/2411.17991)
 - **Github:** [MMDuet](https://github.com/yellow-binary-tree/MMDuet)
-- **Paper:** TODO
-- **Demo:** [Video Demo](https://www.youtube.com/watch?v=n1OybwhQvtk)
+- **Video Demo:** [On Youtube](https://www.youtube.com/watch?v=n1OybwhQvtk) and [On Bilibili](https://www.bilibili.com/video/BV1nwzGYBEPE)
 - **Data:** [MMDuetIT](https://huggingface.co/datasets/wangyueqian/MMDuetIT)
 
 
 ## Citation
 If you use this work in your research, please consider cite:
 ```bibtex
-
+@misc{wang2024mmduet,
+  title={VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format},
+  author={Yueqian Wang and Xiaojun Meng and Yuxuan Wang and Jianxin Liang and Jiansheng Wei and Huishuai Zhang and Dongyan Zhao},
+  year={2024},
+  eprint={2411.17991},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2411.17991},
+}
 ```