update readme
README.md
CHANGED
@@ -73,10 +73,9 @@ MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github
 <img src="https://github.com/OpenBMB/MiniCPM-o/raw/main/assets/radar.jpg" width=90% />
 </div>
 
-
-<summary>Click to view visual understanding results.</summary>
+#### Visual understanding results
 
-**Image Understanding
+**Image Understanding:**
 
 <div align="center">
 <table style="margin: 0px auto;">
@@ -394,8 +393,10 @@ MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github
 Note: For proprietary models, we calculate token density based on the image encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation.
 
 
-**Multi-image and Video Understanding
+**Multi-image and Video Understanding:**
 
+<details>
+<summary>click to view</summary>
 <div align="center">
 
 <table style="margin: 0px auto;">
@@ -497,10 +498,9 @@ Note: For proprietary models, we calculate token density based on the image enco
 </details>
 
 
-
-<summary>Click to view audio understanding and speech conversation results.</summary>
+#### Audio understanding and speech conversation results.
 
-**Audio Understanding
+**Audio Understanding:**
 
 <div align="center">
 <table style="margin: 0px auto;">
@@ -624,7 +624,7 @@ Note: For proprietary models, we calculate token density based on the image enco
 </div>
 * We evaluate officially released checkpoints by ourselves.<br><br>
 
-**Speech Generation
+**Speech Generation:**
 
 <div align="center">
 <table style="margin: 0px auto;">
@@ -790,12 +790,10 @@ All results are from AudioEvals, and the evaluation methods along with further d
 </table>
 </div>
 
-</details>
 
-
-<summary>Click to view multimodal live streaming results.</summary>
+#### Multimodal live streaming results.
 
-**Multimodal Live Streaming
+**Multimodal Live Streaming:** results on StreamingBench
 
 <table style="margin: 0px auto;">
 <thead>
@@ -922,7 +920,6 @@ All results are from AudioEvals, and the evaluation methods along with further d
 </tbody>
 </table>
 
-</details>
 
 
 ### Examples <!-- omit in toc -->