Update README.md

Changed file: `README.md`
````diff
@@ -1130,6 +1130,7 @@ else:
 
 
 #### Speech and Audio Mode
+
 Model initialization
 
 ```python
````
````diff
@@ -1146,9 +1147,9 @@ model.init_tts()
 model.tts.float()
 ```
 
-<br/>
+<hr/>
 
-##### **Mimick**
+##### Mimick
 
 The `Mimick` task reflects a model's end-to-end speech modeling capability: the model takes audio input, outputs an ASR transcription, and subsequently reconstructs the original audio. The higher the similarity between the reconstructed audio and the original, the stronger the model's foundational capability in end-to-end speech modeling.
 
````
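The paragraph above appeals to "similarity between the reconstructed audio and the original" without defining it. As a concrete stand-in, a waveform-level cosine similarity can serve as a crude proxy (a minimal, stdlib-only sketch; this is not the evaluation the README uses, and real speech-similarity metrics work on spectral features):

```python
import math

def waveform_cosine_similarity(original, reconstructed):
    """Cosine similarity between two mono waveforms (lists of samples).

    1.0 means identical up to amplitude scaling; 0.0 means the signals
    are orthogonal (or one of them is silent).
    """
    n = min(len(original), len(reconstructed))
    a, b = original[:n], reconstructed[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# A volume-scaled copy of a 440 Hz tone still scores 1.0.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
print(round(waveform_cosine_similarity(tone, [0.5 * s for s in tone]), 6))  # → 1.0
```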
````diff
@@ -1172,9 +1173,9 @@ res = model.chat(
 
 </details>
 
-<br/>
+<hr/>
 
-##### **General Speech Conversation with Configurable Voices**
+##### General Speech Conversation with Configurable Voices
 
 A general usage scenario of MiniCPM-o 2.6 is role-playing a specific character based on an audio prompt. The model will mimic the character's voice to some extent and act like the character in text, including language style. In this mode, MiniCPM-o 2.6 sounds **more natural and human-like**. Self-defined audio prompts can be used to customize the character's voice in an end-to-end manner.
 
````
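Both this mode and the assistant mode below drive the model through the same chat-style call, but the diff elides the actual `model.chat` invocation. The sketch below only illustrates the shape such a payload could take; the key names and structure are assumptions for illustration, not the repo's API:

```python
def build_voice_chat_msgs(audio_prompt, user_audio):
    """Sketch of a two-turn payload: the first turn carries the
    character's audio prompt (setting voice and persona), the second
    carries the user's spoken input. In the real example each item
    would be a decoded waveform (e.g. loaded with librosa); plain
    strings stand in here so the sketch runs standalone.
    """
    return [
        {"role": "user", "content": [audio_prompt]},
        {"role": "user", "content": [user_audio]},
    ]

msgs = build_voice_chat_msgs("<character audio prompt>", "<spoken question>")
print([m["role"] for m in msgs])  # → ['user', 'user']
```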
````diff
@@ -1214,11 +1215,9 @@ res = model.chat(
 print(res)
 ```
 
-
-
-<br/>
+<hr/>
 
-##### **Speech Conversation as an AI Assistant**
+##### Speech Conversation as an AI Assistant
 
 An enhanced feature of MiniCPM-o 2.6 is acting as an AI assistant, though with a limited choice of voices. In this mode, MiniCPM-o 2.6 is **less human-like and more like a voice assistant**, but it follows instructions more closely.
 
````
````diff
@@ -1256,9 +1255,9 @@ res = model.chat(
 print(res)
 ```
 
-<br/>
+<hr/>
 
-##### **Instruction-to-Speech**
+##### Instruction-to-Speech
 
 MiniCPM-o 2.6 can also do Instruction-to-Speech, a.k.a. **Voice Creation**: describe a voice in detail, and the model will generate a voice that matches the description. For more sample Instruction-to-Speech instructions, see https://voxinstruct.github.io/VoxInstruct/.
 
````
````diff
@@ -1285,9 +1284,9 @@ res = model.chat(
 ```
 </details>
 
-<br/>
+<hr/>
 
-##### **Voice Cloning**
+##### Voice Cloning
 
 MiniCPM-o 2.6 can also do zero-shot text-to-speech, a.k.a. **Voice Cloning**. In this mode, the model acts like a TTS model.
 
````
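For the zero-shot TTS mode above, the elided example presumably pairs a reference recording with the text to be read in that voice. A hypothetical payload sketch (names and structure are illustrative, not the repo's API; a real call would additionally request audio generation and an output WAV path):

```python
def build_voice_cloning_msgs(reference_audio, text_to_speak):
    """Zero-shot TTS sketch: one turn pairing the reference voice with
    the target text. A string stands in for the decoded reference
    waveform so the sketch runs standalone.
    """
    return [{"role": "user", "content": [reference_audio, text_to_speak]}]

msgs = build_voice_cloning_msgs("<reference waveform>", "Hello, nice to meet you.")
print(len(msgs[0]["content"]))  # → 2
```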
````diff
@@ -1314,9 +1313,9 @@ res = model.chat(
 ```
 </details>
 
-<br/>
+<hr/>
 
-##### **Addressing Various Audio Understanding Tasks**
+##### Addressing Various Audio Understanding Tasks
 
 MiniCPM-o-2.6 can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.
 
````
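The audio-understanding tasks listed above differ mainly in the text instruction sent alongside the audio. A sketch of per-task instructions (the wording is illustrative; the README's actual prompts are elided from this diff):

```python
# Illustrative per-task instructions; not the README's exact prompts.
AUDIO_TASK_PROMPTS = {
    "asr": "Transcribe the speech in this audio.",
    "speaker_analysis": "Describe the speaker's gender, rough age, and emotion.",
    "audio_caption": "Write a one-sentence caption for this audio clip.",
    "scene_tagging": "List the sound events and the acoustic scene you hear.",
}

def build_understanding_msgs(task, audio):
    """One user turn: the audio first, then the task instruction."""
    return [{"role": "user", "content": [audio, AUDIO_TASK_PROMPTS[task]]}]

print(build_understanding_msgs("asr", "<waveform>")[0]["content"][1])
# → Transcribe the speech in this audio.
```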
````diff
@@ -1349,10 +1348,6 @@ print(res)
 ```
 
 
-<br/>
-<br/>
-
-
 ### Vision-Only mode
 
 `MiniCPM-o-2_6` has the same inference methods as `MiniCPM-V-2_6`.
````