bokesyo committed on
Commit 1939b37 · verified · 1 Parent(s): da91e91

Update README.md

Files changed (1)
  1. README.md +13 -18
README.md CHANGED
@@ -1130,6 +1130,7 @@ else:
 
 
 #### Speech and Audio Mode
+
 Model initialization
 
 ```python
@@ -1146,9 +1147,9 @@ model.init_tts()
 model.tts.float()
 ```
 
- <br/>
+ <hr/>
 
- ##### **Mimick**
+ ##### Mimick
 
 The `Mimick` task reflects a model's end-to-end speech modeling capability. The model takes audio input, outputs an ASR transcription, and then reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
 
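The full Mimick example is collapsed in a `<details>` block, so here is a minimal sketch of the call pattern the paragraph describes. It reuses `model` and `tokenizer` from the initialization snippet above; the prompt wording, the file paths, and the exact `model.chat` keyword arguments (`use_tts_template`, `generate_audio`, `output_audio_path`) are assumptions patterned on the rest of this README, not the collapsed example itself.

```python
import librosa

# Placeholder task prompt and input file -- substitute your own audio.
mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
audio_input, _ = librosa.load('input.wav', sr=16000, mono=True)

msgs = [{'role': 'user', 'content': [mimick_prompt, audio_input]}]

# Assumed keyword arguments, mirroring the initialization section above.
res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,
    temperature=0.3,
    generate_audio=True,
    output_audio_path='mimick_output.wav',  # reconstructed audio is written here
)
print(res)  # ASR transcription of the input speech
```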
 
@@ -1172,9 +1173,9 @@ res = model.chat(
 
 </details>
 
- <br/>
+ <hr/>
 
- ##### **General Speech Conversation with Configurable Voices**
+ ##### General Speech Conversation with Configurable Voices
 
 A general usage scenario of MiniCPM-o 2.6 is role-playing a specific character based on an audio prompt. It will mimic the voice of the character to some extent and act like the character in text, including language style. In this mode, MiniCPM-o-2.6 will sound **more natural and human-like**. Self-defined audio prompts can be used to customize the voice of the character in an end-to-end manner.
 
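As a rough illustration of the end-to-end audio prompt described above, here is a minimal sketch. The `model.get_sys_prompt` helper and its `audio_roleplay` mode name are assumptions about this README's interface, and all file paths are placeholders.

```python
import librosa

# Reference audio that defines the character's voice (placeholder path).
ref_audio, _ = librosa.load('character_voice.wav', sr=16000, mono=True)

# Assumed helper that builds a role-play system message from the reference audio.
sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_roleplay', language='en')

# Round one: the user speaks; the model answers in the character's voice.
user_audio, _ = librosa.load('user_turn_1.wav', sr=16000, mono=True)
msgs = [sys_prompt, {'role': 'user', 'content': [user_audio]}]

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,
    generate_audio=True,
    temperature=0.3,
    output_audio_path='roleplay_round_1.wav',
)
print(res)
```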
 
@@ -1214,11 +1215,9 @@ res = model.chat(
 print(res)
 ```
 
+ <hr/>
 
- <br/>
- <br/>
-
- ##### **Speech Conversation as an AI Assistant**
+ ##### Speech Conversation as an AI Assistant
 
 An enhanced feature of MiniCPM-o-2.6 is to act as an AI assistant, but only with a limited choice of voices. In this mode, MiniCPM-o-2.6 is **less human-like and more like a voice assistant**, but it follows instructions more closely.
 
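Assistant mode follows the same call shape as the role-play sketch above; the only assumed difference is the system-prompt mode and the use of one of the built-in assistant voices. The `audio_assistant` mode name and the reference-audio path are assumptions.

```python
import librosa

# One of the limited built-in assistant voices (placeholder path).
ref_audio, _ = librosa.load('assistant_voice.wav', sr=16000, mono=True)
sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')

user_audio, _ = librosa.load('user_question.wav', sr=16000, mono=True)
msgs = [sys_prompt, {'role': 'user', 'content': [user_audio]}]

res = model.chat(msgs=msgs, tokenizer=tokenizer, sampling=True, max_new_tokens=128,
                 use_tts_template=True, generate_audio=True, temperature=0.3,
                 output_audio_path='assistant_answer.wav')
print(res)
```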
 
@@ -1256,9 +1255,9 @@ res = model.chat(
 print(res)
 ```
 
- <br/>
+ <hr/>
 
- ##### **Instruction-to-Speech**
+ ##### Instruction-to-Speech
 
 MiniCPM-o-2.6 can also do Instruction-to-Speech, aka **Voice Creation**. You can describe a voice in detail, and the model will generate a voice that matches the description. For more Instruction-to-Speech sample instructions, you can refer to https://voxinstruct.github.io/VoxInstruct/.
 
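Because the instruction here is plain text, a sketch is straightforward: a single user turn containing only the voice description. The description text, output path, and keyword arguments are assumptions mirroring the earlier sketches.

```python
# A detailed voice description; see https://voxinstruct.github.io/VoxInstruct/ for more samples.
instruction = "Speak like a calm, middle-aged male narrator with a slow pace and a warm tone."

msgs = [{'role': 'user', 'content': [instruction]}]

res = model.chat(msgs=msgs, tokenizer=tokenizer, sampling=True, max_new_tokens=128,
                 use_tts_template=True, generate_audio=True, temperature=0.3,
                 output_audio_path='created_voice.wav')  # speech in the described voice
print(res)
```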
 
@@ -1285,9 +1284,9 @@ res = model.chat(
 ```
 </details>
 
- <br/>
+ <hr/>
 
- ##### **Voice Cloning**
+ ##### Voice Cloning
 
 MiniCPM-o-2.6 can also do zero-shot text-to-speech, aka **Voice Cloning**. In this mode, the model will act like a TTS model.
 
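A sketch of the zero-shot TTS usage: a reference audio fixes the voice and a text prompt supplies the content to read. The `voice_cloning` mode name is an assumption about the system-prompt helper, and the paths and text are placeholders.

```python
import librosa

# Reference audio whose voice should be cloned (placeholder path).
ref_audio, _ = librosa.load('reference_voice.wav', sr=16000, mono=True)
sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='voice_cloning', language='en')

# The text to be read aloud in the cloned voice.
text_to_read = "MiniCPM-o 2.6 supports zero-shot text-to-speech with a short reference clip."
msgs = [sys_prompt, {'role': 'user', 'content': [text_to_read]}]

res = model.chat(msgs=msgs, tokenizer=tokenizer, sampling=True, max_new_tokens=128,
                 use_tts_template=True, generate_audio=True, temperature=0.3,
                 output_audio_path='cloned_voice.wav')
print(res)
```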
 
@@ -1314,9 +1313,9 @@ res = model.chat(
 ```
 </details>
 
- <br/>
+ <hr/>
 
- ##### **Addressing Various Audio Understanding Tasks**
+ ##### Addressing Various Audio Understanding Tasks
 
 MiniCPM-o-2.6 can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.
 
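A sketch for the audio-understanding tasks listed above: swapping the task prompt switches between ASR, speaker analysis, captioning, and scene tagging. The prompt wordings and paths are placeholders, and the call mirrors the earlier sketches.

```python
import librosa

# Pick a task prompt (placeholders; adjust the wording to the task you need).
task_prompt = "Please listen to the audio carefully and transcribe its content."  # ASR
# task_prompt = "Describe the speaker's gender, age, and emotion."                # speaker analysis
# task_prompt = "Write a general caption for this audio."                         # audio captioning
# task_prompt = "List the sound events and the acoustic scene in this clip."      # scene tagging

audio_input, _ = librosa.load('clip.wav', sr=16000, mono=True)
msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]

res = model.chat(msgs=msgs, tokenizer=tokenizer, sampling=True, max_new_tokens=128,
                 use_tts_template=True, generate_audio=True, temperature=0.3,
                 output_audio_path='understanding_result.wav')
print(res)  # text answer for the chosen task
```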
 
@@ -1349,10 +1348,6 @@ print(res)
 ```
 
 
- <br/>
- <br/>
-
-
 ### Vision-Only mode
 
 `MiniCPM-o-2_6` has the same inference methods as `MiniCPM-V-2_6`
 