Commit 1f9e583 by bokesyo · verified
1 Parent(s): e1b08c1

Update README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -1130,7 +1130,7 @@ else:
 
 
 
- #### Speech Conversation
+ #### Audio and Speech
  <details> <summary> Model initialization </summary>
 
  ```python
@@ -1151,7 +1151,7 @@ model.tts.float()
 
  <br/>
 
- ##### Mimick
+ ##### **Mimick**
 
  `Mimick` task reflects a model's end-to-end speech modeling capability. The model takes audio input, and outputs an ASR transcription and subsequently reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
 
@@ -1177,7 +1177,7 @@ res = model.chat(
 
  <br/>
 
- ##### General Speech Conversation with Configurable Voices
+ ##### **General Speech Conversation with Configurable Voices**
 
  A general usage scenario of MiniCPM-o 2.6 is role-playing a specific character based on the audio prompt. It will mimic the voice of the character to some extent and act like the character in text, including language style. In this mode, MiniCPM-o-2.6 will sound **more natural and human-like**. Self-defined audio prompts can be used to customize the voice of the character in an end-to-end manner.
 
@@ -1220,9 +1220,10 @@ print(res)
 
  </details>
 
+ <br/>
  <br/>
 
- ##### Speech Conversation as an AI Assistant
+ ##### **Speech Conversation as an AI Assistant**
 
  An enhanced feature of MiniCPM-o-2.6 is to act as an AI assistant, but only with limited choice of voices. In this mode, MiniCPM-o-2.6 is **less human-like and more like a voice assistant**. But it is more instruction-following.
 
@@ -1265,7 +1266,7 @@ print(res)
 
  <br/>
 
- ##### Instruction-to-Speech
+ ##### **Instruction-to-Speech**
 
  MiniCPM-o-2.6 can also do Instruction-to-Speech, aka **Voice Creation**. You can describe a voice in detail, and the model will generate a voice that matches the description. For more Instruction-to-Speech sample instructions, you can refer to https://voxinstruct.github.io/VoxInstruct/.
 
@@ -1294,7 +1295,7 @@ res = model.chat(
 
  <br/>
 
- ##### Voice Cloning
+ ##### **Voice Cloning**
 
  MiniCPM-o-2.6 can also do zero-shot text-to-speech, aka **Voice Cloning**. In this mode, the model will act like a TTS model.
 
@@ -1323,7 +1324,7 @@ res = model.chat(
 
  <br/>
 
- ##### Addressing Various Audio Understanding Tasks
+ ##### **Addressing Various Audio Understanding Tasks**
 
  MiniCPM-o-2.6 can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.
 
@@ -1358,6 +1359,8 @@ print(res)
  ```
  </details>
 
+ <br/>
+ <br/>
 
 
  ### Vision-Only mode