bokesyo committed
Commit da91e91 · verified · Parent: 1f9e583

Update README.md

Files changed (1): README.md (+3, −13)
README.md CHANGED
@@ -1129,9 +1129,8 @@ else:
 ```
 
 
-
-#### Audio and Speech
-<details> <summary> Model initialization </summary>
+#### Speech and Audio Mode
+Model initialization
 
 ```python
 import torch
@@ -1147,8 +1146,6 @@ model.init_tts()
 model.tts.float()
 ```
 
-</details>
-
 <br/>
 
 ##### **Mimick**
@@ -1181,7 +1178,6 @@ res = model.chat(
 
 A general usage scenario of MiniCPM-o 2.6 is role-playing a specific character based on the audio prompt. It will mimic the voice of the character to some extent and act like the character in text, including language style. In this mode, MiniCPM-o-2.6 will sound **more natural and human-like**. Self-defined audio prompts can be used to customize the voice of the character in an end-to-end manner.
 
-<details> <summary>Click to view the Python code for enabling MiniCPM-o 2.6 to interact with you in a specified voice.</summary>
 
 ```python
 ref_audio, _ = librosa.load('./assets/voice_01.wav', sr=16000, mono=True) # load the reference audio
@@ -1218,7 +1214,6 @@ res = model.chat(
 print(res)
 ```
 
-</details>
 
 <br/>
 <br/>
@@ -1227,8 +1222,6 @@ print(res)
 
 An enhanced feature of MiniCPM-o-2.6 is to act as an AI assistant, but only with a limited choice of voices. In this mode, MiniCPM-o-2.6 is **less human-like and more like a voice assistant**, but it is more instruction-following.
 
-<details> <summary>Click to view the Python code for enabling MiniCPM-o 2.6 to act as an AI assistant.</summary>
-
 ```python
 sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
 user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}
@@ -1262,7 +1255,6 @@ res = model.chat(
 )
 print(res)
 ```
-</details>
 
 <br/>
 
@@ -1328,8 +1320,6 @@ res = model.chat(
 
 MiniCPM-o-2.6 can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.
 
-<details>
-<summary> Click to show Python code running MiniCPM-o 2.6 with specific audioQA task. </summary>
 
 For audio-to-text tasks, you can use the following prompts:
 
@@ -1357,7 +1347,7 @@ res = model.chat(
 )
 print(res)
 ```
-</details>
+
 
 <br/>
 <br/>
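For context on the renamed "Speech and Audio Mode" section: the first hunk truncates the initialization snippet right after `import torch`. Below is a minimal sketch of how that initialization plausibly continues, assuming the standard `transformers` loading pattern; only `import torch`, `model.init_tts()`, and `model.tts.float()` are visible in the diff, and the repository id and `from_pretrained` arguments are assumptions.

```python
# Minimal initialization sketch. Only `import torch`, `model.init_tts()`, and
# `model.tts.float()` appear in the hunk; the repository id and the
# `from_pretrained` arguments below are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = 'openbmb/MiniCPM-o-2_6'  # assumed Hub repository id

model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,      # the model relies on custom code from the Hub
    torch_dtype=torch.bfloat16,  # assumed weight dtype
)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

model.init_tts()   # shown in the diff: set up the TTS head
model.tts.float()  # shown in the diff: keep the TTS module in fp32
```

Running the TTS module in fp32 while the rest of the model stays in lower precision is presumably why `model.tts.float()` follows `init_tts()` in the README.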
 
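The **Mimick** hunk cuts off after loading the reference audio. The following is a hedged sketch of what a complete mimick-style call could look like, reusing the `model.chat(...)` / `print(res)` pattern visible in the surrounding hunks; the prompt text, message structure, and keyword arguments (`use_tts_template`, `generate_audio`, `output_audio_path`) are assumptions, not taken from the diff.

```python
import librosa

# Shown in the hunk: load the reference voice at 16 kHz mono.
ref_audio, _ = librosa.load('./assets/voice_01.wav', sr=16000, mono=True)

# Everything below is an assumed continuation of the truncated example:
# in mimick mode the model repeats the input speech in the same voice.
mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
msgs = [{'role': 'user', 'content': [mimick_prompt, ref_audio]}]

res = model.chat(                # `model.chat(` / `print(res)` appear in the hunk headers
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,               # assumed sampling settings
    max_new_tokens=128,
    use_tts_template=True,       # assumed flag for spoken output
    generate_audio=True,         # assumed flag to synthesize a waveform
    output_audio_path='output_mimick.wav',  # assumed output path
)
print(res)
```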
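The AI-assistant hunk shows only the `get_sys_prompt(...)` call and the wrapped user question before it is truncated. Here is a sketch of an assumed completion of that example; the keyword arguments to `model.chat` are the same assumptions as in the mimick sketch above.

```python
# The first two lines are taken from the hunk; the rest is an assumed completion.
sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}

msgs = [sys_prompt, user_question]
res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,       # assumed, as in the mimick sketch
    generate_audio=True,         # assumed
    output_audio_path='output_assistant.wav',  # assumed output path
)
print(res)
```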
 
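The audio-understanding hunk ends before listing the audio-to-text prompts it refers to. Below is a hedged sketch of one such task, ASR-style transcription; the prompt wording, the audio path, and the chat arguments are assumptions.

```python
# Hedged sketch of an audio-to-text task (ASR-style transcription).
# The prompt wording, the audio path, and the chat arguments are assumptions;
# only the general `model.chat(...)` / `print(res)` pattern appears in the diff.
task_prompt = "Please transcribe the speech in the audio."                  # assumed prompt
audio, _ = librosa.load('./assets/audio_example.wav', sr=16000, mono=True)  # assumed path

msgs = [{'role': 'user', 'content': [task_prompt, audio]}]
res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    generate_audio=False,  # text-only output for audio-to-text tasks (assumed)
)
print(res)
```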