Update README.md
README.md
CHANGED
@@ -1129,9 +1129,8 @@ else:
|
|
1129 |
```
|
1130 |
|
1131 |
|
1132 |
-
|
1133 |
-
|
1134 |
-
<details> <summary> Model initialization </summary>
|
1135 |
|
1136 |
```python
|
1137 |
import torch
|
@@ -1147,8 +1146,6 @@ model.init_tts()
|
|
1147 |
model.tts.float()
|
1148 |
```
|
1149 |
|
1150 |
-
</details>
|
1151 |
-
|
1152 |
<br/>
|
1153 |
|
1154 |
##### **Mimic**
|
@@ -1181,7 +1178,6 @@ res = model.chat(
|
|
1181 |
|
1182 |
A common usage scenario for MiniCPM-o 2.6 is role-playing a specific character based on an audio prompt. The model mimics the character's voice to some extent and acts like the character in text, including language style. In this mode, MiniCPM-o-2.6 sounds **more natural and human-like**. Custom audio prompts can be used to customize the character's voice in an end-to-end manner.
|
1183 |
|
1184 |
-
<details> <summary>Click to view the Python code for enabling MiniCPM-o 2.6 to interact with you in a specified voice.</summary>
|
1185 |
|
1186 |
```python
|
1187 |
ref_audio, _ = librosa.load('./assets/voice_01.wav', sr=16000, mono=True) # load the reference audio
|
@@ -1218,7 +1214,6 @@ res = model.chat(
|
|
1218 |
print(res)
|
1219 |
```
|
1220 |
|
1221 |
-
</details>
|
1222 |
|
1223 |
<br/>
|
1224 |
<br/>
|
@@ -1227,8 +1222,6 @@ print(res)
|
|
1227 |
|
1228 |
An enhanced feature of MiniCPM-o-2.6 is acting as an AI assistant, though with only a limited choice of voices. In this mode, MiniCPM-o-2.6 is **less human-like and more like a voice assistant**, but it follows instructions more closely.
|
1229 |
|
1230 |
-
<details> <summary>Click to view the Python code for enabling MiniCPM-o 2.6 to act as an AI assistant.</summary>
|
1231 |
-
|
1232 |
```python
|
1233 |
sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
|
1234 |
user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}
|
@@ -1262,7 +1255,6 @@ res = model.chat(
|
|
1262 |
)
|
1263 |
print(res)
|
1264 |
```
|
1265 |
-
</details>
|
1266 |
|
1267 |
<br/>
|
1268 |
|
@@ -1328,8 +1320,6 @@ res = model.chat(
|
|
1328 |
|
1329 |
MiniCPM-o-2.6 can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.
|
1330 |
|
1331 |
-
<details>
|
1332 |
-
<summary> Click to show Python code running MiniCPM-o 2.6 on a specific audio QA task. </summary>
|
1333 |
|
1334 |
For audio-to-text tasks, you can use the following prompts:
|
1335 |
|
@@ -1357,7 +1347,7 @@ res = model.chat(
|
|
1357 |
)
|
1358 |
print(res)
|
1359 |
```
|
1360 |
-
|
1361 |
|
1362 |
<br/>
|
1363 |
<br/>
|
|
|
1129 |
```
|
1130 |
|
1131 |
|
1132 |
+
#### Speech and Audio Mode
|
1133 |
+
Model initialization
|
|
|
1134 |
|
1135 |
```python
|
1136 |
import torch
|
|
|
1146 |
model.tts.float()
|
1147 |
```
|
1148 |
|
|
|
|
|
1149 |
<br/>
|
1150 |
|
1151 |
##### **Mimic**
|
|
|
1178 |
|
1179 |
A common usage scenario for MiniCPM-o 2.6 is role-playing a specific character based on an audio prompt. The model mimics the character's voice to some extent and acts like the character in text, including language style. In this mode, MiniCPM-o-2.6 sounds **more natural and human-like**. Custom audio prompts can be used to customize the character's voice in an end-to-end manner.
|
1180 |
|
|
|
1181 |
|
1182 |
```python
|
1183 |
ref_audio, _ = librosa.load('./assets/voice_01.wav', sr=16000, mono=True) # load the reference audio
|
|
|
1214 |
print(res)
|
1215 |
```
|
1216 |
|
|
|
1217 |
|
1218 |
<br/>
|
1219 |
<br/>
|
|
|
1222 |
|
1223 |
An enhanced feature of MiniCPM-o-2.6 is acting as an AI assistant, though with only a limited choice of voices. In this mode, MiniCPM-o-2.6 is **less human-like and more like a voice assistant**, but it follows instructions more closely.
|
1224 |
|
|
|
|
|
1225 |
```python
|
1226 |
sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
|
1227 |
user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}
|
|
|
1255 |
)
|
1256 |
print(res)
|
1257 |
```
|
|
|
1258 |
|
1259 |
<br/>
|
1260 |
|
|
|
1320 |
|
1321 |
MiniCPM-o-2.6 can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.
|
1322 |
|
|
|
|
|
1323 |
|
1324 |
For audio-to-text tasks, you can use the following prompts:
|
1325 |
|
|
|
1347 |
)
|
1348 |
print(res)
|
1349 |
```
|
1350 |
+
|
1351 |
|
1352 |
<br/>
|
1353 |
<br/>
|
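
The snippets in this diff load every clip with `librosa.load(..., sr=16000, mono=True)`, which resamples and downmixes the audio as needed. As a minimal sketch that is not part of the README itself, the hypothetical helper `describe_wav` below uses only Python's standard `wave` module to report whether a custom audio prompt is already 16 kHz mono. It assumes the file is an uncompressed PCM WAV; other formats would need a different reader.

```python
import wave

# Sampling rate used by the librosa.load calls in the README snippets.
TARGET_SR = 16000


def describe_wav(path):
    """Read a PCM WAV header and report how it compares to 16 kHz mono.

    Hypothetical helper for illustration only; the MiniCPM-o examples
    do not require it, since librosa.load already converts the audio.
    """
    # wave.open raises wave.Error for non-PCM or malformed files.
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()

    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": frames / sample_rate,
        # True when librosa.load(sr=16000) would have to resample.
        "needs_resample": sample_rate != TARGET_SR,
        # True when librosa.load(mono=True) would have to downmix.
        "needs_downmix": channels != 1,
    }
```

Calling `describe_wav('./assets/voice_01.wav')` on the reference clip, for instance, would show whether it will be resampled or downmixed on load, which can help when debugging a voice prompt that sounds different from what you expect.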