bokesyo committed
Commit da91e91 · verified · Parent: 1f9e583

Update README.md

Files changed (1): README.md (+3, −13)
README.md CHANGED
@@ -1129,9 +1129,8 @@ else:
 ```
 
 
-
-#### Audio and Speech
-<details> <summary> Model initialization </summary>
+#### Speech and Audio Mode
+Model initialization
 
 ```python
 import torch
@@ -1147,8 +1146,6 @@ model.init_tts()
 model.tts.float()
 ```
 
-</details>
-
 <br/>
 
 ##### **Mimick**
@@ -1181,7 +1178,6 @@ res = model.chat(
 
 A general usage scenario of MiniCPM-o 2.6 is role-playing a specific character based on the audio prompt. It will mimic the voice of the character to some extent and act like the character in text, including language style. In this mode, MiniCPM-o-2.6 will sound **more natural and human-like**. Self-defined audio prompts can be used to customize the voice of the character in an end-to-end manner.
 
-<details> <summary>Click to view the Python code for enabling MiniCPM-o 2.6 to interact with you in a specified voice.</summary>
 
 ```python
 ref_audio, _ = librosa.load('./assets/voice_01.wav', sr=16000, mono=True) # load the reference audio
@@ -1218,7 +1214,6 @@ res = model.chat(
 print(res)
 ```
 
-</details>
 
 <br/>
 <br/>
@@ -1227,8 +1222,6 @@ print(res)
 
 An enhanced feature of MiniCPM-o-2.6 is to act as an AI assistant, but only with a limited choice of voices. In this mode, MiniCPM-o-2.6 is **less human-like and more like a voice assistant**, but it is more instruction-following.
 
-<details> <summary>Click to view the Python code for enabling MiniCPM-o 2.6 to act as an AI assistant.</summary>
-
 ```python
 sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
 user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}
@@ -1262,7 +1255,6 @@ res = model.chat(
 )
 print(res)
 ```
-</details>
 
 <br/>
 
@@ -1328,8 +1320,6 @@ res = model.chat(
 
 MiniCPM-o-2.6 can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.
 
-<details>
-<summary> Click to show Python code running MiniCPM-o 2.6 with specific audioQA task. </summary>
 
 For audio-to-text tasks, you can use the following prompts:
 
@@ -1357,7 +1347,7 @@ res = model.chat(
 )
 print(res)
 ```
-</details>
+
 
 <br/>
 <br/>
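For context on the renamed "Speech and Audio Mode" section: the first hunk truncates the initialization snippet right after `import torch`. Below is a minimal sketch of how that initialization plausibly continues, assuming the standard `transformers` loading pattern; only `import torch`, `model.init_tts()`, and `model.tts.float()` are visible in the diff, and the repository id and `from_pretrained` arguments are assumptions.

```python
# Minimal initialization sketch. Only `import torch`, `model.init_tts()`, and
# `model.tts.float()` appear in the hunk; the repository id and the
# `from_pretrained` arguments below are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = 'openbmb/MiniCPM-o-2_6'  # assumed Hub repository id

model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,      # the model relies on custom code from the Hub
    torch_dtype=torch.bfloat16,  # assumed weight dtype
)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

model.init_tts()   # shown in the diff: set up the TTS head
model.tts.float()  # shown in the diff: keep the TTS module in fp32
```

Running the TTS module in fp32 while the rest of the model stays in lower precision is presumably why `model.tts.float()` follows `init_tts()` in the README.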
 
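The **Mimick** hunk cuts off after loading the reference audio. The following is a hedged sketch of what a complete mimick-style call could look like, reusing the `model.chat(...)` / `print(res)` pattern visible in the surrounding hunks; the prompt text, message structure, and keyword arguments (`use_tts_template`, `generate_audio`, `output_audio_path`) are assumptions, not taken from the diff.

```python
import librosa

# Shown in the hunk: load the reference voice at 16 kHz mono.
ref_audio, _ = librosa.load('./assets/voice_01.wav', sr=16000, mono=True)

# Everything below is an assumed continuation of the truncated example:
# in mimick mode the model repeats the input speech in the same voice.
mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
msgs = [{'role': 'user', 'content': [mimick_prompt, ref_audio]}]

res = model.chat(                # `model.chat(` / `print(res)` appear in the hunk headers
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,               # assumed sampling settings
    max_new_tokens=128,
    use_tts_template=True,       # assumed flag for spoken output
    generate_audio=True,         # assumed flag to synthesize a waveform
    output_audio_path='output_mimick.wav',  # assumed output path
)
print(res)
```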
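The AI-assistant hunk shows only the `get_sys_prompt(...)` call and the wrapped user question before it is truncated. Here is a sketch of an assumed completion of that example; the keyword arguments to `model.chat` are the same assumptions as in the mimick sketch above.

```python
# The first two lines are taken from the hunk; the rest is an assumed completion.
sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]}

msgs = [sys_prompt, user_question]
res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,       # assumed, as in the mimick sketch
    generate_audio=True,         # assumed
    output_audio_path='output_assistant.wav',  # assumed output path
)
print(res)
```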
 
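The audio-understanding hunk ends before listing the audio-to-text prompts it refers to. Below is a hedged sketch of one such task, ASR-style transcription; the prompt wording, the audio path, and the chat arguments are assumptions.

```python
# Hedged sketch of an audio-to-text task (ASR-style transcription).
# The prompt wording, the audio path, and the chat arguments are assumptions;
# only the general `model.chat(...)` / `print(res)` pattern appears in the diff.
task_prompt = "Please transcribe the speech in the audio."                  # assumed prompt
audio, _ = librosa.load('./assets/audio_example.wav', sr=16000, mono=True)  # assumed path

msgs = [{'role': 'user', 'content': [task_prompt, audio]}]
res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    generate_audio=False,  # text-only output for audio-to-text tasks (assumed)
)
print(res)
```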