jhauret committed on
Commit
3ac7602
·
verified ·
1 Parent(s): d174d13

Update README.md

Files changed (1)
  1. README.md +24 -39
README.md CHANGED
@@ -1,32 +1,28 @@
 ---
-datasets:
-- Cnam-LMSSC/vibravox
 language: fr
-library_name: transformers
 license: mit
+library_name: transformers
 tags:
-- audio
-- audio-to-audio
-- speech
+- audio
+- audio-to-audio
+- speech
+datasets:
+- Cnam-LMSSC/vibravox
 model-index:
-- name: EBEN(M=?,P=?,Q=?)
-  results:
-  - task:
-      type: speech-enhancement
-      name: Bandwidth Extension
-    dataset:
-      name: Vibravox["YOUR_MIC"]
-      type: Cnam-LMSSC/vibravox
-      args: fr
-    metrics:
-    - type: stoi
-      value: ???
-      name: Test STOI, in-domain training
-    - type: n-mos
-      value: ???
-      name: Test Noresqa-MOS, in-domain training
+- name: EBEN(M=4,P=4,Q=4)
+  results:
+  - task:
+      name: Bandwidth Extension
+      type: speech-enhancement
+    dataset:
+      name: Vibravox["headset_microphone"] to Vibravox["soft_in_ear_microphone"]
+      type: Cnam-LMSSC/vibravox
+      args: fr
+    metrics:
+    - name: Test STOI, in-domain training
+      type: stoi
+      value: 0.7865
 ---
-
 <p align="center">
 <img src="https://cdn-uploads.huggingface.co/production/uploads/65302a613ecbe51d6a6ddcec/zhB1fh-c0pjlj-Tr4Vpmr.png" style="object-fit:contain; width:280px; height:280px;" >
 </p>
@@ -34,7 +30,7 @@ model-index:
 # Model Card
 
 - **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
-- **Model:** [EBEN(M=?,P=?,Q=?)](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008))
+- **Model:** [EBEN(M=4,P=4,Q=4)](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008))
 - **Language:** French
 - **License:** MIT
 - **Training dataset:** `speech_clean` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox)
@@ -42,18 +38,7 @@ model-index:
 
 ## Overview
 
-This bandwidth extension model, trained on [Vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) body conduction sensor data, enhances body-conducted speech audio by denoising and regenerating mid and high frequencies from low-frequency content.
-
-## Disclaimer
-This model, trained for **a specific non-conventional speech sensor**, is intended to be used with **in-domain data**. Using it with other sensor data may lead to suboptimal performance.
-
-## Link to BWE models trained on other body conducted sensors :
-
-The entry point to all EBEN models for Bandwidth Extension (BWE) is available at [https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models](https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models).
-
-## Training procedure
-
-Detailed instructions for reproducing the experiments are available on the [jhauret/vibravox](https://github.com/jhauret/vibravox) Github repository.
+This model, trained on [Vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) body conduction sensor data, maps clean speech to body-conducted speech.
 
 ## Inference script :
 
@@ -62,12 +47,12 @@ import torch, torchaudio
 from vibravox.torch_modules.dnn.eben_generator import EBENGenerator
 from datasets import load_dataset
 
-model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_YOUR_MIC")
+model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_reverse_soft_in_ear_microphone")
 test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)
 
-audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.YOUR_MIC"]["array"])
+audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.headset_microphone"]["array"])
 audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)
 
 cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz[None, None, :])
-enhanced_audio_16kHz = model(cut_audio_16kHz)
+degraded_audio_16kHz, _ = model(cut_audio_16kHz)
 ```
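
The last changed line of the inference script unpacks two values from the model call, whereas the previous README assigned the result to a single name. The diff does not say whether the earlier form was valid for this checkpoint, so a caller that wants to tolerate either return shape could normalise the result first. The sketch below makes that assumption only: `first_output` is a hypothetical helper, not part of the `vibravox` package, and it assumes the audio is the first element when a tuple is returned.

```python
from typing import Any


def first_output(result: Any) -> Any:
    """Return the first element when `result` is a tuple, else `result` itself.

    Hypothetical helper: it only mirrors the two call forms seen in the diff
    (`x = model(...)` and `x, _ = model(...)`) and assumes the audio comes first.
    """
    if isinstance(result, tuple):
        return result[0]
    return result


# Stand-ins for the two call forms; real code would pass the model's return value.
signal = [0.0, 0.1, -0.1]
from_pair = first_output((signal, None))  # tuple form: the first element is kept
from_single = first_output(signal)        # single-value form: passed through unchanged
```

With the script above, this would read `degraded_audio_16kHz = first_output(model(cut_audio_16kHz))`.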