Commit 81a8086 by jhauret (verified) · 1 Parent(s): 3c8c80f

Update README.md

  1. README.md +24 -34
README.md CHANGED
@@ -1,30 +1,27 @@
  ---
- datasets:
- - Cnam-LMSSC/vibravox
  language: fr
- library_name: transformers
  license: mit
  tags:
- - audio
- - audio-to-audio
- - speech
  model-index:
- - name: EBEN(M=?,P=?,Q=?)
-   results:
-   - task:
-       type: speech-enhancement
-       name: Bandwidth Extension
-     dataset:
-       name: Vibravox["YOUR_MIC"]
-       type: Cnam-LMSSC/vibravox
-       args: fr
-     metrics:
-     - type: stoi
-       value: ???
-       name: Test STOI, in-domain training
-     - type: n-mos
-       value: ???
-       name: Test Noresqa-MOS, in-domain training
  ---

  <p align="center">
@@ -34,7 +31,7 @@ model-index:
  # Model Card

  - **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
- - **Model:** [EBEN(M=?,P=?,Q=?)](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008))
  - **Language:** French
  - **License:** MIT
  - **Training dataset:** `speech_clean` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox)
@@ -42,18 +39,11 @@ model-index:
  ## Overview

- This bandwidth extension model, trained on [Vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) body conduction sensor data, enhances body-conducted speech audio by denoising and regenerating mid and high frequencies from low-frequency content.

  ## Disclaimer
  This model, trained for **a specific non-conventional speech sensor**, is intended to be used with **in-domain data**. Using it with other sensor data may lead to suboptimal performance.

- ## Link to BWE models trained on other body conducted sensors :
-
- The entry point to all EBEN models for Bandwidth Extension (BWE) is available at [https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models](https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models).
-
- ## Training procedure
-
- Detailed instructions for reproducing the experiments are available on the [jhauret/vibravox](https://github.com/jhauret/vibravox) Github repository.

  ## Inference script :
@@ -62,12 +52,12 @@ import torch, torchaudio
  from vibravox.torch_modules.dnn.eben_generator import EBENGenerator
  from datasets import load_dataset

- model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_YOUR_MIC")
  test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)

- audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.YOUR_MIC"]["array"])
  audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)

  cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz[None, None, :])
- enhanced_audio_16kHz = model(cut_audio_16kHz)
  ```
 
  ---
  language: fr
  license: mit
+ library_name: transformers
  tags:
+ - audio
+ - audio-to-audio
+ - speech
+ datasets:
+ - Cnam-LMSSC/vibravox
  model-index:
+ - name: EBEN(M=4,P=4,Q=4)
+   results:
+   - task:
+       name: Bandwidth Extension
+       type: speech-enhancement
+     dataset:
+       name: Vibravox["headset_microphone"] to Vibravox["forehead_accelerometer"]
+       type: Cnam-LMSSC/vibravox
+       args: fr
+     metrics:
+     - name: Test STOI, in-domain training
+       type: stoi
+       value: 0.7477
  ---

  <p align="center">
 
  # Model Card

  - **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
+ - **Model:** [EBEN(M=4,P=4,Q=4)](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008))
  - **Language:** French
  - **License:** MIT
  - **Training dataset:** `speech_clean` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox)
 
  ## Overview

+ This model, trained on [Vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) body conduction sensor data, maps clean speech to body-conducted speech.

  ## Disclaimer
  This model, trained for **a specific non-conventional speech sensor**, is intended to be used with **in-domain data**. Using it with other sensor data may lead to suboptimal performance.

  ## Inference script :
 
  from vibravox.torch_modules.dnn.eben_generator import EBENGenerator
  from datasets import load_dataset

+ model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_reverse_throat_microphone")
  test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)

+ audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.headset_microphone"]["array"])
  audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)

  cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz[None, None, :])
+ degraded_audio_16kHz, _ = model(cut_audio_16kHz)
  ```
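
The inference script resamples the 48 kHz recording to 16 kHz and then calls `model.cut_to_valid_length` before running the model. The sketch below only illustrates the sample-count bookkeeping behind those two steps. It rests on assumptions that are not taken from the vibravox code: that the resampler rounds the output length up, and that a "valid" length means a multiple of some fixed frame size. The helper names `resampled_length` and `trim_to_multiple` and the value `frame_multiple=32` are hypothetical; the real constraint is defined inside `EBENGenerator` and may differ.

```python
def resampled_length(num_samples: int, orig_freq: int, new_freq: int) -> int:
    """Sample count after resampling, assuming the length is rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (num_samples * new_freq + orig_freq - 1) // orig_freq


def trim_to_multiple(num_samples: int, frame_multiple: int) -> int:
    """Largest length <= num_samples that is a multiple of frame_multiple."""
    return num_samples - (num_samples % frame_multiple)


# One second of 48 kHz audio plus ten extra samples.
n_48k = 48_000 + 10
n_16k = resampled_length(n_48k, orig_freq=48_000, new_freq=16_000)

# HYPOTHETICAL multiple: the actual value depends on the model architecture.
n_valid = trim_to_multiple(n_16k, frame_multiple=32)

print(n_48k, n_16k, n_valid)  # prints: 48010 16004 16000
```

Under these assumptions, a few trailing samples are dropped before inference, so the model output can be slightly shorter than the resampled input.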