Commit 222fd51 by jhauret (verified) · 1 parent: 4690398

Upload EBENGenerator after 150 epochs

Files changed (3)
  1. README.md +73 -0
  2. config.json +5 -0
  3. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,73 @@
+ ---
+ datasets:
+ - Cnam-LMSSC/vibravox
+ language: fr
+ library_name: transformers
+ license: mit
+ tags:
+ - audio
+ - audio-to-audio
+ - speech
+ model-index:
+ - name: EBEN(M=?,P=?,Q=?)
+   results:
+   - task:
+       type: speech-enhancement
+       name: Bandwidth Extension
+     dataset:
+       name: Vibravox["YOUR_MIC"]
+       type: Cnam-LMSSC/vibravox
+       args: fr
+     metrics:
+     - type: stoi
+       value: ???
+       name: Test STOI, in-domain training
+     - type: n-mos
+       value: ???
+       name: Test Noresqa-MOS, in-domain training
+ ---
+
+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/65302a613ecbe51d6a6ddcec/zhB1fh-c0pjlj-Tr4Vpmr.png" style="object-fit:contain; width:280px; height:280px;" >
+ </p>
+
+ # Model Card
+
+ - **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
+ - **Model:** [EBEN(M=?,P=?,Q=?)](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008))
+ - **Language:** French
+ - **License:** MIT
+ - **Training dataset:** `speech_clean` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox)
+ - **Sample rate for usage:** 16 kHz
+
+ ## Overview
+
+ This bandwidth extension model, trained on [Vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) body-conduction sensor data, enhances body-conducted speech by denoising it and regenerating mid and high frequencies from the low-frequency content.
+
+ ## Disclaimer
+ This model, trained for **a specific non-conventional speech sensor**, is intended to be used with **in-domain data**. Using it with data from other sensors may degrade performance.
+
+ ## Link to BWE models trained on other body-conducted sensors
+
+ The entry point to all EBEN models for Bandwidth Extension (BWE) is available at [https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models](https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models).
+
+ ## Training procedure
+
+ Detailed instructions for reproducing the experiments are available in the [jhauret/vibravox](https://github.com/jhauret/vibravox) GitHub repository.
+
+ ## Inference script
+
+ ```python
+ import torch
+ import torchaudio
+ from datasets import load_dataset
+ from vibravox.torch_modules.dnn.eben_generator import EBENGenerator
+
+ # Replace YOUR_MIC with the sensor name used by the checkpoint and the dataset column.
+ model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_YOUR_MIC")
+ test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)
+
+ # The dataset audio is sampled at 48 kHz; the model is meant to be used at 16 kHz.
+ audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.YOUR_MIC"]["array"])
+ audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)
+
+ # Add batch and channel dimensions, then cut the signal to a length the model accepts.
+ cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz[None, None, :])
+ enhanced_audio_16kHz = model(cut_audio_16kHz)
+ ```
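Note on the inference script above: it stops once the enhanced signal is computed. As a possible follow-up, the sketch below writes mono samples to a 16-bit PCM WAV file at 16 kHz using only the Python standard library. The helper name `write_wav_16k` is hypothetical and not part of vibravox, and the sketch assumes the model output has already been flattened to a sequence of floats in [-1, 1], which the commit does not specify.

```python
import struct
import wave


def write_wav_16k(path, samples, sample_rate=16_000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of the vibravox package.
    Returns the number of samples written.
    """
    frames = bytearray()
    for sample in samples:
        # Clip so that out-of-range values do not overflow 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)  # 16 kHz by default
        wav_file.writeframes(bytes(frames))
    return len(samples)
```

If `torchaudio` is already installed, as it is for the script above, its own saving utilities are likely the more direct choice; the standard-library version is shown only because it has no extra dependencies.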
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "m": 4,
+   "n": 32,
+   "p": 4
+ }
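Note on `config.json`: `from_pretrained`-style loaders commonly read this file and pass its keys to the model constructor as keyword arguments. The sketch below illustrates that pattern under the assumption that it applies here; the commit does not confirm it, nor does it say what `m`, `n` and `p` mean or how they relate to the `M`, `P`, `Q` in the model name. `StubGenerator` and `build_from_config` are hypothetical names used only for illustration.

```python
import json


class StubGenerator:
    """Hypothetical stand-in for the real model class; it only stores its arguments."""

    def __init__(self, m, n, p):
        self.m = m
        self.n = n
        self.p = p


def build_from_config(config_text, model_cls=StubGenerator):
    """Parse a JSON config and forward its keys as constructor keyword arguments."""
    hyperparams = json.loads(config_text)
    return model_cls(**hyperparams)


# The values committed in config.json.
generator = build_from_config('{"m": 4, "n": 32, "p": 4}')
```

With this pattern, a key in the file that the constructor does not accept raises a `TypeError` at load time, so the file and the class signature have to stay in sync.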
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e73c0d486b2dbc0f5e953f706e209e9e4cdd773ab71ff862ebeb82dc8444b0be
+ size 7798600
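Note on `model.safetensors`: the three lines above are a Git LFS pointer, not the weights themselves. They record the SHA-256 digest and the byte size of the real file. A minimal sketch of how a downloaded copy could be checked against the pointer follows; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and any local file path passed in is the caller's assumption.

```python
import hashlib

# Pointer contents as committed for model.safetensors.
POINTER_TEXT = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e73c0d486b2dbc0f5e953f706e209e9e4cdd773ab71ff862ebeb82dc8444b0be\n"
    "size 7798600\n"
)


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    if pointer["algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 object ids")
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("model.safetensors", parse_lfs_pointer(POINTER_TEXT))` would check a local download, assuming the file was saved under that name.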