|
--- |
|
license: apache-2.0 |
|
language: |
|
- es |
|
base_model: |
|
- pyannote/segmentation-3.0 |
|
library_name: pyannote-audio |
|
tags: |
|
- pyannote |
|
- pyannote-audio |
|
- audio |
|
- voice |
|
- speech |
|
- speaker |
|
- speaker-diarization |
|
- segmentation |
|
pipeline_tag: voice-activity-detection
|
--- |
|
# pyannote-segmentation-3.0-RTVE-primary |
|
|
|
## Model Details |
|
|
|
This system is a collection of three fine-tuned models, to be fused with [DOVER-Lap](https://github.com/desh2608/dover-lap). |
|
Each model is fine-tuned monitoring a different component of the Diarization Error Rate (i.e., False Alarm, Missed Detection, and Speaker Confusion).
|
More information about the fusion of these models can be found in this [paper](https://www.isca-archive.org/iberspeech_2024/souganidis24_iberspeech.html). |
|
|
|
Each model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) on [the RTVE database](https://catedrartve.unizar.es/rtvedatabase.html) used for Albayzin Evaluations of IberSPEECH 2024. |
|
|
|
On the RTVE2024 test set it achieves the following results (rounded to two decimals), making it the best-performing system of the Albayzin Evaluations 2024:
|
|
|
- Diarization Error Rate (DER): 14.98% |
|
- False Alarm: 2.64% |
|
- Missed Detection: 4.54% |
|
- Speaker Confusion: 7.80% |
|
|
|
|
|
## Uses |
|
|
|
|
This system is intended to be used for speaker diarization of TV shows.
|
|
|
## Usage |
|
|
|
The instructions to obtain the RTTM output of each model can be found [here](https://huggingface.co/pyannote/speaker-diarization-3.1); the pipeline should be instantiated with this [configuration file](config.yaml).
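
As a quick illustration (not the authoritative recipe), the pipeline can be loaded from the local configuration file and its output written in RTTM format; the audio and output file names below are placeholders:

```python
from pyannote.audio import Pipeline

# Load the diarization pipeline from the configuration file of this repository.
# Depending on the checkpoints referenced in config.yaml, a Hugging Face access
# token may be required.
pipeline = Pipeline.from_pretrained("config.yaml")

# "audio.wav" is a placeholder for one of the RTVE recordings.
diarization = pipeline("audio.wav")

# Write the result as RTTM, which is the format expected by the fusion step below.
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```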
|
|
|
Once the RTTM files are obtained, [this script](primary_fusion.py) can be modified to fuse the outputs of the three models, as sketched below.
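
For reference, a minimal sketch of fusing the three RTTM outputs with the DOVER-Lap command-line tool, assuming the `dover-lap` package is installed; the file names are placeholders, and the exact fusion used for this system is the one implemented in [primary_fusion.py](primary_fusion.py):

```python
import subprocess

# Placeholder names for the RTTM outputs of the three fine-tuned models.
hypotheses = ["model_fa.rttm", "model_miss.rttm", "model_conf.rttm"]

# DOVER-Lap CLI: the first positional argument is the fused output RTTM,
# followed by the input RTTMs to combine.
subprocess.run(["dover-lap", "fused.rttm", *hypotheses], check=True)
```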
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
|
|
|
The [train.lst](train.lst) file includes the URIs of the training data. |
|
|
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
**Model:**
|
|
|
- duration: 10.0 |
|
- max_speakers_per_chunk: 3 |
|
- max_speakers_per_frame: 2 |
|
- train_batch_size: 32 |
|
- powerset_max_classes: 2 |
|
|
|
**Adam Optimizer:** |
|
- lr: 0.0001 |
|
|
|
**Early Stopping:** |
|
|
|
- direction: 'min' |
|
- max_epochs: 20 |
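
For illustration only, a minimal fine-tuning sketch using the hyperparameters listed above, following the standard pyannote.audio fine-tuning recipe; the RTVE protocol name and the monitored validation metric are placeholders, not the exact setup used here:

```python
from types import MethodType

import pytorch_lightning as pl
from torch.optim import Adam
from pyannote.audio import Model
from pyannote.audio.tasks import SpeakerDiarization
from pyannote.database import registry

# Placeholder protocol name; the RTVE protocol definition is not part of this repository.
protocol = registry.get_protocol("RTVE.SpeakerDiarization.Albayzin2024")

# Start from the pretrained segmentation model and attach the diarization task.
model = Model.from_pretrained("pyannote/segmentation-3.0")
model.task = SpeakerDiarization(
    protocol,
    duration=10.0,
    max_speakers_per_chunk=3,
    max_speakers_per_frame=2,
    batch_size=32,
)

# Adam optimizer with lr = 0.0001.
def configure_optimizers(self):
    return Adam(self.parameters(), lr=1e-4)

model.configure_optimizers = MethodType(configure_optimizers, model)

# Early stopping in 'min' direction; the monitored metric name is a placeholder
# (each of the three models monitors a different DER component).
early_stopping = pl.callbacks.EarlyStopping(monitor="ValMetric", mode="min")
trainer = pl.Trainer(max_epochs=20, callbacks=[early_stopping])
trainer.fit(model)
```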
|
|
|
### Development Data |
|
|
|
The [development.lst](development.lst) file includes the URIs of the development data. |
|
|
|
## Evaluation |
|
|
|
|
|
|
- Forgiveness collar: 250ms |
|
- Skip overlap: False |
|
|
|
### Testing Data & Metrics |
|
|
|
#### Testing Data |
|
|
|
|
|
|
The [test.lst](test.lst) file includes the URIs of the testing data. |
|
|
|
|
|
#### Metrics |
|
|
|
The reported metrics are the Diarization Error Rate (DER) and its components: False Alarm, Missed Detection, and Speaker Confusion.
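
As an illustration, these metrics can be computed with pyannote.metrics under the evaluation settings above; the file names and URI below are placeholders:

```python
from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# In pyannote.metrics, `collar` is the total duration removed around each
# reference boundary, so collar=0.5 corresponds to the 250 ms forgiveness
# collar on each side; overlapping speech is kept (skip_overlap=False).
metric = DiarizationErrorRate(collar=0.5, skip_overlap=False)

# Placeholder file names and URI.
reference = load_rttm("reference.rttm")["FILE-URI"]
hypothesis = load_rttm("hypothesis.rttm")["FILE-URI"]

components = metric(reference, hypothesis, detailed=True)
# Reports 'diarization error rate' together with 'false alarm',
# 'missed detection' and 'confusion' (durations) among other components.
print(components)
```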
|
|
|
|
|
## Citation |
|
|
|
|
If you use these models, please cite: |
|
|
|
|
|
**BibTeX:** |
|
```bibtex |
|
@inproceedings{souganidis24_iberspeech, |
|
title = {HiTZ-Aholab Speaker Diarization System for Albayzin Evaluations of IberSPEECH 2024}, |
|
author = {Christoforos Souganidis and Gemma Meseguer and Asier Herranz and Inma {Hernáez Rioja} and Eva Navas and Ibon Saratxaga}, |
|
year = {2024}, |
|
booktitle = {IberSPEECH 2024}, |
|
pages = {327--330}, |
|
doi = {10.21437/IberSPEECH.2024-68}, |
|
} |
|
```
|
|
|
## Acknowledgments |
|
|
|
This project, with reference 2022/TL22/00215335, has been partially funded by the Ministerio de Transformación Digital and by the Plan de Recuperación, Transformación y Resiliencia – Funded by the European Union – NextGenerationEU ([ILENIA](https://proyectoilenia.es/)), and by the [IkerGaitu](https://www.hitz.eus/iker-gaitu/) project funded by the Basque Government.