---
license: apache-2.0
language:
- es
base_model:
- pyannote/segmentation-3.0
library_name: pyannote-audio
tags:
- pyannote
- pyannote-audio
- audio
- voice
- speech
- speaker
- speaker-diarization
- segmentation
pipeline_tag: voice-activity-detection
---
# pyannote-segmentation-3.0-RTVE-primary
## Model Details
This system is a collection of three fine-tuned models whose outputs are fused with [DOVER-Lap](https://github.com/desh2608/dover-lap).
Each model is fine-tuned while monitoring a different component of the Diarization Error Rate (i.e., False Alarm, Missed Detection, or Speaker Confusion).
More information about the fusion of these models can be found in this [paper](https://www.isca-archive.org/iberspeech_2024/souganidis24_iberspeech.html).
Each model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) on [the RTVE database](https://catedrartve.unizar.es/rtvedatabase.html) used for the Albayzin Evaluations of IberSPEECH 2024.
On the RTVE2024 test set, it achieves the following results (rounded to two decimals), making it the best-performing system of the Albayzin Evaluations 2024:
- Diarization Error Rate (DER): 14.98%
- False Alarm: 2.64%
- Missed Detection: 4.54%
- Speaker Confusion: 7.80%
## Uses
This system is intended to be used for speaker diarization of TV shows.
## Usage
The instructions to obtain the RTTM output of each model can be found [here](https://huggingface.co/pyannote/speaker-diarization-3.1), using this [configuration file](config.yaml); a minimal sketch is shown below.
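A minimal inference sketch, assuming `pyannote.audio` is installed; the audio and RTTM paths are placeholders:

```python
# Minimal sketch: obtain one model's RTTM output with pyannote.audio.
# "audio.wav" and "audio.rttm" are placeholder paths.
from pyannote.audio import Pipeline

# Load the pipeline from the local config.yaml, which points at one of
# the three fine-tuned segmentation checkpoints.
pipeline = Pipeline.from_pretrained("config.yaml")

diarization = pipeline("audio.wav")

# Save the result in RTTM format for the fusion step below.
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```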
Once the RTTM files are obtained, [this script](primary_fusion.py) can be modified to fuse the models' outputs.
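For illustration, the fusion can also be run through the DOVER-Lap command-line tool; the file names below are placeholders, and [primary_fusion.py](primary_fusion.py) remains the reference:

```python
# Sketch: fuse the three models' RTTM outputs with the DOVER-Lap CLI.
# sys1/sys2/sys3 stand for the three fine-tuned systems' outputs.
import subprocess

subprocess.run(
    ["dover-lap", "fused.rttm", "sys1.rttm", "sys2.rttm", "sys3.rttm"],
    check=True,
)
```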
## Training Details
### Training Data
The [train.lst](train.lst) file includes the URIs of the training data.
#### Training Hyperparameters
The following values were used for fine-tuning; a minimal training sketch using them appears after the list.
**Model:**
- duration: 10.0
- max_speakers_per_chunk: 3
- max_speakers_per_frame: 2
- train_batch_size: 32
- powerset_max_classes: 2
**Adam Optimizer:**
- lr: 0.0001
**Early Stopping:**
- direction: 'min'
- max_epochs: 20
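A minimal fine-tuning sketch with these values, assuming a `pyannote.database` protocol wrapping the RTVE data (the protocol name below is hypothetical, and the optimizer and early-stopping wiring follow the pyannote training tutorials):

```python
# Sketch: fine-tune pyannote/segmentation-3.0 with the hyperparameters above.
# "RTVE.SpeakerDiarization.Albayzin" is a hypothetical protocol name; a real
# pyannote.database protocol covering train.lst / development.lst is required.
from types import MethodType

from pyannote.audio import Model
from pyannote.audio.tasks import SpeakerDiarization
from pyannote.database import get_protocol
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping
from torch.optim import Adam

protocol = get_protocol("RTVE.SpeakerDiarization.Albayzin")

model = Model.from_pretrained("pyannote/segmentation-3.0")
task = SpeakerDiarization(
    protocol,
    duration=10.0,
    max_speakers_per_chunk=3,
    max_speakers_per_frame=2,
    batch_size=32,
)
model.task = task

# Adam with lr=1e-4, replacing the model's default optimizer.
def configure_optimizers(self):
    return Adam(self.parameters(), lr=1e-4)

model.configure_optimizers = MethodType(configure_optimizers, model)

# Early stopping in the 'min' direction on the task's validation monitor.
monitor, direction = task.val_monitor
trainer = Trainer(
    max_epochs=20,
    callbacks=[EarlyStopping(monitor=monitor, mode=direction)],
)
trainer.fit(model)
```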
### Development Data
The [development.lst](development.lst) file includes the URIs of the development data.
## Evaluation
Evaluation uses the following settings; a scoring sketch follows the list.
- Forgiveness collar: 250 ms
- Skip overlap: False
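A scoring sketch using `pyannote.metrics` under these settings (file names are placeholders; note that pyannote's `collar` parameter is the total width removed around reference boundaries, so a 250 ms collar on each side corresponds to `collar=0.5`):

```python
# Sketch: compute DER with pyannote.metrics under the settings above.
from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# load_rttm returns a dict mapping recording URI -> pyannote.core.Annotation.
reference = load_rttm("reference.rttm")
hypothesis = load_rttm("fused.rttm")

# collar=0.5 s total width matches a 250 ms forgiveness collar per side.
metric = DiarizationErrorRate(collar=0.5, skip_overlap=False)
for uri, ref in reference.items():
    metric(ref, hypothesis[uri])

# abs() of an accumulated pyannote metric returns the aggregate value.
print(f"DER = {abs(metric):.2%}")
```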
### Testing Data & Metrics
#### Testing Data
The [test.lst](test.lst) file includes the URIs of the testing data.
#### Metrics
Diarization Error Rate, False Alarm, Missed Detection, Speaker Confusion.
## Citation
If you use these models, please cite:
**BibTeX:**
```bibtex
@inproceedings{souganidis24_iberspeech,
title = {HiTZ-Aholab Speaker Diarization System for Albayzin Evaluations of IberSPEECH 2024},
author = {Christoforos Souganidis and Gemma Meseguer and Asier Herranz and Inma {Hernáez Rioja} and Eva Navas and Ibon Saratxaga},
year = {2024},
booktitle = {IberSPEECH 2024},
pages = {327--330},
doi = {10.21437/IberSPEECH.2024-68},
}
```
## Acknowledgments
This project, with reference 2022/TL22/00215335, has been partially funded by the Ministerio de Transformación Digital and by the Plan de Recuperación, Transformación y Resiliencia – Funded by the European Union – NextGenerationEU, through [ILENIA](https://proyectoilenia.es/), and by the [IkerGaitu](https://www.hitz.eus/iker-gaitu/) project funded by the Basque Government.