Commit 0c82ac9
Parent(s): 9063905

Update README.md

README.md CHANGED
@@ -1,3 +1,82 @@
---
license: mit
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- audio
- voice
- speech
- speaker
- speaker-diarization
- speaker-change-detection
- endpoints-template
library_name: generic
---
# 🎹 Speaker diarization with Pyannote and Inference Endpoints

This repository implements a custom `handler` for the `speaker-diarization` task on 🤗 Inference Endpoints using Pyannote. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/handler.py).

The repository also includes a [notebook](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/create_handler.ipynb) that shows how to create the `handler.py`.

### Request

The endpoint expects a binary audio file. Below are a cURL example and a Python example that uses the `requests` library.

**curl**

```bash
# download a sample audio file
wget https://cdn-media.huggingface.co/speech_samples/sample1.flac

# run request (file name and content type match the downloaded FLAC file)
curl --request POST \
  --url https://{ENDPOINT}/ \
  --header 'Content-Type: audio/flac' \
  --header 'Authorization: Bearer {HF_TOKEN}' \
  --data-binary '@sample1.flac'
```

**Python**

```python
import mimetypes

import requests as r

ENDPOINT_URL = ""
HF_TOKEN = ""


def predict(path_to_audio: str):
    # read the audio file as raw bytes
    with open(path_to_audio, "rb") as i:
        b = i.read()
    # guess the MIME type from the file extension
    content_type = mimetypes.guess_type(path_to_audio)[0]

    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": content_type,
    }
    response = r.post(ENDPOINT_URL, headers=headers, data=b)
    return response.json()


prediction = predict(path_to_audio="sample.wav")

prediction
```

Expected output:

```json
{"diarization": [
    {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
    {"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
    {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
    {"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"}
    ...
```
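
As a minimal sketch of how a client might consume this response, the snippet below sums the speaking time per speaker. It assumes the response has the shape shown in the expected output, with `start` and `stop` given as strings holding seconds. The helper name `speaking_time` is hypothetical and is not part of the handler.

```python
from collections import defaultdict


def speaking_time(prediction: dict) -> dict:
    """Sum the duration (in seconds) of all segments for each speaker label.

    Hypothetical helper: assumes `prediction` looks like the example response.
    """
    totals = defaultdict(float)
    for segment in prediction["diarization"]:
        # start/stop are strings in the example response, so convert them first
        duration = float(segment["stop"]) - float(segment["start"])
        totals[segment["label"]] += duration
    return dict(totals)


# two segments copied from the example response
sample = {"diarization": [
    {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
    {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
]}
totals = speaking_time(sample)
```

In practice, the `prediction` returned by `predict` could be passed to the helper instead of `sample`.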