chore: Fix HF space
no_header_readme.md
ADDED
@@ -0,0 +1,76 @@
# Pronunciation Trainer 🗣️

This repository/app showcases how a [phoneme-based pronunciation trainer](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/phoneme_based_solution.md)
(including personalized LLM-based feedback) overcomes the limitations of a [grapheme-based approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/grapheme_based_solution.md).

| Feature | Grapheme-Based Solution | Phoneme-Based Solution |
|-----------------------------------|----------------------------------------------------------|---------------------------------------------------------|
| **Input Type** | Text transcriptions of speech | Audio files and phoneme transcriptions |
| **Feedback Mechanism** | Comparison of grapheme sequences | Comparison of phoneme sequences and advanced LLM-based feedback |
| **Technological Approach** | Simple text comparison using `SequenceMatcher` | Advanced ASR models like Wav2Vec2 for phoneme recognition |
| **Feedback Detail** | Basic similarity score and diff | Detailed phoneme comparison, LLM-based feedback including motivational and corrective elements |
| **Error Sensitivity** | Sensitive to homophones and transcription errors | More accurate in capturing pronunciation nuances |
| **Suprasegmental Features** | Not captured (stress, intonation) | Potentially captured through phoneme dynamics and advanced evaluation |
| **Personalization** | Limited to error feedback based on text similarity | Advanced personalization considering the learner's native language and target-language proficiency |
| **Scalability** | Easy to scale with basic text processing tools | Requires more computational resources for ASR and LLM processing |
| **Cost** | Lower; primarily involves basic computational resources | Higher, due to usage of advanced APIs and model processing |
| **Accuracy** | Lower; prone to misinterpretation of homophones | Higher; better at handling diverse pronunciation patterns (though subject to LLM hallucinations) |
| **Feedback Quality** | Basic, often not linguistically rich | Rich, detailed, personalized, and linguistically informed |
| **Potential for Learning** | Limited to recognizing text differences | High; includes phonetic and prosodic feedback, as well as resource and practice recommendations |

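The grapheme-based comparison the table refers to can be sketched with Python's `difflib.SequenceMatcher`; the sample sentences below are purely illustrative and not taken from the app:

```python
from difflib import SequenceMatcher

expected = "the quick brown fox"
transcribed = "the quik brown fox"  # hypothetical learner transcription

matcher = SequenceMatcher(None, expected, transcribed)
similarity = matcher.ratio()  # similarity score in [0, 1]

# A simple diff of the grapheme sequences: every non-equal opcode is an error
diff_ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]
print(f"similarity: {similarity:.2f}")
print(diff_ops)
```

This also illustrates the homophone weakness from the table: a correctly pronounced word transcribed with different spelling still shows up as an "error".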
## Quickstart

### Click here to try out the app directly:
[**Pronunciation Trainer App**](https://pwenker-pronunciation_trainer.hf.space/)

### Inspect the code at:
- **GitHub:** [pwenker/pronunciation_trainer](https://github.com/pwenker/pronunciation_trainer)
- **Hugging Face Spaces:** [pwenker/pronunciation_trainer](https://huggingface.co/spaces/pwenker/pronunciation_trainer)

### Read about the pronunciation trainer:

1. [Grapheme-based Approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/grapheme_based_solution.md)
2. [Phoneme-based Approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/phoneme_based_solution.md)

## Local Deployment

### Prerequisites

#### Rye 🌾
[Install `Rye`](https://rye-up.com/guide/installation/#installing-rye)
> Rye is a comprehensive tool designed for Python developers. It simplifies your workflow by managing Python installations and dependencies. Simply install Rye, and it takes care of the rest.

#### OpenAI API Token
- Create a `.env` file in the `pronunciation_trainer` folder and add the following variable:
```
OPENAI_TOKEN=... # Token for the OpenAI API
```

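The `.env` format above is plain `KEY=VALUE` lines. As a minimal illustration of how such a line can be parsed (the app's actual loading mechanism, e.g. a dotenv-style package, is not shown in this commit, and `parse_env` is a hypothetical helper):

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and `#` comments."""
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

env = parse_env("OPENAI_TOKEN=sk-example-token  # Token for the OpenAI API")
```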
### Set-Up 🛠️

Clone the repository:
```
git clone [repository-url] # Replace [repository-url] with the actual URL of the repository
```
Navigate to the directory:
```
cd pronunciation_trainer
```

Create a virtual environment in `.venv` and synchronize the repo:
```
rye sync
```
For more details, visit: [Basics - Rye](https://rye-up.com/guide/basics/)

### Start the App

Launch the app using:
```
rye run python src/pronunciation_trainer/app.py
```

Then, open your browser and visit [http://localhost:7860](http://localhost:7860/) to start practicing!
src/pronunciation_trainer/app.py
CHANGED
@@ -19,9 +19,8 @@ from pronunciation_trainer.transcription import (transcribe_to_graphemes,
 
 with gr.Blocks() as demo:
     with gr.Tab("Welcome"):
-        readme = Path("
-
-        gr.Markdown(gr_readme)
+        readme = Path("no_header_readme.md").read_text()
+        gr.Markdown(readme)
 
     with gr.Tab("Grapheme-Based Speech Evaluation"):
         with gr.Row():
src/pronunciation_trainer/transcription.py
CHANGED
@@ -5,26 +5,21 @@ The transcribe function takes a single parameter, audio, which is a numpy array
 
 There are two transcriber choices available: grapheme and phoneme. The grapheme transcriber uses the openai/whisper-base.en model, while the phoneme transcriber uses the facebook/wav2vec2-lv-60-espeak-cv-ft model.
 """
-from enum import Enum
 from functools import partial
 
 import numpy as np
 from transformers import pipeline
 
 
-class TranscriberChoice(str, Enum):
-    grapheme = "openai/whisper-base.en"
-    phoneme = "facebook/wav2vec2-lv-60-espeak-cv-ft"
-
-
 def transcribe(
-    audio,
+    audio,
+    transcriber_choice: str,
 ):
     """
     The transcribe function takes a single parameter, audio, which is a numpy array of the audio the user recorded.
     The pipeline object expects this in float32 format, so we convert it first to float32, and then extract the transcribed text.
     """
-    transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice
+    transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice)
     try:
         sr, y = audio
     except TypeError:
@@ -39,5 +34,5 @@ transcribe_to_phonemes = partial(
     transcribe, transcriber_choice="facebook/wav2vec2-lv-60-espeak-cv-ft"
 )
 transcribe_to_graphemes = partial(
-    transcribe, transcriber_choice=
+    transcribe, transcriber_choice="openai/whisper-base.en"
 )
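The float32 conversion described in the docstring and the `partial`-based model selection above can be sketched as follows; `to_float32` and `fake_transcribe` are hypothetical stand-ins for the real `pipeline`-backed `transcribe`, so this runs without loading any model:

```python
from functools import partial

import numpy as np


def to_float32(y: np.ndarray) -> np.ndarray:
    """Convert recorded int16 samples to normalized float32, as the ASR pipeline expects."""
    y = y.astype(np.float32)
    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y


def fake_transcribe(audio, transcriber_choice: str) -> str:
    """Stand-in for transcribe(); the real version feeds the converted audio to pipeline(...)."""
    sr, y = audio  # Gradio-style (sample_rate, samples) tuple
    y = to_float32(y)
    return f"<{transcriber_choice} on {len(y)} samples at {sr} Hz>"


# Pre-bind the model name, mirroring transcribe_to_graphemes / transcribe_to_phonemes
fake_to_graphemes = partial(fake_transcribe, transcriber_choice="openai/whisper-base.en")

audio = (16000, np.array([0, 1200, -3000], dtype=np.int16))
result = fake_to_graphemes(audio)
```

Using `partial` this way keeps a single `transcribe` implementation while exposing one ready-to-use callable per model, which is what the diff replaces the removed `TranscriberChoice` enum with.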