---
title: Pronunciation Trainer
emoji: 🗣️
colorFrom: blue
colorTo: red
sdk: gradio
app_file: src/pronunciation_trainer/app.py
---

# Pronunciation Trainer 🗣️

This repository/app showcases how a [phoneme-based pronunciation trainer](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/phoneme_based_solution.md)
(including personalized LLM-based feedback) overcomes the limitations of a [grapheme-based approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/grapheme_based_solution.md).

For convenience, the table below compares the features of the two solutions:

| Feature                           | Grapheme-Based Solution                                  | Phoneme-Based Solution                                  |
|-----------------------------------|----------------------------------------------------------|---------------------------------------------------------|
| **Input Type**                    | Text transcriptions of speech                            | Audio files and phoneme transcriptions                  |
| **Feedback Mechanism**            | Comparison of grapheme sequences                         | Comparison of phoneme sequences and advanced LLM-based feedback |
| **Technological Approach**        | Simple text comparison using `SequenceMatcher`           | Advanced ASR models like Wav2Vec2 for phoneme recognition |
| **Feedback Detail**               | Basic similarity score and diff                          | Detailed phoneme comparison, LLM-based feedback including motivational and corrective elements |
| **Error Sensitivity**             | Sensitive to homophones and transcription errors         | More accurate in capturing pronunciation nuances        |
| **Suprasegmental Features**       | Not captured (e.g., stress, intonation)                  | Potentially captured through phoneme dynamics and advanced evaluation |
| **Personalization**               | Limited to error feedback based on text similarity       | Advanced personalization considering learner's native language and target language proficiency |
| **Scalability**                   | Easy to scale with basic text processing tools           | Requires more computational resources for ASR and LLM processing |
| **Cost**                          | Lower, primarily involves basic computational resources   | Higher, due to usage of advanced APIs and model processing |
| **Accuracy**                      | Lower, prone to misinterpretations of homophones         | Higher, better at handling diverse pronunciation patterns (though the LLM feedback can hallucinate) |
| **Feedback Quality**              | Basic, often not linguistically rich                     | Rich, detailed, personalized, and linguistically informed              |
| **Potential for Learning**        | Limited to recognizing text differences                   | High, includes phonetic and prosodic feedback, as well as resource and practice recommendations           |
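
To make the grapheme-based column concrete, here is a minimal sketch of a `SequenceMatcher` comparison that yields a similarity score and a diff. The function name `grapheme_feedback` and the word-level tokenization are illustrative assumptions, not the repository's actual implementation; see the linked grapheme-based write-up for that.

```python
from difflib import SequenceMatcher


def grapheme_feedback(expected: str, actual: str) -> tuple[float, list[str]]:
    """Hypothetical helper: similarity ratio plus a word-level diff."""
    expected_words = expected.lower().split()
    actual_words = actual.lower().split()
    matcher = SequenceMatcher(None, expected_words, actual_words)
    diff = []
    # Collect every non-matching span reported by SequenceMatcher.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            before = " ".join(expected_words[i1:i2])
            after = " ".join(actual_words[j1:j2])
            diff.append(f"{tag}: {before!r} -> {after!r}")
    return matcher.ratio(), diff


# "read" (past tense) and "red" sound the same, yet the text comparison
# reports a mismatch when the transcription picks the homophone.
score, diff = grapheme_feedback("I read the book", "I red the book")
print(score, diff)
```

The example also illustrates the homophone sensitivity noted in the table: a correctly pronounced sentence can still be penalized if the transcription contains a different spelling.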

## Quickstart 🚀

### 👉 Click here to try out the app directly:
[**Pronunciation Trainer App**](https://pwenker-pronunciation-trainer.hf.space/)

### 🔍 Inspect the code at:
- **GitHub:** [pwenker/pronunciation_trainer](https://github.com/pwenker/pronunciation_trainer)
- **Hugging Face Spaces:** [pwenker/pronunciation_trainer](https://huggingface.co/spaces/pwenker/pronunciation_trainer)

### 📚 Read about the pronunciation trainer:

1. [Grapheme-based Approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/grapheme_based_solution.md)
2. [Phoneme-based Approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/phoneme_based_solution.md)


## Local Deployment 🏠

### Prerequisites 📋

#### Rye 🌾
[Install `Rye`](https://rye-up.com/guide/installation/#installing-rye)
> Rye is a comprehensive tool designed for Python developers. It simplifies your workflow by managing Python installations and dependencies. Simply install Rye, and it takes care of the rest.

#### OpenAI API Key 🔑
Create a `.env` file in the `pronunciation_trainer` folder and add the following variable:

```
OPENAI_API_KEY=... # Token for the OpenAI API
```
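
Before launching, it can help to confirm that the `.env` file actually defines the key. The following is a small sketch under stated assumptions: `parse_env` is a hypothetical helper that handles only simple `KEY=VALUE` lines with optional trailing `#` comments, and it is not how the app itself loads the file.

```python
def parse_env(text: str) -> dict[str, str]:
    """Hypothetical helper: parse simple KEY=VALUE lines from .env content."""
    values = {}
    for line in text.splitlines():
        # Drop trailing comments (a simplification: values containing '#' are not supported).
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


# Sample content with a placeholder value, not a real token.
sample = "OPENAI_API_KEY=placeholder-token # Token for the OpenAI API\n"
env = parse_env(sample)
print("OPENAI_API_KEY" in env)
```

To check a real file, pass its content instead, for example `parse_env(open(".env").read())`.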

### Set-Up 🛠️

Clone the repository:
```
git clone https://github.com/pwenker/pronunciation_trainer.git
```
Navigate to the directory:
```
cd pronunciation_trainer
```

Create a virtual environment in `.venv` and install the project's dependencies:
```
rye sync
```
For more details, visit: [Basics - Rye](https://rye-up.com/guide/basics/)

### Start the App 🌟

Launch the app using:
```
rye run python src/pronunciation_trainer/app.py
```

Then, open your browser and visit [http://localhost:7860](http://localhost:7860/) to start practicing!