pwenker committed on
Commit
8a8d8f4
·
1 Parent(s): 6e52a9c

chore: Fix HF space
no_header_readme.md ADDED
@@ -0,0 +1,76 @@
+ # Pronunciation Trainer 🗣️
+
+ This repository/app showcases how a [phoneme-based pronunciation trainer](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/phoneme_based_solution.md)
+ (including personalized LLM-based feedback) overcomes the limitations of a [grapheme-based approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/grapheme_based_solution.md).
+
+ | Feature | Grapheme-Based Solution | Phoneme-Based Solution |
+ |-----------------------------------|----------------------------------------------------------|---------------------------------------------------------|
+ | **Input Type** | Text transcriptions of speech | Audio files and phoneme transcriptions |
+ | **Feedback Mechanism** | Comparison of grapheme sequences | Comparison of phoneme sequences and advanced LLM-based feedback |
+ | **Technological Approach** | Simple text comparison using `SequenceMatcher` | Advanced ASR models like Wav2Vec2 for phoneme recognition |
+ | **Feedback Detail** | Basic similarity score and diff | Detailed phoneme comparison, LLM-based feedback including motivational and corrective elements |
+ | **Error Sensitivity** | Sensitive to homophones and transcription errors | More accurate in capturing pronunciation nuances |
+ | **Suprasegmental Features** | Does not capture suprasegmentals (stress, intonation) | Potentially captures them through phoneme dynamics and advanced evaluation |
+ | **Personalization** | Limited to error feedback based on text similarity | Advanced personalization considering learner's native language and target language proficiency |
+ | **Scalability** | Easy to scale with basic text processing tools | Requires more computational resources for ASR and LLM processing |
+ | **Cost** | Lower, primarily involves basic computational resources | Higher, due to usage of advanced APIs and model processing |
+ | **Accuracy** | Lower, prone to misinterpretations of homophones | Higher, better at handling diverse pronunciation patterns (though LLM hallucinations remain a risk) |
+ | **Feedback Quality** | Basic, often not linguistically rich | Rich, detailed, personalized, and linguistically informed |
+ | **Potential for Learning** | Limited to recognizing text differences | High, includes phonetic and prosodic feedback, as well as resource and practice recommendations |
+
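The `SequenceMatcher` comparison mentioned in the table can be sketched in a few lines (a minimal illustration, not the app's actual code; the homophone example demonstrates the error sensitivity noted above):

```python
from difflib import SequenceMatcher


def grapheme_feedback(expected: str, actual: str) -> float:
    """Return a similarity ratio between the expected text and the
    learner's transcribed speech, as a grapheme-based trainer might."""
    return SequenceMatcher(None, expected.lower(), actual.lower()).ratio()


# Homophones expose the weakness: a perfectly pronounced "two"
# transcribed as "too" is penalized as a pronunciation error.
print(grapheme_feedback("I have two cats", "I have too cats"))
```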
+ ## Quickstart 🚀
+
+ ### 👉 Try out the app directly:
+ [**Pronunciation Trainer App**](https://pwenker-pronunciation_trainer.hf.space/)
+
+ ### 🔍 Inspect the code:
+ - **GitHub:** [pwenker/pronunciation_trainer](https://github.com/pwenker/pronounciation_trainer)
+ - **Hugging Face Spaces:** [pwenker/pronunciation_trainer](https://huggingface.co/spaces/pwenker/pronounciation_trainer)
+
+ ### 📚 Read about the pronunciation trainer:
+
+ 1. [Grapheme-based Approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/grapheme_based_solution.md)
+ 2. [Phoneme-based Approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/phoneme_based_solution.md)
+
+
+ ## Local Deployment 🏠
+
+ ### Prerequisites 📋
+
+ #### Rye 🌾
+ [Install `Rye`](https://rye-up.com/guide/installation/#installing-rye)
+ > Rye is a comprehensive tool designed for Python developers. It simplifies your workflow by managing Python installations and dependencies. Simply install Rye, and it takes care of the rest.
+
+ #### OpenAI API Token 🔑
+ Create a `.env` file in the `pronunciation_trainer` folder and add the following variable:
+ ```
+ OPENAI_TOKEN=... # Token for the OpenAI API
+ ```
+
+ ### Set-Up 🛠️
+
+ Clone the repository:
+ ```
+ git clone [repository-url] # Replace [repository-url] with the actual URL of the repository
+ ```
+ Navigate to the directory:
+ ```
+ cd pronunciation_trainer
+ ```
+
+ Create a virtual environment in `.venv` and synchronize the repo:
+ ```
+ rye sync
+ ```
+ For more details, visit: [Basics - Rye](https://rye-up.com/guide/basics/)
+
+ ### Start the App 🌟
+
+ Launch the app using:
+ ```
+ rye run python src/pronunciation_trainer/app.py
+ ```
+
+ Then, open your browser and visit [http://localhost:7860](http://localhost:7860/) to start practicing!
+
src/pronunciation_trainer/app.py CHANGED
@@ -19,9 +19,8 @@ from pronunciation_trainer.transcription import (transcribe_to_graphemes,
 
 with gr.Blocks() as demo:
     with gr.Tab("Welcome"):
-        readme = Path("README.md").read_text()
-        gr_readme = readme.split('---')[2].strip()
-        gr.Markdown(gr_readme)
+        readme = Path("no_header_readme.md").read_text()
+        gr.Markdown(readme)
 
     with gr.Tab("Grapheme-Based Speech Evaluation"):
         with gr.Row():
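The removed code split `README.md` on `---` to drop the YAML front matter that Hugging Face Spaces READMEs carry. A quick sketch (with hypothetical README content, assuming that front-matter convention) shows why this is fragile: any `---` horizontal rule in the body truncates the rendered page, which the dedicated `no_header_readme.md` avoids entirely:

```python
# Hypothetical README with HF-Spaces-style YAML front matter.
readme = """---
title: Pronunciation Trainer
sdk: gradio
---
# Pronunciation Trainer

Intro text.

---

More text after a horizontal rule.
"""

# Old approach: keep everything after the second '---'.
body = readme.split('---')[2].strip()

# The horizontal rule in the body cuts the page short:
print(body)  # everything after the rule is lost
```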
src/pronunciation_trainer/transcription.py CHANGED
@@ -5,26 +5,21 @@ The transcribe function takes a single parameter, audio, which is a numpy array
 
 There are two transcriber choices available: grapheme and phoneme. The grapheme transcriber uses the openai/whisper-base.en model, while the phoneme transcriber uses the facebook/wav2vec2-lv-60-espeak-cv-ft model.
 """
-from enum import Enum
 from functools import partial
 
 import numpy as np
 from transformers import pipeline
 
 
-class TranscriberChoice(str, Enum):
-    grapheme = "openai/whisper-base.en"
-    phoneme = "facebook/wav2vec2-lv-60-espeak-cv-ft"
-
-
 def transcribe(
-    audio, transcriber_choice: str,
+    audio,
+    transcriber_choice: str,
 ):
     """
     The transcribe function takes a single parameter, audio, which is a numpy array of the audio the user recorded.
     The pipeline object expects this in float32 format,so we convert it first to float32, and then extract the transcribed text.
     """
-    transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice.value)
+    transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice)
     try:
         sr, y = audio
     except TypeError:
@@ -39,5 +34,5 @@ transcribe_to_phonemes = partial(
     transcribe, transcriber_choice="facebook/wav2vec2-lv-60-espeak-cv-ft"
 )
 transcribe_to_graphemes = partial(
-    transcribe, transcriber_choice= "openai/whisper-base.en"
+    transcribe, transcriber_choice="openai/whisper-base.en"
 )
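The refactor drops the `TranscriberChoice` enum and instead pins the model id with `functools.partial`. The same pattern in isolation (using a lightweight stand-in instead of the real `transformers` pipeline, so the sketch runs without model downloads):

```python
from functools import partial


def transcribe(audio, transcriber_choice: str) -> str:
    # Stand-in for the real function, which would build a
    # pipeline("automatic-speech-recognition", model=transcriber_choice)
    # and run it on the audio array.
    return f"[{transcriber_choice}] transcribed {len(audio)} samples"


# Specialized entry points, mirroring what transcription.py now defines.
transcribe_to_phonemes = partial(
    transcribe, transcriber_choice="facebook/wav2vec2-lv-60-espeak-cv-ft"
)
transcribe_to_graphemes = partial(
    transcribe, transcriber_choice="openai/whisper-base.en"
)

print(transcribe_to_graphemes([0.1, 0.2, 0.3]))
```

Passing the model id string straight through to `pipeline(...)` removes the `.value` indirection that broke when callers supplied a plain string.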