chore: Fix HF space
no_header_readme.md
ADDED
@@ -0,0 +1,76 @@
# Pronunciation Trainer 🗣️

This repository/app showcases how a [phoneme-based pronunciation trainer](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/phoneme_based_solution.md)
(including personalized LLM-based feedback) overcomes the limitations of a [grapheme-based approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/grapheme_based_solution.md).

| Feature | Grapheme-Based Solution | Phoneme-Based Solution |
|-----------------------------------|----------------------------------------------------------|---------------------------------------------------------|
| **Input Type** | Text transcriptions of speech | Audio files and phoneme transcriptions |
| **Feedback Mechanism** | Comparison of grapheme sequences | Comparison of phoneme sequences and advanced LLM-based feedback |
| **Technological Approach** | Simple text comparison using `SequenceMatcher` | Advanced ASR models like Wav2Vec2 for phoneme recognition |
| **Feedback Detail** | Basic similarity score and diff | Detailed phoneme comparison, LLM-based feedback including motivational and corrective elements |
| **Error Sensitivity** | Sensitive to homophones and transcription errors | More accurate in capturing pronunciation nuances |
| **Suprasegmental Features** | Not captured (stress, intonation) | Potentially captured through phoneme dynamics and advanced evaluation |
| **Personalization** | Limited to error feedback based on text similarity | Advanced personalization considering the learner's native language and target-language proficiency |
| **Scalability** | Easy to scale with basic text processing tools | Requires more computational resources for ASR and LLM processing |
| **Cost** | Lower; primarily involves basic computational resources | Higher, due to usage of advanced APIs and model processing |
| **Accuracy** | Lower; prone to misinterpretation of homophones | Higher; better at handling diverse pronunciation patterns (though subject to LLM hallucinations) |
| **Feedback Quality** | Basic, often not linguistically rich | Rich, detailed, personalized, and linguistically informed |
| **Potential for Learning** | Limited to recognizing text differences | High; includes phonetic and prosodic feedback, as well as resource and practice recommendations |

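The grapheme-based comparison the table refers to can be sketched with Python's `difflib.SequenceMatcher`; the sample sentences below are purely illustrative and not taken from the app:

```python
from difflib import SequenceMatcher

expected = "the quick brown fox"
transcribed = "the quik brown fox"  # hypothetical learner transcription

matcher = SequenceMatcher(None, expected, transcribed)
similarity = matcher.ratio()  # similarity score in [0, 1]

# A simple diff of the grapheme sequences: every non-equal opcode is an error
diff_ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]
print(f"similarity: {similarity:.2f}")
print(diff_ops)
```

This also illustrates the homophone weakness from the table: a correctly pronounced word transcribed with different spelling still shows up as an "error".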
## Quickstart

### Click here to try out the app directly:
[**Pronunciation Trainer App**](https://pwenker-pronunciation_trainer.hf.space/)

### Inspect the code at:
- **GitHub:** [pwenker/pronunciation_trainer](https://github.com/pwenker/pronunciation_trainer)
- **Hugging Face Spaces:** [pwenker/pronunciation_trainer](https://huggingface.co/spaces/pwenker/pronunciation_trainer)

### Read about the pronunciation trainer:

1. [Grapheme-based Approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/grapheme_based_solution.md)
2. [Phoneme-based Approach](https://github.com/pwenker/pronunciation_trainer/blob/main/docs/phoneme_based_solution.md)

## Local Deployment

### Prerequisites

#### Rye 🌾
[Install `Rye`](https://rye-up.com/guide/installation/#installing-rye)
> Rye is a comprehensive tool designed for Python developers. It simplifies your workflow by managing Python installations and dependencies. Simply install Rye, and it takes care of the rest.

#### OpenAI API Token
- Create a `.env` file in the `pronunciation_trainer` folder and add the following variable:
```
OPENAI_TOKEN=... # Token for the OpenAI API
```

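The `.env` format above is plain `KEY=VALUE` lines. As a minimal illustration of how such a line can be parsed (the app's actual loading mechanism, e.g. a dotenv-style package, is not shown in this commit, and `parse_env` is a hypothetical helper):

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and `#` comments."""
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

env = parse_env("OPENAI_TOKEN=sk-example-token  # Token for the OpenAI API")
```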
### Set-Up 🛠️

Clone the repository:
```
git clone [repository-url] # Replace [repository-url] with the actual URL of the repository
```
Navigate to the directory:
```
cd pronunciation_trainer
```

Create a virtual environment in `.venv` and synchronize the repo:
```
rye sync
```
For more details, visit: [Basics - Rye](https://rye-up.com/guide/basics/)

### Start the App

Launch the app using:
```
rye run python src/pronunciation_trainer/app.py
```

Then, open your browser and visit [http://localhost:7860](http://localhost:7860/) to start practicing!
src/pronunciation_trainer/app.py
CHANGED
@@ -19,9 +19,8 @@ from pronunciation_trainer.transcription import (transcribe_to_graphemes,
 
 with gr.Blocks() as demo:
     with gr.Tab("Welcome"):
-        readme = Path("
-
-        gr.Markdown(gr_readme)
+        readme = Path("no_header_readme.md").read_text()
+        gr.Markdown(readme)
 
     with gr.Tab("Grapheme-Based Speech Evaluation"):
         with gr.Row():
src/pronunciation_trainer/transcription.py
CHANGED
@@ -5,26 +5,21 @@ The transcribe function takes a single parameter, audio, which is a numpy array
 
 There are two transcriber choices available: grapheme and phoneme. The grapheme transcriber uses the openai/whisper-base.en model, while the phoneme transcriber uses the facebook/wav2vec2-lv-60-espeak-cv-ft model.
 """
-from enum import Enum
 from functools import partial
 
 import numpy as np
 from transformers import pipeline
 
 
-class TranscriberChoice(str, Enum):
-    grapheme = "openai/whisper-base.en"
-    phoneme = "facebook/wav2vec2-lv-60-espeak-cv-ft"
-
-
 def transcribe(
-    audio,
+    audio,
+    transcriber_choice: str,
 ):
     """
     The transcribe function takes a single parameter, audio, which is a numpy array of the audio the user recorded.
     The pipeline object expects this in float32 format, so we convert it first to float32, and then extract the transcribed text.
     """
-    transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice
+    transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice)
     try:
         sr, y = audio
     except TypeError:
@@ -39,5 +34,5 @@ transcribe_to_phonemes = partial(
     transcribe, transcriber_choice="facebook/wav2vec2-lv-60-espeak-cv-ft"
 )
 transcribe_to_graphemes = partial(
-    transcribe, transcriber_choice=
+    transcribe, transcriber_choice="openai/whisper-base.en"
 )
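The float32 conversion described in the docstring and the `partial`-based model selection above can be sketched as follows; `to_float32` and `fake_transcribe` are hypothetical stand-ins for the real `pipeline`-backed `transcribe`, so this runs without loading any model:

```python
from functools import partial

import numpy as np


def to_float32(y: np.ndarray) -> np.ndarray:
    """Convert recorded int16 samples to normalized float32, as the ASR pipeline expects."""
    y = y.astype(np.float32)
    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y


def fake_transcribe(audio, transcriber_choice: str) -> str:
    """Stand-in for transcribe(); the real version feeds the converted audio to pipeline(...)."""
    sr, y = audio  # Gradio-style (sample_rate, samples) tuple
    y = to_float32(y)
    return f"<{transcriber_choice} on {len(y)} samples at {sr} Hz>"


# Pre-bind the model name, mirroring transcribe_to_graphemes / transcribe_to_phonemes
fake_to_graphemes = partial(fake_transcribe, transcriber_choice="openai/whisper-base.en")

audio = (16000, np.array([0, 1200, -3000], dtype=np.int16))
result = fake_to_graphemes(audio)
```

Using `partial` this way keeps a single `transcribe` implementation while exposing one ready-to-use callable per model, which is what the diff replaces the removed `TranscriberChoice` enum with.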