Ayoub-Laachir committed
Commit 246daeb · verified · 1 Parent(s): 346f4e6

Update README.md

Files changed (1)
  1. README.md +127 -0
README.md CHANGED
@@ -59,6 +59,133 @@ These metrics demonstrate the model's ability to accurately transcribe Moroccan

  The fine-tuned model shows improved handling of Darija-specific words, sentence structure, and overall accuracy.

+ ## Audio Transcription Script
+
+ This script demonstrates how to transcribe audio files using the fine-tuned Whisper Large V3 model for Moroccan Darija. It covers installing the necessary libraries, loading the model, and processing audio files.
+
+ ### Required Libraries
+
+ Before running the script, install the following libraries (in a notebook cell, prefix each command with `!`):
+
+ ```bash
+ pip install --upgrade pip
+ pip install --upgrade transformers accelerate librosa soundfile pydub
+ ```
+
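Before loading the model, it can help to confirm that each package is importable. The following is a minimal sketch and not part of the README change: `missing_packages` is a hypothetical helper name, and the list assumes the import names match the pip names above, plus `torch`, which the script imports but the pip command does not list.

```python
from importlib.util import find_spec

# Import names the transcription script relies on. `torch` is included as an
# assumption: the script imports it, although the pip command does not list it.
REQUIRED = ["torch", "transformers", "accelerate", "librosa", "soundfile", "pydub"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks up each top-level package without importing it, so it stays fast even for heavy libraries.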
+ ```python
+ import torch
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+ import librosa
+ import soundfile as sf
+ from pydub import AudioSegment
+
+ # Set the device to GPU if available, else use CPU
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+
+ # Configuration for the model
+ config = {
+     "model_id": "Ayoub-Laachir/MaghrebVoice",  # Model ID from Hugging Face
+     "language": "arabic",  # Language for transcription
+     "task": "transcribe",  # Task type
+     "chunk_length_s": 30,  # Length of each audio chunk in seconds
+     "stride_length_s": 5,  # Overlap between chunks in seconds
+ }
+
+ # Load the model and processor
+ def load_model_and_processor():
+     try:
+         model = AutoModelForSpeechSeq2Seq.from_pretrained(
+             config["model_id"],
+             torch_dtype=torch_dtype,  # Use appropriate data type
+             low_cpu_mem_usage=True,  # Use low CPU memory
+             use_safetensors=True,  # Load model with safetensors
+             attn_implementation="sdpa",  # Specify attention implementation
+         )
+         model.to(device)  # Move model to the specified device
+
+         processor = AutoProcessor.from_pretrained(config["model_id"])
+
+         print("Model and processor loaded successfully.")
+         return model, processor
+     except Exception as e:
+         print(f"Error loading model and processor: {e}")
+         return None, None
+
+ # Load the model and processor; stop with a non-zero exit status on failure
+ model, processor = load_model_and_processor()
+ if model is None or processor is None:
+     raise SystemExit("Failed to load model or processor")
+
+ # Configure the generation parameters for the pipeline
+ generate_kwargs = {
+     "language": config["language"],  # Language for the pipeline
+     "task": config["task"],  # Task for the pipeline
+ }
+
+ # Initialize the automatic speech recognition pipeline
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     torch_dtype=torch_dtype,
+     device=device,
+     generate_kwargs=generate_kwargs,
+     chunk_length_s=config["chunk_length_s"],  # Length of each audio chunk
+     stride_length_s=config["stride_length_s"],  # Overlap between chunks
+ )
+
+ # Convert audio to 16kHz sampling rate
+ def convert_audio_to_16khz(input_path, output_path):
+     audio, sr = librosa.load(input_path, sr=None)  # Load the audio file at its native rate
+     audio_16k = librosa.resample(audio, orig_sr=sr, target_sr=16000)  # Resample to 16kHz
+     sf.write(output_path, audio_16k, 16000)  # Save the converted audio
+
+ # Format time in HH:MM:SS.milliseconds
+ def format_time(seconds):
+     hours = int(seconds // 3600)
+     minutes = int((seconds % 3600) // 60)
+     seconds = seconds % 60
+     return f"{hours:02d}:{minutes:02d}:{seconds:06.3f}"
+
+ # Transcribe audio file
+ def transcribe_audio(audio_path):
+     try:
+         result = pipe(audio_path, return_timestamps=True)  # Transcribe audio and get timestamps
+         return result["chunks"]  # Return transcription chunks
+     except Exception as e:
+         print(f"Error transcribing audio: {e}")
+         return None
+
+ # Main function to execute the transcription process
+ def main():
+     # Specify input and output audio paths (update paths as needed)
+     input_audio_path = "/path/to/your/input/audio.mp3"  # Replace with your input audio path
+     output_audio_path = "/path/to/your/output/audio_16khz.wav"  # Replace with your output audio path
+
+     # Convert audio to 16kHz
+     convert_audio_to_16khz(input_audio_path, output_audio_path)
+
+     # Transcribe the converted audio
+     transcription_chunks = transcribe_audio(output_audio_path)
+
+     if transcription_chunks:
+         print("WEBVTT\n")  # Print header for WEBVTT format
+         for chunk in transcription_chunks:
+             start, end = chunk["timestamp"]
+             if end is None:  # The final chunk can lack an end time; reuse the start
+                 end = start
+             start_time = format_time(start)  # Format start time
+             end_time = format_time(end)  # Format end time
+             text = chunk["text"]  # Get the transcribed text
+             print(f"{start_time} --> {end_time}")  # Print time range
+             print(f"{text}\n")  # Print transcribed text
+     else:
+         print("Transcription failed.")
+
+ if __name__ == "__main__":
+     main()
+ ```
+
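The loop in `main()` prints WebVTT cues to standard output. As a minimal sketch (not part of the README change), the same formatting can be collected into a string so it can be saved as a `.vtt` file. The helper name `chunks_to_webvtt` is hypothetical, and the handling of a missing end timestamp is an assumption: the pipeline can return `None` as the end time of the final chunk, in which case this sketch reuses the start time. It also strips surrounding whitespace from each cue, which the original loop does not do.

```python
def format_time(seconds):
    """Format seconds as HH:MM:SS.mmm, the timestamp form used in WebVTT cues."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = seconds % 60
    return f"{hours:02d}:{minutes:02d}:{secs:06.3f}"


def chunks_to_webvtt(chunks):
    """Build a WebVTT document from pipeline-style chunks.

    Each chunk is assumed to be a dict with a "timestamp" pair (start, end)
    and a "text" string, matching what the script reads from result["chunks"].
    """
    lines = ["WEBVTT", ""]
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if end is None:  # Assumption: fall back to the start time
            end = start
        lines.append(f"{format_time(start)} --> {format_time(end)}")
        lines.append(chunk["text"].strip())
        lines.append("")  # Blank line separates cues
    return "\n".join(lines)


# Hand-written chunks for illustration only (not real model output)
example = [
    {"timestamp": (0.0, 2.5), "text": " first cue"},
    {"timestamp": (2.5, None), "text": " second cue"},
]
vtt = chunks_to_webvtt(example)
```

To save the result, write `vtt` to a file opened with `encoding="utf-8"` so that Arabic-script text is preserved.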
  ## Challenges and Future Improvements
  ### Challenges Encountered
  - Diverse spellings of words in Moroccan Darija