---
license: mit
language:
- es
base_model:
- openai/whisper-large-v3-turbo
tags:
- susurro
- audio
- whisper
---

# Susurro: Spanish Speech Recognition Model

## Model Description

Susurro is a fine-tuned version of OpenAI's Whisper model (`openai/whisper-large-v3-turbo`), optimized for Spanish speech recognition. It was trained on Spanish speech data to improve transcription quality for Spanish audio.

## Training Data

The model was trained on a Spanish speech dataset consisting of:

- Training set: Spanish speech audio samples
- Test set: separate audio samples held out for evaluation
- Audio sampling rate: 16 kHz
- Language: Spanish
- Task: Speech transcription

## Training Procedure

The model was trained using the following configuration:

- Base model: openai/whisper-large-v3-turbo
- Training type: Fine-tuning
- Batch size: 2 per device
- Gradient accumulation steps: 16
- Learning rate: 1e-5
- Warmup steps: 500
- Max steps: 8000
- Training optimizations:
  - Gradient checkpointing enabled
  - FP16 training
  - 8-bit Adam optimizer

## Intended Uses

This model is designed for:

- Spanish speech recognition
- Audio transcription in Spanish
- Real-time speech-to-text applications

## How to Use

```python
import librosa  # one option for loading audio; not part of the dependency list below
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load model and processor
processor = WhisperProcessor.from_pretrained("IsmaelRR/SusurroModel-WhisperTurboV3Spanish")
model = WhisperForConditionalGeneration.from_pretrained("IsmaelRR/SusurroModel-WhisperTurboV3Spanish")

# Use a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load your audio file as a mono waveform resampled to 16 kHz
# ("audio.wav" is a placeholder path; replace it with your own file)
speech, _ = librosa.load("audio.wav", sr=16000)

# Convert the waveform into the input features the model expects
input_features = processor(
    speech,
    sampling_rate=16000,
    return_tensors="pt"
).input_features.to(device)

# Generate transcription
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```

## Limitations

- The model is trained specifically for Spanish and may not perform well on other languages
- Audio input should be sampled at 16 kHz for optimal performance
- Performance may vary with audio quality and speaker accent

## Training Infrastructure

- Training framework: 🤗 Transformers
- Python version: 3.8+
- Key dependencies:
  - transformers
  - torch
  - datasets
  - numpy

## Citation

If you use this model in your research, please cite:

```
@misc{susurro2024,
  author = {Your Name},
  title = {Susurro: Fine-tuned Whisper Model for Spanish Speech Recognition},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/IsmaelRR/SusurroModel-WhisperTurboV3Spanish}}
}
```

## License

MIT

## Acknowledgements

This model builds upon the OpenAI Whisper model and was trained using the Hugging Face Transformers library. Special thanks to the open-source community and contributors.
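
## Example: Effective Batch Size

The Training Procedure section lists a per-device batch size of 2 with 16 gradient accumulation steps, 500 warmup steps, and 8000 max steps. The sketch below works out what those numbers imply. It is only an illustration: the `TrainingSchedule` class is a hypothetical helper, the device count is not stated in this card (one device is assumed), and `max_steps` is assumed to count optimizer updates rather than forward passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSchedule:
    """Hypothetical helper holding values from the Training Procedure section."""

    per_device_batch_size: int = 2
    gradient_accumulation_steps: int = 16
    warmup_steps: int = 500
    max_steps: int = 8000
    num_devices: int = 1  # ASSUMPTION: the card does not state the device count

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer update
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    @property
    def samples_seen(self) -> int:
        # ASSUMPTION: max_steps counts optimizer updates
        return self.effective_batch_size * self.max_steps

    @property
    def warmup_fraction(self) -> float:
        # Share of training spent ramping up the learning rate
        return self.warmup_steps / self.max_steps


schedule = TrainingSchedule()
print(schedule.effective_batch_size)         # 32
print(schedule.samples_seen)                 # 256000
print(f"{schedule.warmup_fraction:.2%}")     # 6.25%
```

Under these assumptions, each optimizer update averages gradients over 32 samples, which is how a small per-device batch can still give reasonably stable updates on memory-constrained hardware. Passing a different `num_devices` scales the figures accordingly.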