---
license: mit
language:
- es
base_model:
- openai/whisper-large-v3-turbo
tags:
- susurro
- audio
- whisper
---
# Susurro: Spanish Speech Recognition Model

## Model Description

Susurro is a fine-tuned version of OpenAI's Whisper model (`openai/whisper-large-v3-turbo`), optimized for Spanish speech recognition. It was fine-tuned on Spanish speech data to improve transcription accuracy on Spanish audio.

## Training Data

The model was trained on a Spanish speech dataset consisting of:
- Training set: Spanish speech audio samples
- Test set: Held-out audio samples, kept separate from the training set
- Audio sampling rate: 16 kHz
- Language: Spanish
- Task: Speech transcription

## Training Procedure

The model was trained using the following configuration:
- Base model: openai/whisper-large-v3-turbo
- Training type: Fine-tuning
- Batch size: 2 per device
- Gradient accumulation steps: 16
- Learning rate: 1e-5
- Warmup steps: 500
- Max steps: 8000
- Training optimizations:
  - Gradient checkpointing enabled
  - FP16 training
  - 8-bit Adam optimizer
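
The arithmetic implied by these settings can be sketched as follows. The numeric values come from the list above; the keyword names in `training_kwargs` are an assumed mapping onto `Seq2SeqTrainingArguments` from 🤗 Transformers, since the original training script is not shown, and the output directory is a hypothetical placeholder.

```python
# Values copied from the configuration list above.
per_device_batch_size = 2
gradient_accumulation_steps = 16
learning_rate = 1e-5
warmup_steps = 500
max_steps = 8000

# Samples that contribute to each optimizer update on one device.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Samples processed per device over the full run (with repeats if the
# dataset is smaller than this).
samples_seen = effective_batch_size * max_steps

# Share of the run spent warming up the learning rate.
warmup_fraction = warmup_steps / max_steps

# Assumed mapping onto Seq2SeqTrainingArguments keyword names.
training_kwargs = {
    "output_dir": "./susurro-checkpoints",  # hypothetical path
    "per_device_train_batch_size": per_device_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "learning_rate": learning_rate,
    "warmup_steps": warmup_steps,
    "max_steps": max_steps,
    "gradient_checkpointing": True,
    "fp16": True,
    "optim": "adamw_bnb_8bit",  # assumption: 8-bit Adam via bitsandbytes
}

print(f"Effective batch size per device: {effective_batch_size}")
print(f"Samples seen per device: {samples_seen}")
print(f"Warmup fraction: {warmup_fraction:.2%}")
```

Gradient accumulation trades wall-clock time for memory: each update still averages over 32 samples per device, but only 2 are held in memory at once.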

## Intended Uses

This model is designed for:
- Spanish speech recognition
- Audio transcription in Spanish
- Real-time speech-to-text applications

## How to Use

```python
import librosa  # assumption: any loader that returns a 16 kHz float array works
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load model and processor
model_id = "IsmaelRR/SusurroModel-WhisperTurboV3Spanish"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Use the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load your audio file as a mono waveform resampled to 16 kHz
# ("audio.wav" is a placeholder path)
waveform, _ = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-mel input features
input_features = processor(
    waveform,
    sampling_rate=16000,
    return_tensors="pt"
).input_features.to(device)

# Generate transcription
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```

## Limitations

- The model is trained specifically for Spanish and may not perform well on other languages
- Audio input should be sampled at 16 kHz for optimal performance
- Performance may vary with different audio qualities and accents
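
Audio recorded at another rate needs to be resampled before it reaches the processor. The following is a minimal sketch of one way to do that with SciPy's polyphase resampler; the helper name `to_16k_mono` and the assumption that multi-channel input is shaped `(samples, channels)` are illustrative and not part of this model's tooling.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate expected by the model


def to_16k_mono(waveform, orig_sr):
    """Return a mono float32 waveform resampled to 16 kHz."""
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 2:
        # Assumes shape (samples, channels); average channels to mono.
        audio = audio.mean(axis=1)
    if orig_sr == TARGET_SR:
        return audio
    # Reduce the rate ratio so the polyphase filter stays small.
    g = gcd(TARGET_SR, orig_sr)
    resampled = resample_poly(audio, TARGET_SR // g, orig_sr // g)
    return resampled.astype(np.float32)


# One second of a 440 Hz tone at 44.1 kHz as stand-in input.
t = np.arange(44_100) / 44_100
tone = np.sin(2 * np.pi * 440 * t)
resampled = to_16k_mono(tone, 44_100)
print(resampled.shape, resampled.dtype)
```

The returned array can be passed to the processor with `sampling_rate=16000`.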

## Training Infrastructure

- Training framework: 🤗 Transformers
- Python version: 3.8+
- Key dependencies:
  - transformers
  - torch
  - datasets
  - numpy

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{susurro2024,
  author = {Your Name},
  title = {Susurro: Fine-tuned Whisper Model for Spanish Speech Recognition},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/IsmaelRR/SusurroModel-WhisperTurboV3Spanish}}
}
```

## License

MIT

## Acknowledgements

This model builds upon the OpenAI Whisper model and was trained using the Hugging Face Transformers library. Special thanks to the open-source community and contributors.