VocRT

This repository contains the complete codebase for building your personal Realtime Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system effectively.


Repository Structure

├── backend/         # Express server for handling API requests
├── frontend/        # React client for user interaction
├── .env             # Environment variables (OpenAI API key, etc.)
├── voices/          # All available voices
├── demo/            # Contains sample audio and demo files
└── other...

Docker

🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
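
If you prefer running VocRT as a container, the usual Docker workflow applies: pull the published image and run it, publishing whatever ports your deployment needs. The commands below are a sketch; the exposed ports and run options are not documented here, so check the Docker Hub page for the exact invocation.

docker pull anuragsingh922/vocrt
docker run -it anuragsingh922/vocrt   # add -p <host_port>:<container_port> mappings as needed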


Setup Guide

Step 1: Clone the Repository

Clone this repository to your local machine:

git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT

Step 2: Python Virtual Environment Setup

Create a virtual environment to manage dependencies:

macOS/Linux:

python3 -m venv venv
source venv/bin/activate

Windows:

python -m venv venv
venv\Scripts\activate
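
As a quick optional sanity check, confirm that the interpreter now resolves inside the virtual environment:

which python    # macOS/Linux: should print a path ending in venv/bin/python
where python    # Windows: the first result should point into venv\Scripts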

Step 3: Install Python Dependencies

With the virtual environment activated, install the required dependencies:

pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
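
As an optional check that the core packages installed correctly:

python -c "import torch, transformers; print(torch.__version__)"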

Installing eSpeak

eSpeak is a required system dependency for VocRT: the phonemizer package uses it as its backend to convert text into phonemes. Follow the instructions below to install it on your platform:

Ubuntu/Linux

Use the apt-get package manager to install eSpeak:

sudo apt-get update
sudo apt-get install espeak

macOS

Install eSpeak using Homebrew:

  1. Ensure Homebrew is installed on your system:
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  2. Install espeak:
    brew install espeak
    

Windows

For Windows, follow these steps to install eSpeak:

  1. Download the eSpeak installer from the official eSpeak download page.
  2. Run the installer and follow the on-screen instructions to complete the installation.
  3. Add the eSpeak installation path to your system's PATH environment variable:
    • Open System Properties β†’ Advanced β†’ Environment Variables.
    • In the "System Variables" section, find the Path variable and edit it.
    • Add the path to the espeak.exe file (e.g., C:\Program Files (x86)\eSpeak).
  4. Verify the installation: Open Command Prompt and run:
    espeak --version
    

Verification

After installing eSpeak, verify it is correctly set up by running:

espeak "Hello, world!"

This should output "Hello, world!" as audio on your system.


Step 4: Backend Setup (Express Server)

  1. Navigate to the backend directory:

    cd backend
    
  2. Install Node.js dependencies:

    npm install
    
  3. Update the config.env file with your Deepgram API key (a sample entry is shown after these steps):

    • Open config.env in a text editor.
    • Replace <deepgram_api_key> with your actual Deepgram API key.
  4. Start the Express server:

    node app.js
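
The exact contents of config.env are defined by the repository; as an illustration, the Deepgram entry typically looks like the line below. The variable name here is a placeholder; keep the name that already appears in config.env and replace only its value.

# Illustrative only: keep the variable name shipped in config.env and replace just the value.
DEEPGRAM_API_KEY=<deepgram_api_key>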
    

Step 5: Frontend Setup (React Client)

  1. Open a new terminal and navigate to the frontend directory:
    cd frontend
    
  2. Install client dependencies:
    npm install
    
  3. Start the client:
    npm start
    

Step 6: Start the VocRT Server

  1. Add your OpenAI API key to the .env file (a sample entry is shown after these steps):

    • Open .env in a text editor.
    • Replace <openai_api_key> with your actual OpenAI API key.
  2. Start the VocRT server (run from the repository root, with the virtual environment from Step 2 activated):

    python3 app.py
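
As with the backend configuration, the entry below is only illustrative; keep the variable name that ships in the repository's .env file and replace just the placeholder value.

# Illustrative only: replace the placeholder with your real OpenAI API key.
OPENAI_API_KEY=<openai_api_key>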
    

Step 7: Test the Full System

  • Once all servers are running:
    1. Access the React client at http://localhost:3000.
    2. Interact with the VocRT system via the web interface.

Model Used

VocRT uses Kokoro-82M for text-to-speech synthesis, processing user inputs into high-quality voice responses.


Key Features

  1. Realtime voice response generation: Converts speech input into spoken responses with minimal latency.
  2. React Client: A user-friendly frontend for interaction.
  3. Express Backend: Handles API requests and integrates the VocRT system with external services.
  4. gRPC Communication: Seamless communication between the VocRT server and other components.
  5. Configurable APIs: Integrates the Deepgram API for speech recognition and the OpenAI API for text generation.

Dependencies

Python:

  • torch, torchvision, torchaudio
  • phonemizer
  • transformers
  • scipy
  • munch
  • python-dotenv
  • openai
  • grpcio, grpcio-tools
  • eSpeak (system package, installed separately as described above)

Node.js:

  • Express server dependencies (npm install in backend).
  • React client dependencies (npm install in frontend).

Contributing

Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.


Acknowledgments

  • Hugging Face for hosting the Kokoro-82M model.
  • The amazing communities behind PyTorch, OpenAI, and Deepgram APIs.