VocRT

This repository contains the complete codebase for building your personal Realtime Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system effectively.


Repository Structure

├── backend/         # Express server for handling API requests
├── frontend/        # React client for user interaction
├── .env             # Environment variables (OpenAI API key, etc.)
├── voices/          # All available voices
├── demo/            # Contains sample audio and demo files
└── other...

Docker

🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
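
If you prefer running VocRT as a container, the usual Docker workflow applies: pull the published image and run it, publishing whatever ports your deployment needs. The commands below are a sketch; the exposed ports and run options are not documented here, so check the Docker Hub page for the exact invocation.

docker pull anuragsingh922/vocrt
docker run -it anuragsingh922/vocrt   # add -p <host_port>:<container_port> mappings as needed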


Setup Guide

Step 1: Clone the Repository

Clone this repository to your local machine:

git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT

Step 2: Python Virtual Environment Setup

Create a virtual environment to manage dependencies:

macOS/Linux:

python3 -m venv venv
source venv/bin/activate

Windows:

python -m venv venv
venv\Scripts\activate
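
As a quick optional sanity check, confirm that the interpreter now resolves inside the virtual environment:

which python    # macOS/Linux: should print a path ending in venv/bin/python
where python    # Windows: the first result should point into venv\Scripts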

Step 3: Install Python Dependencies

With the virtual environment activated, install the required dependencies:

pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
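
As an optional check that the core packages installed correctly:

python -c "import torch, transformers; print(torch.__version__)"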

Installing eSpeak

eSpeak is a required system dependency for VocRT: the phonemizer package uses it as its backend to convert text into phonemes. Follow the instructions below to install it on your platform:

Ubuntu/Linux

Use the apt-get package manager to install eSpeak:

sudo apt-get update
sudo apt-get install espeak

macOS

Install eSpeak using Homebrew:

  1. Ensure Homebrew is installed on your system:
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  2. Install espeak:
    brew install espeak
    

Windows

For Windows, follow these steps to install eSpeak:

  1. Download the eSpeak installer from the official eSpeak download page.
  2. Run the installer and follow the on-screen instructions to complete the installation.
  3. Add the eSpeak installation path to your system's PATH environment variable:
    • Open System Properties β†’ Advanced β†’ Environment Variables.
    • In the "System Variables" section, find the Path variable and edit it.
    • Add the path to the espeak.exe file (e.g., C:\Program Files (x86)\eSpeak).
  4. Verify the installation: Open Command Prompt and run:
    espeak --version
    

Verification

After installing eSpeak, verify it is correctly set up by running:

espeak "Hello, world!"

This should output "Hello, world!" as audio on your system.


Step 4: Backend Setup (Express Server)

  1. Navigate to the backend directory:

    cd backend
    
  2. Install Node.js dependencies:

    npm install
    
  3. Update the config.env file with your Deepgram API key (a sample entry is shown after these steps):

    • Open config.env in a text editor.
    • Replace <deepgram_api_key> with your actual Deepgram API key.
  4. Start the Express server:

    node app.js
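
The exact contents of config.env are defined by the repository; as an illustration, the Deepgram entry typically looks like the line below. The variable name here is a placeholder; keep the name that already appears in config.env and replace only its value.

# Illustrative only: keep the variable name shipped in config.env and replace just the value.
DEEPGRAM_API_KEY=<deepgram_api_key>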
    

Step 5: Frontend Setup (React Client)

  1. Open a new terminal and navigate to the frontend directory:
    cd frontend
    
  2. Install client dependencies:
    npm install
    
  3. Start the client:
    npm start
    

Step 6: Start the VocRT Server

  1. Add your OpenAI API key to the .env file (a sample entry is shown after these steps):

    • Open .env in a text editor.
    • Replace <openai_api_key> with your actual OpenAI API key.
  2. Start the VocRT server (run from the repository root, with the virtual environment from Step 2 activated):

    python3 app.py
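
As with the backend configuration, the entry below is only illustrative; keep the variable name that ships in the repository's .env file and replace just the placeholder value.

# Illustrative only: replace the placeholder with your real OpenAI API key.
OPENAI_API_KEY=<openai_api_key>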
    

Step 7: Test the Full System

  • Once all servers are running:
    1. Access the React client at http://localhost:3000.
    2. Interact with the VocRT system via the web interface.

Model Used

VocRT uses Kokoro-82M for text-to-speech synthesis, processing user inputs into high-quality voice responses.


Key Features

  1. Realtime voice response generation: Converts speech input into spoken responses with minimal latency.
  2. React Client: A user-friendly frontend for interaction.
  3. Express Backend: Handles API requests and integrates the VocRT system with external services.
  4. gRPC Communication: Seamless communication between the VocRT server and other components.
  5. Configurable APIs: Integrates the Deepgram API for speech recognition and the OpenAI API for text generation.

Dependencies

Python:

  • torch, torchvision, torchaudio
  • phonemizer
  • transformers
  • scipy
  • munch
  • python-dotenv
  • openai
  • grpcio, grpcio-tools
  • eSpeak (system package, installed separately as described above)

Node.js:

  • Express server dependencies (npm install in backend).
  • React client dependencies (npm install in frontend).

Contributing

Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.


Acknowledgments

  • Hugging Face for hosting the Kokoro-82M model.
  • The amazing communities behind PyTorch, OpenAI, and Deepgram APIs.