VocRT
This repository contains the complete codebase for building your own real-time Voice-to-Voice (V2V) solution. It integrates the Kokoro-82M TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system.
Repository Structure
├── backend/ # Express server for handling API requests
├── frontend/ # React client for user interaction
├── .env # Environment variables (OpenAI API key, etc.)
├── voices # All available voices
├── demo # Contains sample audio and demo files
└── other...
Docker
🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
Setup Guide
Step 1: Clone the Repository
Clone this repository to your local machine:
git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT
Step 2: Python Virtual Environment Setup
Create a virtual environment to manage dependencies:
macOS/Linux:
python3 -m venv venv
source venv/bin/activate
Windows:
python -m venv venv
venv\Scripts\activate
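To confirm that the virtual environment is actually active before installing anything, a quick check can be run from Python. This is a minimal sketch; the function name in_virtualenv is illustrative and not part of VocRT.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints False, activate the environment again with the command for your platform before continuing.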
Step 3: Install Python Dependencies
With the virtual environment activated, install the required dependencies:
pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
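After the install commands finish, it can help to check that every package is visible to the interpreter. The sketch below is not part of VocRT; the helper name missing_modules is hypothetical, and the list simply mirrors the pip commands above using each package's import name.

```python
import importlib.util

# Import names differ from some pip package names:
# python-dotenv -> dotenv, grpcio -> grpc, grpcio-tools -> grpc_tools.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "phonemizer", "transformers",
    "scipy", "munch", "dotenv", "openai", "grpc", "grpc_tools",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can be reinstalled with pip inside the activated virtual environment.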
Installing eSpeak
eSpeak is a required dependency of the VocRT system. Follow the instructions below to install it on your platform:
Ubuntu/Linux
Use the apt-get package manager to install eSpeak:
sudo apt-get update
sudo apt-get install espeak
macOS
Install eSpeak using Homebrew:
- Ensure Homebrew is installed on your system:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install espeak:
brew install espeak
Windows
For Windows, follow these steps to install eSpeak:
- Download the eSpeak installer from the official website: eSpeak Downloads.
- Run the installer and follow the on-screen instructions to complete the installation.
- Add the eSpeak installation path to your system's PATH environment variable:
  - Open System Properties → Advanced → Environment Variables.
  - In the "System Variables" section, find the Path variable and edit it.
  - Add the path to the espeak.exe file (e.g., C:\Program Files (x86)\eSpeak).
- Verify the installation:
Open Command Prompt and run:
espeak --version
Verification
After installing eSpeak, verify it is correctly set up by running:
espeak "Hello, world!"
This should speak "Hello, world!" through your system's audio output.
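If the command is not recognized, the usual cause is that espeak is not on the PATH. A minimal sketch of that check from Python follows; the helper name find_executable is illustrative and not part of VocRT.

```python
import shutil


def find_executable(name: str):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable("espeak")
    if path is None:
        print("espeak not found on PATH")
    else:
        print("espeak found at:", path)
```

On Windows, open a new Command Prompt after editing the Path variable so the change is picked up.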
Step 4: Backend Setup (Express Server)
- Navigate to the backend directory:
cd backend
- Install Node.js dependencies:
npm install
- Update the config.env file with your Deepgram API key:
  - Open config.env in a text editor.
  - Replace <deepgram_api_key> with your actual Deepgram API key.
- Start the Express server:
node app.js
Step 5: Frontend Setup (React Client)
- Open a new terminal and navigate to the frontend directory:
cd frontend
- Install client dependencies:
npm install
- Start the client:
npm start
Step 6: Start the VocRT Server
- Add your OpenAI API key to the .env file:
  - Open .env in a text editor.
  - Replace <openai_api_key> with your actual OpenAI API key.
- Start the VocRT server:
python3 app.py
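A common cause of startup failures is a placeholder such as <openai_api_key> left in the file. The sketch below scans a .env file for values that are empty or still in angle brackets. It assumes plain KEY=VALUE lines; the key name OPENAI_API_KEY in the usage note is a guess, and both helper names are hypothetical, so check the actual .env in the repository.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_keys(values: dict) -> list:
    """Return keys whose value is empty or still an <angle_bracket> placeholder."""
    return [
        key for key, value in values.items()
        if not value or (value.startswith("<") and value.endswith(">"))
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("unfilled:", unfilled_keys(parse_env(env_path.read_text())))
    else:
        print(".env not found in the current directory")
```

For example, a file containing the line OPENAI_API_KEY=<openai_api_key> would be reported as unfilled. The same check can be pointed at backend/config.env for the Deepgram key.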
Step 7: Test the Full System
Once all servers are running:
- Access the React client at http://localhost:3000.
- Interact with the VocRT system via the web interface.
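If the page does not load, a quick way to see whether the React client is listening is to attempt a TCP connection to its port. This is a minimal sketch with a hypothetical helper name; only port 3000 comes from this guide, and the ports used by the Express and VocRT servers are not stated here, so they are not checked.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 3000 is the React client port named in this guide.
    print("React client reachable:", port_open("localhost", 3000))
```

A False result usually means the client has not finished starting or was started on a different port.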
Model Used
VocRT uses Kokoro-82M for text-to-speech synthesis, processing user inputs into high-quality voice responses.
Key Features
- Realtime voice response generation: Convert speech input into spoken responses with minimal latency.
- React Client: A user-friendly frontend for interaction.
- Express Backend: Handles API requests and integrates the VocRT system with external services.
- gRPC Communication: Seamless communication between the VocRT server and other components.
- Configurable APIs: Integrates with the Deepgram and OpenAI APIs for speech recognition and text generation.
Dependencies
Python:
- torch, torchvision, torchaudio
- phonemizer
- transformers
- scipy
- munch
- python-dotenv
- openai
- grpcio, grpcio-tools
- espeak (system package, installed separately as described in Installing eSpeak)
Node.js:
- Express server dependencies (npm install in backend).
- React client dependencies (npm install in frontend).
Contributing
Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.
Acknowledgments
- Hugging Face for hosting the Kokoro-82M model.
- The amazing communities behind PyTorch, OpenAI, and Deepgram APIs.