PHASE 2 // MULTIMODAL ORCHESTRATION & DUAL CLIENTS - VTUBE STUDIO

MULTIMODAL INTELLIGENCE / SYSTEMS

ABOUT THIS PROJECT

A modular, local-first conversational companion and Discord assistant framework, written in asynchronous Python and designed to run entirely on consumer-grade hardware.

The system separates cognitive processing (local LLM inference through Ollama) from expressive voice cloning (GPT-SoVITS autoregressive TTS) in a decoupled microservice architecture. It handles real-time speech recognition with Faster-Whisper, masters the synthesized speech in real time through a Pedalboard DSP signal chain, and streams frequency data to the VTube Studio API to drive 2D/3D blendshapes dynamically.
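The frequency-to-blendshape step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (band_rms, mouth_open, build_inject_request), the 300-3000 Hz speech band, and the gain of 2.0 are all assumptions chosen for the example. The message layout follows the public VTube Studio API's InjectParameterDataRequest as commonly documented and should be checked against the current API docs.

```python
import json
import math
import uuid


def band_rms(frame, sample_rate, lo_hz, hi_hz):
    """RMS level of the frame restricted to DFT bins in [lo_hz, hi_hz].

    Uses a naive DFT for clarity; a real pipeline would use an FFT.
    """
    n = len(frame)
    power = 0.0
    for k in range(1, n // 2):
        if not lo_hz <= k * sample_rate / n <= hi_hz:
            continue
        w = 2.0 * math.pi * k / n
        re = sum(x * math.cos(w * i) for i, x in enumerate(frame))
        im = sum(x * math.sin(w * i) for i, x in enumerate(frame))
        # Factor 2 accounts for the mirrored negative-frequency bin.
        power += 2.0 * (re * re + im * im) / (n * n)
    return math.sqrt(power)


def mouth_open(frame, sample_rate, gain=2.0):
    """Map speech-band level to a 0..1 value (band and gain are illustrative)."""
    level = band_rms(frame, sample_rate, 300.0, 3000.0)
    return max(0.0, min(1.0, gain * level))


def build_inject_request(value, param_id="MouthOpen"):
    """Build an InjectParameterDataRequest message as a JSON string."""
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": uuid.uuid4().hex,
        "messageType": "InjectParameterDataRequest",
        "data": {"parameterValues": [{"id": param_id, "value": value}]},
    })
```

In a running system, each short frame of synthesized audio would pass through mouth_open, and the resulting JSON string would be sent over an already authenticated WebSocket connection to VTube Studio. Connection handling and the authentication handshake are omitted here.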

The engine also implements a pluggable dual-client topology: a lightweight Discord bot client runs concurrently with the desktop companion, and both route their interactions through a shared, CUDA-accelerated central brain and a unified PostgreSQL/pgvector memory store used for semantic retrieval (RAG).
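One way to picture the dual-client routing is a single worker that owns the model and memory, fed by a shared asyncio queue. The sketch below is hypothetical: the Request type, the client labels, and the echo reply stand in for the real memory lookup and LLM call, which are not shown in this description.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Request:
    client: str            # hypothetical label, e.g. "desktop" or "discord"
    text: str
    reply: asyncio.Future  # resolved by the brain worker


async def brain_worker(queue: asyncio.Queue) -> None:
    """Single consumer, so both clients share one model instance."""
    while True:
        req = await queue.get()
        try:
            # Placeholder for memory retrieval plus the LLM call.
            req.reply.set_result(f"[{req.client}] echo: {req.text}")
        finally:
            queue.task_done()


async def ask(queue: asyncio.Queue, client: str, text: str) -> str:
    """Client-side helper: enqueue a request and wait for its reply."""
    reply = asyncio.get_running_loop().create_future()
    await queue.put(Request(client, text, reply))
    return await reply


async def demo() -> list:
    queue = asyncio.Queue()
    worker = asyncio.create_task(brain_worker(queue))
    replies = await asyncio.gather(
        ask(queue, "desktop", "hello"),
        ask(queue, "discord", "ping"),
    )
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return list(replies)
```

Funnelling both clients through one queue keeps GPU access serialized, so neither client needs to know whether the other is active. A real deployment would likely put this behind a network boundary, since the description mentions a microservice split.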