Voice Control
OSIA's voice system uses local speech-to-text (faster-whisper) for privacy and multiple TTS providers for natural responses.
Wake Word System
- Detection: openWakeWord with custom ML verifier
- Threshold: 0.85 confidence, sustained for 8 consecutive frames (patience)
- Cooldown: 1.5 seconds between activations
- Dataset: ~294 positive and ~500 negative samples (synthetic)
- Model: osia_verifier.pkl
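The gating rules above (threshold, patience, cooldown) can be sketched as a small state machine. This is a minimal, hypothetical illustration in plain Python, not OSIA's actual code. `WakeWordGate` and its `verifier` hook are invented names. The per-frame scores are assumed to come from openWakeWord, and the 80 ms frame length in the usage example is an assumption. Because the interface of the pickled verifier is not documented here, it is modeled as a simple callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class WakeWordGate:
    """Hypothetical gate: threshold + patience + cooldown (not OSIA's real class)."""

    threshold: float = 0.85   # minimum per-frame confidence
    patience: int = 8         # consecutive frames that must reach the threshold
    cooldown_s: float = 1.5   # minimum seconds between activations
    # Stand-in for the second-stage verifier (e.g. a model loaded from
    # osia_verifier.pkl); its real interface is an assumption here.
    verifier: Optional[Callable[[], bool]] = None
    _streak: int = field(default=0, init=False)
    _last_fire: float = field(default=float("-inf"), init=False)

    def update(self, score: float, now: float) -> bool:
        """Feed one frame's score at time `now` (seconds); return True on activation."""
        self._streak = self._streak + 1 if score >= self.threshold else 0
        if self._streak < self.patience:
            return False
        if now - self._last_fire < self.cooldown_s:
            return False  # still cooling down; keep the streak and retry next frame
        self._streak = 0  # require a fresh run of frames before the next activation
        if self.verifier is not None and not self.verifier():
            return False  # second stage rejected the candidate
        self._last_fire = now
        return True


# Usage: a sustained high score, assuming 80 ms frames.
gate = WakeWordGate()
scores = [0.9] * 40
fired = [i for i, s in enumerate(scores) if gate.update(s, i * 0.08)]
print(fired)  # [7, 26]
```

With a constant score of 0.9, the first activation comes on the eighth frame. The cooldown then suppresses further activations until 1.5 seconds have passed, even though the score never drops.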
Speech-to-Text (STT)
OSIA uses faster-whisper for local STT processing. The model size is configurable; "small" is recommended as a balance between speed and accuracy.
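As a minimal sketch of how these settings might be consumed, the Python below maps the configuration keys onto faster-whisper's `WhisperModel` and its `transcribe()` method. The mapping is an assumption, not OSIA's code. `stt_settings` and `transcribe` are hypothetical helpers. Treating "auto" as `language=None` relies on faster-whisper detecting the language when none is given. The `device="cpu"` and `compute_type="int8"` values are illustrative choices.

```python
import json
from typing import Any, Dict, Optional

CONFIG_JSON = """
{
  "stt_provider": "local",
  "stt_model_size": "small",
  "stt_language": "auto"
}
"""


def stt_settings(cfg: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Translate OSIA-style STT keys into faster-whisper arguments (assumed mapping)."""
    if cfg.get("stt_provider", "local") != "local":
        raise ValueError("this sketch only covers the local faster-whisper provider")
    lang = cfg.get("stt_language", "auto")
    return {
        "model_size": cfg.get("stt_model_size", "small"),
        # faster-whisper auto-detects the language when language=None
        "language": None if lang == "auto" else lang,
    }


def transcribe(path: str, cfg: Dict[str, Any]) -> str:
    """Transcribe an audio file locally; requires `pip install faster-whisper`."""
    # Imported lazily so the config mapping above works without the package.
    from faster_whisper import WhisperModel

    settings = stt_settings(cfg)
    model = WhisperModel(settings["model_size"], device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, language=settings["language"])
    return " ".join(segment.text.strip() for segment in segments)


settings = stt_settings(json.loads(CONFIG_JSON))
print(settings)  # {'model_size': 'small', 'language': None}
```

Importing `faster_whisper` inside `transcribe()` keeps the configuration logic testable on machines without the model installed. The corresponding OSIA configuration looks like this: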
{
  "stt_provider": "local",
  "stt_model_size": "small",
  "stt_language": "auto"
}
Text-to-Speech (TTS)
Multiple TTS providers are supported, with automatic fallback:
- Gemini TTS: High quality, low latency (recommended)
- Kokoro: Local TTS option
- OpenAI TTS: Premium voices
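Automatic fallback can be illustrated with a provider chain that tries each backend in turn and moves on when one raises an error. This is a hypothetical sketch. The `Synthesizer` signature, the stub functions, and the assumption that fallback follows the order listed above are all illustrative. No real Gemini, Kokoro, or OpenAI API calls are made.

```python
from typing import Callable, List, Tuple

# Hypothetical provider signature: text in, audio bytes out.
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, providers: List[Tuple[str, Synthesizer]]
) -> Tuple[str, bytes]:
    """Try each provider in order; return (name, audio) from the first that succeeds."""
    errors = []
    for name, synth in providers:
        try:
            return name, synth(text)
        except Exception as exc:  # any provider failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))


# Stub providers standing in for the real backends (assumptions, not real APIs).
def gemini_stub(text: str) -> bytes:
    raise ConnectionError("API unreachable")


def kokoro_stub(text: str) -> bytes:
    return b"PCM:" + text.encode()


def openai_stub(text: str) -> bytes:
    return b"MP3:" + text.encode()


chain = [("gemini", gemini_stub), ("kokoro", kokoro_stub), ("openai", openai_stub)]
name, audio = synthesize_with_fallback("Hello", chain)
print(name)  # kokoro
```

Collecting each provider's error message means that, if the whole chain fails, the raised error explains why every backend was skipped.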