Documentation

Everything you need to know about OSIA

Voice Control

OSIA's voice system uses local speech-to-text (faster-whisper) for privacy and multiple TTS providers for natural responses.

Wake Word System

  • Detection: openWakeWord with custom ML verifier
  • Threshold: 0.85 confidence, held for 8 consecutive frames (patience)
  • Cooldown: 1.5 seconds between activations
  • Dataset: ~294 positive + ~500 negative samples (synthetic)
  • Model: osia_verifier.pkl
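
The way the threshold, patience, and cooldown values interact can be illustrated with a small sketch. This is not OSIA's actual code: the WakeGate class and its update method are hypothetical names, and the sketch assumes "patience" means consecutive frames at or above the threshold. Only the three numeric values come from the list above.

```python
from dataclasses import dataclass


@dataclass
class WakeGate:
    """Hypothetical gate combining threshold, patience, and cooldown."""

    threshold: float = 0.85   # minimum confidence per frame
    patience: int = 8         # consecutive qualifying frames required
    cooldown_s: float = 1.5   # quiet period after an activation
    _streak: int = 0
    _last_fire: float = float("-inf")

    def update(self, score: float, now: float) -> bool:
        """Feed one frame's score and a timestamp in seconds.

        Returns True only on the frame that triggers an activation.
        """
        # Ignore everything while the cooldown is still running.
        if now - self._last_fire < self.cooldown_s:
            self._streak = 0
            return False
        # Count consecutive frames at or above the threshold.
        if score >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.patience:
            self._streak = 0
            self._last_fire = now
            return True
        return False
```

In use, each detector score would be passed to update together with a monotonic timestamp. A single dip below 0.85 resets the streak, so brief spikes do not trigger an activation.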

Speech-to-Text (STT)

OSIA uses faster-whisper for local STT processing. The model size is configurable; small is recommended as a balance between speed and accuracy.

{
  "stt_provider": "local",
  "stt_model_size": "small",
  "stt_language": "auto"
}
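
As a rough illustration, the settings above could be passed to faster-whisper as follows. The helper names stt_settings and transcribe are hypothetical, the device and compute_type values are assumptions rather than OSIA defaults, and mapping "auto" to None relies on faster-whisper detecting the language when none is given.

```python
import json

CONFIG_TEXT = """
{
  "stt_provider": "local",
  "stt_model_size": "small",
  "stt_language": "auto"
}
"""


def stt_settings(config: dict) -> dict:
    """Translate the config keys into arguments for faster-whisper."""
    if config.get("stt_provider") != "local":
        raise ValueError("this sketch only covers the local provider")
    language = config.get("stt_language", "auto")
    return {
        "model_size": config.get("stt_model_size", "small"),
        # faster-whisper detects the language itself when language is None.
        "language": None if language == "auto" else language,
    }


def transcribe(path: str, config: dict) -> str:
    """Transcribe an audio file; requires the faster-whisper package."""
    from faster_whisper import WhisperModel  # imported lazily

    settings = stt_settings(config)
    # device and compute_type are assumed values, not taken from OSIA.
    model = WhisperModel(settings["model_size"], device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, language=settings["language"])
    return " ".join(segment.text.strip() for segment in segments)
```

Calling transcribe("clip.wav", json.loads(CONFIG_TEXT)) would download the small model on first use and return the recognized text as a single string.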

Text-to-Speech (TTS)

Multiple TTS providers are supported, with automatic fallback:

  • Gemini TTS: High quality, low latency (recommended)
  • Kokoro: Local TTS option
  • OpenAI TTS: Premium voices
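
Automatic fallback across the providers listed above might look roughly like the sketch below. It is illustrative only: speak_with_fallback and the stub functions are hypothetical, each provider is modelled as a plain callable that returns audio bytes, and the order tried is an assumption, since the actual fallback order is not documented here.

```python
from typing import Callable, Sequence

# A provider is modelled as a (name, synthesize) pair; synthesize returns audio bytes.
Provider = tuple[str, Callable[[str], bytes]]


def speak_with_fallback(text: str, providers: Sequence[Provider]) -> tuple[str, bytes]:
    """Try each provider in order; return (name, audio) from the first success."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any provider failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def gemini_stub(text: str) -> bytes:
    raise ConnectionError("network unavailable")


def kokoro_stub(text: str) -> bytes:
    return b"audio:" + text.encode("utf-8")


used, audio = speak_with_fallback(
    "Hello", [("gemini", gemini_stub), ("kokoro", kokoro_stub)]
)
```

Here the first provider fails, so the call falls through to the local one and used ends up as "kokoro". Collecting the error messages keeps the final exception informative when every provider fails.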