Documentation

Everything you need to know about OSIA

Voice Control

OSIA's voice system uses local speech-to-text (faster-whisper) for privacy and multiple TTS providers for natural responses.

Wake Word System

Detection: openWakeWord with custom ML verifier
Threshold: 0.85 confidence, with a patience of 8 frames (the score must stay above the threshold for 8 consecutive frames)
Cooldown: 1.5 seconds between activations
Dataset: ~294 positive + ~500 negative samples (synthetic)
Model: osia_verifier.pkl
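The three gating settings above (threshold, patience, cooldown) can be illustrated with a small state machine. This is a sketch under assumptions, not OSIA's code. WakeWordGate and FRAME_SECONDS are hypothetical names, the per-frame scores would come from openWakeWord, and the custom verifier stage is left out. It assumes openWakeWord's 80 ms frames, so a patience of 8 frames means roughly 0.64 seconds of sustained confidence.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: one score per 80 ms frame (1280 samples at 16 kHz), as in openWakeWord.
FRAME_SECONDS = 0.08


@dataclass
class WakeWordGate:
    """Hypothetical gate combining a score threshold, a patience window and a cooldown.

    The defaults mirror the documented settings; the class is an illustration only.
    """
    threshold: float = 0.85   # minimum detector confidence for a frame to count
    patience: int = 8         # consecutive qualifying frames required to activate
    cooldown_s: float = 1.5   # minimum seconds between two activations
    _streak: int = field(default=0, init=False)
    _last_fire: Optional[float] = field(default=None, init=False)

    def update(self, score: float, now: float) -> bool:
        """Feed one frame's score at time `now` (seconds); return True on activation."""
        # Count consecutive frames at or above the threshold; any dip resets the count.
        if score >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak < self.patience:
            return False
        # Suppress activations that arrive before the cooldown has elapsed.
        if self._last_fire is not None and now - self._last_fire < self.cooldown_s:
            return False
        self._last_fire = now
        self._streak = 0  # require a fresh run of frames for the next activation
        return True


if __name__ == "__main__":
    gate = WakeWordGate()
    # Ten consecutive high-confidence frames: activation happens on the 8th (index 7).
    for i in range(10):
        if gate.update(0.92, now=i * FRAME_SECONDS):
            print(f"wake word at frame {i}")  # prints "wake word at frame 7"
```

Requiring several consecutive frames trades a little latency for fewer false triggers from short, loud noises, while the cooldown prevents one utterance from activating the assistant twice.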

Speech-to-Text (STT)

OSIA uses faster-whisper for local STT processing. The model size is configurable; "small" is recommended as a balance between transcription accuracy and speed. The relevant settings are:

{
  "stt_provider": "local",
  "stt_model_size": "small",
  "stt_language": "auto"
}
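A minimal sketch of how these settings could map onto the faster-whisper Python API. The helper names whisper_settings and transcribe are hypothetical, and treating "auto" as language=None (which tells faster-whisper to detect the spoken language itself) is an assumption about how OSIA interprets the value. WhisperModel and its transcribe method are real faster-whisper APIs; the CPU/int8 choice is only illustrative.

```python
from typing import Any, Dict, Optional, Tuple


def whisper_settings(config: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Translate the STT settings into (model_size, language) for faster-whisper.

    Assumption: "auto" maps to language=None, i.e. automatic language detection.
    """
    if config.get("stt_provider", "local") != "local":
        raise ValueError("this sketch only covers the local faster-whisper provider")
    model_size = config.get("stt_model_size", "small")
    language = config.get("stt_language", "auto")
    return model_size, (None if language == "auto" else language)


def transcribe(path: str, config: Dict[str, Any]) -> str:
    """Transcribe an audio file locally; requires `pip install faster-whisper`."""
    # Imported lazily so whisper_settings() works even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model_size, language = whisper_settings(config)
    # int8 on CPU keeps memory use low; a GPU setup would use device="cuda".
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, language=language)
    # segments is a generator: decoding actually happens while it is consumed.
    return " ".join(segment.text.strip() for segment in segments)
```

For example, transcribe("command.wav", {"stt_model_size": "small", "stt_language": "auto"}) would fetch the small model on first use and return the recognized text; "command.wav" is a placeholder path.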

Text-to-Speech (TTS)

OSIA supports multiple TTS providers with automatic fallback:

Gemini TTS: High quality, low latency (recommended)
Kokoro: Local TTS option
OpenAI TTS: Premium voices
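Automatic fallback amounts to trying providers in priority order until one succeeds. The sketch below assumes the providers are tried in the order listed above and that each provider is a function that returns audio bytes or raises an exception. synthesize_with_fallback and the provider functions are hypothetical stand-ins, not the real Gemini, Kokoro or OpenAI clients, and OSIA's actual ordering and error handling may differ.

```python
from typing import Callable, List, Tuple

# A provider takes text and returns synthesized audio bytes, raising on failure.
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(text: str, providers: List[Tuple[str, Synthesizer]]) -> Tuple[str, bytes]:
    """Try each provider in order; return (provider_name, audio) from the first success."""
    errors: List[str] = []
    for name, synth in providers:
        try:
            return name, synth(text)
        except Exception as exc:  # e.g. network outage, missing API key, local model error
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    def gemini_tts(text: str) -> bytes:  # hypothetical stand-in simulating an outage
        raise ConnectionError("service unavailable")

    def kokoro_tts(text: str) -> bytes:  # hypothetical stand-in for a local engine
        return text.encode("utf-8")

    def openai_tts(text: str) -> bytes:  # hypothetical stand-in, never reached here
        return b"openai-audio"

    chain = [("gemini", gemini_tts), ("kokoro", kokoro_tts), ("openai", openai_tts)]
    name, audio = synthesize_with_fallback("Hello", chain)
    print(name)  # prints "kokoro"
```

Catching a broad Exception keeps one failing provider from ending the response, and collecting each error message makes it clear why every provider was skipped when none succeeds.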
