Sub-modality profiles
| Profile | Use case |
|---|---|
| Real-Time Voice Assistant | Low-latency conversational / command endpoints |
| High-Noise / Industrial | Factory floors, vehicles, call centres |
| Batch Transcription / STT | Offline Whisper-style segmentation (~28 s max chunk) |
| Quiet Studio / Whisperer | Distant mic, soft speech, long pauses |
| Custom | Manual Silero parameters |
Input / output
Input: audio from Audio Track (any supported encoding; adapted to int16 @ 16 kHz mono).
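As a rough illustration of the "adapted to int16 mono" step, the sketch below downmixes interleaved float frames to mono and quantises them to int16. The function name `to_int16_mono` and the float input range of [-1.0, 1.0] are assumptions made for this example. Resampling to 16 kHz is deliberately left out, and the component's actual conversion path may differ.

```python
from typing import List, Sequence


def to_int16_mono(frames: Sequence[Sequence[float]]) -> List[int]:
    """Downmix float frames to mono and quantise to int16.

    Each frame holds one sample per channel, assumed to lie in [-1.0, 1.0].
    This is a hypothetical helper, not the component's own code.
    """
    out: List[int] = []
    for frame in frames:
        # Average the channels to get a single mono sample.
        mono = sum(frame) / len(frame)
        # Clip so out-of-range input cannot overflow the int16 range.
        mono = max(-1.0, min(1.0, mono))
        # Scale to the symmetric int16 range [-32767, 32767].
        out.append(int(round(mono * 32767)))
    return out


if __name__ == "__main__":
    stereo = [[1.0, 1.0], [0.5, -0.5], [-1.0, -1.0]]
    print(to_int16_mono(stereo))
```

Scaling by 32767 rather than 32768 keeps the output symmetric around zero, which is one common convention among several.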
Outputs (every chunk while listening):
Output buffer preset (optional)
After a full utterance is detected, audio can be re-chunked for downstream consumers:

| Preset | Duration | Samples @ 16 kHz |
|---|---|---|
| None | Full segment | — |
| Wake Word Engine | 80 ms | 1280 |
| Speech-To-Text | 4 s | 64000 |
| Custom | User-defined | — |
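The sample counts in the table follow directly from duration times sample rate (80 ms × 16 000 Hz = 1280 samples, 4 s × 16 000 Hz = 64 000 samples). The sketch below reproduces that arithmetic and shows one way a finished utterance could be split into fixed-size buffers. The names `samples_for` and `rechunk` are hypothetical, and leaving the final chunk short is an assumption, since the handling of a trailing partial buffer is not specified here.

```python
from typing import Iterator, List, Optional, Sequence

SAMPLE_RATE_HZ = 16_000  # audio is adapted to 16 kHz mono

# Preset durations in milliseconds; None means "emit the full segment".
PRESET_MS = {
    "None": None,
    "Wake Word Engine": 80,
    "Speech-To-Text": 4000,
}


def samples_for(duration_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a buffer duration in milliseconds to a sample count."""
    return sample_rate_hz * duration_ms // 1000


def rechunk(segment: Sequence[int],
            chunk_samples: Optional[int]) -> Iterator[List[int]]:
    """Yield consecutive chunks of `chunk_samples` samples from `segment`.

    With chunk_samples=None the whole segment is yielded once.
    The last chunk may be shorter than the others (an assumption of this
    sketch; a real consumer might expect zero-padding instead).
    """
    if chunk_samples is None:
        yield list(segment)
        return
    if chunk_samples <= 0:
        raise ValueError("chunk_samples must be positive")
    for start in range(0, len(segment), chunk_samples):
        yield list(segment[start:start + chunk_samples])


if __name__ == "__main__":
    utterance = [0] * 3000  # 187.5 ms of silence at 16 kHz
    size = samples_for(PRESET_MS["Wake Word Engine"])
    print(size, [len(c) for c in rechunk(utterance, size)])
```

A Custom preset would pass its own duration to `samples_for` in the same way.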