The Wake Word Engine is an edge-only voice trigger. It sleeps until it hears a selected wake word, then streams PCM int16 @ 16 kHz mono downstream (typically to Call Model / Whisper) until you stop speaking. It uses openWakeWord internally.

Typical pipeline

Audio Track (80 ms buffer) → Wake Word Engine → Call Model (Whisper) → Send Alert
Set the Audio Track buffer preset to Wake Word (80 ms), not Speech-To-Text (4 s). A 4 s upstream buffer invokes scoring only about once every 4 seconds, which makes short phrases like “alexa” easy to miss. Compilation rejects a workflow in which an Audio Track using the Speech-To-Text preset feeds the Wake Word Engine.
| Stage | Recommended buffer |
| --- | --- |
| Audio Track → Wake Word (input) | Wake Word: 80 ms / 1,280 samples |
| Wake Word → Whisper (output while AWAKE) | Speech-To-Text: 4 s / 64,000 samples |

Built-in wake words (only these six)

Legacy ids such as hey_siri are not in openWakeWord v0.5.x and are stripped at compile and runtime.
| Model ID | Phrase to speak |
| --- | --- |
| alexa | “alexa” |
| hey_mycroft | “hey mycroft” |
| hey_jarvis | “hey jarvis” |
| hey_rhasspy | “hey rhasspy” |
| weather | “what’s the weather” |
| timer | “set a 10 minute timer” |
Select multiple words in the UI — the engine activates on any of them.

Detection threshold

One Detection Threshold in the workflow editor applies to every selected wake-word model (default 0.5, range 0–1). A frame triggers AWAKE when any loaded model’s score is ≥ that value.
| Setting | Where | Default |
| --- | --- | --- |
| Detection Threshold | Wake Word node → Detection Parameters | 0.5 |
There is no per-model threshold override in the engine — tune this single knob if a phrase is too sensitive or too hard to trigger.
Pretrained models do not peak at the same level on identical audio. For example, committed test clips often score ~1.0 for alexa, ~0.68 for hey_mycroft, and ~0.41 for hey_jarvis. If hey_jarvis (or hey_rhasspy) rarely fires at 0.5, lower the threshold (e.g. 0.35–0.40) or switch models — do not expect the engine to auto-adjust per model.
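To make the effect of the single shared threshold concrete, the sketch below applies the rule from this section to the example peak scores quoted above. The helper name `models_that_fire` is hypothetical and is not part of the engine.

```python
# Illustrative only: models_that_fire is a hypothetical helper, not engine code.
EXAMPLE_PEAK_SCORES = {"alexa": 1.0, "hey_mycroft": 0.68, "hey_jarvis": 0.41}


def models_that_fire(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the models whose score meets the single shared threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0, 1]")
    return [name for name, score in scores.items() if score >= threshold]


print(models_that_fire(EXAMPLE_PEAK_SCORES))        # ['alexa', 'hey_mycroft']
print(models_that_fire(EXAMPLE_PEAK_SCORES, 0.40))  # ['alexa', 'hey_mycroft', 'hey_jarvis']
```

Lowering the threshold lets hey_jarvis fire, but it also lowers the bar for every other selected model, which is the trade-off of having one knob.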

Two-state machine

SLEEPING

  • Audio is scored in 80 ms frames (1,280 samples @ 16 kHz).
  • Downstream nodes receive no speech audio (only state metadata).
  • A lookback deque (default 500 ms) is used internally for detection only — it is not forwarded to STT.
  • When any selected model score ≥ threshold (default 0.5), the engine switches to AWAKE.

AWAKE

  • After detection, the engine skips the wake phrase (default 500 ms, Wake Word Trim) then buffers command-only audio.
  • One audio output is emitted per session (after silence timeout or listening limit), without the wake word in the PCM.
  • Listening ends per listening policy (see below), then state returns to SLEEPING.
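The two states above can be sketched as a small state machine. This is a simplified illustration, not the engine's implementation: the class name `WakeStateSketch` is hypothetical, silence is supplied by the caller as an `is_silent` flag (the source does not say how silence is detected), the trim is applied at whole-frame granularity, and the listening limit is measured on buffered command audio.

```python
from dataclasses import dataclass, field
from typing import Optional

FRAME_MS = 80  # one scoring frame: 1,280 samples @ 16 kHz


@dataclass
class WakeStateSketch:
    """Hypothetical two-state sketch; not the engine's actual implementation."""

    threshold: float = 0.5
    trim_ms: int = 500                     # Wake Word Trim
    silence_timeout_ms: int = 1500         # Silence Timeout
    listen_limit_ms: Optional[int] = None  # Limit listening time (off when None)
    state: str = "SLEEPING"
    awake_ms: int = 0
    silent_ms: int = 0
    command: list = field(default_factory=list)

    def step(self, frame, scores: dict, is_silent: bool):
        """Consume one frame; return the command frames when a session ends, else None."""
        if self.state == "SLEEPING":
            # Any selected model reaching the shared threshold wakes the engine.
            if scores and max(scores.values()) >= self.threshold:
                self.state, self.awake_ms, self.silent_ms, self.command = "AWAKE", 0, 0, []
            return None

        self.awake_ms += FRAME_MS
        if self.awake_ms <= self.trim_ms:
            return None  # still skipping the wake phrase

        self.command.append(frame)
        self.silent_ms = self.silent_ms + FRAME_MS if is_silent else 0
        if self.listen_limit_ms is not None:
            done = len(self.command) * FRAME_MS >= self.listen_limit_ms
        else:
            done = self.silent_ms >= self.silence_timeout_ms
        if done:
            self.state = "SLEEPING"
            return self.command  # one output per session, wake phrase excluded
        return None
```

Calling `step` once per 80 ms frame returns `None` for every frame except the one that closes a session, which mirrors the "one audio output per session" behavior described above.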

Prediction keys (important)

openWakeWord stores files like alexa_v0.1.onnx, but predict() returns scores under short keys such as alexa. Compiled workers use prediction_keys = list(oww_model.models.keys()) so scores are read correctly. If logs show words=['alexa_v0.1'] without prediction_keys=['alexa'], the edge worker is stale — recompile and sync the workflow.
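The key mismatch is easy to reproduce without the library. In the sketch below, plain dicts stand in for `oww_model.models` and for the dict returned by `predict()`; `read_scores` is a hypothetical helper, and defaulting a missing key to 0.0 is an assumption about how a stale worker might behave.

```python
# Hypothetical illustration: plain dicts stand in for openWakeWord objects.
loaded_models = {"alexa": object()}  # stands in for oww_model.models
predictions = {"alexa": 0.93}        # stands in for the dict returned by predict()

file_stems = ["alexa_v0.1"]                   # names derived from the .onnx files
prediction_keys = list(loaded_models.keys())  # same expression as the compiled worker


def read_scores(predictions: dict[str, float], keys: list[str]) -> dict[str, float]:
    """Look up each key, defaulting to 0.0 when the key is absent (an assumption)."""
    return {key: predictions.get(key, 0.0) for key in keys}


print(read_scores(predictions, file_stems))       # {'alexa_v0.1': 0.0}
print(read_scores(predictions, prediction_keys))  # {'alexa': 0.93}
```

Reading by file stem never finds a score, so the threshold is never reached, which is why a stale worker appears deaf to the wake word.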

Model download (three layers)

Only the wake words selected in the workflow are provisioned — not the full catalog.
| When | What runs |
| --- | --- |
| Workflow compile (cloud) | openwakeword.utils.download_models(model_names=[...]) + load probe. Fails compile if openwakeword is missing on Django or download fails. /compile returns openwakeword_models. |
| Worker import (edge) | Top of wf_*.py: download_models(model_names=_CW_WW_WORKFLOW_WAKE_WORDS) then _cw_ww_ensure_models_downloaded(...). |
| Engine init (edge) | First audio frame: same download (idempotent) before Model(wakeword_models=...). |
Edge-sync model_requirements includes wake_word_models: ["alexa", ...] and edge_package: ml-wakeword. The official cyberwaveos/edge-ml-worker image also pre-bakes all built-in models at image build time for air-gapped use.

Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| Wake Words | hey_mycroft | Multi-select built-in models (see table above). |
| Detection Threshold | 0.5 | Minimum score in [0, 1], shared by all selected models; any model reaching it triggers AWAKE (workflow editor → Detection Parameters). |
| Limit listening time | Off | When on, forward audio only for N seconds (default 2 s), then return to SLEEPING. When off, forward until silence timeout (default 1.5 s). |
| Silence Timeout (s) | 1.5 | Used when limit listening is off. |
| Wake Word Trim (ms) | 500 | Audio discarded after detection before STT (excludes wake phrase). |
| Output Buffer Preset | wake-word | Chunks emitted while AWAKE (see presets). |
| Digital twin (signaling) | | Twin that receives MQTT/Zenoh assistant commands (see below). |
| Play start/stop assistant sounds | On | Publish signaling when forwarding starts/stops. |
| Feedback target | Desktop | Where the assistant feedback plays: Desktop (MQTT → browser), Edge (Zenoh → on-device speaker), or Both. |

Output buffer presets (while AWAKE)

| Preset | Size | Use case |
| --- | --- | --- |
| voice-assistant | 32 ms (512 samples) | Another VAD / Audio Assistant |
| wake-word | 80 ms (1,280 samples) | Low-latency chaining |
| speech-to-text | 4 s (64,000 samples) | Whisper / Call Model STT (recommended after wake word) |
| custom | User seconds | Arbitrary chunk size |
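The sample counts follow directly from the 16 kHz sample rate. As a quick check, and as a way to size a custom preset, the sketch below converts a duration to samples; `samples_for` is a hypothetical helper, not an engine function.

```python
SAMPLE_RATE_HZ = 16_000  # shared PCM format: int16 @ 16 kHz mono


def samples_for(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in a chunk of the given duration (hypothetical helper)."""
    return round(duration_s * sample_rate_hz)


PRESET_SECONDS = {"voice-assistant": 0.032, "wake-word": 0.080, "speech-to-text": 4.0}

for name, seconds in PRESET_SECONDS.items():
    count = samples_for(seconds)
    print(f"{name}: {count} samples, {count * 2} bytes")  # int16 = 2 bytes per sample
```

For a custom preset of 1.5 s, `samples_for(1.5)` gives 24,000 samples.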

Assistant sound signaling

When Play start/stop assistant sounds is enabled and a digital twin is selected (same twin as the Audio Track microphone), the engine emits two events per AWAKE session:
| Event | Command |
| --- | --- |
| Wake word detected, forwarding starts | start_assistant |
| Forwarding ends (timeout or limit) | stop_assistant |
Both events carry the same JSON payload shape; only the command value differs. For start_assistant:
{
  "source_type": "edge",
  "source_subtype": "wake_word_engine",
  "command": "start_assistant"
}
source_type is edge when the workflow runs on the edge (default for Wake Word Engine) and tele when the workflow runs on the cloud. source_subtype is the wake word node name (defaults to wake_word_engine), following the CwProcessor source subtype convention.
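A minimal sketch of assembling that payload is shown below. The function `build_signaling_payload` and its parameters are hypothetical; only the three field names and their documented values come from this page.

```python
import json

VALID_COMMANDS = ("start_assistant", "stop_assistant")


def build_signaling_payload(
    command: str,
    runs_on_edge: bool = True,
    node_name: str = "wake_word_engine",
) -> str:
    """Serialize one assistant signaling event (hypothetical helper)."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    payload = {
        "source_type": "edge" if runs_on_edge else "tele",  # edge vs cloud execution
        "source_subtype": node_name,                        # wake word node name
        "command": command,
    }
    return json.dumps(payload)


print(build_signaling_payload("start_assistant"))
print(build_signaling_payload("stop_assistant", runs_on_edge=False))
```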

Feedback target

A Feedback target sidebar toggle controls where the audio plays:
| Option | Transport | Subscriber |
| --- | --- | --- |
| Desktop (default) | MQTT: {prefix}cyberwave/twin/{TWIN_UUID}/commands/assistant_signaling | Environment viewer in the browser (/sounds/assistant/start-assistant.mp3, end-assistant.mp3) |
| Edge | Zenoh: {prefix}/{TWIN_UUID}/data/commands/assistant_signaling (built via build_key) | On-edge speaker driver subscribed to the Zenoh channel |
| Both | MQTT + Zenoh | Browser and edge speaker react simultaneously |
Edge and Both require a speaker physically connected to the edge device, plus an edge speaker driver subscribing to the Zenoh commands/assistant_signaling channel for the twin.
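To show how the feedback target maps onto the two channels, the sketch below fills in the topic and key templates from the table. All function names are hypothetical, the lowercase target strings are an assumption about how the option is stored, and the plain string formatting is only a stand-in for build_key, which the engine uses for the Zenoh key.

```python
SUFFIX = "commands/assistant_signaling"


def mqtt_topic(prefix: str, twin_uuid: str) -> str:
    """MQTT topic template from the table (prefix is joined without a slash)."""
    return f"{prefix}cyberwave/twin/{twin_uuid}/{SUFFIX}"


def zenoh_key(prefix: str, twin_uuid: str) -> str:
    """Zenoh key template from the table; the engine itself uses build_key."""
    return f"{prefix}/{twin_uuid}/data/{SUFFIX}"


def signaling_destinations(
    target: str, twin_uuid: str, mqtt_prefix: str, zenoh_prefix: str
) -> dict[str, str]:
    """Return the channels a given feedback target publishes to (hypothetical)."""
    if target not in ("desktop", "edge", "both"):
        raise ValueError(f"unknown feedback target: {target!r}")
    destinations = {}
    if target in ("desktop", "both"):
        destinations["mqtt"] = mqtt_topic(mqtt_prefix, twin_uuid)
    if target in ("edge", "both"):
        destinations["zenoh"] = zenoh_key(zenoh_prefix, twin_uuid)
    return destinations


print(signaling_destinations("both", "1234", mqtt_prefix="dev/", zenoh_prefix="cw"))
```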
The frontend handler is cyberwave-frontend/lib/workflows/assistant-signaling-sounds.ts (handleAssistantSignalingCommand), wired into useMQTTTwin for environment viewers.

Input formats

Adapted automatically to int16 @ 16 kHz mono:
| Format | Source |
| --- | --- |
| NumPy int16 / float32 | Audio Track audio, Audio Assistant audio |
| Raw PCM bytes | audio or audio_bytes on upstream dict |
| WAV bytes (RIFF) | Audio Track audio_bytes |
| WAV file path (*.wav) | Batch / offline tests |
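The kind of dispatch this table implies can be sketched with the standard library alone. The example below covers only the byte and file inputs and assumes the audio is already int16 @ 16 kHz mono; it leaves out the NumPy inputs and any resampling or channel mixing, and the function names are hypothetical rather than the engine's own.

```python
import io
import struct
import sys
import wave
from array import array

RATE_HZ = 16_000


def pcm_from_raw(data: bytes) -> array:
    """Interpret bytes as little-endian int16 samples (an assumption of this sketch)."""
    samples = array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def pcm_from_wav(stream) -> array:
    """Read a WAV stream that is already int16 @ 16 kHz mono."""
    with wave.open(stream, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, 2, RATE_HZ):
            raise ValueError(f"unsupported WAV format in this sketch: {fmt}")
        return pcm_from_raw(wav.readframes(wav.getnframes()))


def to_pcm(source) -> array:
    """Dispatch on the input kind: WAV path, WAV bytes (RIFF), or raw PCM bytes."""
    if isinstance(source, str) and source.lower().endswith(".wav"):
        with open(source, "rb") as handle:
            return pcm_from_wav(handle)
    if isinstance(source, (bytes, bytearray)):
        if bytes(source[:4]) == b"RIFF":
            return pcm_from_wav(io.BytesIO(source))
        return pcm_from_raw(bytes(source))
    raise TypeError(f"unsupported input type: {type(source).__name__}")


# Usage: the same three samples as raw PCM and wrapped in a WAV container.
raw = struct.pack("<3h", 0, 1000, -1000)
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(RATE_HZ)
    out.writeframes(raw)

print(to_pcm(raw).tolist())                # [0, 1000, -1000]
print(to_pcm(buffer.getvalue()).tolist())  # [0, 1000, -1000]
```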

Output (only after score ≥ threshold)

| Field | Description |
| --- | --- |
| audio | Command-only int16 PCM @ 16 kHz mono (wake phrase trimmed) |
| wake_word | Trigger id (e.g. alexa); metadata only, not in audio |
| wake_word_score | Confidence at detection |
| wake_word_scores | e.g. {"alexa": 0.512} |
While SLEEPING, outputs contain state only (state, is_speaking) — no speech audio.

Logging on edge

At INFO (worker log level):
  • Wake Word Engine initialized: wake_words=[...] prediction_keys=[...]
  • Periodic Wake word listening scores=... (~every 2 s while listening)
  • Wake word detected: 'alexa' scores={'alexa': 0.512} threshold=0.50, switching to AWAKE
Set CYBERWAVE_WORKER_LOG_LEVEL=DEBUG for per-frame scores above 0.15.
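When tuning the threshold from edge logs, it can help to pull the detection lines into structured data. The sketch below parses the detection message format shown above; the regular expression and the `parse_detection` helper are hypothetical and assume the line layout matches that example exactly.

```python
import ast
import re

# Matches the detection line format shown in the log examples above.
DETECTION_RE = re.compile(
    r"Wake word detected: '(?P<word>[^']+)' "
    r"scores=(?P<scores>\{.*?\}) "
    r"threshold=(?P<threshold>[0-9.]+)"
)


def parse_detection(line: str):
    """Return the wake word, scores, and threshold from a log line, or None."""
    match = DETECTION_RE.search(line)
    if match is None:
        return None
    return {
        "wake_word": match["word"],
        "scores": ast.literal_eval(match["scores"]),  # safe parse of the dict literal
        "threshold": float(match["threshold"]),
    }


line = "Wake word detected: 'alexa' scores={'alexa': 0.512} threshold=0.50, switching to AWAKE"
print(parse_detection(line))
# {'wake_word': 'alexa', 'scores': {'alexa': 0.512}, 'threshold': 0.5}
```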

Dependencies

| Layer | Requirement |
| --- | --- |
| Compile server | openwakeword, onnxruntime (see Models & compile) |
| Edge worker | cyberwave[ml-wakeword] |
| edge-sync | edge_package: ml-wakeword, wake_word_models: [...] |
Edge workflow dependencies lists all voice nodes and STT catalog extras.

Related pages

  • Edge dependencies: Full compile + edge matrix
  • Models & compile: Compile-time download and edge-sync requirements
  • Testing: WAV fixtures and offline detection tests
  • Audio Track: Upstream buffer presets
  • Audio in Workflows: Shared PCM format