Wake Word Engine

The Wake Word Engine is an edge-only voice trigger. It sleeps until it hears a selected wake word, then forwards int16 PCM at 16 kHz mono downstream (typically to Call Model / Whisper) until you stop speaking. Internally, it uses openWakeWord.

Typical pipeline

Audio Track (80 ms buffer) → Wake Word Engine → Call Model (Whisper) → Send Alert

Set the Audio Track buffer preset to Wake Word (80 ms), not Speech-To-Text (4 s). A 4 s upstream buffer invokes scoring only about once every 4 seconds, which makes short phrases like “alexa” easy to miss. Compilation rejects a workflow in which an Audio Track using the Speech-To-Text preset feeds the Wake Word Engine directly.

Stage	Recommended buffer
Audio Track → Wake Word (input)	Wake Word — 80 ms / 1,280 samples
Wake Word → Whisper (output while AWAKE)	Speech-To-Text — 4 s / 64,000 samples
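The sample counts in the table follow from the 16 kHz sample rate. A minimal sketch of that arithmetic, where `samples_for` is an illustrative helper and not part of the engine:

```python
SAMPLE_RATE_HZ = 16_000  # int16 mono PCM rate used throughout this page


def samples_for(duration_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return the number of samples in a buffer of the given duration."""
    return sample_rate_hz * duration_ms // 1000


wake_word_samples = samples_for(80)         # Wake Word preset
speech_to_text_samples = samples_for(4000)  # Speech-To-Text preset

# How often each upstream preset hands a buffer to the scorer, per second of audio.
buffers_per_second_80ms = 1000 / 80
buffers_per_second_4s = 1000 / 4000

print(wake_word_samples, speech_to_text_samples)
```

An 80 ms buffer gives the scorer 12.5 chances per second, while a 4 s buffer gives it one chance every 4 seconds, which is why short phrases are easy to miss with the larger preset.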

Built-in wake words (only these six)

Legacy ids such as hey_siri are not in openWakeWord v0.5.x and are stripped at compile and runtime.

Model ID	Phrase to speak
`alexa`	“alexa”
`hey_mycroft`	“hey mycroft”
`hey_jarvis`	“hey jarvis”
`hey_rhasspy`	“hey rhasspy”
`weather`	“what’s the weather”
`timer`	“set a 10 minute timer”

Select multiple words in the UI — the engine activates on any of them.

Detection threshold

One Detection Threshold in the workflow editor applies to every selected wake-word model (default 0.5, range 0–1). A frame triggers AWAKE when any loaded model’s score is ≥ that value.

Setting	Where	Default
Detection Threshold	Wake Word node → Detection Parameters	`0.5`

There is no per-model threshold override in the engine — tune this single knob if a phrase is too sensitive or too hard to trigger.

Pretrained models do not peak at the same level on identical audio. For example, committed test clips often score ~1.0 for alexa, ~0.68 for hey_mycroft, and ~0.41 for hey_jarvis. If hey_jarvis (or hey_rhasspy) rarely fires at 0.5, lower the threshold (e.g. 0.35–0.40) or switch models — do not expect the engine to auto-adjust per model.
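The effect of the single shared threshold can be sketched as follows. `first_trigger` is a hypothetical helper, and reporting the highest-scoring model when several cross the threshold is an assumption for illustration, not documented engine behaviour:

```python
from typing import Dict, Optional, Tuple


def first_trigger(scores: Dict[str, float], threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring (model, score) at or above threshold, else None."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0, 1]")
    hits = [(name, score) for name, score in scores.items() if score >= threshold]
    # Assumption: when several models cross the threshold, report the strongest one.
    return max(hits, key=lambda item: item[1]) if hits else None


# Peak scores quoted above for the test clips.
peaks = {"alexa": 1.0, "hey_mycroft": 0.68, "hey_jarvis": 0.41}

for threshold in (0.5, 0.4):
    fired = sorted(name for name, score in peaks.items() if score >= threshold)
    print(threshold, fired)
```

With these peaks, `hey_jarvis` never reaches 0.5 but does cross 0.4, which matches the advice to lower the shared threshold rather than wait for a per-model adjustment.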

Two-state machine

SLEEPING

Audio is scored in 80 ms frames (1,280 samples @ 16 kHz).
Downstream nodes receive no speech audio (only state metadata).
A lookback deque (default 500 ms) is used internally for detection only — it is not forwarded to STT.
When any selected model score ≥ threshold (default 0.5), the engine switches to AWAKE.

AWAKE

After detection, the engine skips the wake phrase (default 500 ms, Wake Word Trim) then buffers command-only audio.
One audio output is emitted per session (after silence timeout or listening limit), without the wake word in the PCM.
Listening ends per listening policy (see below), then state returns to SLEEPING.
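The two states can be pictured with a toy model. This is a sketch under stated assumptions, not the engine's code: `WakeWordSession` and its `step` method are hypothetical, the caller supplies an `is_silent` flag because this page does not say how silence is detected, trimming is rounded to whole 80 ms frames, and the listening limit is counted from the end of the trim.

```python
from typing import Dict, List, Optional

FRAME_MS = 80  # one scoring frame, 1,280 samples at 16 kHz


class WakeWordSession:
    """Toy two-state machine: SLEEPING until a score crosses the threshold, then AWAKE."""

    def __init__(self, threshold: float = 0.5, trim_ms: int = 500,
                 silence_timeout_ms: int = 1500, limit_ms: Optional[int] = None):
        self.threshold = threshold
        self.trim_ms = trim_ms
        self.silence_timeout_ms = silence_timeout_ms
        self.limit_ms = limit_ms  # None means "limit listening time" is off
        self.state = "SLEEPING"
        self._reset()

    def _reset(self) -> None:
        self.awake_ms = 0   # time elapsed since detection
        self.silent_ms = 0  # length of the current run of silent frames
        self.command: List[bytes] = []

    def step(self, frame: bytes, scores: Dict[str, float], is_silent: bool) -> Optional[bytes]:
        """Feed one frame; return the command audio once a session ends, else None."""
        if self.state == "SLEEPING":
            if any(score >= self.threshold for score in scores.values()):
                self.state = "AWAKE"
                self._reset()
            return None  # no speech audio leaves the node while SLEEPING

        self.awake_ms += FRAME_MS
        if self.awake_ms > self.trim_ms:  # frames inside the trim window are dropped
            self.command.append(frame)
        self.silent_ms = self.silent_ms + FRAME_MS if is_silent else 0

        if self.limit_ms is not None:
            done = self.awake_ms - self.trim_ms >= self.limit_ms
        else:
            done = self.silent_ms >= self.silence_timeout_ms
        if not done:
            return None

        audio = b"".join(self.command)  # one output per session
        self.state = "SLEEPING"
        self._reset()
        return audio
```

Feeding frames one at a time keeps the sketch close to the 80 ms scoring cadence described above; a real worker would also carry the state metadata that downstream nodes receive while SLEEPING.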

Prediction keys (important)

openWakeWord stores files like alexa_v0.1.onnx, but predict() returns scores under short keys such as alexa. Compiled workers use prediction_keys = list(oww_model.models.keys()) so scores are read correctly. If logs show words=['alexa_v0.1'] without prediction_keys=['alexa'], the edge worker is stale — recompile and sync the workflow.
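The failure mode of a stale worker can be illustrated without loading a model. In this sketch `read_scores` is a hypothetical helper, the `prediction` dict only mimics the shape described above, and treating a missing key as 0.0 is an assumption about how a stale lookup would behave:

```python
from typing import Dict, Iterable


def read_scores(prediction: Dict[str, float], prediction_keys: Iterable[str]) -> Dict[str, float]:
    """Look scores up by the given keys; a key that is absent reads as 0.0."""
    return {key: float(prediction.get(key, 0.0)) for key in prediction_keys}


# Shape of a predict() result as described above: short keys, not file stems.
prediction = {"alexa": 0.93}

stale = read_scores(prediction, ["alexa_v0.1"])  # keyed by file stem: the score is lost
fresh = read_scores(prediction, ["alexa"])       # keyed by prediction key

print(stale, fresh)
```

A lookup keyed by the file stem never sees a score above zero, so the wake word can never fire, which is consistent with the recompile-and-sync advice above.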

Model download (three layers)

Only the wake words selected in the workflow are provisioned — not the full catalog.

When	What runs
Workflow compile (cloud)	`openwakeword.utils.download_models(model_names=[...])` plus a load probe. Compilation fails if `openwakeword` is not installed on the Django server or the download fails. `/compile` returns `openwakeword_models`.
Worker import (edge)	Top of `wf_*.py`: `download_models(model_names=_CW_WW_WORKFLOW_WAKE_WORDS)` then `_cw_ww_ensure_models_downloaded(...)`.
Engine init (edge)	First audio frame: same download (idempotent) before `Model(wakeword_models=...)`.

Edge-sync model_requirements includes wake_word_models: ["alexa", ...] and edge_package: ml-wakeword. The official cyberwaveos/edge-ml-worker image also pre-bakes all built-in models at image build time for air-gapped use.
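The "only the selected words" rule might look like the sketch below. `select_models` and `provision` are hypothetical names; the only openWakeWord call used is `openwakeword.utils.download_models(model_names=...)`, as quoted in the table, and it is imported lazily so the filtering logic can be exercised without the package installed.

```python
from typing import Callable, Iterable, List, Optional

BUILT_IN_WAKE_WORDS = ("alexa", "hey_mycroft", "hey_jarvis", "hey_rhasspy", "weather", "timer")


def select_models(requested: Iterable[str]) -> List[str]:
    """Keep built-in ids only, preserving order and dropping duplicates and legacy ids."""
    selected: List[str] = []
    for name in requested:
        if name in BUILT_IN_WAKE_WORDS and name not in selected:
            selected.append(name)
    return selected


def provision(requested: Iterable[str],
              download: Optional[Callable[..., None]] = None) -> List[str]:
    """Download just the selected models; intended to be safe to call repeatedly."""
    names = select_models(requested)
    if not names:
        raise ValueError("no built-in wake words selected")
    if download is None:
        # Imported lazily so this module loads even where openwakeword is absent.
        from openwakeword.utils import download_models as download
    download(model_names=names)
    return names
```

Calling `provision(["hey_siri", "alexa"])` on a machine with `openwakeword` installed would request only `alexa`; passing a stub as `download` makes the same path testable offline.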

Configuration

Parameter	Default	Description
Wake Words	`hey_mycroft`	Multi-select built-in models (see table above).
Detection Threshold	`0.5`	Minimum score in [0, 1], shared by all selected models; AWAKE is entered when any one of them reaches it (workflow editor → Detection Parameters).
Limit listening time	Off	When on, forward audio only for N seconds (default 2 s), then SLEEPING. When off, forward until silence timeout (default 1.5 s).
Silence Timeout (s)	`1.5`	Used when limit listening is off.
Wake Word Trim (ms)	`500`	Audio discarded after detection before STT (excludes wake phrase).
Output Buffer Preset	`wake-word`	Chunks emitted while AWAKE (see presets).
Digital twin (signaling)	—	Twin that receives MQTT/Zenoh assistant commands (see below).
Play start/stop assistant sounds	On	Publish signaling when forwarding starts/stops.
Feedback target	`Desktop`	Where the assistant feedback plays: `Desktop` (MQTT → browser), `Edge` (Zenoh → on-device speaker), or `Both`.

Output buffer presets (while AWAKE)

Preset	Size	Use case
`voice-assistant`	32 ms (512)	Another VAD / Audio Assistant
`wake-word`	80 ms (1,280)	Low-latency chaining
`speech-to-text`	4 s (64,000)	Whisper / Call Model STT (recommended after wake word)
`custom`	User seconds	Arbitrary chunk size
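As a rough illustration of how a preset translates into chunk sizes, the sketch below splits a sample sequence by preset. `chunk_size` and `chunks` are hypothetical helpers, and leaving the final chunk short (rather than padding it) is an assumption:

```python
from typing import Iterator, Optional, Sequence

SAMPLE_RATE_HZ = 16_000
PRESET_SAMPLES = {"voice-assistant": 512, "wake-word": 1_280, "speech-to-text": 64_000}


def chunk_size(preset: str, custom_seconds: Optional[float] = None) -> int:
    """Return the chunk length in samples for a preset name."""
    if preset == "custom":
        if not custom_seconds or custom_seconds <= 0:
            raise ValueError("custom preset needs a positive duration in seconds")
        return int(round(custom_seconds * SAMPLE_RATE_HZ))
    return PRESET_SAMPLES[preset]


def chunks(samples: Sequence[int], preset: str,
           custom_seconds: Optional[float] = None) -> Iterator[Sequence[int]]:
    """Yield consecutive chunks; the last one may be shorter than the preset size."""
    size = chunk_size(preset, custom_seconds)
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```

For example, 3,000 samples under `wake-word` come out as two full 1,280-sample chunks and a 440-sample remainder.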

Assistant sound signaling

When Play start/stop assistant sounds is enabled and a digital twin is selected (same twin as the Audio Track microphone), the engine emits two events per AWAKE session:

Event	Command
Wake word detected, forwarding starts	`start_assistant`
Forwarding ends (timeout or limit)	`stop_assistant`

Both events carry the same JSON payload; only the command field differs (start_assistant or stop_assistant):

{
  "source_type": "edge",
  "source_subtype": "wake_word_engine",
  "command": "start_assistant"
}

source_type is edge when the workflow runs on the edge (the default for Wake Word Engine) and tele when it runs in the cloud. source_subtype is the wake word node name (default wake_word_engine), following the CwProcessor source subtype convention.
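A payload with these three fields could be assembled as in the sketch below. `build_signaling_payload` is a hypothetical helper; only the field names and values come from the description above.

```python
import json

COMMANDS = ("start_assistant", "stop_assistant")


def build_signaling_payload(command: str, runs_on_edge: bool = True,
                            node_name: str = "wake_word_engine") -> str:
    """Serialize one assistant signaling event as JSON."""
    if command not in COMMANDS:
        raise ValueError(f"command must be one of {COMMANDS}")
    payload = {
        "source_type": "edge" if runs_on_edge else "tele",
        "source_subtype": node_name,  # the wake word node name
        "command": command,
    }
    return json.dumps(payload)


print(build_signaling_payload("start_assistant"))
```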

Feedback target

The Feedback target toggle in the sidebar controls where the assistant sounds play:

Option	Transport	Subscriber
Desktop (default)	MQTT — `{prefix}cyberwave/twin/{TWIN_UUID}/commands/assistant_signaling`	Environment viewer in the browser (`/sounds/assistant/start-assistant.mp3`, `end-assistant.mp3`)
Edge	Zenoh — `{prefix}/{TWIN_UUID}/data/commands/assistant_signaling` (built via `build_key`)	On-edge speaker driver subscribed to the Zenoh channel
Both	MQTT + Zenoh	Browser and edge speaker react simultaneously

Edge and Both require a speaker physically connected to the edge device, plus an edge speaker driver subscribing to the Zenoh commands/assistant_signaling channel for the twin.
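The routing in the table can be sketched as a lookup from feedback target to topic strings. This is illustrative only: `signaling_targets` is a hypothetical function, it does not call the real `build_key`, and it takes separate MQTT and Zenoh prefixes because the two templates above place `{prefix}` differently.

```python
from typing import Dict


def signaling_targets(feedback_target: str, twin_uuid: str,
                      mqtt_prefix: str = "", zenoh_prefix: str = "") -> Dict[str, str]:
    """Return the transports and topics a signaling event would be published to."""
    # Templates copied from the table above; the prefix values are deployment-specific.
    mqtt = f"{mqtt_prefix}cyberwave/twin/{twin_uuid}/commands/assistant_signaling"
    zenoh = f"{zenoh_prefix}/{twin_uuid}/data/commands/assistant_signaling"
    routes = {
        "Desktop": {"mqtt": mqtt},
        "Edge": {"zenoh": zenoh},
        "Both": {"mqtt": mqtt, "zenoh": zenoh},
    }
    if feedback_target not in routes:
        raise ValueError("feedback_target must be Desktop, Edge, or Both")
    return routes[feedback_target]
```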

The frontend handler is cyberwave-frontend/lib/workflows/assistant-signaling-sounds.ts (handleAssistantSignalingCommand), wired into useMQTTTwin for environment viewers.

Input formats

Adapted automatically to int16 @ 16 kHz mono:

Format	Source
NumPy int16 / float32	Audio Track `audio`, Audio Assistant `audio`
Raw PCM bytes	`audio` or `audio_bytes` on upstream dict
WAV bytes (RIFF)	Audio Track `audio_bytes`
WAV file path (`*.wav`)	Batch / offline tests
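Two of these adaptations can be sketched with the standard library alone. The helpers below are hypothetical and cover only part of the table: float samples are assumed to lie in [-1.0, 1.0], raw PCM bytes are assumed to be little-endian int16, and WAV input must already be 16-bit mono at 16 kHz (resampling, downmixing, and NumPy inputs are left out).

```python
import array
import io
import sys
import wave
from typing import Sequence


def float_to_int16(samples: Sequence[float]) -> array.array:
    """Scale floats in [-1.0, 1.0] to int16, clipping anything outside that range."""
    return array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))


def pcm_from_bytes(data: bytes) -> array.array:
    """Decode WAV (RIFF) bytes or headerless little-endian int16 PCM."""
    pcm = array.array("h")
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        with wave.open(io.BytesIO(data), "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if fmt != (1, 2, 16_000):
                raise ValueError("sketch only handles 16-bit mono WAV at 16 kHz")
            # The wave module returns frames in the host's byte order.
            pcm.frombytes(wav.readframes(wav.getnframes()))
        return pcm
    pcm.frombytes(data)
    if sys.byteorder == "big":
        pcm.byteswap()  # raw input is assumed little-endian
    return pcm
```

Checking the RIFF/WAVE magic bytes first lets one entry point accept both `audio_bytes` variants listed in the table.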

Output (only after score ≥ threshold)

Field	Description
`audio`	Command-only int16 PCM @ 16 kHz mono (wake phrase trimmed)
`wake_word`	Trigger id (e.g. `alexa`) — metadata only, not in `audio`
`wake_word_score`	Confidence at detection
`wake_word_scores`	e.g. `{"alexa": 0.512}`

While SLEEPING, outputs contain state only (state, is_speaking) — no speech audio.
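A downstream node can tell the two kinds of output apart by checking for `audio`. The sketch below assumes outputs arrive as plain dicts with the fields listed above; `summarize_output` is a hypothetical helper, and the example state value is illustrative.

```python
from typing import Any, Dict, Optional

SAMPLE_RATE_HZ = 16_000


def summarize_output(output: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a short summary for a session output, or None for state-only output."""
    audio = output.get("audio")
    if audio is None or len(audio) == 0:
        return None  # SLEEPING: state metadata only
    return {
        "wake_word": output.get("wake_word"),
        "score": output.get("wake_word_score"),
        "seconds": len(audio) / SAMPLE_RATE_HZ,  # duration of the command audio
    }


print(summarize_output({"state": "SLEEPING", "is_speaking": False}))
```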

Logging on edge

At INFO (worker log level):

Wake Word Engine initialized: wake_words=[...] prediction_keys=[...]
Wake word listening scores=... (periodic, about every 2 s while listening)
Wake word detected: 'alexa' scores={'alexa': 0.512} threshold=0.50, switching to AWAKE

Set CYBERWAVE_WORKER_LOG_LEVEL=DEBUG for per-frame scores above 0.15.
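When tuning the threshold from worker logs, the detection line can be parsed mechanically. This sketch assumes the line has exactly the shape shown above and that the scores print as a plain Python dict literal; `parse_detection` is a hypothetical helper.

```python
import ast
import re
from typing import Any, Dict, Optional

DETECTED = re.compile(
    r"Wake word detected: '(?P<word>[^']+)' "
    r"scores=(?P<scores>\{.*?\}) "
    r"threshold=(?P<threshold>[0-9.]+)"
)


def parse_detection(line: str) -> Optional[Dict[str, Any]]:
    """Extract wake word, scores, and threshold from a detection log line."""
    match = DETECTED.search(line)
    if match is None:
        return None
    return {
        "wake_word": match["word"],
        "scores": ast.literal_eval(match["scores"]),  # safe parse of the dict literal
        "threshold": float(match["threshold"]),
    }


line = "Wake word detected: 'alexa' scores={'alexa': 0.512} threshold=0.50, switching to AWAKE"
print(parse_detection(line))
```

Collecting these records over a session shows how close each detection was to the threshold, which helps when deciding whether to lower it.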

Dependencies

Layer	Requirement
Compile server	`openwakeword`, `onnxruntime` — see models & compile
Edge worker	`cyberwave[ml-wakeword]`
edge-sync	`edge_package: ml-wakeword`, `wake_word_models: [...]`

Edge workflow dependencies lists all voice nodes and STT catalog extras.

Related

Page	What it covers
Edge dependencies	Full compile + edge matrix
Models & compile	Compile-time download and edge-sync requirements
Testing	WAV fixtures and offline detection tests
Audio Track	Upstream buffer presets
Audio in Workflows	Shared PCM format
