The Audio Assistant node sits between an Audio Track trigger and downstream nodes (STT, alerts, models). It normalizes all input to 16-bit signed little-endian PCM (S16LE, int16) at 16 kHz mono via the shared audio ingress layer, which is the same contract the rest of the audio pipeline uses.

Pipeline position

Audio Track Trigger → Audio Assistant → Call Model / Send Alert / …

Modalities

| Modality | Purpose | Engine |
| --- | --- | --- |
| Voice Assistant (VA) | Segment speech utterances | Silero VAD |
| Sound Security Guard (SSG) | Detect security-related acoustic events | MIT AST (AudioSet) |

Only one modality is active per node instance.

Shared audio contract

| Direction | Key | Format |
| --- | --- | --- |
| Input | audio | PCM S16LE numpy int16, float32, raw bytes, or WAV; adapted to int16 @ 16 kHz mono |
| Output | audio | int16 mono @ 16 kHz (when a segment or alert window is emitted) |
| Output | sample_rate_hz | Always 16000 |
| Output | channels | Always 1 |
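
To make the contract concrete, here is a minimal sketch of how int16 arrays, float32 arrays, or raw S16LE bytes could be adapted to int16 mono at 16 kHz. It is not the actual ingress layer: `to_int16_mono` is a hypothetical helper, the linear-interpolation resampler is a stand-in chosen for brevity, 2-D arrays are assumed to be shaped `(frames, channels)`, and WAV container parsing is left out.

```python
import numpy as np

TARGET_RATE_HZ = 16_000  # fixed output rate stated in the contract


def to_int16_mono(audio, sample_rate_hz, channels=1):
    """Hypothetical helper: adapt supported inputs to int16 mono @ 16 kHz."""
    if isinstance(audio, (bytes, bytearray, memoryview)):
        # Raw bytes are assumed to be interleaved little-endian int16 (S16LE).
        samples = np.frombuffer(audio, dtype="<i2").astype(np.float32) / 32768.0
    else:
        arr = np.asarray(audio)
        if arr.dtype == np.int16:
            samples = arr.astype(np.float32) / 32768.0
        elif np.issubdtype(arr.dtype, np.floating):
            samples = arr.astype(np.float32)  # assumed to be in [-1.0, 1.0]
        else:
            raise TypeError(f"unsupported dtype: {arr.dtype}")

    # De-interleave 1-D multi-channel data, then average channels to mono.
    if samples.ndim == 1 and channels > 1:
        samples = samples.reshape(-1, channels)
    if samples.ndim == 2:
        samples = samples.mean(axis=1)

    # Sketch-quality resampling by linear interpolation (no anti-alias filter).
    if sample_rate_hz != TARGET_RATE_HZ and samples.size:
        n_out = int(round(samples.size * TARGET_RATE_HZ / sample_rate_hz))
        t_in = np.arange(samples.size) / sample_rate_hz
        t_out = np.arange(n_out) / TARGET_RATE_HZ
        samples = np.interp(t_out, t_in, samples)

    pcm = np.clip(np.round(samples * 32768.0), -32768, 32767).astype(np.int16)
    return {"audio": pcm, "sample_rate_hz": TARGET_RATE_HZ, "channels": 1}
```

The returned dictionary mirrors the three output keys in the table above. A production adapter would likely use a proper band-limited resampler instead of `np.interp`.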
Upstream Audio Track buffer presets control chunk size into the node:

| Preset | Chunk size | Typical use |
| --- | --- | --- |
| Voice Assistant (32 ms) | 512 samples | VA streaming (matches Silero frame size) |
| Wake Word (80 ms) | 1280 samples | Wake Word Engine |
| Speech-To-Text (4 s) | 64000 samples | Batch / long windows |

For SSG, prefer accumulating at least 1–4 s per analysis window (default analysis preset: Speech-To-Text 4 s).
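
The chunk sizes above follow directly from the 16 kHz sample rate (samples = 16000 × duration in seconds). The sketch below reproduces that arithmetic and shows one way small chunks could be accumulated into a longer analysis window for SSG. `samples_for` and `WindowAccumulator` are hypothetical names used for illustration, not part of the Cyberwave API.

```python
import numpy as np

SAMPLE_RATE_HZ = 16_000


def samples_for(duration_ms: int) -> int:
    """Number of samples in a chunk of the given duration at 16 kHz."""
    return SAMPLE_RATE_HZ * duration_ms // 1000


class WindowAccumulator:
    """Hypothetical helper: buffer int16 chunks until a full window is ready."""

    def __init__(self, window_samples: int):
        self.window_samples = window_samples
        self._chunks = []
        self._buffered = 0

    def push(self, chunk: np.ndarray):
        """Add a chunk; return one full window when enough audio is buffered."""
        self._chunks.append(chunk)
        self._buffered += len(chunk)
        if self._buffered < self.window_samples:
            return None
        data = np.concatenate(self._chunks)
        window = data[: self.window_samples]
        rest = data[self.window_samples :]
        # Keep any leftover samples for the next window.
        self._chunks = [rest] if rest.size else []
        self._buffered = rest.size
        return window


print(samples_for(32))    # 512
print(samples_for(80))    # 1280
print(samples_for(4000))  # 64000
```

With a 1 s window (16000 samples) fed by 512-sample chunks, the first window is returned on the 32nd push, and the 384 leftover samples carry over into the next window.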

Edge execution

Audio Assistant nodes are edge-only. Workflows compile to a wf_*.py worker module and sync to the device:
cyberwave workflow compile <workflow-uuid>    # inspect emitted source + warnings
cyberwave workflow sync --twin-uuid <twin>  # deploy to edge
Typical chain:
Audio Track (@cw.on_audio) → Audio Assistant → Send Alert / Call Model
Compile checks:
  • VA upstream Audio Track must use buffer preset Voice Assistant (32 ms).
  • SSG upstream Audio Track should use Speech-To-Text (4 s) (or custom ≥1 s).
The compiler emits a warning listing required Python extras when VA/SSG nodes are present.
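
The two compile checks can be pictured as a small validation function. This is only an illustrative sketch: `check_upstream_chunk` is a hypothetical name, and treating the VA rule as an error and the SSG rule as a warning is an assumption drawn from the "must" and "should" wording above, not confirmed compiler behavior.

```python
SAMPLE_RATE_HZ = 16_000
VA_CHUNK_SAMPLES = 512              # Voice Assistant (32 ms) preset
SSG_MIN_SAMPLES = SAMPLE_RATE_HZ    # custom windows of at least 1 s


def check_upstream_chunk(modality: str, chunk_samples: int):
    """Hypothetical check: return (errors, warnings) for an upstream chunk size."""
    errors, warnings = [], []
    if modality == "VA":
        # Assumption: "must" means a mismatch is a hard error.
        if chunk_samples != VA_CHUNK_SAMPLES:
            errors.append(
                "VA requires the Voice Assistant (32 ms) preset "
                f"(512 samples), got {chunk_samples}"
            )
    elif modality == "SSG":
        # Assumption: "should" means a short window only produces a warning.
        if chunk_samples < SSG_MIN_SAMPLES:
            warnings.append(
                "SSG works best with Speech-To-Text (4 s) or a custom "
                f"window of at least 1 s, got {chunk_samples} samples"
            )
    else:
        raise ValueError(f"unknown modality: {modality!r}")
    return errors, warnings
```

For example, `check_upstream_chunk("VA", 1280)` returns one error because the Wake Word preset does not match the 512-sample frame, while `check_upstream_chunk("SSG", 64000)` returns no errors and no warnings.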

Edge dependencies

pip install "cyberwave[ml-vad]"      # Voice Assistant (Silero)
pip install "cyberwave[ml-aed]"      # Sound Security Guard (transformers + AST)
The edge-ml-worker container image pre-installs zenoh, ml-vad, ml-aed, and ml-wakeword, and pre-downloads AST and OpenWakeWord weights for air-gapped use. SSG on bare-metal edges downloads MIT/ast-finetuned-audioset-10-10-0.4593 on first run (~340 MB) unless baked into the image.
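
On bare-metal edges it can help to confirm that the optional dependencies are importable before a worker starts. The sketch below uses only the standard library. The module names mapped to each extra are assumptions (`transformers` follows from the ml-aed comment above; `torch` for ml-vad is a guess), so adjust them to whatever the extras actually install.

```python
from importlib.util import find_spec

# Assumption: top-level modules each extra is expected to provide.
ASSUMED_MODULES = {
    "ml-vad": ["torch"],          # guess for the Silero VAD runtime
    "ml-aed": ["transformers"],   # stated above for Sound Security Guard
}


def missing_modules(extra: str, modules_by_extra=ASSUMED_MODULES):
    """Return the assumed modules for `extra` that cannot be imported."""
    return [name for name in modules_by_extra[extra] if find_spec(name) is None]


def install_hint(extra: str) -> str:
    """Build the pip command shown above for a given extra."""
    return f'pip install "cyberwave[{extra}]"'
```

A startup script could call `missing_modules("ml-aed")` and, if the list is non-empty, print `install_hint("ml-aed")` instead of failing later with an import error inside the worker.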
