The Audio Track trigger fires once for every audio chunk a twin’s microphone publishes on the edge data bus. Each chunk is typically 20 ms of PCM audio, so the workflow receives an infinite stream of small frames, not a single large buffer.
Audio Track does not limit the total stream length. A conference call, book reading, or any other continuous audio source runs indefinitely — each 20 ms chunk triggers the workflow independently.
Pipeline Position
The generated edge worker uses the @cw.on_audio(twin_uuid) decorator and receives decoded PCM samples from the local Zenoh data bus, with no cloud round-trip.
Quick Start
- Add an Audio Track trigger node to your workflow
- Select the Twin whose microphone you want to stream
- (Optional) Set the Audio Track / Sensor ID if the twin has multiple microphones
- Connect to a downstream node — typically an Audio Assistant for VAD, or a Call Model node for direct STT
- Activate the workflow and sync to the edge device
Inputs (Configuration)
| Parameter | Label | Default | Required | Description |
|---|---|---|---|---|
| twin_uuid | Twin UUID | — | Yes | The twin whose audio stream triggers the workflow. |
| audio_track_id | Audio Track / Sensor ID | "default" | No | Sensor identifier on the twin. Use "default" for the primary microphone. Only needed when a twin has multiple audio sensors. |
| sample_rate_hz | Expected Sample Rate | 16000 | No | Expected sample rate in Hz. The actual rate from wire metadata overrides this at runtime, so you rarely need to change it. |
| channels | Channels | 1 | No | Expected channel count (1 = mono, 2 = stereo). Wire metadata overrides this at runtime. |
| buffer_preset | Buffer Preset | "vad" | No | FIFO buffer mode: vad (32 ms), wake-word (80 ms), stt (4 s), or custom. See Buffer Presets. |
| buffer_size_s | Buffer Size (s) | 1.0 | No | Custom buffer duration in seconds. Only used when the preset is custom. |
| min_samples | Skip Empty Audio | 1 | No | Minimum number of samples a chunk must contain. Drops empty frames from device glitches. See Safety Guards. |
| max_chunk_seconds | Max Single Chunk Length (s) | 10 | No | Maximum duration of a single chunk in seconds. Drops oversized buffers from device stalls. See Safety Guards. |
Outputs
Every time the trigger fires, it produces these outputs for downstream nodes:

| Output | Type | Description |
|---|---|---|
| audio | AUDIO | PCM S16LE int16 numpy array, mono, 16 kHz. Standard format for zero-copy pass-by-reference between nodes. |
| audio_ts | NUMBER | Timestamp of the audio sample from the data bus (epoch seconds). |
| sensor | STRING | Name of the audio sensor that produced the sample. |
| sample_rate_hz | NUMBER | Always 16000 Hz after resampling. |
| channels | NUMBER | Always 1 (mono) after downmix. |
| sample_count | NUMBER | Number of audio samples in this chunk. |
| duration_s | NUMBER | Duration of this chunk in seconds (derived from sample_count / sample_rate_hz). |
| metadata | OBJECT | Full transport metadata from the data bus (sample_rate_hz, channels, encoding, content_type, etc.). |
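
The relationship between sample_count, sample_rate_hz, and duration_s can be checked with a short sketch. The `describe_chunk` helper is hypothetical and not part of the Cyberwave SDK; it assumes the chunk is available as raw mono S16LE bytes (2 bytes per sample) rather than the numpy array the node actually passes.

```python
import struct

BYTES_PER_SAMPLE = 2  # S16LE: signed 16-bit little-endian


def describe_chunk(pcm_bytes: bytes, sample_rate_hz: int = 16000) -> dict:
    """Hypothetical helper: derive sample_count and duration_s from mono S16LE bytes."""
    if len(pcm_bytes) % BYTES_PER_SAMPLE:
        raise ValueError("S16LE data must contain a whole number of 2-byte samples")
    sample_count = len(pcm_bytes) // BYTES_PER_SAMPLE
    return {
        "sample_count": sample_count,
        "sample_rate_hz": sample_rate_hz,
        # Same derivation as the duration_s output: sample_count / sample_rate_hz
        "duration_s": sample_count / sample_rate_hz,
    }


# 512 samples of silence at 16 kHz (one buffer of the default vad preset)
silence = struct.pack("<512h", *([0] * 512))
info = describe_chunk(silence)
```

Here `info["duration_s"]` works out to 512 / 16000 s, that is, 32 ms.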
Buffer Presets
The microphone sends audio in tiny 20 ms chunks. Most downstream nodes need a larger buffer to work correctly. The Buffer Preset parameter controls how many chunks are accumulated before emitting:

| Preset | Duration | Samples (@ 16 kHz) | Use Case |
|---|---|---|---|
| vad | 32 ms | 512 | Audio Assistant / Silero VAD (default) |
| wake-word | 80 ms | 1,280 | Wake Word Engine (OpenWakeWord internal frame size) |
| stt | 4 s | 64,000 | Whisper / STT models |
| custom | User-defined | int(16000 × buffer_size_s) | Any custom duration |
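
The accumulation described above can be sketched as a small FIFO. This is an illustration only: `ChunkAccumulator` and `buffer_samples` are hypothetical names, samples are plain Python lists instead of numpy arrays, and the sketch assumes non-overlapping buffers with leftover samples carried over to the next buffer, which the node may or may not do.

```python
from typing import List

OUTPUT_RATE_HZ = 16000
# Sample counts taken from the preset table (at 16 kHz)
PRESET_SAMPLES = {"vad": 512, "wake-word": 1280, "stt": 64000}


def buffer_samples(preset: str, buffer_size_s: float = 1.0) -> int:
    """Return the buffer length in samples for a preset name."""
    if preset == "custom":
        return int(OUTPUT_RATE_HZ * buffer_size_s)
    return PRESET_SAMPLES[preset]


class ChunkAccumulator:
    """Hypothetical FIFO: collect small chunks, emit fixed-size buffers."""

    def __init__(self, target_samples: int) -> None:
        if target_samples <= 0:
            raise ValueError("target_samples must be positive")
        self.target = target_samples
        self._pending: List[int] = []

    def push(self, chunk: List[int]) -> List[List[int]]:
        """Append one chunk and return every full buffer now available."""
        self._pending.extend(chunk)
        ready = []
        while len(self._pending) >= self.target:
            ready.append(self._pending[: self.target])
            del self._pending[: self.target]
        return ready


# Feed five 20 ms chunks (320 samples each at 16 kHz) into a vad-sized buffer.
acc = ChunkAccumulator(buffer_samples("vad"))
emitted = [len(acc.push([0] * 320)) for _ in range(5)]  # [0, 1, 0, 1, 1]
```

Because 512 is not a multiple of 320, buffers do not line up with chunk boundaries: under these assumptions a buffer is emitted on the second, fourth, and fifth chunk.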
Supported Sample Rates
The Audio Track trigger is fully sample-rate-agnostic. The microphone driver sends 20 ms chunks regardless of the sample rate, so only the number of samples per chunk changes:

| Sample Rate | Samples per 20 ms chunk | Common Devices |
|---|---|---|
| 48,000 Hz | 960 | USB microphones, most laptops |
| 44,100 Hz | 882 | CD-quality audio interfaces |
| 32,000 Hz | 640 | Some embedded boards, Bluetooth profiles |
| 16,000 Hz | 320 | Telephony, low-bandwidth IoT devices |
If the rate reported in the wire metadata differs from the configured sample_rate_hz, a warning is logged and the wire value takes precedence.
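
The per-chunk sample counts in the table follow directly from the 20 ms chunk duration. A minimal sketch of the arithmetic, where `samples_per_chunk` is an illustrative helper and not an SDK function:

```python
CHUNK_MS = 20  # chunk duration stated for the microphone driver


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Samples in one chunk: rate (samples/s) times duration (ms) / 1000."""
    return sample_rate_hz * chunk_ms // 1000


table = {rate: samples_per_chunk(rate) for rate in (48000, 44100, 32000, 16000)}
# {48000: 960, 44100: 882, 32000: 640, 16000: 320}
```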
Wire Metadata Validation
When the actual microphone sample rate or channel count differs from what you configured, the Audio Track node:
- Logs a warning with both the configured and actual values
- Publishes a twin alert (audio_track_mismatch, severity warning) so you can see the mismatch in the Cyberwave UI
- Uses the actual wire value for all processing; the configured value is only a fallback when metadata is missing
You can leave sample_rate_hz at the default 16000 and the node will still work correctly with a 48 kHz microphone; it just generates a one-time mismatch warning.
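
The precedence rule (wire value first, configured value as fallback) can be sketched as follows. `resolve_format` is a hypothetical function written for this illustration; it only logs and returns the mismatched field names, and does not model the twin alert or the one-time suppression of repeated warnings.

```python
import logging
from typing import List, Mapping, Tuple

logger = logging.getLogger("audio_track_sketch")


def resolve_format(
    configured_rate_hz: int,
    configured_channels: int,
    wire_metadata: Mapping[str, int],
) -> Tuple[int, int, List[str]]:
    """Pick the effective format: wire metadata wins, config is the fallback."""
    configured = {
        "sample_rate_hz": configured_rate_hz,
        "channels": configured_channels,
    }
    effective = {}
    mismatches: List[str] = []
    for key, configured_value in configured.items():
        # Fall back to the configured value only when the wire omits the field.
        actual = wire_metadata.get(key, configured_value)
        if actual != configured_value:
            logger.warning(
                "%s mismatch: configured=%s actual=%s", key, configured_value, actual
            )
            mismatches.append(key)
        effective[key] = actual
    return effective["sample_rate_hz"], effective["channels"], mismatches


# A 48 kHz mono microphone with the default configuration (16000 Hz, 1 channel)
rate, channels, mismatches = resolve_format(
    16000, 1, {"sample_rate_hz": 48000, "channels": 1}
)
```

With these inputs the effective rate is 48000 Hz, the channel count stays 1, and only sample_rate_hz is reported as mismatched.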
Safety Guards
The Audio Track trigger includes two optional guards that filter out degenerate chunks before they reach downstream nodes. Both are under Advanced settings in the inspector.
Skip Empty Audio (min_samples)
Default: 1 — accept everything except completely empty (0-sample) frames.
Empty frames are never intentional audio. They occur when:
- A USB microphone briefly disconnects and reconnects
- The audio driver has a buffer underrun (CPU spike, thermal throttling)
- A Bluetooth mic drops a packet
- The container starts before the hardware stream is fully open
The default of 1 filters these out without rejecting any real audio. You almost never need to change this.
Max Single Chunk Length (max_chunk_seconds)
Default: 10 seconds — reject any individual chunk longer than 10 s.
Normal chunks are ~20 ms. An oversized chunk only appears when the device stalls and then flushes a large backlog at once. Sending such a chunk into downstream processing (VAD, STT) could cause memory spikes or crashes.
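
Taken together, the two guards amount to a simple predicate on each incoming chunk. The sketch below uses a hypothetical `passes_guards` function; the parameter names and defaults mirror the configuration table, but the exact comparison the node performs (for example, whether a chunk of exactly 10 s is kept) is an assumption.

```python
def passes_guards(
    sample_count: int,
    sample_rate_hz: int,
    min_samples: int = 1,
    max_chunk_seconds: float = 10.0,
) -> bool:
    """Return True when a chunk should be forwarded to downstream nodes."""
    if sample_count < min_samples:
        return False  # empty or too-short frame, e.g. after a buffer underrun
    if sample_count / sample_rate_hz > max_chunk_seconds:
        return False  # oversized backlog flushed after a device stall
    return True


normal_ok = passes_guards(320, 16000)             # regular 20 ms chunk
empty_ok = passes_guards(0, 16000)                # dropped by min_samples
stalled_ok = passes_guards(16000 * 11, 16000)     # 11 s backlog, dropped
```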
Typical Pipelines
Voice Assistant (VAD + STT)
Direct STT (no VAD)
Wake Word + STT
Acoustic Monitoring
Edge-Only Execution
The Audio Track trigger generates a Python worker function that runs directly on the edge device. The worker reaches the device through cyberwave workflow sync or automatic periodic sync. No audio data leaves the device unless a downstream node explicitly sends it (e.g. to a cloud STT API).
Next Steps
Audio Assistant
VAD-powered speech segmentation for the audio stream
Audio Assistant Technical Reference
Output schema, resampling, and architecture details