STUB DOCUMENT: This page captures the current native-driver contract and known edge cases for the speaker counterpart of the native microphone driver. A human will expand it before publishing.

What It Provides

The native speaker driver runs on the edge device that has the speaker attached. It is the downstream/playback counterpart of the microphone driver — it consumes a WebRTC audio stream from the media-service, hands the decoded PCM to sounddevice / PortAudio, and plays it through the configured output device. It also subscribes to the workflow-emitted Zenoh cue channels so the device can play short status chimes from a bundled MP3 library. The driver lives at:
cyberwave-edge-runtime/runtime-services/drivers/native/cyberwave/generic-speaker/
It mirrors the layout of generic-microphone and reuses the same BaseAudioTrack / BaseAudioStreamer SDK base classes — the speaker subclasses are SpeakerAudioTrack(BaseAudioTrack) and SpeakerAudioStreamer(BaseAudioStreamer) in cyberwave.sensor.speaker.

Quick start

cd cyberwave-edge-runtime/runtime-services/drivers/native/cyberwave/generic-speaker
cp .env.example .env
# fill in CYBERWAVE_TWIN_UUID and (optionally) CYBERWAVE_SPEAKER_DEVICE
docker compose -f docker-compose.local.yml up
On macOS, run the driver bare-metal with ./run-local.sh from the driver directory. Docker Desktop on macOS runs containers inside a LinuxKit VM and cannot expose /dev/snd, so playback has to happen on the host, where PortAudio can talk to CoreAudio.

MQTT command surface (TR-1.26)

| Topic | Payload | Notes |
| --- | --- | --- |
| cyberwave/twin/{uuid}/command | {"command":"start_speaker", ...} | Starts the WebRTC consumer and opens the host sounddevice.OutputStream. |
| cyberwave/twin/{uuid}/command | {"command":"stop_speaker", ...} | Stops the consumer and closes the audio device gracefully. |
| cyberwave/twin/{uuid}/start_speaker/status | {"status":"ok", ...} or {"status":"error", ...} | ACK published by the driver. |
| cyberwave/twin/{uuid}/stop_speaker/status | {"status":"ok", ...} or {"status":"error", ...} | ACK published by the driver. |
The driver also mirrors the legacy start_audio / stop_audio verbs so existing integrations keep working.
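As a rough illustration of the command surface above, the following sketch builds the command topic and payload and derives the matching ACK topic. The helper names (build_command, status_topic, is_ack_ok) are hypothetical and not part of the Cyberwave SDK, and only the "command" and "status" keys shown in the table are modelled; any additional payload fields are left out because this page does not specify them.

```python
import json
from typing import Tuple

# Topic templates copied from the table above.
COMMAND_TOPIC = "cyberwave/twin/{uuid}/command"
STATUS_TOPIC = "cyberwave/twin/{uuid}/{command}/status"

SPEAKER_COMMANDS = {"start_speaker", "stop_speaker"}


def build_command(twin_uuid: str, command: str) -> Tuple[str, str]:
    """Return (topic, JSON payload) for a speaker command (hypothetical helper)."""
    if command not in SPEAKER_COMMANDS:
        raise ValueError(f"unsupported speaker command: {command!r}")
    topic = COMMAND_TOPIC.format(uuid=twin_uuid)
    payload = json.dumps({"command": command})
    return topic, payload


def status_topic(twin_uuid: str, command: str) -> str:
    """Return the topic on which the driver publishes its ACK."""
    return STATUS_TOPIC.format(uuid=twin_uuid, command=command)


def is_ack_ok(payload: str) -> bool:
    """True when an ACK payload reports status "ok"."""
    return json.loads(payload).get("status") == "ok"
```

A client would publish the payload on the command topic with whatever MQTT library it already uses, then wait for a message on status_topic(...) and check it with is_ack_ok.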

Zenoh cue contract (TR-1.17 / TR-1.18)

The driver subscribes to two cue channels with policy="latest":
| Channel | Commands | File played |
| --- | --- | --- |
| commands/recording_signaling | start_recording, stop_recording | sounds/general/recording_signal.mp3 |
| commands/assistant_signaling | start_assistant | sounds/assistant/start-assistant.mp3 |
| commands/assistant_signaling | stop_assistant | sounds/assistant/end-assistant.mp3 |
Payloads are the standard Cyberwave envelope (HeaderTemplate + JSON body). The driver dedups identical (channel, command) pairs inside a 250 ms window, matching the browser-side handler.
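The 250 ms dedup rule can be sketched as a small time-window filter keyed on (channel, command). This is an illustration only, not the driver's actual code: the class name CueDeduplicator is hypothetical, and the sketch assumes the window is measured from the last cue that was accepted, so suppressed duplicates do not extend it.

```python
import time
from typing import Callable, Dict, Tuple

# Cue-to-file mapping taken from the table above.
CUE_FILES: Dict[Tuple[str, str], str] = {
    ("commands/recording_signaling", "start_recording"): "sounds/general/recording_signal.mp3",
    ("commands/recording_signaling", "stop_recording"): "sounds/general/recording_signal.mp3",
    ("commands/assistant_signaling", "start_assistant"): "sounds/assistant/start-assistant.mp3",
    ("commands/assistant_signaling", "stop_assistant"): "sounds/assistant/end-assistant.mp3",
}


class CueDeduplicator:
    """Drop identical (channel, command) cues that arrive within a time window."""

    def __init__(self, window_s: float = 0.250,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window_s = window_s
        self._clock = clock  # injectable so the filter can be tested without sleeping
        self._last_accepted: Dict[Tuple[str, str], float] = {}

    def should_play(self, channel: str, command: str) -> bool:
        now = self._clock()
        key = (channel, command)
        last = self._last_accepted.get(key)
        if last is not None and now - last < self._window_s:
            return False  # duplicate inside the window: suppress
        self._last_accepted[key] = now
        return True
```

Different commands on the same channel are tracked separately, so a stop_assistant cue that follows start_assistant within 250 ms is still played.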

Standard catalog sensor configuration

STUB DOCUMENT: Canonical catalog asset shape; a human will expand before publishing.
The standard Cyberwave catalog speaker asset declares one sensor block. id and name are "audio" (same routing key as the native microphone driver); type is "speaker" (passive consumer).
{
  "id": "audio",
  "name": "audio",
  "type": "speaker",
  "parent_link": "generic_speaker_link",
  "parameters": {
    "audio_device": "default",
    "audio_source": "webrtc",
    "audio_volume": "0.8",
    "audio_channels": "2",
    "enable_speaker": "true",
    "audio_bit_depth": "16",
    "audio_sample_rate": "48000",
    "auto_play_on_boot": "false"
  }
}
Edge-core maps parameters to CYBERWAVE_METADATA_* env vars. The driver publishes WebRTC offers with:

| Offer field | Catalog value | Role |
| --- | --- | --- |
| sensor | "audio" | Routing key (sensors[].id); shared with mic twins |
| sensor_type | "speaker" | Matches sensors[].type; media-service classifies as consumer |
| role | "consumer" | Passive twin; receives audio from the SFU |
| sender | "edge" | Edge driver consumes the mixed downstream leg |
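To make the mapping concrete, here is a minimal sketch of how a catalog sensor block could translate into env vars and offer fields. Both function names are hypothetical, and the naming rule (upper-cased parameter key with a CYBERWAVE_METADATA_ prefix) is an assumption inferred from the variable names on this page, not a documented edge-core algorithm.

```python
from typing import Any, Dict, Mapping


def metadata_env(parameters: Mapping[str, Any]) -> Dict[str, str]:
    """Map catalog parameters to env vars (assumed rule: prefix + upper-cased key)."""
    return {f"CYBERWAVE_METADATA_{key.upper()}": str(value)
            for key, value in parameters.items()}


def build_offer_fields(sensor_block: Mapping[str, Any]) -> Dict[str, str]:
    """Derive the offer fields listed above from a catalog sensor block."""
    return {
        "sensor": sensor_block["id"],         # routing key, shared with mic twins
        "sensor_type": sensor_block["type"],  # "speaker" for this asset
        "role": "consumer",                   # passive twin
        "sender": "edge",                     # the edge driver is the peer
    }
```

Under that assumption, the audio_sample_rate parameter of the standard asset would surface as CYBERWAVE_METADATA_AUDIO_SAMPLE_RATE=48000.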

Media-service contract (TR-1.21 / TR-1.22 / TR-1.24)

WebRTC offers carry sensor (routing id), sensor_type (catalog type), and role. The media-service classifies offers before dispatch:
| Sensor type | Role | Allowed sender |
| --- | --- | --- |
| audio, mic, microphone, audio_in, audio_mono, audio_stereo | Producer (upstream) | edge only |
| speaker, loudspeaker, speakerphone, audio_out | Consumer (downstream) | edge (driver) or frontend (preview) |
The catalog standard pairs are sensor_type: "audio" with role: "producer" for the microphone, and sensor_type: "speaker" with role: "consumer" for the speaker. Both use sensor: "audio". Offers that conflict with this contract are rejected with an error answer on webrtc-answer. Raw microphone traffic stays inside the media-service; only the speaker consumer leg is fanned out to edge/frontend peers.
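The classification rules above can be expressed as a small lookup. The sketch below is not the media-service implementation; the function names are hypothetical, and rejecting unknown sensor types is an assumption, since this page only describes the listed types.

```python
PRODUCER_TYPES = frozenset(
    {"audio", "mic", "microphone", "audio_in", "audio_mono", "audio_stereo"}
)
CONSUMER_TYPES = frozenset({"speaker", "loudspeaker", "speakerphone", "audio_out"})


def classify_offer(sensor_type: str, role: str, sender: str) -> str:
    """Return the role for a valid offer, or raise ValueError on a contract conflict."""
    if sensor_type in PRODUCER_TYPES:
        expected_role, allowed_senders = "producer", {"edge"}
    elif sensor_type in CONSUMER_TYPES:
        expected_role, allowed_senders = "consumer", {"edge", "frontend"}
    else:
        # Assumption: sensor types outside the table are rejected.
        raise ValueError(f"unknown sensor_type: {sensor_type!r}")
    if role != expected_role:
        raise ValueError(f"{sensor_type!r} requires role {expected_role!r}, got {role!r}")
    if sender not in allowed_senders:
        raise ValueError(f"sender {sender!r} not allowed for {sensor_type!r}")
    return expected_role


def is_accepted(sensor_type: str, role: str, sender: str) -> bool:
    """Boolean convenience wrapper around classify_offer."""
    try:
        classify_offer(sensor_type, role, sender)
    except ValueError:
        return False
    return True
```

For example, a speaker offer from a frontend preview is accepted, while a microphone offer from a frontend is not.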

Configuration (CYBERWAVE_* env vars)

| Variable | Default | Meaning |
| --- | --- | --- |
| CYBERWAVE_TWIN_UUID | required | UUID of the speaker twin. |
| CYBERWAVE_SPEAKER_DEVICE | default | Output device: integer index, name substring, or default. |
| CYBERWAVE_METADATA_AUDIO_SAMPLE_RATE | 48000 | Sample rate of the speaker stream, in Hz. |
| CYBERWAVE_METADATA_AUDIO_CHANNELS | 2 | Channel count (1 = mono, 2 = stereo, up to 8). |
| CYBERWAVE_METADATA_AUDIO_BIT_DEPTH | 16 | 16, 24, or 32 bit PCM. |
| CYBERWAVE_METADATA_AUDIO_VOLUME | 0.8 | Master volume (0.0 to 1.0). Matches catalog standard-speaker asset. |
| CYBERWAVE_METADATA_SPEAKER_NAME | audio | WebRTC sensor routing id (catalog sensors[].id). |
| CYBERWAVE_METADATA_AUDIO_SOURCE | webrtc | Playback source: webrtc, file, queue, or both. |
| CYBERWAVE_METADATA_ENABLE_SPEAKER | true | Gate speaker output without tearing down the driver. |
| CYBERWAVE_METADATA_AUTO_PLAY_ON_BOOT | false | Auto-issue start_speaker on boot. |
| CYBERWAVE_METADATA_AUDIO_CHANNEL | audio/default | Zenoh data-bus channel for parallel raw PCM publishing. |
Nothing about the device, sample rate, channel count, or routing target is hard-coded — every value flows through env vars, runtime device discovery, or the twin metadata audio_device block.
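A minimal sketch of reading these variables with the documented defaults and ranges might look like the following. SpeakerConfig and load_config are hypothetical names, not the driver's own, only a subset of the variables is covered, and the accepted boolean spellings are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SpeakerConfig:
    twin_uuid: str
    device: str
    sample_rate: int
    channels: int
    bit_depth: int
    volume: float
    enabled: bool


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; the driver may accept others.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str] = os.environ) -> SpeakerConfig:
    """Read speaker settings from env vars, applying the defaults in the table."""
    twin_uuid = env.get("CYBERWAVE_TWIN_UUID", "").strip()
    if not twin_uuid:
        raise ValueError("CYBERWAVE_TWIN_UUID is required")
    channels = int(env.get("CYBERWAVE_METADATA_AUDIO_CHANNELS", "2"))
    if not 1 <= channels <= 8:
        raise ValueError("channel count must be between 1 and 8")
    bit_depth = int(env.get("CYBERWAVE_METADATA_AUDIO_BIT_DEPTH", "16"))
    if bit_depth not in (16, 24, 32):
        raise ValueError("bit depth must be 16, 24, or 32")
    volume = float(env.get("CYBERWAVE_METADATA_AUDIO_VOLUME", "0.8"))
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be within 0.0 to 1.0")
    return SpeakerConfig(
        twin_uuid=twin_uuid,
        device=env.get("CYBERWAVE_SPEAKER_DEVICE", "default"),
        sample_rate=int(env.get("CYBERWAVE_METADATA_AUDIO_SAMPLE_RATE", "48000")),
        channels=channels,
        bit_depth=bit_depth,
        volume=volume,
        enabled=_as_bool(env.get("CYBERWAVE_METADATA_ENABLE_SPEAKER", "true")),
    )
```

Passing a plain dict instead of os.environ keeps the function easy to exercise in tests.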

Cross-platform behaviour

  • Linux: ALSA direct pass-through via /dev/snd and --group-add audio. All discovery uses sounddevice.query_devices(); on pyudev-capable hosts, hot-plug events come from udev.
  • macOS: Bare-metal CoreAudio (no Docker), with full in-process DSP (volume, per-channel gain, channel routing matrix) applied before audio leaves the Python process. PulseAudio-CoreAudio bridging is a documented fallback for power users.
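The DSP stages named in the macOS bullet (volume, per-channel gain, channel routing matrix) can be illustrated in pure Python. This is a sketch under stated assumptions, not the driver's code: the function name apply_dsp is hypothetical, samples are assumed to be floats in the range -1.0 to 1.0, and the order of operations (per-channel gain, then routing, then master volume, then clipping) is a guess.

```python
from typing import List, Sequence


def apply_dsp(frames: Sequence[Sequence[float]],
              volume: float,
              channel_gains: Sequence[float],
              routing: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply gain, routing, and volume to interleaved-by-frame audio.

    frames:        one inner sequence per frame, one sample per input channel
    channel_gains: one gain per input channel
    routing:       routing[out][inp] is the weight of input inp in output out
    """
    processed: List[List[float]] = []
    for frame in frames:
        # Per-channel gain on the input channels.
        gained = [sample * gain for sample, gain in zip(frame, channel_gains)]
        # Routing matrix: each output channel is a weighted sum of inputs.
        mixed = [sum(w * s for w, s in zip(row, gained)) for row in routing]
        # Master volume, then hard clip to the valid sample range.
        processed.append([max(-1.0, min(1.0, volume * s)) for s in mixed])
    return processed
```

A routing matrix of [[0, 1], [1, 0]] swaps left and right, while [[1, 1]] downmixes stereo to mono (and may clip, which is why the final clamp is there).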

Hot-plug (TR-1.11 – TR-1.13)

Disconnecting the speaker triggers AudioDeviceMonitor. Once a replacement device is available, the monitor reopens the SFU consumer leg and the sounddevice.OutputStream against it, even when the replacement reports a different channel count or sample rate. The driver re-resolves the selected device against twin metadata on every recovery cycle and publishes a SPEAKER_FAILURE alert when the hardware disappears and a resolution alert when it returns.
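The re-resolution step can be pictured as a pure function over a device list shaped like the entries returned by sounddevice.query_devices() (dicts with "name" and "max_output_channels" keys). The function name resolve_output_device is hypothetical, and the case-insensitive substring match is an assumption; the page only says the selector is an integer index, a name substring, or default.

```python
from typing import Any, Mapping, Optional, Sequence


def resolve_output_device(selector: str,
                          devices: Sequence[Mapping[str, Any]]) -> Optional[int]:
    """Resolve a device selector to an index into devices.

    Returns None for "default", which sounddevice treats as the system default
    when passed as the device argument.
    """
    selector = selector.strip()
    if not selector or selector.lower() == "default":
        return None
    # Only devices that can actually play audio are candidates.
    outputs = [(index, dev) for index, dev in enumerate(devices)
               if dev.get("max_output_channels", 0) > 0]
    if selector.isdigit():
        wanted = int(selector)
        if any(index == wanted for index, _ in outputs):
            return wanted
        raise LookupError(f"no output device at index {wanted}")
    for index, dev in outputs:
        if selector.lower() in str(dev["name"]).lower():
            return index
    raise LookupError(f"no output device matching {selector!r}")
```

Calling this again on every recovery cycle with a fresh device list means a selector such as a name substring keeps working even if the replacement hardware lands on a different index.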

SDK helpers

The driver delegates everything to cyberwave.sensor.speaker:
  • SpeakerAudioStreamer — WebRTC + MQTT lifecycle (subclasses BaseAudioStreamer)
  • SpeakerAudioTrack — minimal upstream track required by the WebRTC SDP contract
  • HostSpeakerCapture — host sounddevice.OutputStream wrapper with file / queue / Zenoh sources
  • play_file(...), associate_speaker_to_microphone(...), associate_speaker_to_microphones(...) — high-level helpers from TR-1.25
See the native microphone driver page for the upstream producer counterpart and the shared sensor: "audio" routing contract.