What It Provides
The native microphone driver runs on the edge device that has the microphone attached. It captures audio through sounddevice / PortAudio, sends a WebRTC audio stream to the twin, and publishes raw chunks to the local Zenoh data bus on audio/default by default.
It also subscribes to:
start_video / stop_video flow:
1. Live audio stream (start_audio / stop_audio)
Pure WebRTC lifecycle. No file is written.
cyberwave/twin/{twin_uuid}/start_audio/status (or stop_audio/status):
2. Recording (start_recording / stop_recording)
The frontend and the Recorder workflow node publish directly to:
source_type is "tele" when the frontend REC button publishes the message and "edge" when the microphone driver relays it (fallback path). session_id is a free-form correlation hint; the media-service derives the actual recording session from the live SFU producer state and only echoes the field back in its logs. The sensor value must match a sensor id from twin.capabilities.sensors (computed from twin.universal_schema.sensors[]); the audio widget resolves it via getAudioSensorConfig(twin).sensorId.
The REC button is recording-only: it never republishes start_audio / stop_audio. The audio widget auto-publishes start_audio on mount and stop_audio on unmount, so by the time the user can click REC the WebRTC producer is already running. Re-sending start_audio from the REC click would just log start_audio command received - stream already active on the driver and add a 3 s ACK wait before the actual recording publish.
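As a rough illustration of the recording publish described above, the sketch below assembles a command message. The fields sensor, source_type, and session_id come from the text. The "command" key, the helper name build_recording_command, the JSON layout, and the exact command topic (inferred from the webrtc-command/status reply topic) are assumptions, not the documented wire format.

```python
import json
import uuid

VALID_COMMANDS = {"start_recording", "stop_recording"}
VALID_SOURCE_TYPES = {"tele", "edge"}  # frontend REC button vs. driver fallback relay


def build_recording_command(twin_uuid, command, sensor="audio",
                            source_type="tele", session_id=None):
    """Return (topic, json_payload) for a recording command (hypothetical layout)."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"unsupported source_type: {source_type!r}")
    # Assumed topic: the status topic named in the text minus its /status suffix.
    topic = f"cyberwave/twin/{twin_uuid}/webrtc-command"
    payload = {
        "command": command,          # assumed key name
        "sensor": sensor,            # must match a sensor id in twin.capabilities.sensors
        "source_type": source_type,
        # Free-form correlation hint only; the media-service derives the real session.
        "session_id": session_id or uuid.uuid4().hex,
    }
    return topic, json.dumps(payload)
```

A REC click would publish only this message and would not resend start_audio, because the widget already started the producer on mount.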
The media-service SFU starts (or stops) persisting the running audio track and replies on cyberwave/twin/{twin_uuid}/webrtc-command/status:
start_recording / stop_recording on cyberwave/twin/{twin_uuid}/command and forwards them verbatim to webrtc-command. On the fallback path the driver does not publish its own status ACK — callers consume the authoritative status from webrtc-command/status.
CYBERWAVE_METADATA_AUTO_RECORDING_AUDIO=true makes the initial WebRTC offer carry recording: true so the media-service starts recording in the same negotiation. After startup, recording is controlled entirely by start_recording / stop_recording. When CYBERWAVE_METADATA_ENABLE_RECORDING=false, the driver drops fallback relays silently (the frontend publishes to webrtc-command itself and is not affected).
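The two flags above can be read as two independent decisions: whether the initial offer carries recording: true, and whether a fallback relay is forwarded or dropped. The sketch below shows that logic under assumptions: the helper names and the set of strings treated as true are illustrative, and the driver's real parsing may differ.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def env_flag(name, default, environ=None):
    """Read a boolean environment variable, falling back to `default` when unset."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def initial_offer_recording(environ=None):
    """Whether the first WebRTC offer should carry recording: true."""
    return env_flag("CYBERWAVE_METADATA_AUTO_RECORDING_AUDIO", False, environ)


def should_relay_recording(command, environ=None):
    """Whether the driver forwards a fallback start_recording / stop_recording."""
    if command not in ("start_recording", "stop_recording"):
        return False  # start_audio / stop_audio are not gated by this flag
    return env_flag("CYBERWAVE_METADATA_ENABLE_RECORDING", True, environ)
```

With ENABLE_RECORDING=false the relay check returns False and the command is dropped silently, while direct publishes to webrtc-command never pass through this check.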
Setup
- Create or select a microphone twin in your Cyberwave environment.
- Pair the edge device through Cyberwave Edge so the driver receives CYBERWAVE_API_KEY and CYBERWAVE_TWIN_UUID.
- Attach a USB or built-in microphone to the edge device.
- Start the driver with Edge Core (recommended) or Docker. On macOS, run cyberwave edge install so the CLI starts a host ffmpeg PCM bridge (like the MJPEG camera bridge) and edge-core injects CYBERWAVE_METADATA_AUDIO_DEVICE=http://host.docker.internal:8101 into the container. On Linux Docker hosts, pass the audio device into the container:
--privileged for ALSA device access. Use the narrower /dev/snd mapping first.
Standard catalog sensor configuration
The standard Cyberwave catalog microphone asset declares one sensor block. Both id and name are "audio"; type is "audio" (active producer). This is the same routing key the native speaker driver uses for its id/name; the modalities are distinguished by type / WebRTC sensor_type, not by id.
parameters → CYBERWAVE_METADATA_* env vars when the driver container starts. The driver publishes WebRTC offers with:
| Offer field | Catalog value | Role |
|---|---|---|
| sensor | "audio" | Routing key (sensors[].id) |
| sensor_type | "audio" | Matches sensors[].type; media-service classifies as producer |
| role | "producer" | Active edge twin; publishes audio into the SFU |
| sender | "edge" | Only edge may produce microphone traffic (TR-1.24) |
start_recording / stop_recording) use "sensor": "audio" so they target the same stream identity as the live offer.
Configuration
| Variable | Default | Purpose |
|---|---|---|
| CYBERWAVE_METADATA_AUDIO_DEVICE | default | Select an input by index, name fragment, or default. |
| CYBERWAVE_METADATA_ENABLE_AUDIO | true | Enables WebRTC startup and reconnect. If false, no WebRTC audio or recording starts. Maps from enable_audio in the twin JSON. |
| CYBERWAVE_METADATA_ENABLE_RECORDING | true | Gates the driver’s fallback relay of start_recording / stop_recording (on the command topic) to the media-service. Does not affect start_audio / stop_audio (live stream is always allowed when enable_audio=true), and does not affect direct webrtc-command publishes from the frontend or Recorder node. Maps from enable_recording in the twin JSON. |
| CYBERWAVE_METADATA_AUTO_RECORDING_AUDIO | false | When true, sets recording: true on the initial WebRTC offer so the media-service auto-starts recording at startup. After startup, recording is driven by explicit start_recording / stop_recording. Maps from auto_recording_audio in the twin JSON. |
| CYBERWAVE_METADATA_AUDIO_CHANNEL | audio/default | Zenoh channel for raw audio chunks. |
| CYBERWAVE_METADATA_AUDIO_MIC_NAME | audio | WebRTC sensor routing key (catalog sensors[].id). |
| CYBERWAVE_METADATA_AUDIO_SAMPLE_RATE | 48000 | Capture sample rate (catalog standard). WebRTC output is always resampled to 48 kHz Opus. |
| CYBERWAVE_METADATA_AUDIO_CHANNELS | 1 | Capture channels; auto-detection can upgrade to stereo. |
source_type: "edge" here vs. "tele" for the frontend’s REC button):
audio/live/default. The same payload shape is used by the frontend audio widget’s REC button and by the Recorder workflow node, so the three paths converge on the same media-service handler and the same ACK on webrtc-command/status.
To run multiple microphones on one device, run multiple driver instances. Give each instance a different CYBERWAVE_TWIN_UUID and set CYBERWAVE_METADATA_AUDIO_DEVICE to the desired input.
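CYBERWAVE_METADATA_AUDIO_DEVICE accepts an index, a name fragment, or default. The sketch below shows one way such a selector could be resolved against a device list shaped like the dictionaries returned by sounddevice.query_devices() (keys name and max_input_channels). The function name and the matching rules (case-insensitive substring, first match wins) are assumptions, and the macOS bridge URL form is out of scope here.

```python
def resolve_input_device(selector, devices):
    """Map a selector string to a device index, or None for the host default.

    `devices` is a list of dicts with "name" and "max_input_channels" keys,
    the same shape sounddevice.query_devices() yields.
    """
    choice = (selector or "default").strip()
    if choice.lower() == "default":
        return None  # let PortAudio pick its default input
    if choice.isdigit():
        index = int(choice)
        if index >= len(devices) or devices[index]["max_input_channels"] < 1:
            raise ValueError(f"device index {index} is not a usable input")
        return index
    needle = choice.lower()
    for index, device in enumerate(devices):
        # Skip output-only devices; match on a case-insensitive name fragment.
        if device["max_input_channels"] > 0 and needle in device["name"].lower():
            return index
    raise ValueError(f"no input device matches {selector!r}")
```

Each driver instance would resolve its own selector, so two instances with different CYBERWAVE_TWIN_UUID values can point at different inputs on the same host.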
Linux Audio Notes
The driver image installs libportaudio2, which gives sounddevice access to PortAudio’s ALSA backend. If the host routes audio through PulseAudio or PipeWire, also make the relevant Pulse/PipeWire socket and client libraries available to the container.
The driver logs all input devices at startup and publishes selected-device metadata to the twin. Set CYBERWAVE_LOG_LEVEL=DEBUG to see raw environment values and the resolved microphone configuration.
macOS Notes
Docker Desktop cannot pass CoreAudio into Linux containers. Cyberwave mirrors the camera MJPEG workaround: host ffmpeg captures AVFoundation audio and serves raw PCM over HTTP; the driver inside Docker reads that URL via AudioBridgeCapture.
| Step | Command / artifact |
|---|---|
| Install bridge | cyberwave edge install or --reconfigure-microphone |
| Per-twin map | ~/.cyberwave/audio_streams.json |
| Container env | CYBERWAVE_METADATA_AUDIO_DEVICE=http://host.docker.internal:8101 |
| Recovery | cyberwave edge restart (kickstarts silent ffmpeg LaunchAgents) |
./run-local.sh (direct sounddevice / PortAudio).
Edge Cases
- No audio devices at startup: the driver enumerates devices and fails configuration if none are available. Publishing a microphone sensor-failure alert before retrying with backoff is the expected runtime behavior.
- Device disconnected mid-stream: the Linux/macOS device monitor detects add/remove events. The expected behavior is to stop WebRTC, publish a sensor-failure alert, and reconnect when the device returns.
- Docker access: prefer /dev/snd plus the audio group; use privileged mode only when host audio permissions require it.
- Cloud STT URL expiry: use signed URLs with enough TTL for queued workloads, or inline audio_base64 for small files.
- Large STT inputs: keep Whisper jobs below roughly 25 MB; oversized inputs should fail with a clear validation error.
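The retry-with-backoff behavior expected when no audio device is available can be sketched as a capped exponential schedule. The base delay, growth factor, cap, attempt limit, and the probe / on_failure callbacks are all assumptions for illustration; the source does not state the driver's actual timings or alert API.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=30.0):
    """Yield an endless capped exponential delay sequence in seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def wait_for_device(probe, max_attempts=5, sleep=time.sleep, on_failure=None):
    """Call `probe` until it returns True, sleeping with backoff between tries.

    `on_failure(attempt)` stands in for publishing a sensor-failure alert.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if probe():
            return True
        if on_failure is not None:
            on_failure(attempt)
        if attempt < max_attempts:
            sleep(next(delays))
    return False
```

Passing sleep as a parameter keeps the loop testable without real waiting.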
Dual Audio Streaming Paths
The driver streams captured audio on two independent, parallel paths:
| Path | Output rate | Resampling | Metadata | Consumer |
|---|---|---|---|---|
| WebRTC (Opus) | 48 kHz (always) | Yes, if hardware ≠ 48 kHz | stream_attributes.sample_rate in MQTT offer | Frontend, media service |
| Zenoh (raw PCM) | Hardware native | Never | sample_rate_hz, channels, encoding, layout in wire header | @cw.on_audio workers |
WebRTC path
Audio is resampled to 48 kHz (the Opus codec’s internal rate) before entering the WebRTC queue. The stream_attributes field in the MQTT webrtc-offer payload includes the actual sample_rate used, so the media service and frontend can verify compliance. The media service router uses standard mediasoup Opus negotiation; no custom validation is needed.
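To make the resampling step concrete, here is a minimal sketch that converts a mono sample sequence to 48 kHz by linear interpolation. It is illustrative only: the source does not say which resampler the driver uses, a production path would normally use a band-limited resampler, and the function name is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate=48000):
    """Resample a mono sequence of numbers by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or not samples:
        return [float(s) for s in samples]  # nothing to do
    out_len = round(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        lo = int(pos)
        if lo >= last:
            out.append(float(samples[last]))  # hold the final sample at the tail
            continue
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out
```

For a 32 kHz microphone, a 10 ms chunk of 320 samples becomes 480 samples at 48 kHz, which is the ratio the WebRTC path has to maintain.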
Zenoh path
Raw PCM chunks are published at the hardware’s native capture rate with no resampling. On the first publish, the Zenoh wire header carries metadata (sample_rate_hz, channels, encoding, layout). @cw.on_audio workers receive this metadata so they can correctly interpret the raw audio bytes.
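The sketch below shows how a consumer might use that header metadata to interpret a raw chunk. The header keys sample_rate_hz, channels, and encoding come from the text. The encoding value "s16le", the interleaved frame layout, and the function name are assumptions, since the source does not specify the byte format.

```python
import struct


def decode_pcm_chunk(header, payload):
    """Split an interleaved 16-bit little-endian chunk into per-channel lists.

    Returns (channels, duration_seconds). Assumes encoding "s16le".
    """
    if header.get("encoding") != "s16le":
        raise ValueError(f"unsupported encoding: {header.get('encoding')!r}")
    channels = int(header["channels"])
    frame_bytes = 2 * channels
    usable = len(payload) - len(payload) % frame_bytes  # ignore a trailing partial frame
    count = usable // 2
    samples = struct.unpack(f"<{count}h", payload[:usable])
    per_channel = [list(samples[c::channels]) for c in range(channels)]
    duration_s = (count // channels) / header["sample_rate_hz"]
    return per_channel, duration_s
```

Because the Zenoh path is never resampled, sample_rate_hz here is the hardware rate, so a worker must not assume 48 kHz.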
Parallelism
The PortAudio callback places raw audio into a zero-copy swap buffer for Zenoh (O(1)) and queues resampled audio for WebRTC. Three threads run in true parallel: PortAudio capture, Zenoh publisher, and WebRTC streamer.
Edge health (edge_health)
The driver delegates edge_health to the SDK’s MicrophoneAudioStreamer. Once WebRTC is connected, each heartbeat includes streams.stream.stream_config with kind: "audio", sample_rate_hz, channels, and codec: "opus". The block intentionally omits source (no host device path on the wire). Liveness uses mark_alive, so fps and frames_sent stay at zero for audio rows. The dashboard renders sample rate and channel layout from this block; workflow triggers still read Zenoh PCM, not MQTT health.
Twin metadata
The driver publishes both rates to the twin metadata under audio_device:
| Field | Description |
|---|---|
| capture_sample_rate | Hardware native rate (e.g. 32000) |
| stream_sample_rate | WebRTC output rate (48000 when resampling is on) |
| channels | Channel count |
| layout | "mono" or "stereo" |
| software_resampling | Whether resampling is active |
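As a sketch of how these fields relate to each other, the helper below derives the audio_device block from the capture rate and channel count. The field names come from the table. The function name and the rule that resampling is active exactly when the capture rate differs from 48 kHz are assumptions based on the WebRTC path description.

```python
WEBRTC_RATE_HZ = 48000  # WebRTC output is always 48 kHz Opus


def audio_device_metadata(capture_sample_rate, channels):
    """Build a hypothetical audio_device metadata block."""
    if channels not in (1, 2):
        raise ValueError("only mono and stereo layouts are described")
    return {
        "capture_sample_rate": capture_sample_rate,
        "stream_sample_rate": WEBRTC_RATE_HZ,
        "channels": channels,
        "layout": "mono" if channels == 1 else "stereo",
        # Assumed rule: resample only when the hardware rate is not 48 kHz.
        "software_resampling": capture_sample_rate != WEBRTC_RATE_HZ,
    }
```

For example, a 32 kHz mono microphone would report capture_sample_rate 32000, stream_sample_rate 48000, and software_resampling true.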
Success Checks
- docker compose up with a USB microphone streams audio to the microphone twin.
- Frontend MQTT start_audio / stop_audio toggles the live WebRTC audio producer (no recording side effect); the driver replies with audio_started / audio_stopped on the matching status topic.
- Frontend MQTT start_recording / stop_recording on webrtc-command toggles persisted recording without disturbing the live stream; the media-service replies on webrtc-command/status.
- Zenoh audio/default chunks are consumable by an on_audio worker hook at the hardware’s native sample rate.
- Startup logs list available devices; CYBERWAVE_METADATA_AUDIO_DEVICE selects a specific one.
- USB disconnect and reconnect transitions through alert, reconnect, and alert resolution.
Related
Native speaker driver
Downstream playback counterpart: same sensor: "audio" routing key, sensor_type: "speaker", role: "consumer".
Recorder workflow node
Publishes the same start_recording / stop_recording payload on webrtc-command from a workflow; converges on the same media-service handler as the audio widget’s REC button and the driver’s fallback relay.
Audio in Workflows
Workflow-side audio pipeline (Audio Track, VAD, Wake Word, STT). Consumes Zenoh PCM from this driver; orthogonal to the WebRTC + recording path documented here.