STUB DOCUMENT: This page captures the current native-driver contract and known edge cases. A human will expand it before publishing.

What It Provides

The native microphone driver runs on the edge device that has the microphone attached. It captures audio through sounddevice / PortAudio, sends a WebRTC audio stream to the twin, and publishes raw chunks to the local Zenoh data bus on audio/default by default. It also subscribes to:
cyberwave/twin/{twin_uuid}/command
The driver handles two independent contracts on this topic — symmetric with the camera driver’s start_video / stop_video flow:

1. Live audio stream (start_audio / stop_audio)

Pure WebRTC lifecycle. No file is written.
{ "command": "start_audio", "source_type": "tele", "timestamp": 1716643200.123 }
{ "command": "stop_audio",  "source_type": "tele", "timestamp": 1716643200.123 }
The driver replies on cyberwave/twin/{twin_uuid}/start_audio/status (or stop_audio/status):
{ "type": "audio_started", "status": "ok", "source_type": "edge", "timestamp": 1716643200.123 }
{ "type": "audio_stopped", "status": "ok", "source_type": "edge", "timestamp": 1716643200.123 }
These commands only open or close the WebRTC audio producer — they never trigger recording.
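The topics and payload shape above can be assembled as in the following sketch. The helper name build_audio_command is hypothetical, and actual MQTT publishing is left out because this page does not specify a client library.

```python
import json
import time

COMMAND_TOPIC = "cyberwave/twin/{twin_uuid}/command"
STATUS_TOPIC = "cyberwave/twin/{twin_uuid}/{command}/status"


def build_audio_command(twin_uuid: str, command: str, source_type: str = "tele"):
    """Return (command_topic, status_topic, json_payload) for a live-stream command.

    Hypothetical helper: it only mirrors the documented topic and payload shapes.
    """
    if command not in ("start_audio", "stop_audio"):
        raise ValueError(f"not a live-stream command: {command!r}")
    payload = {
        "command": command,
        "source_type": source_type,
        "timestamp": time.time(),  # seconds since the epoch, as in the examples
    }
    return (
        COMMAND_TOPIC.format(twin_uuid=twin_uuid),
        STATUS_TOPIC.format(twin_uuid=twin_uuid, command=command),
        json.dumps(payload),
    )
```

A caller would publish the payload on the first topic and wait for audio_started / audio_stopped on the second.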

2. Recording (start_recording / stop_recording)

The frontend and the Recorder workflow node publish directly to:
cyberwave/twin/{twin_uuid}/webrtc-command
{ "command": "start_recording", "source_type": "tele", "sensor": "audio", "frontend_type": "audio", "session_id": "sdk_9314159f" }
{ "command": "stop_recording",  "source_type": "tele", "sensor": "audio", "frontend_type": "audio", "session_id": "sdk_9314159f" }
source_type is "tele" when the frontend REC button publishes the message and "edge" when the microphone driver relays it (fallback path). session_id is a free-form correlation hint; the media-service derives the actual recording session from the live SFU producer state and only echoes the field back in its logs. The sensor value must match a sensor id from twin.capabilities.sensors (computed from twin.universal_schema.sensors[]); the audio widget resolves it via getAudioSensorConfig(twin).sensorId.
The REC button is recording-only: it never republishes start_audio / stop_audio. The audio widget auto-publishes start_audio on mount and stop_audio on unmount, so by the time the user can click REC the WebRTC producer is already running. Re-sending start_audio from the REC click would only make the driver log "start_audio command received - stream already active" and add a 3 s ACK wait before the actual recording publish.
The media-service SFU starts (or stops) persisting the running audio track and replies on cyberwave/twin/{twin_uuid}/webrtc-command/status:
{ "command": "start_recording", "status": "ok", "message": "recording started", "recording_id": "...", "source_type": "backend", "timestamp": 1716643200.123 }
As a fallback path for callers that only know the twin command topic, the microphone driver also accepts start_recording / stop_recording on cyberwave/twin/{twin_uuid}/command and forwards them verbatim to webrtc-command. On the fallback path the driver does not publish its own status ACK; callers consume the authoritative status from webrtc-command/status.
CYBERWAVE_METADATA_AUTO_RECORDING_AUDIO=true makes the initial WebRTC offer carry recording: true so the media-service starts recording in the same negotiation. After startup, recording is controlled entirely by start_recording / stop_recording.
When CYBERWAVE_METADATA_ENABLE_RECORDING=false, the driver drops fallback relays silently (the frontend publishes to webrtc-command itself and is not affected).
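The fallback gate described above can be sketched as a small decision function. The names relay_target and _env_flag are hypothetical, and the set of strings accepted as "true" is an assumption; the real driver may parse the flag differently.

```python
import os
from typing import Optional

RECORDING_COMMANDS = {"start_recording", "stop_recording"}


def _env_flag(name: str, default: bool) -> bool:
    """Read a boolean env var; the accepted truthy strings are an assumption."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def relay_target(twin_uuid: str, message: dict) -> Optional[str]:
    """Return the topic a command-topic message should be relayed to, or None.

    None means "do not relay": either the message is not a recording command,
    or CYBERWAVE_METADATA_ENABLE_RECORDING is false and the relay is dropped.
    """
    if message.get("command") not in RECORDING_COMMANDS:
        return None
    if not _env_flag("CYBERWAVE_METADATA_ENABLE_RECORDING", True):
        return None  # dropped silently, no status ACK from the driver
    return f"cyberwave/twin/{twin_uuid}/webrtc-command"
```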

Setup

  1. Create or select a microphone twin in your Cyberwave environment.
  2. Pair the edge device through Cyberwave Edge so the driver receives CYBERWAVE_API_KEY and CYBERWAVE_TWIN_UUID.
  3. Attach a USB or built-in microphone to the edge device.
  4. Start the driver with Edge Core (recommended) or Docker. On macOS, run cyberwave edge install so the CLI starts a host ffmpeg PCM bridge (like the MJPEG camera bridge) and edge-core injects CYBERWAVE_METADATA_AUDIO_DEVICE=http://host.docker.internal:8101 into the container. On Linux Docker hosts, pass the audio device into the container:
devices:
  - /dev/snd:/dev/snd
group_add:
  - audio
Some systems require --privileged for ALSA device access. Use the narrower /dev/snd mapping first.

Standard catalog sensor configuration

STUB DOCUMENT: Canonical catalog asset shape; a human will expand before publishing.
The standard Cyberwave catalog microphone asset declares one sensor block. Both id and name are "audio"; type is "audio" (active producer). This is the same routing key the native speaker driver uses for its id/name — the modalities are distinguished by type / WebRTC sensor_type, not by id.
{
  "id": "audio",
  "name": "audio",
  "type": "audio",
  "parameters": {
    "enable_audio": "true",
    "audio_channels": "1",
    "enable_recording": "true",
    "audio_sample_rate": "48000",
    "auto_recording_audio": "false"
  }
}
Edge-core maps parameters to CYBERWAVE_METADATA_* env vars when the driver container starts. The driver publishes WebRTC offers with:
Offer field | Catalog value | Role
sensor | "audio" | Routing key (sensors[].id)
sensor_type | "audio" | Matches sensors[].type; media-service classifies as producer
role | "producer" | Active edge twin; publishes audio into the SFU
sender | "edge" | Only edge may produce microphone traffic (TR-1.24)
Recording commands (start_recording / stop_recording) use "sensor": "audio" so they target the same stream identity as the live offer.
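Comparing the catalog parameters with the variable names used on this page (enable_audio and CYBERWAVE_METADATA_ENABLE_AUDIO, audio_sample_rate and CYBERWAVE_METADATA_AUDIO_SAMPLE_RATE) suggests an uppercase-with-prefix convention. The sketch below illustrates that inferred rule; parameters_to_env is a hypothetical name and this is not Edge-core's actual code.

```python
PREFIX = "CYBERWAVE_METADATA_"


def parameters_to_env(parameters: dict) -> dict:
    """Map catalog sensor parameters to env vars (inferred naming rule)."""
    return {PREFIX + key.upper(): str(value) for key, value in parameters.items()}


catalog_parameters = {
    "enable_audio": "true",
    "audio_channels": "1",
    "enable_recording": "true",
    "audio_sample_rate": "48000",
    "auto_recording_audio": "false",
}
env = parameters_to_env(catalog_parameters)
```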

Configuration

Variable | Default | Purpose
CYBERWAVE_METADATA_AUDIO_DEVICE | default | Select an input by index, name fragment, or default.
CYBERWAVE_METADATA_ENABLE_AUDIO | true | Enables WebRTC startup and reconnect. If false, no WebRTC audio or recording starts. Maps from enable_audio in the twin JSON.
CYBERWAVE_METADATA_ENABLE_RECORDING | true | Gates the driver’s fallback relay of start_recording / stop_recording (on the command topic) to the media-service. Does not affect start_audio / stop_audio (live stream is always allowed when enable_audio=true), and does not affect direct webrtc-command publishes from the frontend or Recorder node. Maps from enable_recording in the twin JSON.
CYBERWAVE_METADATA_AUTO_RECORDING_AUDIO | false | When true, sets recording: true on the initial WebRTC offer so the media-service auto-starts recording at startup. After startup, recording is driven by explicit start_recording / stop_recording. Maps from auto_recording_audio in the twin JSON.
CYBERWAVE_METADATA_AUDIO_CHANNEL | audio/default | Zenoh channel for raw audio chunks.
CYBERWAVE_METADATA_AUDIO_MIC_NAME | audio | WebRTC sensor routing key (catalog sensors[].id).
CYBERWAVE_METADATA_AUDIO_SAMPLE_RATE | 48000 | Capture sample rate (catalog standard). WebRTC output is always resampled to 48 kHz Opus.
CYBERWAVE_METADATA_AUDIO_CHANNELS | 1 | Capture channels; auto-detection can upgrade to stereo.
When relaying a fallback recording command, the driver publishes the smallest payload the media-service needs (note source_type: "edge" here vs. "tele" for the frontend’s REC button):
{ "command": "start_recording", "source_type": "edge", "sensor": "audio", "frontend_type": "audio" }
{ "command": "stop_recording", "source_type": "edge", "sensor": "audio", "frontend_type": "audio" }
The media-service resolves the twin UUID from the MQTT topic and defaults the stream identity to audio/live/default. The same payload shape is used by the frontend audio widget’s REC button and by the Recorder workflow node, so the three paths converge on the same media-service handler and the same ACK on webrtc-command/status.
To run multiple microphones on one device, run multiple driver instances. Give each instance a different CYBERWAVE_TWIN_UUID and set CYBERWAVE_METADATA_AUDIO_DEVICE to the desired input.
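As an illustration of the variables and defaults in the configuration table, the sketch below reads them into one object and classifies the device selector. MicConfig, load_mic_config, and classify_device are hypothetical names, and the boolean parsing and selector rules are assumptions rather than the driver's actual logic.

```python
import os
from dataclasses import dataclass
from typing import Mapping

PREFIX = "CYBERWAVE_METADATA_"


@dataclass(frozen=True)
class MicConfig:
    device: str
    enable_audio: bool
    enable_recording: bool
    auto_recording_audio: bool
    zenoh_channel: str
    mic_name: str
    sample_rate: int
    channels: int


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean variable; the accepted truthy strings are an assumption."""
    raw = env.get(PREFIX + name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_mic_config(env: Mapping[str, str] = os.environ) -> MicConfig:
    """Apply the documented defaults to whatever the environment provides."""
    return MicConfig(
        device=env.get(PREFIX + "AUDIO_DEVICE", "default"),
        enable_audio=_flag(env, "ENABLE_AUDIO", True),
        enable_recording=_flag(env, "ENABLE_RECORDING", True),
        auto_recording_audio=_flag(env, "AUTO_RECORDING_AUDIO", False),
        zenoh_channel=env.get(PREFIX + "AUDIO_CHANNEL", "audio/default"),
        mic_name=env.get(PREFIX + "AUDIO_MIC_NAME", "audio"),
        sample_rate=int(env.get(PREFIX + "AUDIO_SAMPLE_RATE", "48000")),
        channels=int(env.get(PREFIX + "AUDIO_CHANNELS", "1")),
    )


def classify_device(selector: str) -> str:
    """Classify a device selector: bridge URL, default, index, or name fragment."""
    if selector.startswith(("http://", "https://")):
        return "bridge_url"  # e.g. the macOS host ffmpeg PCM bridge
    if selector == "default":
        return "default"
    if selector.isdigit():
        return "index"
    return "name_fragment"
```

For a multi-microphone host, each driver instance would see its own environment, so each instance resolves its own device and twin.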

Linux Audio Notes

The driver image installs libportaudio2, which gives sounddevice access to PortAudio’s ALSA backend. If the host routes audio through PulseAudio or PipeWire, also make the relevant Pulse/PipeWire socket and client libraries available to the container. The driver logs all input devices at startup and publishes selected-device metadata to the twin. Set CYBERWAVE_LOG_LEVEL=DEBUG to see raw environment values and the resolved microphone configuration.

macOS Notes

Docker Desktop cannot pass CoreAudio into Linux containers. Cyberwave mirrors the camera MJPEG workaround: host ffmpeg captures AVFoundation audio and serves raw PCM over HTTP; the driver inside Docker reads that URL via AudioBridgeCapture.
Step | Command / artifact
Install bridge | cyberwave edge install or --reconfigure-microphone
Per-twin map | ~/.cyberwave/audio_streams.json
Container env | CYBERWAVE_METADATA_AUDIO_DEVICE=http://host.docker.internal:8101
Recovery | cyberwave edge restart (kickstarts silent ffmpeg LaunchAgents)
Grant microphone permission when macOS prompts the terminal running ffmpeg. For local debugging without Docker, use ./run-local.sh (direct sounddevice / PortAudio).

Edge Cases

  • No audio devices at startup: the driver enumerates devices and fails configuration if none are available. The expected runtime behavior is to publish a microphone sensor-failure alert and then retry with backoff.
  • Device disconnected mid-stream: the Linux/macOS device monitor detects add/remove events. The expected behavior is to stop WebRTC, publish a sensor-failure alert, and reconnect when the device returns.
  • Docker access: prefer /dev/snd plus the audio group; use privileged mode only when host audio permissions require it.
  • Cloud STT URL expiry: use signed URLs with enough TTL for queued workloads, or inline audio_base64 for small files.
  • Large STT inputs: keep Whisper jobs below roughly 25 MB; oversized inputs should fail with a clear validation error.
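The "alert, then retry with backoff" behavior mentioned for missing devices could look like the sketch below. The delay values, the function names, and the injected list_devices / publish_alert callables are all assumptions for illustration; the page does not state the driver's actual backoff schedule.

```python
import time
from typing import Callable, Iterator, List


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield exponentially growing delays, capped at `cap` seconds (assumed values)."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def wait_for_input_device(
    list_devices: Callable[[], List[str]],
    publish_alert: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until at least one input device exists, alerting once on first failure."""
    alerted = False
    for delay in backoff_delays():
        devices = list_devices()
        if devices:
            return devices
        if not alerted:
            publish_alert("no audio input devices available")
            alerted = True
        sleep(delay)
    raise RuntimeError("unreachable: backoff_delays is infinite")
```

Injecting sleep makes the loop easy to exercise in tests without real waiting.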

Dual Audio Streaming Paths

STUB DOCUMENT: This section captures the current dual-path contract. A human will expand it before publishing.
The driver streams captured audio on two independent, parallel paths:
Path | Output rate | Resampling | Metadata | Consumer
WebRTC (Opus) | 48 kHz (always) | Yes, if hardware ≠ 48 kHz | stream_attributes.sample_rate in MQTT offer | Frontend, media service
Zenoh (raw PCM) | Hardware native | Never | sample_rate_hz, channels, encoding, layout in wire header | @cw.on_audio workers

WebRTC path

Audio is resampled to 48 kHz (the Opus codec’s internal rate) before entering the WebRTC queue. The stream_attributes field in the MQTT webrtc-offer payload includes the actual sample_rate used, so the media service and frontend can verify compliance. The media service router uses standard mediasoup Opus negotiation — no custom validation is needed.

Zenoh path

Raw PCM chunks are published at the hardware’s native capture rate with no resampling. On the first publish, the Zenoh wire header carries metadata:
{
  "sample_rate_hz": 32000,
  "channels": 1,
  "encoding": "pcm_s16le",
  "layout": "mono"
}
@cw.on_audio workers receive this metadata so they can correctly interpret the raw audio bytes.
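To show why a worker needs this header, the sketch below decodes a raw chunk using only the header fields shown above. The function names are hypothetical, the way the header reaches the worker is not covered here, and interleaved sample order for multi-channel audio is an assumption.

```python
import struct
from typing import List

BYTES_PER_SAMPLE = 2  # pcm_s16le: signed 16-bit little-endian


def decode_pcm_s16le(chunk: bytes, header: dict) -> List[List[int]]:
    """Split a raw chunk into one list of samples per channel.

    Assumes channels are interleaved frame by frame.
    """
    if header.get("encoding") != "pcm_s16le":
        raise ValueError(f"unsupported encoding: {header.get('encoding')!r}")
    channels = header["channels"]
    if len(chunk) % (BYTES_PER_SAMPLE * channels):
        raise ValueError("chunk does not contain a whole number of frames")
    count = len(chunk) // BYTES_PER_SAMPLE
    samples = struct.unpack(f"<{count}h", chunk)
    return [list(samples[c::channels]) for c in range(channels)]


def chunk_duration_s(chunk: bytes, header: dict) -> float:
    """Duration of a chunk in seconds at the header's native sample rate."""
    frames = len(chunk) / (BYTES_PER_SAMPLE * header["channels"])
    return frames / header["sample_rate_hz"]
```

Because the Zenoh path is never resampled, using a hard-coded 48 kHz instead of sample_rate_hz would misjudge durations and pitch on hardware that captures at another rate.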

Parallelism

The PortAudio callback places raw audio into a zero-copy swap buffer for Zenoh (O(1)) and queues resampled audio for WebRTC. Three threads run in true parallel: PortAudio capture, Zenoh publisher, and WebRTC streamer.
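The swap-buffer idea can be illustrated with a small class in which the producer only appends a reference and the consumer takes the whole batch by swapping one reference. This is a generic sketch of the pattern under a plain lock, not the driver's implementation; SwapBuffer and its methods are hypothetical names.

```python
import threading
from typing import List, Optional


class SwapBuffer:
    """Hand chunks from a capture callback to a publisher thread without copying audio."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: List[bytes] = []
        self._ready = threading.Event()

    def push(self, chunk: bytes) -> None:
        """Producer side: O(1) append of a reference to the chunk."""
        with self._lock:
            self._pending.append(chunk)
            self._ready.set()

    def swap(self, timeout: Optional[float] = None) -> List[bytes]:
        """Consumer side: take all pending chunks by swapping in an empty list."""
        if not self._ready.wait(timeout):
            return []
        with self._lock:
            chunks, self._pending = self._pending, []
            self._ready.clear()
        return chunks
```

The lock is held only for a list append or a reference swap, so the capture side is never blocked for the duration of a publish.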

Edge health (edge_health)

STUB DOCUMENT: CYB-2005 contract summary; a human will expand before publishing.
The driver delegates edge_health to the SDK’s MicrophoneAudioStreamer. Once WebRTC is connected, each heartbeat includes streams.stream.stream_config with kind: "audio", sample_rate_hz, channels, and codec: "opus". The block intentionally omits source (no host device path on the wire). Liveness uses mark_alive, so fps and frames_sent stay at zero for audio rows. The dashboard renders sample rate and channel layout from this block; workflow triggers still read Zenoh PCM, not MQTT health.

Twin metadata

The driver publishes both rates to the twin metadata under audio_device:
Field | Description
capture_sample_rate | Hardware native rate (e.g. 32000)
stream_sample_rate | WebRTC output rate (48000 when resampling is on)
channels | Channel count
layout | "mono" or "stereo"
software_resampling | Whether resampling is active
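A sketch of how these fields relate to each other, assuming resampling is active exactly when the capture rate differs from 48 kHz (as the dual-path table indicates). The function name audio_device_metadata is hypothetical.

```python
WEBRTC_RATE_HZ = 48000  # WebRTC (Opus) output rate stated on this page


def audio_device_metadata(capture_rate_hz: int, channels: int) -> dict:
    """Build an audio_device block from the capture parameters."""
    if channels not in (1, 2):
        raise ValueError("only mono and stereo layouts are described here")
    return {
        "capture_sample_rate": capture_rate_hz,
        "stream_sample_rate": WEBRTC_RATE_HZ,
        "channels": channels,
        "layout": "mono" if channels == 1 else "stereo",
        "software_resampling": capture_rate_hz != WEBRTC_RATE_HZ,
    }
```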

Success Checks

  • docker compose up with a USB microphone streams audio to the microphone twin.
  • Frontend MQTT start_audio / stop_audio toggles the live WebRTC audio producer (no recording side effect); the driver replies with audio_started / audio_stopped on the matching status topic.
  • Frontend MQTT start_recording / stop_recording on webrtc-command toggles persisted recording without disturbing the live stream; the media-service replies on webrtc-command/status.
  • Zenoh audio/default chunks are consumable by an on_audio worker hook at the hardware’s native sample rate.
  • Startup logs list available devices; CYBERWAVE_METADATA_AUDIO_DEVICE selects a specific one.
  • USB disconnect and reconnect transitions through alert, reconnect, and alert resolution.
For source-level details, see the Generic Microphone Driver README.

Native speaker driver

Downstream playback counterpart — same sensor: "audio" routing key, sensor_type: "speaker", role: "consumer".

Recorder workflow node

Publishes the same start_recording / stop_recording payload on webrtc-command from a workflow — converges on the same media-service handler as the audio widget’s REC button and the driver’s fallback relay.

Audio in Workflows

Workflow-side audio pipeline (Audio Track, VAD, Wake Word, STT). Consumes Zenoh PCM from this driver — orthogonal to the WebRTC + recording path documented here.