Edge Worker Container - Cyberwave Docs

Overview

Edge Core manages a dedicated worker container (cyberwave-worker-{env_uuid[:8]}) on each edge device. Worker scripts run inside this container with access to the Zenoh data bus, cached model weights, and all environment twin data.

One worker container runs per edge device (not per twin). Workers can consume data from all twins in the environment simultaneously.

Worker Directory

Place Python worker scripts in the edge config workers directory:

Platform	Path
All	`~/.cyberwave/workers/`

Model Requirements

Declare model dependencies in cyberwave.yml inside the workers directory:

models:
  - yolov8n
  - background-subtraction

Edge Core pre-downloads listed models before starting the worker container. Models are also auto-detected from cw.models.load("...") calls in worker Python files.
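
The auto-detection described above can be approximated with a static scan of the worker source. The sketch below is an illustration only, not Edge Core's actual scanner: the helper name find_model_ids is hypothetical, and it assumes model IDs are passed as string literals directly to cw.models.load(...).

```python
import ast


def find_model_ids(source: str) -> list[str]:
    """Return the unique string literals passed to cw.models.load(...), sorted."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the attribute chain cw.models.load
        is_load = (
            isinstance(func, ast.Attribute) and func.attr == "load"
            and isinstance(func.value, ast.Attribute) and func.value.attr == "models"
            and isinstance(func.value.value, ast.Name) and func.value.value.id == "cw"
        )
        if not is_load or not node.args:
            continue
        first = node.args[0]
        # Only literal strings can be resolved without running the worker.
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            found.add(first.value)
    return sorted(found)


example_worker = '''
detector = cw.models.load("yolov8n")
background = cw.models.load("background-subtraction")
name = "dynamic"
other = cw.models.load(name)  # not a literal, so the scan skips it
'''
detected = find_model_ids(example_worker)
```

A scan like this cannot see model names that are computed at runtime, which is one reason to list such models explicitly in cyberwave.yml.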

Picking the model format — stub

Each catalog model declares an edge runtime that tells the worker how to load the checkpoint. The runtime selector lives in the model editor (/models → “Add Model” or “Edit”), and is also accepted as a typed edge_runtime field on POST /api/v1/mlmodels and PUT /api/v1/mlmodels/{uuid}. The well-known runtimes mirror the loaders in the Cyberwave Python SDK’s cyberwave.models.runtimes registry:

Runtime	Extension	Loader
`ultralytics`	`.pt`	YOLOv5/8/11 via the Ultralytics package
`onnxruntime`	`.onnx`	ONNX Runtime (CPU, CUDA EP on the GPU image)
`opencv`	`.xml` / `.caffemodel`	OpenCV Haar / DNN models
`tflite`	`.tflite`	TensorFlow Lite
`torch`	`.pt` / `.pth`	TorchScript / raw PyTorch
`tensorrt`	`.engine` / `.trt`	TensorRT engines (GPU image only)
`whisper_cpp`	`.gguf` / `.bin`	Whisper.cpp speech-to-text on CPU edge devices
`faster_whisper`	CTranslate2 weights (HF cache)	faster-whisper STT on edge CPU/GPU
`hailo`	`.hef`	Hailo accelerator (`hailo_platform`) on Pi 5 + AI HAT+ / Pi AI Kit

Custom values are accepted via the editor’s “Other” entry — used today for framework-specific identifiers like sam2, sam3, and depth_anything_v2 that don’t have an SDK loader yet but still need to round-trip through the catalog. GET /api/v1/mlmodels/edge-runtimes returns the current well-known list (no auth required) so external tools can mirror the dropdown without hard-coding it.

stub: ONNX YOLO postprocessing applies per-class non-max suppression with a default IoU threshold of 0.7 so swapping yolov8s.pt for yolov8s.onnx produces the same number of boxes per object instead of a cluster of overlapping anchors. Override per-call via model.predict(frame, iou=0.5) (stricter) or iou=1.0 (raw output, no suppression).
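
To make the effect of the iou knob concrete, here is a minimal pure-Python sketch of greedy per-class non-max suppression. It is not the SDK's implementation; the function names and the (x1, y1, x2, y2, conf, class_id) tuple layout are assumptions chosen for the illustration.

```python
def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_class_nms(detections, iou_threshold=0.7):
    """Greedy NMS over (x1, y1, x2, y2, conf, class_id) tuples, applied per class."""
    kept = []
    # Visit highest-confidence boxes first so the best box of each cluster survives.
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        suppressed = any(
            k[5] == det[5] and box_iou(k[:4], det[:4]) > iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


detections = [
    (0, 0, 10, 10, 0.90, 0),   # best box for class 0
    (1, 0, 11, 10, 0.80, 0),   # IoU ~0.82 with the first box
    (3, 0, 13, 10, 0.70, 0),   # IoU ~0.54 with the first box
    (1, 0, 11, 10, 0.85, 1),   # same place, different class
]
default_kept = per_class_nms(detections)                     # threshold 0.7
strict_kept = per_class_nms(detections, iou_threshold=0.5)
raw_kept = per_class_nms(detections, iou_threshold=1.0)
```

With the default threshold three boxes survive, the stricter 0.5 threshold leaves two, and 1.0 keeps all four because no IoU can exceed 1.0.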

stub: NMS-free detectors (YOLO26’s default one-to-one head, YOLOv10) emit [max_det, 6] = [x1, y1, x2, y2, conf, class_id]; the SDK’s ONNX runtime detects this layout automatically and parses it without the anchor-decoding step. The iou knob is ignored on that path — suppression is folded into the model graph. End-to-end pose / segmentation exports are not decoded yet; re-export those tasks with end2end=False until the e2e pose branch lands.
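
The [max_det, 6] layout needs no anchor decoding, only a confidence filter. The following is a rough sketch under stated assumptions: the function names are hypothetical, the trailing-dimension check is a heuristic rather than the SDK's detection logic, and the 0.25 confidence threshold is an assumed default.

```python
def looks_end_to_end(shape) -> bool:
    """Heuristic: [max_det, 6] or [batch, max_det, 6] outputs end in a dimension of 6."""
    return len(shape) in (2, 3) and shape[-1] == 6


def parse_end_to_end(rows, conf_threshold=0.25):
    """Turn rows of [x1, y1, x2, y2, conf, class_id] into detection dicts."""
    detections = []
    for x1, y1, x2, y2, conf, class_id in rows:
        if conf < conf_threshold:
            continue  # skips zero-padded rows and weak candidates
        detections.append(
            {"box": (x1, y1, x2, y2), "conf": conf, "class_id": int(class_id)}
        )
    return detections


rows = [
    [10.0, 20.0, 110.0, 220.0, 0.91, 0.0],
    [5.0, 5.0, 50.0, 50.0, 0.40, 2.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],      # padding up to max_det
]
parsed = parse_end_to_end(rows)
```

No IoU parameter appears here, which matches the note that suppression is already folded into the model graph on this path.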

YOLO ONNX catalog — stub

The public catalog ships ONNX twins for the most common YOLO entries. Cyberwave hosts the exported weights at https://static.cyberwave.com/ml_models/{name}.onnx so edge nodes can fetch them without depending on the Ultralytics CDN. The ONNX builds run roughly 2× faster on CPU than the .pt builds. Available entries:

Family	Sizes	Tasks
YOLOv8	n, s	detect; plus pose (n only)
YOLO26	n, s, m, l, x	detect, pose

Pick Edge Runtime: ONNX Runtime in the model editor (or metadata.edge_runtime = "onnxruntime" via the API) when registering a private fine-tune that you’ve exported via YOLO("…").export(format="onnx"). The SDK’s OnnxRuntime reads names and kpt_shape from the ONNX custom_metadata_map, so labelled detections and keypoints work out of the box. Segmentation, OBB, and classification ONNX exports are not yet supported by the SDK runtime — use the .pt (ultralytics) variants for those tasks.
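
As an illustration of how names and kpt_shape can be recovered from custom_metadata_map, the sketch below decodes a metadata dictionary of the kind ONNX Runtime exposes through session.get_modelmeta().custom_metadata_map. It assumes the exporter stored each value as the string form of a Python literal; the helper name is hypothetical and this is not the SDK's own parser.

```python
import ast


def parse_yolo_metadata(custom_metadata_map):
    """Decode class names and keypoint shape from string-valued ONNX metadata."""
    # Assumption: values are stringified Python literals, e.g. "{0: 'person'}".
    raw_names = ast.literal_eval(custom_metadata_map.get("names", "{}"))
    names = {int(index): label for index, label in raw_names.items()}
    raw_kpt = custom_metadata_map.get("kpt_shape")
    kpt_shape = tuple(ast.literal_eval(raw_kpt)) if raw_kpt else None
    return names, kpt_shape


# Hand-written stand-in for the metadata of a pose export.
metadata = {"names": "{0: 'person'}", "kpt_shape": "[17, 3]", "task": "pose"}
names, kpt_shape = parse_yolo_metadata(metadata)

# A detection export carries no kpt_shape entry.
det_names, det_kpt = parse_yolo_metadata({"names": "{0: 'cat', 1: 'dog'}"})
```

ast.literal_eval only accepts literals, so a malformed or hostile metadata string raises an error instead of executing code.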

YOLO Hailo catalog — stub

For Hailo-accelerated edge nodes (Raspberry Pi 5 + AI HAT+ and the older Pi AI Kit) the catalog also ships pre-compiled YOLO HEFs straight from the Hailo Model Zoo. Each upstream YOLO checkpoint gets two sibling rows — one per Hailo architecture — because HEFs are hardware-locked:

Slug	Family	Arch	Edge target
`yolov6n_h8` / `yolov6n_h8l`	YOLOv6 Nano	Hailo-8 / Hailo-8L	26 / 13 TOPS
`yolov8n_h8` / `yolov8n_h8l`	YOLOv8 Nano	Hailo-8 / Hailo-8L	26 / 13 TOPS
`yolov8s_h8` / `yolov8s_h8l`	YOLOv8 Small	Hailo-8 / Hailo-8L	26 / 13 TOPS
`yolov8m_h8` / `yolov8m_h8l`	YOLOv8 Medium	Hailo-8 / Hailo-8L	26 / 13 TOPS

Slugs use the _h8 / _h8l suffix on purpose: cw.models.load("yolov8s_h8") is enough to route through the SDK’s hailo runtime — no extra arguments needed. Pick the row that matches your edge target; a Hailo-8 HEF will refuse to run on a Hailo-8L (and vice versa). If you pick the wrong sibling, Edge Core catches it before the download starts: a Gate-3 preflight probes the connected accelerator with hailortcli fw-control identify and refuses to fetch a HEF whose metadata.hw_arch disagrees with the device, naming the correct sibling in the error message. The preflight is silent when hailortcli is not installed on the host — install it via apt install hailo-all (or the matching HailoRT package) to enable the check.
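
The sibling check can be pictured as a comparison between the slug suffix and the detected architecture. This is a simplified sketch, not the Gate-3 preflight itself: it takes the architecture as a plain string instead of parsing hailortcli output, and the function name and error wording are hypothetical.

```python
ARCH_SUFFIX = {"hailo8": "_h8", "hailo8l": "_h8l"}


def check_hef_arch(slug: str, device_arch: str) -> str:
    """Return slug when it matches device_arch, else raise naming the right sibling."""
    arch = device_arch.strip().lower().replace("-", "")
    if arch not in ARCH_SUFFIX:
        raise ValueError(f"unknown Hailo architecture: {device_arch!r}")
    base, _, tail = slug.rpartition("_")
    if not base or "_" + tail not in ARCH_SUFFIX.values():
        raise ValueError(f"{slug!r} does not carry an _h8 or _h8l suffix")
    expected = base + ARCH_SUFFIX[arch]
    if expected != slug:
        raise RuntimeError(
            f"{slug} does not match the connected {device_arch}; use {expected} instead"
        )
    return slug


accepted = check_hef_arch("yolov8s_h8", "Hailo-8")
try:
    check_hef_arch("yolov8s_h8", "Hailo-8L")
    message = ""
except RuntimeError as err:
    message = str(err)
```

Failing before the download starts avoids fetching a HEF that the device would refuse to load anyway.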

Edge speech-to-text — stub

Seeded Whisper.cpp STT models can run from Audio Track -> Call Model workflows on Raspberry Pi 4-class devices. The generated worker passes the audio chunk and the seeded download_url into the local whisper_cpp runtime, so the first run downloads any missing weights into the Cyberwave model cache. Downstream nodes can use text, transcribed_text, and segments.

English GGML checkpoints: Tiny EN, Base EN, and Small EN (Q5_1).
Multilingual checkpoints: Tiny Multilingual and Base Multilingual. These are hybrid deployments: edge_runtime: whisper_cpp on edge missions, with the whisper cloud node (openai/whisper-tiny / openai/whisper-base) as the fallback on cloud missions. Use multilingual models when the spoken language is unknown or mixed.

Hybrid Faster Whisper catalog entries (faster_whisper runtime) run on-device when the workflow mission executes on the edge; cloud missions use the whisper cloud node with the same OpenAI model aliases (openai/whisper-tiny → tiny.en, etc.).

Prefer Faster Whisper Tiny EN for real-time English streams; use Whisper Tiny Multilingual Q5_1 for multilingual edge STT without CTranslate2.

Model weights resolution — stub

For each required model, Edge Core resolves weights in this order:

1. Local cache (intact). If ~/.cyberwave/models/{model_id}/... is present and its SHA-256 matches the manifest, it is used directly.
2. Cyberwave-hosted signed URL. If the catalog entry has a checkpoint mirror, Edge Core fetches a signed URL via GET /api/v1/mlmodels/{uuid}/weights and downloads from our private bucket.
3. Upstream weights URL. If no Cyberwave mirror exists, Edge Core falls back to the public download_url from the catalog (e.g. an official Ultralytics release).
4. Stale cache fallback. If every download attempt fails but the local file is intact, Edge Core returns the cached file with a warning. This keeps workers running across transient network failures and on permanently air-gapped sites.
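
The resolution order above can be sketched as a small function. This is an approximation under stated assumptions, not Edge Core's code: the downloaders are hypothetical callables standing in for the signed-URL and upstream fetches, and "stale" is interpreted here as a cached file that exists but no longer matches the manifest hash.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large checkpoints are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_weights(cached: Path, expected_sha256: str, downloaders):
    """Return (path, source): intact cache, then each download, then stale cache."""
    if cached.is_file() and sha256_of(cached) == expected_sha256:
        return cached, "cache"
    cached.parent.mkdir(parents=True, exist_ok=True)
    partial = cached.with_name(cached.name + ".part")
    for index, download in enumerate(downloaders):
        try:
            download(partial)  # write to a side file so the old copy survives failures
            if sha256_of(partial) != expected_sha256:
                raise OSError("checksum mismatch")
            partial.replace(cached)
            return cached, f"download[{index}]"
        except OSError:
            partial.unlink(missing_ok=True)
    if cached.is_file():
        return cached, "stale-cache"  # a real implementation would log a warning here
    raise FileNotFoundError(f"no weights available for {cached}")


def offline(_target: Path) -> None:
    raise OSError("network unreachable")  # stands in for a failed download


with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "yolov8n" / "yolov8n.onnx"
    weights.parent.mkdir()
    weights.write_bytes(b"old weights")
    wanted = hashlib.sha256(b"new weights").hexdigest()
    _, stale_source = resolve_weights(weights, wanted, [offline, offline])
    _, fresh_source = resolve_weights(
        weights, wanted, [offline, lambda target: target.write_bytes(b"new weights")]
    )
    _, cache_source = resolve_weights(weights, wanted, [offline])
```

Downloading to a side file and renaming only after the checksum passes is what lets the stale copy remain usable when every source fails.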

Operators on air-gapped sites can pre-stage weights by copying them to ~/.cyberwave/models/{model_id}/. Edge Core computes a SHA-256, infers the runtime from the file extension (.pt, .onnx, .engine/.trt, .tflite, .pth, .xml), and writes a sidecar metadata.json on the next worker start. To update a pre-staged model, simply overwrite the file in place — Edge Core re-stamps the manifest from disk on the next call (no re-download attempted). Pre-staged files are never auto-overwritten by catalog updates; to force a re-download from Cyberwave, evict the model directory (rm -rf ~/.cyberwave/models/{model_id}).
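
For orientation, the stamping step for a pre-staged file might look roughly like this. The sidecar field names (file, sha256, edge_runtime) are assumptions, since the real metadata.json schema is not documented here, and mapping .pt to ultralytics is a guess because the runtime table lists .pt under both ultralytics and torch.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Extension -> runtime, following the runtime table on this page.
# ".pt" is ambiguous there; "ultralytics" is an assumed default.
EXTENSION_RUNTIME = {
    ".pt": "ultralytics",
    ".onnx": "onnxruntime",
    ".engine": "tensorrt",
    ".trt": "tensorrt",
    ".tflite": "tflite",
    ".pth": "torch",
    ".xml": "opencv",
}


def stamp_sidecar(weights: Path) -> dict:
    """Hash a pre-staged file, infer its runtime, and write metadata.json beside it."""
    runtime = EXTENSION_RUNTIME.get(weights.suffix.lower())
    if runtime is None:
        raise ValueError(f"cannot infer a runtime from {weights.suffix!r}")
    meta = {
        "file": weights.name,                                   # hypothetical field
        "sha256": hashlib.sha256(weights.read_bytes()).hexdigest(),
        "edge_runtime": runtime,                                # hypothetical field
    }
    (weights.parent / "metadata.json").write_text(json.dumps(meta, indent=2))
    return meta


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "my-detector"
    model_dir.mkdir()
    staged = model_dir / "my-detector.onnx"
    staged.write_bytes(b"placeholder bytes, not a real model")
    meta = stamp_sidecar(staged)
    on_disk = json.loads((model_dir / "metadata.json").read_text())
```

Because the hash is recomputed from disk, overwriting the file and stamping again yields a fresh manifest without any network access.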

CLI Commands

cyberwave worker start      # Start the worker container
cyberwave worker stop       # Stop the worker container
cyberwave worker restart    # Restart (re-scans workers, re-downloads models)
cyberwave worker status     # Show container state and loaded workers
cyberwave worker health     # Show detailed restart history and circuit-breaker state
cyberwave worker logs       # Stream worker container logs

These commands are also available via cyberwave-edge-core worker … if you prefer to use the edge-core CLI directly.

Hot-Reload on File Changes

Edge Core polls the workers directory roughly every 15 seconds. When .py files are added, removed, or modified, it automatically restarts the worker container with the updated set of workers. A minimum cool-down of 10 seconds between successive automatic restarts prevents rapid churn when files are written incrementally.
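
A polling watcher with a cool-down can be sketched as follows. This is a minimal illustration, assuming change detection by file name and modification time and assuming that a change seen during the cool-down is deferred to a later poll rather than dropped; the class and its fields are hypothetical.

```python
import tempfile
import time
from pathlib import Path


def snapshot(workers_dir: Path) -> dict:
    """Map each .py file name to its modification time in nanoseconds."""
    return {p.name: p.stat().st_mtime_ns for p in workers_dir.glob("*.py")}


class ReloadWatcher:
    def __init__(self, workers_dir: Path, cooldown_s: float = 10.0, clock=time.monotonic):
        self.workers_dir = workers_dir
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.last_seen = snapshot(workers_dir)
        self.last_restart = None

    def poll(self) -> bool:
        """Return True when this poll should trigger a container restart."""
        current = snapshot(self.workers_dir)
        if current == self.last_seen:
            return False
        now = self.clock()
        if self.last_restart is not None and now - self.last_restart < self.cooldown_s:
            return False  # still cooling down; the change stays pending
        self.last_seen = current
        self.last_restart = now
        return True


with tempfile.TemporaryDirectory() as tmp:
    workers = Path(tmp)
    now = [0.0]                                    # fake clock for a deterministic demo
    watcher = ReloadWatcher(workers, clock=lambda: now[0])
    events = [watcher.poll()]                      # nothing changed yet
    (workers / "detector.py").write_text("# worker\n")
    events.append(watcher.poll())                  # new file: restart
    (workers / "tracker.py").write_text("# worker\n")
    now[0] = 5.0
    events.append(watcher.poll())                  # inside the cool-down: deferred
    now[0] = 15.0
    events.append(watcher.poll())                  # cool-down elapsed: restart
```

In a real loop, poll() would be called on a timer at the polling interval and a True result would trigger the restart.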

Health Monitoring

Edge Core continuously monitors the worker container:

Restart accounting: every restart is recorded with timestamp and reason.
Circuit-breaker: after 5 restarts in 5 minutes, automatic restarts are suppressed until the window clears. Run cyberwave worker health to inspect the state.
Spontaneous exit detection: if the container exits without a deliberate restart, a warning is logged.
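
The restart accounting and circuit-breaker can be modelled with a sliding window over recorded restarts. The sketch below is one plausible reading of "until the window clears" (old entries age out one by one); the class name and method names are hypothetical and the real Edge Core logic may differ.

```python
from collections import deque


class RestartBreaker:
    """Allow at most max_restarts automatic restarts per window_s seconds."""

    def __init__(self, max_restarts: int = 5, window_s: float = 300.0):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.history = deque()  # (timestamp, reason) pairs, oldest first

    def allow(self, now: float) -> bool:
        # Forget restarts that have aged out of the window.
        while self.history and now - self.history[0][0] >= self.window_s:
            self.history.popleft()
        return len(self.history) < self.max_restarts

    def record(self, now: float, reason: str) -> None:
        self.history.append((now, reason))


breaker = RestartBreaker()
decisions = []
for t in (0, 10, 20, 30, 40, 50, 300):
    permitted = breaker.allow(t)
    decisions.append(permitted)
    if permitted:
        breaker.record(t, "container exited")
```

The sixth attempt at t=50 is suppressed because five restarts are already inside the window; by t=300 the oldest entry has expired and restarts are permitted again.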

Performance Tuning — stub

Model warm-up

The worker runtime automatically runs two dummy inferences on each loaded model at startup to eliminate cold-start latency (JIT compilation, memory allocation). Cold vs warm latency is logged. You can also warm up models explicitly:

model = cw.models.load("yolov8n")
cold_ms, warm_ms = model.warm_up(input_shape=(640, 640, 3))
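
To show what the cold and warm figures measure, here is a small timing sketch that runs two dummy inferences against a stand-in model. It does not use the SDK: FakeModel and this warm_up function are hypothetical, and the sleep merely simulates one-off setup cost such as JIT compilation.

```python
import time


def warm_up(infer, dummy_input, runs: int = 2):
    """Time consecutive calls and return (first_ms, last_ms)."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(dummy_input)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms[0], timings_ms[-1]


class FakeModel:
    """Stand-in model whose first call pays a one-off setup cost."""

    def __init__(self):
        self.initialised = False

    def __call__(self, frame):
        if not self.initialised:
            time.sleep(0.05)  # simulated lazy initialisation
            self.initialised = True
        return []


dummy_frame = bytes(640 * 640 * 3)  # zeroed buffer sized like a 640x640 RGB frame
cold_ms, warm_ms = warm_up(FakeModel(), dummy_frame)
```

The gap between the two numbers is the latency that would otherwise land on the first real frame.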

Frame resolution scaling

Set CYBERWAVE_WORKER_INPUT_RESOLUTION to downscale incoming frames before they reach your worker hooks. This reduces inference time on constrained devices without changing the camera driver’s publish resolution.

export CYBERWAVE_WORKER_INPUT_RESOLUTION=640x480
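
The worker runtime applies this setting for you; the sketch below only illustrates how a value such as 640x480 can be interpreted. It assumes the format is WIDTHxHEIGHT, and both helper names are hypothetical.

```python
import os


def parse_resolution(value):
    """Parse 'WIDTHxHEIGHT' into (width, height); return None when unset or empty."""
    if not value:
        return None
    width_text, separator, height_text = value.strip().lower().partition("x")
    if not separator:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"resolution must be positive, got {value!r}")
    return width, height


def resolution_from_env(env=os.environ):
    """Read the target resolution from a mapping shaped like os.environ."""
    return parse_resolution(env.get("CYBERWAVE_WORKER_INPUT_RESOLUTION"))


configured = resolution_from_env({"CYBERWAVE_WORKER_INPUT_RESOLUTION": "640x480"})
unset = resolution_from_env({})
```

Returning None when the variable is unset lets a caller leave frames at the camera driver's publish resolution.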

Shared memory transport

Zenoh shared-memory (SHM) transport offers zero-copy frame delivery between the camera driver and worker containers on the same host. Edge Core leaves ZENOH_SHARED_MEMORY disabled by default because SHM between Docker containers requires them to share an IPC namespace via --ipc=host, which weakens container isolation and has historically been a source of instability in production. To opt in, set ZENOH_SHARED_MEMORY=true in the edge-core process environment and ensure every Cyberwave container is launched with --ipc=host. Edge Core then propagates the flag to both driver and worker containers through the same env-builder, keeping the two sides in lock-step.

GPU Access

Edge Core detects the NVIDIA container runtime and passes --gpus all to the worker container when available.

Image variants — stub

The worker image is published in three variants on Docker Hub:

Tag	Base	Accelerator	Architectures	Use when
`cyberwaveos/edge-ml-worker:<tag>`	`ubuntu:24.04`	CPU (`onnxruntime`)	`linux/amd64`, `linux/arm64`	No GPU available, or inference is light enough for CPU.
`cyberwaveos/edge-ml-worker:<tag>-gpu`	`nvidia/cuda:12.6.3-runtime-ubuntu24.04`	NVIDIA (`onnxruntime-gpu`)	`linux/amd64`	Edge device has an NVIDIA GPU and `nvidia-container-toolkit` installed.
`cyberwaveos/edge-ml-worker:<tag>-hailo`	`cyberwaveos/edge-ml-worker:<tag>`	Hailo-8 / Hailo-8L (`hailo_platform`)	`linux/arm64`	Edge device has a Hailo accelerator exposed at `/dev/hailo0` (e.g. Raspberry Pi 5 + AI HAT+).

The -hailo variant compiles HailoRT (libhailort + hailortcli + pyhailort) from the MIT-licensed upstream source at hailo-ai/hailort at a pinned tag. The LGPL-2.1 GStreamer plugin (hailonet) is left out of the build. The companion kernel driver (hailo-ai/hailort-drivers) is installed on the host (e.g. via apt install hailo-all on Raspberry Pi OS), not in the container; its minor version must match the userspace library baked into the image.

All three variants ship the same Python API. PyTorch/Ultralytics models pick the device automatically; ONNX models gain CUDAExecutionProvider only on the -gpu variant; .hef files load through the hailo runtime only on the -hailo variant.

Edge Core selects the variant automatically based on host capabilities:

NVIDIA container runtime detected → appends -gpu to the configured worker image tag.
/dev/hailo0 present and no GPU runtime → appends -hailo, adds --device /dev/hailo0:/dev/hailo0:rwm and --group-add hailo (on HailoRT < 4.20 hosts), and sets CYBERWAVE_REQUIRED_DEVICES=/dev/hailo0 so the image’s Gate-4 entrypoint fails fast if the passthrough is missing.
Neither → uses the CPU base image.

GPU takes precedence over Hailo when both are available on the same host; mixing them in a single deployment is not supported. If a variant cannot be pulled (e.g. the Hailo image isn’t published for your channel yet), Edge Core falls back to the CPU image and logs the demotion.
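
The selection rules above reduce to a short decision function. This is a sketch of the documented precedence only: the function name, the example tag stable, and passing the environment variable with docker's -e flag are assumptions, and the CPU fallback for an unpullable variant is not modelled.

```python
def select_worker_image(image, has_nvidia_runtime, has_hailo_device, hailort_version=None):
    """Return (image_reference, extra_docker_run_args) for the detected host."""
    if has_nvidia_runtime:
        # GPU takes precedence even when a Hailo device is also present.
        return f"{image}-gpu", ["--gpus", "all"]
    if has_hailo_device:
        args = [
            "--device", "/dev/hailo0:/dev/hailo0:rwm",
            "-e", "CYBERWAVE_REQUIRED_DEVICES=/dev/hailo0",
        ]
        # The hailo group is only added on hosts running HailoRT older than 4.20.
        if hailort_version is not None and tuple(hailort_version) < (4, 20):
            args += ["--group-add", "hailo"]
        return f"{image}-hailo", args
    return image, []


base = "cyberwaveos/edge-ml-worker:stable"  # "stable" is a made-up tag for the demo
gpu_choice = select_worker_image(base, True, True)
hailo_choice = select_worker_image(base, False, True, hailort_version=(4, 19))
cpu_choice = select_worker_image(base, False, False)
```

Keeping the decision in one place makes the precedence explicit: a host with both accelerators always resolves to the -gpu image.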
