Overview
Edge Core manages a dedicated worker container (cyberwave-worker-{env_uuid[:8]}) on each edge device. Worker scripts run inside this container with access to the Zenoh data bus, cached model weights, and all environment twin data.
One worker container runs per edge device (not per twin). Workers can consume data from all twins in the environment simultaneously.
Worker Directory
Place Python worker scripts in the edge config workers directory:
| Platform | Path |
|---|---|
| All | ~/.cyberwave/workers/ |
Model Requirements
Declare model dependencies in cyberwave.yml inside the workers directory:
models:
- yolov8n
- background-subtraction
Edge Core pre-downloads listed models before starting the worker container. Models are also auto-detected from cw.models.load("...") calls in worker Python files.
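As an illustration of how auto-detection from worker files could work, the sketch below walks a file's syntax tree and collects string literals passed to cw.models.load(...). This is not Edge Core's actual scanner: the function name find_model_ids is hypothetical, and the sketch assumes the SDK handle is always named cw and the model ID is a literal string.

```python
import ast


def find_model_ids(source: str) -> list[str]:
    """Collect string literals passed as the first argument to cw.models.load(...)."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        # Match the attribute chain cw.models.load exactly.
        is_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "models"
            and isinstance(func.value.value, ast.Name)
            and func.value.value.id == "cw"
        )
        if not is_load:
            continue
        first = node.args[0]
        # Only literal strings can be resolved statically; variables are skipped.
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            if first.value not in found:
                found.append(first.value)
    return found


sample = '''
model = cw.models.load("yolov8n")
bg = cw.models.load("background-subtraction")
other = something.load("ignored")
'''
print(sorted(find_model_ids(sample)))
```

A model ID built at runtime (for example from a variable) would not be found by a static scan like this, which is one reason to list such models explicitly in cyberwave.yml.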
Each catalog model declares an edge runtime that tells the worker how to load the checkpoint. The runtime selector lives in the model editor (/models → “Add Model” or “Edit”), and is also accepted as a typed edge_runtime field on POST /api/v1/mlmodels and PUT /api/v1/mlmodels/{uuid}.
The well-known runtimes mirror the loaders in the Cyberwave Python SDK’s cyberwave.models.runtimes registry:
| Runtime | Extension | Loader |
|---|---|---|
| ultralytics | .pt | YOLOv5/8/11 via the Ultralytics package |
| onnxruntime | .onnx | ONNX Runtime (CPU, CUDA EP on the GPU image) |
| opencv | .xml / .caffemodel | OpenCV Haar / DNN models |
| tflite | .tflite | TensorFlow Lite |
| torch | .pt / .pth | TorchScript / raw PyTorch |
| tensorrt | .engine / .trt | TensorRT engines (GPU image only) |
| whisper_cpp | .gguf / .bin | Whisper.cpp speech-to-text on CPU edge devices |
| faster_whisper | CTranslate2 weights (HF cache) | faster-whisper STT on edge CPU/GPU |
| hailo | .hef | Hailo accelerator (hailo_platform) on Pi 5 + AI HAT+ / Pi AI Kit |
Custom values are accepted via the editor’s “Other” entry — used today for framework-specific identifiers like sam2, sam3, and depth_anything_v2 that don’t have an SDK loader yet but still need to round-trip through the catalog.
GET /api/v1/mlmodels/edge-runtimes returns the current well-known list (no auth required) so external tools can mirror the dropdown without hard-coding it.
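The extension column of the runtime table can be read as a lookup from file suffix to candidate runtimes. The sketch below shows that lookup; the names RUNTIME_EXTENSIONS and candidate_runtimes are hypothetical and are not part of the SDK. faster_whisper is left out because its weights live in a cache directory rather than a single file with a fixed extension.

```python
from pathlib import Path

# Extensions copied from the runtime table above (illustrative mapping only).
RUNTIME_EXTENSIONS = {
    "ultralytics": (".pt",),
    "onnxruntime": (".onnx",),
    "opencv": (".xml", ".caffemodel"),
    "tflite": (".tflite",),
    "torch": (".pt", ".pth"),
    "tensorrt": (".engine", ".trt"),
    "whisper_cpp": (".gguf", ".bin"),
    "hailo": (".hef",),
}


def candidate_runtimes(filename: str) -> list[str]:
    """Return every runtime whose extension list contains the file's suffix."""
    suffix = Path(filename).suffix.lower()
    return [rt for rt, exts in RUNTIME_EXTENSIONS.items() if suffix in exts]


print(candidate_runtimes("yolov8n.pt"))
print(candidate_runtimes("yolov8s_h8.hef"))
```

A .pt file matches both ultralytics and torch, so the extension alone cannot always pick a loader. This is why the catalog stores an explicit edge runtime per model.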
ONNX YOLO postprocessing applies per-class non-max suppression with a default IoU threshold of 0.7, so swapping yolov8s.pt for yolov8s.onnx produces the same number of boxes per object instead of a cluster of overlapping anchors. Override per call via model.predict(frame, iou=0.5) (stricter) or iou=1.0 (raw output, no suppression).
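To make the effect of the iou knob concrete, here is a minimal pure-Python sketch of greedy per-class non-max suppression. It is not the SDK's implementation. It assumes a box is dropped when its IoU with a higher-confidence box of the same class is strictly greater than the threshold, which is why a threshold of 1.0 suppresses nothing.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_class_nms(dets, iou_threshold=0.7):
    """dets: list of (x1, y1, x2, y2, conf, class_id). Returns the kept detections."""
    kept = []
    # Visit detections from most to least confident.
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        suppressed = any(
            k[5] == det[5] and iou(k[:4], det[:4]) > iou_threshold for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


dets = [
    (0, 0, 10, 10, 0.90, 0),  # reference box, class 0
    (1, 1, 11, 11, 0.85, 0),  # IoU with the reference is about 0.68
    (0, 1, 10, 10, 0.80, 0),  # IoU with the reference is 0.90
    (0, 0, 10, 10, 0.60, 1),  # same position, different class
]
for threshold in (0.7, 0.5, 1.0):
    print(threshold, len(per_class_nms(dets, threshold)))
```

Because suppression runs per class, the class 1 box survives at every threshold even though it overlaps the class 0 reference box completely.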
NMS-free detectors (YOLO26’s default one-to-one head, YOLOv10) emit an output of shape [max_det, 6], where each row is [x1, y1, x2, y2, conf, class_id]; the SDK’s ONNX runtime detects this layout automatically and parses it without the anchor-decoding step. The iou knob is ignored on that path because suppression is folded into the model graph. End-to-end pose / segmentation exports are not decoded yet; re-export those tasks with end2end=False until the e2e pose branch lands.
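The end-to-end layout is simple enough to parse row by row. The sketch below is an illustration rather than the SDK's parser: looks_end2end and parse_end2end are hypothetical names, the last-dimension check is an assumed heuristic, and the 0.25 confidence cut-off is an arbitrary example value used to drop padding rows.

```python
def looks_end2end(shape):
    """Heuristic (assumption): an end-to-end output has 6 values per detection row."""
    return len(shape) >= 2 and shape[-1] == 6


def parse_end2end(rows, conf_threshold=0.25):
    """rows: iterable of [x1, y1, x2, y2, conf, class_id]; no anchor decoding or NMS."""
    detections = []
    for x1, y1, x2, y2, conf, class_id in rows:
        if conf < conf_threshold:
            continue  # low-confidence or zero-padded slot
        detections.append(
            {"box": (x1, y1, x2, y2), "conf": conf, "class_id": int(class_id)}
        )
    return detections


rows = [
    [10, 20, 110, 220, 0.91, 0.0],
    [5, 5, 50, 50, 0.40, 2.0],
    [0, 0, 0, 0, 0.0, 0.0],  # padding row
]
for det in parse_end2end(rows):
    print(det)
```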
YOLO ONNX catalog
The public catalog ships ONNX twins for the most common YOLO entries. Cyberwave hosts the exported weights at https://static.cyberwave.com/ml_models/{name}.onnx so edge nodes can fetch them with no Ultralytics CDN dependency. The ONNX builds run roughly 2× faster on CPU than the .pt builds. The catalog covers:
| Family | Sizes | Tasks |
|---|---|---|
| YOLOv8 | n, s | detect; plus pose (n only) |
| YOLO26 | n, s, m, l, x | detect, pose |
Pick Edge Runtime: ONNX Runtime in the model editor (or metadata.edge_runtime = "onnxruntime" via the API) when registering a private fine-tune that you’ve exported via YOLO("…").export(format="onnx"). The SDK’s OnnxRuntime reads names and kpt_shape from the ONNX custom_metadata_map, so labelled detections and keypoints work out of the box.
Segmentation, OBB, and classification ONNX exports are not yet supported by the SDK runtime — use the .pt (ultralytics) variants for those tasks.
YOLO Hailo catalog
For Hailo-accelerated edge nodes (Raspberry Pi 5 + AI HAT+ and the older Pi AI Kit) the catalog also ships pre-compiled YOLO HEFs straight from the Hailo Model Zoo. Each upstream YOLO checkpoint gets two sibling rows — one per Hailo architecture — because HEFs are hardware-locked:
| Slug | Family | Arch | Edge target |
|---|---|---|---|
| yolov6n_h8 / yolov6n_h8l | YOLOv6 Nano | Hailo-8 / Hailo-8L | 26 / 13 TOPS |
| yolov8n_h8 / yolov8n_h8l | YOLOv8 Nano | Hailo-8 / Hailo-8L | 26 / 13 TOPS |
| yolov8s_h8 / yolov8s_h8l | YOLOv8 Small | Hailo-8 / Hailo-8L | 26 / 13 TOPS |
| yolov8m_h8 / yolov8m_h8l | YOLOv8 Medium | Hailo-8 / Hailo-8L | 26 / 13 TOPS |
Slugs use the _h8 / _h8l suffix on purpose: cw.models.load("yolov8s_h8") is enough to route through the SDK’s hailo runtime — no extra arguments needed. Pick the row that matches your edge target; a Hailo-8 HEF will refuse to run on a Hailo-8L (and vice versa).
If you pick the wrong sibling, Edge Core catches it before the download starts: a Gate-3 preflight probes the connected accelerator with hailortcli fw-control identify and refuses to fetch a HEF whose metadata.hw_arch disagrees with the device, naming the correct sibling in the error message. The preflight is silent when hailortcli is not installed on the host — install it via apt install hailo-all (or the matching HailoRT package) to enable the check.
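The architecture check described above can be sketched as a pure function over the slug and the text printed by hailortcli fw-control identify. This is an illustration, not Edge Core's Gate-3 code. It assumes the identify output contains a line such as "Device Architecture: HAILO8L" and that metadata.hw_arch uses the values hailo8 and hailo8l; both are assumptions to verify against your HailoRT version. The function names are hypothetical.

```python
import re

# Assumed mapping between slug suffixes and hw_arch values.
ARCH_TO_SUFFIX = {"hailo8": "_h8", "hailo8l": "_h8l"}


def split_slug(slug):
    """Split 'yolov8s_h8l' into ('yolov8s', 'hailo8l')."""
    for arch, suffix in ARCH_TO_SUFFIX.items():
        if slug.endswith(suffix):
            return slug[: -len(suffix)], arch
    raise ValueError(f"{slug!r} has no _h8 / _h8l suffix")


def device_arch(identify_output):
    """Extract the architecture from identify output, or None if it is absent."""
    match = re.search(r"Device Architecture:\s*(\S+)", identify_output)
    return match.group(1).lower() if match else None


def preflight(slug, identify_output):
    """Return an error message on mismatch, or None when the check passes or cannot run."""
    base, wanted = split_slug(slug)
    found = device_arch(identify_output)
    if found is None or found == wanted:
        return None
    suffix = ARCH_TO_SUFFIX.get(found)
    hint = f"; use {base}{suffix} instead" if suffix else ""
    return f"{slug} targets {wanted} but the device reports {found}{hint}"


sample_output = "Board Name: Hailo-8\nDevice Architecture: HAILO8L\n"
print(preflight("yolov8s_h8", sample_output))
```

Returning None when no architecture line is found mirrors the documented behaviour of staying silent when hailortcli is unavailable.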
Edge speech-to-text
Seeded Whisper.cpp STT models can run from Audio Track -> Call Model workflows on Raspberry Pi 4-class devices. The generated worker passes the audio chunk and seeded download_url into the local whisper_cpp runtime, so the first run downloads missing weights into the Cyberwave model cache. Downstream nodes can use text, transcribed_text, and segments.
English GGML checkpoints: Tiny EN, Base EN, and Small EN (Q5_1). Multilingual checkpoints: Tiny Multilingual and Base Multilingual. The multilingual checkpoints are hybrid deployments: they use edge_runtime: whisper_cpp on edge missions and fall back to the whisper cloud node (openai/whisper-tiny / openai/whisper-base) on cloud missions. Use multilingual models when the spoken language is unknown or mixed.
Hybrid Faster Whisper catalog entries (faster_whisper runtime) run on-device when the workflow mission executes on the edge; cloud missions use the whisper cloud node with the same OpenAI model aliases (openai/whisper-tiny → tiny.en, etc.). Prefer Faster Whisper Tiny EN for real-time English streams; use Whisper Tiny Multilingual Q5_1 for multilingual edge STT without CTranslate2.
Model weights resolution
For each required model, Edge Core resolves weights in this order:
- Local cache (intact). If ~/.cyberwave/models/{model_id}/... is present and its SHA-256 matches the manifest, it is used directly.
- Cyberwave-hosted signed URL. If the catalog entry has a checkpoint mirror, Edge Core fetches a signed URL via GET /api/v1/mlmodels/{uuid}/weights and downloads from our private bucket.
- Upstream weights URL. If no Cyberwave mirror exists, Edge Core falls back to the public download_url from the catalog (e.g. an official Ultralytics release).
- Stale cache fallback. If every download attempt fails but the local file is intact, Edge Core returns the cached file with a warning. This keeps workers running across transient network failures and on permanently air-gapped sites.
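The resolution order above can be sketched as a single function that tries each source in turn. This is a simplified illustration, not Edge Core's code: resolve_weights and sha256_of are hypothetical names, each source is modelled as a callable that returns the file bytes or raises OSError, and the stale-cache step is interpreted here as "the file still exists on disk".

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_weights(cached, expected_sha, sources):
    """sources: ordered list of (label, fetch), e.g. the mirror first, then upstream."""
    cached = Path(cached)
    # 1. Local cache whose hash matches the manifest.
    if cached.exists() and sha256_of(cached) == expected_sha:
        return cached, "local cache"
    # 2 and 3. Try each download source in order.
    for label, fetch in sources:
        try:
            data = fetch()
        except OSError as exc:
            print(f"{label} failed: {exc}")
            continue
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
        return cached, label
    # 4. Every download failed: fall back to whatever is on disk, with a warning.
    if cached.exists():
        print(f"warning: using stale cached weights at {cached}")
        return cached, "stale cache"
    raise FileNotFoundError(f"no weights available for {cached}")
```

A fuller version would also verify the hash of freshly downloaded bytes before writing them; that check is omitted here to keep the ordering easy to follow.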
Operators on air-gapped sites can pre-stage weights by copying them to ~/.cyberwave/models/{model_id}/. Edge Core computes a SHA-256, infers the runtime from the file extension (.pt, .onnx, .engine/.trt, .tflite, .pth, .xml), and writes a sidecar metadata.json on the next worker start. To update a pre-staged model, simply overwrite the file in place — Edge Core re-stamps the manifest from disk on the next call (no re-download attempted). Pre-staged files are never auto-overwritten by catalog updates; to force a re-download from Cyberwave, evict the model directory (rm -rf ~/.cyberwave/models/{model_id}).
CLI Commands
cyberwave worker start # Start the worker container
cyberwave worker stop # Stop the worker container
cyberwave worker restart # Restart (re-scans workers, re-downloads models)
cyberwave worker status # Show container state and loaded workers
cyberwave worker health # Show detailed restart history and circuit-breaker state
cyberwave worker logs # Stream worker container logs
These commands are also available via cyberwave-edge-core worker … if you prefer to use the edge-core CLI directly.
Hot-Reload on File Changes
Edge Core monitors the workers directory every ~15 seconds. When .py files are added, removed, or modified, the worker container is automatically restarted with the updated set of workers.
A minimum cool-down of 10 seconds between successive automatic restarts prevents rapid churn when files are written incrementally.
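The interaction between polling and the cool-down can be sketched as follows. This is an assumption-laden illustration: snapshot and ReloadGate are hypothetical names, and comparing modification time and size is only one possible way to detect changes (the source does not say how Edge Core does it).

```python
from pathlib import Path


def snapshot(workers_dir):
    """Map each .py file name to (mtime_ns, size) so adds, removals, and edits all show up."""
    result = {}
    for path in Path(workers_dir).glob("*.py"):
        stat = path.stat()
        result[path.name] = (stat.st_mtime_ns, stat.st_size)
    return result


class ReloadGate:
    """Decide whether a changed snapshot should trigger a restart right now."""

    def __init__(self, baseline, cooldown_s=10.0):
        self.baseline = baseline
        self.cooldown_s = cooldown_s
        self.last_restart = float("-inf")

    def should_restart(self, current, now):
        if current == self.baseline:
            return False
        if now - self.last_restart < self.cooldown_s:
            # The change stays pending and is picked up by a later poll.
            return False
        self.baseline = current
        self.last_restart = now
        return True
```

A polling loop would call snapshot() roughly every 15 seconds and pass the result, together with time.monotonic(), to should_restart(). Because the baseline is only updated when a restart fires, edits that arrive during the cool-down are not lost.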
Health Monitoring
Edge Core continuously monitors the worker container:
- Restart accounting: every restart is recorded with timestamp and reason.
- Circuit-breaker: after 5 restarts in 5 minutes, automatic restarts are suppressed until the window clears. Run worker health to inspect the state.
- Spontaneous exit detection: if the container exits without a deliberate restart, a warning is logged.
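The circuit-breaker rule above amounts to a sliding-window counter. The sketch below shows one way to express it; RestartBreaker is a hypothetical class, not Edge Core's implementation, and it keeps only the restarts inside the current window rather than the full history.

```python
from collections import deque


class RestartBreaker:
    """Suppress automatic restarts after max_restarts within window_s seconds."""

    def __init__(self, max_restarts=5, window_s=300.0):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.history = deque()  # (timestamp, reason) pairs inside the window

    def _prune(self, now):
        # Drop restarts that have aged out of the window.
        while self.history and now - self.history[0][0] >= self.window_s:
            self.history.popleft()

    def is_open(self, now):
        """True while automatic restarts are suppressed."""
        self._prune(now)
        return len(self.history) >= self.max_restarts

    def record(self, now, reason):
        """Record a restart if allowed; return False when it is suppressed."""
        if self.is_open(now):
            return False
        self.history.append((now, reason))
        return True


breaker = RestartBreaker()
results = [breaker.record(t, "file change") for t in (0, 10, 20, 30, 40, 50)]
print(results)
```

In this run the first five restarts are allowed and the sixth, at 50 seconds, is suppressed. The breaker closes again once the oldest restart is more than five minutes old.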
Model warm-up
The worker runtime automatically runs two dummy inferences on each loaded model at startup to eliminate cold-start latency (JIT compilation, memory allocation). Cold vs warm latency is logged.
You can also warm up models explicitly:
model = cw.models.load("yolov8n")
cold_ms, warm_ms = model.warm_up(input_shape=(640, 640, 3))
Frame resolution scaling
Set CYBERWAVE_WORKER_INPUT_RESOLUTION to downscale incoming frames before they reach your worker hooks. This reduces inference time on constrained devices without changing the camera driver’s publish resolution.
export CYBERWAVE_WORKER_INPUT_RESOLUTION=640x480
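For reference, a WIDTHxHEIGHT value like the one above can be parsed as shown below. This is a hedged sketch: parse_resolution is a hypothetical helper, and treating unset or malformed values as "no scaling" is an assumption, not documented runtime behaviour.

```python
import os


def parse_resolution(value):
    """Parse 'WIDTHxHEIGHT' into (width, height); return None if unset or malformed."""
    if not value:
        return None
    parts = value.lower().split("x")
    if len(parts) != 2 or not all(p.strip().isdigit() for p in parts):
        return None
    width, height = int(parts[0]), int(parts[1])
    return (width, height) if width > 0 and height > 0 else None


target = parse_resolution(os.environ.get("CYBERWAVE_WORKER_INPUT_RESOLUTION"))
print(target if target else "no scaling requested")
```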
Shared memory transport
Zenoh shared-memory (SHM) transport offers zero-copy frame delivery between the camera driver and worker containers on the same host. Edge Core leaves ZENOH_SHARED_MEMORY disabled by default because SHM between Docker containers requires them to share an IPC namespace via --ipc=host, which weakens container isolation and has historically been a source of instability in production.
To opt in, set ZENOH_SHARED_MEMORY=true in the edge-core process environment and ensure every Cyberwave container is launched with --ipc=host. Edge Core then propagates the flag to both driver and worker containers through the same env-builder, keeping the two sides in lock-step.
GPU Access
Edge Core detects the NVIDIA container runtime and passes --gpus all to the worker container when available.
Image variants
The worker image is published in three variants on Docker Hub:
| Tag | Base | Accelerator | Architectures | Use when |
|---|---|---|---|---|
| cyberwaveos/edge-ml-worker:<tag> | ubuntu:24.04 | CPU (onnxruntime) | linux/amd64, linux/arm64 | No GPU available, or inference is light enough for CPU. |
| cyberwaveos/edge-ml-worker:<tag>-gpu | nvidia/cuda:12.6.3-runtime-ubuntu24.04 | NVIDIA (onnxruntime-gpu) | linux/amd64 | Edge device has an NVIDIA GPU and nvidia-container-toolkit installed. |
| cyberwaveos/edge-ml-worker:<tag>-hailo | cyberwaveos/edge-ml-worker:<tag> | Hailo-8 / Hailo-8L (hailo_platform) | linux/arm64 | Edge device has a Hailo accelerator exposed at /dev/hailo0 (e.g. Raspberry Pi 5 + AI HAT+). |
The -hailo variant compiles HailoRT (libhailort + hailortcli + pyhailort) from the MIT-licensed upstream source at hailo-ai/hailort at a pinned tag. The LGPL-2.1 GStreamer plugin (hailonet) is left out of the build. The companion kernel driver (hailo-ai/hailort-drivers) is installed on the host (e.g. via apt install hailo-all on Raspberry Pi OS), not in the container — its minor version must match the userspace lib baked into the image.
All three variants ship the same Python API. PyTorch/Ultralytics models pick the device automatically; ONNX models gain CUDAExecutionProvider only on the -gpu variant; .hef files load through the hailo runtime only on the -hailo variant.
Edge Core selects the variant automatically based on host capabilities:
- NVIDIA container runtime detected → appends -gpu to the configured worker image tag.
- /dev/hailo0 present and no GPU runtime → appends -hailo, adds --device /dev/hailo0:/dev/hailo0:rwm and --group-add hailo (on HailoRT < 4.20 hosts), and sets CYBERWAVE_REQUIRED_DEVICES=/dev/hailo0 so the image’s Gate-4 entrypoint fails fast if the passthrough is missing.
- Neither → uses the CPU base image.
- Neither → uses the CPU base image.
GPU takes precedence over Hailo when both are available on the same host; mixing them in a single deployment is not supported. If a variant cannot be pulled (e.g. the Hailo image isn’t published for your channel yet), Edge Core falls back to the CPU image and logs the demotion.
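The selection rules, including the GPU-over-Hailo precedence, can be condensed into one function. This is a sketch of the decision only: select_worker_image is a hypothetical name, the host probes are passed in as booleans, and the --group-add hailo flag (needed only on HailoRT < 4.20 hosts) and the fallback to the CPU image on a failed pull are left out.

```python
def select_worker_image(image, has_nvidia_runtime, has_hailo_device):
    """Return (image, extra_docker_args, extra_env) for the detected host capabilities."""
    if has_nvidia_runtime:
        # GPU wins even when a Hailo device is also present.
        return f"{image}-gpu", ["--gpus", "all"], {}
    if has_hailo_device:
        return (
            f"{image}-hailo",
            ["--device", "/dev/hailo0:/dev/hailo0:rwm"],
            {"CYBERWAVE_REQUIRED_DEVICES": "/dev/hailo0"},
        )
    return image, [], {}


# The tag "1.0" is a placeholder, not a published release.
base = "cyberwaveos/edge-ml-worker:1.0"
for gpu, hailo in ((True, True), (False, True), (False, False)):
    print(gpu, hailo, select_worker_image(base, gpu, hailo)[0])
```

On a real host, has_hailo_device could be derived from os.path.exists("/dev/hailo0"); how Edge Core probes for the NVIDIA container runtime is not described in this section.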