# Edge Worker Container

> Run ML worker scripts on edge devices with managed lifecycle, hot-reload, and health monitoring

## Overview

Edge Core manages a dedicated worker container (`cyberwave-worker-{env_uuid[:8]}`) on each edge device. Worker scripts run inside this container with access to the Zenoh data bus, cached model weights, and all environment twin data.

<Info>
  One worker container runs per edge device (not per twin). Workers can consume
  data from all twins in the environment simultaneously.
</Info>
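
The naming scheme is easy to reproduce when you need to find the container yourself, for example to filter `docker ps` output. A minimal sketch, assuming the environment UUID is available as a string (the helper name and the sample UUID are hypothetical):

```python
def worker_container_name(env_uuid: str) -> str:
    """Build the worker container name from the environment UUID."""
    # Only the first eight characters of the UUID are embedded in the name.
    return f"cyberwave-worker-{env_uuid[:8]}"


# Sample UUID, made up for illustration.
print(worker_container_name("3f2a9c1e-7b4d-4e6a-9f10-2c5d8e7a1b3c"))
```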

## Worker Directory

Place Python worker scripts in the edge config workers directory:

| Platform | Path                    |
| -------- | ----------------------- |
| All      | `~/.cyberwave/workers/` |

## Model Requirements

Declare model dependencies in `cyberwave.yml` inside the workers directory:

```yaml theme={null}
models:
  - yolov8n
  - background-subtraction
```

Edge Core pre-downloads listed models before starting the worker container. Models are also auto-detected from `cw.models.load("...")` calls in worker Python files.
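
To illustrate what auto-detection can and cannot see, here is a sketch of a static scan for `cw.models.load("...")` calls using the standard `ast` module. It is an assumption about the general approach, not Edge Core's actual scanner, and `detect_models` is a hypothetical helper:

```python
import ast


def detect_models(source: str) -> list[str]:
    """Return model names passed as string literals to cw.models.load(...)."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the attribute chain cw.models.load exactly.
        is_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "models"
            and isinstance(func.value.value, ast.Name)
            and func.value.value.id == "cw"
        )
        if not is_load or not node.args:
            continue
        first = node.args[0]
        # Only literal strings are visible to a static scan.
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            if first.value not in found:
                found.append(first.value)
    return found
```

A scan of this kind cannot resolve names built at runtime (for example `cw.models.load(name)`), so listing such models in `cyberwave.yml` is the safer choice.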

### Picking the model format — *stub*

Each catalog model declares an **edge runtime** that tells the worker how to load the checkpoint. The runtime selector lives in the model editor (`/models` → "Add Model" or "Edit"), and is also accepted as a typed `edge_runtime` field on `POST /api/v1/mlmodels` and `PUT /api/v1/mlmodels/{uuid}`.

The well-known runtimes mirror the loaders in the Cyberwave Python SDK's `cyberwave.models.runtimes` registry:

| Runtime          | Extension                      | Loader                                                             |
| ---------------- | ------------------------------ | ------------------------------------------------------------------ |
| `ultralytics`    | `.pt`                          | YOLOv5/8/11 via the Ultralytics package                            |
| `onnxruntime`    | `.onnx`                        | ONNX Runtime (CPU, CUDA EP on the GPU image)                       |
| `opencv`         | `.xml` / `.caffemodel`         | OpenCV Haar / DNN models                                           |
| `tflite`         | `.tflite`                      | TensorFlow Lite                                                    |
| `torch`          | `.pt` / `.pth`                 | TorchScript / raw PyTorch                                          |
| `tensorrt`       | `.engine` / `.trt`             | TensorRT engines (GPU image only)                                  |
| `whisper_cpp`    | `.gguf` / `.bin`               | Whisper.cpp speech-to-text on CPU edge devices                     |
| `faster_whisper` | CTranslate2 weights (HF cache) | faster-whisper STT on edge CPU/GPU                                 |
| `hailo`          | `.hef`                         | Hailo accelerator (`hailo_platform`) on Pi 5 + AI HAT+ / Pi AI Kit |

The editor's "Other" entry accepts custom values. It is used today for framework-specific identifiers such as `sam2`, `sam3`, and `depth_anything_v2`, which have no SDK loader yet but still need to round-trip through the catalog.

`GET /api/v1/mlmodels/edge-runtimes` returns the current well-known list (no auth required) so external tools can mirror the dropdown without hard-coding it.

> stub: ONNX YOLO postprocessing applies per-class non-max suppression with a default IoU threshold of `0.7`, so swapping `yolov8s.pt` for `yolov8s.onnx` produces the same number of boxes per object instead of a cluster of overlapping anchors. Override it per call with `model.predict(frame, iou=0.5)` (stricter suppression, so fewer overlapping boxes survive) or `iou=1.0` (raw output, no suppression).
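
The effect of the `iou` knob can be seen in a small pure-Python sketch of per-class non-max suppression. This is the textbook greedy algorithm under the assumption that a box is suppressed when its IoU with a kept box of the same class exceeds the threshold; it is not the SDK's implementation, and the function names are hypothetical:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_class_nms(dets, iou_threshold=0.7):
    """dets: list of (x1, y1, x2, y2, conf, class_id). Returns the kept detections."""
    kept = []
    # Visit boxes from highest to lowest confidence so stronger boxes win.
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        suppressed = any(
            k[5] == det[5] and iou(k[:4], det[:4]) > iou_threshold for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept
```

Because IoU never exceeds 1.0, a threshold of `1.0` suppresses nothing in this sketch, which matches the "raw output" behavior described above. Boxes of different classes never suppress each other.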

> stub: NMS-free detectors (YOLO26's default one-to-one head, YOLOv10) emit `[max_det, 6] = [x1, y1, x2, y2, conf, class_id]`; the SDK's ONNX runtime detects this layout automatically and parses it without the anchor-decoding step. The `iou` knob is ignored on that path — suppression is folded into the model graph. End-to-end pose / segmentation exports are not decoded yet; re-export those tasks with `end2end=False` until the e2e pose branch lands.
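
For reference, decoding the `[max_det, 6]` layout needs no anchor math. The sketch below assumes the rows arrive as plain sequences and that unused slots carry low confidence; the `0.25` cut-off, the output dictionary shape, and the function name are illustrative choices, not SDK behavior:

```python
def parse_end2end(rows, conf_threshold=0.25, names=None):
    """rows: iterable of [x1, y1, x2, y2, conf, class_id] entries."""
    detections = []
    for x1, y1, x2, y2, conf, class_id in rows:
        if conf < conf_threshold:
            continue  # skip weak detections and unused slots
        cid = int(class_id)
        detections.append(
            {
                "box": (x1, y1, x2, y2),
                "confidence": conf,
                "class_id": cid,
                # Fall back to the numeric id when no label is known.
                "label": names.get(cid, str(cid)) if names else str(cid),
            }
        )
    return detections
```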

### YOLO ONNX catalog — *stub*

The public catalog ships ONNX twins for the most common YOLO entries. Cyberwave hosts the exported weights at `https://static.cyberwave.com/ml_models/{name}.onnx`, so edge nodes can fetch them without depending on the Ultralytics CDN. The ONNX builds run roughly 2× faster on CPU than the `.pt` builds. Available entries:

| Family | Sizes         | Tasks                      |
| ------ | ------------- | -------------------------- |
| YOLOv8 | n, s          | detect; pose (n only)      |
| YOLO26 | n, s, m, l, x | detect, pose               |

Pick `Edge Runtime: ONNX Runtime` in the model editor (or `metadata.edge_runtime = "onnxruntime"` via the API) when registering a private fine-tune that you've exported via `YOLO("…").export(format="onnx")`. The SDK's `OnnxRuntime` reads `names` and `kpt_shape` from the ONNX `custom_metadata_map`, so labelled detections and keypoints work out of the box.
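
To check that a fine-tune carries this metadata before registering it, you can decode the map yourself. Ultralytics exports store each value as the string form of a Python literal, and ONNX Runtime exposes the map via `session.get_modelmeta().custom_metadata_map`. The sketch below covers only the decoding step; `parse_yolo_metadata` is a hypothetical helper, not the SDK's parser:

```python
import ast


def parse_yolo_metadata(meta: dict[str, str]) -> dict:
    """Decode the string-encoded `names` and `kpt_shape` entries."""
    parsed = {}
    if "names" in meta:
        # "{0: 'person', 1: 'bicycle'}" becomes {0: "person", 1: "bicycle"}
        raw_names = ast.literal_eval(meta["names"])
        parsed["names"] = {int(k): str(v) for k, v in raw_names.items()}
    if "kpt_shape" in meta:
        # "[17, 3]" becomes (17, 3): keypoints per instance, values per keypoint
        parsed["kpt_shape"] = tuple(ast.literal_eval(meta["kpt_shape"]))
    return parsed
```

A detection-only export has no `kpt_shape` entry, so the key is simply absent from the result.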

Segmentation, OBB, and classification ONNX exports are not yet supported by the SDK runtime — use the `.pt` (`ultralytics`) variants for those tasks.

### YOLO Hailo catalog — *stub*

For Hailo-accelerated edge nodes (Raspberry Pi 5 + AI HAT+ and the older Pi AI Kit), the catalog also ships pre-compiled YOLO HEFs straight from the Hailo Model Zoo. Each upstream YOLO checkpoint gets two sibling catalog entries, one per Hailo architecture, because HEFs are hardware-locked. The table lists each pair on one line:

| Slug                         | Family        | Arch               | Peak compute |
| ---------------------------- | ------------- | ------------------ | ------------ |
| `yolov6n_h8` / `yolov6n_h8l` | YOLOv6 Nano   | Hailo-8 / Hailo-8L | 26 / 13 TOPS |
| `yolov8n_h8` / `yolov8n_h8l` | YOLOv8 Nano   | Hailo-8 / Hailo-8L | 26 / 13 TOPS |
| `yolov8s_h8` / `yolov8s_h8l` | YOLOv8 Small  | Hailo-8 / Hailo-8L | 26 / 13 TOPS |
| `yolov8m_h8` / `yolov8m_h8l` | YOLOv8 Medium | Hailo-8 / Hailo-8L | 26 / 13 TOPS |

Slugs carry the `_h8` / `_h8l` suffix deliberately: `cw.models.load("yolov8s_h8")` is enough to route through the SDK's `hailo` runtime, with no extra arguments. Pick the slug whose suffix matches the accelerator on your device; a Hailo-8 HEF will not run on a Hailo-8L, and vice versa.

If you pick the wrong sibling, Edge Core catches it before the download starts: a Gate-3 preflight probes the connected accelerator with `hailortcli fw-control identify` and refuses to fetch a HEF whose `metadata.hw_arch` disagrees with the device, naming the correct sibling in the error message. The preflight is silent when `hailortcli` is not installed on the host — install it via `apt install hailo-all` (or the matching HailoRT package) to enable the check.
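
The comparison behind such a preflight can be sketched as follows. The architecture strings (`hailo8`, `hailo8l`), the function names, and the error wording are assumptions for illustration; the real check reads `metadata.hw_arch` and the `hailortcli` output:

```python
# Assumed mapping from slug suffix to architecture identifier.
ARCH_BY_SUFFIX = {"_h8l": "hailo8l", "_h8": "hailo8"}


def split_slug(slug: str) -> tuple[str, str]:
    """Split 'yolov8s_h8l' into ('yolov8s', 'hailo8l')."""
    for suffix, arch in ARCH_BY_SUFFIX.items():
        if slug.endswith(suffix):
            return slug[: -len(suffix)], arch
    raise ValueError(f"{slug!r} does not carry a Hailo suffix")


def preflight(slug: str, device_arch: str) -> str:
    """Return the slug if it matches the device, else raise naming the sibling."""
    base, hef_arch = split_slug(slug)
    if hef_arch == device_arch:
        return slug
    for suffix, arch in ARCH_BY_SUFFIX.items():
        if arch == device_arch:
            raise RuntimeError(
                f"{slug} targets {hef_arch}, but the device is {device_arch}; "
                f"use {base}{suffix} instead"
            )
    raise ValueError(f"unknown device architecture {device_arch!r}")
```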

### Edge speech-to-text — *stub*

Seeded Whisper.cpp STT models can run from `Audio Track -> Call Model` workflows on Raspberry Pi 4-class devices. The generated worker passes the audio chunk and seeded `download_url` into the local `whisper_cpp` runtime, so the first run downloads missing weights into the Cyberwave model cache. Downstream nodes can use `text`, `transcribed_text`, and `segments`.

The English GGML checkpoints are **Tiny EN**, **Base EN**, and **Small EN** (Q5\_1). The multilingual checkpoints, **Tiny Multilingual** and **Base Multilingual**, are hybrid deployments: edge missions run them with `edge_runtime: whisper_cpp`, and cloud missions fall back to the whisper cloud node (`openai/whisper-tiny` / `openai/whisper-base`). Use multilingual models when the spoken language is unknown or mixed.

Hybrid **Faster Whisper** catalog entries (`faster_whisper` runtime) run on-device when the workflow mission executes on the edge; cloud missions use the whisper cloud node with the same OpenAI model aliases (`openai/whisper-tiny` → `tiny.en`, etc.). Prefer **Faster Whisper Tiny EN** for real-time English streams; use **Whisper Tiny Multilingual Q5\_1** for multilingual edge STT without CTranslate2.

### Model weights resolution — *stub*

For each required model, Edge Core resolves weights in this order:

1. **Local cache (intact).** If `~/.cyberwave/models/{model_id}/...` is present and its SHA-256 matches the manifest, it is used directly.
2. **Cyberwave-hosted signed URL.** If the catalog entry has a checkpoint mirror, Edge Core fetches a signed URL via `GET /api/v1/mlmodels/{uuid}/weights` and downloads from the private Cyberwave bucket.
3. **Upstream weights URL.** If no Cyberwave mirror exists, Edge Core falls back to the public `download_url` from the catalog (e.g. an official Ultralytics release).
4. **Stale cache fallback.** If every download attempt fails but the local file is intact, Edge Core returns the cached file with a warning. This keeps workers running across transient network failures and on permanently air-gapped sites.
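
The four steps above can be sketched as a single function. This is an illustration under stated assumptions, not Edge Core's code: the downloaders are injected as hypothetical `(label, callable)` pairs, a download counts as failed when it raises `OSError` or its digest does not match, and the stale fallback only requires that the cached file still exists:

```python
import hashlib
import logging
from pathlib import Path

log = logging.getLogger("weights")


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large checkpoints do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_weights(cached: Path, expected_sha256: str, downloaders) -> tuple[Path, str]:
    """Return (path, source label) following the cache, mirror, upstream, stale order."""
    # Step 1: a cached file whose digest matches the manifest is used directly.
    if cached.is_file() and sha256_of(cached) == expected_sha256:
        return cached, "cache"
    cached.parent.mkdir(parents=True, exist_ok=True)
    tmp = cached.with_name(cached.name + ".part")
    # Steps 2 and 3: try each source in order, writing to a temporary file
    # so a failed download never clobbers the existing cache.
    for label, download in downloaders:
        try:
            download(tmp)
            if sha256_of(tmp) != expected_sha256:
                raise OSError(f"{label}: digest mismatch")
            tmp.replace(cached)
            return cached, label
        except OSError as exc:
            log.warning("download via %s failed: %s", label, exc)
            tmp.unlink(missing_ok=True)
    # Step 4: every source failed; fall back to whatever is cached.
    if cached.is_file():
        log.warning("all downloads failed; using cached %s", cached)
        return cached, "stale-cache"
    raise FileNotFoundError(f"no weights available for {cached}")
```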

Operators on air-gapped sites can pre-stage weights by copying them to `~/.cyberwave/models/{model_id}/`. Edge Core computes a SHA-256, infers the runtime from the file extension (`.pt`, `.onnx`, `.engine`/`.trt`, `.tflite`, `.pth`, `.xml`), and writes a sidecar `metadata.json` on the next worker start. To update a pre-staged model, simply overwrite the file in place — Edge Core re-stamps the manifest from disk on the next call (no re-download attempted). Pre-staged files are never auto-overwritten by catalog updates; to force a re-download from Cyberwave, evict the model directory (`rm -rf ~/.cyberwave/models/{model_id}`).
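
As a rough picture of the stamping step, the sketch below hashes a pre-staged file, infers a runtime from its extension, and writes a sidecar next to it. The `metadata.json` field names are hypothetical, and mapping `.pt` to `ultralytics` and `.pth` to `torch` is an assumption, since `.pt` is valid for both runtimes:

```python
import hashlib
import json
from pathlib import Path

# Assumed extension-to-runtime mapping, based on the runtime table.
RUNTIME_BY_EXTENSION = {
    ".pt": "ultralytics",
    ".pth": "torch",
    ".onnx": "onnxruntime",
    ".engine": "tensorrt",
    ".trt": "tensorrt",
    ".tflite": "tflite",
    ".xml": "opencv",
}


def stamp_prestaged(weights: Path) -> dict:
    """Write metadata.json beside a pre-staged weights file and return its content."""
    runtime = RUNTIME_BY_EXTENSION.get(weights.suffix.lower())
    if runtime is None:
        raise ValueError(f"cannot infer a runtime from {weights.name!r}")
    metadata = {
        "file": weights.name,  # hypothetical field names
        "sha256": hashlib.sha256(weights.read_bytes()).hexdigest(),
        "runtime": runtime,
    }
    (weights.parent / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata
```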

## CLI Commands

```bash theme={null}
cyberwave worker start      # Start the worker container
cyberwave worker stop       # Stop the worker container
cyberwave worker restart    # Restart (re-scans workers, re-downloads models)
cyberwave worker status     # Show container state and loaded workers
cyberwave worker health     # Show detailed restart history and circuit-breaker state
cyberwave worker logs       # Stream worker container logs
```

<Tip>
  These commands are also available via `cyberwave-edge-core worker …` if you prefer to use the edge-core CLI directly.
</Tip>

## Hot-Reload on File Changes

Edge Core polls the workers directory about every 15 seconds. When `.py` files are added, removed, or modified, it automatically restarts the worker container with the updated set of workers.

A minimum cool-down of 10 seconds between successive automatic restarts prevents rapid churn when files are written incrementally.
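
The interplay between polling and the cool-down can be sketched like this. The snapshot format (file name to modification time), the non-recursive glob, and both names are assumptions; Edge Core may track changes differently:

```python
from pathlib import Path


def snapshot(workers_dir: Path) -> dict[str, int]:
    """Map each top-level .py file name to its modification time in nanoseconds."""
    return {p.name: p.stat().st_mtime_ns for p in workers_dir.glob("*.py")}


class ReloadGate:
    """Decide whether a changed snapshot should trigger a restart right now."""

    def __init__(self, initial: dict[str, int], cooldown_s: float = 10.0):
        self.seen = initial
        self.cooldown_s = cooldown_s
        self.last_restart = float("-inf")

    def should_restart(self, current: dict[str, int], now: float) -> bool:
        if current == self.seen:
            return False  # nothing added, removed, or modified
        if now - self.last_restart < self.cooldown_s:
            return False  # cooling down; a later poll picks the change up
        self.seen = current
        self.last_restart = now
        return True
```

A change that lands during the cool-down is not lost in this sketch: the stored snapshot is left untouched, so the next poll after the cool-down still sees a difference.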

## Health Monitoring

Edge Core continuously monitors the worker container:

* **Restart accounting**: every restart is recorded with timestamp and reason.
* **Circuit-breaker**: after 5 restarts in 5 minutes, automatic restarts are suppressed until the window clears. Run `cyberwave worker health` to inspect the state.
* **Spontaneous exit detection**: if the container exits without a deliberate restart, a warning is logged.
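
The circuit-breaker rule amounts to a sliding-window counter over the restart history. A minimal sketch, assuming timestamps in seconds and a hypothetical `RestartBreaker` class (the actual bookkeeping inside Edge Core is not documented here):

```python
class RestartBreaker:
    """Suppress automatic restarts after too many within a time window."""

    def __init__(self, max_restarts: int = 5, window_s: float = 300.0):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.history: list[tuple[float, str]] = []  # (timestamp, reason)

    def record(self, now: float, reason: str) -> None:
        """Restart accounting: keep every restart with its reason."""
        self.history.append((now, reason))

    def is_open(self, now: float) -> bool:
        """True while restarts inside the window have reached the limit."""
        recent = sum(1 for ts, _ in self.history if now - ts < self.window_s)
        return recent >= self.max_restarts
```

With the defaults, five restarts recorded at 0, 10, 20, 30, and 40 seconds keep the breaker open until the oldest one ages out at the 300-second mark.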

## Performance Tuning — *stub*

### Model warm-up

The worker runtime automatically runs two dummy inferences on each loaded model at startup to eliminate cold-start latency (JIT compilation, memory allocation). Cold vs warm latency is logged.

You can also warm up models explicitly:

```python theme={null}
model = cw.models.load("yolov8n")
cold_ms, warm_ms = model.warm_up(input_shape=(640, 640, 3))
```

### Frame resolution scaling

Set `CYBERWAVE_WORKER_INPUT_RESOLUTION` to downscale incoming frames before they reach your worker hooks. This reduces inference time on constrained devices without changing the camera driver's publish resolution.

```bash theme={null}
export CYBERWAVE_WORKER_INPUT_RESOLUTION=640x480
```

### Shared memory transport

Zenoh shared-memory (SHM) transport offers zero-copy frame delivery between the camera driver and worker containers on the same host. Edge Core leaves `ZENOH_SHARED_MEMORY` **disabled by default** because SHM between Docker containers requires them to share an IPC namespace via `--ipc=host`, which weakens container isolation and has historically been a source of instability in production.

To opt in, set `ZENOH_SHARED_MEMORY=true` in the edge-core process environment **and** ensure every Cyberwave container is launched with `--ipc=host`. Edge Core then propagates the flag to both driver and worker containers through the same env-builder, keeping the two sides in lock-step.

## GPU Access

Edge Core detects the NVIDIA container runtime and passes `--gpus all` to the worker container when available.

### Image variants — *stub*

The worker image is published in three variants on Docker Hub:

| Tag                                      | Base                                     | Accelerator                           | Architectures                | Use when                                                                                      |
| ---------------------------------------- | ---------------------------------------- | ------------------------------------- | ---------------------------- | --------------------------------------------------------------------------------------------- |
| `cyberwaveos/edge-ml-worker:<tag>`       | `ubuntu:24.04`                           | CPU (`onnxruntime`)                   | `linux/amd64`, `linux/arm64` | No GPU available, or inference is light enough for CPU.                                       |
| `cyberwaveos/edge-ml-worker:<tag>-gpu`   | `nvidia/cuda:12.6.3-runtime-ubuntu24.04` | NVIDIA (`onnxruntime-gpu`)            | `linux/amd64`                | Edge device has an NVIDIA GPU and `nvidia-container-toolkit` installed.                       |
| `cyberwaveos/edge-ml-worker:<tag>-hailo` | `cyberwaveos/edge-ml-worker:<tag>`       | Hailo-8 / Hailo-8L (`hailo_platform`) | `linux/arm64`                | Edge device has a Hailo accelerator exposed at `/dev/hailo0` (e.g. Raspberry Pi 5 + AI HAT+). |

The `-hailo` variant compiles HailoRT (`libhailort` + `hailortcli` + `pyhailort`) from the MIT-licensed upstream source at [`hailo-ai/hailort`](https://github.com/hailo-ai/hailort) at a pinned tag. The LGPL-2.1 GStreamer plugin (`hailonet`) is left out of the build. The companion kernel driver ([`hailo-ai/hailort-drivers`](https://github.com/hailo-ai/hailort-drivers)) is installed on the host (e.g. via `apt install hailo-all` on Raspberry Pi OS), not in the container — its minor version must match the userspace lib baked into the image.

All three variants ship the same Python API. PyTorch/Ultralytics models pick the device automatically; ONNX models gain `CUDAExecutionProvider` only on the `-gpu` variant; `.hef` files load through the `hailo` runtime only on the `-hailo` variant.

Edge Core selects the variant automatically based on host capabilities:

* **NVIDIA container runtime detected** → appends `-gpu` to the configured worker image tag.
* **`/dev/hailo0` present and no GPU runtime** → appends `-hailo`, adds `--device /dev/hailo0:/dev/hailo0:rwm` and `--group-add hailo` (on HailoRT \< 4.20 hosts), and sets `CYBERWAVE_REQUIRED_DEVICES=/dev/hailo0` so the image's Gate-4 entrypoint fails fast if the passthrough is missing.
* **Neither** → uses the CPU base image.

GPU takes precedence over Hailo when both are available on the same host; mixing them in a single deployment is not supported. If a variant cannot be pulled (e.g. the Hailo image isn't published for your channel yet), Edge Core falls back to the CPU image and logs the demotion.
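
The selection rules can be summarized in a few lines. This sketch mirrors the bullets above and omits the pull-failure fallback; the function name, its boolean inputs, and the return shape are assumptions, and host detection itself is out of scope:

```python
def select_worker_image(
    tag: str,
    has_nvidia_runtime: bool,
    has_hailo_device: bool,
    legacy_hailo_group: bool = False,
) -> tuple[str, list[str], dict[str, str]]:
    """Return (image, extra docker arguments, extra environment variables)."""
    image = f"cyberwaveos/edge-ml-worker:{tag}"
    args: list[str] = []
    env: dict[str, str] = {}
    if has_nvidia_runtime:
        # GPU takes precedence, even when a Hailo device is also present.
        image += "-gpu"
        args = ["--gpus", "all"]
    elif has_hailo_device:
        image += "-hailo"
        args = ["--device", "/dev/hailo0:/dev/hailo0:rwm"]
        if legacy_hailo_group:  # hosts running HailoRT older than 4.20
            args += ["--group-add", "hailo"]
        env = {"CYBERWAVE_REQUIRED_DEVICES": "/dev/hailo0"}
    return image, args, env
```

Here `legacy_hailo_group` stands in for whatever check determines that the host runs HailoRT older than 4.20.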