Alerts - Cyberwave Docs

STUB DOCUMENT: This page is intentionally minimal and will be expanded with deeper technical details in a future update.

Alerts notify operators that action is needed. They are displayed prominently in the UI (environment view, twin detail, etc.). Alerts have a category:

technical (default): operational or hardware events (e.g. robot stuck, calibration needed).
business: business-process events (e.g. order delayed, SLA threshold exceeded).

Examples:

A robot needs calibration.
A robot got stuck and needs remote takeover.
A sensor reading is out of expected range.
An order has exceeded its SLA deadline.

Model

Every alert belongs to a workspace and must be attached to at least one of: twin, environment, or workflow.

Field	Type	Notes
`name`	string	Human-readable title
`description`	text	Optional details
`alert_type`	string	Machine-readable code (e.g. `calibration_needed`, `robot_stuck`)
`severity`	enum	`info`, `warning`, `error`, `critical` (default: `warning`)
`status`	enum	`active`, `acknowledged`, `resolved`, `silenced` (default: `active`)
`source_type`	enum	`edge`, `simulation`, `cloud`, `workflow` (default: `edge`)
`category`	enum	`technical`, `business` (default: `technical`)
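The defaults and the anchoring rule above can be summarised in a small sketch. This is not the backend model: `AlertSketch` and the `environment_uuid` field name are hypothetical, the workspace relation is omitted, and only the enum values and defaults are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values, copied from the field table above.
SEVERITIES = ("info", "warning", "error", "critical")
STATUSES = ("active", "acknowledged", "resolved", "silenced")
SOURCE_TYPES = ("edge", "simulation", "cloud", "workflow")
CATEGORIES = ("technical", "business")


@dataclass
class AlertSketch:
    """Illustrative stand-in for an alert record (hypothetical class)."""

    name: str
    alert_type: str
    description: str = ""
    severity: str = "warning"      # documented default
    status: str = "active"         # documented default
    source_type: str = "edge"      # documented default
    category: str = "technical"    # documented default
    twin_uuid: Optional[str] = None
    environment_uuid: Optional[str] = None  # hypothetical field name
    workflow_uuid: Optional[str] = None

    def __post_init__(self) -> None:
        checks = (
            ("severity", self.severity, SEVERITIES),
            ("status", self.status, STATUSES),
            ("source_type", self.source_type, SOURCE_TYPES),
            ("category", self.category, CATEGORIES),
        )
        for field_name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{field_name} must be one of {allowed}, got {value!r}")
        # An alert needs at least one anchor: twin, environment, or workflow.
        if not (self.twin_uuid or self.environment_uuid or self.workflow_uuid):
            raise ValueError("alert must be attached to a twin, environment, or workflow")
```

Constructing `AlertSketch(name="Robot stuck", alert_type="robot_stuck", twin_uuid="...")` picks up every documented default, while omitting all three anchors raises `ValueError`.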

Lifecycle

Active: new alert, requires attention.
Acknowledged: an operator has seen it but the issue is not yet fixed.
Resolved: the root cause has been addressed (by edge device or operator).
Silenced: suppressed workspace-wide without resolving the root cause.

Idempotent resolve and silence

The POST /api/v1/alerts/{uuid}/resolve and POST /api/v1/alerts/{uuid}/silence endpoints are idempotent:

Resolving an already-resolved alert returns 200 with the alert body (no-op).
Resolving a silenced alert is allowed — it transitions the alert to resolved.
Silencing an already-silenced alert returns 200 with the alert body (no-op).
Silencing a resolved alert returns 400.

This means edge drivers and operator UIs can safely call resolve without checking current status first, avoiding race conditions when multiple actors act on the same alert concurrently.
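The rules above amount to a small transition table. The sketch below models them as pure functions that return an HTTP status code and the resulting alert status. The function names are hypothetical, and the assumption that silencing an `active` or `acknowledged` alert succeeds is inferred rather than stated on this page.

```python
from typing import Tuple

STATUSES = ("active", "acknowledged", "resolved", "silenced")


def resolve(status: str) -> Tuple[int, str]:
    """Model POST .../resolve: every known status ends up resolved.

    Resolving an already-resolved alert is a no-op that still returns 200.
    """
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return 200, "resolved"


def silence(status: str) -> Tuple[int, str]:
    """Model POST .../silence: rejected only when the alert is resolved."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if status == "resolved":
        return 400, status  # documented: silencing a resolved alert returns 400
    # Assumption: active and acknowledged alerts can be silenced;
    # silencing an already-silenced alert is the documented no-op.
    return 200, "silenced"
```

Because `resolve` succeeds from every status, two actors racing to resolve the same alert both receive 200. `silence` is the only call in this model that can be rejected.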

Workflow-scoped alerts

Alerts produced by a workflow’s send_alert node carry a workflow_uuid, and Alert.workflow is set on the backend. List alerts for a single workflow with:

GET /api/v1/alerts?workflow_uuid={workflow_uuid}

Generated edge workers call client.publish_alert(..., workflow_uuid=WORKFLOW_UUID, workflow_node_uuid=..., workflow_execution_uuid=...). The SDK forwards workflow_uuid as a top-level field, which sets the workflow foreign key on the alert, and merges workflow_node_uuid and workflow_execution_uuid into metadata for full provenance.
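The field routing described above can be illustrated with a standalone function. This is a sketch of the documented behaviour, not the SDK source: `build_alert_payload` is a hypothetical name, and how the real SDK handles a user-supplied metadata key that collides with a provenance key is not covered here (this sketch simply overwrites it).

```python
from typing import Any, Dict, Optional


def build_alert_payload(
    name: str,
    *,
    workflow_uuid: Optional[str] = None,
    workflow_node_uuid: Optional[str] = None,
    workflow_execution_uuid: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Route workflow identifiers the way the paragraph above describes."""
    payload: Dict[str, Any] = {"name": name, "metadata": dict(metadata or {})}

    # workflow_uuid travels as a top-level field so the backend can link the alert.
    if workflow_uuid is not None:
        payload["workflow_uuid"] = workflow_uuid

    # Node and execution identifiers are merged into metadata for provenance.
    provenance = {
        "workflow_node_uuid": workflow_node_uuid,
        "workflow_execution_uuid": workflow_execution_uuid,
    }
    for key, value in provenance.items():
        if value is not None:
            payload["metadata"][key] = value
    return payload
```

The input `metadata` dict is copied, so the caller's dictionary is left untouched.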

Source attribution: `metadata.source_chain`

STUB: Behavior is implemented; this section will be expanded with screenshots and the full kind catalogue.

Every alert raised by a workflow’s send_alert node carries an ordered source_chain under metadata describing each upstream node that fed into the decision. The chain is built generically — any node emitter that opts in by parking a _source_summary on its output dict contributes an entry, so the mechanism works equally well for camera-perception, audio-track, alert-triggered, manual, scheduled, or any future trigger source. Each entry includes:

kind — short identifier (camera_frame, audio_track, alert_trigger, manual_trigger, schedule_trigger, call_model, detection_event_gate, conditional).
node_uuid — the workflow node that contributed the entry.
Kind-specific fields. Examples: twin_uuid + sensor for camera/audio triggers; model_uuid + model_name + a capped detections_sample for call_model; mode + matched_classes + cooldown_seconds for detection_event_gate.

Example payload for a camera_frame → call_model → send_alert workflow:

{
  "metadata": {
    "workflow_uuid": "...",
    "workflow_node_uuid": "...",
    "workflow_execution_uuid": "...",
    "source_chain": [
      {
        "kind": "camera_frame",
        "node_uuid": "...",
        "twin_uuid": "...",
        "sensor": "front",
        "frame_ts": 1746651234.5
      },
      {
        "kind": "call_model",
        "node_uuid": "...",
        "model_uuid": "...",
        "model_name": "YOLOv8n",
        "modality": "image",
        "output_format": "detections",
        "detections_total": 2,
        "detections_sample": [
          { "label": "person", "confidence": 0.91, "bbox_pixels": [12, 34, 56, 78] }
        ]
      }
    ]
  }
}
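A rough sketch of how such a chain could be assembled from upstream node outputs follows. The shape of `_source_summary` is an assumption (a dict holding `kind` plus kind-specific fields), and `build_source_chain` is a hypothetical helper, not the backend implementation.

```python
from typing import Any, Dict, List, Tuple


def build_source_chain(
    upstream: List[Tuple[str, Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Collect one chain entry per upstream node that opted in.

    `upstream` is an ordered list of (node_uuid, output_dict) pairs.
    """
    chain: List[Dict[str, Any]] = []
    for node_uuid, output in upstream:
        summary = output.get("_source_summary")
        if not summary:
            continue  # this node did not park a summary, so it contributes nothing
        chain.append({**summary, "node_uuid": node_uuid})
    return chain


outputs = [
    ("node-a", {"_source_summary": {"kind": "camera_frame", "sensor": "front"}}),
    ("node-b", {"result": 1}),  # no summary: skipped
    ("node-c", {"_source_summary": {"kind": "call_model", "model_name": "YOLOv8n"}}),
]
chain = build_source_chain(outputs)
```

Under these assumptions `chain` holds two entries, in upstream order, each tagged with the `node_uuid` of the node that produced it.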

Notes:

The chain is purely additive. Adding _source_summary does not change dedupe_hash (computed over name + description + alert_type + severity + status + twin_uuid), so dedupe behavior is unchanged.
detections_sample is capped to keep alert metadata bounded; the original detections_total is preserved.
User-supplied static metadata keys always win on conflict — the source chain only fills in metadata['source_chain'] when not already provided.
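The three notes above can be demonstrated together. In this sketch the field list for the hash and the "user key wins" rule come from this page, while the hashing scheme (SHA-256 over a JSON list), the cap of 5, and all function names are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List

# Fields that feed dedupe_hash, as listed in the note above.
DEDUPE_FIELDS = ("name", "description", "alert_type", "severity", "status", "twin_uuid")


def dedupe_hash(alert: Dict[str, Any]) -> str:
    """Hash only the dedupe fields; metadata never participates.

    Assumption: SHA-256 over a JSON list. The real algorithm is not documented here.
    """
    material = json.dumps([alert.get(field) for field in DEDUPE_FIELDS])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def attach_source_chain(metadata: Dict[str, Any], chain: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fill in source_chain only when the user has not supplied one."""
    merged = dict(metadata)
    merged.setdefault("source_chain", chain)
    return merged


def cap_detections(detections: List[Dict[str, Any]], limit: int = 5) -> Dict[str, Any]:
    """Keep a bounded sample while preserving the original count (limit is assumed)."""
    return {
        "detections_total": len(detections),
        "detections_sample": detections[:limit],
    }
```

Since `dedupe_hash` never reads `metadata`, attaching a chain leaves the hash unchanged, which is the property the first note describes.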

Edge Core system alerts

STUB: This section will be expanded with the full alert type catalogue.

Edge Core automatically raises technical alerts for operational issues. These are category: technical, source_type: edge unless noted otherwise.

`alert_type`	Severity	Source	Trigger
`driver_start_failure`	`error`	`edge`	Driver container cannot reach a stable running state
`driver_restart_loop`	`error`	`edge`	Driver restarts too frequently (circuit-breaker tripped)
`driver_health`	`warning`	`edge`	Driver container stopped unexpectedly
`model_download_failure`	`warning`	`edge`	Required ML model could not be downloaded
`worker_start_failure`	`warning`	`edge`	Worker container failed to start
`edge_core_restart`	`info`	`cloud`	Backend has accepted an edge-core restart request and is tracking the lifecycle (see below)

`edge_core_restart` lifecycle alert

POST /api/v1/edges/{uuid}/restart-core creates a single edge_core_restart alert that tracks the restart end-to-end. The alert is scoped to the environment of the first bound twin (alerts always need a twin / environment / workflow anchor), with source_type: cloud because the request originates from the backend, not from the edge. The current phase is carried on metadata.phase:

Phase	Written by	Meaning
`requested`	Backend, on accepting the API call	MQTT restart command queued
`in_progress`	Edge-core, when it picks the command off MQTT	Restart actually running
`completed`	Edge-core, after a successful relaunch	Happy path; alert is resolved
`failed`	Edge-core, if `_perform_edge_core_restart` raises	Restart attempt failed; alert resolved with `phase: failed`
`timed_out`	Backend reaper (`reap_stuck_edge_core_restart_alerts`)	Alert stuck in `requested` / `in_progress` past 5 min — typically the MQTT command never landed or edge-core crashed mid-restart

The same alert_uuid is returned in the API response and included in the MQTT command payload, so edge-core can transition the alert without a lookup.
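The `timed_out` row in the phase table can be sketched as a single check. This is not the code of `reap_stuck_edge_core_restart_alerts`: the function name `reap_if_stuck` is hypothetical, measuring the 5 minutes from `created_at` is an assumption, and so is the ISO-8601 format used for `timed_out_at`.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

RESTART_TIMEOUT = timedelta(minutes=5)           # threshold from the phase table
IN_FLIGHT_PHASES = {"requested", "in_progress"}  # phases that can get stuck


def reap_if_stuck(alert: Dict[str, Any], now: datetime) -> bool:
    """Mark an in-flight restart alert as timed_out once it exceeds the timeout.

    Returns True when the alert was updated, False when it was left alone.
    """
    metadata = alert["metadata"]
    phase = metadata.get("phase")
    if phase not in IN_FLIGHT_PHASES:
        return False  # terminal phases are never reaped
    if now - alert["created_at"] <= RESTART_TIMEOUT:
        return False  # still within the allowed window
    metadata["previous_phase"] = phase
    metadata["phase"] = "timed_out"
    metadata["timed_out_at"] = now.isoformat()
    return True


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
stuck = {"created_at": now - timedelta(minutes=6), "metadata": {"phase": "requested"}}
was_reaped = reap_if_stuck(stuck, now)
```

Recording `previous_phase` shows whether the command never reached the edge (`requested`) or edge-core stopped partway through (`in_progress`).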

Restart-driven pre-resolution

When restart-core is accepted, the backend also pre-resolves any active alerts on the requesting edge’s bound twins whose root cause a clean container relaunch genuinely fixes:

driver_start_failure
driver_restart_loop
worker_start_failure

This keeps the workbench from carrying stale failure noise that the operator’s restart just made irrelevant. Each pre-resolved alert is annotated with:

{
  "metadata": {
    "resolved_by_restart_request_id": "<request_id>",
    "resolved_by_restart_alert_uuid": "<edge_core_restart uuid>"
  }
}

and the edge_core_restart alert’s own metadata gets pre_resolved_alert_uuids: [...] so the audit trail links both ways. Other alert types are deliberately not pre-resolved (a restart doesn’t actually fix them, and silently closing them would lie to the operator).

The authoritative allow-list and excluded set live on EDGE_CORE_RESTART_RESOLVABLE_ALERT_TYPES. Change the constant and this section together.

The response schema is EdgeCoreRestartResponseSchema. Note that alert_uuid is null when no environment can be resolved for the edge (typically: no bound twin yet); the restart still happens, just untracked.
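The pre-resolution step described in this section can be sketched as a filter over the edge's alerts. The set below mirrors the three types listed above but is a local stand-in, not the real EDGE_CORE_RESTART_RESOLVABLE_ALERT_TYPES constant. The function name and the dict-based alert shape (`uuid`, `status`, `alert_type`, `metadata`) are assumptions.

```python
from typing import Any, Dict, List

# Local copy of the documented allow-list, for illustration only.
RESOLVABLE_BY_RESTART = frozenset(
    {"driver_start_failure", "driver_restart_loop", "worker_start_failure"}
)


def pre_resolve_for_restart(
    alerts: List[Dict[str, Any]],
    request_id: str,
    restart_alert: Dict[str, Any],
) -> List[str]:
    """Resolve active, allow-listed alerts and link them to the restart alert."""
    resolved_uuids: List[str] = []
    for alert in alerts:
        if alert["status"] != "active":
            continue  # only active alerts are pre-resolved
        if alert["alert_type"] not in RESOLVABLE_BY_RESTART:
            continue  # a restart does not fix this type, so leave it open
        alert["status"] = "resolved"
        alert.setdefault("metadata", {}).update(
            resolved_by_restart_request_id=request_id,
            resolved_by_restart_alert_uuid=restart_alert["uuid"],
        )
        resolved_uuids.append(alert["uuid"])
    # Back-link from the restart alert so the audit trail works in both directions.
    restart_alert.setdefault("metadata", {})["pre_resolved_alert_uuids"] = resolved_uuids
    return resolved_uuids
```

An alert such as `model_download_failure` passes through untouched, matching the rule that types a restart cannot fix stay visible to the operator.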

Frontend display contract

Read this before adding a new alert_type. Every alert that the backend or an edge driver can raise must either render correctly with the generic alert card, or have a documented specialised renderer linked from this section. Skipping this step is how we end up with overlapping alert types that nobody can untangle six months later.

The workbench renders every alert through a single AlertCard (cyberwave-frontend/components/alerts/alert.tsx). The generic path uses only these fields and needs nothing else from the producer:

Field	Display
`severity`	Left border colour + leading icon (`info` / `warning` / `error` / `critical` — see `alert-display.ts`)
`status`	Status pill (`Active` / `Acknowledged` / `Resolved` / `Silenced`). `resolved` and `silenced` also dim the card to 60% opacity.
`source_type`	Small outline badge
`alert_type`	Small mono-font outline badge — this is what tells the operator which producer raised the alert, so make it specific and stable.
`name`	Card title
`description`	Body text. URLs are auto-linked.
`media`	Inline image (`.png`/`.gif`) or autoplaying muted video (`.mp4`).
`created_at`	Relative timestamp with absolute on hover.
`metadata.buttons`	Generic action buttons that publish back on `cyberwave/twin/{twin_uuid}/command` as a `button` command — see Metadata buttons.

The new edge_core_restart alert deliberately uses only the generic path. The lifecycle is encoded in metadata.phase, so any UI that wants to surface “Restart in progress vs. completed vs. failed” can read that field without a dedicated component, and the audit trail (request_id, pre_resolved_alert_uuids, previous_phase, timed_out_at) is plain JSON.

Specialised renderers exist for a handful of historical alert types that need bespoke interactions; each one is a known cost, not a pattern to copy:

`alert_type`	Specialised behaviour
`robot_setup`	Spinner replaces the severity icon while `status` is `active` / `acknowledged`
`robot_setup_done`	Green success panel with a check mark and amber/green accent
`calibration_needed`	Inline `Next` / `Complete` / `Restart calibration` actions that publish to `cyberwave/twin/{twin_uuid}/command` instead of using metadata buttons
`camera_default_device`	`Default is fine` action that writes the chosen device into twin metadata and resolves the alert
`driver_starting`	Phase-aware spinner with rewritten copy (`Downloading driver image…` / `Installing driver image…` / `Starting driver container…`) driven by `metadata.phase`, plus a byte/percent suffix when the pull is mid-flight — see `driver_starting` progress metadata

Adding a new `alert_type`

Before you introduce a new alert_type:

Check this page. If an existing type fits — even loosely — extend its metadata instead of forking a new code.
Default to the generic path. Pick a sensible severity, write a clear name + description, and put any structured state on metadata. The generic card will render it correctly.
Only add a specialised renderer when the interaction itself is novel (e.g. a calibration step that needs custom MQTT commands). Generic metadata.buttons cover most operator-confirms-something flows without new code.
Document the type on this page before merging: both the catalogue row above and, if specialised, the table in this section. A new alert_type constant that does not have a row here should not pass review.

`driver_starting` progress metadata

Edge-core writes byte-aggregated docker pull progress directly onto the active driver_starting alert’s metadata, so the workbench renders a live "Downloading driver image (cyberwaveos/ugv-driver:dev) — 745 MB of 1.55 GB (47%)" line without any extra round-trip. The fields are:

Field	Type	Meaning
`phase`	string	Lifecycle marker. Pull walks `pull_started → downloading → installing → pull_complete → pull_stream_finished`, then `container_starting → driver_running` once the container is up. Failure phases are `pull_spawn_failed` / `pull_timed_out` / `pull_exit_error`.
`image`	string	The driver image being pulled.
`progress_percent`	int (0–100)	Integer percent of `downloaded_bytes / total_bytes`. `0` until the first `docker pull` byte-bar lands; `100` at `pull_complete`.
`downloaded_bytes` / `total_bytes`	int	Raw byte totals aggregated across all layers.
`downloaded_human` / `total_human`	string | null	Same values rendered in SI units (`"745 MB"`, `"1.55 GB"`). `null` before docker emits any byte-bar (e.g. when every layer is `Already exists`).
`layers_total` / `layers_complete`	int	Number of layers seen so far and number with `Pull complete`. Only used to caption the brief `installing` phase between the last byte landing and `Status: Downloaded …`.
`last_docker_pull_line`	string	Most recent raw `docker pull` line (truncated to 500 chars), kept for debugging.

The frontend renderer is in cyberwave-frontend/components/alerts/alert.tsx — search for getDriverStartingDisplayText.
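To make the relationship between the progress fields concrete, here is a small sketch written in Python for illustration; it is not a port of getDriverStartingDisplayText. Both function names are hypothetical, and flooring the percentage is an assumption, since the table only says "integer percent".

```python
from typing import Any, Dict


def progress_percent(downloaded_bytes: int, total_bytes: int) -> int:
    """Integer percent of downloaded_bytes / total_bytes, clamped to 0..100.

    Returns 0 while total_bytes is still unknown (no byte-bar seen yet).
    Assumption: the value is floored rather than rounded.
    """
    if total_bytes <= 0:
        return 0
    return max(0, min(100, downloaded_bytes * 100 // total_bytes))


def progress_suffix(metadata: Dict[str, Any]) -> str:
    """Build the byte/percent suffix, or an empty string when sizes are null."""
    downloaded = metadata.get("downloaded_human")
    total = metadata.get("total_human")
    if not downloaded or not total:
        return ""  # e.g. every layer reported "Already exists"
    return f"{downloaded} of {total} ({metadata.get('progress_percent', 0)}%)"


sample = {
    "phase": "downloading",
    "image": "cyberwaveos/ugv-driver:dev",
    "progress_percent": 47,
    "downloaded_human": "745 MB",
    "total_human": "1.55 GB",
}
suffix = progress_suffix(sample)
```

The suffix is built from the pre-rendered `*_human` strings rather than recomputed from raw bytes, so a renderer along these lines would not need its own unit formatting.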

MQTT

Edge devices create and resolve alerts via MQTT. Topic pattern:

cyberwave/twin/{twin_uuid}/alert
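A minimal sketch of building and parsing this topic follows. Only the topic pattern comes from this page; the helper names are hypothetical, and the message payload format is not covered here.

```python
from typing import Optional


def alert_topic(twin_uuid: str) -> str:
    """Return the alert topic for a twin, following the pattern above."""
    if not twin_uuid or "/" in twin_uuid:
        raise ValueError("twin_uuid must be a non-empty string without '/'")
    return f"cyberwave/twin/{twin_uuid}/alert"


def twin_uuid_from_alert_topic(topic: str) -> Optional[str]:
    """Extract the twin UUID from an alert topic, or None if it does not match."""
    parts = topic.split("/")
    if (
        len(parts) == 4
        and parts[0] == "cyberwave"
        and parts[1] == "twin"
        and parts[2]
        and parts[3] == "alert"
    ):
        return parts[2]
    return None
```

A subscriber that listens on a wildcard topic could use the second helper to recover which twin an incoming alert message belongs to.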
