STUB DOCUMENT: This page is intentionally minimal and will be expanded with deeper technical details in a future update.
Alerts notify operators that action is needed. They are displayed prominently in the UI (environment view, twin detail, etc.).
Alerts have a category:
- technical (default): operational or hardware events (e.g. robot stuck, calibration needed).
- business: business-process events (e.g. order delayed, SLA threshold exceeded).
Examples:
- A robot needs calibration.
- A robot got stuck and needs remote takeover.
- A sensor reading is out of expected range.
- An order has exceeded its SLA deadline.
Model
Every alert belongs to a workspace and must be attached to at least one of: twin, environment, or workflow.
| Field | Type | Notes |
|---|---|---|
| name | string | Human-readable title |
| description | text | Optional details |
| alert_type | string | Machine-readable code (e.g. calibration_needed, robot_stuck) |
| severity | enum | info, warning, error, critical (default: warning) |
| status | enum | active, acknowledged, resolved, silenced (default: active) |
| source_type | enum | edge, simulation, cloud, workflow (default: edge) |
| category | enum | technical, business (default: technical) |
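As an illustration only, the fields, defaults, and anchor rule above could be modelled roughly as follows. This is a sketch, not the backend's actual model: the class names and the workspace_uuid / environment_uuid field names are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(str, Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


class Status(str, Enum):
    ACTIVE = "active"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"
    SILENCED = "silenced"


class SourceType(str, Enum):
    EDGE = "edge"
    SIMULATION = "simulation"
    CLOUD = "cloud"
    WORKFLOW = "workflow"


class Category(str, Enum):
    TECHNICAL = "technical"
    BUSINESS = "business"


@dataclass
class Alert:
    name: str
    alert_type: str
    workspace_uuid: str  # hypothetical field name for the owning workspace
    description: str = ""
    severity: Severity = Severity.WARNING
    status: Status = Status.ACTIVE
    source_type: SourceType = SourceType.EDGE
    category: Category = Category.TECHNICAL
    twin_uuid: Optional[str] = None
    environment_uuid: Optional[str] = None  # hypothetical field name
    workflow_uuid: Optional[str] = None

    def __post_init__(self) -> None:
        # Every alert needs at least one anchor: twin, environment, or workflow.
        if not (self.twin_uuid or self.environment_uuid or self.workflow_uuid):
            raise ValueError(
                "alert must be attached to a twin, environment, or workflow"
            )
```

Constructing an alert with only a name, an alert_type, and one anchor picks up the documented defaults (warning, active, edge, technical); omitting every anchor raises ValueError.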
Lifecycle
- Active: new alert, requires attention.
- Acknowledged: an operator has seen it but the issue is not yet fixed.
- Resolved: the root cause has been addressed (by edge device or operator).
- Silenced: suppressed workspace-wide without resolving the root cause.
Idempotent resolve and silence
The POST /api/v1/alerts/{uuid}/resolve and POST /api/v1/alerts/{uuid}/silence endpoints are idempotent:
- Resolving an already-resolved alert returns 200 with the alert body (no-op).
- Resolving a silenced alert is allowed: it transitions the alert to resolved.
- Silencing an already-silenced alert returns 200 with the alert body (no-op).
- Silencing a resolved alert returns 400.
This means edge drivers and operator UIs can safely call resolve without checking current status first, avoiding race conditions when multiple actors act on the same alert concurrently.
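The rules above can be summarised as a small transition table. The sketch below is a model of the documented behaviour, not the backend code; the function names are hypothetical, and the outcome of silencing an active or acknowledged alert (200, silenced) is an assumption the list above implies but does not state.

```python
from typing import Tuple

KNOWN_STATUSES = {"active", "acknowledged", "resolved", "silenced"}


def resolve(current_status: str) -> Tuple[int, str]:
    """Return (http_status, new_alert_status) for a resolve call."""
    if current_status not in KNOWN_STATUSES:
        raise ValueError(f"unknown status: {current_status}")
    # Resolve always succeeds: already-resolved is a no-op, and a silenced
    # alert transitions to resolved.
    return 200, "resolved"


def silence(current_status: str) -> Tuple[int, str]:
    """Return (http_status, new_alert_status) for a silence call."""
    if current_status not in KNOWN_STATUSES:
        raise ValueError(f"unknown status: {current_status}")
    if current_status == "resolved":
        # A resolved alert cannot be silenced; its status is unchanged.
        return 400, "resolved"
    # Already-silenced is a no-op; active/acknowledged -> silenced (assumed).
    return 200, "silenced"
```

Because resolve never fails on a known status, two actors racing to resolve the same alert both receive 200 regardless of who arrives first.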
Workflow-scoped alerts
Alerts produced by a workflow’s send_alert node carry a workflow_uuid, and Alert.workflow is set on the backend. List alerts for a single workflow with:
GET /api/v1/alerts?workflow_uuid={workflow_uuid}
Generated edge workers call client.publish_alert(..., workflow_uuid=WORKFLOW_UUID, workflow_node_uuid=..., workflow_execution_uuid=...). The SDK forwards workflow_uuid as a top-level field (sets the FK) and merges workflow_node_uuid / workflow_execution_uuid into metadata for full provenance.
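A minimal sketch of the forwarding behaviour described above, assuming a plain-dict payload. The function name build_alert_payload is hypothetical and this is not the SDK's implementation; keeping a caller-supplied metadata key on conflict is also an assumption.

```python
from typing import Any, Dict, Optional


def build_alert_payload(
    name: str,
    alert_type: str,
    workflow_uuid: Optional[str] = None,
    workflow_node_uuid: Optional[str] = None,
    workflow_execution_uuid: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring how provenance fields are placed."""
    merged = dict(metadata or {})
    # Node and execution identifiers travel inside metadata.
    if workflow_node_uuid is not None:
        merged.setdefault("workflow_node_uuid", workflow_node_uuid)
    if workflow_execution_uuid is not None:
        merged.setdefault("workflow_execution_uuid", workflow_execution_uuid)

    payload: Dict[str, Any] = {
        "name": name,
        "alert_type": alert_type,
        "metadata": merged,
    }
    # workflow_uuid is a top-level field so the backend can set the FK.
    if workflow_uuid is not None:
        payload["workflow_uuid"] = workflow_uuid
    return payload
```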
STUB: Behavior is implemented; this section will be expanded with screenshots and the full kind catalogue.
Every alert raised by a workflow’s send_alert node carries an ordered source_chain under metadata describing each upstream node that fed into the decision. The chain is built generically — any node emitter that opts in by parking a _source_summary on its output dict contributes an entry, so the mechanism works equally well for camera-perception, audio-track, alert-triggered, manual, scheduled, or any future trigger source.
Each entry includes:
- kind: short identifier (camera_frame, audio_track, alert_trigger, manual_trigger, schedule_trigger, call_model, detection_event_gate, conditional).
- node_uuid: the workflow node that contributed the entry.
- Kind-specific fields. Examples: twin_uuid + sensor for camera/audio triggers; model_uuid + model_name + a capped detections_sample for call_model; mode + matched_classes + cooldown_seconds for detection_event_gate.
Example payload for a camera_frame → call_model → send_alert workflow:
{
  "metadata": {
    "workflow_uuid": "...",
    "workflow_node_uuid": "...",
    "workflow_execution_uuid": "...",
    "source_chain": [
      {
        "kind": "camera_frame",
        "node_uuid": "...",
        "twin_uuid": "...",
        "sensor": "front",
        "frame_ts": 1746651234.5
      },
      {
        "kind": "call_model",
        "node_uuid": "...",
        "model_uuid": "...",
        "model_name": "YOLOv8n",
        "modality": "image",
        "output_format": "detections",
        "detections_total": 2,
        "detections_sample": [
          { "label": "person", "confidence": 0.91, "bbox_pixels": [12, 34, 56, 78] }
        ]
      }
    ]
  }
}
Notes:
- The chain is purely additive. Adding _source_summary does not change dedupe_hash (computed over name + description + alert_type + severity + status + twin_uuid), so dedupe behavior is unchanged.
- detections_sample is capped to keep alert metadata bounded; the original detections_total is preserved.
- User-supplied static metadata keys always win on conflict: the source chain only fills in metadata['source_chain'] when not already provided.
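The assembly rules in the notes above might look roughly like this. It is a sketch under stated assumptions, not the workflow engine's code: the function names are hypothetical, and the cap of 3 is a placeholder because the real limit is not stated here.

```python
from typing import Any, Dict, List

DETECTIONS_SAMPLE_CAP = 3  # assumed value; the actual cap is not documented here


def build_source_chain(node_outputs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect one entry per upstream node that parked a _source_summary."""
    chain: List[Dict[str, Any]] = []
    for output in node_outputs:
        summary = output.get("_source_summary")
        if not summary:
            continue  # this node did not opt in
        entry = dict(summary)
        sample = entry.get("detections_sample")
        if isinstance(sample, list):
            # Preserve the original count, then bound the sample size.
            entry.setdefault("detections_total", len(sample))
            entry["detections_sample"] = sample[:DETECTIONS_SAMPLE_CAP]
        chain.append(entry)
    return chain


def attach_source_chain(
    metadata: Dict[str, Any], node_outputs: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Fill metadata['source_chain'] only when the user did not supply one."""
    merged = dict(metadata)
    if "source_chain" not in merged:
        merged["source_chain"] = build_source_chain(node_outputs)
    return merged
```

Nodes without a _source_summary are skipped silently, which is what lets new trigger sources join the chain without changes to the alert node.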
Edge Core system alerts
STUB: This section will be expanded with the full alert type catalogue.
Edge Core automatically raises technical alerts for operational issues. These are category: technical, source_type: edge unless noted otherwise.
| alert_type | Severity | Source | Trigger |
|---|---|---|---|
| driver_start_failure | error | edge | Driver container cannot reach a stable running state |
| driver_restart_loop | error | edge | Driver restarts too frequently (circuit-breaker tripped) |
| driver_health | warning | edge | Driver container stopped unexpectedly |
| model_download_failure | warning | edge | Required ML model could not be downloaded |
| worker_start_failure | warning | edge | Worker container failed to start |
| edge_core_restart | info | cloud | Backend has accepted an edge-core restart request and is tracking the lifecycle (see below) |
edge_core_restart lifecycle alert
POST /api/v1/edges/{uuid}/restart-core creates a single edge_core_restart alert that tracks the restart end-to-end. The alert is scoped to the environment of the first bound twin (alerts always need a twin / environment / workflow anchor), with source_type: cloud because the request originates from the backend, not from the edge.
The current phase is carried on metadata.phase:
| Phase | Written by | Meaning |
|---|---|---|
| requested | Backend, on accepting the API call | MQTT restart command queued |
| in_progress | Edge-core, when it picks the command off MQTT | Restart actually running |
| completed | Edge-core, after a successful relaunch | Happy path; alert is resolved |
| failed | Edge-core, if _perform_edge_core_restart raises | Restart attempt failed; alert resolved with phase: failed |
| timed_out | Backend reaper (reap_stuck_edge_core_restart_alerts) | Alert stuck in requested / in_progress past 5 min; typically the MQTT command never landed or edge-core crashed mid-restart |
The same alert_uuid is returned in the API response and included in the MQTT command payload, so edge-core can transition the alert without a lookup.
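To make the timed_out rule concrete, here is a minimal sketch of the check a reaper could apply. It is not the body of reap_stuck_edge_core_restart_alerts: the function name is hypothetical, measuring the 5 minutes from the last phase update is an assumption, and so is the ISO 8601 format used for timed_out_at.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

RESTART_TIMEOUT = timedelta(minutes=5)
OPEN_PHASES = ("requested", "in_progress")


def reap_if_stuck(
    metadata: Dict[str, Any], last_update: datetime, now: datetime
) -> Dict[str, Any]:
    """Return metadata marked timed_out if the restart is stuck, else unchanged."""
    phase = metadata.get("phase")
    if phase not in OPEN_PHASES or now - last_update <= RESTART_TIMEOUT:
        return metadata
    reaped = dict(metadata)
    reaped["previous_phase"] = phase  # keep the phase the alert was stuck in
    reaped["phase"] = "timed_out"
    reaped["timed_out_at"] = now.isoformat()  # timestamp format is an assumption
    return reaped


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(reap_if_stuck({"phase": "requested"}, t0, t0 + timedelta(minutes=6)))
```

Alerts already in completed or failed are left alone no matter how old they are, since only the two open phases can be stuck.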
Restart-driven pre-resolution
When restart-core is accepted, the backend also pre-resolves any active alerts on the requesting edge’s bound twins whose root cause a clean container relaunch genuinely fixes:
- driver_start_failure
- driver_restart_loop
- worker_start_failure
This keeps the workbench from carrying stale failure noise that the operator’s restart just made irrelevant. Each pre-resolved alert is annotated with:
{
  "metadata": {
    "resolved_by_restart_request_id": "<request_id>",
    "resolved_by_restart_alert_uuid": "<edge_core_restart uuid>"
  }
}
and the edge_core_restart alert’s own metadata gets pre_resolved_alert_uuids: [...] so the audit trail links both ways.
Other alert types are deliberately not pre-resolved (a restart doesn’t actually fix them, and silently closing them would lie to the operator). The authoritative allow-list and excluded set live on EDGE_CORE_RESTART_RESOLVABLE_ALERT_TYPES — change the constant and this section together.
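A rough sketch of the pre-resolution pass, assuming alerts are plain dicts with uuid, alert_type, status, and metadata keys. The function name and the dict shape are hypothetical; the set below copies the three types listed in this section and stands in for EDGE_CORE_RESTART_RESOLVABLE_ALERT_TYPES, whose actual definition is not shown here.

```python
from typing import Any, Dict, List

# Stand-in for EDGE_CORE_RESTART_RESOLVABLE_ALERT_TYPES (contents per this page).
RESOLVABLE_ALERT_TYPES = frozenset(
    {"driver_start_failure", "driver_restart_loop", "worker_start_failure"}
)


def pre_resolve_alerts(
    alerts: List[Dict[str, Any]], request_id: str, restart_alert_uuid: str
) -> List[str]:
    """Resolve eligible active alerts in place and return their UUIDs."""
    resolved_uuids: List[str] = []
    for alert in alerts:
        if alert["status"] != "active":
            continue  # only active alerts are pre-resolved
        if alert["alert_type"] not in RESOLVABLE_ALERT_TYPES:
            continue  # a restart does not fix this type; leave it open
        alert["status"] = "resolved"
        alert.setdefault("metadata", {}).update(
            {
                "resolved_by_restart_request_id": request_id,
                "resolved_by_restart_alert_uuid": restart_alert_uuid,
            }
        )
        resolved_uuids.append(alert["uuid"])
    return resolved_uuids
```

The returned list is what would be stored as pre_resolved_alert_uuids on the edge_core_restart alert, giving the two-way audit link.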
The response schema is EdgeCoreRestartResponseSchema. Note that alert_uuid is null when no environment can be resolved for the edge (typically: no bound twin yet); the restart still happens, just untracked.
Frontend display contract
Read this before adding a new alert_type. Every alert that the backend or an edge driver can raise must either render correctly with the generic alert card, or have a documented specialised renderer linked from this section. Skipping this step is how we end up with overlapping alert types that nobody can untangle six months later.
The workbench renders every alert through a single AlertCard (cyberwave-frontend/components/alerts/alert.tsx). The generic path uses only these fields and needs nothing else from the producer:
| Field | Display |
|---|---|
| severity | Left border colour + leading icon (info / warning / error / critical; see alert-display.ts) |
| status | Status pill (Active / Acknowledged / Resolved / Silenced). resolved and silenced also dim the card to 60% opacity. |
| source_type | Small outline badge |
| alert_type | Small mono-font outline badge. This is what tells the operator which producer raised the alert, so make it specific and stable. |
| name | Card title |
| description | Body text. URLs are auto-linked. |
| media | Inline image (.png/.gif) or autoplaying muted video (.mp4). |
| created_at | Relative timestamp with absolute on hover. |
| metadata.buttons | Generic action buttons that publish back on cyberwave/twin/{twin_uuid}/command as a button command (see Metadata buttons). |
The new edge_core_restart alert deliberately uses only the generic path. The lifecycle is encoded in metadata.phase, so any UI that wants to surface “Restart in progress vs. completed vs. failed” can read that field without a dedicated component, and the audit trail (request_id, pre_resolved_alert_uuids, previous_phase, timed_out_at) is plain JSON.
Specialised renderers exist for a handful of historical alert types that need bespoke interactions; each one is a known cost, not a pattern to copy:
| alert_type | Specialised behaviour |
|---|---|
| robot_setup | Spinner replaces the severity icon while status is active / acknowledged |
| robot_setup_done | Green success panel with a check mark and amber/green accent |
| calibration_needed | Inline Next / Complete / Restart calibration actions that publish to cyberwave/twin/{twin_uuid}/command instead of using metadata buttons |
| camera_default_device | Default is fine action that writes the chosen device into twin metadata and resolves the alert |
| driver_starting | Phase-aware spinner with rewritten copy (Downloading driver image… / Installing driver image… / Starting driver container…) driven by metadata.phase, plus a byte/percent suffix when the pull is mid-flight (see driver_starting progress metadata) |
Adding a new alert_type
Before you introduce a new alert_type:
- Check this page. If an existing type fits, even loosely, extend its metadata instead of forking a new code.
- Default to the generic path. Pick a sensible severity, write a clear name + description, and put any structured state on metadata. The generic card will render it correctly.
- Only add a specialised renderer when the interaction itself is novel (e.g. a calibration step that needs custom MQTT commands). Generic metadata.buttons cover most operator-confirms-something flows without new code.
- Document the type in this page before merging: both the catalogue row above and, if specialised, the table in this section. A new alert_type constant that does not have a row here should not pass review.
Edge-core writes byte-aggregated docker pull progress directly onto the active driver_starting alert’s metadata, so the workbench renders a live "Downloading driver image (cyberwaveos/ugv-driver:dev) — 745 MB of 1.55 GB (47%)" line without any extra round-trip. The fields are:
| Field | Type | Meaning |
|---|---|---|
| phase | string | Lifecycle marker. Pull walks pull_started → downloading → installing → pull_complete → pull_stream_finished, then container_starting → driver_running once the container is up. Failure phases are pull_spawn_failed / pull_timed_out / pull_exit_error. |
| image | string | The driver image being pulled. |
| progress_percent | int (0–100) | Integer percent of downloaded_bytes / total_bytes. 0 until the first docker pull byte-bar lands; 100 at pull_complete. |
| downloaded_bytes / total_bytes | int | Raw byte totals aggregated across all layers. |
| downloaded_human / total_human | string or null | Same values rendered in SI units ("745 MB", "1.55 GB"). null before docker emits any byte-bar (e.g. when every layer is Already exists). |
| layers_total / layers_complete | int | Number of layers seen so far and number with Pull complete. Only used to caption the brief installing phase between the last byte landing and Status: Downloaded …. |
| last_docker_pull_line | string | Most recent raw docker pull line (truncated to 500 chars), kept for debugging. |
The frontend renderer is in cyberwave-frontend/components/alerts/alert.tsx — search for getDriverStartingDisplayText.
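For orientation, the byte fields in the table above relate to the human-readable suffix roughly as sketched below. This is an illustration in Python, not a port of getDriverStartingDisplayText: the function names are hypothetical, and both the rounding of the SI strings and the floor used for the percent are assumptions.

```python
from typing import Optional


def human_bytes(n: int) -> str:
    """Render a byte count in SI units (powers of 1000), e.g. '745 MB'."""
    units = ["B", "kB", "MB", "GB", "TB"]
    value = float(n)
    unit = units[0]
    for unit in units:
        if value < 1000 or unit == units[-1]:
            break
        value /= 1000
    # Keep roughly three significant digits without scientific notation.
    if value >= 100:
        return f"{value:.0f} {unit}"
    if value >= 10:
        return f"{value:.1f} {unit}"
    return f"{value:.2f} {unit}"


def progress_suffix(downloaded_bytes: int, total_bytes: int) -> Optional[str]:
    """Build the '<done> of <total> (<pct>%)' suffix, or None without totals."""
    if total_bytes <= 0:
        return None  # no byte-bar yet, e.g. every layer already exists
    percent = min(100, downloaded_bytes * 100 // total_bytes)  # floor (assumed)
    return (
        f"{human_bytes(downloaded_bytes)} of "
        f"{human_bytes(total_bytes)} ({percent}%)"
    )
```

Under these assumptions, progress_suffix(775_000_000, 1_550_000_000) yields "775 MB of 1.55 GB (50%)".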
MQTT
Edge devices create and resolve alerts via MQTT. Topic pattern:
cyberwave/twin/{twin_uuid}/alert
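A small sketch of building the topic and a JSON body for it. Only the topic pattern comes from this page; the helper names are hypothetical, and the payload keys simply reuse field names from the Model table as an assumption, since the wire format is not specified here.

```python
import json
from typing import Any, Dict


def alert_topic(twin_uuid: str) -> str:
    """Build the per-twin alert topic from the documented pattern."""
    # '/', '+' and '#' are the MQTT level separator and wildcards, so they
    # must not appear inside a single topic level.
    if not twin_uuid or any(ch in twin_uuid for ch in "/+#"):
        raise ValueError(f"invalid twin_uuid for an MQTT topic: {twin_uuid!r}")
    return f"cyberwave/twin/{twin_uuid}/alert"


def encode_alert(payload: Dict[str, Any]) -> bytes:
    """Serialise an alert payload as compact UTF-8 JSON (assumed encoding)."""
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


if __name__ == "__main__":
    topic = alert_topic("twin-123")
    body = encode_alert(
        {"name": "Robot stuck", "alert_type": "robot_stuck", "severity": "error"}
    )
    print(topic, body)
```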