A structured action is a preset that tells the Playground how to shape the prompt and how to parse the model response, so the result comes back as machine-readable JSON (points, bounding boxes, segmentation masks) instead of a free-form string. You pick one when calling the Playground `/run` endpoint, either from the UI, the HTTP API, or the Python SDK.
The catalog below is the single source of truth for structured actions. It is authored in `cyberwave-backend/src/lib/structured_actions.py`, mirrored in the Python SDK at `cyberwave.mlmodels.actions`, and served live at `GET /api/v1/mlmodels/structured-actions`, so the frontend, the SDKs, and this doc page all agree.

The catalog
| structured_task | output_format | Renders as | Use it for |
|---|---|---|---|
| free | text | Raw string | Any prompt you want passed through unchanged. No output parsing. |
| caption | text | Raw string | One-sentence description of the input image. |
| detect_points | points | Dots overlaid on the image | "Where is X?" — object localisation, keypoint tasks. |
| detect_boxes | boxes | Bounding boxes overlaid on the image | Classical object detection. |
| segment | masks | Tinted silhouettes overlaid on the image | Instance segmentation. |
You select an action by passing `structured_task=...`. For example, detect_points rewrites `prompt="cups"` into a grounded localisation prompt for the target provider. Models with native grounding syntax (such as `<point>` tags or PaliGemma's `<loc####>` tokens) use that syntax, and the backend passes your raw prompt through unchanged when it doesn't know how to rewrite it for that provider.
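As a rough sketch of what a run request carries, the helper below builds a JSON body with the documented `prompt` and `structured_task` fields. The `model` field name and the overall envelope are assumptions here, not the documented wire format:

```python
import json

def build_run_payload(model: str, prompt: str, structured_task: str = "free") -> dict:
    """Assemble a JSON body for the Playground /run endpoint.
    Field names other than prompt/structured_task are illustrative."""
    return {"model": model, "prompt": prompt, "structured_task": structured_task}

body = build_run_payload("molmo", "cups", structured_task="detect_points")
print(json.dumps(body))
```

The same three inputs are what you supply in the UI; the backend does the prompt rewriting described above.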
Output schemas
points — output_format: "points"

- `point` — `[y, x]`, normalized to `0..1000`.
- `label` — optional human-readable tag.

The frontend renders these with the `<PointOverlay />` component.
Render on an image (Python): `cw.save_annotated_image(img, result, "out.png")`.
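Because `point` is `[y, x]` normalized to `0..1000`, mapping it onto a real image is a two-line scale. A minimal sketch (note the y-before-x order):

```python
def point_to_pixels(point, width, height):
    """Convert a [y, x] point normalized to 0..1000 into (x_px, y_px)."""
    y, x = point
    return (x / 1000 * width, y / 1000 * height)

# On a 640x480 image, [500, 250] lands a quarter of the way across
# and halfway down.
print(point_to_pixels([500, 250], 640, 480))  # (160.0, 240.0)
```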
boxes — output_format: "boxes"

- `box_2d` — `[ymin, xmin, ymax, xmax]`, normalized to `0..1000`.
masks — output_format: "masks"

- `box_2d` — region the mask applies to.
- `mask` — base64-encoded PNG (may be prefixed with `data:image/png;base64,`). Luminance of the PNG defines the silhouette within `box_2d`.
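Decoding the `mask` field is mostly about handling the optional data-URI prefix before base64-decoding. A small sketch (the demo uses stand-in bytes; a real response carries a full PNG):

```python
import base64

def decode_mask(mask_field: str) -> bytes:
    """Decode the mask field into raw PNG bytes, stripping the optional
    data:image/png;base64, prefix the backend may include."""
    prefix = "data:image/png;base64,"
    if mask_field.startswith(prefix):
        mask_field = mask_field[len(prefix):]
    return base64.b64decode(mask_field)

# Round-trip demo; the real payload's luminance defines the silhouette
# within box_2d.
raw = b"\x89PNG\r\n\x1a\n"  # PNG file signature
encoded = "data:image/png;base64," + base64.b64encode(raw).decode()
print(decode_mask(encoded) == raw)  # True
```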
Which models support which action?
Capability is derived from the model's metadata, so the list stays in sync as you seed new entries. In code (`src/lib/structured_actions.py::_supports_any_spatial`):

- The model must take an image as input (`can_take_image_as_input: true`).
- Output format must be `json` (or unset).
- Any one of:
  - metadata declares `point_format` or `bounding_box_format`,
  - tags include `spatial-reasoning`, `spatial-reasoner`, `grounding`, `visual-grounding`, `pointing`, or `object-detection`,
  - `model_external_id` starts with `gemini-robotics-er`, `molmo`, or `paligemma`.
`caption` is supported by any image-capable model; `free` is supported by every text-capable model (the default).
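The rules above can be approximated as a single predicate. This is a sketch from the documented checks, not the real `_supports_any_spatial` implementation, and the dict field names mirror the list above:

```python
SPATIAL_TAGS = {"spatial-reasoning", "spatial-reasoner", "grounding",
                "visual-grounding", "pointing", "object-detection"}
SPATIAL_ID_PREFIXES = ("gemini-robotics-er", "molmo", "paligemma")

def supports_spatial_actions(model: dict) -> bool:
    """Approximate the documented capability rules for spatial actions."""
    if not model.get("can_take_image_as_input"):
        return False
    if model.get("output_format") not in (None, "json"):
        return False  # output format must be json or unset
    meta = model.get("metadata", {})
    return (
        "point_format" in meta
        or "bounding_box_format" in meta
        or bool(SPATIAL_TAGS & set(model.get("tags", [])))
        or model.get("model_external_id", "").startswith(SPATIAL_ID_PREFIXES)
    )
```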
Parsing hints
The backend is lenient with provider output:

- Strips the triple-backtick code fences (```` ```json ... ``` ````) that some models wrap JSON in.
- Falls back to the first `[...]` or `{...}` block when the payload has surrounding prose.
- Returns `raw` on every response so you can debug when parsing fails.
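The two documented fallbacks can be sketched in a few lines; this mimics the behaviour described above rather than reproducing the backend's actual parser:

```python
import json
import re

def lenient_parse(text: str):
    """Strip a Markdown code fence, then fall back to the first
    JSON-looking block when the payload has surrounding prose."""
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        block = re.search(r"\[.*\]|\{.*\}", text, re.DOTALL)
        return json.loads(block.group(0)) if block else None

chatty = "Sure! " + "`" * 3 + 'json\n[{"point": [500, 250]}]\n' + "`" * 3
print(lenient_parse(chatty))  # [{'point': [500, 250]}]
```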
`MLModelRunResult.output` is already parsed JSON. When you call the HTTP API, parse the body as standard JSON and branch on `output_format`:
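A minimal branching sketch, using the schemas above. The envelope shape (a dict with `output_format` and `output` keys) is an assumption for illustration:

```python
def handle_output(response: dict) -> str:
    """Branch on output_format per the documented schemas."""
    fmt = response.get("output_format", "text")
    out = response.get("output")
    if fmt == "points":
        return f"{len(out)} point(s): " + ", ".join(str(p["point"]) for p in out)
    if fmt == "boxes":
        return f"{len(out)} box(es): " + ", ".join(str(b["box_2d"]) for b in out)
    if fmt == "masks":
        return f"{len(out)} mask(s)"
    return str(out)  # "text": raw string, no parsing

print(handle_output({"output_format": "points",
                     "output": [{"point": [500, 250], "label": "cup"}]}))
# 1 point(s): [500, 250]
```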
Annotating images with the result
Once you have the output, you usually want to bake it onto the image so you can email it, archive it, or attach it to an audit log. The SDK ships a one-liner for that: `save_annotated_image`.
- Renders each point / box / mask onto the original image.
- Writes a PNG tEXt chunk keyed `cyberwave.run` with the raw JSON output, the model UUID/slug, and the structured task.
- The result is self-describing — any consumer can recover the parsed output from the image alone.

Pass `embed_metadata=False` to opt out of the embedded JSON (e.g. when sending the image to a third party).
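Any PNG reader can recover the embedded chunk (Pillow exposes tEXt entries via `Image.open(...).text`, for instance). The stdlib-only scanner below shows the mechanics; the demo builds a minimal PNG fragment rather than a real SDK output:

```python
import struct
import zlib

def read_text_chunks(png: bytes) -> dict:
    """Walk a PNG's chunk list and return tEXt entries as {keyword: value}."""
    assert png[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    out, pos = {}, 8
    while pos + 8 <= len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, val = data.partition(b"\x00")
            out[key.decode("latin-1")] = val.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

# Demo: a signature plus one tEXt chunk keyed cyberwave.run.
payload = b"cyberwave.run\x00" + b'{"output_format": "points"}'
chunk = (struct.pack(">I", len(payload)) + b"tEXt" + payload
         + struct.pack(">I", zlib.crc32(b"tEXt" + payload)))
png = b"\x89PNG\r\n\x1a\n" + chunk
print(read_text_chunks(png)["cyberwave.run"])  # {"output_format": "points"}
```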
Extending the catalog
- Add a new `StructuredAction` to `cyberwave-backend/src/lib/structured_actions.py::STRUCTURED_ACTIONS`.
- If the output needs a new parse branch, update `MLModelPlaygroundService._parse_output`.
- Mirror the action in `cyberwave-sdks/cyberwave-python/cyberwave/mlmodels/actions.py`. The SDK test `tests/test_mlmodels_actions.py::TestBackendAlignment` will fail until the two files agree.
- Add a frontend overlay if the new `output_format` is visual.
- Update this page.
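For orientation, the catalog entries from the table above could be modeled roughly like this. This is purely illustrative: the real `StructuredAction` class almost certainly carries more fields (prompt templates, rendering hints) that are not reproduced here:

```python
from dataclasses import dataclass

# Hypothetical shape; see cyberwave-backend/src/lib/structured_actions.py
# for the real definition.
@dataclass(frozen=True)
class StructuredAction:
    structured_task: str  # catalog key, e.g. "detect_points"
    output_format: str    # "text" | "points" | "boxes" | "masks"

STRUCTURED_ACTIONS = {
    a.structured_task: a
    for a in (
        StructuredAction("free", "text"),
        StructuredAction("caption", "text"),
        StructuredAction("detect_points", "points"),
        StructuredAction("detect_boxes", "boxes"),
        StructuredAction("segment", "masks"),
    )
}
print(sorted(STRUCTURED_ACTIONS))
```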
Minimum path for adding a new model / provider / output format
- New model, existing output format:
  - Seed or create the model with backend-computed metadata (`playground_kind`, `allowed_structured_tasks`, `execution_surfaces`, `sdk_load_id`) and add tests.
  - No new frontend overlay or SDK type adapter should be needed.
- New provider wire format, existing structured action:
  - Add one adapter in `cyberwave-backend/src/lib/mlmodel_output_adapters.py`.
  - Keep provider-specific parsing out of API routers and services.
- New output format:
  - Add the backend schema/action first.
  - Add a frontend renderer only if the format is genuinely new.
  - Update the SDK only if the format can map cleanly onto `PredictionResult`; otherwise preserve the raw structured payload.
- Read `playground_kind`, `allowed_structured_tasks`, `execution_surfaces`, and `sdk_load_id` from the backend instead of recomputing them locally.