Robot datasets first. Cyberwave currently focuses on robot manipulation and navigation datasets (LeRobot, RLDS, Cyberwave Parquet, and similar time-series formats). Image classification, video, audio, and multimodal dataset formats are detected on import, but full support for them (playback, conversion, and training) is planned for a future release.

Overview

Cyberwave supports importing datasets from two sources:
  • HuggingFace Hub — Import directly by repository ID (lazy, no multi-GB copy)
  • Zip Upload — Upload a pre-packaged dataset archive (max 10 GB)
Format detection happens automatically on the server side and sets the source_format field on the dataset record.

Supported formats

Robot datasets (full support)

These formats are fully supported for import, playback, conversion, and ML training.
| Format | source_format value | Description |
| --- | --- | --- |
| LeRobot v3 | lerobot3 | Latest LeRobot format: parquet episodes + meta/info.json |
| LeRobot v2.1 | lerobot21 | Legacy LeRobot format with splits (normalised to v3 on export) |
| RLDS / TFDS | rlds | TensorFlow Datasets format (Open-X-Embodiment style) |
| Cyberwave Parquet | cyberwave_parquet | Native format for datasets generated directly on Cyberwave |
| HDF5 | hdf5 | Robomimic / ACT / ALOHA style datasets |
| Zarr | zarr | Diffusion Policy / UMI style datasets |
| GR00T | gr00t | NVIDIA Isaac GR00T (LeRobot v2 + embodiment metadata) |
| RoboDM | robodm | Berkeley .vla format |
| MCAP | mcap | ROS2 CDR + Foxglove Protobuf |
| ROS bag | rosbag | ROS1 .bag / ROS2 SQLite3 |

Other dataset types (detection only — full support coming later)

Cyberwave detects these formats and stores the source_format value, but playback, conversion, and training pipelines for them are not yet available.
| Source format | Description |
| --- | --- |
| coco_detection | COCO object detection JSON |
| yolov4 / yolov5 | YOLO format datasets |
| voc_detection | Pascal VOC |
| kitti_detection | KITTI |
| image_classification_directory_tree | ImageNet-style class folders |
| image_directory / video_directory / media_directory | Plain media folders |
| image_segmentation_directory | Per-pixel masks alongside images |
| cvat_image / cvat_video | CVAT exports |
| openlabel_image / openlabel_video | OpenLABEL exports |
| bdd, csv, dicom, geojson, geotiff | Specialty formats |
| unknown | Detector could not classify the layout |
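
Client code that reads a dataset record may want to branch on whether its source_format is fully supported or only detected. The following is a minimal sketch, assuming the values listed in the two tables above are complete. The helper name support_level is hypothetical and is not part of any Cyberwave SDK.

```python
# source_format values copied from the two tables above.
FULL_SUPPORT = frozenset({
    "lerobot3", "lerobot21", "rlds", "cyberwave_parquet", "hdf5",
    "zarr", "gr00t", "robodm", "mcap", "rosbag",
})

DETECTION_ONLY = frozenset({
    "coco_detection", "yolov4", "yolov5", "voc_detection", "kitti_detection",
    "image_classification_directory_tree", "image_directory",
    "video_directory", "media_directory", "image_segmentation_directory",
    "cvat_image", "cvat_video", "openlabel_image", "openlabel_video",
    "bdd", "csv", "dicom", "geojson", "geotiff",
})


def support_level(source_format: str) -> str:
    """Classify a source_format value (hypothetical helper, not a Cyberwave API)."""
    if source_format in FULL_SUPPORT:
        return "full"  # import, playback, conversion, and training
    if source_format in DETECTION_ONLY:
        return "detection_only"  # stored on the record, pipelines not yet available
    if source_format == "unknown":
        return "unknown"  # the detector could not classify the layout
    raise ValueError(f"unrecognised source_format: {source_format!r}")
```

Raising on unrecognised values, rather than defaulting to detection_only, makes it obvious when a new format appears that this list does not yet cover.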

Import from HuggingFace Hub

  1. Navigate to your workspace and click Import Dataset
  2. Select HuggingFace Hub as the source
  3. Enter the repository ID (e.g. lerobot/pusht)
  4. Click Import
HuggingFace imports are lazy by default: Cyberwave queries the HF API for tags, card data, and the file list, and for LeRobot datasets reads only the small meta/*.json files. The dataset card appears immediately with episode counts, FPS, robot type, and cameras — without copying the multi-GB payload. Frames are fetched on demand when a training run or visualisation needs them. The metadata.import.materialized flag stays false until a follow-up materialisation step runs.
For private HuggingFace repositories, ensure your organisation has configured the HuggingFace token in the deployment settings.
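
To illustrate why a lazy import can fill the dataset card without copying the payload, the sketch below reads only meta/info.json and summarises it. This is not Cyberwave's implementation. It assumes the LeRobot metadata layout with total_episodes, fps, robot_type, and a features map whose camera entries have dtype "video" or "image". The function names are hypothetical, and fetch_info_json requires the huggingface_hub package.

```python
import json


def fetch_info_json(repo_id: str) -> str:
    """Download only meta/info.json from a Hub dataset repo (needs network)."""
    # Imported lazily so the parser below works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id=repo_id, filename="meta/info.json", repo_type="dataset"
    )
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def summarize_lerobot_info(info_json: str) -> dict:
    """Extract card-level fields from the text of a LeRobot meta/info.json."""
    info = json.loads(info_json)
    features = info.get("features", {})
    # Assumption: camera streams are the features stored as video or image.
    cameras = sorted(
        name
        for name, spec in features.items()
        if isinstance(spec, dict) and spec.get("dtype") in ("video", "image")
    )
    return {
        "episodes": info.get("total_episodes"),
        "fps": info.get("fps"),
        "robot_type": info.get("robot_type"),
        "cameras": cameras,
    }
```

Calling summarize_lerobot_info(fetch_info_json("lerobot/pusht")) would transfer a single small JSON file rather than the episode data.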

Upload a zip file

  1. Navigate to your workspace and click Import Dataset
  2. Select Upload Zip as the source
  3. Select your zip file (max 10 GB)
  4. Click Upload
The system automatically detects the format after the upload completes.
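
Because uploads are capped at 10 GB, a quick local check can save a failed transfer. The sketch below is a hypothetical client-side helper, not part of Cyberwave. It assumes the limit means 10 × 1024³ bytes; the documentation does not say whether the limit is binary or decimal, so adjust MAX_ZIP_BYTES if needed.

```python
import os
import zipfile

# Assumption: "10 GB" is interpreted as 10 GiB; the exact server limit may differ.
MAX_ZIP_BYTES = 10 * 1024**3


def precheck_zip(path: str) -> int:
    """Validate an archive before upload and return its number of entries."""
    size = os.path.getsize(path)
    if size > MAX_ZIP_BYTES:
        raise ValueError(f"{path} is {size} bytes, above the 10 GB upload limit")
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a valid zip archive")
    # Reading the central directory is cheap even for multi-GB archives.
    with zipfile.ZipFile(path) as archive:
        return len(archive.namelist())
```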

Monitor progress

After starting an import you can track it via:
  • The dataset list view (status indicator)
  • The dataset detail page (metadata.upload_progress)
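
Progress can also be polled from a script. The sketch below is hypothetical: fetch_dataset stands in for whatever call returns the dataset record, and the code assumes metadata.upload_progress is a percentage that reaches 100 when the upload is done. Neither the scale nor the completion value is confirmed by this page, so check the API Reference before relying on it.

```python
import time
from typing import Callable


def wait_for_import(
    fetch_dataset: Callable[[str], dict],
    dataset_id: str,
    *,
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Poll a dataset record until upload_progress reaches 100 (assumed scale)."""
    deadline = time.monotonic() + timeout_s
    seen = []
    while True:
        record = fetch_dataset(dataset_id)
        progress = record.get("metadata", {}).get("upload_progress")
        seen.append(progress)
        # Assumption: 100 means complete; a missing value means not started.
        if progress is not None and progress >= 100:
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError(f"import of {dataset_id} did not finish in time")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop easy to test without real delays.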

API reference

POST /datasets/import/init

Initialise a dataset import. Returns a signed URL for zip uploads, or immediately starts an HF import.

POST /datasets/import/complete

Complete a zip upload import after the file has been uploaded to the signed URL.
See the API Reference for full details.
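
The order of calls for the two import paths can be sketched as follows. Only the two endpoint paths come from this page. Every request and response field name (source, filename, repo_id, upload_url, dataset_id) is a placeholder assumption, and post_json and put_file are hypothetical transport callables supplied by the caller. Consult the API Reference for the real schema.

```python
from typing import Callable

INIT_PATH = "/datasets/import/init"
COMPLETE_PATH = "/datasets/import/complete"

PostJson = Callable[[str, dict], dict]
PutFile = Callable[[str, str], None]


def import_zip(post_json: PostJson, put_file: PutFile, zip_path: str) -> dict:
    """Zip flow: init, upload to the signed URL, then complete."""
    # HYPOTHETICAL body and response fields; the real names may differ.
    init = post_json(INIT_PATH, {"source": "zip", "filename": zip_path})
    put_file(init["upload_url"], zip_path)  # upload the archive to the signed URL
    return post_json(COMPLETE_PATH, {"dataset_id": init["dataset_id"]})


def import_from_hub(post_json: PostJson, repo_id: str) -> dict:
    """HuggingFace flow: a single init call starts the import immediately."""
    # HYPOTHETICAL body fields; the real names may differ.
    return post_json(INIT_PATH, {"source": "huggingface", "repo_id": repo_id})
```

Injecting the transport keeps the sketch independent of any particular HTTP client or authentication scheme.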