Robot datasets first. Cyberwave currently focuses on robot manipulation and navigation datasets (LeRobot, RLDS, Cyberwave Parquet, and similar time-series formats). Image classification, video, audio, and multimodal dataset formats are detected on import but full support — playback, conversion, and training — is coming in a future release.
Overview
Cyberwave supports importing datasets from two sources:
- HuggingFace Hub: import directly by repository ID (lazy, no multi-GB copy)
- Zip Upload: upload a pre-packaged dataset archive (max 10 GB)
The detected format is recorded in the source_format field on the dataset record.
Supported formats
Robot datasets (full support)
These formats are fully supported for import, playback, conversion, and ML training.
| Format | source_format value | Description |
|---|---|---|
| LeRobot v3 | lerobot3 | Latest LeRobot format — parquet episodes + meta/info.json |
| LeRobot v2.1 | lerobot21 | Legacy LeRobot format with splits (normalised to v3 on export) |
| RLDS / TFDS | rlds | TensorFlow Datasets format (Open-X-Embodiment style) |
| Cyberwave Parquet | cyberwave_parquet | Native format for datasets generated directly on Cyberwave |
| HDF5 | hdf5 | Robomimic / ACT / ALOHA style datasets |
| Zarr | zarr | Diffusion Policy / UMI style datasets |
| GR00T | gr00t | NVIDIA Isaac GR00T (LeRobot v2 + embodiment metadata) |
| RoboDM | robodm | Berkeley .vla format |
| MCAP | mcap | ROS2 CDR + Foxglove Protobuf |
| ROS bag | rosbag | ROS1 .bag / ROS2 SQLite3 |
Other dataset types (detection only — full support coming later)
Cyberwave detects these formats and stores the source_format value, but playback, conversion, and training pipelines for them are not yet available.
| Source format | Description |
|---|---|
| coco_detection | COCO object detection JSON |
| yolov4 / yolov5 | YOLO format datasets |
| voc_detection | Pascal VOC |
| kitti_detection | KITTI |
| image_classification_directory_tree | ImageNet-style class folders |
| image_directory / video_directory / media_directory | Plain media folders |
| image_segmentation_directory | Per-pixel masks alongside images |
| cvat_image / cvat_video | CVAT exports |
| openlabel_image / openlabel_video | OpenLABEL exports |
| bdd, csv, dicom, geojson, geotiff | Specialty formats |
| unknown | Detector could not classify the layout |
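The two tables above can be turned into a small lookup when client code needs to decide whether playback, conversion, or training is available for a dataset. The following is a minimal sketch, not part of any Cyberwave SDK: the function name and the labels `"full"`, `"detection_only"`, and `"unclassified"` are hypothetical, and only the source_format strings come from the tables.

```python
# source_format values from the "Robot datasets (full support)" table.
FULLY_SUPPORTED = frozenset({
    "lerobot3", "lerobot21", "rlds", "cyberwave_parquet", "hdf5",
    "zarr", "gr00t", "robodm", "mcap", "rosbag",
})


def support_level(source_format: str) -> str:
    """Classify a source_format value (return labels are hypothetical)."""
    if source_format in FULLY_SUPPORTED:
        return "full"
    if source_format == "unknown":
        # The detector could not classify the layout.
        return "unclassified"
    # Anything else is assumed to be one of the detection-only formats.
    return "detection_only"
```

A caller might use this to hide the playback or training controls for any dataset whose level is not `"full"`.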
Import from HuggingFace Hub
- Navigate to your workspace and click Import Dataset
- Select HuggingFace Hub as the source
- Enter the repository ID (e.g. lerobot/pusht)
- Click Import
The import reads only the repository's meta/*.json files. The dataset card appears immediately with episode counts, FPS, robot type, and cameras, without copying the multi-GB payload. Frames are fetched on demand when a training run or visualisation needs them. The metadata.import.materialized flag stays false until a follow-up materialisation step runs.
For private HuggingFace repositories, ensure your organisation has configured the HuggingFace token in the deployment settings.
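Code that needs the full payload locally may want to check the materialisation flag first. The sketch below assumes the dataset record is available as a JSON-like dictionary in which the dotted path metadata.import.materialized maps to nested keys; that shape and the helper name are assumptions, not a documented schema.

```python
def is_materialized(dataset: dict) -> bool:
    """Return True only if metadata.import.materialized is explicitly true."""
    # Treat missing or null intermediate objects as "not materialised".
    metadata = dataset.get("metadata") or {}
    import_info = metadata.get("import") or {}
    return import_info.get("materialized") is True
```

For a freshly imported HuggingFace dataset this would be expected to return `False` until the follow-up materialisation step has run.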
Upload a zip file
- Navigate to your workspace and click Import Dataset
- Select Upload Zip as the source
- Select your zip file (max 10 GB)
- Click Upload
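Before uploading, it can save time to check the archive locally against the 10 GB limit. This is a minimal pre-flight sketch using only the Python standard library. It assumes "10 GB" means 10 GiB; the service may count decimal gigabytes instead, so treat the constant as an approximation.

```python
import os
import zipfile

# Assumption: the 10 GB limit is interpreted as 10 GiB.
MAX_ZIP_BYTES = 10 * 1024**3


def validate_zip(path: str, max_bytes: int = MAX_ZIP_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    problems = []
    if not zipfile.is_zipfile(path):
        problems.append(f"{path} is not a valid zip archive")
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"{path} is {size} bytes, over the {max_bytes}-byte limit")
    return problems
```

This only checks the container and its size; whether the dataset layout inside the archive is recognised is decided by the detector after upload.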
Monitor progress
After starting an import you can track it via:
- The dataset list view (status indicator)
- The dataset detail page (metadata.upload_progress)
API reference
POST /datasets/import/init
Initialise a dataset import. Returns a signed URL for zip uploads, or immediately starts an HF import.
POST /datasets/import/complete
Complete a zip upload import after the file has been uploaded to the signed URL.
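The two endpoints suggest a three-step flow for zip uploads (init, upload to the signed URL, complete) and a single call for HuggingFace imports. The sketch below illustrates that flow with the standard library only. Everything beyond the two endpoint paths is an assumption: the base URL, bearer-token authentication, the request fields (`source`, `filename`, `repo_id`, `dataset_id`), the response fields (`upload_url`, `dataset_id`), and the use of HTTP PUT for the signed URL are all hypothetical and should be checked against the actual API schema.

```python
import json
import os
import urllib.request


def post_json(url: str, payload: dict, token: str) -> dict:
    """POST a JSON body and decode the JSON response (bearer auth is assumed)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def put_file(url: str, path: str) -> None:
    """Stream a file to a signed URL (PUT is assumed, not documented)."""
    with open(path, "rb") as fh:
        req = urllib.request.Request(url, data=fh, method="PUT")
        # An explicit Content-Length avoids chunked transfer encoding.
        req.add_header("Content-Length", str(os.path.getsize(path)))
        with urllib.request.urlopen(req):
            pass


def import_zip(base_url, zip_path, token, post=post_json, put=put_file) -> dict:
    """init -> upload -> complete. All payload field names are hypothetical."""
    init = post(
        f"{base_url}/datasets/import/init",
        {"source": "zip", "filename": os.path.basename(zip_path)},
        token,
    )
    put(init["upload_url"], zip_path)
    return post(
        f"{base_url}/datasets/import/complete",
        {"dataset_id": init["dataset_id"]},
        token,
    )


def import_hf(base_url, repo_id, token, post=post_json) -> dict:
    """A HuggingFace import starts on init; no complete call is needed."""
    return post(
        f"{base_url}/datasets/import/init",
        {"source": "huggingface", "repo_id": repo_id},
        token,
    )
```

The `post` and `put` parameters are injectable so the flow can be exercised with stand-ins in tests, without network access.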