Robot datasets first. Cyberwave currently focuses on robot manipulation and navigation datasets (LeRobot, RLDS, Cyberwave Parquet, and similar time-series formats). Image classification, video, audio, and multimodal dataset formats are detected on import but full support — playback, conversion, and training — is coming in a future release.
Overview
Cyberwave supports importing datasets from two sources:
- HuggingFace Hub: import directly by repository ID (lazy, no multi-GB copy)
- Zip Upload: upload a pre-packaged dataset archive (max 10 GB)
Cyberwave detects the format of each imported dataset and stores it in the source_format field on the dataset record.
Supported formats
Robot datasets (full support)
These formats are fully supported for import, playback, conversion, and ML training.

| Format | source_format value | Description |
|---|---|---|
| LeRobot v3 | lerobot3 | Latest LeRobot format — parquet episodes + meta/info.json |
| LeRobot v2.1 | lerobot21 | Legacy LeRobot format with splits (normalised to v3 on export) |
| RLDS / TFDS | rlds | TensorFlow Datasets format (Open-X-Embodiment style) |
| Cyberwave Parquet | cyberwave_parquet | Native format for datasets generated directly on Cyberwave |
| HDF5 | hdf5 | Robomimic / ACT / ALOHA style datasets |
| Zarr | zarr | Diffusion Policy / UMI style datasets |
| GR00T | gr00t | NVIDIA Isaac GR00T (LeRobot v2 + embodiment metadata) |
| RoboDM | robodm | Berkeley .vla format |
| MCAP | mcap | ROS2 CDR + Foxglove Protobuf |
| ROS bag | rosbag | ROS1 .bag / ROS2 SQLite3 |
Other dataset types (detection only — full support coming later)
Cyberwave detects these formats and stores the source_format value, but playback, conversion, and training pipelines for them are not yet available.
| Source format | Description |
|---|---|
| coco_detection | COCO object detection JSON |
| yolov4 / yolov5 | YOLO-format datasets |
| voc_detection | Pascal VOC |
| kitti_detection | KITTI |
| image_classification_directory_tree | ImageNet-style class folders |
| image_directory / video_directory / media_directory | Plain media folders |
| image_segmentation_directory | Per-pixel masks alongside images |
| cvat_image / cvat_video | CVAT exports |
| openlabel_image / openlabel_video | OpenLABEL exports |
| bdd, csv, dicom, geojson, geotiff | Specialty formats |
| unknown | Detector could not classify the layout |
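The two tables amount to a lookup from source_format value to support tier. The sketch below encodes that lookup, for example to decide in a client script whether a dataset can be sent to training yet. The set names and the support_tier helper are hypothetical, not part of a Cyberwave SDK; only the values are copied from the tables.

```python
# Support tiers copied from the "Supported formats" tables.
# The names FULL_SUPPORT, DETECTION_ONLY and support_tier are illustrative only.
FULL_SUPPORT = {
    "lerobot3", "lerobot21", "rlds", "cyberwave_parquet", "hdf5",
    "zarr", "gr00t", "robodm", "mcap", "rosbag",
}

DETECTION_ONLY = {
    "coco_detection", "yolov4", "yolov5", "voc_detection", "kitti_detection",
    "image_classification_directory_tree", "image_directory",
    "video_directory", "media_directory", "image_segmentation_directory",
    "cvat_image", "cvat_video", "openlabel_image", "openlabel_video",
    "bdd", "csv", "dicom", "geojson", "geotiff",
}


def support_tier(source_format: str) -> str:
    """Return 'full', 'detection_only', or 'unknown' for a source_format value."""
    if source_format in FULL_SUPPORT:
        return "full"
    if source_format in DETECTION_ONLY:
        return "detection_only"
    # Covers the literal 'unknown' value and anything not listed in the tables.
    return "unknown"


print(support_tier("lerobot3"))        # full
print(support_tier("coco_detection"))  # detection_only
```

Only datasets in the "full" tier can currently go through playback, conversion, and training.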
Import from HuggingFace Hub
- Navigate to your workspace and click Import Dataset
- Select HuggingFace Hub as the source
- Enter the repository ID (e.g. lerobot/pusht)
- Click Import
HuggingFace imports are lazy: Cyberwave reads only the repository's meta/*.json files. The dataset card appears immediately with episode counts, FPS, robot type, and cameras, without copying the multi-GB payload. Frames are fetched on demand when a training run or visualisation needs them. The metadata.import.materialized flag stays false until a follow-up materialisation step runs.
For private HuggingFace repositories, ensure your organisation has configured the HuggingFace token in the deployment settings.
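To illustrate why a lazy import can show episode counts, FPS, robot type, and cameras without copying the payload, here is a minimal sketch that downloads only meta/info.json from a Hub dataset repository with huggingface_hub.hf_hub_download and summarises it. It is not Cyberwave's importer. The info.json field names used here (total_episodes, fps, robot_type, and features entries whose dtype is "video" or "image") are assumed from the LeRobot layout and may differ between LeRobot versions.

```python
import json
from typing import Any, Mapping


def summarise_info(info: Mapping[str, Any]) -> dict:
    """Pull the dataset-card fields out of a LeRobot-style meta/info.json dict.

    Field names are assumptions based on the LeRobot info.json layout.
    """
    features = info.get("features", {})
    # Treat every feature stored as video or image frames as a camera stream.
    cameras = sorted(
        name for name, spec in features.items()
        if spec.get("dtype") in ("video", "image")
    )
    return {
        "episodes": info.get("total_episodes"),
        "fps": info.get("fps"),
        "robot_type": info.get("robot_type"),
        "cameras": cameras,
    }


def fetch_info(repo_id: str) -> dict:
    """Download only meta/info.json from a Hub dataset repo (needs network access)."""
    # Imported lazily so summarise_info works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename="meta/info.json", repo_type="dataset")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Example (network required):
# print(summarise_info(fetch_info("lerobot/pusht")))
```

Only one small JSON file crosses the network here; the parquet episodes and videos stay on the Hub until something asks for them.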
Upload a zip file
- Navigate to your workspace and click Import Dataset
- Select Upload Zip as the source
- Select your zip file (max 10 GB)
- Click Upload
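Before starting a long upload, it can help to check the archive locally. The following sketch (Python 3.9+) flags files over the size limit, files that are not zip archives, and empty archives. The check_zip helper is hypothetical, and the exact byte value of the "10 GB" limit is an assumption, explained in the comment.

```python
import os
import zipfile

# Assumption: "10 GB" is read conservatively as 10 * 1000**3 bytes. That is
# below 10 GiB, so a file that passes here passes under either reading.
MAX_ZIP_BYTES = 10 * 1000**3


def check_zip(path: str, max_bytes: int = MAX_ZIP_BYTES) -> list[str]:
    """Return a list of problems with the archive (empty if it looks uploadable)."""
    problems = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"archive is {size} bytes; the limit is {max_bytes} bytes")
    if not zipfile.is_zipfile(path):
        problems.append("file is not a valid zip archive")
        return problems
    with zipfile.ZipFile(path) as zf:
        if not zf.namelist():
            problems.append("archive is empty")
    return problems
```

Call check_zip("my_dataset.zip") and upload only if it returns an empty list.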
Monitor progress
After starting an import, you can track it via:
- The dataset list view (status indicator)
- The dataset detail page (metadata.upload_progress)
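To watch progress from a script rather than the UI, one option is a polling loop over the dataset record. This sketch assumes metadata.upload_progress is a number from 0 to 100 and that you supply fetch_dataset, a hypothetical callable returning the current dataset record as a dict (for example a wrapper around whichever API call your deployment uses to read a dataset). The sleep and clock parameters are injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_upload(
    fetch_dataset: Callable[[], Mapping[str, Any]],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll a dataset record until metadata.upload_progress reaches 100.

    Assumes upload_progress is a percentage from 0 to 100; adjust the
    completion test if your records report progress differently.
    """
    deadline = clock() + timeout
    while True:
        dataset = fetch_dataset()
        progress = (dataset.get("metadata") or {}).get("upload_progress")
        if progress is not None and progress >= 100:
            return dataset
        if clock() >= deadline:
            raise TimeoutError(f"upload_progress still {progress!r} after {timeout} s")
        sleep(interval)
```

Polling every few seconds keeps load on the API low; for very large archives, raise timeout accordingly.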
API reference
POST /datasets/import/init
Initialise a dataset import. Returns a signed URL for zip uploads, or immediately starts an HF import.
POST /datasets/import/complete
Complete a zip upload import after the file has been uploaded to the signed URL.
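The two endpoints combine into a short client-side flow. This sketch shows the order of calls for both sources. The request and response field names (source, repo_id, upload_url, import_id) and the use of PUT for the signed upload are assumptions not documented on this page, so check them against your deployment's API schema. The transport callable is likewise hypothetical and stands in for whatever HTTP client and authentication you use.

```python
from typing import Any, Callable, Optional

# A transport sends one HTTP request (method, url, body) and returns the
# decoded JSON response, or None. Injecting it keeps the sketch independent
# of any HTTP library and of the authentication scheme.
Transport = Callable[[str, str, Any], Optional[dict]]


def import_from_huggingface(transport: Transport, base_url: str, repo_id: str) -> Optional[dict]:
    """Start an HF import; per the reference above, init alone starts it."""
    # Hypothetical request fields: "source" and "repo_id".
    body = {"source": "huggingface", "repo_id": repo_id}
    return transport("POST", f"{base_url.rstrip('/')}/datasets/import/init", body)


def import_zip(transport: Transport, base_url: str, zip_bytes: bytes) -> Optional[dict]:
    """Run the zip import: init, upload to the signed URL, then complete."""
    api = base_url.rstrip("/")
    # 1. Initialise; hypothetical response fields "upload_url" and "import_id".
    init = transport("POST", f"{api}/datasets/import/init", {"source": "zip"})
    # 2. Upload the archive to the signed URL (PUT is an assumption; many
    #    object stores accept PUT for signed uploads).
    transport("PUT", init["upload_url"], zip_bytes)
    # 3. Tell Cyberwave the upload has finished so it can process the archive.
    return transport("POST", f"{api}/datasets/import/complete", {"import_id": init["import_id"]})
```

Keeping the signed-URL upload as a separate step means the archive goes straight to storage rather than through the API server.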