Import Datasets - Cyberwave Docs

Robot datasets first. Cyberwave currently focuses on robot manipulation and navigation datasets (LeRobot, RLDS, Cyberwave Parquet, and similar time-series formats). Image classification, video, audio, and multimodal dataset formats are detected on import, but full support (playback, conversion, and training) is planned for a future release.

Overview

Cyberwave supports importing datasets from two sources:

HuggingFace Hub — Import directly by repository ID (lazy, no multi-GB copy)
Zip Upload — Upload a pre-packaged dataset archive (max 10 GB)

Format detection happens automatically on the server side and sets the source_format field on the dataset record.

Supported formats

Robot datasets (full support)

These formats are fully supported for import, playback, conversion, and ML training.

Format	`source_format` value	Description
LeRobot v3	`lerobot3`	Latest LeRobot format — parquet episodes + `meta/info.json`
LeRobot v2.1	`lerobot21`	Legacy LeRobot format with splits (normalised to v3 on export)
RLDS / TFDS	`rlds`	TensorFlow Datasets format (Open-X-Embodiment style)
Cyberwave Parquet	`cyberwave_parquet`	Native format for datasets generated directly on Cyberwave
HDF5	`hdf5`	Robomimic / ACT / ALOHA style datasets
Zarr	`zarr`	Diffusion Policy / UMI style datasets
GR00T	`gr00t`	NVIDIA Isaac GR00T (LeRobot v2 + embodiment metadata)
RoboDM	`robodm`	Berkeley .vla format
MCAP	`mcap`	ROS2 CDR + Foxglove Protobuf
ROS bag	`rosbag`	ROS1 .bag / ROS2 SQLite3

Other dataset types (detection only — full support coming later)

Cyberwave detects these formats and stores the source_format value, but playback, conversion, and training pipelines for them are not yet available.

Source format	Description
`coco_detection`	COCO object detection JSON
`yolov4` / `yolov5`	YOLO format datasets
`voc_detection`	Pascal VOC
`kitti_detection`	KITTI
`image_classification_directory_tree`	ImageNet-style class folders
`image_directory` / `video_directory` / `media_directory`	Plain media folders
`image_segmentation_directory`	Per-pixel masks alongside images
`cvat_image` / `cvat_video`	CVAT exports
`openlabel_image` / `openlabel_video`	OpenLABEL exports
`bdd`, `csv`, `dicom`, `geojson`, `geotiff`	Specialty formats
`unknown`	Detector could not classify the layout
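A client that reads a dataset record may want to check whether playback, conversion, and training are available before offering them. The sketch below hard-codes the source_format values from the robot datasets table above. The helper name support_level and its return strings are illustrative only and are not part of the Cyberwave API.

```python
# source_format values copied from the "Robot datasets (full support)" table.
FULL_SUPPORT = frozenset({
    "lerobot3", "lerobot21", "rlds", "cyberwave_parquet", "hdf5",
    "zarr", "gr00t", "robodm", "mcap", "rosbag",
})


def support_level(source_format: str) -> str:
    """Classify a source_format value (hypothetical helper).

    Returns "full" for robot formats, "unknown" when the detector could
    not classify the layout, and "detection_only" for everything else.
    """
    if source_format in FULL_SUPPORT:
        return "full"
    if source_format == "unknown":
        return "unknown"
    return "detection_only"
```

For example, support_level("lerobot3") returns "full", while support_level("coco_detection") returns "detection_only".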

Import from HuggingFace Hub

Navigate to your workspace and click Import Dataset
Select HuggingFace Hub as the source
Enter the repository ID (e.g. lerobot/pusht)
Click Import

HuggingFace imports are lazy by default: Cyberwave queries the HF API for tags, card data, and the file list, and for LeRobot datasets reads only the small meta/*.json files. The dataset card appears immediately with episode counts, FPS, robot type, and cameras — without copying the multi-GB payload. Frames are fetched on demand when a training run or visualisation needs them. The metadata.import.materialized flag stays false until a follow-up materialisation step runs.
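Code that consumes a lazily imported dataset can check the materialized flag before assuming the payload is stored locally. The following is a minimal sketch that assumes the dataset record arrives as a nested JSON object in which metadata.import.materialized maps to record["metadata"]["import"]["materialized"]. That shape and the helper name is_materialized are assumptions, not confirmed by this page.

```python
def is_materialized(record: dict) -> bool:
    """Return True only if metadata.import.materialized is set and truthy.

    Assumes a nested-dict record (hypothetical shape). Missing keys are
    treated as "not materialized", matching the lazy default.
    """
    metadata = record.get("metadata") or {}
    import_info = metadata.get("import") or {}
    return bool(import_info.get("materialized", False))


# Example: a freshly imported (lazy) HuggingFace dataset record.
lazy_record = {
    "source_format": "lerobot3",
    "metadata": {"import": {"materialized": False}},
}
print(is_materialized(lazy_record))
```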

For private HuggingFace repositories, ensure your organisation has configured the HuggingFace token in the deployment settings.

Upload a zip file

Navigate to your workspace and click Import Dataset
Select Upload Zip as the source
Select your zip file (max 10 GB)
Click Upload

The system automatically detects the format after the upload completes.
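Checking the archive locally before uploading can save a failed multi-gigabyte transfer. The sketch below assumes the 10 GB limit is decimal (10 × 1000³ bytes), which is the more conservative reading; the server may count the limit differently. The function name check_zip is illustrative.

```python
import os
import zipfile

# Assumption: "10 GB" is interpreted as decimal gigabytes.
MAX_ZIP_BYTES = 10 * 1000**3


def check_zip(path: str) -> int:
    """Validate a local archive before upload and return its size in bytes.

    Raises ValueError if the file is not a zip archive or exceeds the limit.
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a valid zip archive")
    size = os.path.getsize(path)
    if size > MAX_ZIP_BYTES:
        raise ValueError(f"{path} is {size} bytes; limit is {MAX_ZIP_BYTES}")
    return size
```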

Monitor progress

After starting an import you can track it via:

The dataset list view (status indicator)
The dataset detail page (metadata.upload_progress)

API reference

POST /datasets/import/init

Initialise a dataset import. Returns a signed URL for zip uploads, or immediately starts an HF import.

POST /datasets/import/complete

Complete a zip upload import after the file has been uploaded to the signed URL.
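The two endpoints above imply a three-step flow for zip uploads: initialise, upload to the signed URL, then complete. The sketch below illustrates that sequence using only the Python standard library. This page does not document the request or response schemas, so the base URL, the bearer-token header, the JSON field names (source, upload_url, dataset_id), and the use of PUT for the signed URL are all assumptions; check the API Reference for the real values.

```python
import json
import urllib.request
from typing import Callable, Optional

# Hypothetical base URL: replace with your deployment's API root.
BASE_URL = "https://cyberwave.example/api"

# A transport takes (method, url, body, headers) and returns the response body.
Send = Callable[[str, str, Optional[bytes], dict], bytes]


def urllib_send(method: str, url: str, body: Optional[bytes], headers: dict) -> bytes:
    """Default transport built on urllib.request."""
    req = urllib.request.Request(url, data=body, method=method, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def import_zip(zip_bytes: bytes, token: str, send: Send = urllib_send) -> dict:
    """Run init -> upload -> complete for a zip archive.

    Holding the archive in memory keeps the sketch short; a real client
    would stream large files instead.
    """
    api_headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Content-Type": "application/json",
    }

    # Step 1: initialise the import and obtain a signed upload URL.
    init_body = json.dumps({"source": "zip"}).encode()  # assumed field
    init = json.loads(
        send("POST", f"{BASE_URL}/datasets/import/init", init_body, api_headers)
    )

    # Step 2: upload the archive to the signed URL (assumed to accept PUT).
    send("PUT", init["upload_url"], zip_bytes, {"Content-Type": "application/zip"})

    # Step 3: tell the server the upload has finished.
    done_body = json.dumps({"dataset_id": init["dataset_id"]}).encode()
    return json.loads(
        send("POST", f"{BASE_URL}/datasets/import/complete", done_body, api_headers)
    )
```

Passing the transport as a parameter keeps the flow testable without network access: a fake send function can return canned responses for each step.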

See the API Reference for full details.