Overview
SmolVLA is a lightweight Vision-Language-Action (VLA) model optimized for edge deployment. Cyberwave supports fine-tuning SmolVLA models on your custom datasets.
SmolVLA training uses the LeRobot v3 dataset format, which differs from the TFDS format used by OpenVLA models. The platform handles format conversion automatically.
Model Selection
When starting a new training run:
- Navigate to AI → Training in your environment
- Select SmolVLA from the available model architectures
- The platform will automatically use the LeRobot training pipeline
Only models with is_trainable: true appear in the training model selection. SmolVLA is pre-configured as trainable.
Dataset Conversion
When you start training with a SmolVLA model, the platform:
- Joins your episode parquet files into a single dataset
- Converts the OpenVLA-format parquet to LeRobot v3 format
- Handles camera role mapping (primary, wrist, secondary)
- Encodes video frames using the AV1 codec (configurable)
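The joining step above can be pictured with a minimal, standard-library sketch. It is not the platform's converter: the function name `join_episodes` and the list-of-dicts layout are hypothetical, and the bookkeeping columns (`episode_index`, `frame_index`, `index`, `timestamp`) are assumed here because they follow common LeRobot dataset conventions, not because this page specifies them.

```python
# Hypothetical sketch: each episode is a list of per-frame rows (dicts).
# The real pipeline works on parquet files; plain lists keep this runnable.
def join_episodes(episodes, fps=30):
    """Concatenate episodes into one table with per-frame bookkeeping columns."""
    rows = []
    for episode_index, frames in enumerate(episodes):
        for frame_index, frame in enumerate(frames):
            rows.append({
                **frame,
                "episode_index": episode_index,   # which episode the row came from
                "frame_index": frame_index,       # position within that episode
                "index": len(rows),               # position within the joined dataset
                "timestamp": frame_index / fps,   # seconds since episode start
            })
    return rows


episodes = [
    [{"action": 1}, {"action": 2}],  # episode 0, two frames
    [{"action": 3}],                 # episode 1, one frame
]
dataset = join_episodes(episodes, fps=30)
```

Each row keeps its original fields and gains enough indexing to recover which episode and frame it came from after the merge.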
Training Parameters
SmolVLA training supports the following parameters:
| Parameter | Description | Default |
|---|---|---|
| fps | Target frames per second | 30 |
| use_videos | Store frames as MP4 videos | true |
| vcodec | Video codec | libsvtav1 |
| num_cameras | Number of camera streams (1-3) | 1 |
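As a rough illustration of these parameters and their defaults, the sketch below collects them in a small dataclass. The class name `SmolVLATrainingParams` and its validation rules are hypothetical and not part of any Cyberwave API; only the field names, defaults, and the 1-3 camera range come from the table.

```python
from dataclasses import dataclass, asdict


# Hypothetical container for the parameters listed in the table.
# Not a Cyberwave API; the checks below are illustrative assumptions.
@dataclass(frozen=True)
class SmolVLATrainingParams:
    fps: int = 30               # target frames per second
    use_videos: bool = True     # store frames as MP4 videos
    vcodec: str = "libsvtav1"   # video codec (SVT-AV1 encoder name)
    num_cameras: int = 1        # number of camera streams, 1 to 3

    def __post_init__(self) -> None:
        if self.fps <= 0:
            raise ValueError("fps must be positive")
        if not 1 <= self.num_cameras <= 3:
            raise ValueError("num_cameras must be between 1 and 3")


defaults = SmolVLATrainingParams()
two_cams = SmolVLATrainingParams(num_cameras=2)
print(asdict(defaults))
```

Validating the camera count up front means an out-of-range value fails before any conversion or training work starts.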
Camera Configuration
Camera roles are mapped to LeRobot conventions:
- primary → observation.images.primary
- wrist → observation.images.wrist
- secondary → observation.images.secondary
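The role mapping can be expressed as a simple key rename. The sketch below is illustrative only: the role names and `observation.images.*` keys come from the mapping above, while the helper names `lerobot_image_key` and `remap_frame` and the dict-based frame layout are assumptions.

```python
# Role names and feature keys follow the mapping in this section.
# Function names and the frame layout are hypothetical.
CAMERA_ROLES = ("primary", "wrist", "secondary")


def lerobot_image_key(role: str) -> str:
    """Return the LeRobot-style feature key for a camera role."""
    if role not in CAMERA_ROLES:
        raise ValueError(f"unknown camera role: {role!r}")
    return f"observation.images.{role}"


def remap_frame(frame: dict) -> dict:
    """Rename role-keyed entries; leave all other keys untouched."""
    return {
        (lerobot_image_key(key) if key in CAMERA_ROLES else key): value
        for key, value in frame.items()
    }


frame = {"primary": "cam0.png", "wrist": "cam1.png", "action": [0.1, 0.2]}
remapped = remap_frame(frame)
```

Rejecting unknown roles explicitly avoids silently producing feature keys that a training pipeline would not recognize.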
Training Workflow
Deployment
After training completes:
- Deploy the trained model as a controller policy
- Assign the VLA controller to your robot twin
- Use natural language prompts to control the robot
Related
- Train VLA Models — Complete training tutorial
- ML Models Overview — Model capabilities and providers
- Deploy Models — Deployment options