Documentation Index
Fetch the complete documentation index at: https://docs.cyberwave.com/llms.txt
Use this file to discover all available pages before exploring further.
Community tutorial. Contributed by Siddharth Mehta through the Cyberwave Builder Program. Built and verified on the author’s own SO-101 setup; your mileage may vary depending on hardware revision and environment.
Introduction
What if you could teach a robot to make a sandwich just by showing it a few times? That’s exactly what we built: a system where you physically demonstrate the task through teleoperation, and a 450M-parameter VLA model learns to replicate it autonomously. This tutorial walks through the entire pipeline: hardware setup, Cyberwave integration, demonstration recording, SmolVLA training, and deployment.Architecture
Two short diagrams: the runtime stack that links the arm to your Python app, and the data pipeline that turns demonstrations into a deployed policy. Runtime stack Data pipelinePrerequisites
- SO-101 dual robotic arms (leader + follower)
- Raspberry Pi 4 with Cyberwave Edge Core installed
- Logitech C275 USB webcam
- Cyberwave account with API key
- Python 3.10+, Google Colab (for training)
Step 1: Set up Cyberwave
Install the Edge Core on Raspberry Pi
Set up the Cyberwave environment
Before calibrating, you need a Cyberwave environment with the SO-101 arm and camera twins added and paired to the hardware. The full reference is in SO-101 Get Started: Set Up the Cyberwave Environment.- In the Cyberwave dashboard, click New Environment and give it a name (e.g. “Sandwich Bot”).
- Click Add from Catalog, search for SO101, and add it to the environment. Position it to roughly match your physical setup.
- Click Add from Catalog again, search for Standard Camera, and add it.
- Open the camera twin in Edit Mode, set Dock to Twin to the SO101 twin, and set Parent Root to wrist. The camera will now follow the arm’s end-effector during teleoperation and recording.
- Pair the hardware by following the terminal prompts from
cyberwave edge install: select your environment, select the SO101 twin (the driver auto-installs), then repeat for the camera twin.
Calibrate the arms in the product UI
Calibration teaches the software where each joint’s zero position is, what its valid movement range is, and how the physical arm maps to the software model. Without it, joint commands won’t translate correctly to hardware. For the full reference, see SO-101 Get Started.- Open the Cyberwave dashboard and navigate to your environment.
- Select the SO101 twin. You’ll see an option to Calibrate both arms (leader and follower).
- Click Calibrate and follow the on-screen prompts.
- Manually move every joint of the leader arm through its full range when prompted.
- Repeat for the follower arm.
- Once both arms are calibrated, the platform will confirm calibration is complete.
Calibrate the arms via the CLI (alternative)
If you’d rather skip the dashboard, run calibration from inside the driver container:Connect via the Python SDK
Step 2: Understand the Cyberwave SDK
Read joint state
For the SO-101 driver, read joint state by subscribing to live updates withsubscribe_joints():
Send joint commands
Commands work with standard joint names:Capture camera frames
Usecapture_frame() on the robot twin, with sensor_id set to the wrist camera you docked in Step 1. The SDK supports several output formats:
"numpy", "pil", "bytes", or a temp JPEG file path).
Step 3: Record demonstrations
With teleoperation running, the author’srecord_demos.py records synchronized joint states and camera frames while you move the leader arm:
- Joint positions at 30 Hz via
subscribe_joints()(see Read joint state) - Wrist-camera frames at 30 Hz via the Cyberwave SDK (see Capture camera frames)
- Output written to
data/episodes/episode_NNNN/asstates.npy,actions.npy,timestamps.npy, plus aframes/directory of JPEGs and anepisode.jsonmanifest
Step 4: Convert to LeRobot format
LeRobot expects a specific dataset structure with parquet files, video, and metadata. The author’sconvert_dataset.py walks every recorded episode, transcodes the JPEG sequence to MP4 with ffmpeg, and writes parquet plus the LeRobot meta/ files:
--upload.
Step 5: Fine-tune SmolVLA
SmolVLA is a 450M-parameter VLA pre-trained on SO100 / SO101 community data. Fine-tuning on your specific task takes ~1.5 hours on a T4 GPU:Step 6: Deploy for inference
Run the trained model in a closed loop using the author’srun_sandwich_vla.py:
- Captures a camera frame via the Cyberwave SDK.
- Reads the current joint state via
subscribe_joints(). - Feeds both plus the language instruction to SmolVLA.
- The model outputs 6 joint targets.
- Sends them to the robot via
joints.set(). - Repeats at 10 Hz.
Conclusion
A complete imitation learning pipeline, from teleoperation to autonomous execution, can be built in under three weeks using Cyberwave and SmolVLA. The key insight is that modern VLA models need remarkably few demonstrations when the base model is pre-trained on similar robots.Built by Siddharth Mehta as part of the Cyberwave Builder Program.