> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cyberwave.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Building an Autonomous Sandwich-Making Robot with Cyberwave & SmolVLA

> A step-by-step tutorial on training a Vision-Language-Action model for real-world robotic manipulation using the SO-101 arm and the Cyberwave platform.

<Info>
  **Community tutorial.** Contributed by **Siddharth Mehta** through the
  [Cyberwave Builder Program](https://cyberwave.com/builders). Built and
  verified on the author's own SO-101 setup; your mileage may vary depending on
  hardware revision and environment.
</Info>

<Tip>
  **Reference implementation.** Full source for the helper scripts referenced
  below (`record_demos.py`, `convert_dataset.py`, training, inference, and
  config) lives in the author's repo:
  [siddharth270/Cyberwave-RoboticChef](https://github.com/siddharth270/Cyberwave-RoboticChef).
  Clone it to follow along, or use the snippets in each step as a guide for your
  own implementation.
</Tip>

## Introduction

What if you could teach a robot to make a sandwich just by showing it a few times? That's exactly what we built: a system where you physically demonstrate the task through teleoperation, and a 450M-parameter VLA model learns to replicate it autonomously.

This tutorial walks through the entire pipeline: hardware setup, Cyberwave integration, demonstration recording, SmolVLA training, and deployment.

## Architecture

Two short diagrams: the **runtime stack** that links the arm to your Python app, and the **data pipeline** that turns demonstrations into a deployed policy.

**Runtime stack**

```mermaid theme={null}
flowchart LR
  hw["SO-101 + wrist cam"] <--> edge["Raspberry Pi 4<br/>Cyberwave Edge"]
  edge <--> cloud["Cyberwave Cloud<br/>(twins, MQTT)"]
  cloud <--> app["Python app<br/>(Cyberwave SDK)"]
```

**Data pipeline**

```mermaid theme={null}
flowchart LR
  rec["Record demos<br/>(teleop, 30 Hz)"] --> conv["Convert to<br/>LeRobot dataset"]
  conv --> train["Fine-tune SmolVLA<br/>(Colab T4)"]
  train --> infer["Closed-loop<br/>inference @ 10 Hz"]
```

## Prerequisites

* SO-101 dual robotic arms (leader + follower)
* Raspberry Pi 4 with Cyberwave Edge Core installed
* Logitech C275 USB webcam
* Cyberwave account with API key
* Python 3.10+, Google Colab (for training)

## Step 1: Set up Cyberwave

### Install the Edge Core on Raspberry Pi

```bash theme={null}
ssh your_user@raspberry_pi_ip
curl -fsSL https://cyberwave.com/install.sh | bash
sudo cyberwave edge install
```

Follow the prompts to log in and select your environment. The CLI will install a systemd service and a Docker driver for the SO-101.

### Set up the Cyberwave environment

Before calibrating, you need a Cyberwave environment with the SO-101 arm and camera twins added and paired to the hardware. The full reference is in [SO-101 Get Started: Set Up the Cyberwave Environment](/hardware/so101/get-started#step-1-set-up-the-cyberwave-environment).

1. In the Cyberwave dashboard, click **New Environment** and give it a name (e.g. *"Sandwich Bot"*).
2. Click **Add from Catalog**, search for **SO101**, and add it to the environment. Position it to roughly match your physical setup.
3. Click **Add from Catalog** again, search for **Standard Camera**, and add it.
4. Open the camera twin in **Edit Mode**, set **Dock to Twin** to the **SO101** twin, and set **Parent Root** to **wrist**. The camera will now follow the arm's end-effector during teleoperation and recording.
5. Pair the hardware by following the terminal prompts from `cyberwave edge install`: select your environment, select the SO101 twin (the driver auto-installs), then repeat for the camera twin.

<Tip>
  The wrist-mounted camera is the visual input SmolVLA learns from. Lock the
  dock position now: any later change invalidates the demonstrations you'll
  record in Step 3.
</Tip>

### Calibrate the arms in the product UI

Calibration teaches the software where each joint's zero position is, what its valid movement range is, and how the physical arm maps to the software model. Without it, joint commands won't translate correctly to hardware. For the full reference, see [SO-101 Get Started](/hardware/so101/get-started#step-3-calibrate-the-arms).

<Warning>
  You must calibrate each arm individually. For this dual-arm setup, complete
  calibration for both the leader and the follower.
</Warning>

1. Open the **Cyberwave dashboard** and navigate to your environment.
2. Select the SO101 twin. You'll see an option to **Calibrate** both arms (leader and follower).
3. Click **Calibrate** and follow the on-screen prompts.
4. **Manually move every joint** of the leader arm through its full range when prompted.
5. Repeat for the **follower arm**.
6. Once both arms are calibrated, the platform will confirm calibration is complete.

<Tip>
  Move each joint slowly and through its full range. Accurate calibration
  directly improves control precision during teleoperation and the quality of
  the demonstrations you'll record in Step 3.
</Tip>

### Calibrate the arms via the CLI (alternative)

If you'd rather skip the dashboard, run calibration from inside the driver container:

```bash theme={null}
# Follower (12V arm)
docker exec -it $(docker ps -q --filter name=cyberwave-driver) \
    python -m scripts.cw_calibrate --type follower --port /dev/ttyACM1 --id follower1

# Leader (5V arm)
docker exec -it $(docker ps -q --filter name=cyberwave-driver) \
    python -m scripts.cw_calibrate --type leader --port /dev/ttyACM0 --id leader1
```

### Connect via the Python SDK

```python theme={null}
from cyberwave import Cyberwave

cw = Cyberwave(api_key="your_api_key")
cw.affect("live")  # Essential: starts MQTT connection

robot = cw.twin(
    "the-robot-studio/so101",
    twin_id="your_twin_id",
    environment_id="your_env_id",
)
```

<Tip>
  **Key discovery.** `cw.affect("live")` must be called before any robot
  interaction. Without it, MQTT never connects and all joint commands silently
  fail.
</Tip>

## Step 2: Understand the Cyberwave SDK

### Read joint state

For the SO-101 driver, read joint state by subscribing to live updates with `subscribe_joints()`:

```python theme={null}
import threading

positions = {}
lock = threading.Lock()

SDK_KEY_TO_NAME = {
    "_1": "shoulder_pan", "_2": "shoulder_lift",
    "_3": "elbow_flex", "_4": "wrist_flex",
    "_5": "wrist_roll", "_6": "gripper",
}

def on_joint_update(data):
    if data.get("source_type") != "edge_leader":
        return
    with lock:
        for key, val in data["positions"].items():
            name = SDK_KEY_TO_NAME.get(key)
            if name:
                positions[name] = float(val)

robot.subscribe_joints(on_joint_update)
```

### Send joint commands

Commands work with standard joint names:

```python theme={null}
robot.joints.set("shoulder_pan", 30, degrees=True)
robot.joints.set("gripper", 10, degrees=True)
```

### Capture camera frames

Use `capture_frame()` on the robot twin, with `sensor_id` set to the wrist camera you docked in Step 1. The SDK supports several output formats:

```python theme={null}
import time
time.sleep(3)  # Let the WebRTC stream warm up on first connect

jpg_bytes = robot.capture_frame("bytes", sensor_id="wrist_cam")
frame     = robot.capture_frame("numpy", sensor_id="wrist_cam")  # for ML inference
```

See the [Python SDK reference](/tools/python-sdk#frame-capture) for all supported output formats (`"numpy"`, `"pil"`, `"bytes"`, or a temp JPEG file path).

## Step 3: Record demonstrations

With teleoperation running, the author's [`record_demos.py`](https://github.com/siddharth270/Cyberwave-RoboticChef/blob/main/scripts/record_demos.py) records synchronized joint states and camera frames while you move the leader arm:

```bash theme={null}
python scripts/record_demos.py \
    --task "Make a sandwich with bread, cheese, and bread on top" \
    --episodes 10
```

Each episode captures:

* Joint positions at 30 Hz via `subscribe_joints()` (see [Read joint state](#read-joint-state))
* Wrist-camera frames at 30 Hz via the Cyberwave SDK (see [Capture camera frames](#capture-camera-frames))
* Output written to `data/episodes/episode_NNNN/` as `states.npy`, `actions.npy`, `timestamps.npy`, plus a `frames/` directory of JPEGs and an `episode.json` manifest

<Tip>
  Record 10 to 15 consistent demonstrations. Vary the starting positions
  slightly between episodes for better generalization.
</Tip>

## Step 4: Convert to LeRobot format

LeRobot expects a specific dataset structure with parquet files, video, and metadata. The author's [`convert_dataset.py`](https://github.com/siddharth270/Cyberwave-RoboticChef/blob/main/scripts/convert_dataset.py) walks every recorded episode, transcodes the JPEG sequence to MP4 with `ffmpeg`, and writes parquet plus the LeRobot `meta/` files:

```bash theme={null}
python scripts/convert_dataset.py \
    --episodes-dir data/episodes \
    --output-dir data/processed \
    --repo-id your_username/sandwich-bot-demos
```

The output looks like this:

```
data/processed/
├── data/chunk-000/episode_000000.parquet     # state + action per frame
├── videos/chunk-000/observation.images.image/episode_000000.mp4
└── meta/{info,stats,episodes,tasks}.{json,jsonl}
```

To upload directly to the Hugging Face Hub after conversion, add `--upload`.

<Tip>
  If LeRobot's loader rejects the dataset, rebuild with the native API for
  guaranteed compatibility. See
  [LeRobotDataset.create](https://github.com/huggingface/lerobot) and adapt the
  same `state` / `action` / `observation.images.image` keys the converter
  writes.
</Tip>

## Step 5: Fine-tune SmolVLA

SmolVLA is a 450M-parameter VLA pre-trained on SO100 / SO101 community data. Fine-tuning on your specific task takes \~1.5 hours on a T4 GPU:

```bash theme={null}
lerobot-train \
    --policy.path=lerobot/smolvla_base \
    --dataset.repo_id=your_username/sandwich-bot-demos \
    --dataset.root=data/rebuilt \
    --batch_size=8 \
    --steps=5000 \
    --policy.device=cuda \
    --rename_map='{"observation.images.image": "observation.images.camera1"}'
```

## Step 6: Deploy for inference

Run the trained model in a closed loop using the author's [`run_sandwich_vla.py`](https://github.com/siddharth270/Cyberwave-RoboticChef/blob/main/scripts/run_sandwich_vla.py):

```bash theme={null}
python scripts/run_sandwich_vla.py --mode sequence --device mps --hz 10
```

The inference loop:

1. Captures a camera frame via the Cyberwave SDK.
2. Reads the current joint state via `subscribe_joints()`.
3. Feeds both plus the language instruction to SmolVLA.
4. The model outputs 6 joint targets.
5. Sends them to the robot via `joints.set()`.
6. Repeats at 10 Hz.

## Conclusion

A complete imitation learning pipeline, from teleoperation to autonomous execution, can be built in under three weeks using Cyberwave and SmolVLA. The key insight is that modern VLA models need remarkably few demonstrations when the base model is pre-trained on similar robots.

***

*Built by Siddharth Mehta as part of the Cyberwave Builder Program.*
