
Physical AI Agents

A physical AI agent is an embodied intelligent system designed to interact directly with the physical world. It is equipped with sensory capabilities for perception, cognitive intelligence for reasoning and planning, and actuation systems for executing precise physical actions in dynamic environments. This is not just a robot with software. A traditional robot follows programmed instructions. A physical AI agent perceives, reasons, plans, acts, and adapts in a continuous loop, informed by real-time feedback from the environment. To learn more about physical AI agents and the Cyberwave stack, refer to the starter kit.

Robotic Hardware

In this builder program, developers will build Physical AI systems using two robotic platforms.
SO101 Robotic Arm — designed for:
  • object manipulation
  • pick & place
  • assembly
  • interaction with physical items
Don’t have your own SO101 arms? You can also use them via shared access to our robotic lab arms.
UGV Beast Rover — designed for:
  • locomotion
  • environment monitoring
  • inspection
  • autonomous navigation
Together, these platforms allow builders to explore a wide spectrum of Physical AI applications.

Major Physical AI Use Cases

Physical AI systems often fall into two foundational capability categories: manipulation and locomotion. Most real-world robotics applications are built on one or both of these; they are the building blocks of how robots interact with the physical world.

Manipulation

Manipulation is the ability of a robot to physically interact with objects in its environment: grasping, moving, placing, assembling, or transforming them. It is the core capability behind any task that requires a robot to do something with its hands. This is the primary capability of the SO101 robotic arms. Why manipulation matters: Most human work involves interacting with objects. Manufacturing, healthcare, food preparation, logistics, and lab work are all fundamentally manipulation tasks.

Typical Manipulation Tasks

  • Pick and place
  • Sorting objects by type, size, or colour
  • Stacking and assembling components
  • Packaging and palletising products
  • Tool use and handovers

Manipulation Pipeline

A typical manipulation system follows this architecture:
Input (camera / depth sensor)

Object Detection — identify what's in the workspace

Pose Estimation — determine each object's position and orientation

Task Planning — decide what to do and in what order

Grasp Planning — compute how the gripper should approach and hold the object

Motion Planning — generate a collision-free trajectory for the arm

Arm Execution — move the arm along the planned path

Grasp Feedback — verify success and correct if needed
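The stages above can be sketched as a chain of small functions, each consuming the previous stage's output. This is an illustrative skeleton, not the SO101 API: every function name and data shape here is a placeholder standing in for a real detector, planner, or controller.

```python
# Illustrative manipulation pipeline skeleton (all names are placeholders,
# not a real robot API). Each stage consumes the previous stage's output.

def detect_objects(frame):
    # Stand-in for an object detector: labels plus pixel bounding boxes.
    return [{"label": "red_cube", "box": (120, 80, 40, 40)}]

def estimate_pose(detection):
    # Stand-in for pose estimation: map a 2D box to a rough 3D position.
    x, y, w, h = detection["box"]
    return {"label": detection["label"], "position": (x / 1000, y / 1000, 0.10)}

def plan_task(poses):
    # Task planning: decide what to do and in what order.
    return [("pick", p["label"]) for p in poses] + [("place", "tray")]

def plan_motion(step):
    # Stand-in for grasp + motion planning: a collision-free trajectory.
    return {"step": step, "waypoints": [(0.1, 0.1, 0.3), (0.2, 0.2, 0.1)]}

def execute(trajectory):
    # Stand-in for arm execution; reports success for the feedback stage.
    return {"step": trajectory["step"], "success": True}

frame = object()  # placeholder camera frame
poses = [estimate_pose(d) for d in detect_objects(frame)]
results = [execute(plan_motion(s)) for s in plan_task(poses)]
print([r["step"] for r in results])
```

Each stage can be swapped independently, which is exactly why the pipeline is drawn as separate boxes above.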

Locomotion

Locomotion is the ability of a robot to move through and navigate its environment, whether that's a warehouse floor, an outdoor field, or a construction site. It is the core capability behind any task that requires a robot to go somewhere. This capability is provided by the UGV Beast Rover. Why locomotion matters: A robot that can only stay in one place is limited to the tasks within arm's reach. Locomotion unlocks an entirely new class of applications: inspection, delivery, patrol, monitoring, and mapping, where the robot needs to autonomously traverse space, avoid obstacles, and adapt to changing environments. Combined with perception and AI, mobile robots become autonomous agents capable of operating across large, dynamic areas without human intervention.

Typical Locomotion Tasks

  • Security patrol and perimeter monitoring
  • Warehouse delivery and transport
  • Infrastructure inspection
  • Environmental monitoring and mapping
  • Search and rescue operations

Locomotion Pipeline

Input (LiDAR / camera / IMU)

Environment Perception — understand surroundings via sensors

Localisation — determine the robot's position in the world

Mapping — build or update a map of the environment

Path Planning — compute an efficient route to the goal

Motion Control — translate the plan into wheel/motor commands

Robot Navigation — execute movement along the path

Sensor Feedback — continuously update perception and correct course
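The perceive, plan, act, correct cycle above can be illustrated with a toy one-dimensional navigation loop. The sensor and motor calls are placeholders, not the UGV rover's real interface; the point is the structure of re-reading position on every cycle before commanding the next move.

```python
# Minimal navigation feedback loop on a 1D corridor (pure simulation; the
# sensor and motor calls are placeholders, not the UGV's real interface).

def read_position(state):
    return state["position"]          # localisation stand-in

def plan_step(position, goal):
    # Path planning reduced to its simplest form: step toward the goal.
    return 1 if goal > position else -1 if goal < position else 0

def drive(state, command):
    state["position"] += command      # motion control stand-in

def navigate(state, goal, max_steps=100):
    for _ in range(max_steps):
        pos = read_position(state)    # sensor feedback, every cycle
        if pos == goal:
            return True
        drive(state, plan_step(pos, goal))
    return False                      # give up after too many cycles

rover = {"position": 0}
reached = navigate(rover, goal=7)
print(reached, rover["position"])
```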

Idea Templates

These templates help builders generate new ideas quickly based on a recurring pattern.

Template 1: Sorting System

Problem pattern: A set of mixed objects needs to be classified by type, size, colour, or condition and placed into the correct destination. This is one of the most common manipulation tasks; it appears in nearly every industry where items need to be separated before processing. Why it matters: Sorting is repetitive, error-prone, and often time-sensitive for humans. A vision-guided robotic arm can sort faster, more consistently, and around the clock. Architecture:
Camera input → Object detection → Classification (type/size/colour) → Grasp planning → Pick → Place into correct bin
Each step is modular: you can swap the classifier (rule-based, ML model, VLM) without changing the rest of the pipeline.
Examples: medication sorting, recycling sorting, warehouse SKU sorting, coin/currency sorting, seed sorting, defective part rejection
Robot: SO101 arm
How it can be implemented with Cyberwave:
Approach: Imitation Learning + VLA
Create Digital Twin of sorting environment

Connect SO101 Arm with Cyberwave platform

Record demonstrations of sorting task (teleoperate arm)

Collect dataset (camera images + robot states + actions + instructions)

Train a Vision-Language-Action (VLA) model on the dataset

Deploy trained model to the robot through Cyberwave

Provide language prompt (e.g., "Sort medicines into trays")

VLA model processes vision + instruction

Robot executes sorting actions with the SO101 arm
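The last three steps form a closed control loop at deployment time: each cycle the trained VLA model maps a camera frame plus the language instruction to an action. A minimal sketch of that loop, with a trivial stub standing in for the trained network (Cyberwave's actual deployment interface is not shown):

```python
# Deployed VLA control loop, sketched. vla_policy is a stub standing in for a
# trained Vision-Language-Action network; a real one runs inference on images.

def vla_policy(image, instruction):
    # A real VLA processes vision + language jointly; this stub keys off the
    # instruction text only, purely for illustration.
    if "sort" in instruction.lower():
        return {"action": "pick_and_place", "target": "tray"}
    return {"action": "idle"}

instruction = "Sort medicines into trays"
actions = []
for step in range(3):                  # a few control cycles
    image = f"frame_{step}"            # placeholder for a camera frame
    actions.append(vla_policy(image, instruction))
print(actions[0])
```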

Template 2: Monitoring System

Problem pattern: A physical space needs to be observed regularly to detect changes, anomalies, or hazards. Instead of deploying static cameras everywhere, a mobile robot patrols the area and captures data from multiple vantage points. Why it matters: Static sensor networks are expensive, have blind spots, and can’t adapt. A mobile robot can cover large, changing environments on a schedule or on-demand, and flag issues in real time. Architecture:
Scheduled/triggered patrol → Sensor capture (camera/LiDAR/thermal) → Anomaly detection → Alert/report
The detection layer can range from simple change detection (comparing frames over time) to ML-based anomaly classification.
Examples: warehouse monitoring, crop health monitoring, construction site inspection, security patrol, pipeline/infrastructure inspection, solar farm inspection
Robot: UGV rover
How it can be implemented with Cyberwave:
Approach: Monitoring Agent with UGV Rover
Create digital twin of the monitoring environment

Connect UGV Beast Rover to the Cyberwave platform

Connect a controller 

Create a monitoring workflow in Cyberwave

UGV patrols and collects sensor data (camera, IMU, telemetry)

Stream input data from the rover to the workflow

Pass the data to an LLM / vision model for analysis

Detect anomalies or events (intrusions, hazards, unusual activity)

Trigger alerts, reports, or automated actions

Robot responds or continues patrol based on workflow logic
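The simplest detection layer mentioned above, frame-over-frame change detection, can be sketched as follows. Frames are plain 2D lists of grayscale values, and both thresholds are illustrative rather than tuned.

```python
# Simple change-detection layer for a patrol workflow: compare consecutive
# grayscale frames (plain 2D lists here) and flag an anomaly when enough
# pixels changed. Both thresholds are illustrative, not tuned values.

def changed_fraction(prev, curr, pixel_threshold=10):
    changed = total = 0
    for row_a, row_b in zip(prev, curr):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total

def is_anomaly(prev, curr, frame_threshold=0.25):
    return changed_fraction(prev, curr) > frame_threshold

quiet = [[100] * 4 for _ in range(4)]                      # nothing changed
intruder = [[100] * 4 for _ in range(2)] + \
           [[200] * 4 for _ in range(2)]                   # half the frame changed
print(is_anomaly(quiet, quiet), is_anomaly(quiet, intruder))
```

In the workflow, a True result would trigger the alert step; an ML-based classifier can replace is_anomaly without changing the surrounding logic.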

Template 3: Assembly Assistant

Problem pattern: Multiple parts must be picked, oriented, and combined in a specific sequence to produce a finished product. The robot follows a recipe: a fixed or AI-planned sequence of manipulation steps. Why it matters: Assembly tasks require precision, repeatability, and often specific sequencing. Robots can maintain consistent quality across thousands of repetitions and handle parts too small or too fast for human hands. Architecture:
Part detection → Sequence planning → Pick part A → Orient → Place/attach → Pick part B → Orient → Attach → Verify assembly
The sequence can be hardcoded for fixed products or generated by an LLM/planner for flexible assembly.
Examples: toy assembly, electronics assembly, packaging and kitting, sandwich/meal assembly, LEGO construction, furniture part pre-assembly
Robot: SO101 arm

Template 4: Delivery System

Problem pattern: Objects must be transported between two or more locations within a facility. The robot is loaded with a payload at point A and delivers it to point B, navigating through a shared space with obstacles and people. Why it matters: Internal logistics (moving things between rooms, floors, or stations) is one of the biggest time sinks in hospitals, warehouses, offices, and factories. Autonomous delivery frees up human workers for higher-value tasks. Architecture:
Pickup request → Navigate to source → Load payload → Plan route to destination → Navigate (with obstacle avoidance) → Deliver → Confirm dropoff
Can be extended with multi-stop routes, priority queues, and fleet coordination for multi-robot delivery systems.
Examples: hospital supply delivery, warehouse inter-station transport, office mail/package delivery, restaurant food delivery, lab sample transport
Robot: UGV rover
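The "plan route to destination" and "navigate (with obstacle avoidance)" steps can be illustrated with breadth-first search over a small occupancy grid, where 1 marks an obstacle. Real systems plan over costmaps with algorithms such as A*, but the structure is the same: find a route around blocked cells, or report failure.

```python
# Route planning on an occupancy grid via breadth-first search (illustrative;
# production planners use costmaps and A*/D*-style algorithms).

from collections import deque

def plan_route(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    queue, came_from = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []                          # reconstruct the route
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc] \
                    and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no route exists: report failure instead of delivering

grid = [
    [0, 0, 0],
    [1, 1, 0],   # obstacle row forces a detour
    [0, 0, 0],
]
route = plan_route(grid, (0, 0), (2, 0))
print(route)
```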

Template 5: Interactive Demonstrator

Problem pattern: A robot demonstrates physical concepts, performs for an audience, or engages in real-time interaction with a human. The goal is communication, education, or entertainment rather than production. Why it matters: Robots that interact with people in classrooms, museums, retail spaces, or events make abstract concepts tangible. They’re also a powerful way to prototype human-robot interaction patterns before deploying in production settings. Architecture:
User input (voice/gesture/text) → Intent recognition → Action planning → Robot performance → Feedback/response
Typically uses a VLM or LLM for natural language understanding and can incorporate gesture recognition, speech synthesis, and expressive motion.
Examples: classroom teaching robots, museum exhibit guides, trade show demonstrators, robotic game players, art and drawing robots, rehabilitation exercise coaches
Robot: SO101 arm or UGV Beast Rover (depending on whether the demo is manipulation or movement-based)

Template 6: Quality Inspection

Problem pattern: Products or components on a line need to be visually inspected for defects, damage, or non-conformance before moving to the next stage. The robot examines each item and flags or rejects failures. Why it matters: Human visual inspection is inconsistent: fatigue, lighting, and speed all introduce errors. A camera-equipped robot arm can inspect at consistent quality, at speed, and log every result for traceability. Architecture:
Camera capture → Defect detection (ML model) → Classification (pass/fail/category) → Accept or reject (pick to reject bin)
Examples: PCB solder joint inspection, food quality grading, packaging seal verification, paint/surface defect detection, pharmaceutical label verification
Robot: SO101 arm
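The classification and accept/reject stages can be sketched as follows. The scoring function is a placeholder for a trained defect-detection model, and the feature names and threshold are invented for illustration.

```python
# Accept/reject stage of a quality inspection pipeline, sketched. defect_score
# stands in for a trained ML model; features and threshold are hypothetical.

def defect_score(item):
    # Stand-in for an ML defect detector: higher means more defective.
    return item["scratches"] * 0.3 + item["misalignment"] * 0.7

def inspect(items, threshold=0.5):
    bins = {"accept": [], "reject": []}
    for item in items:
        verdict = "reject" if defect_score(item) > threshold else "accept"
        bins[verdict].append(item["id"])   # log every result for traceability
    return bins

items = [
    {"id": "A1", "scratches": 0.1, "misalignment": 0.0},
    {"id": "A2", "scratches": 1.0, "misalignment": 0.9},
]
result = inspect(items)
print(result)
```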

Template 7: Mapping and Survey

Problem pattern: A physical space needs to be digitised by creating a 2D or 3D map, measuring dimensions, or documenting its current state. The robot systematically traverses the space and captures spatial data. Why it matters: Manual surveying is slow and requires specialised skill. A robot with LiDAR or depth cameras can autonomously map a space in a fraction of the time, producing consistent, repeatable results. Architecture:
Navigation → SLAM (simultaneous localisation and mapping) → Point cloud / map generation → Post-processing → Export
Examples: warehouse layout mapping, construction progress documentation, real estate floorplan generation, agricultural field mapping, disaster area assessment
Robot: UGV rover

Template 8: Pick-and-Handover

Problem pattern: A robot picks up an object and hands it to a human (or receives an object from a human). This requires understanding human intent, timing, and safe force control. Why it matters: Human-robot handover is a fundamental interaction pattern for assistive robots, collaborative manufacturing, and service robots. Getting it right (safe, natural, well-timed) is a key challenge in Physical AI. Architecture:
Object detection → Grasp planning → Pick → Human detection / hand tracking → Approach → Handover (force-controlled release) → Confirm
Examples: surgical tool handover, warehouse collaborative picking, assistive robots for elderly/disabled, retail item handoff, lab equipment passing
Robot: SO101 arm

Example Idea Categories

Physical AI systems can be applied across many industries. Below are several categories along with example problems and project ideas. Each idea follows the same core structure:
Input → Perception → Reasoning → Robot Action → Feedback
Builders can adapt these ideas using the SO101 Arm or the UGV Beast Rover.

Education

Education | Robot: SO101 Arm
Education often lacks interactive physical demonstrations for concepts in science, robotics, and computer science. Robots can demonstrate ideas through physical interaction with objects.
  • AI Chess Playing Arm — plays chess by detecting board state and moving pieces
  • Algorithm Demonstration Robot — demonstrates sorting algorithms physically by detecting, classifying, and rearranging coloured blocks. vision → LLM planner → pick/place skill
  • Physics Experiment Robot — performs experiments around centre of mass, balance, and stacking stability
  • Robotics Training Assistant — helps students learn pick, rotate, and place fundamentals hands-on

Healthcare

Healthcare | Robot: SO101 Arm + UGV Rover
Healthcare systems contain many repetitive physical tasks that can be automated, from sorting medication to transporting supplies between departments.
  • Medication Sorting Assistant — identifies medication packs and sorts them by dosage schedule. detect pack → read label → place in tray (SO101)
  • Lab Sample Organizer — sorts laboratory sample tubes by label into the correct rack positions (SO101)
  • Hospital Delivery Rover — transports supplies between departments. pickup → navigate corridor → deliver (UGV)
  • Patient Monitoring Rover — patrols hospital corridors to detect emergencies and alert staff (UGV)
  • Surgical Tool Organizer — arranges surgical tools during preparation (SO101)

Cooking and Food Automation

Cooking and Food Automation | Robot: SO101 Arm
Food preparation contains many repetitive steps (picking, sorting, assembling, and plating) that are well-suited to robotic manipulation.
  • Sandwich Assembly Robot — picks bread, adds ingredients, and assembles sandwiches. pick → layer → assemble
  • Ingredient Sorting Robot — organises ingredients by type (vegetables, fruits, spices)
  • Drink Mixing Robot — picks bottles, pours liquids, and serves glasses
  • Meal Plating Assistant — arranges food items on plates for consistent presentation
  • Kitchen Cleanup Assistant — moves used utensils into wash trays

Finance

Finance | Robot: SO101 Arm
Financial institutions still process physical items like currency, coins, and documents. Robotic arms can automate these high-volume, precision-sensitive tasks.
  • Coin Sorting Robot — sorts coins by denomination using vision classification
  • Cash Counting Assistant — stacks and organises currency bundles
  • Document Processing Robot — scans and organises physical documents into sorted trays
  • Fraud Detection Demo Robot — detects counterfeit currency using vision models

Retail

Retail | Robot: SO101 Arm + UGV Rover
Retail stores constantly manage inventory: restocking shelves, checking stock levels, and organising products across large floor areas.
  • Shelf Restocking Assistant — places products in correct shelf locations (SO101)
  • Shelf Inspection Rover — patrols store aisles to detect empty shelves and misplaced items (UGV)
  • Smart Inventory Counter — scans products on shelves for automated stock counts (UGV)
  • Product Sorting System — organises items in back-of-store warehouses (SO101)

Agriculture

Agriculture | Robot: UGV Beast Rover
Farms require continuous monitoring across large, open areas, which makes them a natural fit for mobile robots equipped with cameras and sensors.
  • Crop Health Monitoring Rover — scans crops to detect disease, nutrient deficiency, or pest damage
  • Soil Sampling Robot — collects soil samples at predetermined points for lab analysis
  • Pest Detection Rover — identifies pests using vision models and flags affected areas
  • Irrigation Inspection Rover — checks irrigation systems for leaks, blockages, and pressure drops

Security and Surveillance

Security and Surveillance | Robot: UGV Beast Rover
Large facilities require continuous monitoring that's impractical with fixed cameras alone. Mobile patrol robots provide adaptive, comprehensive coverage.
  • Autonomous Patrol Robot — patrols property on a schedule and detects anomalies
  • Intrusion Detection Rover — detects unauthorised people in restricted areas
  • Night Surveillance Robot — patrols using thermal cameras for low-visibility conditions
  • Smart Alarm Robot — responds to triggered alarms by navigating to the source and reporting

Logistics and Warehousing

Logistics and Warehousing | Robot: UGV Rover + SO101 Arm
Warehouses require efficient movement and monitoring of goods, combining mobile robots for transport with arms for sorting and packing.
  • Warehouse Delivery Robot — transports packages between stations (UGV)
  • Inventory Scanning Rover — scans barcodes on shelves for automated inventory tracking (UGV)
  • Package Sorting Arm — sorts parcels into destination bins (SO101)
  • Automated Packing Station — picks items and packs them into boxes (SO101)

Smart Homes

Smart Homes | Robot: SO101 Arm
Homes contain many small repetitive tasks (tidying, retrieving, and organising) that are ideal for a desk-scale robotic arm.
  • Desk Organizer Robot — sorts and arranges items on a desk
  • Object Retrieval Assistant — fetches a requested item from a known set of locations
  • Laundry Sorting Robot — separates clothes by type or colour
  • Toy Cleanup Robot — picks up and organises scattered toys
  • Laundry Folding Assistant — folds clothes automatically using two robotic arms working together

Construction and Infrastructure

Construction and Infrastructure | Robot: UGV Rover
Construction sites require regular inspection and monitoring for safety, progress, and compliance, often across large, hazardous areas.
  • Construction Inspection Rover — detects hazards and unsafe conditions on site
  • Equipment Monitoring Rover — checks machinery status and reports anomalies
  • Safety Compliance Robot — scans the site for PPE violations and restricted area breaches
  • Progress Monitoring Robot — records construction progress with timestamped photos and video

Creative and Entertainment

Creative and Entertainment | Robot: SO101 Arm
Interactive experiences in museums, events, classrooms, and art installations can be enhanced with robots that perform, create, and engage.
  • Robotic Artist — draws pictures or paints using a pen/brush attached to the gripper
  • Interactive Museum Robot — performs live demonstrations for visitors
  • Robotic Game Player — plays board games (chess, checkers) by detecting state and moving pieces

Architecture of Physical AI Systems

Most Physical AI systems follow a similar layered architecture. This architecture is useful because it separates perception, intelligence, and execution.
Input

Perception

World Model

Reasoning / Planning

Skill or Navigation Policy

Robot Execution

Feedback & Replanning
This structure is common across most industrial robotics systems.
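A minimal sketch of how the seven layers compose into one closed loop. All functions here are illustrative stand-ins, not a real robot stack; each layer could be replaced by a real perception model, planner, or controller without changing the loop's shape.

```python
# The seven layers composed into one cycle: perceive, update the world model,
# plan, execute skills, then check feedback. Every function is a stand-in.

def perceive(env):
    # Layer 2: sensors -> structured observations.
    return {"red_cube_visible": env["cube_in_workspace"]}

def update_world_model(model, percept):
    # Layer 3: fold new observations into the world model.
    model.update(percept)
    return model

def plan(model, goal):
    # Layer 4: decide what to do next, given the world model.
    return ["pick", "place"] if model.get("red_cube_visible") else ["search"]

def execute_skill(env, skill):
    # Layers 5-6: a named skill drives the hardware; returns success feedback.
    if skill == "place":
        env["cube_in_tray"] = True
    return True

env = {"cube_in_workspace": True, "cube_in_tray": False}
world = {}
for skill in plan(update_world_model(world, perceive(env)), goal="cube_in_tray"):
    assert execute_skill(env, skill)   # Layer 7: a real system replans on failure
print(env["cube_in_tray"])
```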

Layer 1: Input

This layer captures the trigger or instruction that initiates a task. Inputs can come from:
  • voice commands
  • text instructions
  • API requests
  • sensor triggers
  • scheduled routines
Example input:
"Pick up the red cube and place it in the tray"
or
"Patrol the warehouse aisle"
Input systems may include:
  • microphones
  • buttons
  • mobile apps
  • web interfaces
  • automated triggers

Layer 2: Perception

The robot must observe and understand the environment. Sensors commonly used:
  • RGB cameras
  • depth cameras
  • LiDAR
  • ultrasonic sensors
  • microphones
Perception models extract useful information from sensor data. Examples include:
  • object detection
  • semantic segmentation
  • pose estimation
  • visual SLAM
  • audio recognition
Example perception output:
red_cube → position (0.32, 0.41, 0.12)
blue_cube → position (0.45, 0.29, 0.10)
table_surface detected
For mobile robots, perception may also include:
obstacle detected
corridor detected
open path available

Layer 3: World Model

Raw perception data is converted into a structured representation of the environment. This is called the world model. A world model represents:
  • object locations
  • robot position
  • environment layout
  • task context
Example representation:
objects:
  red_cube:
    position: (0.25, 0.32, 0.10)

  tray:
    position: (0.55, 0.20, 0.12)

robot:
  arm_position: (0.10, 0.15, 0.40)
For mobile robots:
map:
  corridor
  obstacles
  docking station
robot_position:
  (x,y)
This structured information allows reasoning systems to make decisions.
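The example representation above could be held in typed structures like these (a hypothetical in-memory world model, not a Cyberwave data format):

```python
# A hypothetical in-memory world model mirroring the example representation
# above: named objects with positions, plus the robot's own pose.

from dataclasses import dataclass, field

@dataclass
class WorldObject:
    name: str
    position: tuple  # (x, y, z) in metres

@dataclass
class WorldModel:
    objects: dict = field(default_factory=dict)
    robot_position: tuple = (0.0, 0.0, 0.0)

    def add(self, obj: WorldObject):
        self.objects[obj.name] = obj

    def locate(self, name: str):
        # Reasoning layers query the model instead of raw sensor data.
        return self.objects[name].position

world = WorldModel(robot_position=(0.10, 0.15, 0.40))
world.add(WorldObject("red_cube", (0.25, 0.32, 0.10)))
world.add(WorldObject("tray", (0.55, 0.20, 0.12)))
print(world.locate("red_cube"))
```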

Layer 4: Reasoning / Planning

This layer decides what the robot should do next. Approaches include:
  • LLM reasoning
  • rule-based logic
  • symbolic planning
  • behavior trees
  • task graphs
Example reasoning output:
Plan:
1 locate red cube
2 move arm to cube
3 grasp cube
4 move to tray
5 release cube
For locomotion:
Plan:
1 navigate to aisle 3
2 scan shelves
3 return to charging dock

Layer 5: Skill or Navigation Policy

Robots rarely generate raw motor commands directly. Instead, they use skills (for manipulation) or navigation policies (for locomotion). These are reusable capabilities that the robot can execute. Manipulation skills:
pick
place
push
stack
rotate
align
Locomotion skills:
navigate_to
scan_area
patrol
dock
avoid_obstacles
Skills act as an abstraction layer between reasoning and low-level control.
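One common way to realise this abstraction layer is a skill registry: the planner emits skill names, and a dispatcher maps them to callables. The names mirror the lists above; the implementations are stubs that mutate a state dict instead of driving motors.

```python
# Skill registry sketch: reasoning emits (skill_name, argument) pairs, the
# dispatcher looks up and runs the matching callable. Implementations are stubs.

SKILLS = {}

def skill(fn):
    # Register a function as a named skill.
    SKILLS[fn.__name__] = fn
    return fn

@skill
def pick(state, obj):
    state["holding"] = obj

@skill
def place(state, target):
    state.setdefault(target, []).append(state.pop("holding"))

def run_plan(state, plan):
    for name, arg in plan:
        SKILLS[name](state, arg)   # the reasoning layer never touches motors
    return state

state = run_plan({}, [("pick", "red_cube"), ("place", "tray")])
print(state)
```

Because the planner only ever names skills, the same plan can run on simulation or hardware by swapping the registered implementations.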

Layer 6: Robot Execution

At this stage the robot hardware performs actions. For the SO101 arm:
move_to(x,y,z)
close_gripper()
move_to(target)
open_gripper()
For the UGV rover:
navigate_to(location)
rotate(angle)
capture_image()
scan_environment()
This layer interacts with:
  • motors
  • actuators
  • grippers
  • wheels

Layer 7: Feedback and Replanning

Real-world environments are unpredictable. Robots must constantly verify whether actions succeed. Examples:
did the grasp succeed?
did the rover reach the destination?
is the object still visible?
If an action fails, the system may:
  • retry
  • update the world model
  • replan the task
This feedback loop is critical for reliable robotics systems.
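A minimal retry loop for this layer might look like the following. The grasp attempt is simulated to fail twice and then succeed, so the loop demonstrates verification and retry before escalating to task replanning.

```python
# Retry-then-replan sketch for the feedback layer. try_grasp simulates a grasp
# that fails twice, then succeeds, so the retry path is exercised.

attempts = {"count": 0}

def try_grasp():
    attempts["count"] += 1
    return attempts["count"] >= 3      # simulated: succeeds on the third try

def grasp_with_retries(max_retries=5):
    for _ in range(max_retries):
        if try_grasp():                # verify: did the grasp succeed?
            return True
        # on failure a real system would also update the world model here
    return False                       # give up: escalate to task replanning

ok = grasp_with_retries()
print(ok, attempts["count"])
```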

AI Approaches Used in Physical AI

Different AI techniques can power different layers of the system.

Classical Robotics

Traditional robotics pipelines combine:
vision
motion planning
control systems
Common frameworks:
  • ROS
  • MoveIt
  • OpenCV

LLM-Based Planning

Large Language Models can help interpret instructions and generate plans. Example:
User: "Sort the blocks by color"
LLM produces:
1 detect blocks
2 classify colors
3 pick block
4 place in color bin
LLMs are useful for:
  • natural language interfaces
  • task decomposition
  • skill orchestration
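The orchestration side of LLM planning can be sketched as parsing the model's free-text numbered plan into steps that can be dispatched to skills. The LLM call is replaced here by a canned response matching the example above; a real system would call an actual model API.

```python
# LLM planning orchestration, sketched: turn the model's numbered free-text
# plan into structured steps. call_llm is a placeholder, not a real API call.

def call_llm(prompt):
    # Canned response standing in for a real LLM; mirrors the example plan.
    return "1 detect blocks\n2 classify colors\n3 pick block\n4 place in color bin"

def parse_plan(text):
    steps = []
    for line in text.strip().splitlines():
        number, _, description = line.partition(" ")
        steps.append((int(number), description))
    return steps

plan = parse_plan(call_llm("Sort the blocks by color"))
print(plan[0])
```

Structured steps like these are what gets handed to the skill layer; free-form LLM output is never executed directly.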

Imitation Learning

Robots learn from demonstrations. Workflow:
human demonstrates task
robot records trajectory
model learns policy
Applications:
  • manipulation
  • repetitive tasks
  • assembly
Popular approaches include:
  • behavior cloning
  • ACT models
  • diffusion policies
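A toy instance of behavior cloning: fit a linear policy, action = w * state, to recorded (state, action) demonstration pairs with one-dimensional least squares. Real systems train neural policies (ACT, diffusion) on camera images and joint states, but the learning setup is the same: supervised regression from observations to demonstrated actions.

```python
# Behavior cloning reduced to its simplest form: supervised regression from
# recorded states to demonstrated actions, here with a single linear weight.

demos = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (state, demonstrated action)

# Closed-form least squares for one weight: w = sum(s*a) / sum(s*s)
w = sum(s * a for s, a in demos) / sum(s * s for s, _ in demos)

def policy(state):
    return w * state   # the learned policy imitates the demonstrator

print(w, policy(4.0))
```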

Reinforcement Learning

Robots learn through reward-based optimization. Example reward:
successful grasp = +1
dropped object = -1
RL is often used for:
  • dexterous manipulation
  • locomotion control
  • dynamic tasks
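The reward scheme above can drive even a very simple learner. This sketch compares two hypothetical grasp strategies using a running value estimate per strategy (a bandit, the simplest RL setting); the success probabilities are invented for the simulation and hidden from the agent, which learns only from +1/-1 rewards.

```python
# Reward-driven learning in miniature: a two-armed bandit over grasp
# strategies with the +1 (success) / -1 (drop) reward scheme from the text.

import random

random.seed(0)
success_prob = {"top_grasp": 0.9, "side_grasp": 0.2}   # hidden from the agent
values = {"top_grasp": 0.0, "side_grasp": 0.0}         # running value estimates
alpha = 0.1                                            # learning rate

for _ in range(500):
    action = random.choice(list(values))               # explore uniformly
    reward = 1 if random.random() < success_prob[action] else -1
    values[action] += alpha * (reward - values[action])

best = max(values, key=values.get)
print(best, round(values[best], 2))
```

After enough trials the value estimates approach each strategy's expected reward, so the agent identifies the more reliable grasp without ever being told the probabilities.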

Vision Language Action Models

These models map:
vision + language → robot actions
Examples include:
  • RT-2
  • OpenVLA
These models aim to create general-purpose robot intelligence.