Call Model (call_model) nodes that select an edge or hybrid speech-to-text catalog model run inference on-device via cw.models.load(...).predict(...). Cloud-only STT uses client.mlmodels.run(...) and does not require edge STT packages.
For STT models, a prompt is optional and only biases the transcription; see Prompting Call Model.

Catalog models and dependencies

Each STT seed sets one SDK extra in metadata.edge_package. Edge-sync surfaces it in model_requirements.
| Model | Runtime | Edge extra | Install |
|---|---|---|---|
| Whisper Tiny EN Q5_1 | whisper_cpp | ml-stt | pip install 'cyberwave[ml-stt]' |
| Whisper Base EN Q5_1 | whisper_cpp | ml-stt | same |
| Whisper Small EN Q5_1 | whisper_cpp | ml-stt | same |
| Whisper Tiny Multilingual Q5_1 | whisper_cpp | ml-stt | same (+ cloud fallback) |
| Whisper Base Multilingual Q5_1 | whisper_cpp | ml-stt | same (+ cloud fallback) |
| Faster Whisper Tiny EN | faster_whisper | ml-stt-faster | pip install 'cyberwave[ml-stt-faster]' |
| Faster Whisper Base EN | faster_whisper | ml-stt-faster | same |
| Faster Whisper Small EN | faster_whisper | ml-stt-faster | same |
Do not point catalog metadata at ml-all. Pick the extra for the runtime you use.
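As a minimal sketch of the runtime-to-extra mapping in the table above, the helper below builds the install command for a given runtime. The names RUNTIME_TO_EXTRA and install_command are hypothetical and not part of the SDK; only the runtime names, extras, and pip commands come from the table.

```python
# Runtime -> SDK extra, copied from the catalog table above.
RUNTIME_TO_EXTRA = {
    "whisper_cpp": "ml-stt",
    "faster_whisper": "ml-stt-faster",
}


def install_command(runtime: str) -> str:
    """Return the pip command for a runtime's extra (hypothetical helper)."""
    try:
        extra = RUNTIME_TO_EXTRA[runtime]
    except KeyError:
        # Unknown runtimes fail loudly instead of falling back to ml-all.
        raise ValueError(f"unknown STT runtime: {runtime!r}") from None
    return f"pip install 'cyberwave[{extra}]'"


if __name__ == "__main__":
    print(install_command("faster_whisper"))
```

Raising on an unknown runtime, rather than defaulting to a catch-all extra, mirrors the guidance not to point at ml-all.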

Weight paths (edge_model_path)

| Model family | Cache / load path | Notes |
|---|---|---|
| whisper.cpp | File path to GGML binary, e.g. models/whisper/ggml-tiny.en-q5_1.bin | download_url in metadata; first run fetches into edge model cache |
| faster-whisper | Cache key = model_external_id (e.g. tiny.en); metadata.edge_model_path is documentation only | Edge Core pre-downloads via catalog model_external_id; CTranslate2 cache under /app/models/tiny.en/ |
Generated workers emit:
model = cw.models.load(
    "tiny.en",
    runtime="faster_whisper",
    faster_whisper_model_id="tiny.en",
    download_root="tiny.en",
    ...
)

Compile server

When a workflow has run_on_edge: true and a Call Model node references an on-device STT model, compile verifies imports:
| Runtime | Import checked | base.txt package |
|---|---|---|
| whisper_cpp | pywhispercpp.model.Model | pywhispercpp==1.4.1 |
| faster_whisper | faster_whisper.WhisperModel | faster-whisper>=1.1.1 |
Failure example:
This workflow uses Call Model with faster-whisper but faster-whisper is not installed on the compile server. Install faster-whisper (or cyberwave[ml-stt-faster]) on Django before compiling for edge.
Rebuild Django after updating requirements/base.txt — see Edge workflow dependencies.
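The import check described above can be approximated locally before compiling. This is a sketch under assumptions, not the compile server's actual code: the function name runtime_available and the lookup table are hypothetical, and only the module and class names come from the table in this section.

```python
import importlib

# Runtime -> (module, attribute) pairs, taken from the compile-server table.
RUNTIME_IMPORTS = {
    "whisper_cpp": ("pywhispercpp.model", "Model"),
    "faster_whisper": ("faster_whisper", "WhisperModel"),
}


def runtime_available(runtime: str, imports=RUNTIME_IMPORTS) -> bool:
    """Return True if the runtime's module imports and exposes the class.

    Hypothetical helper; the real compile check may differ.
    """
    module_name, attr = imports[runtime]
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    for name in RUNTIME_IMPORTS:
        status = "ok" if runtime_available(name) else "missing"
        print(f"{name}: {status}")
```

Running this in the same environment as the compile server shows which runtime would trigger the failure message before a compile is attempted.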

Edge worker

  1. Install the catalog model’s extra (ml-stt or ml-stt-faster).
  2. Sync workflow — edge-sync lists model_requirements with edge_model_path.
  3. First inference downloads weights (or uses pre-staged files under ~/.cyberwave/models/).
Audio input: int16 PCM @ 16 kHz mono (or WAV bytes). The worker passes sample_rate_hz, channels, and optional language / task / vad_filter (Faster Whisper built-in VAD when enabled).
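To illustrate the expected audio format, the sketch below wraps raw int16 PCM in a WAV container and checks that WAV bytes are 16-bit, 16 kHz, mono. It uses only the Python standard library; the function names are hypothetical and the worker's own validation, if any, may differ.

```python
import io
import wave

EXPECTED_SAMPLE_RATE_HZ = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # int16


def pcm_to_wav_bytes(pcm: bytes,
                     sample_rate_hz: int = EXPECTED_SAMPLE_RATE_HZ,
                     channels: int = EXPECTED_CHANNELS) -> bytes:
    """Wrap raw int16 PCM frames in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_matches_stt_format(wav_bytes: bytes) -> bool:
    """Return True if the WAV bytes are 16-bit PCM, 16 kHz, mono."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return (
            wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


if __name__ == "__main__":
    silence = b"\x00\x00" * 1600  # 100 ms of silence at 16 kHz mono
    print(wav_matches_stt_format(pcm_to_wav_bytes(silence)))
```

A check like this is useful upstream of the Call Model node, since audio at another sample rate or channel count would need resampling or downmixing first.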

Choosing whisper.cpp vs Faster Whisper

| Aspect | whisper.cpp (ml-stt) | Faster Whisper (ml-stt-faster) |
|---|---|---|
| Hardware | Raspberry Pi 4 class | Pi 4+; Jetson/GPU for Base/Small |
| Weights | Quantized GGML (.bin) | CTranslate2 cache dir |
| Latency | Good for Tiny on CPU | Tiny EN optimized for real-time |
| Built-in VAD | No (use Audio Assistant upstream) | Yes (builtin_vad_filter on catalog) |

Typical wiring

audio_track → audio_assistant → wake_word_engine (optional)
  → call_model (STT catalog model)
  → fuzzy_matcher ← twin
  → virtual_controller
Wire the Call Model audio input from the upstream audio key. Wire its result (text) into Fuzzy Matcher’s Uncertain String (query).
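To show why STT text goes through a fuzzy match before it reaches a controller, here is a rough stand-in using Python's standard difflib. It is not the Fuzzy Matcher node's implementation; the function name, the command list, and the 0.6 cutoff are all assumptions for illustration.

```python
import difflib
from typing import List, Optional


def match_command(stt_text: str,
                  commands: List[str],
                  cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known command for a transcript, or None.

    Illustrative only: difflib stands in for the Fuzzy Matcher node.
    """
    normalized = stt_text.strip().lower()
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical command vocabulary.
    known = ["move forward", "stop", "turn left"]
    print(match_command("Move forwad", known))
```

Returning None for a transcript below the cutoff lets the downstream step ignore unrecognized speech instead of acting on a poor match.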

Edge dependencies: full matrix (all nodes + compile server)
Audio in Workflows: PCM format and pipelines
Fuzzy Matcher: map STT text to commands