MIT/ast-finetuned-audioset-10-10-0.4593). The model outputs 527 AudioSet class probabilities (multi-label sigmoid). No fine-tuning is required—scenarios map to existing AudioSet label strings.
Strict scenario isolation
Only labels belonging to the selected scenario are evaluated. For example, with Glass Break active, a passing police siren is ignored even if the model detects it strongly.
Scenarios (sub-modalities)
| Scenario | Monitored AudioSet labels (substring match) |
|---|---|
| Glass Break / Falling Objects | Glass, Shatter, Smash crash, Thump, Crash, Bang |
| Constant Alarm (Siren) | Alarm, Car alarm, Smoke detector, Siren, Ambulance (siren), Police car (siren) |
| Help! / Screaming | Screaming, Yell, Shout, Children shouting |
| Custom | User-defined comma-separated label substrings |
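
The scenario filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the node's actual implementation: the names `SCENARIOS` and `match_scenario` are hypothetical, the substring match is assumed to be case-insensitive, and the model output is assumed to be available as a mapping from AudioSet label string to sigmoid probability.

```python
# Hypothetical scenario table; substrings are copied from the table above.
SCENARIOS = {
    "Glass Break / Falling Objects": [
        "Glass", "Shatter", "Smash crash", "Thump", "Crash", "Bang",
    ],
    "Constant Alarm (Siren)": [
        "Alarm", "Car alarm", "Smoke detector", "Siren",
        "Ambulance (siren)", "Police car (siren)",
    ],
    "Help! / Screaming": ["Screaming", "Yell", "Shout", "Children shouting"],
}


def match_scenario(probs, substrings, threshold=0.40):
    """Return (label, probability) pairs for the active scenario only.

    probs      -- dict mapping AudioSet label -> probability (assumed shape)
    substrings -- label substrings monitored by the selected scenario
    threshold  -- a label counts only if its probability exceeds this value
    """
    wanted = [s.lower() for s in substrings]
    hits = [
        (label, p)
        for label, p in probs.items()
        # Labels outside the scenario are never considered, however strong.
        if p > threshold and any(s in label.lower() for s in wanted)
    ]
    # Strongest detection first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Example: a loud siren is present, but Glass Break is the active scenario.
example_probs = {
    "Police car (siren)": 0.92,
    "Glass": 0.55,
    "Shatter": 0.30,
    "Speech": 0.80,
}
glass_hits = match_scenario(
    example_probs, SCENARIOS["Glass Break / Falling Objects"]
)
# glass_hits == [("Glass", 0.55)]; the siren and speech are ignored.
```

Filtering by scenario before thresholding keeps unrelated but confident classes from ever reaching the alert logic, which is the point of strict isolation.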
Parameters
| Parameter | Default | Description |
|---|---|---|
| confidence_threshold | 0.40 | Alert when any scenario label exceeds this probability |
| output_buffer_preset | speech-to-text | Analysis window size (4 s recommended) |
| output_buffer_size_s | 4.0 | Custom window length (seconds, min 1 s) |
| event_cooldown_s | 5.0 | Minimum seconds between alerts |
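
One way to picture how event_cooldown_s might suppress repeated alerts is the small gate below. It is a sketch under assumptions: the class name `CooldownGate` and its `allow` method are hypothetical, timestamps are passed in explicitly as seconds, and an alert is assumed to be allowed again once exactly event_cooldown_s seconds have elapsed.

```python
class CooldownGate:
    """Allow at most one alert per cooldown period (illustrative only)."""

    def __init__(self, cooldown_s=5.0):
        self.cooldown_s = cooldown_s
        self._last_alert = None  # time of the last allowed alert, in seconds

    def allow(self, now):
        """Return True if an alert at time `now` should be emitted."""
        if (
            self._last_alert is not None
            and now - self._last_alert < self.cooldown_s
        ):
            return False  # still inside the cooldown window
        self._last_alert = now
        return True


gate = CooldownGate(cooldown_s=5.0)
decisions = [gate.allow(t) for t in (0.0, 3.0, 5.0, 9.5)]
# decisions == [True, False, True, False]
```

Passing the timestamp in, rather than reading a clock inside the gate, keeps the logic easy to test; a real pipeline would likely supply a monotonic clock value.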
Input / output
Input: Same as VA—audio adapted to PCM S16LE int16 @ 16 kHz mono.
Interim output (window filling / no alert):
event_detected to a send_alert node or conditional gate (see Alerts).