playbook/antigravity-awesome-skills/skills/monte-carlo-push-ingestion/references/anomaly-detection.md

88 lines
3.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Anomaly Detection for Push-Ingested Data
Push volume and freshness data feeds the same anomaly detectors as the pull model.
The detectors don't activate immediately — they need enough historical data to learn
expected behavior before they can alert on deviations.
## Recommended push frequency: hourly
- Push at most **once per hour** — pushing more frequently produces unpredictable detector
behavior because the training pipeline aggregates data into hourly buckets
- Push **consistently** — gaps of more than a few days delay activation or deactivate
previously-active detectors
## Freshness detector
The freshness detector learns how often a table is updated and fires when it has not been
updated for longer than expected.
**What it trains on**: consecutive differences (`delta_sec`) between `last_update_time`
values across pushes. A push only counts if `last_update_time` actually changed.
**Requirements to activate:**
| Requirement | Value |
|---|---|
| Minimum samples | 7 pushes where `last_update_time` changed (or coverage ≥ 0.8 for slow tables) |
| Minimum coverage | 0.15 (= `median_update_secs × n_samples / 22 days`) |
| Training window | 35 days |
| Supported update cycle | 5 minutes 7.7 days |
| Minimum table age | ~14 days on older warehouses |
**Deactivation triggers:**
- No push for **14 days**`"no recent data"`
- Gap > 7 days in last 14 days, for fast tables (median update ≤ 26.4 hours) → `"gap of over a week in last 2 weeks"`
## Volume detector (Volume Change + Unchanged Size)
Detects unexpected spikes/drops in row count or byte count.
**Requirements to activate:**
| Requirement | Value |
|---|---|
| Minimum samples (daily) | 10 |
| Minimum samples (subdaily, ~12x/day) | 48 |
| Minimum samples (weekly) | 5 |
| Minimum coverage | 0.30 (= `N × median_update_secs / 42 days`) |
| Training window | 42 days |
| Minimum table age | 5 days |
| Regularity check | 75th/25th percentile of update intervals ≥ 0.2 |
**Deactivation**: No hard gap limit, but coverage degrades as the 42-day window advances
without new data. Eventually drops below 0.3 and deactivates.
## Summary table
| | Freshness | Volume Change / Unchanged Size |
|---|---|---|
| Recommended frequency | Hourly | Hourly |
| Maximum frequency | Once per hour | Once per hour |
| Training window | 35 days | 42 days |
| Minimum samples | 7 | 10 (daily) / 48 (subdaily) / 5 (weekly) |
| Minimum coverage | 0.15 | 0.30 |
| Hard deactivation gap | 14 days | No (coverage degrades) |
| Fast-table gap warning | 7 days in last 14 | N/A |
## What to tell customers
When a customer asks "why isn't my anomaly detection working?":
1. **Check detector status** in the MC UI or via GraphQL (`getTable.thresholds.freshness.status`).
A `"training"` status means not enough data yet. `"inactive"` means a deactivation
condition was hit — check the reason code.
2. **Verify push frequency** — are they pushing exactly once per hour? Both too-fast and
too-slow rates cause problems.
3. **Verify that `last_update_time` changes** — for freshness to accumulate training samples,
each push must carry a *different* `last_update_time` than the previous one. If the table
hasn't actually updated, the push still arrives but doesn't advance the sample count.
4. **Set realistic expectations** — freshness detectors need about 12 weeks of hourly pushes.
Volume detectors need 10+ days for daily tables, up to 42 days for subdaily tables.
Anomaly detection is not instant.
5. **Don't push gaps and then resume** — if a customer pauses pushes for a week and then
resumes, the freshness detector may deactivate. They should keep pushing even when the
table hasn't changed (just repeat the same `last_update_time`) to maintain coverage,
even though that specific push won't count as a new freshness sample.