# Validation Monitor Reference
Detailed reference for building `createValidationMonitorMac` tool calls.
## When to Use
Use a validation monitor when the user wants to:
- Check that specific fields are never null
- Validate that values are within an allowed set (e.g., status in 'active', 'pending', 'inactive')
- Enforce referential integrity (field values exist in another table)
- Apply row-level business rules (e.g., "amount must be positive")
- Combine multiple conditions with AND/OR logic
---
## Getting the Logic Right: Conditions Match INVALID Data
This is the single most confusing aspect of validation monitors and the number one source of mistakes. **Conditions describe what INVALID data looks like -- the data you want to be alerted about.** They do NOT describe what valid data looks like.
Think of it this way: the monitor scans rows and fires an alert when it finds rows matching the condition. So the condition must match the BAD rows.
| User wants | Condition should match | Common mistake |
|------------|----------------------|----------------|
| "id should never be null" | id IS NULL (alert when null found) | id IS NOT NULL (would alert on every valid row) |
| "status must be in [active, pending]" | status NOT IN [active, pending] (alert on unexpected values) | status IN [active, pending] (would alert on valid rows) |
| "amount must be positive" | amount <= 0 (alert on zero or negative values; use amount < 0 if zero is acceptable) | amount > 0 (would alert on valid rows) |
| "email must not be empty" | email IS NULL **OR** email = '' (alert on missing) | email IS NOT NULL (would alert on valid rows) |
**Before building any condition, ask yourself: "If a row matches this condition, is the row INVALID?" If the answer is no, the logic is backwards.**
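One way to sanity-check the direction of a condition is to run both versions against a few sample rows locally. The sketch below is plain Python, not a Monte Carlo API; the sample rows and the `matches_invalid` / `matches_valid` helpers are hypothetical and exist only to illustrate the rule.

```python
# Hypothetical sample rows; in practice these would come from the monitored table.
rows = [
    {"id": 1, "status": "active"},
    {"id": None, "status": "pending"},
    {"id": 3, "status": "archived"},
]

ALLOWED_STATUSES = {"active", "pending"}


def matches_invalid(row):
    """Correct direction: True when the row is INVALID and should alert."""
    return row["id"] is None or row["status"] not in ALLOWED_STATUSES


def matches_valid(row):
    """Backwards direction: describes VALID rows, so it flags good data."""
    return row["id"] is not None and row["status"] in ALLOWED_STATUSES


flagged_correct = [r for r in rows if matches_invalid(r)]
flagged_backwards = [r for r in rows if matches_valid(r)]

print(len(flagged_correct))    # 2: the null id and the unexpected status
print(len(flagged_backwards))  # 1: the only row that is actually fine
```

If the rows flagged by a condition are the ones the user considers healthy, the condition is inverted.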
---
## Pre-Step: Verify Field Existence
Before constructing the `alert_condition`, verify that every field name you plan to reference exists in the table's column list. This is the number two source of validation monitor failures -- referencing columns that do not exist or are misspelled.
1. You should already have the column list from `getTable` with `include_fields: true` (done in Step 2 of the main skill).
2. For every field name in your planned conditions, confirm it appears in the column list exactly as spelled. Treat field names as case-sensitive, since identifier casing rules vary by warehouse.
3. If a field does not exist, stop and ask the user to clarify the correct column name. Do not guess.
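The check above can be sketched as a small tree walk over the node shapes described under Alert Condition Structure. This is an illustrative helper, not part of any Monte Carlo tool: `referenced_fields`, `missing_fields`, and the sample column list are hypothetical names, and SQL nodes are not inspected because their column references live inside free-form SQL text.

```python
def referenced_fields(node):
    """Collect every FIELD name referenced anywhere in a condition tree."""
    fields = set()
    if node.get("type") == "GROUP":
        for child in node.get("conditions", []):
            fields |= referenced_fields(child)
        return fields
    # UNARY nodes use "value"; BINARY nodes use "left" and "right".
    for key in ("value", "left", "right"):
        for descriptor in node.get(key, []):
            if descriptor.get("type") == "FIELD":
                fields.add(descriptor["field"])
    return fields


def missing_fields(alert_condition, table_columns):
    """Return referenced fields absent from the column list (exact match)."""
    return sorted(referenced_fields(alert_condition) - set(table_columns))


condition = {
    "type": "GROUP",
    "operator": "AND",
    "conditions": [
        {
            "type": "UNARY",
            "predicate": {"name": "null", "negated": False},
            "value": [{"type": "FIELD", "field": "Order_ID"}],
        }
    ],
}

# Hypothetical column list, as it might be extracted from a `getTable` response.
columns = ["order_id", "amount", "status"]
print(missing_fields(condition, columns))  # ['Order_ID']
```

A non-empty result is the signal to stop and ask the user for the correct column name.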
---
## Required Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `name` | string | Unique identifier for the monitor. Use a descriptive slug (e.g., `orders_not_null_check`). |
| `description` | string | Human-readable description of what the monitor checks. |
| `table` | string | Table MCON (preferred) or `database:schema.table` format. If not MCON, also pass `warehouse`. |
| `alert_condition` | object | Condition tree defining when to alert (see Alert Condition Structure below). |
## Optional Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `warehouse` | string | Warehouse name or UUID. Required if `table` is not an MCON. |
| `domain_id` | string (uuid) | Domain UUID (use `getDomains` to list). |
---
## Alert Condition Structure
The top level of `alert_condition` must always be a GROUP node. This GROUP contains one or more conditions combined with AND or OR logic.
```json
{
  "type": "GROUP",
  "operator": "AND",
  "conditions": [...]
}
```
### Condition Types
There are four condition types: UNARY, BINARY, SQL, and GROUP.
#### UNARY (single-value checks)
Used for predicates that operate on a single field with no comparison value.
```json
{
  "type": "UNARY",
  "predicate": {"name": "null", "negated": false},
  "value": [{"type": "FIELD", "field": "column_name"}]
}
```
- `predicate.name` -- the predicate to apply (see Predicates Reference below).
- `predicate.negated` -- set to `true` to invert the predicate (e.g., `null` with `negated: true` means "is NOT null").
- `value` -- an array with a single value descriptor (usually a FIELD reference).
#### BINARY (comparison checks)
Used for predicates that compare a field against a value.
```json
{
  "type": "BINARY",
  "predicate": {"name": "greater_than", "negated": false},
  "left": [{"type": "FIELD", "field": "column_name"}],
  "right": [{"type": "LITERAL", "literal": "0"}]
}
```
- `left` -- the left-hand side of the comparison (typically a FIELD reference).
- `right` -- the right-hand side (typically a LITERAL value, SQL expression, or FIELD reference).
- Both `left` and `right` are arrays of value descriptors.
#### SQL (custom SQL expression)
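Used for complex conditions that are difficult to express with UNARY/BINARY nodes. The SQL expression must evaluate to true for INVALID rows. For example, if valid amounts lie strictly between 0 and 1,000,000, the expression matches everything outside that range:
```json
{
  "type": "SQL",
  "sql": "amount <= 0 OR amount >= 1000000"
}
```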
#### GROUP (nested conditions)
Used to combine multiple conditions with AND or OR logic. Groups can be nested.
```json
{
  "type": "GROUP",
  "operator": "OR",
  "conditions": [
    {"type": "UNARY", "...": "..."},
    {"type": "BINARY", "...": "..."}
  ]
}
```
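When conditions are assembled programmatically, small builder functions help keep the node shapes consistent and guarantee that the top level is a GROUP. The following is a minimal sketch; the helper names `field`, `unary`, `binary`, and `group` are hypothetical and not part of the tool's API. They only produce dictionaries in the shapes shown above.

```python
import json


def field(name):
    """FIELD value descriptor for a column."""
    return {"type": "FIELD", "field": name}


def unary(predicate, field_name, negated=False):
    """UNARY node: a predicate applied to a single field."""
    return {
        "type": "UNARY",
        "predicate": {"name": predicate, "negated": negated},
        "value": [field(field_name)],
    }


def binary(predicate, field_name, right, negated=False):
    """BINARY node: `right` is a list of value descriptors."""
    return {
        "type": "BINARY",
        "predicate": {"name": predicate, "negated": negated},
        "left": [field(field_name)],
        "right": list(right),
    }


def group(operator, *conditions):
    """GROUP node combining one or more conditions with AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not conditions:
        raise ValueError("a GROUP needs at least one condition")
    return {"type": "GROUP", "operator": operator, "conditions": list(conditions)}


# Top level is always a GROUP, even for a single condition.
alert_condition = group("AND", unary("null", "id"))
print(json.dumps(alert_condition, indent=2))
```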
---
## Value Types
Value descriptors appear in the `value`, `left`, and `right` arrays of UNARY and BINARY conditions.
| Type | Field | Description | Example |
|------|-------|-------------|---------|
| `FIELD` | `"field": "column_name"` | References a column in the table. | `{"type": "FIELD", "field": "user_id"}` |
| `LITERAL` | `"literal": "value"` | A static value (always a string, even for numbers). | `{"type": "LITERAL", "literal": "100"}` |
| `SQL` | `"sql": "SELECT ..."` | A SQL expression or subquery. | `{"type": "SQL", "sql": "SELECT MAX(id) FROM ref_table"}` |
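Because LITERAL payloads are always strings, numeric values need to be stringified before they are placed in a descriptor. A minimal sketch, assuming a hypothetical `literal` helper; rejecting booleans and other types is a conservative choice made here, not a documented rule of the tool.

```python
from decimal import Decimal


def literal(value):
    """Wrap a Python value as a LITERAL descriptor with a string payload."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (str, int, float, Decimal)):
        raise TypeError(f"unsupported literal type: {type(value).__name__}")
    return {"type": "LITERAL", "literal": str(value)}


def literals(values):
    """Build the list of descriptors used on the right-hand side of a BINARY node."""
    return [literal(v) for v in values]


print(literal(100))
print(literals(["active", "pending"]))
```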
---
## Predicates Reference
Before building conditions, call `getValidationPredicates` to get the full list of supported predicates for the connected warehouse. The list below covers common predicates but may not be exhaustive.
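Once the supported predicate names are known, a planned condition tree can be checked against them before the monitor is created. The sketch below assumes the names have already been extracted into a plain set; the response format of `getValidationPredicates` is not described here, so the `supported` set and both helper functions are hypothetical.

```python
def predicate_names(node):
    """Collect the predicate names used anywhere in a condition tree."""
    if node["type"] == "GROUP":
        names = set()
        for child in node["conditions"]:
            names |= predicate_names(child)
        return names
    if node["type"] in ("UNARY", "BINARY"):
        return {node["predicate"]["name"]}
    return set()  # SQL nodes carry no predicate


def unsupported_predicates(alert_condition, supported):
    """Return predicate names in the tree that are not in the supported set."""
    return sorted(predicate_names(alert_condition) - set(supported))


# Hypothetical set of names taken from a `getValidationPredicates` response.
supported = {"null", "equal", "in_set", "between"}

condition = {
    "type": "GROUP",
    "operator": "OR",
    "conditions": [
        {
            "type": "UNARY",
            "predicate": {"name": "null", "negated": False},
            "value": [{"type": "FIELD", "field": "id"}],
        },
        {
            "type": "UNARY",
            "predicate": {"name": "is_uuid", "negated": True},
            "value": [{"type": "FIELD", "field": "id"}],
        },
    ],
}

print(unsupported_predicates(condition, supported))  # ['is_uuid']
```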
### Unary Predicates
These predicates take no comparison value -- they check a property of the field itself.
| Predicate | Description | Example use |
|-----------|-------------|-------------|
| `null` | Field value is null. | Alert on null ids. |
| `is_negative` | Field value is negative. | Alert on negative amounts. |
| `is_between_0_and_1` | Field value is between 0 and 1 (inclusive). | Alert on rates that should be percentages (0-100). |
| `is_future_date` | Field value is a date/timestamp in the future. | Alert on future-dated records. |
| `is_uuid` | Field value matches UUID format. | Alert on non-UUID values in a UUID field (use with `negated: true`). |
### Binary Predicates
These predicates compare a field against a value.
| Predicate | Right-hand side | Description | Example use |
|-----------|----------------|-------------|-------------|
| `equal` | Single LITERAL | Field equals the given value. | Alert when `status` equals `'deleted'`. |
| `greater_than` | Single LITERAL | Field is greater than the given value. | Alert when `discount_pct` exceeds 100. |
| `less_than` | Single LITERAL | Field is less than the given value. | Alert when `quantity` is below 0. |
| `in_set` | Multiple LITERALs | Field value is in the given set. | Alert when `status` is in a set of known-bad values, or, with `negated: true`, when it falls outside an allowed set (see example below). |
| `contains` | Single LITERAL | Field value contains the given substring. | Alert when `email` contains `'test@'`. |
| `starts_with` | Single LITERAL | Field value starts with the given prefix. | Alert when `phone` starts with `'000'`. |
| `between` | Two LITERALs | Field value is between the two given values (inclusive). | Alert when `score` is between 0 and 10 (if that range is invalid). |
### Using `negated` to Invert Predicates
Any predicate can be inverted by setting `"negated": true` in the predicate object. This is essential for "must be in set" validations:
- **"status must be in [active, pending]"** becomes `in_set` with values `["active", "pending"]` and `negated: true` -- meaning "alert when status is NOT in [active, pending]".
- **"id must not be null"** becomes `null` with `negated: false` -- meaning "alert when id IS null" (no inversion needed since the condition already matches invalid data).
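The two translations above can be captured as helpers that take the user's "must" rule and emit the condition matching INVALID rows. This is a sketch; `must_be_in_set` and `must_not_be_null` are hypothetical names chosen for illustration.

```python
def must_be_in_set(field_name, allowed_values):
    """'field must be in allowed_values' -> alert when it is NOT in the set."""
    return {
        "type": "BINARY",
        "predicate": {"name": "in_set", "negated": True},
        "left": [{"type": "FIELD", "field": field_name}],
        "right": [{"type": "LITERAL", "literal": str(v)} for v in allowed_values],
    }


def must_not_be_null(field_name):
    """'field must not be null' -> alert when it IS null (no negation needed)."""
    return {
        "type": "UNARY",
        "predicate": {"name": "null", "negated": False},
        "value": [{"type": "FIELD", "field": field_name}],
    }


status_rule = must_be_in_set("status", ["active", "pending"])
id_rule = must_not_be_null("id")
print(status_rule["predicate"])  # {'name': 'in_set', 'negated': True}
print(id_rule["predicate"])      # {'name': 'null', 'negated': False}
```

Naming the helpers after the user's intent keeps the inversion in one place, so callers never have to decide the value of `negated` themselves.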
---
## Examples
### Alert when id is null
Verify that `id` exists in the table schema from `getTable` before proceeding.
```json
{
  "name": "orders_id_not_null",
  "description": "Alert when order id is null",
  "table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
  "alert_condition": {
    "type": "GROUP",
    "operator": "AND",
    "conditions": [
      {
        "type": "UNARY",
        "predicate": {"name": "null", "negated": false},
        "value": [{"type": "FIELD", "field": "id"}]
      }
    ]
  }
}
```
The condition matches rows where `id` IS NULL -- these are the invalid rows we want to be alerted about.
### Alert when status is not in allowed set
Verify that `status` exists in the table schema from `getTable` before proceeding.
```json
{
  "name": "orders_status_allowed_values",
  "description": "Alert when order status is outside the allowed set",
  "table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
  "alert_condition": {
    "type": "GROUP",
    "operator": "AND",
    "conditions": [
      {
        "type": "BINARY",
        "predicate": {"name": "in_set", "negated": true},
        "left": [{"type": "FIELD", "field": "status"}],
        "right": [
          {"type": "LITERAL", "literal": "active"},
          {"type": "LITERAL", "literal": "pending"},
          {"type": "LITERAL", "literal": "inactive"}
        ]
      }
    ]
  }
}
```
Note `negated: true` -- the predicate is `in_set`, but we want to alert when the value is NOT in the set. This catches any unexpected status values.
### Alert when amount is negative
Verify that `amount` exists in the table schema from `getTable` before proceeding.
```json
{
  "name": "orders_positive_amount",
  "description": "Alert when order amount is negative",
  "table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
  "alert_condition": {
    "type": "GROUP",
    "operator": "AND",
    "conditions": [
      {
        "type": "UNARY",
        "predicate": {"name": "is_negative", "negated": false},
        "value": [{"type": "FIELD", "field": "amount"}]
      }
    ]
  }
}
```
The condition matches rows where `amount` is negative -- these are the invalid rows.
### Combined conditions: null OR negative
Verify that both `amount` and `quantity` exist in the table schema from `getTable` before proceeding.
```json
{
  "name": "orders_amount_quality",
  "description": "Alert when amount is null or quantity is negative",
  "table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
  "alert_condition": {
    "type": "GROUP",
    "operator": "OR",
    "conditions": [
      {
        "type": "UNARY",
        "predicate": {"name": "null", "negated": false},
        "value": [{"type": "FIELD", "field": "amount"}]
      },
      {
        "type": "UNARY",
        "predicate": {"name": "is_negative", "negated": false},
        "value": [{"type": "FIELD", "field": "quantity"}]
      }
    ]
  }
}
```
The OR operator means an alert fires if either condition matches -- the row has a null amount OR a negative quantity.
### Between check combined with AND
Verify that `score` and `status` exist in the table schema from `getTable` before proceeding.
```json
{
  "name": "records_score_validation",
  "description": "Alert when score is outside 0-100 range for active records",
  "table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++warehouse:metrics.records",
  "alert_condition": {
    "type": "GROUP",
    "operator": "AND",
    "conditions": [
      {
        "type": "BINARY",
        "predicate": {"name": "equal", "negated": false},
        "left": [{"type": "FIELD", "field": "status"}],
        "right": [{"type": "LITERAL", "literal": "active"}]
      },
      {
        "type": "BINARY",
        "predicate": {"name": "between", "negated": true},
        "left": [{"type": "FIELD", "field": "score"}],
        "right": [
          {"type": "LITERAL", "literal": "0"},
          {"type": "LITERAL", "literal": "100"}
        ]
      }
    ]
  }
}
```
This uses `between` with `negated: true` to alert when score is outside the 0-100 range, but only for active records (the AND operator requires both conditions to match).
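To reason about which rows a combined condition would flag, a tiny local evaluator can be run over sample rows. This is only an approximation for thinking through AND/OR and `negated`; it is not how the monitor evaluates conditions in the warehouse. The `check` and `matches` helpers, the sample rows, and the simplification that comparisons against `None` never match are all assumptions of this sketch.

```python
def check(predicate, left, right):
    """Evaluate one predicate; comparisons against None never match."""
    name = predicate["name"]
    if name == "null":
        result = left is None
    elif left is None:
        return False
    elif name == "equal":
        result = str(left) == right[0]
    elif name == "between":
        result = float(right[0]) <= float(left) <= float(right[1])
    else:
        raise NotImplementedError(name)
    # XOR with `negated` inverts the result when negated is true.
    return result != predicate["negated"]


def matches(node, row):
    """Return True when the row matches the condition tree (row is INVALID)."""
    if node["type"] == "GROUP":
        results = [matches(child, row) for child in node["conditions"]]
        return all(results) if node["operator"] == "AND" else any(results)
    side = "value" if node["type"] == "UNARY" else "left"
    left = row[node[side][0]["field"]]
    right = [d["literal"] for d in node.get("right", [])]
    return check(node["predicate"], left, right)


alert_condition = {
    "type": "GROUP",
    "operator": "AND",
    "conditions": [
        {
            "type": "BINARY",
            "predicate": {"name": "equal", "negated": False},
            "left": [{"type": "FIELD", "field": "status"}],
            "right": [{"type": "LITERAL", "literal": "active"}],
        },
        {
            "type": "BINARY",
            "predicate": {"name": "between", "negated": True},
            "left": [{"type": "FIELD", "field": "score"}],
            "right": [
                {"type": "LITERAL", "literal": "0"},
                {"type": "LITERAL", "literal": "100"},
            ],
        },
    ],
}

rows = [
    {"status": "active", "score": 150},    # active and out of range
    {"status": "active", "score": 50},     # active and in range
    {"status": "inactive", "score": 150},  # out of range but not active
    {"status": "active", "score": None},   # null score
]
flags = [matches(alert_condition, row) for row in rows]
print(flags)  # [True, False, False, False]
```

Only the first row is flagged. The null score is not flagged under this sketch's simplification, so a separate null check would be needed if null scores are also invalid.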
### Referential integrity with SQL subquery
Verify that `customer_id` exists in the table schema from `getTable` before proceeding.
```json
{
  "name": "orders_valid_customer",
  "description": "Alert when customer_id does not exist in customers table",
  "table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
  "alert_condition": {
    "type": "GROUP",
    "operator": "AND",
    "conditions": [
      {
        "type": "SQL",
        "sql": "customer_id IS NOT NULL AND customer_id NOT IN (SELECT id FROM analytics.core.customers)"
      }
    ]
  }
}
```
The SQL condition type is useful for referential integrity checks that require subqueries. The `customer_id IS NOT NULL` guard avoids alerting on null values (which should be caught by a separate null check if needed).
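The behavior of this predicate can be tried out locally before it is used in a monitor. The sketch below uses Python's built-in `sqlite3` module with hypothetical in-memory `orders` and `customers` tables; SQLite stands in for the real warehouse, whose dialect may differ. It also shows a standard SQL pitfall: if the subquery returns a NULL, `NOT IN` never evaluates to true, so no orphan rows are reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 99), (12, NULL);
    """
)

# Same shape as the SQL condition above: true for INVALID rows only.
invalid_sql = (
    "customer_id IS NOT NULL "
    "AND customer_id NOT IN (SELECT id FROM customers)"
)
query = f"SELECT order_id FROM orders WHERE {invalid_sql} ORDER BY order_id"

orphans = conn.execute(query).fetchall()
print(orphans)  # [(11,)]: order 11 references a missing customer

# Pitfall: a NULL id in the referenced table makes NOT IN match nothing.
conn.execute("INSERT INTO customers VALUES (NULL)")
orphans_with_null_parent = conn.execute(query).fetchall()
print(orphans_with_null_parent)  # []
```

If the referenced column can contain nulls, filtering them out inside the subquery or using `NOT EXISTS` avoids silently missing orphan rows.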
### Contains and starts_with checks
Verify that `email` and `phone` exist in the table schema from `getTable` before proceeding.
```json
{
  "name": "contacts_format_validation",
  "description": "Alert when email contains test data or phone has invalid prefix",
  "table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++warehouse:crm.contacts",
  "alert_condition": {
    "type": "GROUP",
    "operator": "OR",
    "conditions": [
      {
        "type": "BINARY",
        "predicate": {"name": "contains", "negated": false},
        "left": [{"type": "FIELD", "field": "email"}],
        "right": [{"type": "LITERAL", "literal": "@test.example.com"}]
      },
      {
        "type": "BINARY",
        "predicate": {"name": "starts_with", "negated": false},
        "left": [{"type": "FIELD", "field": "phone"}],
        "right": [{"type": "LITERAL", "literal": "000"}]
      }
    ]
  }
}
```
---
## Fallback: Custom SQL Monitor
If `createValidationMonitorMac` fails -- for example because a referenced column does not exist yet in the live table, or the warehouse does not support a particular predicate -- fall back to `createCustomSqlMonitorMac` with an explicit SQL query instead.
A custom SQL monitor lets you express any validation logic as a SQL query that returns rows or a count. This is always available as a backup when the structured validation condition tree cannot express what you need or encounters an API error.
When falling back:
1. Translate the intended validation logic into a SQL query.
2. The SQL should select rows that violate the rule (matching the same "conditions match INVALID data" principle).
3. Use `createCustomSqlMonitorMac` with the translated query.
4. Inform the user that you used a custom SQL monitor as a fallback and explain why.
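Steps 1 and 2 can be sketched as a translation from a condition tree into a SQL query that selects violating rows. This covers only a few predicates, and the mapping from predicate names to SQL (for example `is_negative` to `< 0`) is an interpretation made for this sketch, not the tool's documented behavior. The `to_sql` and `quote` helpers and the table name are hypothetical, every literal is quoted as a string, and column names are interpolated as-is, so they should come from the verified column list. The parameters of `createCustomSqlMonitorMac` are not shown because they are not described here.

```python
def quote(text):
    """Quote a string literal for SQL, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def to_sql(node):
    """Translate a condition tree into a SQL predicate matching INVALID rows."""
    if node["type"] == "GROUP":
        joiner = f" {node['operator']} "
        return "(" + joiner.join(to_sql(c) for c in node["conditions"]) + ")"
    if node["type"] == "SQL":
        return "(" + node["sql"] + ")"
    name = node["predicate"]["name"]
    negated = node["predicate"]["negated"]
    if node["type"] == "UNARY":
        col = node["value"][0]["field"]
        if name == "null":
            return f"{col} IS {'NOT ' if negated else ''}NULL"
        if name == "is_negative":
            expr = f"{col} < 0"
        else:
            raise NotImplementedError(name)
    else:  # BINARY
        col = node["left"][0]["field"]
        values = [quote(d["literal"]) for d in node["right"]]
        if name == "in_set":
            expr = f"{col} IN ({', '.join(values)})"
        elif name == "equal":
            expr = f"{col} = {values[0]}"
        else:
            raise NotImplementedError(name)
    return f"NOT ({expr})" if negated else expr


condition = {
    "type": "GROUP",
    "operator": "OR",
    "conditions": [
        {
            "type": "BINARY",
            "predicate": {"name": "in_set", "negated": True},
            "left": [{"type": "FIELD", "field": "status"}],
            "right": [
                {"type": "LITERAL", "literal": "active"},
                {"type": "LITERAL", "literal": "pending"},
            ],
        },
        {
            "type": "UNARY",
            "predicate": {"name": "null", "negated": False},
            "value": [{"type": "FIELD", "field": "amount"}],
        },
    ],
}

table = "analytics.core.orders"  # hypothetical fully qualified table name
query = f"SELECT * FROM {table} WHERE {to_sql(condition)}"
print(query)
```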