405 lines
15 KiB
Markdown
405 lines
15 KiB
Markdown
# Validation Monitor Reference
|
|
|
|
Detailed reference for building `createValidationMonitorMac` tool calls.
|
|
|
|
## When to Use
|
|
|
|
Use a validation monitor when the user wants to:
|
|
|
|
- Check that specific fields are never null
|
|
- Validate that values are within an allowed set (e.g., status in 'active', 'pending', 'inactive')
|
|
- Enforce referential integrity (field values exist in another table)
|
|
- Apply row-level business rules (e.g., "amount must be positive")
|
|
- Combine multiple conditions with AND/OR logic
|
|
|
|
---
|
|
|
|
## Getting the Logic Right: Conditions Match INVALID Data
|
|
|
|
This is the single most confusing aspect of validation monitors and the number one source of mistakes. **Conditions describe what INVALID data looks like -- the data you want to be alerted about.** They do NOT describe what valid data looks like.
|
|
|
|
Think of it this way: the monitor scans rows and fires an alert when it finds rows matching the condition. So the condition must match the BAD rows.
|
|
|
|
| User wants | Condition should match | Common mistake |
|
|
|------------|----------------------|----------------|
|
|
| "id should never be null" | id IS NULL (alert when null found) | id IS NOT NULL (would alert on every valid row) |
|
|
| "status must be in [active, pending]" | status NOT IN [active, pending] (alert on unexpected values) | status IN [active, pending] (would alert on valid rows) |
|
|
| "amount must be positive" | amount IS NEGATIVE (alert on bad values) | amount > 0 (would alert on valid rows) |
|
|
| "email must not be empty" | email IS NULL **OR** email = '' (alert on missing) | email IS NOT NULL (would alert on valid rows) |
|
|
|
|
**Before building any condition, ask yourself: "If a row matches this condition, is the row INVALID?" If the answer is no, the logic is backwards.**
|
|
|
|
---
|
|
|
|
## Pre-Step: Verify Field Existence
|
|
|
|
Before constructing the `alert_condition`, verify that every field name you plan to reference exists in the table's column list. This is the number two source of validation monitor failures -- referencing columns that do not exist or are misspelled.
|
|
|
|
1. You should already have the column list from `getTable` with `include_fields: true` (done in Step 2 of the main skill).
|
|
2. For every field name in your planned conditions, confirm it appears in the column list exactly as spelled (field names are case-sensitive on most warehouses).
|
|
3. If a field does not exist, stop and ask the user to clarify the correct column name. Do not guess.
|
|
|
|
---
|
|
|
|
## Required Parameters
|
|
|
|
| Parameter | Type | Description |
|
|
|-----------|------|-------------|
|
|
| `name` | string | Unique identifier for the monitor. Use a descriptive slug (e.g., `orders_not_null_check`). |
|
|
| `description` | string | Human-readable description of what the monitor checks. |
|
|
| `table` | string | Table MCON (preferred) or `database:schema.table` format. If not MCON, also pass `warehouse`. |
|
|
| `alert_condition` | object | Condition tree defining when to alert (see Alert Condition Structure below). |
|
|
|
|
## Optional Parameters
|
|
|
|
| Parameter | Type | Description |
|
|
|-----------|------|-------------|
|
|
| `warehouse` | string | Warehouse name or UUID. Required if `table` is not an MCON. |
|
|
| `domain_id` | string (uuid) | Domain UUID (use `getDomains` to list). |
|
|
|
|
---
|
|
|
|
## Alert Condition Structure
|
|
|
|
The top level of `alert_condition` must always be a GROUP node. This GROUP contains one or more conditions combined with AND or OR logic.
|
|
|
|
```json
|
|
{
|
|
"type": "GROUP",
|
|
"operator": "AND",
|
|
"conditions": [...]
|
|
}
|
|
```
|
|
|
|
### Condition Types
|
|
|
|
There are four condition types: UNARY, BINARY, SQL, and GROUP.
|
|
|
|
#### UNARY (single-value checks)
|
|
|
|
Used for predicates that operate on a single field with no comparison value.
|
|
|
|
```json
|
|
{
|
|
"type": "UNARY",
|
|
"predicate": {"name": "null", "negated": false},
|
|
"value": [{"type": "FIELD", "field": "column_name"}]
|
|
}
|
|
```
|
|
|
|
- `predicate.name` -- the predicate to apply (see Predicates Reference below).
|
|
- `predicate.negated` -- set to `true` to invert the predicate (e.g., `null` with `negated: true` means "is NOT null").
|
|
- `value` -- an array with a single value descriptor (usually a FIELD reference).
|
|
|
|
#### BINARY (comparison checks)
|
|
|
|
Used for predicates that compare a field against a value.
|
|
|
|
```json
|
|
{
|
|
"type": "BINARY",
|
|
"predicate": {"name": "greater_than", "negated": false},
|
|
"left": [{"type": "FIELD", "field": "column_name"}],
|
|
"right": [{"type": "LITERAL", "literal": "0"}]
|
|
}
|
|
```
|
|
|
|
- `left` -- the left-hand side of the comparison (typically a FIELD reference).
|
|
- `right` -- the right-hand side (typically a LITERAL value, SQL expression, or FIELD reference).
|
|
- Both `left` and `right` are arrays of value descriptors.
|
|
|
|
#### SQL (custom SQL expression)
|
|
|
|
Used for complex conditions that are difficult to express with UNARY/BINARY nodes. The SQL expression should evaluate to true for INVALID rows.
|
|
|
|
```json
|
|
{
|
|
"type": "SQL",
|
|
"sql": "amount > 0 AND amount < 1000000"
|
|
}
|
|
```
|
|
|
|
#### GROUP (nested conditions)
|
|
|
|
Used to combine multiple conditions with AND or OR logic. Groups can be nested.
|
|
|
|
```json
|
|
{
|
|
"type": "GROUP",
|
|
"operator": "OR",
|
|
"conditions": [
|
|
{"type": "UNARY", "...": "..."},
|
|
{"type": "BINARY", "...": "..."}
|
|
]
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Value Types
|
|
|
|
Value descriptors appear in the `value`, `left`, and `right` arrays of UNARY and BINARY conditions.
|
|
|
|
| Type | Field | Description | Example |
|
|
|------|-------|-------------|---------|
|
|
| `FIELD` | `"field": "column_name"` | References a column in the table. | `{"type": "FIELD", "field": "user_id"}` |
|
|
| `LITERAL` | `"literal": "value"` | A static value (always a string, even for numbers). | `{"type": "LITERAL", "literal": "100"}` |
|
|
| `SQL` | `"sql": "SELECT ..."` | A SQL expression or subquery. | `{"type": "SQL", "sql": "SELECT MAX(id) FROM ref_table"}` |
|
|
|
|
---
|
|
|
|
## Predicates Reference
|
|
|
|
Before building conditions, call `getValidationPredicates` to get the full list of supported predicates for the connected warehouse. The list below covers common predicates but may not be exhaustive.
|
|
|
|
### Unary Predicates
|
|
|
|
These predicates take no comparison value -- they check a property of the field itself.
|
|
|
|
| Predicate | Description | Example use |
|
|
|-----------|-------------|-------------|
|
|
| `null` | Field value is null. | Alert on null ids. |
|
|
| `is_negative` | Field value is negative. | Alert on negative amounts. |
|
|
| `is_between_0_and_1` | Field value is between 0 and 1 (inclusive). | Alert on rates that should be percentages (0-100). |
|
|
| `is_future_date` | Field value is a date/timestamp in the future. | Alert on future-dated records. |
|
|
| `is_uuid` | Field value matches UUID format. | Alert on non-UUID values in a UUID field (use with `negated: true`). |
|
|
|
|
### Binary Predicates
|
|
|
|
These predicates compare a field against a value.
|
|
|
|
| Predicate | Right-hand side | Description | Example use |
|
|
|-----------|----------------|-------------|-------------|
|
|
| `equal` | Single LITERAL | Field equals the given value. | Alert when `status` equals `'deleted'`. |
|
|
| `greater_than` | Single LITERAL | Field is greater than the given value. | Alert when `discount_pct` exceeds 100. |
|
|
| `less_than` | Single LITERAL | Field is less than the given value. | Alert when `quantity` is below 0. |
|
|
| `in_set` | Multiple LITERALs | Field value is in the given set. | Alert when `status` is in an invalid set (see example below). |
|
|
| `contains` | Single LITERAL | Field value contains the given substring. | Alert when `email` contains `'test@'`. |
|
|
| `starts_with` | Single LITERAL | Field value starts with the given prefix. | Alert when `phone` starts with `'000'`. |
|
|
| `between` | Two LITERALs | Field value is between the two given values (inclusive). | Alert when `score` is between 0 and 10 (if that range is invalid). |
|
|
|
|
### Using `negated` to Invert Predicates
|
|
|
|
Any predicate can be inverted by setting `"negated": true` in the predicate object. This is essential for "must be in set" validations:
|
|
|
|
- **"status must be in [active, pending]"** becomes `in_set` with values `["active", "pending"]` and `negated: true` -- meaning "alert when status is NOT in [active, pending]".
|
|
- **"id must not be null"** becomes `null` with `negated: false` -- meaning "alert when id IS null" (no inversion needed since the condition already matches invalid data).
|
|
|
|
---
|
|
|
|
## Examples
|
|
|
|
### Alert when id is null
|
|
|
|
Verify that `id` exists in the table schema from `getTable` before proceeding.
|
|
|
|
```json
|
|
{
|
|
"name": "orders_id_not_null",
|
|
"description": "Alert when order id is null",
|
|
"table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
|
|
"alert_condition": {
|
|
"type": "GROUP",
|
|
"operator": "AND",
|
|
"conditions": [
|
|
{
|
|
"type": "UNARY",
|
|
"predicate": {"name": "null", "negated": false},
|
|
"value": [{"type": "FIELD", "field": "id"}]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
The condition matches rows where `id` IS NULL -- these are the invalid rows we want to be alerted about.
|
|
|
|
### Alert when status is not in allowed set
|
|
|
|
Verify that `status` exists in the table schema from `getTable` before proceeding.
|
|
|
|
```json
|
|
{
|
|
"name": "orders_status_allowed_values",
|
|
"description": "Alert when order status is outside the allowed set",
|
|
"table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
|
|
"alert_condition": {
|
|
"type": "GROUP",
|
|
"operator": "AND",
|
|
"conditions": [
|
|
{
|
|
"type": "BINARY",
|
|
"predicate": {"name": "in_set", "negated": true},
|
|
"left": [{"type": "FIELD", "field": "status"}],
|
|
"right": [
|
|
{"type": "LITERAL", "literal": "active"},
|
|
{"type": "LITERAL", "literal": "pending"},
|
|
{"type": "LITERAL", "literal": "inactive"}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
Note `negated: true` -- the predicate is `in_set`, but we want to alert when the value is NOT in the set. This catches any unexpected status values.
|
|
|
|
### Alert when amount is negative
|
|
|
|
Verify that `amount` exists in the table schema from `getTable` before proceeding.
|
|
|
|
```json
|
|
{
|
|
"name": "orders_positive_amount",
|
|
"description": "Alert when order amount is negative",
|
|
"table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
|
|
"alert_condition": {
|
|
"type": "GROUP",
|
|
"operator": "AND",
|
|
"conditions": [
|
|
{
|
|
"type": "UNARY",
|
|
"predicate": {"name": "is_negative", "negated": false},
|
|
"value": [{"type": "FIELD", "field": "amount"}]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
The condition matches rows where `amount` is negative -- these are the invalid rows.
|
|
|
|
### Combined conditions: null OR negative
|
|
|
|
Verify that both `amount` and `quantity` exist in the table schema from `getTable` before proceeding.
|
|
|
|
```json
|
|
{
|
|
"name": "orders_amount_quality",
|
|
"description": "Alert when amount is null or quantity is negative",
|
|
"table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
|
|
"alert_condition": {
|
|
"type": "GROUP",
|
|
"operator": "OR",
|
|
"conditions": [
|
|
{
|
|
"type": "UNARY",
|
|
"predicate": {"name": "null", "negated": false},
|
|
"value": [{"type": "FIELD", "field": "amount"}]
|
|
},
|
|
{
|
|
"type": "UNARY",
|
|
"predicate": {"name": "is_negative", "negated": false},
|
|
"value": [{"type": "FIELD", "field": "quantity"}]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
The OR operator means an alert fires if either condition matches -- the row has a null amount OR a negative quantity.
|
|
|
|
### Between check with nested AND/OR
|
|
|
|
Verify that `score` and `status` exist in the table schema from `getTable` before proceeding.
|
|
|
|
```json
|
|
{
|
|
"name": "records_score_validation",
|
|
"description": "Alert when score is outside 0-100 range for active records",
|
|
"table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++warehouse:metrics.records",
|
|
"alert_condition": {
|
|
"type": "GROUP",
|
|
"operator": "AND",
|
|
"conditions": [
|
|
{
|
|
"type": "BINARY",
|
|
"predicate": {"name": "equal", "negated": false},
|
|
"left": [{"type": "FIELD", "field": "status"}],
|
|
"right": [{"type": "LITERAL", "literal": "active"}]
|
|
},
|
|
{
|
|
"type": "BINARY",
|
|
"predicate": {"name": "between", "negated": true},
|
|
"left": [{"type": "FIELD", "field": "score"}],
|
|
"right": [
|
|
{"type": "LITERAL", "literal": "0"},
|
|
{"type": "LITERAL", "literal": "100"}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
This uses `between` with `negated: true` to alert when score is outside the 0-100 range, but only for active records (the AND operator requires both conditions to match).
|
|
|
|
### Referential integrity with SQL subquery
|
|
|
|
Verify that `customer_id` exists in the table schema from `getTable` before proceeding.
|
|
|
|
```json
|
|
{
|
|
"name": "orders_valid_customer",
|
|
"description": "Alert when customer_id does not exist in customers table",
|
|
"table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++analytics:core.orders",
|
|
"alert_condition": {
|
|
"type": "GROUP",
|
|
"operator": "AND",
|
|
"conditions": [
|
|
{
|
|
"type": "SQL",
|
|
"sql": "customer_id IS NOT NULL AND customer_id NOT IN (SELECT id FROM analytics.core.customers)"
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
The SQL condition type is useful for referential integrity checks that require subqueries. The `customer_id IS NOT NULL` guard avoids alerting on null values (which should be caught by a separate null check if needed).
|
|
|
|
### Contains and starts_with checks
|
|
|
|
Verify that `email` and `phone` exist in the table schema from `getTable` before proceeding.
|
|
|
|
```json
|
|
{
|
|
"name": "contacts_format_validation",
|
|
"description": "Alert when email contains test data or phone has invalid prefix",
|
|
"table": "MCON++a1b2c3d4-e5f6-7890-abcd-ef1234567890++1++1++warehouse:crm.contacts",
|
|
"alert_condition": {
|
|
"type": "GROUP",
|
|
"operator": "OR",
|
|
"conditions": [
|
|
{
|
|
"type": "BINARY",
|
|
"predicate": {"name": "contains", "negated": false},
|
|
"left": [{"type": "FIELD", "field": "email"}],
|
|
"right": [{"type": "LITERAL", "literal": "@test.example.com"}]
|
|
},
|
|
{
|
|
"type": "BINARY",
|
|
"predicate": {"name": "starts_with", "negated": false},
|
|
"left": [{"type": "FIELD", "field": "phone"}],
|
|
"right": [{"type": "LITERAL", "literal": "000"}]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Fallback: Custom SQL Monitor
|
|
|
|
If `createValidationMonitorMac` fails -- for example because a referenced column does not exist yet in the live table, or the warehouse does not support a particular predicate -- fall back to `createCustomSqlMonitorMac` with an explicit SQL query instead.
|
|
|
|
A custom SQL monitor lets you express any validation logic as a SQL query that returns rows or a count. This is always available as a backup when the structured validation condition tree cannot express what you need or encounters an API error.
|
|
|
|
When falling back:
|
|
|
|
1. Translate the intended validation logic into a SQL query.
|
|
2. The SQL should select rows that violate the rule (matching the same "conditions match INVALID data" principle).
|
|
3. Use `createCustomSqlMonitorMac` with the translated query.
|
|
4. Inform the user that you used a custom SQL monitor as a fallback and explain why.
|