playbook/antigravity-awesome-skills/skills/monte-carlo-push-ingestion/references/validation.md

4.9 KiB

Validating Pushed Data

All verification queries use the GraphQL API key at https://api.getmontecarlo.com/graphql.


Resolve a table's MCON and fullTableId

Before running most queries you need either the mcon or fullTableId.

fullTableId format: <database>:<schema>.<table> — e.g. analytics:public.orders

query GetTable($fullTableId: String!, $dwId: UUID!) {
  getTable(fullTableId: $fullTableId, dwId: $dwId) {
    mcon
    fullTableId
    displayName
  }
}

Variables:

{
  "fullTableId": "analytics:public.orders",
  "dwId": "<warehouse-uuid>"
}

Verify metadata (schema + columns)

query GetTableMetadata($mcon: String!) {
  getTable(mcon: $mcon) {
    mcon
    fullTableId
    versions {
      edges {
        node {
          fields {
            name
            fieldType
          }
        }
      }
    }
  }
}

Check that the fields list matches your pushed schema.


Verify volume and freshness metrics

Use getMetricsV4 to fetch row counts and last-modified timestamps:

query GetMetrics(
  $mcon: String!
  $metricName: String!
  $startTime: DateTime!
  $endTime: DateTime!
) {
  getMetricsV4(
    dwId: null
    mcon: $mcon
    metricName: $metricName
    startTime: $startTime
    endTime: $endTime
  ) {
    metricsJson
  }
}

Variables (row count):

{
  "mcon": "<table-mcon>",
  "metricName": "total_row_count",
  "startTime": "2024-03-01T00:00:00Z",
  "endTime": "2024-03-02T00:00:00Z"
}

metricsJson is a JSON string. Parse it and look for value and measurementTimestamp (camelCase) in each data point.

Other useful metric names:

  • "total_row_count" — row count
  • "total_byte_count" — byte size
  • "total_row_count_last_changed_on" — Unix epoch float of when the row count last changed

Verify table lineage

query GetTableLineage($mcon: String!) {
  getTableLineage(mcon: $mcon, direction: "upstream", hops: 1) {
    connectedNodes {
      mcon
      displayName
      objectType
    }
    flattenedEdges {
      directlyConnectedMcons
    }
  }
}

Check that your expected source tables appear in connectedNodes or flattenedEdges[].directlyConnectedMcons.


Verify column lineage

query GetColumnLineage($mcon: String!, $column: String!) {
  getDerivedTablesPartialLineage(mcon: $mcon, column: $column, pageSize: 1000) {
    destinations {
      table { mcon displayName }
      columns { columnName }
    }
  }
}

Variables: mcon = source table MCON, column = source column name.

Check that each destination table and column appears in the response.


Verify query logs

query GetAggregatedQueries(
  $mcon: String!
  $queryType: String!
  $startTime: DateTime!
  $endTime: DateTime!
  $first: Int
  $after: String
) {
  getAggregatedQueries(
    mcon: $mcon
    queryType: $queryType
    startTime: $startTime
    endTime: $endTime
    first: $first
    after: $after
  ) {
    edges { node { queryHash queryCount lastSeen } }
    pageInfo { hasNextPage endCursor }
  }
}

Variables:

{
  "mcon": "<table-mcon>",
  "queryType": "read",
  "startTime": "2024-03-01T00:00:00Z",
  "endTime": "2024-03-02T00:00:00Z",
  "first": 100
}

Remember: query logs take up to 1 hour to process after push. If you see 0 results immediately after pushing, wait and try again.


Check detector thresholds (anomaly detection status)

query GetDetectorStatus($mcon: String!) {
  getTable(mcon: $mcon) {
    thresholds {
      freshness {
        lower { value }
        upper { value }
        status
      }
      size {
        lower { value }
        upper { value }
        status
      }
    }
  }
}

status will be "no data" or "inactive" on a newly-pushed table. Detectors need historical data to train — see references/anomaly-detection.md for requirements.


Table management operations

Delete push-ingested tables

Only works on push-ingested tables — pull-collected tables are excluded by default.

mutation DeletePushTables($mcons: [String!]!) {
  deletePushIngestedTables(mcons: $mcons) {
    success
    deletedCount
  }
}

Variables:

{
  "mcons": ["<mcon-1>", "<mcon-2>"]
}

Resolve MCONs first with getTable(fullTableId: ..., dwId: ...).


Python helper

import requests, json

GRAPHQL_URL = "https://api.getmontecarlo.com/graphql"

def graphql(query: str, variables: dict, key_id: str, key_token: str) -> dict:
    resp = requests.post(
        GRAPHQL_URL,
        json={"query": query, "variables": variables},
        headers={
            "x-mcd-id": key_id,
            "x-mcd-token": key_token,
            "Content-Type": "application/json",
        },
    )
    resp.raise_for_status()
    data = resp.json()
    if "errors" in data:
        raise RuntimeError(json.dumps(data["errors"], indent=2))
    return data["data"]