Validating Pushed Data
All verification queries are sent to the GraphQL endpoint at https://api.getmontecarlo.com/graphql and authenticate with a GraphQL API key.
Resolve a table's MCON and fullTableId
Before running most queries you need either the mcon or fullTableId.
fullTableId format: <database>:<schema>.<table> — e.g. analytics:public.orders
query GetTable($fullTableId: String!, $dwId: UUID!) {
  getTable(fullTableId: $fullTableId, dwId: $dwId) {
    mcon
    fullTableId
    displayName
  }
}
Variables:
{
  "fullTableId": "analytics:public.orders",
  "dwId": "<warehouse-uuid>"
}
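The lookup above can be wrapped in a small function. This is a minimal sketch, not an official client: `run_query` is a hypothetical callable taking `(query, variables)` and returning the response's `data` dict (for example, the Python helper in this document with credentials already bound), and `build_full_table_id` and `resolve_mcon` are illustrative names.

```python
from typing import Callable

GET_TABLE_QUERY = """
query GetTable($fullTableId: String!, $dwId: UUID!) {
  getTable(fullTableId: $fullTableId, dwId: $dwId) {
    mcon
    fullTableId
    displayName
  }
}
"""


def build_full_table_id(database: str, schema: str, table: str) -> str:
    # Format documented above: <database>:<schema>.<table>
    return f"{database}:{schema}.{table}"


def resolve_mcon(
    run_query: Callable[[str, dict], dict], full_table_id: str, dw_id: str
) -> str:
    """Return the MCON for a table, or raise if the table is not found."""
    data = run_query(
        GET_TABLE_QUERY, {"fullTableId": full_table_id, "dwId": dw_id}
    )
    table = data.get("getTable")
    if not table or not table.get("mcon"):
        raise LookupError(f"No table found for {full_table_id!r}")
    return table["mcon"]
```

Passing the query runner in as an argument keeps the function easy to exercise with a stub before pointing it at the real API.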
Verify metadata (schema + columns)
query GetTableMetadata($mcon: String!) {
  getTable(mcon: $mcon) {
    mcon
    fullTableId
    versions {
      edges {
        node {
          fields {
            name
            fieldType
          }
        }
      }
    }
  }
}
Check that the fields list matches your pushed schema.
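The comparison can be automated along these lines. This is a sketch under two assumptions that the text above does not confirm: that the first entry in `versions.edges` is the most recent schema version, and that type names can be compared case-insensitively. `diff_fields` is a hypothetical helper name.

```python
def diff_fields(response_data: dict, expected: dict) -> dict:
    """Compare pushed columns ({name: type}) with the GetTableMetadata result."""
    edges = response_data["getTable"]["versions"]["edges"]
    if not edges:
        raise LookupError("table has no schema versions yet")
    # Assumption: the first edge holds the most recent schema version.
    fields = edges[0]["node"]["fields"] or []
    actual = {f["name"]: f["fieldType"] for f in fields}
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        # Assumption: type names are compared case-insensitively.
        "type_mismatch": sorted(
            name
            for name in shared
            if (expected[name] or "").lower() != (actual[name] or "").lower()
        ),
    }
```

An empty list under all three keys means the stored schema matches what was pushed.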
Verify volume and freshness metrics
Use getMetricsV4 to fetch row counts and last-modified timestamps:
query GetMetrics(
  $mcon: String!
  $metricName: String!
  $startTime: DateTime!
  $endTime: DateTime!
) {
  getMetricsV4(
    dwId: null
    mcon: $mcon
    metricName: $metricName
    startTime: $startTime
    endTime: $endTime
  ) {
    metricsJson
  }
}
Variables (row count):
{
  "mcon": "<table-mcon>",
  "metricName": "total_row_count",
  "startTime": "2024-03-01T00:00:00Z",
  "endTime": "2024-03-02T00:00:00Z"
}
metricsJson is a JSON string. Parse it and look for value and measurementTimestamp
(camelCase) in each data point.
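Since the exact envelope of the parsed payload is not spelled out here, the sketch below avoids assuming one: it walks the decoded JSON and collects every object that carries both `value` and `measurementTimestamp`. The function names are hypothetical. `epoch_to_utc` covers the `total_row_count_last_changed_on` metric, whose value this document describes as a Unix epoch float.

```python
import json
from datetime import datetime, timezone


def extract_points(metrics_json: str) -> list:
    """Return every dict that has both 'value' and 'measurementTimestamp'."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            if "value" in node and "measurementTimestamp" in node:
                found.append(node)
            else:
                for child in node.values():
                    walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    # metricsJson arrives as a string, so decode it before walking.
    walk(json.loads(metrics_json))
    return found


def epoch_to_utc(value: float) -> datetime:
    """Convert a Unix epoch float into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(value, tz=timezone.utc)
```

If `extract_points` returns an empty list for a window in which you pushed data, widen `startTime` and `endTime` before concluding that the push failed.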
Other useful metric names:
"total_row_count" - row count
"total_byte_count" - byte size
"total_row_count_last_changed_on" - Unix epoch float of when the row count last changed
Verify table lineage
query GetTableLineage($mcon: String!) {
  getTableLineage(mcon: $mcon, direction: "upstream", hops: 1) {
    connectedNodes {
      mcon
      displayName
      objectType
    }
    flattenedEdges {
      directlyConnectedMcons
    }
  }
}
Check that your expected source tables appear in connectedNodes or
flattenedEdges[].directlyConnectedMcons.
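That check can be expressed as a set difference. In this sketch, `missing_upstreams` is a hypothetical helper; it takes the `data` portion of the GetTableLineage response and the MCONs you expect to see upstream.

```python
def missing_upstreams(response_data: dict, expected_mcons: set) -> set:
    """Return the expected source MCONs that the lineage response lacks."""
    lineage = response_data["getTableLineage"]
    # A source counts as present if it shows up in either place.
    seen = {node["mcon"] for node in lineage.get("connectedNodes") or []}
    for edge in lineage.get("flattenedEdges") or []:
        seen.update(edge.get("directlyConnectedMcons") or [])
    return set(expected_mcons) - seen
```

An empty result means every expected source table was found.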
Verify column lineage
query GetColumnLineage($mcon: String!, $column: String!) {
  getDerivedTablesPartialLineage(mcon: $mcon, column: $column, pageSize: 1000) {
    destinations {
      table { mcon displayName }
      columns { columnName }
    }
  }
}
Variables: mcon = source table MCON, column = source column name.
Check that each destination table and column appears in the response.
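A sketch of that check follows. It flattens the response into `(table MCON, column name)` pairs and reports the expected pairs that are absent. `missing_column_destinations` is a hypothetical name, and identifying destination tables by MCON rather than by `displayName` is a choice made here.

```python
def missing_column_destinations(response_data: dict, expected: set) -> set:
    """expected holds (destination_table_mcon, column_name) tuples."""
    lineage = response_data["getDerivedTablesPartialLineage"]
    found = set()
    for dest in lineage.get("destinations") or []:
        table_mcon = dest["table"]["mcon"]
        for column in dest.get("columns") or []:
            found.add((table_mcon, column["columnName"]))
    return set(expected) - found
```

Run it once per source column, since the query takes a single `column` at a time.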
Verify query logs
query GetAggregatedQueries(
  $mcon: String!
  $queryType: String!
  $startTime: DateTime!
  $endTime: DateTime!
  $first: Int
  $after: String
) {
  getAggregatedQueries(
    mcon: $mcon
    queryType: $queryType
    startTime: $startTime
    endTime: $endTime
    first: $first
    after: $after
  ) {
    edges { node { queryHash queryCount lastSeen } }
    pageInfo { hasNextPage endCursor }
  }
}
Variables:
{
  "mcon": "<table-mcon>",
  "queryType": "read",
  "startTime": "2024-03-01T00:00:00Z",
  "endTime": "2024-03-02T00:00:00Z",
  "first": 100
}
Remember: query logs take up to 1 hour to process after push. If you see 0 results immediately after pushing, wait and try again.
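Because the query exposes `first`, `after`, and `pageInfo`, results beyond the first page have to be fetched with a cursor loop. The following is a sketch of that loop. `run_query` is a hypothetical callable taking `(query, variables)` and returning the response's `data` dict, and `iter_aggregated_queries` is an illustrative name.

```python
from typing import Callable, Iterator


def iter_aggregated_queries(
    run_query: Callable[[str, dict], dict], query: str, variables: dict
) -> Iterator[dict]:
    """Yield every node across all pages of getAggregatedQueries."""
    cursor = None
    while True:
        # Re-send the same variables, changing only the cursor.
        page_vars = {**variables, "after": cursor}
        connection = run_query(query, page_vars)["getAggregatedQueries"]
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        cursor = page_info["endCursor"]
```

If the generator yields nothing shortly after a push, that is consistent with the processing delay noted above and not necessarily a failed push.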
Check detector thresholds (anomaly detection status)
query GetDetectorStatus($mcon: String!) {
  getTable(mcon: $mcon) {
    thresholds {
      freshness {
        lower { value }
        upper { value }
        status
      }
      size {
        lower { value }
        upper { value }
        status
      }
    }
  }
}
status will be "no data" or "inactive" on a newly-pushed table. Detectors need
historical data to train — see references/anomaly-detection.md for requirements.
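To see quickly which detectors are still waiting on data, the response can be reduced to a list of detector names. This sketch treats a missing status the same as the two not-ready values quoted above, which is an assumption. `detectors_not_ready` is a hypothetical helper name.

```python
NOT_READY = {"no data", "inactive"}


def detectors_not_ready(response_data: dict) -> list:
    """Return the detector names (freshness, size) that are not yet active."""
    thresholds = response_data["getTable"].get("thresholds") or {}
    pending = []
    for name in ("freshness", "size"):
        detector = thresholds.get(name) or {}
        status = detector.get("status")
        # Assumption: a missing status is treated as not ready.
        if status is None or status in NOT_READY:
            pending.append(name)
    return pending
```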
Table management operations
Delete push-ingested tables
This mutation only works on push-ingested tables; pull-collected tables are excluded by default.
mutation DeletePushTables($mcons: [String!]!) {
  deletePushIngestedTables(mcons: $mcons) {
    success
    deletedCount
  }
}
Variables:
{
  "mcons": ["<mcon-1>", "<mcon-2>"]
}
Resolve MCONs first with getTable(fullTableId: ..., dwId: ...).
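Since deletion is destructive, it may be worth wrapping the mutation so that an unsuccessful response raises instead of passing silently. This is a sketch: `run_query` is a hypothetical callable taking `(query, variables)` and returning the response's `data` dict, and `delete_push_tables` is an illustrative name. The de-duplication step is a precaution added here, not a documented requirement.

```python
from typing import Callable, Iterable

DELETE_MUTATION = """
mutation DeletePushTables($mcons: [String!]!) {
  deletePushIngestedTables(mcons: $mcons) {
    success
    deletedCount
  }
}
"""


def delete_push_tables(
    run_query: Callable[[str, dict], dict], mcons: Iterable[str]
) -> int:
    """Delete the given push-ingested tables and return deletedCount."""
    unique = list(dict.fromkeys(mcons))  # drop duplicates, keep order
    if not unique:
        raise ValueError("no MCONs supplied")
    result = run_query(DELETE_MUTATION, {"mcons": unique})
    payload = result["deletePushIngestedTables"]
    if not payload["success"]:
        raise RuntimeError("deletePushIngestedTables reported failure")
    return payload["deletedCount"]
```

Comparing the returned count with the number of MCONs you passed is a cheap way to notice tables that were skipped.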
Python helper
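import json

import requests

GRAPHQL_URL = "https://api.getmontecarlo.com/graphql"


def graphql(query: str, variables: dict, key_id: str, key_token: str) -> dict:
    """POST a GraphQL query and return its "data" payload, raising on any error."""
    resp = requests.post(
        GRAPHQL_URL,
        json={"query": query, "variables": variables},
        headers={
            "x-mcd-id": key_id,
            "x-mcd-token": key_token,
            "Content-Type": "application/json",
        },
        timeout=30,  # avoid hanging indefinitely on a stalled connection
    )
    resp.raise_for_status()  # surface HTTP-level failures (4xx/5xx)
    data = resp.json()
    # GraphQL can return HTTP 200 with an "errors" array, so check it explicitly.
    if "errors" in data:
        raise RuntimeError(json.dumps(data["errors"], indent=2))
    return data["data"]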