playbook/antigravity-awesome-skills/plugins/antigravity-awesome-skills-.../skills/hasdata/references/code-recipes.md

5.4 KiB

Code recipes — wiring HasData into your code

Ground rules

  • Base URL: https://api.hasdata.com. Header x-api-key on every request.
  • Methods: Scraper APIs are GET; Web Scraping is POST; Scraper Jobs use POST (submit) + GET (status/results) + DELETE (stop).
  • Key handling: read from env (HASDATA_API_KEY). Never hardcode, never log.
  • Timeouts: client timeout ≥ 300 s. HasData's deadline is 300 s; shorter clients get phantom failures while still being billed.
  • Retries: 429 and 5xx only with exponential backoff + jitter. Never retry 4xx.
  • Concurrency: cap at plan limit. Free tier = 1.
  • Success signal: sync APIs require body.requestMetadata.status === "ok". HTTP 200 alone isn't enough.

Status codes

Code Meaning Action
200 + status:"ok" OK Use body
401 Bad/missing key Fix — don't retry
403 Quota exhausted Don't retry
429 Concurrency cap Backoff + retry
500 Server error Retry

Python — minimal client

import os, requests

class HasData:
    BASE = "https://api.hasdata.com"

    def __init__(self, api_key=None, timeout=300):
        self.s = requests.Session()
        self.s.headers["x-api-key"] = api_key or os.environ["HASDATA_API_KEY"]
        self.timeout = timeout

    def get(self, path, **params):
        r = self.s.get(f"{self.BASE}{path}", params=params, timeout=self.timeout)
        r.raise_for_status()
        body = r.json()
        if body.get("requestMetadata", {}).get("status") != "ok":
            raise RuntimeError(f"hasdata not-ok: {body.get('requestMetadata')}")
        return body

    def post(self, path, body):
        r = self.s.post(f"{self.BASE}{path}", json=body, timeout=self.timeout)
        r.raise_for_status()
        return r.json()

hd = HasData()
serp = hd.get("/scrape/google/serp", q="coffee", num=20)["organicResults"]
md   = hd.post("/scrape/web", {"url": "https://example.com", "outputFormat": ["markdown"]})["markdown"]

Python — retry + bounded concurrency

import time, random
from concurrent.futures import ThreadPoolExecutor, as_completed
from requests import HTTPError

def with_retry(fn, attempts=5, base=1.0, cap=60.0):
    for i in range(attempts):
        try:
            return fn()
        except HTTPError as e:
            code = e.response.status_code
            if code == 429 or 500 <= code < 600:
                time.sleep(min(cap, base * 2 ** i) + random.random())
                continue
            raise
    raise RuntimeError("retry exhausted")

def scrape_many(urls, workers=5):
    out = {}
    with ThreadPoolExecutor(max_workers=workers) as ex:
        futs = {ex.submit(lambda u=u: hd.post("/scrape/web", {"url": u, "outputFormat": ["markdown"]})): u
                for u in urls}
        for f in as_completed(futs):
            try:
                out[futs[f]] = f.result().get("markdown")
            except Exception as e:
                out[futs[f]] = e
    return out

Cap workers at your plan's concurrency — anything higher just generates 429s.

TypeScript — minimal client

const BASE = "https://api.hasdata.com";
const KEY  = process.env.HASDATA_API_KEY!;

async function get<T = any>(path: string, params: Record<string, string | number> = {}): Promise<T> {
  const qs = new URLSearchParams(Object.entries(params).map(([k, v]) => [k, String(v)]));
  const r = await fetch(`${BASE}${path}?${qs}`, {
    headers: { "x-api-key": KEY },
    signal:  AbortSignal.timeout(300_000),
  });
  if (!r.ok) throw new Error(`HasData ${r.status} ${await r.text()}`);
  const body = await r.json() as any;
  if (body?.requestMetadata?.status && body.requestMetadata.status !== "ok") {
    throw new Error(`HasData not-ok: ${JSON.stringify(body.requestMetadata)}`);
  }
  return body as T;
}

async function post<T = any>(path: string, body: unknown): Promise<T> {
  const r = await fetch(`${BASE}${path}`, {
    method:  "POST",
    headers: { "x-api-key": KEY, "Content-Type": "application/json" },
    body:    JSON.stringify(body),
    signal:  AbortSignal.timeout(300_000),
  });
  if (!r.ok) throw new Error(`HasData ${r.status} ${await r.text()}`);
  return r.json() as Promise<T>;
}

// Bounded concurrency, no deps
async function pool<T, R>(items: T[], n: number, fn: (x: T) => Promise<R>) {
  const out: R[] = []; let i = 0;
  await Promise.all(Array.from({ length: n }, async () => {
    while (i < items.length) { const k = i++; out[k] = await fn(items[k]); }
  }));
  return out;
}

Pagination cheat sheet

Endpoint family Pagination
Google SERP / Light SERP / Bing start + num (max 100)
Google Maps Search start (steps of 20)
Yelp Search start (steps of 10)
Google Maps Reviews / Glassdoor / Airbnb nextPageToken
Indeed / YellowPages / Amazon Search start or page
Shopify Products page (with limit ≤ 250)
Scraper-Job results page + limit (max 100) until meta.currentPage >= meta.lastPage

Pre-ship checklist

  • Key from env, never logged.
  • All HTTP timeouts ≥ 300 s.
  • requestMetadata.status === "ok" checked on every sync response.
  • Backoff on 429 + 5xx; never on 4xx.
  • Concurrency capped at plan limit.
  • Job id (from submit response) persisted to durable storage immediately.
  • Webhooks paired with polling fallback.
  • Result files downloaded immediately on scraper.job.finished.