# Code recipes — wiring HasData into your code

## Ground rules

- **Base URL:** `https://api.hasdata.com`. Send the `x-api-key` header on every request.
- **Methods:** Scraper APIs are `GET`. Web Scraping is `POST`. Scraper Jobs use `POST` (submit), `GET` (status/results) and `DELETE` (stop).
- **Key handling:** read the key from the environment (`HASDATA_API_KEY`). Never hardcode it, and never log it.
- **Timeouts:** **client timeout ≥ 300 s.** HasData's own deadline is 300 s. A client that gives up sooner reports a failure for a request that may still complete and be billed.
- **Retries:** retry only `429` and `5xx`, with exponential backoff and jitter. Never retry any other `4xx`.
- **Concurrency:** cap it at your plan's limit. Free tier = 1.
- **Success signal:** sync APIs require `body.requestMetadata.status === "ok"`. HTTP 200 alone isn't enough.

## Status codes

| Code | Meaning | Action |
|---|---|---|
| 200 + `status:"ok"` | OK | Use body |
| 401 | Bad/missing key | Fix the key; don't retry |
| 403 | Quota exhausted | Don't retry |
| 429 | Concurrency cap hit | Back off + retry |
| 5xx | Server error | Back off + retry |

## Python — minimal client

```python
import os

import requests


class HasData:
    BASE = "https://api.hasdata.com"

    def __init__(self, api_key=None, timeout=300):
        self.s = requests.Session()
        # Key comes from the environment; never hardcode or log it.
        self.s.headers["x-api-key"] = api_key or os.environ["HASDATA_API_KEY"]
        self.timeout = timeout  # keep >= 300 s to match HasData's deadline

    def get(self, path, **params):
        """Scraper APIs (GET). Raises on HTTP errors and on a non-"ok" status."""
        r = self.s.get(f"{self.BASE}{path}", params=params, timeout=self.timeout)
        r.raise_for_status()
        body = r.json()
        if body.get("requestMetadata", {}).get("status") != "ok":
            raise RuntimeError(f"hasdata not-ok: {body.get('requestMetadata')}")
        return body

    def post(self, path, body):
        """Web Scraping API (POST). Also checks requestMetadata when present."""
        r = self.s.post(f"{self.BASE}{path}", json=body, timeout=self.timeout)
        r.raise_for_status()
        data = r.json()
        meta = data.get("requestMetadata")
        if meta is not None and meta.get("status") != "ok":
            raise RuntimeError(f"hasdata not-ok: {meta}")
        return data


hd = HasData()
serp = hd.get("/scrape/google/serp", q="coffee", num=20)["organicResults"]
md = hd.post("/scrape/web", {"url": "https://example.com", "outputFormat": ["markdown"]})["markdown"]
```

## Python — retry + bounded concurrency

This reuses `hd` from the minimal client above.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

from requests import HTTPError


def with_retry(fn, attempts=5, base=1.0, cap=60.0):
    """Call fn(); retry only 429 and 5xx with capped exponential backoff + jitter."""
    for i in range(attempts):
        try:
            return fn()
        except HTTPError as e:
            code = e.response.status_code if e.response is not None else None
            retryable = code == 429 or (code is not None and 500 <= code < 600)
            if not retryable or i == attempts - 1:
                raise  # other 4xx, or out of attempts: surface the real error
            time.sleep(min(cap, base * 2 ** i) + random.random())


def scrape_many(urls, workers=5):
    """Scrape URLs to markdown with at most `workers` requests in flight."""
    def fetch(u):
        return with_retry(lambda: hd.post("/scrape/web", {"url": u, "outputFormat": ["markdown"]}))

    out = {}
    with ThreadPoolExecutor(max_workers=workers) as ex:
        futs = {ex.submit(fetch, u): u for u in urls}
        for f in as_completed(futs):
            try:
                out[futs[f]] = f.result().get("markdown")
            except Exception as e:  # record per-URL failures instead of aborting the batch
                out[futs[f]] = e
    return out
```

Cap `workers` at your plan's concurrency limit. Anything higher only produces `429`s.

## TypeScript — minimal client

```typescript
const BASE = "https://api.hasdata.com";
const KEY = process.env.HASDATA_API_KEY!; // from env; never hardcode or log it

type Params = Record<string, string | number | boolean>;

// Scraper APIs (GET). Throws on HTTP errors and on a non-"ok" status.
async function get<T = any>(path: string, params: Params = {}): Promise<T> {
  const qs = new URLSearchParams(Object.entries(params).map(([k, v]) => [k, String(v)]));
  const r = await fetch(`${BASE}${path}?${qs}`, {
    headers: { "x-api-key": KEY },
    signal: AbortSignal.timeout(300_000), // >= HasData's 300 s deadline
  });
  if (!r.ok) throw new Error(`HasData ${r.status} ${await r.text()}`);
  const body = (await r.json()) as any;
  if (body?.requestMetadata?.status !== "ok") {
    throw new Error(`HasData not-ok: ${JSON.stringify(body?.requestMetadata)}`);
  }
  return body as T;
}

// Web Scraping API (POST). Also checks requestMetadata when present.
async function post<T = any>(path: string, body: unknown): Promise<T> {
  const r = await fetch(`${BASE}${path}`, {
    method: "POST",
    headers: { "x-api-key": KEY, "Content-Type": "application/json" },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(300_000),
  });
  if (!r.ok) throw new Error(`HasData ${r.status} ${await r.text()}`);
  const data = (await r.json()) as any;
  if (data?.requestMetadata && data.requestMetadata.status !== "ok") {
    throw new Error(`HasData not-ok: ${JSON.stringify(data.requestMetadata)}`);
  }
  return data as T;
}

// Bounded concurrency, no deps: n workers pull indices from a shared counter.
async function pool<T, R>(items: T[], n: number, fn: (x: T) => Promise<R>): Promise<R[]> {
  const out = new Array<R>(items.length);
  let i = 0;
  await Promise.all(
    Array.from({ length: n }, async () => {
      while (i < items.length) {
        const k = i++; // safe: JS runs one worker at a time between awaits
        out[k] = await fn(items[k]);
      }
    }),
  );
  return out;
}
```

## Pagination cheat sheet

| Endpoint family | Pagination |
|---|---|
| Google SERP / Light SERP / Bing | `start` + `num` (max 100) |
| Google Maps Search | `start` (steps of 20) |
| Yelp Search | `start` (steps of 10) |
| Google Maps Reviews / Glassdoor / Airbnb | `nextPageToken` |
| Indeed / YellowPages / Amazon Search | `start` or `page` |
| Shopify Products | `page` (with `limit` ≤ 250) |
| Scraper-Job results | `page` + `limit` (max 100) until `meta.currentPage >= meta.lastPage` |

## Pre-ship checklist

- [ ] Key read from env, never logged.
- [ ] All HTTP timeouts ≥ 300 s.
- [ ] `requestMetadata.status === "ok"` checked on every sync response.
- [ ] Backoff on 429 + 5xx; no retry on any other 4xx.
- [ ] Concurrency capped at plan limit.
- [ ] Job `id` (from the submit response) persisted to durable storage immediately.
- [ ] Webhooks paired with a polling fallback.
- [ ] Result files downloaded immediately on `scraper.job.finished`.
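The status-code table and the retry rules reduce to one decision function that a retry loop or job runner can branch on.

This is a minimal sketch. `classify` and its three labels (`"ok"`, `"retry"`, `"fail"`) are illustrative names and are not part of HasData's API. Treating an HTTP 200 without `status:"ok"` as non-retryable is an assumption, because the table does not say whether to retry it.

```python
from typing import Optional


def classify(status_code: int, body: Optional[dict] = None) -> str:
    """Return 'ok', 'retry', or 'fail' for a sync HasData response (illustrative helper)."""
    # 429 (concurrency cap) and 5xx (server error) are the only retryable codes.
    if status_code == 429 or 500 <= status_code < 600:
        return "retry"
    # 401 (bad key), 403 (quota), and every other non-200 code: fix, don't retry.
    if status_code != 200:
        return "fail"
    # HTTP 200 alone isn't success; requestMetadata.status must be "ok".
    meta = (body or {}).get("requestMetadata") or {}
    # Assumption: a 200 that is not "ok" is treated as non-retryable.
    return "ok" if meta.get("status") == "ok" else "fail"
```

Keeping the decision in one pure function lets the Python and TypeScript clients share a single, testable policy, so the rules don't get re-encoded inline at each call site.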
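The offset-paginated families in the cheat sheet (Google SERP / Light SERP / Bing) page by advancing `start` by `num` on each request.

This is a minimal sketch with two assumptions:

- A page shorter than the requested size means the results are exhausted.
- Rows live under `organicResults`, as in the minimal-client example.

`iter_serp`, `fetch` and `limit` are illustrative names and are not part of HasData's API. `fetch` is any callable with the same shape as the Python client's `get(path, **params)`, which also keeps the loop testable offline.

```python
from typing import Any, Callable, Dict, Iterator


def iter_serp(
    fetch: Callable[..., Dict[str, Any]], q: str, num: int = 100, limit: int = 300
) -> Iterator[Dict[str, Any]]:
    """Yield up to `limit` organic results, advancing `start` by the page size."""
    num = max(1, min(num, 100))  # `num` is capped at 100 per request
    start = 0
    while start < limit:
        page_size = min(num, limit - start)  # don't overshoot `limit` on the last page
        body = fetch("/scrape/google/serp", q=q, start=start, num=page_size)
        results = body.get("organicResults", [])
        yield from results
        # Assumption: a short page means there are no further results.
        if len(results) < page_size:
            return
        start += page_size
```

Usage with the client: `top = list(iter_serp(hd.get, "coffee", limit=200))`. Every page is a separate billed request, so keep `limit` as low as the job allows.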
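Scraper-Job results page with `page` + `limit` (max 100) until `meta.currentPage >= meta.lastPage`.

This sketch rests on three hypothetical placeholders, all to be checked against the Scraper Jobs reference:

- Pages are 1-based.
- Each page's rows sit under a `data` key.
- The results path is `/scrapers/jobs/{job_id}/results`.

`fetch` is any callable that performs the authenticated GET and returns parsed JSON. The minimal client's `get` enforces `requestMetadata.status`, which job-result responses may not carry, so a plain session call may fit better here.

```python
from typing import Any, Callable, Iterator

# HYPOTHETICAL path; confirm against the Scraper Jobs documentation.
RESULTS_PATH = "/scrapers/jobs/{job_id}/results"


def iter_job_results(
    fetch: Callable[..., dict], job_id: str, limit: int = 100
) -> Iterator[Any]:
    """Walk Scraper-Job result pages until meta.currentPage >= meta.lastPage."""
    limit = max(1, min(limit, 100))  # page size is capped at 100
    page = 1  # assumption: pages are 1-based
    while True:
        body = fetch(RESULTS_PATH.format(job_id=job_id), page=page, limit=limit)
        yield from body.get("data", [])  # HYPOTHETICAL key holding the rows
        meta = body.get("meta", {})
        # A missing meta block is treated as a single page, so the loop always ends.
        if meta.get("currentPage", page) >= meta.get("lastPage", page):
            return
        page += 1
```

Because `job_id` is the only input, this pairs naturally with persisting the id from the submit response: a crashed worker can resume reading results from storage without resubmitting (and re-paying for) the job.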