playbook/antigravity-awesome-skills/skills/hasdata/references/jobs.md

Jobs APIs — Indeed & Glassdoor

| Endpoint | Returns |
| --- | --- |
| `/scrape/indeed/listing` | Indeed search results |
| `/scrape/indeed/job` | Single Indeed job detail |
| `/scrape/glassdoor/listing` | Glassdoor search results |
| `/scrape/glassdoor/job` | Single Glassdoor job (incl. salary band, company snippet) |

All four endpoints are synchronous GET requests.

Indeed Listing

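import os

import requests

# Assumption: the key is stored in an environment variable of this name.
API_KEY = os.environ["HASDATA_API_KEY"]

resp = requests.get(
    "https://api.hasdata.com/scrape/indeed/listing",
    headers={"x-api-key": API_KEY},
    params={
        "keyword":  "software engineer",
        "location": "New York, NY",
        "sort":     "date",            # or "relevance" (default)
        "domain":   "www.indeed.com",  # country site
        "start":    0,                 # result offset, in steps of 10
    },
    timeout=300,
)
resp.raise_for_status()  # fail loudly on HTTP errors
jobs = resp.json().get("jobs", [])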

| Param | Notes |
| --- | --- |
| `keyword` | Required. |
| `location` | Required. |
| `sort` | `date` or `relevance` (default). |
| `domain` | Country site: `www.indeed.com`, `uk.indeed.com`, `de.indeed.com`. |
| `start` | Result offset, in steps of 10. |

Response: a jobs array whose items carry title, company, location, salary, description, postedAt, link, and jobKey. The salary field is a free-form string, so parse it with a regex instead of casting it directly.
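Because salary is free-form, it helps to normalize it before aggregating. The sketch below is one possible approach and not part of the HasData API: `parse_salary` is a hypothetical helper, and the formats it handles (`$85,000 - $110,000 a year`, `$25 an hour`, `$80K`) are assumed examples, not a documented schema.

```python
import re

# Matches "$85,000", "$25.50" and "$80K"; the K suffix is optional.
_AMOUNT = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)([kK]\b)?")
# Assumed pay-period keywords; extend for other wording or locales.
_PERIODS = ("hour", "day", "week", "month", "year")


def parse_salary(text):
    """Parse a free-form salary string into {"min", "max", "period"}.

    Returns None when the string contains no dollar amount.
    """
    if not text:
        return None
    amounts = []
    for number, kilo in _AMOUNT.findall(text):
        value = float(number.replace(",", ""))
        if kilo:
            value *= 1000
        amounts.append(value)
    if not amounts:
        return None
    lowered = text.lower()
    # First keyword found wins; None means the period was not stated.
    period = next((p for p in _PERIODS if p in lowered), None)
    return {"min": min(amounts), "max": max(amounts), "period": period}
```

Keeping the period alongside the numbers makes it possible to separate hourly from annual figures before computing a median.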

Indeed Job

Pass a jobKey from the listing response to get the full description, requirements, benefits, and company URL.

Glassdoor Listing & Job

params = {"keyword": "software engineer", "location": "New York, NY", "sort": "recent"}
# pagination: pass back nextPageToken

| Param | Notes |
| --- | --- |
| `keyword`, `location` | Required. |
| `sort` | `recent` (default) or `relevant`. |
| `domain` | Country site. |
| `nextPageToken` | Cursor pagination. |
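The snippets under Patterns call `indeed_listing` and `glassdoor_listing` without defining them. A minimal sketch of what such wrappers might look like follows. The endpoint paths and parameter names come from the tables above; the `HASDATA_API_KEY` environment variable, the `_get` helper, and the `http_get` injection point are assumptions made for illustration.

```python
import os

BASE_URL = "https://api.hasdata.com"


def _get(path, params, http_get=None):
    """GET one HasData endpoint and return the decoded JSON body."""
    if http_get is None:
        import requests  # imported lazily so a stub can be injected in tests
        http_get = requests.get
    # Assumption: the API key lives in the HASDATA_API_KEY environment variable.
    api_key = os.environ.get("HASDATA_API_KEY", "")
    # Drop unset optional parameters so only explicit values are sent.
    clean = {k: v for k, v in params.items() if v is not None}
    resp = http_get(BASE_URL + path, headers={"x-api-key": api_key},
                    params=clean, timeout=300)
    resp.raise_for_status()
    return resp.json()


def indeed_listing(keyword, location, *, sort=None, domain=None, start=0,
                   http_get=None):
    """One page of Indeed search results; start is an offset in steps of 10."""
    return _get("/scrape/indeed/listing",
                {"keyword": keyword, "location": location, "sort": sort,
                 "domain": domain, "start": start}, http_get)


def glassdoor_listing(keyword, location, *, sort=None, domain=None,
                      next_token=None, http_get=None):
    """One page of Glassdoor results; next_token is the previous nextPageToken."""
    return _get("/scrape/glassdoor/listing",
                {"keyword": keyword, "location": location, "sort": sort,
                 "domain": domain, "nextPageToken": next_token}, http_get)
```

Passing a stub as `http_get` lets the wrappers be exercised without network access or API credits.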

Patterns

Salary band

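import re
import statistics

import requests

def salary_band(role, location):
    """Return the count and median of dollar figures on one Indeed results page."""
    resp = requests.get(
        "https://api.hasdata.com/scrape/indeed/listing",
        headers={"x-api-key": API_KEY},
        params={"keyword": role, "location": location},
        timeout=300,
    )
    resp.raise_for_status()
    # Require a leading digit so a stray "$," never reaches int() as "".
    # Hourly and annual figures are not separated here; filter first if results mix them.
    nums = [int(m.replace(",", ""))
            for j in resp.json().get("jobs", [])
            for m in re.findall(r"\$(\d[\d,]*)", j.get("salary") or "")]
    if not nums:
        return None
    return {"n": len(nums), "median": statistics.median(nums)}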

Hiring velocity by company

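from collections import Counter

# indeed_listing() is a thin wrapper around the /scrape/indeed/listing request shown earlier.
page = indeed_listing(role, loc, sort="date")
postings = Counter(j.get("company") for j in page.get("jobs", []))
print(postings.most_common(10))  # companies with the most recent postings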

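Run this weekly and keep the counts. A sustained week-over-week rise in a company's postings often precedes earnings or PR signals, though a single results page is a small sample.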
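To make a week-over-week comparison concrete, two stored snapshots can be diffed. This is a minimal sketch under stated assumptions: `velocity_change` and the `min_delta` threshold are hypothetical, the company names are made up, and how the weekly counters are persisted is left out.

```python
from collections import Counter


def velocity_change(previous, current, min_delta=1):
    """Return companies whose posting count rose by at least min_delta.

    previous and current are Counters keyed by company name, one per weekly run.
    The result is a list of (company, delta) pairs, largest increase first.
    """
    deltas = {
        company: current[company] - previous.get(company, 0)
        for company in current
        if company  # skip postings with a missing company name
    }
    rising = [(c, d) for c, d in deltas.items() if d >= min_delta]
    # Sort by delta descending, then by name for a stable order.
    return sorted(rising, key=lambda item: (-item[1], item[0]))


last_week = Counter({"Acme": 3, "Globex": 5})
this_week = Counter({"Acme": 7, "Globex": 5, "Initech": 2})
print(velocity_change(last_week, this_week))  # [('Acme', 4), ('Initech', 2)]
```

Companies that only appear in the newer snapshot count from zero, so first-time posters show up as increases.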

Pagination differs

# Indeed: numeric start offset, in steps of 10
indeed_jobs = []
for p in range(10):
    page = indeed_listing(kw, loc, start=p * 10)
    jobs = page.get("jobs", [])
    if not jobs:  # no more results
        break
    indeed_jobs.extend(jobs)

# Glassdoor: cursor token
out, token = [], None
while True:
    page = glassdoor_listing(kw, loc, next_token=token)
    out.extend(page.get("jobs", []))
    token = page.get("nextPageToken")
    if not token:
        break
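Both pagination styles can be wrapped in generators with a page cap, so a misbehaving cursor cannot loop forever. The sketch below takes the page-fetching function as an argument; `iter_indeed`, `iter_glassdoor`, and `max_pages` are hypothetical names, and treating an empty jobs array as the end of Indeed results is an assumption.

```python
def iter_indeed(fetch_page, max_pages=10, page_size=10):
    """Yield jobs from an offset-paginated listing.

    fetch_page(start) must return one decoded response page (a dict).
    """
    for page_no in range(max_pages):
        jobs = fetch_page(page_no * page_size).get("jobs", [])
        if not jobs:  # assumption: an empty page means no more results
            return
        yield from jobs


def iter_glassdoor(fetch_page, max_pages=10):
    """Yield jobs from a cursor-paginated listing.

    fetch_page(token) must return one decoded response page (a dict).
    """
    token, seen = None, set()
    for _ in range(max_pages):
        page = fetch_page(token)
        yield from page.get("jobs", [])
        token = page.get("nextPageToken")
        # Stop on a missing token, or on a repeated one to avoid cycling.
        if not token or token in seen:
            return
        seen.add(token)
```

With the listing wrappers used in this section, a call might look like `list(iter_indeed(lambda s: indeed_listing(kw, loc, start=s)))`.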

Gotchas

  • Salary is a free-form string. Always regex-parse it.
  • Indeed paginates with a numeric start (steps of 10); Glassdoor uses nextPageToken. Don't mix them.
  • domain matters outside the US: uk.indeed.com, ca.indeed.com, etc.
  • Prefer the API + pagination for bulk. Reach for the matching Scraper Job only when you want webhook-driven fan-out across many keyword × location pairs without managing the polling loop yourself.
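For the bulk case in the last bullet, fanning out over keyword × location pairs with the synchronous API can be sketched with a small thread pool. `fan_out` and `max_workers=4` are hypothetical choices; nothing here reflects documented HasData rate limits, so keep concurrency conservative.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fan_out(fetch, keywords, locations, max_workers=4):
    """Call fetch(keyword, location) for every pair and collect the results.

    Returns a dict mapping (keyword, location) to whatever fetch returned.
    """
    pairs = list(product(keywords, locations))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so results line up with pairs.
        results = list(pool.map(lambda pair: fetch(*pair), pairs))
    return dict(zip(pairs, results))
```

The first exception raised by `fetch` propagates out of `fan_out`, so wrap `fetch` with retry or error capture if partial results are acceptable.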