435 lines
31 KiB
Markdown
435 lines
31 KiB
Markdown
---
|
||
name: competitor-analysis
|
||
description: "Research competitors with Browserbase discovery, enrichment lanes, screenshots, matrices, and HTML reports."
|
||
license: MIT
|
||
compatibility: Requires the browse CLI (npm install -g browse) and BROWSERBASE_API_KEY env var
|
||
allowed-tools: Bash Agent AskUserQuestion
|
||
metadata:
|
||
author: browserbase
|
||
version: "0.2.0"
|
||
category: "marketing"
|
||
risk: "safe"
|
||
source: "official"
|
||
source_repo: "browserbase/skills"
|
||
source_type: "official"
|
||
date_added: "2026-06-19"
|
||
author: "Browserbase"
|
||
license_source: "https://github.com/browserbase/skills/blob/main/skills/competitor-analysis/LICENSE.txt"
|
||
tags:
|
||
- competitor-analysis
|
||
- browserbase
|
||
- market-research
|
||
- browser-automation
|
||
tools:
|
||
- claude-code
|
||
- codex-cli
|
||
- cursor
|
||
---
|
||
|
||
# Competitor Analysis
|
||
|
||
## When to Use
|
||
|
||
Use when the user needs structured competitor research with Browserbase discovery, enrichment lanes, screenshots, comparison matrices, and a final HTML report.
|
||
|
||
|
||
_Source: [browserbase/skills](https://github.com/browserbase/skills) (MIT)._
|
||
|
||
Analyze a user's competitors. Uses Browserbase Search API for discovery and a 4-lane Plan→Research→Synthesize pattern for enrichment — outputting an HTML report with overview, per-competitor deep dives, a side-by-side feature/pricing matrix, and a chronological mentions feed.
|
||
|
||
**Required**: `BROWSERBASE_API_KEY` env var and the `browse` CLI installed (`npm install -g browse`).
|
||
|
||
**First-run setup**: On the first run you'll be prompted to approve `browse cloud fetch`, `browse cloud search`, `cat`, `mkdir`, `sed`, etc. Select **"Yes, and don't ask again for: browse cloud fetch:\*"** (or equivalent) for each. To permanently approve, add these to your `~/.claude/settings.json` under `permissions.allow`:
|
||
```json
|
||
"Bash(browse:*)", "Bash(bunx:*)", "Bash(bun:*)", "Bash(node:*)",
|
||
"Bash(cat:*)", "Bash(mkdir:*)", "Bash(sed:*)", "Bash(head:*)", "Bash(tr:*)", "Bash(rm:*)"
|
||
```
|
||
|
||
**Path rules**: Always use full literal paths in Bash — NOT `~` or `$HOME`. Resolve the home directory once and use it everywhere. When building subagent prompts, replace `{SKILL_DIR}` with the full literal path.
|
||
|
||
**Output directory**: All output goes to `~/Desktop/{company_slug}_competitors_{YYYY-MM-DD}/`. This directory contains one `.md` file per competitor plus the generated HTML views and CSV.
|
||
|
||
**CRITICAL — Tool restrictions (applies to main agent AND all subagents)**:
|
||
- All web searches: use `browse cloud search`. NEVER WebSearch.
|
||
- All page fetches: use `browse cloud fetch --allow-redirects` (returns markdown by default; add `--format raw` if you need the original HTML, then pipe through `sed ... | tr -s ' \n'` to extract text). NEVER WebFetch. 1 MB response limit — fall back to `browse get markdown` (after `browse open <url> --remote`) for JS-heavy pages.
|
||
- All research output: subagents write **one markdown file per competitor** to `{OUTPUT_DIR}/{competitor-slug}.md` using bash heredoc. NEVER use the Write tool or `python3 -c`. See `references/example-research.md` for the file format.
|
||
- Report compilation: use `node {SKILL_DIR}/scripts/compile_report.mjs {OUTPUT_DIR} --user-company "{user_company}" --open` — generates `index.html`, `competitors/*.html`, `matrix.html`, `mentions.html`, `results.csv` in one step and opens overview.
|
||
- URL deduplication: `node {SKILL_DIR}/scripts/list_urls.mjs /tmp --prefix competitor`.
|
||
- **Subagents must use ONLY the Bash tool.**
|
||
- **Main agent NEVER reads raw discovery JSON batch files.**
|
||
|
||
**CRITICAL — Minimize permission prompts**:
|
||
- Subagents MUST batch ALL file writes into a SINGLE Bash call using chained heredocs.
|
||
- Batch ALL searches and ALL fetches into single Bash calls via `&&` chaining.
|
||
|
||
## Pipeline Overview
|
||
|
||
Follow these 8 steps in order. Do not skip or reorder.
|
||
|
||
1. **User Company Research** — Deeply understand the user's company, produce `precise_category` + `category_include_keywords` + `exclusion_list`
|
||
2. **Depth Mode + Seed Input** — Choose depth, accept optional seed competitor URLs
|
||
3. **Discovery (3 parallel waves)** — Wave A (alternatives), Wave B (precise category), Wave C (comparison-page graph via "X vs Y" title parsing)
|
||
4. **Gate** — `scripts/gate_candidates.mjs` fetches each candidate's hero text (via `browse cloud fetch`) and drops wrong-category URLs
|
||
5. **Confirm enrichment set with the user** — Present PASS / UNKNOWN / rejected-brand-matches via `AskUserQuestion`. User ticks the real ones, adds any the discovery missed. Skipping this step is wasteful because enrichment is expensive (25 subagents × depth budget) and the gate is imperfect (JS-heavy homepages, Cloudflare challenges, semantic-variant taglines)
|
||
6. **Deep Enrichment (5 subagents per competitor in deep/deeper modes)** — Marketing, Discussion, Social, News, Technical — each lane a separate subagent writing to `partials/`; then `merge_partials.mjs` consolidates. In deep/deeper modes, **Step 5d** adds a 6th Battle Card synthesis lane AFTER Step 5c fact-check completes — produces per-competitor Landmines / Objection Handlers / Talk Tracks grounded in cited evidence.
|
||
7. **Screenshots** — `capture_screenshots.mjs` via the `browse` CLI captures a 1280×800 homepage hero per competitor
|
||
8. **HTML Report** — Overview + per-competitor (with embedded hero screenshot + Battle Card card) + matrix + mentions views
|
||
|
||
---
|
||
|
||
## Step 0: Setup Output Directory
|
||
|
||
```bash
|
||
OUTPUT_DIR=~/Desktop/{company_slug}_competitors_{YYYY-MM-DD}
|
||
mkdir -p "$OUTPUT_DIR"
|
||
```
|
||
|
||
Replace `{company_slug}` with the user's company name (lowercase, hyphenated) and `{YYYY-MM-DD}` with today's date. Pass `{OUTPUT_DIR}` as a full literal path to every subagent.
|
||
|
||
Clean up discovery batch files from prior runs:
|
||
```bash
|
||
rm -f /tmp/competitor_discovery_batch_*.json
|
||
```
|
||
|
||
**Re-runs must start from a clean `$OUTPUT_DIR`.** `compile_report.mjs` ingests *every* `{slug}.md` in the directory, and `merge_partials.mjs` only overwrites the slugs in the current set — it never deletes ones dropped from a new enrichment set. Since the directory is keyed by date, a same-day re-run with a different competitor set would leave stale competitors in the overview, matrix, CSV, and screenshots. Either use a fresh directory or clear the prior per-competitor files first:
|
||
```bash
|
||
rm -f "$OUTPUT_DIR"/*.md && rm -rf "$OUTPUT_DIR"/partials "$OUTPUT_DIR"/screenshots
|
||
```
|
||
|
||
## Step 1: User Company Research
|
||
|
||
This step sets the baseline for what "competitor" means AND produces the verified data the Step 5b matrix will use for the `userCompany` row.
|
||
|
||
**Rule**: The user's company gets the same 5-lane research depth as competitors. Do NOT fill `userCompany` in matrix.json from memory — it will ship false claims to the user's own team. On a search-API run (user company Exa, 2026-04-23), skipping this step produced a matrix that claimed Exa had a "published uptime SLA" (there is no numeric public SLA — only a status page) and marked its MIT-licensed Python SDK as `open-source: false` (the repo is github.com/exa-labs/exa-py, LICENSE confirmed MIT). Both errors would have surfaced in the "Where you're winning" card as fabricated moats.
|
||
|
||
Process:
|
||
|
||
1. Ask the user for their company name or URL.
|
||
|
||
2. **Check for an existing profile** at `{SKILL_DIR}/profiles/{company-slug}.json`. If it exists, load it and confirm with the user: "I have your profile from {researched_at}. Still accurate?" — if yes, skip to Step 2 BUT still run the partial-lane enrichment below so matrix synthesis has fresh feature evidence.
|
||
The profile format is shared with `company-research` (same shape). If a user already has a profile saved under `company-research/profiles/`, you may copy it into this skill's profiles directory rather than re-researching.
|
||
|
||
3. **Run the full 5-lane enrichment on the user's company** — identical to the competitor pattern in Step 5. For each lane, spawn a Bash-only subagent that writes to `{OUTPUT_DIR}/partials/{user-slug}.{lane}.md`:
|
||
- **marketing** — tagline, positioning, pricing tiers, features, integrations, open-source components (SDK repos + licenses), regions offered, compliance (SOC 2 / HIPAA / trust portal URL)
|
||
- **technical** — REST + streaming API support (with docs URLs), SDK languages, MCP server URL, neural vs keyword retrieval modes, reranking / highlights / live-crawl specifics, published uptime SLA (actual %, not status page), third-party retrieval-quality benchmarks
|
||
- **discussion**, **social**, **news** — optional in quick mode, recommended in deep+
|
||
See `references/research-patterns.md` → "Self-Research" for sub-questions. Each finding MUST cite a URL.
|
||
|
||
4. Run `merge_partials.mjs` on the user's partials too — produces `{OUTPUT_DIR}/{user-slug}.md`, the canonical source Step 5b reads from for `userCompany` flags.
|
||
|
||
5. Synthesize into a profile: Company, Product, Existing Customers, Competitors (seed list), Use Cases, **precise_category**, **category_include_keywords**, **exclusion_list**. Do NOT include ICP — this skill doesn't need it.
|
||
- `precise_category`: one sentence describing the category. e.g., "AI web search API for agents with neural + keyword retrieval". Avoid vague words like "tools" / "platform".
|
||
- `category_include_keywords`: 8-15 phrases a direct competitor's marketing would likely contain (hero or title). Include semantic variants.
|
||
- `exclusion_list`: phrases that indicate a *different* category — used by the gate to reject false positives (e.g. `antidetect browser`, `scraping api`, `screenshot api`, `residential proxy`).
|
||
See `references/research-patterns.md` → "Synthesis Output" for the exact format and Exa as a worked example.
|
||
|
||
6. Present the profile + the user-company `.md` to the user for confirmation. Do not proceed until confirmed.
|
||
|
||
7. **Save the confirmed profile** to `{SKILL_DIR}/profiles/{company-slug}.json`.
|
||
|
||
## Step 2: Depth Mode + Seed Input
|
||
|
||
Ask clarifying questions via `AskUserQuestion` with checkboxes:
|
||
- **Known competitors?** Text area for URLs/names (optional — discovery will find more).
|
||
- **Depth mode?**
|
||
- `quick` — marketing surface only, many competitors, ~2-3 tool calls each
|
||
- `deep` — + external signal (mentions, reviews, news), ~5-8 tool calls each
|
||
- `deeper` — + public benchmarks + strategic diff vs user's company, ~10-15 tool calls each
|
||
- **Target count?** Rough number of competitors to research (e.g., 10 / 20 / 50).
|
||
|
||
This is the ONLY user interaction. After this, execute silently until the report is ready.
|
||
|
||
| Mode | Research per competitor | Best for |
|
||
|------|--------------------------|----------|
|
||
| `quick` | Lane 1 only (homepage + pricing) | Scanning ~30-50 competitors fast |
|
||
| `deep` | Lanes 1+2 | ~15-25 competitors with external signal |
|
||
| `deeper` | All 4 lanes (+ benchmarks + strategic diff) | ~5-15 competitors with full intel |
|
||
|
||
## Step 3: Discovery (3 parallel waves)
|
||
|
||
**Formula**: `ceil(target_count / 20)` queries per wave. Over-discover ~3x because the gate drops ~40-60%.
|
||
|
||
Evaluation on a search-API run shows all three waves are additive — skip any and you lose real competitors:
|
||
|
||
**Wave A — Generic alternatives** (broad; heavy aggregator noise, filtered out later)
|
||
- `"alternatives to {user_company}"`
|
||
- `"{user_company} competitors"`
|
||
|
||
**Wave B — Precise category** (uses `precise_category` from the profile)
|
||
- `"{precise_category}"` verbatim
|
||
- 2-3 queries composed from the most distinctive tokens (e.g. `"web search api for ai agents"`, `"retrieval API for LLMs"`)
|
||
|
||
**Wave C — Comparison-page graph** (highest precision)
|
||
- `"{user_company} vs"`
|
||
- `"{seed1} vs"`, `"{seed2} vs"`, `"{seed3} vs"` (seeds from the profile's `competitors` list)
|
||
- After the searches, run `scripts/extract_vs_names.mjs` to parse `"X vs Y"` patterns from result titles — this uniquely surfaces competitors that don't appear as URL hits.
|
||
|
||
**Process**:
|
||
1. Issue **3 parallel `browse cloud search` Bash calls** (one per wave) in a SINGLE message — NOT subagents. Each Bash call chains its 2-4 queries with `&&`. See `references/workflow.md` → "Discovery — parallel Bash, not subagents" for the exact recipe. Subagents are too heavy for a workload of 6-12 `browse cloud search` calls.
|
||
2. After all waves complete:
|
||
```bash
|
||
node {SKILL_DIR}/scripts/list_urls.mjs /tmp --prefix competitor > /tmp/competitor_urls.txt
|
||
node {SKILL_DIR}/scripts/extract_vs_names.mjs /tmp --prefix competitor \
|
||
--seed "{user_company},{seed1},{seed2},{seed3}" \
|
||
> /tmp/competitor_vs_names.jsonl
|
||
```
|
||
3. **Filter** `/tmp/competitor_urls.txt` — remove blog posts, news, AI-tool directories (seektool.ai, respan.ai, agentsindex.ai, toolradar.com, aitoolsatlas.ai, vibecodedthis.com, etc.), review aggregators (g2.com, capterra.com), databases (crunchbase.com, tracxn.com), user's own domain. See `references/workflow.md` for the full noise-domain list.
|
||
4. For `vs_names` entries that have a resolved `domain`, add them. For unresolved names, optionally run `browse cloud search "{name}" --num-results 3` and pick the top root domain.
|
||
5. Merge with user-provided seed URLs. Dedup by hostname → `/tmp/competitor_candidates.txt`.
|
||
|
||
## Step 4: Gate (category-fit filter)
|
||
|
||
Drop candidates whose marketing identifies them as a *different* category before enrichment burns tool calls on them.
|
||
|
||
```bash
|
||
cat /tmp/competitor_candidates.txt \
|
||
| node {SKILL_DIR}/scripts/gate_candidates.mjs \
|
||
--include "{profile.category_include_keywords joined with commas}" \
|
||
--exclude "{profile.exclusion_list joined with commas}" \
|
||
--concurrency 6 \
|
||
> /tmp/competitor_gated.jsonl
|
||
|
||
grep '"status":"PASS"' /tmp/competitor_gated.jsonl \
|
||
| node -e 'require("fs").readFileSync(0,"utf-8").split("\n").filter(Boolean).forEach(l => { try { console.log(JSON.parse(l).url); } catch {} })' \
|
||
> /tmp/competitor_passed.txt
|
||
```
|
||
|
||
The gate fetches each candidate's homepage via `browse cloud fetch --allow-redirects --format raw`, extracts the first 800 chars of visible text, and classifies position-aware: exclude in `<title>` → REJECT; include in `<title>` → PASS; hybrid title → hero200 tiebreak; otherwise fall through.
|
||
|
||
**Evaluated on a search-API run** with 12 mixed candidates: 7/7 real competitors passed, 4/4 wrong-category rejected, 1 known-hybrid edge case rejected.
|
||
|
||
## Step 4.5: Confirm enrichment set with the user
|
||
|
||
**This step is mandatory. Do NOT skip to enrichment just because the gate ran.**
|
||
|
||
Enrichment is expensive: 5 competitors × 5 lane-subagents = 25 subagents, ~10-15 minutes of wall clock, ~300 `browse cloud` calls. Running it on the wrong set wastes all of that. The gate also has known blind spots:
|
||
|
||
- **JS-heavy homepages** (e.g. Tavily, Firecrawl) — `browse cloud fetch` returns near-empty text, so keyword matching has nothing to match on → REJECT or UNKNOWN
|
||
- **Cloudflare challenge pages** (e.g. Perplexity) — title becomes "Just a moment..." → no category signal
|
||
- **Semantic variants** — "search foundation" / "retrieval backbone" don't lexically match a list centered on "search API"
|
||
- **Domain ambiguity** — `brave.com` (the browser) vs `api-dashboard.search.brave.com` (the actual API product) can confuse classification
|
||
|
||
The user almost always has domain knowledge the skill lacks. Ask them.
|
||
|
||
**Process** — the main agent:
|
||
|
||
1. Read `/tmp/competitor_gated.jsonl` and group rows:
|
||
- **PASS bucket**: everything with status=PASS.
|
||
- **UNKNOWN bucket**: status=UNKNOWN (fetch failed — always surface, these are the silent misses).
|
||
- **Rejected-brand bucket**: top ~10 REJECT rows whose title mentions a well-known brand pattern (e.g. contains the token from a user-supplied seed list, or appears frequently in the Wave C "X vs Y" graph).
|
||
|
||
2. Present the buckets to the user, one table per bucket, with URL + title + reason (for rejects).
|
||
|
||
3. Use `AskUserQuestion` with a checkbox list of all candidates across the three buckets, plus a free-text "add more" field. The prompt should be explicit:
|
||
> "Here are the gate's picks plus a few it was unsure about. Tick the ones that are real competitors in your space, and paste any URLs I missed (comma-separated). Enrichment will run on ONLY the ticked set."
|
||
|
||
4. Write the confirmed set to `/tmp/competitor_enrichment_set.txt` (one URL per line). This is the input for Step 5 — not `/tmp/competitor_passed.txt`.
|
||
|
||
**If the user doesn't respond** or explicitly says "just run it", fall back to `/tmp/competitor_passed.txt` as-is, but warn in chat that the run may waste budget on wrong-category hits.
|
||
|
||
**Exa test, 2026-04-24**: gate auto-passed 22 of 101 candidates but missed Tavily (generic title), Jina AI (semantic mismatch — "search foundation"), Firecrawl (JS-heavy fetch failure), and Perplexity (Cloudflare challenge). All four are real direct competitors. This step catches them.
|
||
|
||
## Step 5: Deep Enrichment
|
||
|
||
Two modes. See `references/workflow.md` for prompt templates and wave management. See `references/research-patterns.md` for the lane-by-lane methodology.
|
||
|
||
### Quick mode — single subagent per batch
|
||
- Input: `/tmp/competitor_enrichment_set.txt` (user-confirmed set from Step 4.5), ~8 competitors per subagent.
|
||
- One subagent runs Lane A only (marketing surface). 2-3 tool calls each.
|
||
- Writes directly to `{OUTPUT_DIR}/{slug}.md`.
|
||
|
||
### Deep / Deeper mode — 5 subagents PER competitor (parallel lane fan-out)
|
||
For each competitor, launch 5 parallel subagents, one per lane:
|
||
- **A. Marketing** (`marketing`): pricing, features, positioning, integrations, customers, team, funding, HQ. Owns canonical frontmatter.
|
||
- **B. Discussion** (`discussion`): Reddit, HN, forums, Dev.to, Hashnode. Broad queries beyond `site:` — also `"{competitor}" review 2026`, `"{competitor}" issues OR problems`, `"{competitor}" discussion`.
|
||
- **C. Social** (`social`): LinkedIn posts, YouTube videos, Twitter/X. Snippets only — do NOT fetch.
|
||
- **D. News & Comparisons** (`news`): TechCrunch, Verge, VentureBeat, Forbes, Businesswire, Substack, blog reviews. Every mention needs a date.
|
||
- **E. Technical & Benchmarks** (`technical`): GitHub benchmark repos/PRs, performance posts. Writes Benchmarks + technical Findings.
|
||
|
||
Budget per lane: deep = 5-8 tool calls, deeper = 10-15.
|
||
**Launch ALL competitor × lane subagents in a SINGLE Agent tool message.** For 10 competitors × 5 lanes = 50 parallel Agent calls in one message. Do NOT split into batches per competitor or per lane — wall clock collapses to the slowest single agent (~3-5 min). Splitting into 5 rounds of 10 cost 25 minutes of wall clock vs 5 minutes parallel on a real measured run; do not do it.
|
||
|
||
Each subagent writes a partial to `{OUTPUT_DIR}/partials/{slug}.{lane}.md`.
|
||
|
||
**Critical**: Pass the user's company name, product, and key features verbatim into every subagent prompt so the technical lane can do strategic diffing. Pass the full literal `{OUTPUT_DIR}` path to every subagent.
|
||
|
||
### Merge partials → canonical per-competitor file
|
||
After all subagents for all competitors complete:
|
||
```bash
|
||
node {SKILL_DIR}/scripts/merge_partials.mjs {OUTPUT_DIR}
|
||
```
|
||
Unions the 5 partials per competitor into one `{OUTPUT_DIR}/{slug}.md` — dedup'd Mentions (sorted by date desc), dedup'd Benchmarks, merged Findings, canonical frontmatter from the marketing lane.
|
||
|
||
### Synthesize the comparison matrix (write `matrix.json`)
|
||
|
||
**Subagents write `key_features` and `integrations` as prose**, not as pipe-separated atomic feature labels. So a naive `|`-split axis becomes one-blob-per-competitor with no overlap — the rendered matrix shows a useless diagonal.
|
||
|
||
The main agent fixes this by synthesizing a **shared taxonomy** across competitors and writing `{OUTPUT_DIR}/matrix.json`. `compile_report.mjs` auto-detects this file and renders the matrix from it instead of from the pipe split.
|
||
|
||
**Process** — main agent:
|
||
1. Read ALL `{slug}.md` files, INCLUDING the user's company file `{user-slug}.md` produced in Step 1. The user is competitor #0 for matrix purposes — treat with identical rigor.
|
||
2. Produce a canonical list of 12-20 *atomic* features — each must be a yes/no proposition a competitor either has or doesn't (e.g. "MCP server", "SOC 2", "Site crawler", "Reranker"). Avoid sentence-length features. Avoid features only one competitor has.
|
||
3. Produce a canonical list of 10-20 integrations (frameworks, marketplaces, SDK languages).
|
||
4. For each company INCLUDING THE USER, map each taxonomy entry to `true` / `false` based on the enrichment data in their `.md` file. **Every flag must be traceable to a Research Findings bullet with a cited URL.** If the user's file says "exa-py MIT-licensed (github.com/exa-labs/exa-py)", the Open-source feature is `true` with that URL as the source. If not mentioned, leave `false`.
|
||
5. Write the result to `{OUTPUT_DIR}/matrix.json` in this shape:
|
||
```json
|
||
{
|
||
"category": "AI search APIs",
|
||
"features": [{ "name": "Web Search API", "description": "..." }, ...],
|
||
"integrations": [{ "name": "LangChain" }, ...],
|
||
"userCompany": {
|
||
"name": "Exa",
|
||
"winningSummary": "Exa's moats are its first-party neural index and the integrated Research API — no one else in the set ships a semantic/embeddings-native retrieval primitive alongside a multi-step agentic research endpoint. It's also the only provider with a crawler product bundled in, and ties with SerpAPI on breadth of SDK language coverage.",
|
||
"losingSummary": "Exa trails competitors on operational transparency — SerpAPI, Serper, and Tavily all publish hourly throughput SLAs, and Exa lacks a dedicated news endpoint that SerpAPI, Serper, and You.com all ship. Image/visual search is also missing vs 4 of 5 competitors.",
|
||
"features": { "Web Search API": true, "Site crawler": true, ... },
|
||
"integrations": { "LangChain": true, ... }
|
||
},
|
||
"competitors": {
|
||
"tavily": {
|
||
"features": { "Web Search API": true, "Site crawler": true, ... },
|
||
"integrations": { "LangChain": true, "Databricks Marketplace": true, ... }
|
||
},
|
||
"serpapi": { "features": {...}, "integrations": {...} }
|
||
}
|
||
}
|
||
```
|
||
|
||
**`userCompany` is required**. The overview page renders two cards — "Where {user} is winning" and "Where {user} is losing". Populate `userCompany.features` and `userCompany.integrations` from the self-research profile (Step 1). Without this field those two cards don't render.
|
||
|
||
**Write order (two passes — this resolves the apparent ordering tension below).** In this step (5b) write all `features` / `integrations` cells for `userCompany` and every competitor, plus a **draft** `winningSummary` / `losingSummary`. The drafts exist only to tell the Step 5c fact-checker which claims are high-stakes (it prioritizes cells named in the summaries). After Step 5c flips cells on verified evidence, **rewrite** the two summaries so the prose reflects only fact-checked cells. The JSON shape above shows the finalized post-fact-check object.
|
||
|
||
**`userCompany.winningSummary` / `losingSummary` are strongly preferred** (analyst-style prose, 2-4 sentences each). When present, the cards render as paragraphs instead of bulleted lists — reads like a briefing, not a spreadsheet. If absent, the cards fall back to a bulleted list of winning/losing items with who-else-has-it.
|
||
|
||
If this step is skipped, the matrix view falls back to the raw pipe-split axis (useless for atomic comparison) and the strategic summary doesn't render. Do not skip.
|
||
|
||
### Fact-check the matrix — spot-check the high-stakes cells (default)
|
||
|
||
**Do not trust the taxonomy pass alone for high-stakes cells.** It is LLM inference from prose and will hallucinate moats. Observed during a search-API run (2026-04-23): matrix.json claimed SOC 2 was unique to the user's company; verification showed three of the other competitors also have SOC 2 Type II.
|
||
|
||
But verifying every cell is the opposite mistake. A 7-company × 33-axis matrix has 231 cells. The Apr 2026 search-API run got stuck at 111+ tool calls in fact-check before interrupt — the subagent kept going on table-stakes cells (REST API, JSON responses, Python SDK) that are universal in the category.
|
||
|
||
**Default = spot-check, not full sweep.** Only verify cells that meaningfully change the strategic narrative.
|
||
|
||
Launch a single fact-check subagent (Bash-only) with **a hard 25-call budget** that targets ONLY these high-stakes axes:
|
||
|
||
1. **Every `userCompany.features` and `userCompany.integrations` cell** (the user's own moats — these go straight into "Where you're winning" prose). Typical: 17 + 16 = 33 cells, but most are obvious (your own product). Focus on:
|
||
- Anything claimed as a *moat* in `winningSummary`
|
||
- Anything claimed as a *gap* in `losingSummary`
|
||
- Compliance (SOC 2, HIPAA, ISO 27001, GDPR)
|
||
- Open-source license claims (MIT / Apache 2.0 / AGPL — observed wrong on a competitor's SDK)
|
||
- Published uptime SLA (status page ≠ SLA)
|
||
|
||
2. **Across competitors, only the cells that drive the win/loss summary**:
|
||
- For each "Winning" claim, verify the user has it AND verify the competitors don't.
|
||
- For each "Losing" claim, verify the named competitors do have it.
|
||
- Compliance + license + SLA across all competitors (high-trust, frequently wrong).
|
||
|
||
3. **Do NOT verify**:
|
||
- Universal table-stakes (REST API, JSON responses, Python SDK, API-key auth) — every search API has these.
|
||
- `false` cells with no claim being made (no moat lost or won).
|
||
- Integration cells unless they appear in the win/loss summary.
|
||
|
||
```
|
||
You are a matrix spot-check subagent. Budget: 25 browse cloud calls TOTAL across all cells.
|
||
Stop and return what you have when you hit the budget — partial fact-check is
|
||
better than blocking the rest of the pipeline.
|
||
|
||
TOOL RULES: Bash ONLY. browse cloud search + browse cloud fetch. Count your calls; stop at 25.
|
||
|
||
PRIORITY ORDER (highest-stakes first — work down until budget):
|
||
1. Every cell that appears in userCompany.winningSummary or losingSummary
|
||
2. Compliance cells (SOC 2, HIPAA, ISO 27001) for user + every competitor
|
||
3. Open-source / self-hostable + license cells across all competitors
|
||
4. Pricing tier numbers ($X/mo, /hr) for user + competitors named in summaries
|
||
5. Funding / employee_estimate fields (only if cited in summaries)
|
||
|
||
Skip:
|
||
- Universal cells (REST API, JSON responses, Python SDK, API-key auth, etc.)
|
||
- `false` cells where no claim is being made
|
||
- Integration matrix cells unless they appear in summaries
|
||
|
||
For each cell verified:
|
||
- If `true` — find one source URL (docs, trust portal, GitHub LICENSE, etc).
|
||
- If `false` — one targeted browse cloud search. Flip ONLY on first-party evidence.
|
||
|
||
Output: matrix.json with `sources: { "Feature": "https://..." }` on the
|
||
verified cells (other cells stay as-is). Cells-changed log to
|
||
{OUTPUT_DIR}/matrix_fact_check.md with each flip + URL + quoted evidence.
|
||
Report back: "spot-check: N cells verified, M flipped, B/25 budget used".
|
||
```
|
||
|
||
**Full-sweep mode (opt-in, slower)**: if the user explicitly says "full fact check" or for a high-stakes deliverable (board deck, press release), set the budget to 80 calls and verify every non-universal cell. Default is spot-check.
|
||
|
||
After the subagent completes, re-read matrix.json, recompile, and surface `matrix_fact_check.md` delta to the user. The summary is much more trustworthy with spot-check than without — and ships in 3-5 minutes instead of stalling the pipeline.
|
||
|
||
### Step 5d: Battle Card synthesis (deep/deeper only, after Step 5c)
|
||
|
||
**Depends on fact-checked matrix.json from Step 5c.** This is a sales-enablement lane. For each competitor, launch a Bash-only synthesis subagent (no new `browse cloud` calls) that reads all 5 existing partials + the user's merged `.md` + fact-checked `matrix.json`, and produces per-competitor Landmines / Objection Handlers / Talk Tracks grounded in cited evidence.
|
||
|
||
Prompt template: `references/battle-card-subagent.md` (substitute `{COMPETITOR_SLUG}` / `{COMPETITOR_NAME}` / `{USER_COMPANY_NAME}` / `{USER_WINNING_SUMMARY}` per competitor). Format spec: `references/battle-card.md`.
|
||
|
||
Output: `{OUTPUT_DIR}/partials/{slug}.battle.md` with a `## Battle Card` section.
|
||
|
||
**Re-run the merge after this lane completes.** The Step 5 merge ran *before* the battle partials existed, so the consolidated `{slug}.md` files don't contain them yet. Re-run:
|
||
```bash
|
||
node {SKILL_DIR}/scripts/merge_partials.mjs {OUTPUT_DIR}
|
||
```
|
||
This unions each `{slug}.battle.md` into its consolidated `{slug}.md` (the `battle` lane is already handled by `merge_partials.mjs`). `compile_report.mjs` reads the `## Battle Card` section from `{slug}.md` and renders it as a brand-accented card on the per-competitor HTML page. **Skip this re-merge and the battle cards never appear in the report.**
|
||
|
||
**Why this lane is synthesis-only** — battle cards must be grounded in facts that already survived Step 5c. Letting the subagent do fresh `browse cloud` searches would reintroduce the hallucinated-moat problem the fact-check step exists to prevent. The subagent's adversarial self-check explicitly rejects claims not traceable to an input partial bullet or a `sources`-backed matrix cell.
|
||
|
||
Parallelism: 1 subagent per competitor, all in one Agent-tool message (synthesis is fast, ~3-5 Bash calls per subagent). Skip this step in `quick` mode — there isn't enough research depth to ground the cards credibly.
|
||
|
||
## Step 6: Screenshots
|
||
|
||
Capture a homepage hero screenshot per competitor:
|
||
```bash
|
||
node {SKILL_DIR}/scripts/capture_screenshots.mjs {OUTPUT_DIR} --mode remote
|
||
```
|
||
|
||
Uses the `browse` CLI (`npm install -g browse`). The `--mode` flag selects the browser session: `remote` (default) drives a Browserbase session — best for protected/bot-detecting homepages and the only option without local Chrome; `local` uses Chrome on your machine. The script passes the corresponding `--remote` / `--local` flag on each `browse` command, so there is no separate environment-config step to run. Writes one PNG per competitor to `{OUTPUT_DIR}/screenshots/{slug}-hero.png`. The compile step in Step 7 auto-embeds the hero on each per-competitor HTML page.
|
||
|
||
Cost: ~10-20s per competitor. ~60s for 5 competitors.
|
||
|
||
## Step 7: HTML Report
|
||
|
||
1. **Generate all views + CSV** (opens overview in browser):
|
||
```bash
|
||
node {SKILL_DIR}/scripts/compile_report.mjs {OUTPUT_DIR} --user-company "{user_company}" --open
|
||
```
|
||
Produces:
|
||
- `{OUTPUT_DIR}/index.html` — overview: competitor table with tagline, pricing summary, key features, strategic diff
|
||
- `{OUTPUT_DIR}/competitors/{slug}.html` — per-competitor deep dive (all sections)
|
||
- `{OUTPUT_DIR}/matrix.html` — side-by-side feature/pricing matrix
|
||
- `{OUTPUT_DIR}/mentions.html` — chronological feed with source-type pills + client-side filter
|
||
- `{OUTPUT_DIR}/results.csv` — flat spreadsheet
|
||
|
||
2. **Present a chat summary**:
|
||
|
||
```
|
||
## Competitor Analysis Complete
|
||
|
||
- **Competitors researched**: {count}
|
||
- **Depth mode**: {mode}
|
||
- **Mentions collected**: {total mentions} across {source types count} source types
|
||
- **Public benchmarks found**: {count}
|
||
- **Opened in browser**: ~/Desktop/{company_slug}_competitors_{date}/index.html
|
||
```
|
||
|
||
3. Show the **overview table** in chat:
|
||
|
||
```
|
||
| Competitor | Positioning | Pricing | Key Features | Strategic Diff |
|
||
|------------|-------------|---------|--------------|----------------|
|
||
| Rival Co | AI-native web search API | $99/mo entry | semantic search, reranking, crawler | Similar retrieval; cheaper entry |
|
||
```
|
||
|
||
4. Call out the top 3-5 most interesting findings — e.g., "3 competitors have public benchmarks; Rival Co is cheapest; Foo Inc launched a dedicated news-search endpoint 2 weeks ago." Offer to dig deeper into any specific competitor or re-run with different depth.
|
||
|
||
|
||
## Limitations
|
||
|
||
- Requires the upstream tool, account, API key, or local setup when the workflow names one.
|
||
- Does not authorize destructive, production, paid, or external-message actions without explicit user approval.
|
||
- Validate generated artifacts or recommendations against the user's real sources before treating them as final.
|