---
name: seo-technical
description: "Audit technical SEO across crawlability, indexability, security, URLs, mobile, Core Web Vitals, structured data, JavaScript rendering, and related platform signals like robots.txt and AI crawler access."
risk: unknown
source: "https://github.com/AgriciDaniel/claude-seo"
date_added: "2026-03-21"
user-invokable: true
argument-hint: "[url]"
allowed-tools:
  - Read
  - Grep
  - Glob
  - Bash
  - WebFetch
---

# Technical SEO Audit

## When to Use
- Use when the user wants a technical SEO review focused on crawlability, indexability, performance, or rendering.
- Use when auditing robots.txt, canonicalization, JavaScript SEO, Core Web Vitals, or AI crawler access.
- Use when the task is infrastructure- and implementation-oriented rather than content-focused.

## Categories

### 1. Crawlability
- robots.txt: exists, valid, not blocking important resources
- XML sitemap: exists, referenced in robots.txt, valid format
- Noindex tags: intentional vs accidental
- Crawl depth: important pages within 3 clicks of homepage
- JavaScript rendering: check if critical content requires JS execution
- Crawl budget: for large sites (>10k pages), efficiency matters

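The robots.txt and sitemap checks above could be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `audit_robots`, the sample robots.txt, and the list of important paths are all hypothetical. It uses the standard-library `urllib.robotparser`, whose rule matching is simpler than Googlebot's, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, site: str, important_paths: list[str]) -> dict:
    """Parse robots.txt text; report sitemap references and blocked paths."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    base = site.rstrip("/")
    blocked = [
        path for path in important_paths
        if not parser.can_fetch("Googlebot", base + path)
    ]
    return {
        # site_maps() returns None when no Sitemap: line is present
        "sitemaps": parser.site_maps() or [],
        "blocked_for_googlebot": blocked,
    }


# Hypothetical robots.txt that accidentally blocks CSS/JS assets
sample = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
Sitemap: https://example.com/sitemap.xml
"""

report = audit_robots(
    sample, "https://example.com", ["/", "/products/", "/assets/app.css"]
)
print(report)
```

An empty `sitemaps` list or any entry in `blocked_for_googlebot` would be worth flagging in the Crawlability findings.
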
#### AI Crawler Management

As of 2025-2026, AI companies actively crawl the web to train models and power AI search. Managing these crawlers via robots.txt is a critical technical SEO consideration.

**Known AI crawlers:**

| Crawler | Company | robots.txt token | Purpose |
|---------|---------|-----------------|---------|
| GPTBot | OpenAI | `GPTBot` | Model training |
| ChatGPT-User | OpenAI | `ChatGPT-User` | Real-time browsing |
| ClaudeBot | Anthropic | `ClaudeBot` | Model training |
| PerplexityBot | Perplexity | `PerplexityBot` | AI search index |
| Bytespider | ByteDance | `Bytespider` | Model training |
| Google-Extended | Google | `Google-Extended` | Gemini training (NOT search) |
| CCBot | Common Crawl | `CCBot` | Open dataset |

**Key distinctions:**
- Blocking `Google-Extended` prevents Gemini training use but does NOT affect Google Search indexing or AI Overviews (those use `Googlebot`)
- Blocking `GPTBot` prevents OpenAI training but does NOT prevent ChatGPT from citing your content via browsing (`ChatGPT-User`)
- An estimated 3-5% of websites now use AI-specific robots.txt rules

**Example, selective AI crawler blocking:**
```
# Allow search indexing, block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow all other crawlers (including Googlebot for search)
User-agent: *
Allow: /
```

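One way to sanity-check a rule set like the example above is to parse it and ask which crawler tokens may fetch a URL. The sketch below assumes Python's standard `urllib.robotparser`; the helper name `crawler_access` and the test URL are hypothetical, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the selective-blocking example
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether it may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "Bytespider", "Googlebot", "ChatGPT-User", "ClaudeBot"],
    "https://example.com/",
)
for agent, allowed in access.items():
    print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

With these rules, only the three explicitly listed training crawlers are blocked; `Googlebot`, `ChatGPT-User`, and `ClaudeBot` fall through to the wildcard group.
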
**Recommendation:** Consider your AI visibility strategy before blocking. Being cited by AI systems can drive brand awareness and referral traffic. Cross-reference the `seo-geo` skill for full AI visibility optimization.

### 2. Indexability
- Canonical tags: self-referencing, no conflicts with noindex
- Duplicate content: near-duplicates, parameter URLs, www vs non-www
- Thin content: pages below minimum word counts per type
- Pagination: rel=next/prev or load-more pattern (Google no longer uses rel=next/prev as an indexing signal, so paginated pages should also be reachable through crawlable links)
- Hreflang: correct for multi-language/multi-region sites
- Index bloat: unnecessary pages consuming crawl budget

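The canonical and noindex checks above can be illustrated with a small parser over a page's HTML. This is a sketch under assumptions: the class `IndexSignals`, the function `indexability_issues`, and the issue wording are hypothetical, and a canonical pointing elsewhere is only flagged for review since it may be intentional.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Collect canonical links and meta robots directives from a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonicals.append(a.get("href") or "")
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            self.robots.extend(d.strip().lower() for d in content.split(","))


def indexability_issues(html: str, page_url: str) -> list[str]:
    signals = IndexSignals()
    signals.feed(html)
    signals.close()
    issues = []
    if not signals.canonicals:
        issues.append("missing canonical")
    elif len(set(signals.canonicals)) > 1:
        issues.append("multiple conflicting canonicals")
    elif signals.canonicals[0] != page_url:
        issues.append("canonical is not self-referencing (review)")
    if "noindex" in signals.robots and signals.canonicals:
        issues.append("noindex combined with canonical")
    return issues


page = """<html><head>
<link rel="canonical" href="https://example.com/a">
<meta name="robots" content="noindex, follow">
</head><body></body></html>"""

print(indexability_issues(page, "https://example.com/a"))
```
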
### 3. Security
- HTTPS: enforced, valid SSL certificate, no mixed content
- Security headers:
  - Content-Security-Policy (CSP)
  - Strict-Transport-Security (HSTS)
  - X-Frame-Options
  - X-Content-Type-Options
  - Referrer-Policy
- HSTS preload: check preload list inclusion for high-security sites

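A minimal sketch of the header checks above, assuming the response headers have already been fetched into a plain dict. The function names are hypothetical. The preload criteria used here (a `max-age` of at least one year plus `includeSubDomains` and `preload`) follow the published preload-list submission requirements; actual list inclusion still has to be verified separately.

```python
REQUIRED_HEADERS = [
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
]

ONE_YEAR_SECONDS = 31536000


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return the checklist headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]


def hsts_preload_ready(headers: dict[str, str]) -> bool:
    """Check whether the HSTS header carries the directives preload requires."""
    value = next(
        (v for k, v in headers.items() if k.lower() == "strict-transport-security"),
        "",
    )
    directives = [d.strip().lower() for d in value.split(";")]
    max_age = 0
    for directive in directives:
        if directive.startswith("max-age="):
            raw = directive.split("=", 1)[1].strip('"')
            max_age = int(raw) if raw.isdigit() else 0
    return (
        max_age >= ONE_YEAR_SECONDS
        and "includesubdomains" in directives
        and "preload" in directives
    )


# Hypothetical response headers
response_headers = {
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "X-Content-Type-Options": "nosniff",
}
print(missing_security_headers(response_headers))
print(hsts_preload_ready(response_headers))
```
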
### 4. URL Structure
- Clean URLs: descriptive, hyphenated, no query parameters for content
- Hierarchy: logical folder structure reflecting site architecture
- Redirects: no chains (max 1 hop), 301 for permanent moves
- URL length: flag >100 characters
- Trailing slashes: consistent usage

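Several of the URL checks above are mechanical and can be sketched with the standard library. The function names and issue labels below are hypothetical; the 100-character threshold comes from the checklist, and the trailing-slash check skips the root path and file-like paths (those with an extension) as a simplifying assumption.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 100  # threshold from the checklist


def url_issues(url: str) -> list[str]:
    """Flag length, query-parameter, and underscore problems for one URL."""
    parts = urlsplit(url)
    issues = []
    if len(url) > MAX_URL_LENGTH:
        issues.append("longer than 100 characters")
    if parts.query:
        issues.append("query parameters present")
    if "_" in parts.path:
        issues.append("underscores instead of hyphens")
    return issues


def trailing_slash_consistent(urls: list[str]) -> bool:
    """True if all page-like paths agree on trailing-slash usage."""
    styles = set()
    for url in urls:
        path = urlsplit(url).path
        last_segment = path.rsplit("/", 1)[-1]
        if path in ("", "/") or "." in last_segment:
            continue  # skip the root and file-like paths such as /logo.png
        styles.add(path.endswith("/"))
    return len(styles) <= 1


print(url_issues("https://example.com/blog/my_post?id=3"))
print(trailing_slash_consistent(
    ["https://example.com/blog/", "https://example.com/about"]
))
```
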
### 5. Mobile Optimization
- Responsive design: viewport meta tag, responsive CSS
- Touch targets: minimum 48x48px with 8px spacing
- Font size: minimum 16px base
- No horizontal scroll
- Mobile-first indexing: Google indexes the mobile version of each page. **Mobile-first indexing is 100% complete as of July 5, 2024.** Google now crawls and indexes ALL websites exclusively with the mobile Googlebot user-agent.

### 6. Core Web Vitals
- **LCP** (Largest Contentful Paint): target ≤2.5s
- **INP** (Interaction to Next Paint): target ≤200ms
  - INP replaced FID on March 12, 2024. FID was fully removed from all Chrome tools (CrUX API, PageSpeed Insights, Lighthouse) on September 9, 2024. Do NOT reference FID anywhere.
- **CLS** (Cumulative Layout Shift): target ≤0.1
- Evaluation uses the 75th percentile of real user data
- Use the PageSpeed Insights API or CrUX data if MCP is available

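The targets above can be turned into a simple rating helper. In this sketch the "good" boundaries are the targets from the list, while the "poor" boundaries (4.0s, 500ms, 0.25) are Google's published thresholds and are not stated in this document. The `p75` helper uses a nearest-rank approximation on raw samples and is an assumption; CrUX reports its own 75th-percentile values, which should be preferred when available.

```python
import math

# metric: (good upper bound, poor lower bound)
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless
}


def p75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile of raw field samples."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]


def rate(metric: str, value: float) -> str:
    """Classify a 75th-percentile value as good / needs improvement / poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


lcp_samples = [1.8, 2.1, 2.4, 3.9]  # hypothetical LCP samples in seconds
print(rate("LCP", p75(lcp_samples)))
print(rate("INP", 350))
print(rate("CLS", 0.3))
```
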
### 7. Structured Data
- Detection: JSON-LD (preferred), Microdata, RDFa
- Validation against Google's supported types
- See the `seo-schema` skill for full analysis

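JSON-LD detection, the preferred format above, could look roughly like this. The sketch covers only `<script type="application/ld+json">` blocks, not Microdata or RDFa, and it does not unpack `@graph` containers. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect and parse JSON-LD script blocks from an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []  # successfully parsed JSON-LD values
        self.errors = []  # raw strings that failed to parse

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            raw = "".join(self._buffer)
            try:
                self.blocks.append(json.loads(raw))
            except json.JSONDecodeError:
                self.errors.append(raw)


def schema_types(html: str) -> list:
    """Return the top-level @type of every valid JSON-LD item on the page."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    types = []
    for block in extractor.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                types.append(item["@type"])
    return types


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Article", "headline": "Hi"}'
    "</script></head></html>"
)
print(schema_types(page))
```

Blocks that land in `errors` are invalid JSON and would be worth reporting before any type-level validation.
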
### 8. JavaScript Rendering
- Check whether content is visible in the initial HTML or requires JS execution
- Identify client-side rendered (CSR) vs server-side rendered (SSR) pages
- Flag SPA frameworks (React, Vue, Angular) that may cause indexing issues
- Verify dynamic rendering setup if applicable

#### JavaScript SEO: Canonical & Indexing Guidance (December 2025)

Google updated its JavaScript SEO documentation in December 2025 with critical clarifications:

1. **Canonical conflicts:** If a canonical tag in raw HTML differs from one injected by JavaScript, Google may use EITHER one. Ensure canonical tags are identical between server-rendered HTML and JS-rendered output.
2. **noindex with JavaScript:** If raw HTML contains `<meta name="robots" content="noindex">` but JavaScript removes it, Google MAY still honor the noindex from raw HTML. Serve correct robots directives in the initial HTML response.
3. **Non-200 status codes:** Google does NOT render JavaScript on pages returning non-200 HTTP status codes. Any content or meta tags injected via JS on error pages will be invisible to Googlebot.
4. **Structured data in JavaScript:** Product, Article, and other structured data injected via JS may face delayed processing. For time-sensitive structured data (especially e-commerce Product markup), include it in the initial server-rendered HTML.

**Best practice:** Serve critical SEO elements (canonical, meta robots, structured data, title, meta description) in the initial server-rendered HTML rather than relying on JavaScript injection.

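The first three points above suggest comparing the raw HTML response against the JS-rendered DOM. The sketch below assumes both HTML strings are already available (obtaining the rendered version, for example with a headless browser, is outside its scope). The names `HeadSignals` and `render_mismatches` and the wording of the findings are hypothetical.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Extract the canonical URL and noindex state from one HTML document."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            if "noindex" in (a.get("content") or "").lower():
                self.noindex = True


def signals(html: str) -> HeadSignals:
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    return parser


def render_mismatches(raw_html: str, rendered_html: str, status_code: int = 200) -> list[str]:
    """Compare SEO signals between raw and JS-rendered HTML."""
    raw, rendered = signals(raw_html), signals(rendered_html)
    findings = []
    if status_code != 200:
        findings.append(f"status {status_code}: JS-injected tags may never be rendered")
    if raw.canonical != rendered.canonical:
        findings.append(
            f"canonical differs: raw={raw.canonical!r} rendered={rendered.canonical!r}"
        )
    if raw.noindex and not rendered.noindex:
        findings.append("noindex in raw HTML is removed by JS; it may still be honored")
    return findings


raw_page = (
    '<head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex"></head>'
)
rendered_page = '<head><link rel="canonical" href="https://example.com/b"></head>'

for finding in render_mismatches(raw_page, rendered_page):
    print(finding)
```
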
### 9. IndexNow Protocol
- Check whether the site supports IndexNow
- Supported by Bing, Yandex, Naver, and other participating engines; Google does not use it
- Recommend implementation for faster indexing on non-Google engines

## Output

### Technical Score: XX/100

### Category Breakdown
| Category | Status | Score |
|----------|--------|-------|
| Crawlability | pass/warn/fail | XX/100 |
| Indexability | pass/warn/fail | XX/100 |
| Security | pass/warn/fail | XX/100 |
| URL Structure | pass/warn/fail | XX/100 |
| Mobile | pass/warn/fail | XX/100 |
| Core Web Vitals | pass/warn/fail | XX/100 |
| Structured Data | pass/warn/fail | XX/100 |
| JS Rendering | pass/warn/fail | XX/100 |
| IndexNow | pass/warn/fail | XX/100 |

### Critical Issues (fix immediately)
### High Priority (fix within 1 week)
### Medium Priority (fix within 1 month)
### Low Priority (backlog)

## DataForSEO Integration (Optional)

If DataForSEO MCP tools are available, use `on_page_instant_pages` for real page analysis (status codes, page timing, broken links, on-page checks), `on_page_lighthouse` for Lighthouse audits (performance, accessibility, SEO scores), and `domain_analytics_technologies_domain_technologies` for technology stack detection.

## Error Handling

| Scenario | Action |
|----------|--------|
| URL unreachable | Report the connection error with the status code, if one was returned. Suggest verifying the URL, checking DNS resolution, and confirming the site is publicly accessible. |
| robots.txt not found | Note that no robots.txt was detected at the root domain. Recommend creating one with appropriate directives. Continue audit on remaining categories. |
| HTTPS not configured | Flag as a critical issue. Report whether HTTP is served without redirect, mixed content exists, or SSL certificate is missing/expired. |
| Core Web Vitals data unavailable | Note that CrUX data is not available (common for low-traffic sites). Suggest using Lighthouse lab data as a proxy and re-testing field data once the site has enough traffic for CrUX coverage. |

## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.