189 lines
9.2 KiB
Markdown
189 lines
9.2 KiB
Markdown
---
|
||
name: seo-programmatic
|
||
description: "Plan and audit programmatic SEO pages generated at scale from structured data. Use when designing templates, URL systems, internal linking, quality gates, and index-bloat safeguards for pages at scale."
|
||
risk: unknown
|
||
source: "https://github.com/AgriciDaniel/claude-seo"
|
||
date_added: "2026-03-21"
|
||
user-invokable: true
|
||
argument-hint: "[url or plan]"
|
||
allowed-tools:
|
||
- Read
|
||
- Grep
|
||
- Glob
|
||
- Bash
|
||
- WebFetch
|
||
- Write
|
||
---
|
||
|
||
# Programmatic SEO Analysis & Planning
|
||
|
||
Build and audit SEO pages generated at scale from structured data sources.
|
||
Enforces quality gates to prevent thin content penalties and index bloat.
|
||
|
||
## When to Use
|
||
- Use when the user wants programmatic SEO planning or review.
|
||
- Use when designing templates, data-driven pages, or scalable URL systems.
|
||
- Use when preventing thin content and index bloat across large page sets.
|
||
|
||
## Data Source Assessment
|
||
|
||
Evaluate the data powering programmatic pages:
|
||
- **CSV/JSON files**: Row count, column uniqueness, missing values
|
||
- **API endpoints**: Response structure, data freshness, rate limits
|
||
- **Database queries**: Record count, field completeness, update frequency
|
||
- Data quality checks:
|
||
- Each record must have enough unique attributes to generate distinct content
|
||
- Flag duplicate or near-duplicate records (>80% field overlap)
|
||
- Verify data freshness; stale data produces stale pages
|
||
|
||
## Template Engine Planning
|
||
|
||
Design templates that produce unique, valuable pages:
|
||
- **Variable injection points**: Title, H1, body sections, meta description, schema
|
||
- **Content blocks**: Static (shared across pages) vs dynamic (unique per page)
|
||
- **Conditional logic**: Show/hide sections based on data availability
|
||
- **Supplementary content**: Related items, contextual tips, user-generated content
|
||
- Template review checklist:
|
||
- Each page must read as a standalone, valuable resource
|
||
- No "mad-libs" patterns (just swapping city/product names in identical text)
|
||
- Dynamic sections must add genuine information, not just keyword variations
|
||
|
||
## URL Pattern Strategy
|
||
|
||
### Common Patterns
|
||
- `/tools/[tool-name]`: Tool/product directory pages
|
||
- `/[city]/[service]`: Location + service pages
|
||
- `/integrations/[platform]`: Integration landing pages
|
||
- `/glossary/[term]`: Definition/reference pages
|
||
- `/templates/[template-name]`: Downloadable template pages
|
||
|
||
### URL Rules
|
||
- Lowercase, hyphenated slugs derived from data
|
||
- Logical hierarchy reflecting site architecture
|
||
- No duplicate slugs; enforce uniqueness at generation time
|
||
- Keep URLs under 100 characters
|
||
- No query parameters for primary content URLs
|
||
- Consistent trailing slash usage (match existing site pattern)
|
||
|
||
## Internal Linking Automation
|
||
|
||
- **Hub/spoke model**: Category hub pages linking to individual programmatic pages
|
||
- **Related items**: Auto-link to 3-5 related pages based on data attributes
|
||
- **Breadcrumbs**: Generate BreadcrumbList schema from URL hierarchy
|
||
- **Cross-linking**: Link between programmatic pages sharing attributes (same category, same city, same feature)
|
||
- **Anchor text**: Use descriptive, varied anchor text. Avoid exact-match keyword repetition
|
||
- Link density: 3-5 internal links per 1000 words (match seo-content guidelines)
|
||
|
||
## Thin Content Safeguards
|
||
|
||
### Quality Gates
|
||
|
||
| Metric | Threshold | Action |
|
||
|--------|-----------|--------|
|
||
| Pages without content review | 100+ | ⚠️ WARNING: require content audit before publishing |
|
||
| Pages without justification | 500+ | 🛑 HARD STOP: require explicit user approval and thin content audit |
|
||
| Unique content per page | <40% | ❌ Flag as thin content (likely penalty risk) |
|
||
| Word count per page | <300 | ⚠️ Flag for review (may lack sufficient value) |
|
||
|
||
### Scaled Content Abuse: Enforcement Context (2025-2026)
|
||
|
||
Google's Scaled Content Abuse policy (introduced March 2024) saw major enforcement escalation in 2025:
|
||
|
||
- **June 2025:** Wave of manual actions targeting websites with AI-generated content at scale
|
||
- **August 2025:** SpamBrain spam update enhanced pattern detection for AI-generated link schemes and content farms
|
||
- **Result:** Google reported 45% reduction in low-quality, unoriginal content in search results post-March 2024 enforcement
|
||
|
||
**Enhanced quality gates for programmatic pages:**
|
||
- **Content differentiation:** ≥30-40% of content must be genuinely unique between any two programmatic pages (not just city/keyword string replacement)
|
||
- **Human review:** Minimum 5-10% sample review of generated pages before publishing
|
||
- **Progressive rollout:** Publish in batches of 50-100 pages. Monitor indexing and rankings for 2-4 weeks before expanding. Never publish 500+ programmatic pages simultaneously without explicit quality review.
|
||
- **Standalone value test:** Each page should pass: "Would this page be worth publishing even if no other similar pages existed?"
|
||
- **Site reputation abuse:** If publishing programmatic content under a high-authority domain (not your own), this may trigger site reputation abuse penalties. Google began enforcing this aggressively in November 2024.
|
||
|
||
> **Recommendation:** The WARNING gate at `<40% unique content` remains appropriate. Consider a HARD STOP at `<30%` unique content to prevent scaled content abuse risk.
|
||
|
||
### Safe Programmatic Pages (OK at scale)
|
||
✅ Integration pages (with real setup docs, API details, screenshots)
|
||
✅ Template/tool pages (with downloadable content, usage instructions)
|
||
✅ Glossary pages (200+ word definitions with examples, related terms)
|
||
✅ Product pages (unique specs, reviews, comparison data)
|
||
✅ Data-driven pages (unique statistics, charts, analysis per record)
|
||
|
||
### Penalty Risk (avoid at scale)
|
||
❌ Location pages with only city name swapped in identical text
|
||
❌ "Best [tool] for [industry]" without industry-specific value
|
||
❌ "[Competitor] alternative" without real comparison data
|
||
❌ AI-generated pages without human review and unique value-add
|
||
❌ Pages where >60% of content is shared template boilerplate
|
||
|
||
### Uniqueness Calculation
|
||
Unique content % = (words unique to this page) / (total words on page) × 100
|
||
|
||
Measure against all other pages in the programmatic set. Shared headers, footers, and navigation are excluded from the calculation. Template boilerplate text IS included.
|
||
|
||
## Canonical Strategy
|
||
|
||
- Every programmatic page must have a self-referencing canonical tag
|
||
- Parameter variations (sort, filter, pagination) canonical to the base URL
|
||
- Paginated series: canonical to page 1 or use rel=next/prev
|
||
- If programmatic pages overlap with manual pages, the manual page is canonical
|
||
- No canonical to a different domain unless intentional cross-domain setup
|
||
|
||
## Sitemap Integration
|
||
|
||
- Auto-generate sitemap entries for all programmatic pages
|
||
- Split at 50,000 URLs per sitemap file (protocol limit)
|
||
- Use sitemap index if multiple sitemap files needed
|
||
- `<lastmod>` reflects actual data update timestamp (not generation time)
|
||
- Exclude noindexed programmatic pages from sitemap
|
||
- Register sitemap in robots.txt
|
||
- Update sitemap dynamically as new records are added to data source
|
||
|
||
## Index Bloat Prevention
|
||
|
||
- **Noindex low-value pages**: Pages that don't meet quality gates
|
||
- **Pagination**: Noindex paginated results beyond page 1 (or use rel=next/prev)
|
||
- **Faceted navigation**: Noindex filtered views, canonical to base category
|
||
- **Crawl budget**: For sites with >10k programmatic pages, monitor crawl stats in Search Console
|
||
- **Thin page consolidation**: Merge records with insufficient data into aggregated pages
|
||
- **Regular audits**: Monthly review of indexed page count vs intended count
|
||
|
||
## Output
|
||
|
||
### Programmatic SEO Score: XX/100
|
||
|
||
### Assessment Summary
|
||
| Category | Status | Score |
|
||
|----------|--------|-------|
|
||
| Data Quality | ✅/⚠️/❌ | XX/100 |
|
||
| Template Uniqueness | ✅/⚠️/❌ | XX/100 |
|
||
| URL Structure | ✅/⚠️/❌ | XX/100 |
|
||
| Internal Linking | ✅/⚠️/❌ | XX/100 |
|
||
| Thin Content Risk | ✅/⚠️/❌ | XX/100 |
|
||
| Index Management | ✅/⚠️/❌ | XX/100 |
|
||
|
||
### Critical Issues (fix immediately)
|
||
### High Priority (fix within 1 week)
|
||
### Medium Priority (fix within 1 month)
|
||
### Low Priority (backlog)
|
||
|
||
### Recommendations
|
||
- Data source improvements
|
||
- Template modifications
|
||
- URL pattern adjustments
|
||
- Quality gate compliance actions
|
||
|
||
## Error Handling
|
||
|
||
| Scenario | Action |
|
||
|----------|--------|
|
||
| URL unreachable | Report connection error with status code. Suggest verifying URL accessibility and checking for authentication requirements. |
|
||
| No programmatic pages detected | Inform user that no template-generated or data-driven page patterns were found. Suggest checking if pages use client-side rendering or if the URL points to the correct section. |
|
||
| Thin content threshold exceeded | Trigger quality gate warning. Report the unique content percentage and flag pages below 40% uniqueness. Require user acknowledgment before proceeding. |
|
||
| Quality gate violation | Halt analysis at the HARD STOP threshold (500+ pages without justification or <30% unique content). Present findings and require explicit user approval to continue. |
|
||
|
||
## Limitations
|
||
- Use this skill only when the task clearly matches the scope described above.
|
||
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
|
||
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
|