---
name: nextjs-seo-indexing
description: "Fix SEO indexing issues, crawl budget problems, and Search Console coverage errors for Next.js apps. Covers canonical tags, noindex audits, sitemap health, static rendering, and internal linking."
category: seo
risk: safe
source: self
source_type: self
date_added: "2026-05-31"
author: Whoisabhishekadhikari
tags: [seo, indexing, nextjs, search-console, crawl-budget, canonical, sitemap]
tools: [claude, cursor, gemini, claude-code]
version: 1.0.0
---

# Next.js SEO Indexing & Crawl Budget Skill

Fix Google Search Console coverage issues, canonical problems, sitemap errors, and crawl budget waste in Next.js apps.

---

## When to Use

- Use when a Next.js site has Google Search Console coverage issues such as duplicate canonicals, accidental noindex, crawl waste, or discovered-but-not-indexed URLs.
- Use when auditing sitemap, robots.txt, redirect, internal-linking, or static-rendering problems before an SEO release.
- Use when you need framework-specific examples for Next.js App Router metadata, `generateMetadata`, `robots.js`, and sitemap routes.

---

## Understanding Search Console Coverage States

| Status | Meaning | Fix |
|--------|---------|-----|
| Crawled – currently not indexed | Google crawled the page but chose not to index it | Improve content quality, canonical signals, and internal links |
| Duplicate without user-selected canonical | Multiple URLs serve the same content and none declares a canonical | Add an explicit canonical pointing to the preferred URL |
| Excluded by 'noindex' tag | A `noindex` directive is present (meta tag or `X-Robots-Tag` header) | Remove `noindex` if the page should be indexed |
| Duplicate, Google chose different canonical than user | Google prefers a different URL than the one you declared | Find out why Google prefers the other URL, then either align your canonical with it or strengthen the signals (internal links, sitemap, redirects) for your preferred URL |
| Alternate page with proper canonical tag | A non-preferred duplicate that correctly points to its canonical | Expected behavior, not a problem |
| Not found (404) | Page deleted or URL changed | Add a redirect or restore the page |
| Discovered – currently not indexed | Google knows the URL exists but has not crawled it yet | Improve internal linking and reduce crawl waste |
| Page with redirect | The URL redirects, so the destination is indexed instead; expected for intentional redirects | Shorten redirect chains and verify the destination |

---

## Step 1 — Canonical Audit

### Next.js App Router (metadata export)

```js
// app/blog/my-post/page.js
export const metadata = {
  title: 'My Post Title',
  alternates: {
    canonical: 'https://www.yourdomain.com/blog/my-post',
  },
};
```

### Next.js App Router (generateMetadata)

```js
// app/blog/[slug]/page.js
export async function generateMetadata({ params }) {
  // `params` is a Promise in Next.js 15+; awaiting it is harmless in older versions.
  const { slug } = await params;
  return {
    alternates: {
      canonical: `https://www.yourdomain.com/blog/${slug}`,
    },
  };
}
```

### Common canonical mistakes to fix:

```js
// ❌ WRONG: relative URL with no `metadataBase` configured
// (Next.js can only resolve a relative canonical against `metadataBase`)
canonical: '/blog/my-post'

// ❌ WRONG: mixing trailing-slash and no-trailing-slash forms across pages
// (pick one and stick with it sitewide)

// ✓ CORRECT: absolute URL, consistent scheme + subdomain
canonical: 'https://www.yourdomain.com/blog/my-post'
```
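
One way to keep canonicals consistent is to build every one of them through a single helper. The sketch below is only an illustration: `SITE_ORIGIN` and `canonicalUrl` are hypothetical names (not Next.js APIs), and it assumes a no-trailing-slash policy and that query strings never identify distinct content on your site. Its return value could be passed to `alternates.canonical` in the examples above.

```js
// Hypothetical helper: normalise any path or URL into one canonical form.
// Assumption: the preferred origin is https://www.yourdomain.com and
// canonical URLs never end with a trailing slash (except the homepage).
const SITE_ORIGIN = 'https://www.yourdomain.com';

function canonicalUrl(pathOrUrl) {
  // Resolve relative paths against the preferred origin, then keep only the
  // pathname, which discards any query string, fragment, or alternate host.
  const { pathname } = new URL(pathOrUrl, SITE_ORIGIN);

  // Strip trailing slashes from everything except the root path.
  const trimmed = pathname.length > 1 ? pathname.replace(/\/+$/, '') : pathname;

  return SITE_ORIGIN + (trimmed || '/');
}

// Usage: all of these variants collapse to the same canonical URL.
console.log(canonicalUrl('/blog/my-post/'));
console.log(canonicalUrl('/blog/my-post?utm_source=newsletter#top'));
console.log(canonicalUrl('http://yourdomain.com/blog/my-post'));
```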

---

## Step 2 — Noindex Audit

Find pages that are accidentally noindexed:

```bash
# Search for noindex in metadata, including the App Router form `robots: { index: false }`
# (the brace list sits outside the quotes so bash expands it into one --include per extension)
grep -rlE "noindex|index:[[:space:]]*false" --include='*.'{js,ts,jsx,tsx} app/ pages/ 2>/dev/null

# Check the root layout: a noindex here is inherited by ALL pages
grep -n "robots" app/layout.*
```

In the Next.js App Router, a `robots` value in the root layout is inherited by every route unless a nested layout or page sets its own `robots`. Only set it there if you want the whole site affected.

```js
// app/layout.js: only set robots if you need sitewide control
export const metadata = {
  // ✓ Allow indexing
  robots: { index: true, follow: true },
  // ❌ This would noindex the entire site:
  // robots: { index: false }
};
```
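
Source-level greps can miss a `noindex` that only shows up in the deployed response, for example one sent as an `X-Robots-Tag` header. The sketch below shows one way to check rendered HTML and response headers together. `findNoindexSources` is a hypothetical helper, the regex-based tag matching is a simplification rather than a full HTML parser, and fetching the page is left to the caller.

```js
// Hypothetical helper: report where a noindex directive comes from.
// `html` is the rendered page source; `headers` is a plain object of
// response headers (for example built from a fetch() response).
function findNoindexSources(html, headers = {}) {
  const sources = [];

  // 1. Header check: X-Robots-Tag can carry noindex without any HTML tag.
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === 'x-robots-tag' && /\bnoindex\b/i.test(String(value))) {
      sources.push('header');
    }
  }

  // 2. Meta tag check: <meta name="robots"> or <meta name="googlebot">.
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const name = (tag.match(/\bname\s*=\s*["']?([^"'\s>]+)/i) || [])[1];
    const content = (tag.match(/\bcontent\s*=\s*["']([^"']*)["']/i) || [])[1];
    if (name && /^(robots|googlebot)$/i.test(name) && content && /\bnoindex\b/i.test(content)) {
      sources.push('meta');
    }
  }

  return sources;
}

// Usage: an empty array means no noindex directive was found.
console.log(findNoindexSources('<meta name="robots" content="noindex, follow">'));
console.log(findNoindexSources('<p>Hello</p>', { 'X-Robots-Tag': 'noindex' }));
```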

---

## Step 3 — Sitemap Health

### Verify sitemap routes return 200 + valid XML

```bash
# Print the status line and content type (expect a 200 and an XML content type)
curl -sI https://www.yourdomain.com/sitemap.xml | grep -i "^HTTP\|content-type"
# Inspect the first lines of the XML body
curl -s https://www.yourdomain.com/sitemap.xml | head -20
```

### Next.js App Router sitemap (recommended pattern)

```js
// app/sitemap.js
import { getPosts } from '@/lib/posts'; // hypothetical module: replace with your data fetch

export default async function sitemap() {
  const baseUrl = 'https://www.yourdomain.com';

  // Static pages
  // Note: `new Date()` stamps every build as a fresh modification; use real dates where known.
  const staticPages = [
    { url: baseUrl, lastModified: new Date(), changeFrequency: 'daily', priority: 1.0 },
    { url: `${baseUrl}/about`, lastModified: new Date(), changeFrequency: 'monthly', priority: 0.8 },
  ];

  // Dynamic pages (fetch from DB or CMS)
  const posts = await getPosts(); // your data fetch
  const dynamicPages = posts.map(post => ({
    url: `${baseUrl}/blog/${post.slug}`,
    lastModified: new Date(post.updatedAt),
    changeFrequency: 'weekly',
    priority: 0.7,
  }));

  return [...staticPages, ...dynamicPages];
}
```
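
### Multiple sitemaps (sitemap index)

These nested files produce separate sitemaps. Unless you also create an index file yourself, list each sitemap URL in `robots.txt` or submit each one in Search Console.

```js
// app/sitemap-tools/sitemap.js  → served at /sitemap-tools/sitemap.xml
// app/sitemap-blog/sitemap.js   → served at /sitemap-blog/sitemap.xml
// Each file default-exports a function that returns an array of URL entries
```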
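
Before submitting a sitemap, it can help to sanity-check the entries that `sitemap()` returns. The following is a minimal sketch under stated assumptions: `auditSitemapEntries` is a hypothetical helper, entries have the `{ url }` shape used above, and the 50,000-URL ceiling is the per-file limit from the sitemaps.org protocol.

```js
// Hypothetical helper: flag sitemap entries that are likely to cause
// coverage problems (relative URLs, wrong host or scheme, duplicates).
function auditSitemapEntries(entries, baseUrl) {
  const expectedOrigin = new URL(baseUrl).origin;
  const seen = new Set();
  const problems = [];

  for (const { url } of entries) {
    let parsed;
    try {
      parsed = new URL(url); // throws for relative URLs such as '/about'
    } catch {
      problems.push({ url, issue: 'not an absolute URL' });
      continue;
    }
    if (parsed.origin !== expectedOrigin) {
      problems.push({ url, issue: 'wrong scheme or host' });
    }
    if (seen.has(url)) {
      problems.push({ url, issue: 'duplicate entry' });
    }
    seen.add(url);
  }

  // The sitemap protocol allows at most 50,000 URLs per file.
  if (entries.length > 50000) {
    problems.push({ url: null, issue: 'more than 50,000 URLs: split the sitemap' });
  }
  return problems;
}

// Usage with deliberately broken sample data:
const sampleEntries = [
  { url: 'https://www.yourdomain.com/about' },
  { url: '/blog/relative-post' },
  { url: 'http://yourdomain.com/blog/wrong-host' },
  { url: 'https://www.yourdomain.com/about' },
];
console.log(auditSitemapEntries(sampleEntries, 'https://www.yourdomain.com'));
```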

---

## Step 4 — Static Rendering Verification

Pages must be statically generated, or server-rendered with their metadata present in the initial HTML, for Google to see SEO tags reliably.

```bash
# Check build output: pages should show ○ (static) or ● (SSG), not λ / ƒ (dynamic)
# (older Next.js versions print λ for dynamic routes, newer ones print ƒ)
npm run build 2>&1 | grep -E "○|●|λ|ƒ|/blog|/tools"
```

```
○ /about           (static)
● /blog/[slug]     (SSG)         ← good
λ /api/data        (serverless)  ← expected for APIs
```

If important pages are `λ` or `ƒ` (fully dynamic with no static generation), add:

```js
// app/blog/[slug]/page.js
// `getPosts` is your own data fetch (DB or CMS)
export async function generateStaticParams() {
  const posts = await getPosts();
  return posts.map(post => ({ slug: post.slug }));
}
```
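
For larger apps, scanning the build log by eye gets tedious. A rough sketch of automating the check is shown here: `findDynamicRoutes` is a hypothetical helper that takes the captured `next build` output as a string and returns the routes marked dynamic. It assumes route rows have the shape "symbol, whitespace, path" as in the sample output above; the exact layout of the build table varies between Next.js versions.

```js
// Hypothetical helper: list routes that the build marked as dynamic.
// Matches both the older λ marker and the newer ƒ marker.
function findDynamicRoutes(buildOutput) {
  const dynamicRoutes = [];
  for (const line of buildOutput.split('\n')) {
    // The symbol may be preceded by tree-drawing characters such as "├ ".
    // Requiring a path that starts with "/" skips the legend lines.
    const match = line.match(/(?:^|\s)([λƒ])\s+(\/\S*)/u);
    if (match) dynamicRoutes.push(match[2]);
  }
  return dynamicRoutes;
}

// Usage with output shaped like the sample above:
const sampleBuildOutput = [
  '○ /about (static)',
  '● /blog/[slug] (SSG)',
  'λ /api/data (serverless)',
  '├ ƒ /dashboard 1.2 kB',
  'ƒ  (Dynamic)  server-rendered on demand',
].join('\n');
console.log(findDynamicRoutes(sampleBuildOutput));
```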

---

## Step 5 — Internal Linking Audit

Pages with zero internal links are rarely indexed. Every important page should be reachable from:

1. Homepage or navigation
2. A sitemap
3. At least one other content page

```bash
# Find pages that have no inbound links from other pages
# (manual check: grep for the slug across all files; replace "the-page-itself"
#  with the page's own file name so it is not counted as an inbound link)
grep -rn "/blog/my-orphan-post" --include='*.'{js,ts,jsx,tsx,md} --exclude-dir={node_modules,.next} . | grep -v "sitemap\|the-page-itself"
```
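
If you can export a map of which pages link to which (from a crawler or from your CMS), orphan detection reduces to counting inbound links. The sketch below assumes that map already exists. `findOrphanPages` and the input shape are hypothetical, and sitemap-only discovery is deliberately not counted as a link.

```js
// Hypothetical helper: return pages that no OTHER page links to.
// `pages` is the list of URLs you care about; `linksByPage` maps each
// source page to the internal URLs it links to.
function findOrphanPages(pages, linksByPage) {
  const inboundCount = new Map(pages.map((page) => [page, 0]));

  for (const [source, targets] of Object.entries(linksByPage)) {
    // De-duplicate targets so repeated links from one page count once.
    for (const target of new Set(targets)) {
      // Self-links do not help discovery, so they are ignored.
      if (target !== source && inboundCount.has(target)) {
        inboundCount.set(target, inboundCount.get(target) + 1);
      }
    }
  }

  return pages.filter((page) => inboundCount.get(page) === 0);
}

// Usage with a tiny sample site:
const samplePages = ['/', '/blog/a', '/blog/my-orphan-post'];
const sampleLinks = {
  '/': ['/blog/a'],
  '/blog/a': ['/', '/blog/a'],
  '/blog/my-orphan-post': ['/'],
};
console.log(findOrphanPages(samplePages, sampleLinks));
```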

---

## Step 6 — Redirect Audit

```bash
# Find all redirects in Next.js config (next.config.js, .mjs, or .ts)
grep -A 3 "redirects" next.config.*

# Check for redirect chains (A → B → C should be A → C)
# Test a suspected chain: -L follows every hop, so each status line and Location header is printed
curl -sIL https://www.yourdomain.com/old-url | grep -i "^HTTP\|^location"
```

```js
// next.config.js: keep redirects flat (no chains)
module.exports = {
  async redirects() {
    return [
      {
        source: '/old-url',
        destination: '/new-url', // Must NOT itself redirect
        permanent: true, // 308 for SEO
      },
    ];
  },
};
```
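
Chains inside the config itself can be caught before deployment by checking whether any `destination` is also a `source`. This is a minimal sketch, assuming literal paths only: `findRedirectChains` is a hypothetical helper, and pattern sources such as `/old/:slug` would need real path matching, which is not attempted here.

```js
// Hypothetical helper: find redirects whose destination redirects again.
// Input has the same { source, destination } shape that redirects() returns.
function findRedirectChains(redirects) {
  const destinationBySource = new Map(redirects.map((r) => [r.source, r.destination]));
  const chains = [];

  for (const { source, destination } of redirects) {
    const hops = [source, destination];
    const visited = new Set(hops);
    let next = destinationBySource.get(destination);

    // Follow the chain until it ends or loops back on itself.
    while (next !== undefined) {
      hops.push(next);
      if (visited.has(next)) break; // redirect loop: stop following
      visited.add(next);
      next = destinationBySource.get(next);
    }

    // More than two entries means at least one intermediate hop.
    if (hops.length > 2) chains.push(hops);
  }
  return chains;
}

// Usage: '/a' should point straight at '/c'.
const sampleRedirects = [
  { source: '/a', destination: '/b' },
  { source: '/b', destination: '/c' },
  { source: '/x', destination: '/y' },
];
console.log(findRedirectChains(sampleRedirects));
```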

---

## Step 7 — robots.txt Check

```bash
curl -s https://www.yourdomain.com/robots.txt
```

```
# ✓ Good
User-agent: *
Allow: /
Sitemap: https://www.yourdomain.com/sitemap.xml

# ❌ Bad: disallows crawling of important content
User-agent: *
Disallow: /blog/
Disallow: /tools/
```

```js
// app/robots.js (Next.js App Router)
export default function robots() {
  return {
    rules: { userAgent: '*', allow: '/' },
    sitemap: 'https://www.yourdomain.com/sitemap.xml',
  };
}
```
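
To confirm that a specific URL is not blocked, the fetched robots.txt can be checked programmatically. The sketch below is deliberately simplified and is not a complete robots.txt parser: `isBlockedForAllAgents` is a hypothetical helper that reads only `User-agent: *` groups, uses plain prefix matching (no `*` or `$` wildcards), and does not merge consecutive `User-agent` lines into one group. It applies the longest-match rule, with `Allow` winning ties.

```js
// Hypothetical helper: is `path` disallowed for the wildcard user agent?
function isBlockedForAllAgents(robotsTxt, path) {
  let inWildcardGroup = false;
  let longestAllow = -1;
  let longestDisallow = -1;

  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // drop comments
    const separator = line.indexOf(':');
    if (separator === -1) continue;

    const field = line.slice(0, separator).trim().toLowerCase();
    const value = line.slice(separator + 1).trim();

    if (field === 'user-agent') {
      inWildcardGroup = value === '*';
    } else if (inWildcardGroup && value && path.startsWith(value)) {
      // An empty Disallow value means "allow everything", so it is skipped.
      if (field === 'allow') longestAllow = Math.max(longestAllow, value.length);
      if (field === 'disallow') longestDisallow = Math.max(longestDisallow, value.length);
    }
  }

  // The most specific (longest) rule wins; Allow wins a tie.
  return longestDisallow > longestAllow;
}

// Usage with the good and bad examples from this step:
const goodRobots = 'User-agent: *\nAllow: /\nSitemap: https://www.yourdomain.com/sitemap.xml';
const badRobots = 'User-agent: *\nDisallow: /blog/\nDisallow: /tools/';
console.log(isBlockedForAllAgents(goodRobots, '/blog/my-post'));
console.log(isBlockedForAllAgents(badRobots, '/blog/my-post'));
```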

---

## Indexing Checklist

- [ ] All important pages have absolute canonical URLs
- [ ] No important pages accidentally noindexed
- [ ] Sitemap routes return 200 with valid XML
- [ ] Sitemap submitted to Google Search Console
- [ ] Important pages statically generated (○ or ●) in build output
- [ ] No redirect chains (A→B→C should be A→C)
- [ ] robots.txt allows important content
- [ ] Every important page has ≥1 internal inbound link
- [ ] `generateStaticParams` added for dynamic routes with known slugs

## Limitations

- Does not guarantee Google will index a page; final indexing decisions remain with the search engine.
- Requires access to the codebase, deployed URLs, and ideally Google Search Console data for confident diagnosis.
- Treat recommendations that change URL structure, redirects, or canonical policy as production-impacting and review them before deployment.