---
name: survey-generator
description: "Generate source-backed AI/ML survey paper artifacts with curated bibliographies and Fireworks/Kimi HTML rendering."
allowed-tools: Read, Write, Bash, WebFetch, AskUserQuestion
category: "research"
risk: "safe"
source: "official"
source_repo: "dair-ai/dair-academy-plugins"
source_type: "official"
date_added: "2026-06-19"
author: "DAIR.AI"
license: "MIT"
license_source: "https://github.com/dair-ai/dair-academy-plugins/blob/main/README.md#license"
tags:
  - dair-academy
  - ai
  - workflow
tools:
  - claude-code
  - codex-cli
  - cursor
---

# Survey Generator Skill

## When to Use

Use when this workflow matches the user request: Use this skill for its documented workflow.


_Source: [dair-ai/dair-academy-plugins](https://github.com/dair-ai/dair-academy-plugins) (MIT)._

Generate an academic-style survey paper as a single self-contained HTML file.

## What this skill does

Given a topic and a public anchor resource, this skill:
1. Reads the anchor resource and extracts the landscape of relevant work.
2. Builds a structured `research_bundle.json` (title, taxonomy, sections, bibliography of real papers).
3. Calls Kimi K2.6 via the Fireworks chat completions API with the research bundle and a fixed `style_spec.json`.
4. Writes a single-file HTML artifact with inline SVG figures, an academic layout, numbered sections, and a References list.

The agent using this skill is responsible only for research curation. All prose, figures, and HTML are generated by Kimi K2.6 in one API call.

## Inputs from the user

The user invokes this skill with at minimum:

- `topic`: a concise survey topic, for example "Agentic Engineering" or "Reasoning Models".
- `source_url`: a public anchor resource. Any curated list, canonical blog post, arXiv survey, GitHub awesome-list, or index page works. Suggested starting points: [DAIR.AI AI Papers of the Week](https://github.com/dair-ai/AI-Papers-of-the-Week) (a continuously updated open-source index of notable AI/ML papers, well suited for broad topics), a GitHub awesome-* repo, an arXiv survey PDF, or a well-maintained papers page.

Optional:

- `bibliography_size`: target bibliography size. Default 20 for a quick survey. Use 40 to 50 for a comprehensive survey, 80 to 100 for an exhaustive one. Section length and token budget scale with this.
- `section_count`: number of sections, default 6 to 10.

If the user has not provided these, use AskUserQuestion to collect them before proceeding.

## Requirements

- `FIREWORKS_API_KEY` exported in the environment. The build script reads it from `os.environ`.
- Python 3 with stdlib only (urllib). No external dependencies.

## Workflow for the agent

Follow these steps in order. Do not skip steps.

### Step 1. Read the anchor resource

Fetch and read `source_url`. If it is a GitHub repo, fetch the README and any relevant `README-*.md` or `papers.md` indices. If it is an arXiv survey, use the abstract, figures, and section headings. If it is a blog post, read it in full. Extract the key subtopics and the papers or systems it references by name.

For broad AI/ML topics, [DAIR.AI AI Papers of the Week](https://github.com/dair-ai/AI-Papers-of-the-Week) is a particularly rich anchor: it has weekly issues going back years, each with short summaries of 6 to 10 notable papers, so it is easy to scan across time and filter to the subset that matches your topic.

If a paper-search tool is available to your agent (a Papers-of-the-Week MCP, arXiv search, Semantic Scholar, Google Scholar, an organization's internal index, etc.), use it to expand the candidate pool beyond what the anchor resource cites directly.

### Step 2. Define the taxonomy and sections

Draft a taxonomy rooted at the topic with 4 to 8 branches, each with 2 to 4 children. Branches should cover distinct subareas of the topic, not overlap. Draft 6 to 10 numbered sections that match the taxonomy progression: introduction, foundations, methods, evaluation, open problems. Figure 1's viewport height scales automatically with the total leaf count via the geometry contract in `style_spec.json`, so deeper taxonomies render cleanly.

### Step 3. Curate the bibliography

Pick real papers sized to `bibliography_size`. For a comprehensive survey, 40 to 50 entries is the sweet spot; the skill has been tested up to 100 entries with `max_tokens=81920` in `build_artifact.py`. Every entry must have: `key`, `authors`, `year`, `title`, `venue`, and a 1 to 2 sentence `summary`. Do not invent papers. Every section's `papers` array must reference keys that exist in the bibliography.

### Step 4. Write `research_bundle.json`

Write `research_bundle.json` in the skill directory (next to `build_artifact.py`). Use `templates/research_bundle_template.json` as the structural scaffold. Required top-level fields: `title`, `authors_placeholder`, `anchor_source`, `abstract_hints`, `taxonomy`, `paradigms`, `stack`, `sections`, `table`, `bibliography`. See `examples/agentic-engineering/research_bundle.json` for a complete worked example.

### Step 5. Run the generator

```bash
python3 build_artifact.py
```

Run this from the skill directory. The script reads `research_bundle.json` and `style_spec.json`, calls Kimi K2.6 on Fireworks, and writes `output/survey_kimi-k2p6_v{N}.html`. Each run produces a new versioned file.

To use a different Fireworks model (for example Kimi K2.5 for side-by-side comparison):

```bash
FIREWORKS_MODEL=accounts/fireworks/models/kimi-k2p5 python3 build_artifact.py
```

Output filenames are slugged by model so you can compare versions across models.

### Step 6. Preview and iterate

Open the HTML file locally. It is a fully self-contained HTML document, so you can also serve it from any static host, embed it in a dashboard, or hand it to any artifact-preview mechanism your agent exposes.

If figures look weak, sharpen `style_spec.json` (the `required_figures` and `figure_quality_note` keys) and rerun. If prose is thin or sections are missing, tighten the section `guidance` fields in `research_bundle.json`. Do not edit the Kimi output directly; iterate on inputs.

Common figure failure modes and the style_spec patterns that fix them:
- Nodes from different panels collapsing into one panel: require `<g transform="translate(OFFSET,0)">` groups with panel-local coordinates (enforced for Figure 2).
- Leaf rects overlapping vertically so labels get clipped: enforce rect_pitch greater than rect_height with an explicit formula and a sanity check (enforced for Figure 1).
- Root label overflowing its pill: pin minimum rect width in the spec (enforced for Figure 1, width=200).
- Sibling nodes in a row overlapping horizontally (e.g. Worker A, Worker B, Worker C in an orchestrator-workers panel): enforce a deterministic rect_width and center_x formula for N nodes in a fixed-width panel, with a minimum horizontal gap between adjacent rects (enforced for Figure 2 multi-node rows).
- Panel contents drifting to the left or right edge instead of sitting in the middle of the panel background: pin each group's translate offset to match the panel background's x position (10, 270, 530) and center all content on panel-local x=120 (enforced for Figure 2).
- Figures emitted in the wrong numeric order because the model preferred a different narrative flow: require the captions to use the exact IDs from required_figures in sequence (Figure 1 before Figure 2 before Figure 3), even if it means placing two figures in the same section (enforced via hard_rules_for_generation).
- Right-side labels on the stack diagram getting clipped at the viewport edge: widen the stack SVG viewport to 720 and require role-text tspans to fit within x=710 (enforced for Figure 3).

When adding a new figure or changing an existing one, follow the same pattern: declare an absolute viewport, per-element coordinates or a deterministic formula, and a hard-invariant check clause at the end of the description.

## Files in this skill

- `SKILL.md` - this file.
- `build_artifact.py` - Python script that calls Fireworks.
- `style_spec.json` - visual and structural spec (topic-agnostic).
- `templates/research_bundle_template.json` - empty template for new topics.
- `examples/agentic-engineering/` - reference 100-paper run (research_bundle.json + survey.html).

## Hard rules the agent must follow

1. Never invent bibliography entries. Every cited paper must be a real work with a real venue.
2. Every section's `papers` array must reference keys in the bibliography.
3. Never edit the generated HTML. Iterate on `research_bundle.json` or `style_spec.json` and rerun.
4. Do not modify the hard rules in `style_spec.json.hard_rules_for_generation`.
5. Keep the style_spec topic-agnostic. Topic-specific content lives only in `research_bundle.json`.
6. Do not use em dashes or arrow symbols in the research bundle prose fields.


## Limitations

- Requires the upstream tool, account, API key, or local setup when the workflow names one.
- Does not authorize destructive, production, paid, or external-message actions without explicit user approval.
- Validate generated artifacts or recommendations against the user's real sources before treating them as final.