144 lines
9.0 KiB
Markdown
144 lines
9.0 KiB
Markdown
---
|
|
name: survey-generator
|
|
description: "Generate source-backed AI/ML survey paper artifacts with curated bibliographies and Fireworks/Kimi HTML rendering."
|
|
allowed-tools: Read, Write, Bash, WebFetch, AskUserQuestion
|
|
category: "research"
|
|
risk: "safe"
|
|
source: "official"
|
|
source_repo: "dair-ai/dair-academy-plugins"
|
|
source_type: "official"
|
|
date_added: "2026-06-19"
|
|
author: "DAIR.AI"
|
|
license: "MIT"
|
|
license_source: "https://github.com/dair-ai/dair-academy-plugins/blob/main/README.md#license"
|
|
tags:
|
|
- dair-academy
|
|
- ai
|
|
- workflow
|
|
tools:
|
|
- claude-code
|
|
- codex-cli
|
|
- cursor
|
|
---
|
|
|
|
# Survey Generator Skill
|
|
|
|
## When to Use
|
|
|
|
Use when this workflow matches the user request: Use this skill for its documented workflow.
|
|
|
|
|
|
_Source: [dair-ai/dair-academy-plugins](https://github.com/dair-ai/dair-academy-plugins) (MIT)._
|
|
|
|
Generate an academic-style survey paper as a single self-contained HTML file.
|
|
|
|
## What this skill does
|
|
|
|
Given a topic and a public anchor resource, this skill:
|
|
1. Reads the anchor resource and extracts the landscape of relevant work.
|
|
2. Builds a structured `research_bundle.json` (title, taxonomy, sections, bibliography of real papers).
|
|
3. Calls Kimi K2.6 via the Fireworks chat completions API with the research bundle and a fixed `style_spec.json`.
|
|
4. Writes a single-file HTML artifact with inline SVG figures, an academic layout, numbered sections, and a References list.
|
|
|
|
The agent using this skill is responsible only for research curation. All prose, figures, and HTML are generated by Kimi K2.6 in one API call.
|
|
|
|
## Inputs from the user
|
|
|
|
The user invokes this skill with at minimum:
|
|
|
|
- `topic`: a concise survey topic, for example "Agentic Engineering" or "Reasoning Models".
|
|
- `source_url`: a public anchor resource. Any curated list, canonical blog post, arXiv survey, GitHub awesome-list, or index page works. Suggested starting points: [DAIR.AI AI Papers of the Week](https://github.com/dair-ai/AI-Papers-of-the-Week) (a continuously updated open-source index of notable AI/ML papers, well suited for broad topics), a GitHub awesome-* repo, an arXiv survey PDF, or a well-maintained papers page.
|
|
|
|
Optional:
|
|
|
|
- `bibliography_size`: target bibliography size. Default 20 for a quick survey. Use 40 to 50 for a comprehensive survey, 80 to 100 for an exhaustive one. Section length and token budget scale with this.
|
|
- `section_count`: number of sections, default 6 to 10.
|
|
|
|
If the user has not provided these, use AskUserQuestion to collect them before proceeding.
|
|
|
|
## Requirements
|
|
|
|
- `FIREWORKS_API_KEY` exported in the environment. The build script reads it from `os.environ`.
|
|
- Python 3 with stdlib only (urllib). No external dependencies.
|
|
|
|
## Workflow for the agent
|
|
|
|
Follow these steps in order. Do not skip steps.
|
|
|
|
### Step 1. Read the anchor resource
|
|
|
|
Fetch and read `source_url`. If it is a GitHub repo, fetch the README and any relevant `README-*.md` or `papers.md` indices. If it is an arXiv survey, use the abstract, figures, and section headings. If it is a blog post, read it in full. Extract the key subtopics and the papers or systems it references by name.
|
|
|
|
For broad AI/ML topics, [DAIR.AI AI Papers of the Week](https://github.com/dair-ai/AI-Papers-of-the-Week) is a particularly rich anchor: it has weekly issues going back years, each with short summaries of 6 to 10 notable papers, so it is easy to scan across time and filter to the subset that matches your topic.
|
|
|
|
If a paper-search tool is available to your agent (a Papers-of-the-Week MCP, arXiv search, Semantic Scholar, Google Scholar, an organization's internal index, etc.), use it to expand the candidate pool beyond what the anchor resource cites directly.
|
|
|
|
### Step 2. Define the taxonomy and sections
|
|
|
|
Draft a taxonomy rooted at the topic with 4 to 8 branches, each with 2 to 4 children. Branches should cover distinct subareas of the topic, not overlap. Draft 6 to 10 numbered sections that match the taxonomy progression: introduction, foundations, methods, evaluation, open problems. Figure 1's viewport height scales automatically with the total leaf count via the geometry contract in `style_spec.json`, so deeper taxonomies render cleanly.
|
|
|
|
### Step 3. Curate the bibliography
|
|
|
|
Pick real papers sized to `bibliography_size`. For a comprehensive survey, 40 to 50 entries is the sweet spot; the skill has been tested up to 100 entries with `max_tokens=81920` in `build_artifact.py`. Every entry must have: `key`, `authors`, `year`, `title`, `venue`, and a 1 to 2 sentence `summary`. Do not invent papers. Every section's `papers` array must reference keys that exist in the bibliography.
|
|
|
|
### Step 4. Write `research_bundle.json`
|
|
|
|
Write `research_bundle.json` in the skill directory (next to `build_artifact.py`). Use `templates/research_bundle_template.json` as the structural scaffold. Required top-level fields: `title`, `authors_placeholder`, `anchor_source`, `abstract_hints`, `taxonomy`, `paradigms`, `stack`, `sections`, `table`, `bibliography`. See `examples/agentic-engineering/research_bundle.json` for a complete worked example.
|
|
|
|
### Step 5. Run the generator
|
|
|
|
```bash
|
|
python3 build_artifact.py
|
|
```
|
|
|
|
Run this from the skill directory. The script reads `research_bundle.json` and `style_spec.json`, calls Kimi K2.6 on Fireworks, and writes `output/survey_kimi-k2p6_v{N}.html`. Each run produces a new versioned file.
|
|
|
|
To use a different Fireworks model (for example Kimi K2.5 for side-by-side comparison):
|
|
|
|
```bash
|
|
FIREWORKS_MODEL=accounts/fireworks/models/kimi-k2p5 python3 build_artifact.py
|
|
```
|
|
|
|
Output filenames are slugged by model so you can compare versions across models.
|
|
|
|
### Step 6. Preview and iterate
|
|
|
|
Open the HTML file locally. It is a fully self-contained HTML document, so you can also serve it from any static host, embed it in a dashboard, or hand it to any artifact-preview mechanism your agent exposes.
|
|
|
|
If figures look weak, sharpen `style_spec.json` (the `required_figures` and `figure_quality_note` keys) and rerun. If prose is thin or sections are missing, tighten the section `guidance` fields in `research_bundle.json`. Do not edit the Kimi output directly; iterate on inputs.
|
|
|
|
Common figure failure modes and the style_spec patterns that fix them:
|
|
- Nodes from different panels collapsing into one panel: require `<g transform="translate(OFFSET,0)">` groups with panel-local coordinates (enforced for Figure 2).
|
|
- Leaf rects overlapping vertically so labels get clipped: enforce rect_pitch greater than rect_height with an explicit formula and a sanity check (enforced for Figure 1).
|
|
- Root label overflowing its pill: pin minimum rect width in the spec (enforced for Figure 1, width=200).
|
|
- Sibling nodes in a row overlapping horizontally (e.g. Worker A, Worker B, Worker C in an orchestrator-workers panel): enforce a deterministic rect_width and center_x formula for N nodes in a fixed-width panel, with a minimum horizontal gap between adjacent rects (enforced for Figure 2 multi-node rows).
|
|
- Panel contents drifting to the left or right edge instead of sitting in the middle of the panel background: pin each group's translate offset to match the panel background's x position (10, 270, 530) and center all content on panel-local x=120 (enforced for Figure 2).
|
|
- Figures emitted in the wrong numeric order because the model preferred a different narrative flow: require the captions to use the exact IDs from required_figures in sequence (Figure 1 before Figure 2 before Figure 3), even if it means placing two figures in the same section (enforced via hard_rules_for_generation).
|
|
- Right-side labels on the stack diagram getting clipped at the viewport edge: widen the stack SVG viewport to 720 and require role-text tspans to fit within x=710 (enforced for Figure 3).
|
|
|
|
When adding a new figure or changing an existing one, follow the same pattern: declare an absolute viewport, per-element coordinates or a deterministic formula, and a hard-invariant check clause at the end of the description.
|
|
|
|
## Files in this skill
|
|
|
|
- `SKILL.md` - this file.
|
|
- `build_artifact.py` - Python script that calls Fireworks.
|
|
- `style_spec.json` - visual and structural spec (topic-agnostic).
|
|
- `templates/research_bundle_template.json` - empty template for new topics.
|
|
- `examples/agentic-engineering/` - reference 100-paper run (research_bundle.json + survey.html).
|
|
|
|
## Hard rules the agent must follow
|
|
|
|
1. Never invent bibliography entries. Every cited paper must be a real work with a real venue.
|
|
2. Every section's `papers` array must reference keys in the bibliography.
|
|
3. Never edit the generated HTML. Iterate on `research_bundle.json` or `style_spec.json` and rerun.
|
|
4. Do not modify the hard rules in `style_spec.json.hard_rules_for_generation`.
|
|
5. Keep the style_spec topic-agnostic. Topic-specific content lives only in `research_bundle.json`.
|
|
6. Do not use em dashes or arrow symbols in the research bundle prose fields.
|
|
|
|
|
|
## Limitations
|
|
|
|
- Requires the upstream tool, account, API key, or local setup when the workflow names one.
|
|
- Does not authorize destructive, production, paid, or external-message actions without explicit user approval.
|
|
- Validate generated artifacts or recommendations against the user's real sources before treating them as final.
|