---
name: survey-generator
description: "Generate source-backed AI/ML survey paper artifacts with curated bibliographies and Fireworks/Kimi HTML rendering."
allowed-tools: Read, Write, Bash, WebFetch, AskUserQuestion
category: "research"
risk: "safe"
source: "official"
source_repo: "dair-ai/dair-academy-plugins"
source_type: "official"
date_added: "2026-06-19"
author: "DAIR.AI"
license: "MIT"
license_source: "https://github.com/dair-ai/dair-academy-plugins/blob/main/README.md#license"
tags:
  - dair-academy
  - ai
  - workflow
tools:
  - claude-code
  - codex-cli
  - cursor
---

# Survey Generator Skill

## When to Use

Use this skill when the user asks for an academic-style survey paper on an AI/ML topic, generated from a public anchor resource and a curated bibliography of real papers.

_Source: [dair-ai/dair-academy-plugins](https://github.com/dair-ai/dair-academy-plugins) (MIT)._

Generate an academic-style survey paper as a single self-contained HTML file.

## What this skill does

Given a topic and a public anchor resource, this skill:

1. Reads the anchor resource and extracts the landscape of relevant work.
2. Builds a structured `research_bundle.json` (title, taxonomy, sections, bibliography of real papers).
3. Calls Kimi K2.6 via the Fireworks chat completions API with the research bundle and a fixed `style_spec.json`.
4. Writes a single-file HTML artifact with inline SVG figures, an academic layout, numbered sections, and a References list.

The agent using this skill is responsible only for research curation. All prose, figures, and HTML are generated by Kimi K2.6 in one API call.

## Inputs from the user

The user invokes this skill with at minimum:

- `topic`: a concise survey topic, for example "Agentic Engineering" or "Reasoning Models".
- `source_url`: a public anchor resource. Any curated list, canonical blog post, arXiv survey, GitHub awesome-list, or index page works.
  Suggested starting points: [DAIR.AI AI Papers of the Week](https://github.com/dair-ai/AI-Papers-of-the-Week) (a continuously updated open-source index of notable AI/ML papers, well suited for broad topics), a GitHub awesome-* repo, an arXiv survey PDF, or a well-maintained papers page.

Optional:

- `bibliography_size`: target bibliography size. Default 20 for a quick survey. Use 40 to 50 for a comprehensive survey, 80 to 100 for an exhaustive one. Section length and token budget scale with this.
- `section_count`: number of sections, default 6 to 10.

If the user has not provided these, use AskUserQuestion to collect them before proceeding.

## Requirements

- `FIREWORKS_API_KEY` exported in the environment. The build script reads it from `os.environ`.
- Python 3 with stdlib only (urllib). No external dependencies.

## Workflow for the agent

Follow these steps in order. Do not skip steps.

### Step 1. Read the anchor resource

Fetch and read `source_url`. If it is a GitHub repo, fetch the README and any relevant `README-*.md` or `papers.md` indices. If it is an arXiv survey, use the abstract, figures, and section headings. If it is a blog post, read it in full. Extract the key subtopics and the papers or systems it references by name.

For broad AI/ML topics, [DAIR.AI AI Papers of the Week](https://github.com/dair-ai/AI-Papers-of-the-Week) is a particularly rich anchor: it has weekly issues going back years, each with short summaries of 6 to 10 notable papers, so it is easy to scan across time and filter to the subset that matches your topic.

If a paper-search tool is available to your agent (a Papers-of-the-Week MCP, arXiv search, Semantic Scholar, Google Scholar, an organization's internal index, etc.), use it to expand the candidate pool beyond what the anchor resource cites directly.

### Step 2. Define the taxonomy and sections

Draft a taxonomy rooted at the topic with 4 to 8 branches, each with 2 to 4 children.
Branches should cover distinct subareas of the topic, not overlap.

Draft 6 to 10 numbered sections that match the taxonomy progression: introduction, foundations, methods, evaluation, open problems.

Figure 1's viewport height scales automatically with the total leaf count via the geometry contract in `style_spec.json`, so deeper taxonomies render cleanly.

### Step 3. Curate the bibliography

Pick real papers sized to `bibliography_size`. For a comprehensive survey, 40 to 50 entries is the sweet spot; the skill has been tested up to 100 entries with `max_tokens=81920` in `build_artifact.py`.

Every entry must have: `key`, `authors`, `year`, `title`, `venue`, and a 1 to 2 sentence `summary`. Do not invent papers. Every section's `papers` array must reference keys that exist in the bibliography.

### Step 4. Write `research_bundle.json`

Write `research_bundle.json` in the skill directory (next to `build_artifact.py`). Use `templates/research_bundle_template.json` as the structural scaffold.

Required top-level fields: `title`, `authors_placeholder`, `anchor_source`, `abstract_hints`, `taxonomy`, `paradigms`, `stack`, `sections`, `table`, `bibliography`.

See `examples/agentic-engineering/research_bundle.json` for a complete worked example.

### Step 5. Run the generator

```bash
python3 build_artifact.py
```

Run this from the skill directory. The script reads `research_bundle.json` and `style_spec.json`, calls Kimi K2.6 on Fireworks, and writes `output/survey_kimi-k2p6_v{N}.html`. Each run produces a new versioned file.

To use a different Fireworks model (for example Kimi K2.5 for side-by-side comparison):

```bash
FIREWORKS_MODEL=accounts/fireworks/models/kimi-k2p5 python3 build_artifact.py
```

Output filenames are slugged by model so you can compare versions across models.

### Step 6. Preview and iterate

Open the HTML file locally.
It is a fully self-contained HTML document, so you can also serve it from any static host, embed it in a dashboard, or hand it to any artifact-preview mechanism your agent exposes.

If figures look weak, sharpen `style_spec.json` (the `required_figures` and `figure_quality_note` keys) and rerun. If prose is thin or sections are missing, tighten the section `guidance` fields in `research_bundle.json`. Do not edit the Kimi output directly; iterate on inputs.

Common figure failure modes and the style_spec patterns that fix them:

- Nodes from different panels collapsing into one panel: require `<g>` groups with panel-local coordinates (enforced for Figure 2).
- Leaf rects overlapping vertically so labels get clipped: enforce rect_pitch greater than rect_height with an explicit formula and a sanity check (enforced for Figure 1).
- Root label overflowing its pill: pin minimum rect width in the spec (enforced for Figure 1, width=200).
- Sibling nodes in a row overlapping horizontally (e.g. Worker A, Worker B, Worker C in an orchestrator-workers panel): enforce a deterministic rect_width and center_x formula for N nodes in a fixed-width panel, with a minimum horizontal gap between adjacent rects (enforced for Figure 2 multi-node rows).
- Panel contents drifting to the left or right edge instead of sitting in the middle of the panel background: pin each group's translate offset to match the panel background's x position (10, 270, 530) and center all content on panel-local x=120 (enforced for Figure 2).
- Figures emitted in the wrong numeric order because the model preferred a different narrative flow: require the captions to use the exact IDs from required_figures in sequence (Figure 1 before Figure 2 before Figure 3), even if it means placing two figures in the same section (enforced via hard_rules_for_generation).
- Right-side labels on the stack diagram getting clipped at the viewport edge: widen the stack SVG viewport to 720 and require role-text tspans to fit within x=710 (enforced for Figure 3).

When adding a new figure or changing an existing one, follow the same pattern: declare an absolute viewport, per-element coordinates or a deterministic formula, and a hard-invariant check clause at the end of the description.

## Files in this skill

- `SKILL.md` - this file.
- `build_artifact.py` - Python script that calls Fireworks.
- `style_spec.json` - visual and structural spec (topic-agnostic).
- `templates/research_bundle_template.json` - empty template for new topics.
- `examples/agentic-engineering/` - reference 100-paper run (research_bundle.json + survey.html).

## Hard rules the agent must follow

1. Never invent bibliography entries. Every cited paper must be a real work with a real venue.
2. Every section's `papers` array must reference keys in the bibliography.
3. Never edit the generated HTML. Iterate on `research_bundle.json` or `style_spec.json` and rerun.
4. Do not modify the hard rules in `style_spec.json.hard_rules_for_generation`.
5. Keep the style_spec topic-agnostic. Topic-specific content lives only in `research_bundle.json`.
6. Do not use em dashes or arrow symbols in the research bundle prose fields.

## Limitations

- Requires the upstream tool, account, API key, or local setup when the workflow names one.
- Does not authorize destructive, production, paid, or external-message actions without explicit user approval.
- Validate generated artifacts or recommendations against the user's real sources before treating them as final.
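
## Example: pre-flight check for `research_bundle.json`

Several of the requirements above are mechanical: the required top-level fields from Step 4, the per-entry bibliography fields from Step 3, and hard rules 2 and 6. The sketch below shows one way to check them before spending an API call. It is not part of `build_artifact.py`; the function names `validate_bundle` and `iter_strings` are hypothetical. It assumes `bibliography` and `sections` are JSON arrays of objects, treats every string in the bundle as a prose field, and interprets "em dashes or arrow symbols" as a small set of Unicode code points. Adjust these assumptions to the real template.

```python
# Field names come from Steps 3 and 4 of this document.
REQUIRED_TOP_LEVEL = [
    "title", "authors_placeholder", "anchor_source", "abstract_hints",
    "taxonomy", "paradigms", "stack", "sections", "table", "bibliography",
]
REQUIRED_ENTRY_FIELDS = ["key", "authors", "year", "title", "venue", "summary"]

# Assumption: hard rule 6 refers to these code points; extend as needed.
BANNED_CHARS = {
    "\u2014": "em dash",
    "\u2192": "right arrow",
    "\u2190": "left arrow",
    "\u2194": "left-right arrow",
    "\u21d2": "double right arrow",
}


def iter_strings(value, path="$"):
    """Yield (path, string) for every string nested anywhere in value."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for name, child in value.items():
            yield from iter_strings(child, f"{path}.{name}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_strings(child, f"{path}[{index}]")


def validate_bundle(bundle):
    """Return a list of problem descriptions; an empty list means none found."""
    problems = []

    for field in REQUIRED_TOP_LEVEL:
        if field not in bundle:
            problems.append(f"missing top-level field: {field}")

    # Assumption: bibliography is a list of objects, each carrying a "key".
    known_keys = set()
    for index, entry in enumerate(bundle.get("bibliography", [])):
        for field in REQUIRED_ENTRY_FIELDS:
            if field not in entry:
                problems.append(f"bibliography[{index}] missing field: {field}")
        if "key" in entry:
            known_keys.add(entry["key"])

    # Hard rule 2: every key in a section's "papers" array must exist.
    for index, section in enumerate(bundle.get("sections", [])):
        for key in section.get("papers", []):
            if key not in known_keys:
                problems.append(f"sections[{index}] cites unknown key: {key}")

    # Hard rule 6, applied to every string (assumption: all are prose fields).
    for path, text in iter_strings(bundle):
        for char, label in BANNED_CHARS.items():
            if char in text:
                problems.append(f"{label} found at {path}")

    return problems
```

To use it, load the bundle with `json.load` and pass the resulting dict to `validate_bundle`; run the generator only when the returned list is empty. This check cannot tell whether a paper is real, so hard rule 1 still needs manual verification against the sources.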