362 lines
12 KiB
Markdown
362 lines
12 KiB
Markdown
---
|
|
name: multi-agent-architect
|
|
description: "Design and optimize production-grade multi-agent systems with LangGraph, LangChain, and DeepAgents for complex AI workflows."
|
|
risk: safe
|
|
source: community
|
|
metadata:
|
|
category: ai-engineering
|
|
source_repo: pravin-python/antigravity-awesome-skills
|
|
source_type: community
|
|
date_added: "2025-05-07"
|
|
author: community
|
|
tags: [langgraph, langchain, multi-agent, orchestration, deepagents, rag, tool-calling]
|
|
tools: [claude, cursor, gemini]
|
|
license: "MIT"
|
|
license_source: "https://github.com/pravin-python/antigravity-awesome-skills/blob/main/LICENSE"
|
|
---
|
|
|
|
|
|
# Multi-Agent Architect & Updater Skill
|
|
|
|
## Overview
|
|
|
|
This skill turns Claude into a Senior AI Multi-Agent Architect specialized in LangGraph, LangChain, and DeepAgents. It provides structured workflows for creating and updating production-grade multi-agent systems — including supervisor agents, planners, researchers, coders, and memory-backed autonomous pipelines. Use it whenever you need to design, build, debug, or scale any multi-agent AI system.
|
|
|
|
If this skill adapts material from an external GitHub repository, declare both:
|
|
|
|
- `source_repo: owner/repo`
|
|
- `source_type: official` or `source_type: community`
|
|
|
|
## When to Use This Skill
|
|
|
|
- Use when you need to create a new agent or multi-agent workflow from scratch
|
|
- Use when working with LangGraph state graphs, nodes, edges, or conditional routing
|
|
- Use when the user asks about agent communication, memory systems, or tool-calling pipelines
|
|
- Use when debugging or optimizing an existing LangChain/LangGraph agent system
|
|
- Use when architecting supervisor, planner, research, coding, or validation agent roles
|
|
- Use when integrating DeepAgents with hierarchical planning and delegation
|
|
|
|
## How It Works
|
|
|
|
### Step 1: Understand the Goal
|
|
|
|
Before writing any code, clarify:
|
|
- What is the **business objective** this agent system must achieve?
|
|
- What **agent roles** are needed (supervisor, planner, researcher, coder, validator)?
|
|
- What **tools** does each agent require?
|
|
- What **memory** strategy is needed (Redis, Vector DB, LangChain Memory)?
|
|
- What **communication protocol** connects agents (shared state, message passing)?
|
|
|
|
### Step 2: Define the State Schema
|
|
|
|
All agents share a typed state object passed through the graph:
|
|
|
|
```python
|
|
from typing import TypedDict
|
|
|
|
class AgentState(TypedDict):
|
|
user_goal: str
|
|
tasks: list[str]
|
|
completed_tasks: list[str]
|
|
next_agent: str
|
|
context: dict
|
|
step_count: int # guards against infinite loops
|
|
error: str | None
|
|
```
|
|
|
|
### Step 3: Define Agent Nodes
|
|
|
|
Each agent is an **async function** that reads from state and returns an updated state:
|
|
|
|
```python
|
|
import logging
|
|
from langchain_openai import ChatOpenAI
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
async def research_node(state: AgentState) -> AgentState:
|
|
logger.info("research_node: starting")
|
|
llm = ChatOpenAI(model="gpt-4o")
|
|
result = await llm.bind_tools(research_tools).ainvoke(state["user_goal"])
|
|
state["context"]["research"] = result.content
|
|
state["next_agent"] = "coder"
|
|
return state
|
|
```
|
|
|
|
### Step 4: Build the LangGraph
|
|
|
|
Wire nodes together with edges and conditional routing:
|
|
|
|
```python
|
|
from langgraph.graph import StateGraph, END
|
|
from langgraph.prebuilt import ToolNode
|
|
|
|
def build_graph() -> StateGraph:
|
|
graph = StateGraph(AgentState)
|
|
|
|
graph.add_node("supervisor", supervisor_node)
|
|
graph.add_node("research", research_node)
|
|
graph.add_node("coder", coding_node)
|
|
graph.add_node("validator", validation_node)
|
|
graph.add_node("tools", ToolNode(all_tools))
|
|
|
|
graph.set_entry_point("supervisor")
|
|
|
|
graph.add_conditional_edges(
|
|
"supervisor",
|
|
route_next,
|
|
{"research": "research", "coder": "coder", "end": END}
|
|
)
|
|
|
|
graph.add_edge("research", "supervisor")
|
|
graph.add_edge("coder", "validator")
|
|
graph.add_edge("validator", "supervisor")
|
|
|
|
return graph.compile()
|
|
|
|
def route_next(state: AgentState) -> str:
|
|
if state["step_count"] > 20:
|
|
return "end"
|
|
return state["next_agent"]
|
|
```
|
|
|
|
### Step 5: Add Memory
|
|
|
|
```python
|
|
from langchain_community.chat_message_histories import RedisChatMessageHistory
|
|
|
|
def get_memory(session_id: str):
|
|
return RedisChatMessageHistory(
|
|
session_id=session_id,
|
|
url=os.getenv("REDIS_URL"),
|
|
ttl=3600
|
|
)
|
|
```
|
|
|
|
### Step 6: Run the Graph
|
|
|
|
```python
|
|
async def run(user_goal: str, session_id: str):
|
|
graph = build_graph()
|
|
initial_state = AgentState(
|
|
user_goal=user_goal,
|
|
tasks=[],
|
|
completed_tasks=[],
|
|
next_agent="supervisor",
|
|
context={},
|
|
step_count=0,
|
|
error=None,
|
|
)
|
|
return await graph.ainvoke(initial_state)
|
|
```
|
|
|
|
### Step 7: Expose via FastAPI (optional)
|
|
|
|
```python
|
|
from fastapi import FastAPI
|
|
from pydantic import BaseModel
|
|
|
|
app = FastAPI()
|
|
|
|
class RunRequest(BaseModel):
|
|
goal: str
|
|
session_id: str
|
|
|
|
@app.post("/run")
|
|
async def run_agent(req: RunRequest):
|
|
result = await run(req.goal, req.session_id)
|
|
return {"result": result}
|
|
```
|
|
|
|
---
|
|
|
|
## Updating an Existing Agent
|
|
|
|
When the user wants to update or debug an existing agent, structure the response as:
|
|
|
|
```
|
|
## Existing Issue
|
|
[Describe the current problem]
|
|
|
|
## Root Cause
|
|
[Identify why it's happening in the architecture]
|
|
|
|
## Proposed Update
|
|
[Outline the changes at architecture level]
|
|
|
|
## Updated Code
|
|
[Generate only the changed modules]
|
|
|
|
## Migration Notes
|
|
[What breaks, what's backward-compatible]
|
|
|
|
## Performance Impact
|
|
[Latency / token / memory delta]
|
|
```
|
|
|
|
---
|
|
|
|
## Standard Folder Structure
|
|
|
|
Always generate code in this layout:
|
|
|
|
```
|
|
multi_agent_system/
|
|
├── agents/ # One file per agent role
|
|
├── tools/ # Tool definitions and wrappers
|
|
├── memory/ # Redis, VectorDB, LangChain memory helpers
|
|
├── prompts/ # Prompt templates (one per agent)
|
|
├── workflows/ # High-level orchestration logic
|
|
├── graphs/ # LangGraph state + compiled graph definitions
|
|
├── api/ # FastAPI routes (optional)
|
|
├── configs/ # Config loader — no secrets in code
|
|
├── tests/ # Unit + integration tests per agent
|
|
└── main.py
|
|
```
|
|
|
|
---
|
|
|
|
## Examples
|
|
|
|
### Example 1: Research + Coding Multi-Agent Workflow
|
|
|
|
```python
|
|
# agents/research_agent.py
|
|
async def research_node(state: AgentState) -> AgentState:
|
|
llm = ChatOpenAI(model="gpt-4o").bind_tools([web_search, rag_search])
|
|
response = await llm.ainvoke(
|
|
f"Research the following and return structured findings:\n{state['user_goal']}"
|
|
)
|
|
state["context"]["research"] = response.content
|
|
state["next_agent"] = "coder"
|
|
return state
|
|
|
|
# agents/coding_agent.py
|
|
async def coding_node(state: AgentState) -> AgentState:
|
|
llm = ChatOpenAI(model="gpt-4o").bind_tools([python_repl, github_tool])
|
|
response = await llm.ainvoke(
|
|
f"Given this research:\n{state['context']['research']}\n\nWrite production Python code."
|
|
)
|
|
state["context"]["code"] = response.content
|
|
state["next_agent"] = "validator"
|
|
return state
|
|
```
|
|
|
|
### Example 2: Supervisor with Dynamic Delegation
|
|
|
|
```python
|
|
# agents/supervisor_agent.py
|
|
DELEGATION_PROMPT = """
|
|
You are a supervisor. Given the current state, decide the next agent.
|
|
Available agents: research, coder, validator, end.
|
|
Respond with ONLY the agent name.
|
|
|
|
Goal: {goal}
|
|
Completed: {completed}
|
|
Context keys available: {context}
|
|
"""
|
|
|
|
async def supervisor_node(state: AgentState) -> AgentState:
|
|
state["step_count"] += 1
|
|
llm = ChatOpenAI(model="gpt-4o")
|
|
decision = await llm.ainvoke(
|
|
DELEGATION_PROMPT.format(
|
|
goal=state["user_goal"],
|
|
completed=state["completed_tasks"],
|
|
context=list(state["context"].keys()),
|
|
)
|
|
)
|
|
next_agent = decision.content.strip().lower()
|
|
# Validate against allowlist before setting
|
|
allowed = {"research", "coder", "validator", "end"}
|
|
state["next_agent"] = next_agent if next_agent in allowed else "end"
|
|
return state
|
|
```
|
|
|
|
### Example 3: DeepAgents Reflection Loop
|
|
|
|
```python
|
|
async def reflection_node(state: AgentState) -> AgentState:
|
|
llm = ChatOpenAI(model="gpt-4o")
|
|
critique = await llm.ainvoke(
|
|
f"Evaluate this output critically:\n{state['context'].get('code', '')}\n"
|
|
"List any bugs, gaps, or improvements. Be concise."
|
|
)
|
|
state["context"]["critique"] = critique.content
|
|
state["next_agent"] = "coder" if "bug" in critique.content.lower() else "end"
|
|
return state
|
|
```
|
|
|
|
---
|
|
|
|
## Best Practices
|
|
|
|
- ✅ One agent = one responsibility — never combine planning + coding + testing in one node
|
|
- ✅ Use `TypedDict` for all state schemas — enables type checking and graph validation
|
|
- ✅ Bind only the tools each agent needs — reduces hallucinated tool calls
|
|
- ✅ Always add a `step_count` guard to prevent infinite routing loops
|
|
- ✅ Use `async`/`await` throughout — LangGraph supports async natively
|
|
- ✅ Store all secrets in environment variables loaded via `os.getenv()`
|
|
- ✅ Set TTLs on all Redis keys scoped to `session_id`
|
|
- ✅ Log at every node entry and tool call for observability
|
|
- ✅ Validate supervisor routing output against an allowlist of agent names
|
|
- ❌ Don't hardcode API keys, model names, or Redis URLs
|
|
- ❌ Don't share tool lists across agents that don't need them
|
|
- ❌ Don't skip error handling — tool failures and empty LLM responses are common
|
|
- ❌ Don't trust unvalidated LLM routing decisions — always check against an allowlist
|
|
|
|
---
|
|
|
|
## Limitations
|
|
|
|
- This skill does not replace environment-specific testing, load testing, or security review before production deployment.
|
|
- Generated LangGraph code targets the current stable API — always verify method signatures against your installed version (`pip show langgraph`).
|
|
- Stop and ask for clarification if the agent's goal, tool permissions, or routing logic is ambiguous before generating a full architecture.
|
|
- DeepAgents integration patterns assume the library is installed and configured in the target environment.
|
|
|
|
---
|
|
|
|
## Security & Safety Notes
|
|
|
|
- Never expose API keys in generated code. All secrets must use environment variables:
|
|
```python
|
|
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") # ✅ correct
|
|
OPENAI_API_KEY = "sk-..." # ❌ never do this
|
|
```
|
|
- Always validate and sanitize user inputs before injecting them into agent prompts — treat all user input as untrusted.
|
|
- Add a permission layer before allowing agents to execute shell commands or write to filesystems.
|
|
- If generating a Python REPL tool node, document that it must only run in a sandboxed, isolated environment.
|
|
<!-- security-allowlist: python_repl tool examples are for sandboxed execution environments only -->
|
|
- For production deployments, add rate-limit handling and exponential backoff on all LLM and external API calls.
|
|
- Scope all Redis session keys to `session_id` and set a TTL to prevent memory leaks across sessions.
|
|
|
|
---
|
|
|
|
## Common Pitfalls
|
|
|
|
- **Problem:** Agent loops indefinitely between supervisor and sub-agents
|
|
**Solution:** Add `step_count: int` to state; return `"end"` in `route_next()` when `step_count > N`
|
|
|
|
- **Problem:** Supervisor routes to a non-existent agent name
|
|
**Solution:** Validate the LLM's routing output against a hardcoded allowlist before setting `next_agent`
|
|
|
|
- **Problem:** Memory leaks across user sessions
|
|
**Solution:** Scope Redis keys to `session_id` and always set a TTL (`ttl=3600`)
|
|
|
|
- **Problem:** Tool results are ignored by the next agent
|
|
**Solution:** Always write tool output into `state["context"]` and confirm the next node reads it
|
|
|
|
- **Problem:** Agents share too many tools and hallucinate wrong tool calls
|
|
**Solution:** Use `.bind_tools([only_relevant_tools])` per agent instead of a global tool list
|
|
|
|
- **Problem:** Graph fails silently on API rate limits
|
|
**Solution:** Wrap LLM calls in retry logic with exponential backoff using `tenacity`
|
|
|
|
---
|
|
|
|
## Related Skills
|
|
|
|
- `@langchain-rag` - When you need retrieval-augmented generation pipelines specifically
|
|
- `@fastapi-backend` - When deploying agent systems as production REST APIs
|
|
- `@python-async` - When deepening async/await patterns used throughout agent nodes
|