--- name: llm-council description: "Run Fireworks-hosted open-weight model councils that compare responses and synthesize a final answer." allowed-tools: Read, Write, Bash, AskUserQuestion category: "ai-agents" risk: "safe" source: "official" source_repo: "dair-ai/dair-academy-plugins" source_type: "official" date_added: "2026-06-19" author: "DAIR.AI" license: "MIT" license_source: "https://github.com/dair-ai/dair-academy-plugins/blob/main/README.md#license" tags: - dair-academy - ai - workflow tools: - claude-code - codex-cli - cursor --- # LLM Council (Fireworks AI) ## When to Use Use when this workflow matches the user request: Use this skill for its documented workflow. _Source: [dair-ai/dair-academy-plugins](https://github.com/dair-ai/dair-academy-plugins) (MIT)._ This skill implements Karpathy's LLM Council concept where multiple open-weight LLMs deliberate on a query, powered entirely by Fireworks AI: 1. **Phase 1**: All models respond to the query independently (parallel) 2. **Phase 2**: Models rank each other's anonymized responses 3. **Phase 3**: A Chairman LLM synthesizes the final answer All inference runs through **Fireworks AI** using open-weight models. The speed and pricing of Fireworks makes it practical to run multi-model deliberation that would be slow or expensive on other providers. ## CRITICAL RULES 1. **ALWAYS use AskUserQuestion** to let the user select council models (multiselect) and the Chairman model 2. **ALWAYS save raw responses to files** - never summarize or truncate API outputs 3. **ALWAYS show full transparency** - display all individual responses, all rankings, AND the final synthesis 4. **NEVER skip the ranking phase** - it is essential to the council deliberation process 5. **Read from files for display** - ensures content is shown unmodified 6. **ALWAYS display the final output to the user** after Phase 3 completes ## Pre-flight Check Before running any phase, verify the Fireworks API key is set: ```bash if [ -z "$FIREWORKS_API_KEY" ]; then echo "ERROR: FIREWORKS_API_KEY is not set." echo "Create a Fireworks AI account at: https://fireworks.ai/" echo "Then export it in your shell profile (~/.zshrc or ~/.bashrc):" echo ' export FIREWORKS_API_KEY="your_api_key_here"' exit 1 fi echo "FIREWORKS_API_KEY is set." ``` ## Available Models Present these options to the user via AskUserQuestion (multiselect): | Model | Fireworks ID | Provider | |-------|-------------|----------| | GLM 5 | accounts/fireworks/models/glm-5 | Z.ai | | DeepSeek V3.1 | accounts/fireworks/models/deepseek-v3p1 | DeepSeek | | DeepSeek V3.2 | accounts/fireworks/models/deepseek-v3p2 | DeepSeek | | MiniMax M2.1 | accounts/fireworks/models/minimax-m2p1 | MiniMax | | Kimi K2.5 | accounts/fireworks/models/kimi-k2p5 | Moonshot | | Qwen3 235B | accounts/fireworks/models/qwen3-235b-a22b | Alibaba | | Llama 4 Maverick | accounts/fireworks/models/llama4-maverick-instruct-basic | Meta | ## Workflow ### Step 1: Gather User Input Use AskUserQuestion to get: 1. The query/question for the council (or accept it from the conversation) 2. Which models to include (multiselect, recommend 3-5 models) 3. Which model should be the Chairman (single select) Note: AskUserQuestion supports max 4 options per question. Since there are 7 models, split model selection across two questions, or show the most popular 4 and let the user type "Other" for the rest. A good default is to show 4 models in the first question and note the others are available via "Other". Rotate which models are shown based on variety. Example AskUserQuestion for model selection (show 4, mention others): ``` question: "Which models should participate in the LLM Council? (Also available via Other: Llama 4 Maverick, Qwen3 235B, GLM 5)" header: "Models" multiSelect: true options: - label: "DeepSeek V3.2" description: "DeepSeek's newest and most capable model" - label: "MiniMax M2.1" description: "MiniMax's strong open-weight model" - label: "Kimi K2.5" description: "Moonshot's strong open-weight model" - label: "DeepSeek V3.1" description: "DeepSeek's proven reasoning model" ``` Example AskUserQuestion for chairman: ``` question: "Which model should be the Chairman (synthesizes the final answer)?" header: "Chairman" multiSelect: false options: - label: "DeepSeek V3.2 (Recommended)" description: "Newest DeepSeek, strong at comprehensive analysis" - label: "GLM 5" description: "Strong reasoning for synthesis" - label: "Kimi K2.5" description: "Strong at structured synthesis" - label: "MiniMax M2.1" description: "Strong open-weight model for synthesis" ``` ### Model Name to ID Mapping Use this mapping to convert user selections to Fireworks model IDs: ```python MODEL_MAP = { "GLM 5": "accounts/fireworks/models/glm-5", "DeepSeek V3.1": "accounts/fireworks/models/deepseek-v3p1", "DeepSeek V3.2": "accounts/fireworks/models/deepseek-v3p2", "MiniMax M2.1": "accounts/fireworks/models/minimax-m2p1", "Kimi K2.5": "accounts/fireworks/models/kimi-k2p5", "Qwen3 235B": "accounts/fireworks/models/qwen3-235b-a22b", "Llama 4 Maverick": "accounts/fireworks/models/llama4-maverick-instruct-basic", } ``` ### Step 2: Run Phase 1 - Individual Responses After gathering input, run this script to get responses from all selected models in parallel: ```bash QUERY="USER_QUERY_HERE" MODELS='["accounts/fireworks/models/glm-5", "accounts/fireworks/models/deepseek-v3p1"]' python3 << 'PYEOF' import os import json import requests import time from concurrent.futures import ThreadPoolExecutor, as_completed FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY") API_URL = "https://api.fireworks.ai/inference/v1/chat/completions" QUERY = os.environ.get("QUERY", "") MODELS = json.loads(os.environ.get("MODELS", "[]")) # Create session directory timestamp = time.strftime("%Y%m%d-%H%M%S") SESSION_DIR = f"/tmp/llm-council/{timestamp}" os.makedirs(SESSION_DIR, exist_ok=True) # Save config config = {"query": QUERY, "models": MODELS, "timestamp": timestamp} with open(f"{SESSION_DIR}/config.json", "w") as f: json.dump(config, f, indent=2) def call_model(model_id, query): """Call a single model via Fireworks AI""" try: start = time.time() response = requests.post( API_URL, headers={ "Authorization": f"Bearer {FIREWORKS_API_KEY}", "Content-Type": "application/json" }, json={ "model": model_id, "messages": [ {"role": "system", "content": "You are participating in an LLM council deliberation. Provide your best, most thoughtful response to the query. Be comprehensive but focused."}, {"role": "user", "content": query} ], "max_tokens": 4000, "temperature": 1 }, timeout=120 ) response.raise_for_status() elapsed = time.time() - start data = response.json() usage = data.get("usage", {}) return { "success": True, "content": data["choices"][0]["message"]["content"], "model": model_id, "latency_seconds": round(elapsed, 2), "tokens": { "prompt": usage.get("prompt_tokens", 0), "completion": usage.get("completion_tokens", 0), "total": usage.get("total_tokens", 0) } } except Exception as e: return { "success": False, "content": f"[ERROR: {str(e)}]", "model": model_id, "latency_seconds": 0, "tokens": {"prompt": 0, "completion": 0, "total": 0} } print(f"\n{'='*60}") print("PHASE 1: Collecting Individual Responses") print(f"{'='*60}") print(f"Query: {QUERY[:200]}...") print(f"Models: {', '.join([m.split('/')[-1] for m in MODELS])}") print(f"Session: {SESSION_DIR}") print() # Parallel execution results = {} with ThreadPoolExecutor(max_workers=len(MODELS)) as executor: futures = {executor.submit(call_model, m, QUERY): m for m in MODELS} for future in as_completed(futures): model = futures[future] result = future.result() results[model] = result status = "OK" if result["success"] else "FAILED" latency = f"{result['latency_seconds']}s" if result["success"] else "N/A" print(f" [{status}] {model.split('/')[-1]} ({latency})") # Save raw results with open(f"{SESSION_DIR}/phase1_responses.json", "w") as f: json.dump(results, f, indent=2) print(f"\nPhase 1 complete. Results saved to: {SESSION_DIR}/phase1_responses.json") print(f"SESSION_DIR={SESSION_DIR}") PYEOF ``` ### Step 3: Run Phase 2 - Cross-Model Ranking Each model reviews and ranks the anonymized responses from Phase 1: ```bash SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE" python3 << 'PYEOF' import os import json import requests import time from concurrent.futures import ThreadPoolExecutor, as_completed FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY") API_URL = "https://api.fireworks.ai/inference/v1/chat/completions" SESSION_DIR = os.environ.get("SESSION_DIR") # Load Phase 1 results with open(f"{SESSION_DIR}/config.json") as f: config = json.load(f) with open(f"{SESSION_DIR}/phase1_responses.json") as f: phase1_results = json.load(f) QUERY = config["query"] MODELS = config["models"] # Create anonymized mapping labels = ["A", "B", "C", "D", "E", "F", "G"][:len(MODELS)] model_to_label = dict(zip(MODELS, labels)) label_to_model = {v: k for k, v in model_to_label.items()} # Format anonymized responses anonymized_responses = [] for model_id in MODELS: label = model_to_label[model_id] content = phase1_results[model_id]["content"] anonymized_responses.append(f"=== Response {label} ===\n{content}") anonymized_text = "\n\n".join(anonymized_responses) def get_rankings(model_id, query, anonymized, own_label): """Get rankings from a single model""" ranking_prompt = f"""You are evaluating responses from multiple AI models to this query: QUERY: {query} Here are the anonymized responses: {anonymized} Please rank these responses from BEST to WORST. For each ranking: 1. State the response letter (A, B, C, etc.) 2. Give a brief reason (1-2 sentences) 3. You may skip ranking your own response (labeled {own_label}) or rank it fairly Format your response EXACTLY as: RANKINGS: 1. [Letter] - [Brief reason] 2. [Letter] - [Brief reason] 3. [Letter] - [Brief reason] ...""" try: start = time.time() response = requests.post( API_URL, headers={ "Authorization": f"Bearer {FIREWORKS_API_KEY}", "Content-Type": "application/json" }, json={ "model": model_id, "messages": [ {"role": "system", "content": f"You are ranking AI responses objectively. Your own response is labeled '{own_label}'."}, {"role": "user", "content": ranking_prompt} ], "max_tokens": 1000, "temperature": 1 }, timeout=90 ) response.raise_for_status() elapsed = time.time() - start return { "success": True, "content": response.json()["choices"][0]["message"]["content"], "model": model_id, "latency_seconds": round(elapsed, 2) } except Exception as e: return { "success": False, "content": f"[ERROR: {str(e)}]", "model": model_id, "latency_seconds": 0 } print(f"\n{'='*60}") print("PHASE 2: Cross-Model Ranking") print(f"{'='*60}") print(f"Label mapping: {json.dumps({v: k.split('/')[-1] for k, v in model_to_label.items()})}") print() # Collect rankings from all models in parallel rankings = {} with ThreadPoolExecutor(max_workers=len(MODELS)) as executor: futures = { executor.submit(get_rankings, mid, QUERY, anonymized_text, model_to_label[mid]): mid for mid in MODELS } for future in as_completed(futures): model = futures[future] result = future.result() rankings[model] = result status = "OK" if result["success"] else "FAILED" latency = f"{result['latency_seconds']}s" if result["success"] else "N/A" print(f" [{status}] {model.split('/')[-1]} ({latency})") # Save rankings output = { "label_mapping": label_to_model, "model_to_label": model_to_label, "rankings": rankings } with open(f"{SESSION_DIR}/phase2_rankings.json", "w") as f: json.dump(output, f, indent=2) print(f"\nPhase 2 complete. Rankings saved to: {SESSION_DIR}/phase2_rankings.json") PYEOF ``` ### Step 4: Run Phase 3 - Chairman Synthesis The Chairman model receives all responses and rankings, then produces the final synthesis: ```bash SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE" CHAIRMAN_MODEL="accounts/fireworks/models/glm-5" python3 << 'PYEOF' import os import json import requests import time FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY") API_URL = "https://api.fireworks.ai/inference/v1/chat/completions" SESSION_DIR = os.environ.get("SESSION_DIR") CHAIRMAN_MODEL = os.environ.get("CHAIRMAN_MODEL") # Load all previous results with open(f"{SESSION_DIR}/config.json") as f: config = json.load(f) with open(f"{SESSION_DIR}/phase1_responses.json") as f: phase1 = json.load(f) with open(f"{SESSION_DIR}/phase2_rankings.json") as f: phase2 = json.load(f) QUERY = config["query"] label_to_model = phase2["label_mapping"] model_to_label = phase2["model_to_label"] # Format responses with model names revealed responses_text = [] for model_id, result in phase1.items(): label = model_to_label.get(model_id, "?") model_name = model_id.split("/")[-1] responses_text.append(f"=== {label}: {model_name} ===\n{result['content']}") # Format rankings rankings_text = [] for model_id, result in phase2["rankings"].items(): model_name = model_id.split("/")[-1] rankings_text.append(f"[{model_name}'s Rankings]\n{result['content']}") synthesis_prompt = f"""You are the Chairman of an LLM Council. Your task is to synthesize the best possible answer from multiple AI responses. ORIGINAL QUERY: {QUERY} INDIVIDUAL RESPONSES: {chr(10).join(responses_text)} MODEL RANKINGS: {chr(10).join(rankings_text)} As Chairman, produce a FINAL SYNTHESIS that: 1. Incorporates the strongest elements from the best-ranked responses 2. Resolves any contradictions between responses 3. Addresses aspects that multiple models agreed on 4. Corrects any errors identified through cross-ranking 5. Provides the most complete, accurate, and helpful answer Begin your synthesis:""" print(f"\n{'='*60}") print("PHASE 3: Chairman Synthesis") print(f"{'='*60}") print(f"Chairman: {CHAIRMAN_MODEL.split('/')[-1]}") print() try: start = time.time() response = requests.post( API_URL, headers={ "Authorization": f"Bearer {FIREWORKS_API_KEY}", "Content-Type": "application/json" }, json={ "model": CHAIRMAN_MODEL, "messages": [ {"role": "system", "content": "You are the Chairman of an LLM Council. Synthesize multiple AI perspectives into a definitive, comprehensive response."}, {"role": "user", "content": synthesis_prompt} ], "max_tokens": 4000, "temperature": 1 }, timeout=180 ) response.raise_for_status() elapsed = time.time() - start synthesis = response.json()["choices"][0]["message"]["content"] with open(f"{SESSION_DIR}/phase3_synthesis.txt", "w") as f: f.write(synthesis) print(f"Phase 3 complete ({elapsed:.2f}s). Synthesis saved to: {SESSION_DIR}/phase3_synthesis.txt") except Exception as e: print(f"ERROR: {e}") synthesis = f"[ERROR: {str(e)}]" with open(f"{SESSION_DIR}/phase3_synthesis.txt", "w") as f: f.write(synthesis) # Update config with chairman config["chairman"] = CHAIRMAN_MODEL with open(f"{SESSION_DIR}/config.json", "w") as f: json.dump(config, f, indent=2) PYEOF ``` ### Step 5: Display Full Results Read all saved files and display the complete council deliberation: ```bash SESSION_DIR="/tmp/llm-council/TIMESTAMP_HERE" python3 << 'PYEOF' import os import json SESSION_DIR = os.environ.get("SESSION_DIR") # Load all data with open(f"{SESSION_DIR}/config.json") as f: config = json.load(f) with open(f"{SESSION_DIR}/phase1_responses.json") as f: phase1 = json.load(f) with open(f"{SESSION_DIR}/phase2_rankings.json") as f: phase2 = json.load(f) with open(f"{SESSION_DIR}/phase3_synthesis.txt") as f: synthesis = f.read() model_to_label = phase2["model_to_label"] label_to_model = phase2["label_mapping"] # Build formatted output output = [] output.append("=" * 70) output.append(" LLM COUNCIL DELIBERATION") output.append(" Powered by Fireworks AI") output.append("=" * 70) output.append("") output.append(f"QUERY: {config['query']}") output.append(f"COUNCIL: {', '.join([m.split('/')[-1] for m in config['models']])}") output.append(f"CHAIRMAN: {config.get('chairman', 'N/A').split('/')[-1]}") output.append("") # Phase 1: Individual Responses output.append("-" * 70) output.append(" PHASE 1: INDIVIDUAL RESPONSES") output.append("-" * 70) output.append("") for model_id, result in phase1.items(): model_name = model_id.split("/")[-1] label = model_to_label.get(model_id, "?") latency = result.get("latency_seconds", "N/A") tokens = result.get("tokens", {}) output.append(f"[{label}] {model_name} (latency: {latency}s, tokens: {tokens.get('total', 'N/A')})") output.append("-" * 40) output.append(result["content"]) output.append("") # Phase 2: Cross-Model Rankings output.append("-" * 70) output.append(" PHASE 2: CROSS-MODEL RANKINGS") output.append("-" * 70) output.append("") output.append(f"Label mapping: {json.dumps({v: k.split('/')[-1] for k, v in model_to_label.items()}, indent=2)}") output.append("") for model_id, result in phase2["rankings"].items(): model_name = model_id.split("/")[-1] output.append(f"[{model_name}'s Rankings]") output.append(result["content"]) output.append("") # Phase 3: Chairman Synthesis output.append("-" * 70) output.append(" PHASE 3: CHAIRMAN'S SYNTHESIS") output.append("-" * 70) output.append("") chairman_name = config.get("chairman", "Chairman").split("/")[-1] output.append(f"[{chairman_name} - Chairman]") output.append("") output.append(synthesis) output.append("") output.append("=" * 70) output.append(f"Session files: {SESSION_DIR}/") # Save formatted output final_output = "\n".join(output) with open(f"{SESSION_DIR}/final_output.md", "w") as f: f.write(final_output) print(final_output) print(f"\nFull output saved to: {SESSION_DIR}/final_output.md") PYEOF ``` ## Important Notes 1. **Session Directory**: Each run creates a unique session in `/tmp/llm-council/{timestamp}/` 2. **Raw Data Preserved**: All API responses are saved as-is to JSON files for full transparency 3. **Cost**: Fireworks pricing is per-token. More models and longer queries cost more. Check current pricing at https://fireworks.ai/pricing 4. **Latency Tracking**: Each API call tracks latency so you can see Fireworks' speed in action 5. **Token Usage**: Phase 1 responses include token counts for cost awareness 6. **Rate Limits**: If you hit rate limits, wait briefly and retry 7. **Model Availability**: Check https://app.fireworks.ai/ for current model status ## Setup 1. Create a Fireworks AI account at https://fireworks.ai/ and grab your API key from the dashboard 2. Export it in your shell profile: ```bash export FIREWORKS_API_KEY="your_api_key_here" ``` 3. Restart your terminal or run `source ~/.zshrc` 4. Invoke this skill when you want multiple open-weight AI perspectives on a question ## Limitations - Requires the upstream tool, account, API key, or local setup when the workflow names one. - Does not authorize destructive, production, paid, or external-message actions without explicit user approval. - Validate generated artifacts or recommendations against the user's real sources before treating them as final.