---
name: image-generator
description: Generate and edit images using Gemini's Nano Banana Pro model (gemini-3-pro-image-preview). Use this skill when the user asks you to generate images, create visuals, edit photos, create logos, generate product mockups, or perform any image generation/editing task.
allowed-tools: Read, Write, Bash, WebFetch
category: "media"
risk: "safe"
source: "official"
source_repo: "dair-ai/dair-academy-plugins"
source_type: "official"
date_added: "2026-06-19"
author: "DAIR.AI"
license: "MIT"
license_source: "https://github.com/dair-ai/dair-academy-plugins/blob/main/README.md#license"
tags:
  - dair-academy
  - ai
  - workflow
tools:
  - claude-code
  - codex-cli
  - cursor
---

# Image Generator

## When to Use

Use when this workflow matches the user request: Generate and edit images using Gemini's Nano Banana Pro model (gemini-3-pro-image-preview). Use this skill when the user asks you to generate images, create visuals, edit photos, create logos, generate product mockups, or perform any image generation/editing task.

_Source: [dair-ai/dair-academy-plugins](https://github.com/dair-ai/dair-academy-plugins) (MIT)._

This skill generates and edits images using Google's Gemini Nano Banana Pro model (`gemini-3-pro-image-preview`).

## IMPORTANT: Setup Required

Before using this skill, the user must set the `GEMINI_API_KEY` environment variable:

1. Get a free API key from [Google AI Studio](https://aistudio.google.com/)
2. Export the key in your shell profile (`~/.zshrc`, `~/.bashrc`, etc.):
   ```bash
   export GEMINI_API_KEY="your_api_key_here"
   ```
3. Restart your terminal or run `source ~/.zshrc` (or `~/.bashrc`)

**The skill will not work without this configuration.**

## Pre-flight Check

Before making any API call, verify the key is set:

```bash
if [ -z "$GEMINI_API_KEY" ]; then
  echo "ERROR: GEMINI_API_KEY is not set. Please export it in your shell profile."
  exit 1
fi
```

If the key is missing, stop and tell the user to set it using the instructions above.
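For callers that use the Python SDK rather than the shell, the same pre-flight check might be sketched as follows. `require_api_key` is a hypothetical helper name introduced here for illustration; it is not part of this skill or of the Gemini SDK. The optional `env` argument is also an assumption, added only so the check can be exercised without touching the real environment.

```python
import os


def require_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict is a convenience
    for testing (an assumption of this sketch, not a skill requirement).
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        # Same message as the shell pre-flight check
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Please export it in your shell profile."
        )
    return key
```

Calling `require_api_key()` once before the first API call gives the same stop-early behavior as the shell check.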
## Configuration

**Model**: `gemini-3-pro-image-preview`

**API Key**: Read from the `GEMINI_API_KEY` environment variable

## Iterating on User-Provided Images

When the user provides a path to an image they want to edit or iterate on, use this workflow:

### Step 1: Read and encode the image to base64

```bash
# Get the image path from user
IMG_PATH="/path/to/user/image.png"

# Detect mime type
if [[ "$IMG_PATH" == *.png ]]; then
  MIME_TYPE="image/png"
elif [[ "$IMG_PATH" == *.jpg ]] || [[ "$IMG_PATH" == *.jpeg ]]; then
  MIME_TYPE="image/jpeg"
elif [[ "$IMG_PATH" == *.webp ]]; then
  MIME_TYPE="image/webp"
else
  MIME_TYPE="image/png"
fi

# Encode to base64 (works on both macOS and Linux)
if [[ "$(uname)" == "Darwin" ]]; then
  IMG_BASE64=$(base64 -i "$IMG_PATH")
else
  IMG_BASE64=$(base64 -w0 "$IMG_PATH")
fi
```

### Step 2: Send image with edit prompt (File-Based Approach)

**IMPORTANT:** Always use a file-based approach for the request body. Base64-encoded images are too large for command-line arguments and will cause "argument list too long" errors.
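As an alternative way to produce the request file, the body could be built in Python, where `json.dumps` also escapes any quotes or backslashes in the prompt. This is a minimal sketch, not the skill's own method: `write_edit_request` and `MIME_BY_SUFFIX` are hypothetical names, and the suffix table simply mirrors the mime detection from Step 1.

```python
import base64
import json
from pathlib import Path

# Mirrors the Step 1 mime detection; unknown suffixes fall back to PNG
MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def write_edit_request(image_path, prompt, request_path):
    """Write a generateContent request body for an image edit to request_path.

    Returns the mime type that was recorded for the input image.
    """
    image_path = Path(image_path)
    mime_type = MIME_BY_SUFFIX.get(image_path.suffix.lower(), "image/png")
    encoded = base64.b64encode(image_path.read_bytes()).decode("ascii")
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
            ]
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
    # json.dumps handles quoting, so the prompt needs no manual escaping
    Path(request_path).write_text(json.dumps(body), encoding="utf-8")
    return mime_type
```

A file written with `write_edit_request(IMG_PATH, EDIT_PROMPT, "/tmp/gemini_request.json")` can be sent with the same `curl ... -d @/tmp/gemini_request.json` call used in the shell workflow.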
```bash
# User's edit request
# Note: the prompt is interpolated into the JSON below as-is, so escape any
# double quotes or backslashes it contains
EDIT_PROMPT="Add a santa hat to the person in this image"

# Write request to a JSON file (avoids command line length limits)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {
        "inline_data": {
          "mime_type": "$MIME_TYPE",
          "data": "$IMG_BASE64"
        }
      }
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF

# Call the API using the file
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json
```

### Step 3: Extract and save the edited image

```bash
# Extract image from response and save
python3 -c "
import json
import base64

with open('/tmp/gemini_response.json') as f:
    data = json.load(f)

for part in data['candidates'][0]['content']['parts']:
    if 'inlineData' in part:
        img_data = part['inlineData']['data']
        mime = part['inlineData']['mimeType']
        ext = 'png' if 'png' in mime else 'jpg'
        with open('edited_image.' + ext, 'wb') as out:
            out.write(base64.b64decode(img_data))
        print(f'Saved: edited_image.{ext}')
    elif 'text' in part:
        print(part['text'])
"
```

### Complete Example (File-Based)

For iterating on images, always use file-based requests:

```bash
# Variables
IMG_PATH="/path/to/image.png"
EDIT_PROMPT="Make the background a sunset beach"
OUTPUT_PATH="edited_output.png"

# Detect mime type and encode
MIME_TYPE=$([[ "$IMG_PATH" == *.png ]] && echo "image/png" || echo "image/jpeg")
IMG_BASE64=$(base64 -i "$IMG_PATH" 2>/dev/null || base64 -w0 "$IMG_PATH")

# Write request to file (required - base64 images are too large for command line)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {"inline_data": {"mime_type": "$MIME_TYPE", "data": "$IMG_BASE64"}}
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF

# Call API and extract image
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json

# Save the output image
python3 -c "
import json, base64

with open('/tmp/gemini_response.json') as f:
    data = json.load(f)

for part in data.get('candidates', [{}])[0].get('content', {}).get('parts', []):
    if 'inlineData' in part:
        # Use a separate handle name so the response file handle is not shadowed
        with open('$OUTPUT_PATH', 'wb') as out:
            out.write(base64.b64decode(part['inlineData']['data']))
        print('Saved: $OUTPUT_PATH')
"
```

### Multi-Image Input (Combine/Compose)

To combine elements from multiple images (also uses file-based approach):

```bash
IMG1_PATH="/path/to/image1.png"
IMG2_PATH="/path/to/image2.png"
PROMPT="Put the dress from the first image on the person in the second image"

IMG1_BASE64=$(base64 -i "$IMG1_PATH" 2>/dev/null || base64 -w0 "$IMG1_PATH")
IMG2_BASE64=$(base64 -i "$IMG2_PATH" 2>/dev/null || base64 -w0 "$IMG2_PATH")

# Write request to file
# Note: mime_type is hard-coded to image/png here; adjust it for JPEG or WebP inputs
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$PROMPT"},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG1_BASE64"}},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG2_BASE64"}}
    ]
  }],
  "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
JSONEOF

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json
```

## Capabilities

### Text-to-Image Generation

- Generate high-quality images from text descriptions
- Support for photorealistic, stylized, and artistic outputs
- Accurate text rendering in images (logos, infographics, diagrams)

### Image Editing

- Add or remove elements from images
- Inpainting with semantic masking (edit specific parts)
- Style transfer (apply artistic styles to photos)
- Multi-image composition (combine elements from multiple images)

### Advanced Features

- **High Resolution**: 1K, 2K, or 4K output
- **Aspect Ratios**: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- **Google Search Grounding**: Generate images based on real-time data
- **Multi-turn Editing**: Iteratively refine images through conversation
- **Up to 14 Reference Images**: Combine multiple inputs for complex compositions

## API Usage

### Basic Text-to-Image (Python)

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Optional
            image_size="2K"       # Optional: "1K", "2K", "4K"
        )
    )
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")
```

### Basic Text-to-Image (JavaScript)

```javascript
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "Your prompt here",
  config: {
    responseModalities: ['TEXT', 'IMAGE'],
    imageConfig: {
      aspectRatio: "16:9",
      imageSize: "2K"
    }
  }
});

for (const part of response.candidates[0].content.parts) {
  if (part.text) {
    console.log(part.text);
  } else if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated_image.png", buffer);
  }
}
```

### REST API (curl)

```bash
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Your prompt here"}]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
```

### Image Editing (with input image)

```python
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

input_image = Image.open('input.png')
prompt = "Add a wizard hat to the cat in this image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, input_image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save("edited_image.png")
```

### Multi-Image Composition

```python
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

image1 = Image.open('dress.png')
image2 = Image.open('model.png')
prompt = "Put the dress from the first image on the model from the second image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[image1, image2, prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="3:4",
            image_size="2K"
        )
    )
)
```

### With Google Search Grounding

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize the current weather forecast for San Francisco",
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="16:9"),
        tools=[{"google_search": {}}]
    )
)
```

## Prompting Best Practices

### 1. Be Descriptive, Not Keyword-Based

Instead of: `cat, wizard hat, cute`

Write: `A fluffy orange cat wearing a small knitted wizard hat, sitting on a wooden floor with soft natural lighting from a window`

### 2. Specify Style and Mood

- Photography terms: "shot with 85mm lens", "soft bokeh background", "golden hour lighting"
- Artistic styles: "in the style of Van Gogh", "minimalist illustration", "photorealistic"
- Mood: "warm and cozy atmosphere", "dramatic noir lighting"

### 3. For Text in Images

Be explicit about:

- The exact text to render
- Font style (descriptively): "clean, bold, sans-serif font"
- Placement and size

### 4. For Editing

- Describe what to change and what to preserve
- Use "keep everything else unchanged"
- Reference specific elements clearly

### 5. For Product/Commercial Images

Mention:

- Lighting setup: "three-point softbox lighting"
- Background: "clean white studio background"
- Camera angle: "slightly elevated 45-degree shot"

## Resolution and Aspect Ratio Reference

| Aspect Ratio | 1K Resolution | 2K Resolution | 4K Resolution |
|--------------|---------------|---------------|---------------|
| 1:1          | 1024x1024     | 2048x2048     | 4096x4096     |
| 16:9         | 1376x768      | 2752x1536     | 5504x3072     |
| 9:16         | 768x1376      | 1536x2752     | 3072x5504     |
| 3:2          | 1264x848      | 2528x1696     | 5056x3392     |
| 2:3          | 848x1264      | 1696x2528     | 3392x5056     |

## Common Use Cases

### Logo Creation

```
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'.
The text should be in a clean, bold, sans-serif font. Black and white color scheme.
Put the logo in a circle.
```

### Product Photography

```
A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug
in matte black on a polished concrete surface. Three-point softbox lighting with
soft, diffused highlights. Slightly elevated 45-degree camera angle. Sharp focus
on steam rising from the coffee.
```

### Style Transfer

```
Transform this photograph of a city street at night into Vincent van Gogh's
'Starry Night' style. Preserve the composition but render with swirling,
impasto brushstrokes and deep blues with bright yellows.
```

### Infographic

```
Create a vibrant infographic explaining photosynthesis as a recipe.
Show "ingredients" (sunlight, water, CO2) and "finished dish" (sugar/energy).
Style like a colorful kids' cookbook, suitable for 4th graders.
```

## Error Handling

Common issues:

- **No image returned**: Check that `response_modalities` includes `'IMAGE'`
- **Safety filters**: Some prompts may be blocked; try rephrasing
- **Rate limits**: Implement exponential backoff for retries
- **Large images**: For 4K, ensure sufficient timeout settings

## Dependencies

To use the Python SDK:

```bash
pip install google-genai pillow
```

For JavaScript:

```bash
npm install @google/genai
```

## Important Notes

- All generated images include a SynthID watermark
- The model uses a "thinking" process for complex prompts
- For best text rendering, generate the text first, then request an image with that text
- Images are not stored by the API; save outputs locally

## Limitations

- Requires the upstream tool, account, API key, or local setup when the workflow names one.
- Does not authorize destructive, production, paid, or external-message actions without explicit user approval.
- Validate generated artifacts or recommendations against the user's real sources before treating them as final.
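
Returning to the rate-limit advice under Error Handling, exponential backoff could be sketched roughly as below. `call_with_backoff` and all of its parameters are hypothetical and not part of the Gemini SDK. The sketch retries on any exception by default, which is an assumption made for brevity; a real caller would likely narrow `retry_on` to whichever error type its client raises for rate limiting.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff plus jitter on failure.

    The delay doubles on each attempt (base_delay, 2x, 4x, ...) and is
    capped at max_delay. The last failure is re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter of up to half the delay spreads out concurrent retries
            sleep(delay + random.uniform(0, delay / 2))
```

A call such as `call_with_backoff(lambda: client.models.generate_content(...))` would wrap any of the SDK examples above without changing them.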