@@ -79,7 +79,7 @@ In addition to built-in tools, Hermes can load tools dynamically from MCP server
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `image_generate` | Generate high-quality images from text prompts using FLUX 2 Pro model with automatic 2x upscaling. Creates detailed, artistic images that are automatically upscaled for hi-rez results. Returns a single upscaled image URL. Display it using… | FAL_KEY |
| `image_generate` | Generate high-quality images from text prompts using FAL.ai. The underlying model is user-configured (default: FLUX 2 Klein 9B, sub-1s generation) and is not selectable by the agent. Returns a single image URL. Display it using… | FAL_KEY |
description: Generate high-quality images using FLUX 2 Pro with automatic upscaling via FAL.ai.
description: Generate images via FAL.ai — 8 models including FLUX 2, GPT-Image, Nano Banana, Ideogram, and more, selectable via `hermes tools`.
sidebar_label: Image Generation
sidebar_position: 6
---
# Image Generation
Hermes Agent can generate images from text prompts using FAL.ai's **FLUX 2 Pro** model with automatic 2x upscaling via the **Clarity Upscaler** for enhanced quality.
Hermes Agent generates images from text prompts via FAL.ai. Eight models are supported out of the box, each with different speed, quality, and cost tradeoffs. The active model is user-configurable via `hermes tools` and persists in `config.yaml`.
If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription, you can use imagegenerationthroughthe **[Tool Gateway](tool-gateway.md)**without a FAL API key. Run `hermes model`or `hermes tools` to enable it.
If youhavea paid [Nous Portal](https://portal.nousresearch.com) subscription,youcanuse image generation through the**[Tool Gateway](tool-gateway.md)** without aFALAPIkey.Your modelselectionpersistsacrossbothpaths.
You can also use the raw FLUX 2 Pro size presets directly: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`. Custom sizes up to 2048x2048 are also supported.
The upscaler enhances detail and resolution while preserving the original composition. If the upscaler fails (networkissue, ratelimit), the original resolution image is returnedautomatically.
Ifupscalingfails(network issue, rate limit),theoriginalimageis returned automatically.
## Example Prompts
## How It Works Internally
Here aresome effective prompts to try:
```
A candid street photo of a woman with a pink bob and bold eyeliner
```
```
Modern architecture building with glass facade, sunset lighting
```
```
Abstract art with vibrant colors and geometric patterns
```
```
Portrait of a wise old owl perched on ancient tree branch
```
```
Futuristic cityscape with flying cars and neon lights
Debuglogs are saved to `./logs/image_tools_debug_<session_id>.json` with details about each generation request, parameters, timing, and any errors.
## Safety Settings
The image generation tool runs with safety checks disabled by default (`safety_tolerance: 5`, the most permissive setting). This is configured at the code level and is not user-adjustable.
Debug logs go to`./logs/image_tools_debug_<session_id>.json`with per-calldetails(model,parameters,timing, errors).
## Platform Delivery
Generated images are delivered differently depending on the platform:
| Platform | Delivery method |
|----------|----------------|
|**CLI** | Image URL printed as markdown `` — click to open in browser |
|**Telegram** | Image sent as a photo message with the prompt as caption |
|**Discord** | Image embedded in a message |
|**Slack** | Image URL in message (Slack unfurls it) |
| **WhatsApp** | Image sent as a media message |
| **Other platforms** | Image URL in plain text |
The agent uses `MEDIA:<url>` syntax in its response, which the platform adapter converts to the appropriate format.
@@ -30,7 +30,7 @@ Hermes Agent includes a rich set of capabilities that extend far beyond basic ch
- **[Voice Mode](voice-mode.md)** — Full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
- **[Browser Automation](browser.md)** — Full browser automation with multiple backends: Browserbase cloud, Browser Use cloud, local Chrome via CDP, or local Chromium. Navigate websites, fill forms, and extract information.
- **[Vision & Image Paste](vision.md)** — Multimodal vision support. Paste images from your clipboard into the CLI and ask the agent to analyze, describe, or work with them using any vision-capable model.
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai's FLUX 2 Pro model with automatic 2x upscaling via the Clarity Upscaler.
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai. Eight models supported (FLUX 2 Klein/Pro, GPT-Image 1.5, Nano Banana, Ideogram V3, Recraft V3, Qwen, Z-Image Turbo); pick one via `hermes tools`.
- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with five provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, MiniMax, and NeuTTS.
| **Text-to-speech** | Convert text to speech via OpenAI TTS | `VOICE_TOOLS_OPENAI_KEY`, `ELEVENLABS_API_KEY` |
| **Browser automation** | Control cloud browsers via Browser Use | `BROWSER_USE_API_KEY`, `BROWSERBASE_API_KEY` |
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.