fix(provider): make config.yaml model.provider the single source of truth (#31222)
Policy: if it ain't a secret it goes in config.yaml. HERMES_INFERENCE_PROVIDER was leaking behavioral config into the .env surface, including from the gateway, which bypassed config.yaml entirely.

Behavior:
- gateway/run.py: drop HERMES_INFERENCE_PROVIDER read in _resolve_runtime_agent_kwargs. Gateway now flows through resolve_runtime_provider() with no `requested` override, which reads model.provider from config.yaml first.

Docs/UX (strip env var from user-facing surface):
- --provider help text no longer mentions the env var
- cli-config.yaml.example same
- reference/environment-variables.md: remove HERMES_INFERENCE_PROVIDER row and the cross-reference from HERMES_INFERENCE_MODEL
- reference/cli-commands.md: blank the env-var column for --provider
- guides/xai-grok-oauth.md, guides/minimax-oauth.md: replace HERMES_INFERENCE_PROVIDER=x hermes invocations with config.yaml / --provider
- developer-guide/adding-providers.md, model-provider-plugin.md: reframe

Internal mechanism (kept as-is):
- hermes_cli/main.py writes HERMES_INFERENCE_PROVIDER into the TUI subprocess env
- tui_gateway/server.py reads it on TUI startup
- resolve_requested_provider() / oneshot.py / cli.py still fall through to the env var as a last resort behind config.yaml, which is what makes the TUI parent->child handoff work

This stays. We just stop documenting it as a user knob.

Tests: tests/gateway/test_auth_fallback.py: simplify mock to fail on first call, succeed on second; drop monkeypatch.setenv lines that no longer matter.

Supersedes #31064 (closed with credit to @novax635 who surfaced the underlying issue but proposed aligning gateway *to* the env var rather than removing it).
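The fall-through order described above can be sketched roughly as follows. This is an illustration only, not the real `resolve_requested_provider()`: the helper name `pick_provider`, its signature, the whitespace stripping, and the choice to treat any non-empty `model.provider` (including `"auto"`) as authoritative are all assumptions made for the example.

```python
from collections.abc import Mapping
from typing import Optional


def pick_provider(
    requested: Optional[str],
    config: Mapping[str, object],
    env: Mapping[str, str],
) -> str:
    """Hypothetical helper modelling the precedence in the commit message."""
    # 1. An explicit per-invocation override (e.g. --provider) wins.
    if requested and requested.strip():
        return requested.strip()

    # 2. config.yaml's model.provider is the persistent source of truth.
    model_cfg = config.get("model")
    if isinstance(model_cfg, Mapping):
        configured = model_cfg.get("provider")
        if isinstance(configured, str) and configured.strip():
            return configured.strip()

    # 3. Last resort: the env var, kept only as an internal handoff
    #    mechanism (TUI parent -> child), no longer a documented knob.
    from_env = env.get("HERMES_INFERENCE_PROVIDER", "").strip()
    if from_env:
        return from_env

    # 4. Nothing configured anywhere.
    return "auto"
```

Called with `requested=None`, the helper mirrors the gateway path after this change: the config value is returned whenever it is present, and the environment is consulted only when config.yaml has no `model.provider`.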
@@ -39,7 +39,7 @@ model:
 # LM Studio is first-class and uses provider: "lmstudio".
 # It works with both no-auth and auth-enabled server modes.
 #
-# Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
+# Can also be overridden for a single invocation with the --provider flag.
 provider: "auto"
 
 # API configuration (falls back to OPENROUTER_API_KEY env var)
@@ -962,6 +962,12 @@ _AGENT_PENDING_SENTINEL = object()
 def _resolve_runtime_agent_kwargs() -> dict:
 """Resolve provider credentials for gateway-created AIAgent instances.
 
+Provider is read from ``config.yaml`` ``model.provider`` (the single
+source of truth). ``resolve_runtime_provider()`` falls through to env
+var lookups internally for legacy compatibility, but the gateway does
+not consult environment variables for behavioral config — config.yaml
+is authoritative.
+
 If the primary provider fails with an authentication error, attempt to
 resolve credentials using the fallback provider chain from config.yaml
 before giving up.
@@ -973,9 +979,7 @@ def _resolve_runtime_agent_kwargs() -> dict:
 from hermes_cli.auth import AuthError
 
 try:
-runtime = resolve_runtime_provider(
-requested=os.getenv("HERMES_INFERENCE_PROVIDER"),
-)
+runtime = resolve_runtime_provider()
 except AuthError as auth_exc:
 # Primary provider auth failed (expired token, revoked key, etc.).
 # Try the fallback provider chain before raising.
@@ -129,7 +129,8 @@ def build_top_level_parser():
 default=None,
 help=(
 "Provider override for this invocation (e.g. openrouter, anthropic). "
-"Applies to -z/--oneshot and --tui. Also settable via HERMES_INFERENCE_PROVIDER env var."
+"Applies to -z/--oneshot and --tui. The persistent provider lives in config.yaml "
+"under model.provider — use `hermes setup` or edit the file to change it."
 ),
 )
 parser.add_argument(
@@ -17,7 +17,6 @@ Model / provider selection mirrors `hermes chat`:
 
 Env var fallbacks (used when the corresponding arg is not passed):
 - HERMES_INFERENCE_MODEL
-- HERMES_INFERENCE_PROVIDER (already read by resolve_runtime_provider)
 """
 
 from __future__ import annotations
@@ -135,9 +134,8 @@ def run_oneshot(
 prompt: The user message to send.
 model: Optional model override. Falls back to HERMES_INFERENCE_MODEL
 env var, then config.yaml's model.default / model.model.
-provider: Optional provider override. Falls back to
-HERMES_INFERENCE_PROVIDER env var, then config.yaml's model.provider,
-then "auto".
+provider: Optional provider override. Falls back to config.yaml's
+model.provider, then "auto".
 toolsets: Optional comma-separated string or iterable of toolsets.
 
 Returns the exit code. Caller should sys.exit() with the return.
@@ -27,8 +27,11 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
 
 def _mock_resolve(**kwargs):
 call_count["n"] += 1
-requested = kwargs.get("requested", "")
-if requested and "codex" in str(requested).lower():
+# First call = primary path (gateway reads model.provider from
+# config.yaml internally; we simulate the auth failure here).
+# Second call = fallback path with explicit_api_key + explicit_base_url
+# supplied by gateway from fallback_model config.
+if call_count["n"] == 1:
 raise AuthError("Codex token refresh failed with status 401")
 return {
 "api_key": "fallback-key",
@@ -40,8 +43,6 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
 "credential_pool": None,
 }
 
-monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openai-codex")
-
 with patch(
 "hermes_cli.runtime_provider.resolve_runtime_provider",
 side_effect=_mock_resolve,
@@ -62,7 +63,6 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
 config_path.write_text("model:\n provider: openai-codex\n")
 
 monkeypatch.setattr("gateway.run._hermes_home", tmp_path)
-monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openai-codex")
 
 with patch(
 "hermes_cli.runtime_provider.resolve_runtime_provider",
@@ -116,7 +116,7 @@ When you add a plugin and it calls `register_provider()`, the following wire up
 8. `hermes setup` wizard delegates to `main.py` automatically
 9. `provider:model` alias syntax works
 10. Runtime resolver returns the correct `base_url` and `api_key`
-11. `HERMES_INFERENCE_PROVIDER` env-var override accepts the provider id
+11. `--provider <name>` CLI flag accepts the provider id
 12. Fallback model activation can switch into the provider cleanly
 
 User plugins at `$HERMES_HOME/plugins/model-providers/<name>/` override bundled plugins of the same name (last-writer-wins in `register_provider()`) — so third parties can monkey-patch or replace any built-in profile without editing the repo.
@@ -89,7 +89,7 @@ Full definition in `providers/base.py`. The most useful ones:
 
 | Field | Type | Purpose |
 |---|---|---|
-| `name` | str | Canonical id — matches `--provider` choices and `HERMES_INFERENCE_PROVIDER` |
+| `name` | str | Canonical id — matches `model.provider` in `config.yaml` and the `--provider` flag |
 | `aliases` | `tuple[str, ...]` | Alternative names resolved by `get_provider_profile()` (e.g. `grok` → `xai`) |
 | `api_mode` | str | `chat_completions` \| `codex_responses` \| `anthropic_messages` \| `bedrock_converse` |
 | `display_name` | str | Human label shown in `hermes model` picker |
@@ -157,10 +157,10 @@ The `minimax-oauth` provider does **not** use `MINIMAX_API_KEY` or `MINIMAX_BASE
 | `MINIMAX_API_KEY` | Used by `minimax` provider only — ignored for `minimax-oauth` |
 | `MINIMAX_CN_API_KEY` | Used by `minimax-cn` provider only — ignored for `minimax-oauth` |
 
-To force the `minimax-oauth` provider at runtime:
+To use `minimax-oauth` as the active provider, set `model.provider: minimax-oauth` in `config.yaml` (use `hermes setup` for the guided flow), or pass `--provider minimax-oauth` for a single invocation:
 
 ```bash
-HERMES_INFERENCE_PROVIDER=minimax-oauth hermes
+hermes --provider minimax-oauth
 ```
 
 ## Models
@@ -190,7 +190,8 @@ The chat catalog is derived live from the on-disk `models.dev` cache; new xAI re
 | Variable | Effect |
 |----------|--------|
 | `XAI_BASE_URL` | Override the default `https://api.x.ai/v1` endpoint (rarely needed). |
-| `HERMES_INFERENCE_PROVIDER` | Force the active provider at runtime, e.g. `HERMES_INFERENCE_PROVIDER=xai-oauth hermes`. |
+
+To select xAI as the active provider, set `model.provider: xai-oauth` in `config.yaml` (use `hermes setup` for the guided flow) or pass `--provider xai-oauth` for a single invocation.
 
 ## Troubleshooting
 
@@ -138,7 +138,7 @@ Per-run overrides (no mutation to `~/.hermes/config.yaml`):
 | Flag | Equivalent env var | Purpose |
 |---|---|---|
 | `-m` / `--model <model>` | `HERMES_INFERENCE_MODEL` | Override the model for this run |
-| `--provider <provider>` | `HERMES_INFERENCE_PROVIDER` | Override the provider for this run |
+| `--provider <provider>` | _(none)_ | Override the provider for this run |
 
 ```bash
 hermes -z "…" --provider openrouter --model openai/gpt-5.5
@@ -113,7 +113,6 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
 
 | Variable | Description |
 |----------|-------------|
-| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `custom`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `novita`, `gemini`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `minimax-oauth` (browser OAuth login — no API key required; see [MiniMax OAuth guide](../guides/minimax-oauth.md)), `kilocode`, `xiaomi`, `arcee`, `gmi`, `stepfun`, `alibaba`, `alibaba-coding-plan` (alias `alibaba_coding`), `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `xai-oauth` (browser OAuth login for SuperGrok subscribers — no API key required; see [xAI Grok OAuth guide](../guides/xai-grok-oauth.md)), `google-gemini-cli`, `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway`, `tencent-tokenhub` (default: `auto`) |
 | `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
 | `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
 | `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |
@@ -589,7 +588,7 @@ Advanced per-platform knobs for throttling the outbound message batcher. Most us
 | `HERMES_TUI_DIR` | Path to a prebuilt `ui-tui/` directory (must contain `dist/entry.js` and populated `node_modules`). Used by distros and Nix to skip the first-launch `npm install`. |
 | `HERMES_TUI_RESUME` | Resume a specific TUI session by ID on launch. When set, `hermes --tui` skips forging a fresh session and picks up the named session instead — useful for re-attaching after a disconnect or terminal crash. |
 | `HERMES_TUI_THEME` | Force the TUI color theme: `light`, `dark`, or a raw 6-character background hex (e.g. `ffffff` or `1a1a2e`). When unset, Hermes auto-detects using `COLORFGBG` and terminal background queries; this variable overrides detection on terminals (Ghostty, Warp, iTerm2, etc.) that don't set `COLORFGBG`. |
-| `HERMES_INFERENCE_MODEL` | Force the model for `hermes -z` / `hermes chat` without mutating `config.yaml`. Pairs with `HERMES_INFERENCE_PROVIDER`. Useful for scripted callers (sweeper, CI, batch runners) that need to override the default model per run. |
+| `HERMES_INFERENCE_MODEL` | Force the model for `hermes -z` / `hermes chat` without mutating `config.yaml`. Pairs with the `--provider` flag. Useful for scripted callers (sweeper, CI, batch runners) that need to override the default model per run. |
 
 ## Session Settings
 