fix(provider): make config.yaml model.provider the single source of truth (#31222)

Policy: if it ain't a secret it goes in config.yaml. HERMES_INFERENCE_PROVIDER
was leaking behavioral config into the .env surface, including from the gateway,
which bypassed config.yaml entirely.

Behavior:
- gateway/run.py: drop HERMES_INFERENCE_PROVIDER read in _resolve_runtime_agent_kwargs.
  Gateway now flows through resolve_runtime_provider() with no `requested` override,
  which reads model.provider from config.yaml first.

Docs/UX (strip env var from user-facing surface):
- --provider help text no longer mentions the env var
- cli-config.yaml.example same
- reference/environment-variables.md: remove HERMES_INFERENCE_PROVIDER row and
  the cross-reference from HERMES_INFERENCE_MODEL
- reference/cli-commands.md: blank the env-var column for --provider
- guides/xai-grok-oauth.md, guides/minimax-oauth.md: replace
  HERMES_INFERENCE_PROVIDER=x hermes invocations with config.yaml / --provider
- developer-guide/adding-providers.md, model-provider-plugin.md: reframe

Internal mechanism (kept as-is):
- hermes_cli/main.py writes HERMES_INFERENCE_PROVIDER into the TUI subprocess env
- tui_gateway/server.py reads it on TUI startup
- resolve_requested_provider() / oneshot.py / cli.py still fall through to the
  env var as a last-resort behind config.yaml, which is what makes the TUI
  parent->child handoff work
This stays. We just stop documenting it as a user knob.

Tests: tests/gateway/test_auth_fallback.py — simplify mock to fail on first
call, succeed on second; drop monkeypatch.setenv lines that no longer matter.

Supersedes #31064 (closed with credit to @novax635 who surfaced the underlying
issue but proposed aligning gateway *to* the env var rather than removing it).
This commit is contained in:
Teknium
2026-05-23 18:18:41 -07:00
committed by GitHub
parent 7a4dc8e8d6
commit e42fcc5625
11 changed files with 25 additions and 22 deletions

View File

@@ -27,8 +27,11 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
def _mock_resolve(**kwargs):
call_count["n"] += 1
requested = kwargs.get("requested", "")
if requested and "codex" in str(requested).lower():
# First call = primary path (gateway reads model.provider from
# config.yaml internally; we simulate the auth failure here).
# Second call = fallback path with explicit_api_key + explicit_base_url
# supplied by gateway from fallback_model config.
if call_count["n"] == 1:
raise AuthError("Codex token refresh failed with status 401")
return {
"api_key": "fallback-key",
@@ -40,8 +43,6 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
"credential_pool": None,
}
monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openai-codex")
with patch(
"hermes_cli.runtime_provider.resolve_runtime_provider",
side_effect=_mock_resolve,
@@ -62,7 +63,6 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
config_path.write_text("model:\n provider: openai-codex\n")
monkeypatch.setattr("gateway.run._hermes_home", tmp_path)
monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openai-codex")
with patch(
"hermes_cli.runtime_provider.resolve_runtime_provider",