fix(provider): make config.yaml model.provider the single source of truth (#31222)

Policy: if it ain't a secret it goes in config.yaml. HERMES_INFERENCE_PROVIDER was leaking behavioral config into the .env surface, including from the gateway, which bypassed config.yaml entirely.

Behavior:
- gateway/run.py: drop HERMES_INFERENCE_PROVIDER read in _resolve_runtime_agent_kwargs. Gateway now flows through resolve_runtime_provider() with no `requested` override, which reads model.provider from config.yaml first.

Docs/UX (strip env var from user-facing surface):
- --provider help text no longer mentions the env var
- cli-config.yaml.example same
- reference/environment-variables.md: remove HERMES_INFERENCE_PROVIDER row and the cross-reference from HERMES_INFERENCE_MODEL
- reference/cli-commands.md: blank the env-var column for --provider
- guides/xai-grok-oauth.md, guides/minimax-oauth.md: replace HERMES_INFERENCE_PROVIDER=x hermes invocations with config.yaml / --provider
- developer-guide/adding-providers.md, model-provider-plugin.md: reframe

Internal mechanism (kept as-is):
- hermes_cli/main.py writes HERMES_INFERENCE_PROVIDER into the TUI subprocess env
- tui_gateway/server.py reads it on TUI startup
- resolve_requested_provider() / oneshot.py / cli.py still fall through to the env var as a last-resort behind config.yaml, which is what makes the TUI parent->child handoff work

This stays. We just stop documenting it as a user knob.

Tests:
- tests/gateway/test_auth_fallback.py: simplify mock to fail on first call, succeed on second; drop monkeypatch.setenv lines that no longer matter.

Supersedes #31064 (closed with credit to @novax635 who surfaced the underlying issue but proposed aligning gateway *to* the env var rather than removing it).
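
For reviewers, the resolution order described above can be sketched roughly as follows. This is an illustration only, not the code in hermes_cli/runtime_provider.py: `pick_provider` and its parameters are hypothetical names, and placing an explicit `requested` value ahead of config.yaml is an assumption inferred from the "no `requested` override" wording.

```python
from typing import Any, Mapping, Optional

ENV_VAR = "HERMES_INFERENCE_PROVIDER"


def pick_provider(
    requested: Optional[str],
    config: Mapping[str, Any],
    env: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical helper: return the provider name by precedence."""
    # 1. An explicit request (e.g. --provider) wins. Assumed ordering.
    if requested:
        return requested
    # 2. config.yaml model.provider is the source of truth.
    model_cfg = config.get("model") or {}
    provider = model_cfg.get("provider")
    if provider:
        return provider
    # 3. Last resort: the env var, kept only for the TUI
    #    parent->child handoff, not as a user-facing knob.
    return env.get(ENV_VAR) or None
```

With this ordering, a stale HERMES_INFERENCE_PROVIDER left in .env cannot override model.provider. The env var only takes effect when neither an explicit request nor a config.yaml value is present.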
2026-05-23 18:18:41 -07:00
parent 7a4dc8e8d6
commit e42fcc5625
11 changed files with 25 additions and 22 deletions
--- a/tests/gateway/test_auth_fallback.py
+++ b/tests/gateway/test_auth_fallback.py
@@ -27,8 +27,11 @@ class TestResolveRuntimeAgentKwargsAuthFallback:

        def _mock_resolve(**kwargs):
            call_count["n"] += 1
-            requested = kwargs.get("requested", "")
-            if requested and "codex" in str(requested).lower():
+            # First call = primary path (gateway reads model.provider from
+            # config.yaml internally; we simulate the auth failure here).
+            # Second call = fallback path with explicit_api_key + explicit_base_url
+            # supplied by gateway from fallback_model config.
+            if call_count["n"] == 1:
                raise AuthError("Codex token refresh failed with status 401")
            return {
                "api_key": "fallback-key",
@@ -40,8 +43,6 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
                "credential_pool": None,
            }

-        monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openai-codex")
-
        with patch(
            "hermes_cli.runtime_provider.resolve_runtime_provider",
            side_effect=_mock_resolve,
@@ -62,7 +63,6 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
        config_path.write_text("model:\n  provider: openai-codex\n")

        monkeypatch.setattr("gateway.run._hermes_home", tmp_path)
-        monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openai-codex")

        with patch(
            "hermes_cli.runtime_provider.resolve_runtime_provider",