fix(agent): disable SDK retries on per-request OpenAI clients
Per-request OpenAI-wire clients (used by both non-streaming and streaming chat-completions paths in _interruptible_api_call) should not run the SDK's built-in retry loop: the agent's outer loop owns retries with credential rotation, provider fallback, and backoff that the SDK can't see. Leaving SDK retries on (default 2) compounds with our outer retries and lets a single hung provider request stretch to ~3x the per-call timeout before our stale detector reports it. Shared/primary clients and Anthropic / Bedrock paths are unaffected (they don't go through here). Salvage of #15811 core improvement — the timeout push-down in the original PR required scaffolding that has since been refactored on main, so only the max_retries=0 change is preserved. Co-authored-by: QifengKuang <k2767567815@gmail.com>
This commit is contained in:
11
run_agent.py
11
run_agent.py
@@ -5816,6 +5816,17 @@ class AIAgent:
|
||||
return primary_client
|
||||
with self._openai_client_lock():
|
||||
request_kwargs = dict(self._client_kwargs)
|
||||
# Per-request OpenAI-wire clients (used by both the non-streaming
|
||||
# chat-completions path and the streaming chat-completions path
|
||||
# in `_interruptible_api_call`) should not run the SDK's built-in
|
||||
# retry loop: the agent's outer loop owns retries with credential
|
||||
# rotation, provider fallback, and backoff that the SDK can't
|
||||
# see. Leaving SDK retries on (default 2) compounds with our outer
|
||||
# retries and lets a single hung provider request stretch to ~3x
|
||||
# the per-call timeout before our stale detector reports it.
|
||||
# Shared/primary clients and Anthropic / Bedrock paths are
|
||||
# unaffected (they don't go through here).
|
||||
request_kwargs["max_retries"] = 0
|
||||
if (
|
||||
base_url_host_matches(str(request_kwargs.get("base_url", "")), "api.githubcopilot.com")
|
||||
and self._api_kwargs_have_image_parts(api_kwargs or {})
|
||||
|
||||
Reference in New Issue
Block a user