fix(delegation): increase heartbeat stale thresholds

The heartbeat stale detection was too aggressive:

- idle: 5 * 30s = 150s; LLM inference on slow providers (Zhipu/GLM)
  frequently exceeds 150s, causing heartbeat to stop prematurely
- in-tool: 20 * 30s = 600s; borderline for long tool calls

When heartbeat stops, parent._last_activity_ts freezes, eventually
triggering gateway timeout and killing the entire delegation.

New thresholds:

- idle: 15 * 30s = 450s; accommodates slow LLM inference
- in-tool: 40 * 30s = 1200s; accommodates long-running tool calls

child_timeout_seconds (config: delegation.child_timeout_seconds) remains
the hard cap for total delegation duration.

@@ -483,8 +483,8 @@ _HEARTBEAT_INTERVAL = 30 # seconds between parent activity heartbeats during de
 # The idle ceiling stays tight so genuinely stuck children don't mask the gateway
 # timeout. The in-tool ceiling is much higher so legit long-running tools get
 # time to finish; child_timeout_seconds (default 600s) is still the hard cap.
-_HEARTBEAT_STALE_CYCLES_IDLE = 5  # 5 * 30s = 150s idle between turns → stale
-_HEARTBEAT_STALE_CYCLES_IN_TOOL = 20  # 20 * 30s = 600s stuck on same tool → stale
+_HEARTBEAT_STALE_CYCLES_IDLE = 15  # 15 * 30s = 450s idle between turns → stale
+_HEARTBEAT_STALE_CYCLES_IN_TOOL = 40  # 40 * 30s = 1200s stuck on same tool → stale
 DEFAULT_TOOLSETS = ["terminal", "file", "web"]
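
For reviewers checking the arithmetic, here is a minimal sketch of how the cycle counts translate into wall-clock ceilings. The three constants are copied from this commit. The helper names `stale_after_seconds` and `is_stale`, and the `>=` comparison, are assumptions made for illustration and are not taken from the delegation module.

```python
# Constants as they appear in the diff after this commit.
_HEARTBEAT_INTERVAL = 30  # seconds between parent activity heartbeats
_HEARTBEAT_STALE_CYCLES_IDLE = 15
_HEARTBEAT_STALE_CYCLES_IN_TOOL = 40


def stale_after_seconds(in_tool: bool) -> int:
    """Seconds without child progress before the heartbeat counts as stale.

    Hypothetical helper: it only restates the "cycles * interval" arithmetic
    from the inline comments.
    """
    if in_tool:
        cycles = _HEARTBEAT_STALE_CYCLES_IN_TOOL
    else:
        cycles = _HEARTBEAT_STALE_CYCLES_IDLE
    return cycles * _HEARTBEAT_INTERVAL


def is_stale(unchanged_cycles: int, in_tool: bool) -> bool:
    """Assumed check: stale once the unchanged-cycle count reaches the limit.

    The real comparison (>= versus >) is not visible in this diff.
    """
    if in_tool:
        limit = _HEARTBEAT_STALE_CYCLES_IN_TOOL
    else:
        limit = _HEARTBEAT_STALE_CYCLES_IDLE
    return unchanged_cycles >= limit


if __name__ == "__main__":
    print("idle ceiling:", stale_after_seconds(in_tool=False), "seconds")
    print("in-tool ceiling:", stale_after_seconds(in_tool=True), "seconds")
```

Under these assumptions, a child that spends 200s waiting on a slow provider has been idle for 6 full heartbeat cycles. That exceeded the old idle limit of 5 cycles but stays well under the new limit of 15.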