fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335)
Long-lived Gateway processes were sending duplicate tool names to providers that enforce uniqueness: - DeepSeek: 'Tool names must be unique.' - Xiaomi MiMo: 'tools contains duplicate names: lcm_expand' - Moonshot/Kimi: 'function name lcm_grep is duplicated' TUI was unaffected because TUI runs with quiet_mode=False and skips the cache entirely. Root cause (two layered bugs) - model_tools.get_tool_definitions(quiet_mode=True) memoizes its result in _tool_defs_cache. The cache-hit path returned list(cached) (safe), but the FIRST uncached call stored and returned the SAME object. run_agent.py mutates self.tools (memory + LCM context-engine schemas) in-place, so the very first agent init in a Gateway process poisoned the cache, and every subsequent init appended LCM schemas again on top of the already-polluted list. - run_agent.py's context-engine injection (lcm_grep / lcm_describe / lcm_expand) had no dedup, unlike the memory-tools injection right above it which already skips already-present names. Fix (defense in depth, per the issue's suggested fix) - model_tools.get_tool_definitions: on the uncached branch, cache the computed list but return list(result) to the caller. Same pattern as the cache-hit path. - run_agent.py: build _existing_tool_names from self.tools and skip schemas whose names are already present, mirroring the memory-tools block. This also defends against plugin paths that may register the same schemas via ctx.register_tool(). Tests (tests/test_get_tool_definitions_cache_isolation.py) - test_first_uncached_call_returns_fresh_list \u2014 pins the fix; without it, first-call alias caused all the symptoms. - test_cache_hit_returns_fresh_list \u2014 pre-existing behavior stays. - test_caller_mutation_does_not_poison_cache \u2014 simulates run_agent appending lcm_grep / lcm_expand to the returned list and asserts the next call doesn't see them. - test_repeated_caller_mutation_does_not_accumulate \u2014 reproduces the long-lived Gateway accumulation pattern across 5 agent inits. - test_non_quiet_mode_does_not_use_cache \u2014 sanity, explains why TUI was fine. 5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
This commit is contained in:
19
run_agent.py
19
run_agent.py
@@ -2001,16 +2001,31 @@ class AIAgent:
|
||||
f"model.context_length in config.yaml to override."
|
||||
)
|
||||
|
||||
# Inject context engine tool schemas (e.g. lcm_grep, lcm_describe, lcm_expand)
|
||||
# Inject context engine tool schemas (e.g. lcm_grep, lcm_describe, lcm_expand).
|
||||
# Skip names that are already present — the get_tool_definitions()
|
||||
# quiet_mode cache returned a shared list pre-#17335, so a stray
|
||||
# mutation here would poison subsequent agent inits in the same
|
||||
# Gateway process and trip provider-side 'duplicate tool name'
|
||||
# errors. Even with the cache fix, dedup is the right defense
|
||||
# against plugin paths that may register the same schemas via
|
||||
# ctx.register_tool(). Mirrors the memory tools dedup above.
|
||||
self._context_engine_tool_names: set = set()
|
||||
if hasattr(self, "context_compressor") and self.context_compressor and self.tools is not None:
|
||||
_existing_tool_names = {
|
||||
t.get("function", {}).get("name")
|
||||
for t in self.tools
|
||||
if isinstance(t, dict)
|
||||
}
|
||||
for _schema in self.context_compressor.get_tool_schemas():
|
||||
_tname = _schema.get("name", "")
|
||||
if _tname and _tname in _existing_tool_names:
|
||||
continue # already registered via plugin/cache path
|
||||
_wrapped = {"type": "function", "function": _schema}
|
||||
self.tools.append(_wrapped)
|
||||
_tname = _schema.get("name", "")
|
||||
if _tname:
|
||||
self.valid_tool_names.add(_tname)
|
||||
self._context_engine_tool_names.add(_tname)
|
||||
_existing_tool_names.add(_tname)
|
||||
|
||||
# Notify context engine of session start
|
||||
if hasattr(self, "context_compressor") and self.context_compressor:
|
||||
|
||||
Reference in New Issue
Block a user