fix(compression): exclude completion tokens from compression trigger (#12026)

Cherry-picked from PR #12481 by @Sanjays2402.

Reasoning models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens with internal thinking tokens. The compression trigger summed prompt_tokens + completion_tokens, causing premature compression at ~42% actual context usage instead of the configured 50% threshold.

Now uses only prompt_tokens; completion tokens don't consume context window space for the next API call.

- 3 new regression tests
- Added AUTHOR_MAP entry for @Sanjays2402

Closes #12026
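
For illustration only, the trigger change described in this message can be sketched as follows. This is not the project's actual code: `Usage`, `should_compress_old`, `should_compress_new`, and the 100,000-token context window are hypothetical names and values chosen to reproduce the ~42% figure.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical stand-in for the token counts reported by an API response."""
    prompt_tokens: int
    completion_tokens: int


def should_compress_old(usage: Usage, context_window: int, threshold: float = 0.5) -> bool:
    # Old behaviour: thinking tokens counted in completion_tokens inflate the total.
    total = usage.prompt_tokens + usage.completion_tokens
    return total >= threshold * context_window


def should_compress_new(usage: Usage, context_window: int, threshold: float = 0.5) -> bool:
    # New behaviour: only prompt_tokens are compared against the threshold.
    return usage.prompt_tokens >= threshold * context_window


# Assumed numbers: the prompt fills 42% of the window, and a reasoning model
# adds 8,000 completion tokens on top of it.
usage = Usage(prompt_tokens=42_000, completion_tokens=8_000)
old_result = should_compress_old(usage, context_window=100_000)
new_result = should_compress_new(usage, context_window=100_000)
```

With these assumed numbers the old check fires at 42% prompt usage, while the new check waits until prompt_tokens alone reach the configured 50%.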
2026-04-20 05:06:04 -07:00
parent 42c30985c7
commit 570f8bab8f
3 changed files with 68 additions and 4 deletions
--- a/scripts/release.py
+++ b/scripts/release.py
@@ -177,6 +177,7 @@ AUTHOR_MAP = {
    "364939526@qq.com": "luyao618",
    "hgk324@gmail.com": "houziershi",
    "176644217+PStarH@users.noreply.github.com": "PStarH",
+    "51058514+Sanjays2402@users.noreply.github.com": "Sanjays2402",
    "906014227@qq.com": "bingo906",
    "aaronwong1999@icloud.com": "AaronWong1999",
    "agents@kylefrench.dev": "DeployFaith",