After a clean SIGUSR1 drain, cmd_update passively polled for systemd's
auto-restart to fire. Our unit file sets RestartSec=60 (a crash-loop
guard), so the voluntary-restart path waited a full minute of dead air
before the gateway came back — the user saw 'draining (up to 75s)...'
and stared at it.
Change: after the drain exits with code 75, call 'reset-failed' +
'start' explicitly. Manual start bypasses RestartSec entirely
(RestartSec only governs systemd's own auto-restart logic). Takes
about as long as the gateway needs to come up (~1-3s on a warm box)
instead of ~60s.
The RestartSec=60 default stays — it's the right crash-loop guard for
actual crashes. This only short-circuits the voluntary-restart path.
Matches the pattern already used in 'hermes gateway restart'
(systemd_restart() in hermes_cli/gateway.py, PR #20949).
Tests:
- tests/hermes_cli/test_update_gateway_restart.py: new
test_update_bypasses_restartsec_after_graceful_drain asserts both
'reset-failed hermes-gateway' AND 'start hermes-gateway' (NOT
'restart') are issued after a successful graceful drain.
- All existing tests in the affected classes still pass
(TestCmdUpdateLaunchdRestart, TestCmdUpdateResetFailedBeforeRestart
are green; one pre-existing flake in the latter is unrelated).