perf(tui): instrument stdout drain — rule out terminal parse bottleneck

Adds four fields to FrameEvent.phases and the matching profile
summary:

  optimizedPatches   post-optimize patch count (what's actually
                     written to stdout; the .patches field is
                     pre-optimize)
  writeBytes         UTF-8 byte count of the write this frame
  backpressure       true when Node's stdout.write returned false
                     (Writable buffer full: outer terminal can't
                     keep up)
  prevFrameDrainMs   end-to-end drain time of the PREVIOUS frame's
                     write, captured from stdout.write's 2-arg
                     callback. Reported on the next frame so the
                     measurement reflects "time until OS flushed
                     the bytes to the terminal fd", not "time until
                     queued in Node".

writeDiffToTerminal() now returns { bytes, backpressure } and
accepts an optional onDrain callback. The callback is only attached
when stdout is a TTY and the frame has a diff to write; piped/non-TTY
stdout bypasses flow control, so the callback would fire
synchronously anyway.
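
The one-frame-late reporting of prevFrameDrainMs can be modelled in a few lines. This is a hypothetical Python sketch of the bookkeeping only, not the Node implementation: DrainTracker, begin_write and take_prev_frame_drain_ms are names invented here, and the clock is injected so the example is deterministic.

```python
from typing import Callable, Optional


class DrainTracker:
    """Hypothetical model: a write's drain time is reported by the next frame."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock                      # returns milliseconds
        self._pending: Optional[float] = None    # drain time awaiting report

    def begin_write(self) -> Callable[[], None]:
        """Call when a frame issues its write; returns the on-drain callback."""
        started = self._clock()

        def on_drain() -> None:
            # Fires once the bytes have been flushed, not merely queued.
            self._pending = self._clock() - started

        return on_drain

    def take_prev_frame_drain_ms(self) -> Optional[float]:
        """Return the previous write's drain time once, then clear it."""
        value, self._pending = self._pending, None
        return value


# Usage with a fake millisecond clock.
now = [0.0]
tracker = DrainTracker(lambda: now[0])
on_drain = tracker.begin_write()                # frame N writes at t=0
first = tracker.take_prev_frame_drain_ms()      # nothing flushed yet
now[0] = 0.5
on_drain()                                      # flush completes 0.5 ms later
reported = tracker.take_prev_frame_drain_ms()   # frame N+1 picks it up
```

Clearing the value after it is read keeps a single slow flush from being counted on more than one frame.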

Initial measurements under hold-wheel_up against 1106-msg session
(30Hz for 6s):

  patches total     28,888
  optimized total   16,700   (ratio 0.58; optimizer cuts ~42%)
  writeBytes        42 KB / 10s = 4.2 KB/s throughput
  drainMs p50       0.14 ms  (terminal accepts bytes instantly)
  drainMs p99       0.85 ms
  backpressure      0% of frames
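
The derived figures in the table follow directly from the raw totals. A quick check, using the 10 s window given on the writeBytes line (variable names are illustrative):

```python
patches_total = 28_888
optimized_total = 16_700

# Fraction of patches that survive the optimizer, and the share it removes.
ratio = optimized_total / patches_total
ratio_text = f"{ratio:.2f}"
cut_text = f"{1 - ratio:.0%}"

# Average write throughput over the window stated on the writeBytes line.
write_kb = 42
window_s = 10
throughput_text = f"{write_kb / window_s:.1f} KB/s"

print(ratio_text, cut_text, throughput_text)
```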

This rules out the terminal-parse hypothesis: Cursor's xterm.js
drains our output in sub-millisecond time at only 4 KB/s. The
remaining lag has to be in the render pipeline, not the wire.

Profile output now includes the bytes+drain+backpressure lines to
keep this visible on every subsequent iteration.
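
For reference, the drain and backpressure lines can be reproduced outside the profile script with a small stand-alone sketch. The pct() below is an assumed nearest-rank percentile, since the script's own helper is not part of this diff; drain_summary and the sample frames are likewise illustrative.

```python
from typing import Any


def pct(values: list[float], q: float) -> float:
    """Nearest-rank percentile; a stand-in for the script's own pct()."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


def drain_summary(frames: list[dict[str, Any]]) -> str:
    """Summarise drain latency and backpressure across recorded frames."""
    drains = [
        f["phases"].get("prevFrameDrainMs", 0)
        for f in frames if f.get("phases")
    ]
    # Zero means "no drain sample this frame", so it is left out.
    nonzero = [d for d in drains if d > 0]
    if not nonzero:
        return "drainMs: no samples"
    # `or {}` also covers frames whose "phases" key is present but None.
    stalled = sum(
        1 for f in frames if (f.get("phases") or {}).get("backpressure")
    )
    return (
        f"drainMs p50={pct(nonzero, 0.5):.2f} p99={pct(nonzero, 0.99):.2f} "
        f"max={max(nonzero):.2f} backpressure={stalled}/{len(frames)}"
    )


# Made-up sample frames in the shape the profile script reads.
frames = [
    {"phases": {"prevFrameDrainMs": 0}},
    {"phases": {"prevFrameDrainMs": 0.10}},
    {"phases": {"prevFrameDrainMs": 0.14, "backpressure": False}},
    {"phases": {"prevFrameDrainMs": 0.85, "backpressure": True}},
    {"phases": None},
]
summary = drain_summary(frames)
print(summary)
```

Excluding zero samples mirrors the diff's `nonzero` filter, so the first frame (which has no previous write to measure) does not pull the percentiles down.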
@@ -219,6 +219,45 @@ def format_report(data: dict[str, Any]) -> str:
         f" patches p50={pct(patches,0.5):.0f} p99={pct(patches,0.99):.0f} "
         f"max={max(patches)} total={sum(patches)}"
     )
+    optimized = [
+        f["phases"].get("optimizedPatches", 0)
+        for f in frames if f.get("phases")
+    ]
+    if any(optimized):
+        out.append(
+            f" optimized p50={pct(optimized,0.5):.0f} p99={pct(optimized,0.99):.0f} "
+            f"max={max(optimized)} total={sum(optimized)}"
+            f" (ratio: {sum(optimized)/max(1,sum(patches)):.2f})"
+        )
+
+    # Write bytes + drain telemetry — the outer-terminal bottleneck gauge.
+    bytes_written = [
+        f["phases"].get("writeBytes", 0)
+        for f in frames if f.get("phases")
+    ]
+    if any(bytes_written):
+        total_b = sum(bytes_written)
+        kb = total_b / 1024
+        out.append(
+            f" writeBytes p50={pct(bytes_written,0.5):.0f}B p99={pct(bytes_written,0.99):.0f}B "
+            f"max={max(bytes_written)}B total={kb:.1f}KB"
+        )
+    drains = [
+        f["phases"].get("prevFrameDrainMs", 0)
+        for f in frames if f.get("phases")
+    ]
+    if any(d > 0 for d in drains):
+        nonzero = [d for d in drains if d > 0]
+        out.append(
+            f" drainMs p50={pct(nonzero,0.5):.2f} p95={pct(nonzero,0.95):.2f} "
+            f"p99={pct(nonzero,0.99):.2f} max={max(nonzero):.2f} (terminal flush latency)"
+        )
+    backpressure = sum(1 for f in frames if f.get("phases", {}).get("backpressure"))
+    if backpressure:
+        out.append(
+            f" backpressure: {backpressure}/{len(frames)} frames "
+            f"({100*backpressure/len(frames):.0f}%) (Node stdout buffer full — terminal slow)"
+        )
 
     # Flickers
     flicker_frames = [f for f in frames if f.get("flickers")]