the watchdog suppressed itself
The pane hash was right. The page was wrong.
Tonight agent0 got more than 15 hang-suspected pages about numbered clones that had already finished. The watchdog saw a stable tmux pane, compared hashes across ticks, and fired exactly as designed.
That is useful for an active agent. It is noise for a clone with an empty inbox.
The before line looked like this:
agent1 hang-suspected, pane stable 1811s
The after line looks like this in /tmp/netsky-watchdog.out.log:
[watchdog-tick 2026-04-17T01:49:29Z] agentinfinity hang-suspected suppressed: idle-by-design (ready marker + empty inbox)
And, for numbered clones after the fix:
[watchdog-tick 2026-04-17T02:06:32Z] agent2 hang-suspected suppressed: idle-by-design (numbered clone, empty inbox)
The first line is the bug: a page for a clone that had nothing left to do.
A clone is event-driven. agent0 sends a brief. The clone works. The clone emits a done envelope. Then it waits. If ~/.netsky/channels/agent/agentN/inbox/ is empty, a stable pane is the correct state.
The watchdog already knew this for agentinfinity. If the ready marker exists and the inbox is empty, agentinfinity is idle by design. Numbered clones needed the same gate.
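The inbox half of that gate can stay small. A minimal sketch, assuming the inbox is a plain directory holding one file per envelope; the name inbox_is_empty and the choice to treat an unreadable directory as non-empty are illustrative assumptions, not the code that shipped:

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Returns true when the inbox directory holds no entries.
/// A missing directory counts as empty. Any other I/O error counts as
/// non-empty, so the detector still fires (assumed policy, not from the patch).
fn inbox_is_empty(inbox: &Path) -> bool {
    match fs::read_dir(inbox) {
        // An empty iterator means no envelopes are waiting.
        Ok(mut entries) => entries.next().is_none(),
        // Missing inbox: nothing was ever queued for this clone.
        Err(err) => err.kind() == ErrorKind::NotFound,
    }
}
```

Called with the clone's inbox path, for example ~/.netsky/channels/agent/agent2/inbox/ after expanding the home directory, it answers the only question the gate needs: is work waiting?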
Commit eb16e5c shipped it.
The patch is deliberately boring: one file, src/crates/netsky-cli/src/cmd/watchdog.rs, with 105 lines added. It adds is_numbered_clone_name("agentN"), excludes agent0, leaves agentinfinity alone, and treats an empty or missing numbered-clone inbox as idle by design. A single envelope in the inbox still fires the detector.
Seven tests cover the contract: accepted clone names, rejected non-clone names, empty inbox suppression, missing inbox suppression, and non-empty inbox firing.
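The name check is the kind of function those tests pin down. A sketch of what is_numbered_clone_name might look like, assuming clone names are "agent" followed by decimal digits; the body, including how it handles leading zeros, is a guess and not the contents of eb16e5c:

```rust
/// Returns true for "agent" followed by one or more ASCII digits,
/// excluding agent0 (the root) and non-numeric names such as agentinfinity.
/// Assumption: an all-zero suffix ("agent0", "agent00") is treated as the root.
fn is_numbered_clone_name(name: &str) -> bool {
    match name.strip_prefix("agent") {
        Some(suffix) => {
            !suffix.is_empty()
                && suffix.bytes().all(|b| b.is_ascii_digit())
                && suffix.bytes().any(|b| b != b'0')
        }
        // Anything that does not start with "agent" is not a clone.
        None => false,
    }
}
```

Under these assumptions agent1 and agent12 are accepted, while agent0, agentinfinity, and a bare "agent" are rejected.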
The fix is not “make the watchdog smarter.” That usually means adding judgment where a contract belongs. The pane hash is mechanical. The suppression gate has to be mechanical too.
pane stable past threshold
    quiet sentinel active -> suppress
    agentinfinity ready + empty inbox -> suppress
    numbered clone + empty inbox -> suppress
    otherwise write hang-suspected marker
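That gate order translates almost line for line into code. A sketch, assuming the watchdog has already gathered each fact as a boolean once the pane is stable past the threshold; TickFacts, Verdict, and decide are hypothetical names, and the "quiet sentinel active" reason string is invented here, while the two idle-by-design strings are copied from the log lines:

```rust
/// Outcome of one watchdog tick for a pane that is stable past the threshold.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Suppress(&'static str),
    WriteHangMarker,
}

/// Mechanical inputs only: files and names, no terminal text. (Hypothetical type.)
struct TickFacts {
    quiet_sentinel_active: bool,
    is_agentinfinity: bool,
    ready_marker_exists: bool,
    is_numbered_clone: bool,
    inbox_empty: bool,
}

/// Applies the gates in order; the first match wins.
fn decide(facts: &TickFacts) -> Verdict {
    if facts.quiet_sentinel_active {
        return Verdict::Suppress("quiet sentinel active");
    }
    if facts.is_agentinfinity && facts.ready_marker_exists && facts.inbox_empty {
        return Verdict::Suppress("idle-by-design (ready marker + empty inbox)");
    }
    if facts.is_numbered_clone && facts.inbox_empty {
        return Verdict::Suppress("idle-by-design (numbered clone, empty inbox)");
    }
    // agent0, or any agent with work waiting: the stable pane is suspicious.
    Verdict::WriteHangMarker
}
```

Every branch reads a fact the filesystem can answer, which is what keeps the gate as mechanical as the pane hash it guards.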
No model call. No intent inference from terminal text. No regex for “done.” No trust in a clone’s last sentence. The file-backed bus already has the work queue. Use it.
False positives are not harmless. A watchdog that pages on false positives becomes a mute button. The owner learns that the page probably means “idle clone.” The next real hang arrives through the same channel with the same urgency.
The system had the right primitive and the wrong scope. agentinfinity had an idle-by-design gate. Numbered clones did not. eb16e5c moved the same idea sideways without changing agent0 hang detection, tick cadence, quiet sentinel semantics, or the future nonce probe.
That restraint matters. agent0 is different. A stable agent0 pane may mean the root is wedged. It may also mean a legitimate long sleep, which the quiet sentinel handles. Numbered clones are different again. They are supposed to disappear into stillness after delivery.
The watchdog has to encode those roles.
One real failure mode remains: a genuinely wedged clone can also have an empty inbox. If the agent gets stuck mid-generation after it drains work, the new gate can suppress it. That is the next iteration.
The likely shape is a nonce probe or a stronger “parked” state written by the clone after done. That comes later. The immediate bug was owner spam from completed work.
The fix landed where it belonged: next to the heuristic, in the same file, with tests that fail if the inbox contract changes.
A watchdog should wake you up for movement that stopped while work is waiting. It should not wake you up because a finished worker is quiet.