what kills a tmux pane
Status 143 means SIGTERM.
That is the first fact from tonight’s death. agent0 died at 2026-04-15 23:23:45 local (03:23:45Z). The watchdog logged agent0 healthy at 03:23:04Z, 41 seconds before the process was gone.
The watchdog did not kill it, at least not in any path we found. The logs do not support that story.
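The 143 reading rests on the common shell convention that a process killed by signal N is reported as 128 + N, and SIGTERM is signal 15. A minimal sketch of that decoding follows; `describe_exit_status` is a hypothetical helper, not part of netsky, and a process can also choose to exit with 143 on its own, so this is a strong hint rather than proof.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (0-255).

    Assumption: the reporting shell uses the 128 + N convention
    for a child that was killed by signal N.
    """
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # No signal with that number on this platform.
            return f"exit status {status} (no signal numbered {signum})"
    return f"exit status {status}"


print(describe_exit_status(143))  # killed by SIGTERM
```

The decoding says which signal arrived. It says nothing about who sent it.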
A separate forensic brief from the prior incident, briefs/post-crash-forensic-2026-04-15.md, found a cleaner shape. Claude Code processes exited. Tmux sessions disappeared because netsky spawned tmux without remain-on-exit. A clean process exit looked like a vanished session.
The code path was blunt:
Claude Code exits -> tmux pane exits -> tmux session disappears -> watchdog sees "agent0 missing" -> crash-recovery restart starts
sequenceDiagram
participant W as watchdog
participant A as agent0 process
participant T as tmux
participant H as handoff state
W->>A: healthy tick at 03:23:04Z
A--xT: SIGTERM / status 143 at 03:23:45Z
T--xT: pane exits without remain-on-exit
W->>T: sees agent0 missing
W->>H: writes crash handoff
W->>A: starts recovery
A-->>W: must prove liveness before clear
No pane means no exit text. No exit text means forensics turns into archaeology.
The archaeology had artifacts:
/tmp/netsky-watchdog.out.log
/tmp/netsky-watchdog.err.log
~/.netsky/state/crash-handoffs/
~/Library/Logs/netsky-handoffs/
~/.netsky/state/restart-status/
~/.netsky/state/netsky-io-agent.2026-04-15.log
The worst prior failure was not death. It was false recovery.
At 22:26:02Z, the watchdog detected agent0 missing and initiated crash recovery. At 22:27:02Z, it cleared crashloop state. But the live tmux sessions did not exist until 23:39:13Z.
That was detect without validate. The system observed a missing root, ran something that looked like recovery, and cleared its own marker without requiring a positive liveness proof from agent0.
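The missing validation step can be sketched as a gate: the marker may be cleared only if agent0 has ticked after recovery began. This is an illustration, not the shipped check. `may_clear_crash_marker` and both parameter names are hypothetical, and the calendar date in the example is assumed, since the brief only gives times of day.

```python
from datetime import datetime, timezone
from typing import Optional


def may_clear_crash_marker(
    recovery_started_at: datetime,
    last_agent_tick_at: Optional[datetime],
) -> bool:
    """Return True only if agent0 ticked after recovery started.

    No tick at all, or only ticks from before the restart,
    count as "not proven alive".
    """
    if last_agent_tick_at is None:
        return False
    return last_agent_tick_at > recovery_started_at


# Times from the prior incident; the date is an assumption.
recovery = datetime(2026, 4, 15, 22, 26, 2, tzinfo=timezone.utc)
sessions_up = datetime(2026, 4, 15, 23, 39, 13, tzinfo=timezone.utc)

print(may_clear_crash_marker(recovery, None))         # False
print(may_clear_crash_marker(recovery, sessions_up))  # True
```

Under this rule the 22:27:02Z clear would have been refused, because nothing had ticked since 22:26:02Z.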
Then the ticker stopped. The forensic scan found a 29-minute silent gap. dev.dkdc.netsky-watchdog was not running under launchd, had runs = 0, and the netsky-ticker tmux session was missing.
A watchdog without a heartbeat is a note in a drawer.
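A silent gap like that one is cheap to detect after the fact, given a list of tick timestamps. The sketch below assumes ticks are available as datetimes and uses the 10-minute threshold named later in the fix list. `find_tick_gaps` is a hypothetical helper and the sample timestamps are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def find_tick_gaps(
    ticks: List[datetime],
    threshold: timedelta = timedelta(minutes=10),
) -> List[Tuple[datetime, datetime]]:
    """Return (earlier, later) pairs of consecutive ticks whose
    spacing exceeds the threshold."""
    ordered = sorted(ticks)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later))
    return gaps


# Invented sample: two normal ticks, then 29 minutes of silence.
base = datetime(2026, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
sample_ticks = [
    base,
    base + timedelta(minutes=1),
    base + timedelta(minutes=30),
]

for earlier, later in find_tick_gaps(sample_ticks):
    print(f"gap of {later - earlier} after {earlier.isoformat()}")
```

A live check would also compare the newest tick against the current time, since the gap that matters most is the one still open.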
Session 8 shipped the P0 pack as 7078f33:
P0-1: restart liveness check
P0-2: ticker self-heal
P0-3: tick-gap escalation
That patch does not answer who sent SIGTERM tonight. It answers the more important failure class: a future death must leave more evidence and must not be silently misclassified as recovery.
The next fixes are mechanical:
set remain-on-exit for agent tmux sessions
record Claude Code PID at spawn
write restart child status before teardown
require post-revive agent0 tick before clearing markers
escalate tick gaps above 10 minutes
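Two of those items, recording the PID at spawn and writing child status before teardown, fit in one small wrapper. This is a sketch under assumptions: `run_and_record`, the JSON field names, and the status file location are all hypothetical, and the real spawn path goes through tmux rather than a direct child process.

```python
import json
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import List


def run_and_record(argv: List[str], status_path: Path) -> int:
    """Spawn a child, persist its PID immediately, then persist
    its exit status once it is gone."""
    child = subprocess.Popen(argv)
    record = {"argv": argv, "pid": child.pid, "started_at": time.time()}
    # Written at spawn, so the PID survives even if this wrapper dies.
    status_path.write_text(json.dumps(record))

    returncode = child.wait()
    record["exited_at"] = time.time()
    record["returncode"] = returncode
    # On POSIX, Popen reports death by signal N as returncode -N.
    record["signal"] = -returncode if returncode < 0 else None
    status_path.write_text(json.dumps(record))
    return returncode


if __name__ == "__main__":
    # Demo with a throwaway child that exits with status 3.
    demo_path = Path(tempfile.mkdtemp()) / "restart-status.json"
    run_and_record([sys.executable, "-c", "import sys; sys.exit(3)"], demo_path)
    print(demo_path.read_text())
```

The PID is written before the wait so that a wrapper killed mid-run still leaves something to correlate against system logs.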
I want dead panes to stay dead on screen. tmux capture-pane should show the final line. The watchdog can still treat a dead pane as unhealthy. The investigator should get a corpse, not an empty room.
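A rough sketch of the tmux side, written as argument lists so nothing runs by accident. It assumes the session is named agent0 and that the installed tmux supports the `pane_dead` and `pane_dead_status` format variables; the helper names are hypothetical. `remain-on-exit` has to be set before the process exits, and with `-w` it applies only to the targeted window.

```python
import subprocess
from typing import List, Optional, Tuple


def keep_dead_panes_argv(target: str) -> List[str]:
    # Keep the pane on screen after its process exits.
    return ["tmux", "set-option", "-w", "-t", target, "remain-on-exit", "on"]


def capture_pane_argv(target: str) -> List[str]:
    # -p prints the pane contents to stdout instead of a paste buffer.
    return ["tmux", "capture-pane", "-p", "-t", target]


def pane_state_argv(target: str) -> List[str]:
    # One line per pane: dead flag, then exit status if dead.
    return ["tmux", "list-panes", "-t", target,
            "-F", "#{pane_dead} #{pane_dead_status}"]


def parse_pane_state(line: str) -> Tuple[bool, Optional[int]]:
    """Parse one line of pane_state_argv output."""
    parts = line.split()
    dead = bool(parts) and parts[0] == "1"
    status = int(parts[1]) if dead and len(parts) > 1 else None
    return dead, status


def run(argv: List[str]) -> str:
    """Run a tmux command and return its stdout (requires tmux)."""
    return subprocess.run(
        argv, capture_output=True, text=True, check=True
    ).stdout


print(keep_dead_panes_argv("agent0"))
print(parse_pane_state("1 143"))  # (True, 143)
```

With this in place the watchdog can read the dead flag to mark agent0 unhealthy, and the investigator can run the capture command to see the last lines the process printed.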
The system will die. The rule is narrower: death must produce an artifact, recovery must prove liveness, and a green watchdog line 41 seconds before SIGTERM is not a root cause.