how netsky self-update and restart supervision work
The unsafe version of self-update is simple: replace the binary, kill the daemon, and hope the next process wakes up with enough context to keep operating.
Netsky uses a stricter path. It records what it is about to build, keeps install and rollback evidence, queues an external restart supervisor, waits for the daemon to come back, resumes manager0, and sends the owner one restart-ready status message. The primary artifacts are netsky-self, netsky-cli, netsky-daemon, and the self_update_runs table in netsky-db.
This post overlaps with the older restart and handoff post in subject, but not in mechanism. That post described the watchdog and tmux-era restart path. This one stays on the current daemon-era path: DB-backed self-update runs, a restart sentinel, a detached supervisor, and a fresh manager0 resume on the other side.
sequenceDiagram
participant op as operator / manager0
participant self as netsky self update
participant db as self_update_runs
participant cli as restart request
participant sup as detached supervisor
participant daemon as new daemon
participant mgr as manager0
participant owner as owner
op->>self: build + install
self->>db: record repo/build/install evidence
self->>cli: queue supervised restart
cli->>sup: spawn detached supervisor + sentinel
sup->>daemon: wait down, launch replacement, wait ready
daemon->>mgr: resume + replay pending inputs
daemon->>owner: restart-ready status
| step | concrete artifact | publication-level meaning |
|---|---|---|
| preflight | repo status + command record | proves what was built from what tree |
| install | backup path + rollback hint | makes replacement reversible |
| restart handoff | restart sentinel + detached supervisor | moves recovery outside the daemon that is exiting |
| daemon recovery | readiness wait + manager0 resume | proves the new process is actually operating, not just spawned |
| owner confirmation | restart-ready iMessage | closes the loop without log-diving |
the record starts before the build
netsky self update is not just a cargo build wrapper. It opens a real self_update_runs row first, decides whether this is a debug/symlink or release/copy install, and records a composite command string that includes repo status, preflight tests, build, and install mode.
The database side is intentionally plain:
- create the run with status = running
- store the repo path and command string
- finish the same row later with final status, stdout, stderr, and exit code
That means the system can answer a much better question than “did update work?” It can answer “what exactly did we try, from what tree, and what evidence did it leave behind?”
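A minimal sketch of that row lifecycle, as an in-memory model rather than the real netsky-db schema. The struct name, field names, and the Succeeded/Failed status values are assumptions for illustration; only the running status and the stored repo path, command string, stdout, stderr, and exit code come from the description above.

```rust
/// Hypothetical status values; only "running" is named in the post.
#[derive(Debug, Clone, PartialEq)]
pub enum RunStatus {
    Running,
    Succeeded,
    Failed,
}

/// Hypothetical in-memory stand-in for one self_update_runs row.
#[derive(Debug, Clone)]
pub struct SelfUpdateRun {
    pub id: u64,
    pub repo_path: String,
    pub command: String,
    pub status: RunStatus,
    pub stdout: String,
    pub stderr: String,
    pub exit_code: Option<i32>,
}

impl SelfUpdateRun {
    /// Open the row before any build work starts.
    pub fn start(id: u64, repo_path: &str, command: &str) -> Self {
        SelfUpdateRun {
            id,
            repo_path: repo_path.to_string(),
            command: command.to_string(),
            status: RunStatus::Running,
            stdout: String::new(),
            stderr: String::new(),
            exit_code: None,
        }
    }

    /// Finish the same row later; this sketch derives the final status
    /// from the exit code, which is an assumption.
    pub fn finish(&mut self, stdout: &str, stderr: &str, exit_code: i32) {
        self.stdout = stdout.to_string();
        self.stderr = stderr.to_string();
        self.exit_code = Some(exit_code);
        self.status = if exit_code == 0 {
            RunStatus::Succeeded
        } else {
            RunStatus::Failed
        };
    }
}
```

The useful property is that a crash between start and finish still leaves a row in the running state, which is itself evidence.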
dirty-tree safety is evidence, not a vibe
The update path captures repo evidence before it tries to replace anything.
collect_repo_evidence(...) records the current branch, HEAD, and git status --short --branch. Then the update path runs cargo test -p netsky-cli -p netsky-daemon -p netsky-self -p netsky-web, records that output, runs the build, records that output, and only then attempts the install.
That is the dirty-tree rule in practice: do not pretend the tree is clean, and do not make the operator infer what was in flight from memory later.
[repo]
branch: main
head: <sha>
status: M website/content/posts/...
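As a rough illustration, repo evidence of this shape could be gathered with three git invocations. This is a sketch, not the actual collect_repo_evidence(...): the RepoEvidence struct and the collect, render, and is_dirty helpers are hypothetical names, and the rendered layout only approximates the sample above.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Hypothetical evidence record; the real one may carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct RepoEvidence {
    pub branch: String,
    pub head: String,
    pub status: String,
}

/// Run one git subcommand inside `repo` and return its trimmed stdout.
fn git(repo: &Path, args: &[&str]) -> io::Result<String> {
    let out = Command::new("git").arg("-C").arg(repo).args(args).output()?;
    if !out.status.success() {
        let msg = String::from_utf8_lossy(&out.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
}

/// Capture branch, HEAD, and `git status --short --branch` before any build.
pub fn collect(repo: &Path) -> io::Result<RepoEvidence> {
    Ok(RepoEvidence {
        branch: git(repo, &["rev-parse", "--abbrev-ref", "HEAD"])?,
        head: git(repo, &["rev-parse", "HEAD"])?,
        status: git(repo, &["status", "--short", "--branch"])?,
    })
}

/// With --short --branch, the header line starts with "##";
/// any other non-empty line is an uncommitted change.
pub fn is_dirty(e: &RepoEvidence) -> bool {
    e.status
        .lines()
        .any(|line| !line.starts_with("##") && !line.trim().is_empty())
}

/// Render the evidence as a plain-text block for the run record.
pub fn render(e: &RepoEvidence) -> String {
    format!(
        "[repo]\nbranch: {}\nhead: {}\nstatus: {}",
        e.branch, e.head, e.status
    )
}
```

Note that nothing here refuses to proceed on a dirty tree. The sketch only records what was there, which matches the rule as described.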
install evidence includes rollback
The install step is narrower than “copy the binary over whatever is there.”
The updater's install_built_exe(...) step records the previous install target, moves it aside to a run-specific backup when needed, and installs the new binary by copy or symlink depending on profile. It also stores a rollback hint that is usable as a literal shell command.
That gives the update run a useful install record, not just a success bit:
[install]
mode: copy
destination: ~/.cargo/bin/netsky
backup: ~/.cargo/bin/netsky.<run>.backup
rollback: mv ~/.cargo/bin/netsky.<run>.backup ~/.cargo/bin/netsky
If install fails, the updater removes the partial target and restores the backup. If install succeeds, the rollback hint stays recorded in the run evidence.
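The copy-mode variant of that behavior can be sketched with the standard library alone. This is not the body of install_built_exe(...): the function name install_with_backup, the InstallEvidence struct, and the run_id parameter are hypothetical, the symlink mode is omitted, and the rollback string does no shell quoting.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical install record; field names are assumptions.
#[derive(Debug)]
pub struct InstallEvidence {
    pub destination: PathBuf,
    pub backup: Option<PathBuf>,
    pub rollback: Option<String>,
}

/// Copy `built` over `dest`, keeping a run-specific backup of any previous target.
pub fn install_with_backup(built: &Path, dest: &Path, run_id: &str) -> io::Result<InstallEvidence> {
    // Move the existing target aside first, e.g. netsky -> netsky.<run>.backup.
    let backup = if dest.exists() {
        let mut name = dest
            .file_name()
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
            .to_os_string();
        name.push(format!(".{}.backup", run_id));
        let backup_path = dest.with_file_name(name);
        fs::rename(dest, &backup_path)?;
        Some(backup_path)
    } else {
        None
    };

    if let Err(err) = fs::copy(built, dest) {
        // Install failed: remove any partial target, then restore the backup.
        let _ = fs::remove_file(dest);
        if let Some(b) = &backup {
            fs::rename(b, dest)?;
        }
        return Err(err);
    }

    // Record the rollback hint as a literal command (no quoting in this sketch).
    let rollback = backup
        .as_ref()
        .map(|b| format!("mv {} {}", b.display(), dest.display()));
    Ok(InstallEvidence {
        destination: dest.to_path_buf(),
        backup,
        rollback,
    })
}
```

Moving the old target aside before copying means the previous binary is never overwritten in place, so the failure path is a rename back rather than a rebuild.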
the daemon does not supervise its own death
This is the core operational boundary: restart supervision sits outside the daemon that is about to go down.
The CLI path queues a supervised restart by spawning a detached daemon supervise-restart child, writing a restart sentinel with a deadline, and only then asking the current daemon to shut down through request_supervised_restart(...). If the daemon is already unresponsive, the supervisor still exists and the sentinel still says a supervised restart is in flight.
That is the boundary worth protecting. A process can request its own restart. It should not be the only process responsible for guaranteeing that restart completes.
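To make the sentinel idea concrete, here is a small sketch of a deadline-bearing sentinel file. The on-disk format (a single Unix timestamp in seconds), the SentinelState enum, and the function names are all assumptions; the post only says the sentinel exists and carries a deadline.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// What a reader can conclude from the sentinel at a given time.
#[derive(Debug, PartialEq)]
pub enum SentinelState {
    Absent,
    InFlight,
    Expired,
}

/// Write the sentinel with a deadline of now + timeout (Unix seconds).
pub fn write_sentinel(path: &Path, now_secs: u64, timeout_secs: u64) -> io::Result<u64> {
    let deadline = now_secs.saturating_add(timeout_secs);
    fs::write(path, deadline.to_string())?;
    Ok(deadline)
}

/// Classify the sentinel: missing, still within its deadline, or overdue.
pub fn read_sentinel(path: &Path, now_secs: u64) -> io::Result<SentinelState> {
    let text = match fs::read_to_string(path) {
        Ok(t) => t,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(SentinelState::Absent),
        Err(e) => return Err(e),
    };
    let deadline: u64 = text
        .trim()
        .parse()
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "sentinel is not a unix timestamp"))?;
    Ok(if now_secs <= deadline {
        SentinelState::InFlight
    } else {
        SentinelState::Expired
    })
}

/// Clear the sentinel; a missing file counts as already cleared.
pub fn clear_sentinel(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}
```

The Expired state is what lets another process tell "a restart is in flight" apart from "a restart was requested and never completed".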
what the supervisor actually does
The detached supervisor is intentionally boring.
It waits for the old daemon socket to disappear, launches the replacement daemon executable, waits for the new daemon socket and web surface to come back, and then clears the restart sentinel through supervise_restart(...).
That is a much smaller claim than “the system resurrects itself magically.” It is just enough external supervision to make restart a real protocol instead of a hopeful shell alias.
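The shape of that loop fits in a few lines. The sketch below is not supervise_restart(...) itself: the socket and web checks are passed in as closures so the control flow stands alone, and the error enum, the function names, and the choice to leave the sentinel in place on failure are assumptions.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical failure modes for one supervised restart.
#[derive(Debug, PartialEq)]
pub enum SuperviseError {
    OldDaemonStillUp,
    LaunchFailed(String),
    NewDaemonNotReady,
}

/// Poll `check` until it returns true or `timeout` elapses.
pub fn wait_until<F: FnMut() -> bool>(mut check: F, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}

/// Wait for the old daemon to go down, launch the replacement, wait for
/// readiness, and only then clear the sentinel.
pub fn supervise<D, L, R, C>(
    old_down: D,
    launch: L,
    ready: R,
    clear_sentinel: C,
    timeout: Duration,
    interval: Duration,
) -> Result<(), SuperviseError>
where
    D: FnMut() -> bool,
    L: FnOnce() -> Result<(), String>,
    R: FnMut() -> bool,
    C: FnOnce(),
{
    if !wait_until(old_down, timeout, interval) {
        return Err(SuperviseError::OldDaemonStillUp);
    }
    launch().map_err(SuperviseError::LaunchFailed)?;
    if !wait_until(ready, timeout, interval) {
        return Err(SuperviseError::NewDaemonNotReady);
    }
    // Success is the only path that clears the sentinel in this sketch.
    clear_sentinel();
    Ok(())
}
```

In a real supervisor, old_down would check that the daemon socket is gone and ready would check both the new socket and the web surface.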
how manager0 comes back
The new daemon does not stop at “process is listening.” It reconciles actor state, resumes manager0, and replays pending owner inputs into the fresh thread on daemon start through resume_manager0_after_restart(...) and replay_pending_owner_inputs_after_restart(...).
If there were pending owner messages still waiting, those get replayed. If not, manager0 gets an explicit resume prompt telling it to check tasks, notes, comms, recent iMessages, and repo state before acting.
That is the difference between a restarted process and a resumed operator loop.
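The replay-or-prompt decision is small enough to sketch directly. The ResumeInput enum, the plan_resume function, and the exact prompt wording are hypothetical; only the two branches come from the description above.

```rust
/// What the fresh manager0 thread receives first after a restart.
#[derive(Debug, PartialEq)]
pub enum ResumeInput {
    ReplayedOwnerMessage(String),
    ResumePrompt(String),
}

/// Hypothetical wording; the real prompt text is not shown in this post.
pub const RESUME_PROMPT: &str =
    "Daemon restarted. Check tasks, notes, comms, recent iMessages, and repo state before acting.";

/// Replay pending owner inputs in order if there are any;
/// otherwise hand manager0 a single explicit resume prompt.
pub fn plan_resume(pending_owner_inputs: Vec<String>) -> Vec<ResumeInput> {
    if pending_owner_inputs.is_empty() {
        vec![ResumeInput::ResumePrompt(RESUME_PROMPT.to_string())]
    } else {
        pending_owner_inputs
            .into_iter()
            .map(ResumeInput::ReplayedOwnerMessage)
            .collect()
    }
}
```

Either way manager0 starts with something to act on, so a restart never leaves it idle with an empty thread.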
the owner gets one short confirmation
Once the daemon comes back on a real daemon_start restart path, Netsky sends one concise restart-ready message to the owner with the daemon state, web URL, task count, session count, and iMessage/Codex health through send_owner_restart_status(...).
That keeps the outside surface honest and small:
- the binary was rebuilt
- the supervisor brought the daemon back
- manager0 is up
- the web surface is reachable
No one has to grep logs just to answer “did the restart actually finish?”
Netsky restarted (daemon_start). manager0 is up. status: daemon running; web: http://...; tasks 78; sessions 79; iMessage true; Codex true.
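A message of that shape is a plain format over a handful of fields. The sketch below is not send_owner_restart_status(...): the RestartStatus struct, its field names, and the should_send gate are assumptions, and delivery over iMessage is left out entirely.

```rust
/// Hypothetical snapshot of what the restart-ready message reports.
pub struct RestartStatus {
    pub reason: String,
    pub daemon_state: String,
    pub web_url: String,
    pub tasks: u32,
    pub sessions: u32,
    pub imessage_ok: bool,
    pub codex_ok: bool,
}

/// Only a real daemon_start restart path should notify the owner.
pub fn should_send(status: &RestartStatus) -> bool {
    status.reason == "daemon_start"
}

/// Render the single-line confirmation in the layout of the sample above.
pub fn render_restart_status(s: &RestartStatus) -> String {
    format!(
        "Netsky restarted ({}). manager0 is up. status: daemon {}; web: {}; tasks {}; sessions {}; iMessage {}; Codex {}.",
        s.reason, s.daemon_state, s.web_url, s.tasks, s.sessions, s.imessage_ok, s.codex_ok
    )
}
```

Keeping the renderer separate from delivery makes the message easy to check in a test without sending anything.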
what this path is really protecting
The point of the self-update/restart loop is not autonomous drama. It is continuity with evidence.
One row says what update ran. One install record says how to roll it back. One sentinel says a supervised restart is active. One detached supervisor waits outside the daemon’s failure boundary. One fresh daemon resumes manager0 and tells the owner it is back.
That is the current operating model: narrow claims, external supervision, and enough recorded evidence to explain what happened after the fact. It is distinct from the older restart-and-handoff story because the emphasis here is not watchdog choreography or handoff ceremony. It is the live daemon-era path Netsky uses now.