netsky system overview, 2026-04-21
Netsky is a VSM-inspired AI orchestration system: one root agent named agent0, zero or more bounded clone agents named agent1..agentN, one watchdog named agentinfinity, one heartbeat session named netsky-ticker, and one Rust binary named netsky that owns startup, channels, restart, health, repair, and observability. The word viable is load-bearing. The system is designed to keep serving, keep repairing, and keep enough durable evidence to explain failures after live sessions die (src/crates/netsky-prompts/prompts/base.md:3-18, src/crates/netsky-prompts/prompts/base.md:20-28, README.md:1-21).
What netsky is
Netsky is a terminal-native viable system for AI work. It maps Stafford Beer style control layers onto a local constellation: S5 policy lives in prompts, S4 intelligence lives in notes and audits, S3 control lives in agent0, S3* audit lives in review passes, S2 coordination lives in isolated workspaces, S1 operations live in clones, and agentinfinity keeps the root alive (src/crates/netsky-prompts/prompts/base.md:20-28). It ships as one binary installed by cargo install netsky, and the repo treats ./bin/check as the landing gate (README.md:7-21, src/crates/netsky-prompts/prompts/base.md:107-113).
The shape

                            owner
              +---------------+----------------+
              |               |                |
          iMessage          email             iroh
              |               |                |
              v               v                v
      +------------------------------------------------+
      | ~/.netsky/channels                             |
      | agent bus: agent0, agent1..N, agentinfinity    |
      +-------------------+----------------------------+
                          |
                          v
                  +---------------+
                  | tmux: agent0  |
                  | root control  |
                  +-------+-------+
                          |
                  briefs  |  review / replies
            +-------------+----------------+
            |                              |
            v                              v
    +---------------+              +---------------+
    | tmux: agent1  |     ...      | tmux: agentN  |
    | fresh clone   |              | fresh clone   |
    +-------+-------+              +-------+-------+
            |                              |
            v                              v
    workspaces/<task>/repo      workspaces/<task>/repo

    +-------------------+      +-------------------------+
    | tmux: agentinf.   |<-----| netsky-ticker, 60s      |
    | watchdog + repair |      | watchdog/cron/loop tick |
    +---------+---------+      +------------+------------+
              ^                             ^
              |                             |
              +-------------+---------------+
                            |
                   launchd plist, 120s

    durable stores:
      ~/.netsky/meta.db
      ~/.netsky/state/*
      ~/.netsky/logs/watchdog-events-YYYY-MM-DD.jsonl
agent0 is the root orchestrator and the only agent that edits the in-tree checkout. Clones do bounded work in fresh workspaces/<task>/repo clones, and agentinfinity owns watchdog and repair duties (src/crates/netsky-prompts/prompts/base.md:30-40). Each agent is a tmux session, and AGENT_N selects identity at spawn time (src/crates/netsky-prompts/prompts/base.md:35-38, src/crates/netsky-core/src/consts.rs:7-16).
The spawn path is one code path for every role. It writes prompts under ~/.netsky/state/prompts, writes per-agent MCP config, builds a runtime command, and asks tmux to create a detached session with AGENT_N and NETSKY_PROMPT_FILE (src/crates/netsky-core/src/spawn.rs:1-18, src/crates/netsky-core/src/spawn.rs:92-143). netsky up spawns agent0, N clones, and the watchdog, while netsky agentinfinity persists the watchdog runtime and clears the readiness marker before a fresh spawn (src/crates/netsky-cli/src/cmd/up.rs:1-7, src/crates/netsky-cli/src/cmd/up.rs:23-57, src/crates/netsky-cli/src/cmd/agentinfinity.rs:16-45, src/crates/netsky-cli/src/cmd/agentinfinity.rs:66-78).
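As a rough illustration of the single spawn path described above, the sketch below builds the argument vector for a detached tmux session that carries AGENT_N and NETSKY_PROMPT_FILE. It is written in Python for brevity and is not the Rust implementation in spawn.rs. The function name, the session naming rule, and the example runtime command are assumptions. The `-e` flag on `new-session` needs tmux 3.2 or later.

```python
import shlex
from typing import List


def build_spawn_argv(agent_n: str, prompt_file: str, runtime_cmd: List[str]) -> List[str]:
    """Build a tmux argv that would start one agent in a detached session."""
    session = f"agent{agent_n}"  # assumed naming: agent0, agent4, agentinfinity
    return [
        "tmux", "new-session", "-d",       # -d: create detached
        "-s", session,                      # -s: session name
        "-e", f"AGENT_N={agent_n}",         # identity selected at spawn time
        "-e", f"NETSKY_PROMPT_FILE={prompt_file}",
        shlex.join(runtime_cmd),            # tmux takes the command as one shell string
    ]
```

Because every role goes through the same builder, the only per-role inputs are the agent id, the prompt file, and the runtime command.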
The ticker is the hot heartbeat. It runs every 60 seconds by default and drives watchdog, cron, and loop ticks (src/crates/netsky-core/src/consts.rs:263-279, website/content/posts/restart-and-handoff.md:141-146). Launchd is the colder failsafe. Its plist points to ~/.netsky/bin/netsky-watchdog-shim, not directly to the Rust binary, so a broken live binary can be replaced or bypassed (README.md:34-53, src/crates/netsky-cli/src/cmd/launchd.rs:1-7, src/crates/netsky-cli/src/cmd/launchd.rs:22-40, src/crates/netsky-cli/src/cmd/launchd.rs:280-343).
Durable state is explicit. ~/.netsky/state/ holds readiness, restart, hang, crashloop, ticker, loop, and repair markers (src/crates/netsky-prompts/prompts/base.md:83-88, src/crates/netsky-core/src/consts.rs:301-403). ~/.netsky/meta.db records the analytical trail (src/crates/netsky-prompts/prompts/base.md:90-96, src/crates/netsky-db/README.md:1-10).
How it communicates
Netsky treats every inbound message as an envelope. The agent bus is a file-backed channel rooted at ~/.netsky/channels/agent/agent<N>/inbox, and outbound agent messages use netsky channel send <agent> "<text>" --from <agent> (src/crates/netsky-prompts/prompts/base.md:42-49, src/crates/netsky-cli/src/cmd/channel.rs:1-18). The active channel root prefers ~/.netsky/channels, falls back to the legacy ~/.claude/channels, and creates a compatibility symlink during the transition (src/crates/netsky-core/src/paths.rs:127-194).
    producer
       |
       | write JSON envelope
       v
    agent<N>/inbox/
       |
       | atomic rename
       v
    agent<N>/claimed/
       |
       | validate from, ts, wrapper tokens, symlinks, size cap
       v
    <channel source="agent" ...> body </channel>
       |
       | emit to pane or stdout
       v
    agent<N>/delivered/

    bad input -> agent<N>/poison/
The claim step is an atomic rename from inbox/ to claimed/, and a later drain adopts claimed leftovers after a crash (src/crates/netsky-cli/src/cmd/channel.rs:7-13, src/crates/netsky-cli/src/cmd/channel.rs:391-431). The guard rejects invalid agent ids, malformed timestamps, wrapper injection, oversized envelopes, and symlink traversal (src/crates/netsky-cli/src/cmd/channel.rs:35-58, src/crates/netsky-cli/src/cmd/channel.rs:473-503). Delivered envelopes write ack records, communication events, and clone lifecycle edges where applicable (src/crates/netsky-cli/src/cmd/channel.rs:527-548, src/crates/netsky-cli/src/cmd/channel.rs:773-908).
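The claim-then-validate flow above can be sketched in a few lines. This is a minimal Python illustration, not the code in channel.rs. The envelope field names (`from`, `text`), the size cap, and the agent-id pattern are assumptions, and the timestamp check is left out because the source does not state the timestamp format.

```python
import json
import os
import re
from pathlib import Path
from typing import Optional

MAX_BYTES = 64 * 1024                               # assumed size cap
AGENT_ID = re.compile(r"^agent(\d+|infinity)$")     # assumed id shape


def claim_and_validate(root: Path, name: str) -> Optional[dict]:
    """Claim one envelope by rename, then validate it or move it to poison/."""
    inbox, claimed, poison = (root / d for d in ("inbox", "claimed", "poison"))
    for d in (claimed, poison):
        d.mkdir(parents=True, exist_ok=True)
    dst = claimed / name
    try:
        # rename is atomic on one filesystem: exactly one drainer wins the claim
        os.rename(inbox / name, dst)
    except FileNotFoundError:
        return None  # already claimed by someone else, or never existed
    try:
        if dst.is_symlink() or dst.stat().st_size > MAX_BYTES:
            raise ValueError("symlink or oversized envelope")
        env = json.loads(dst.read_text())
        if not isinstance(env, dict) or not AGENT_ID.match(str(env.get("from", ""))):
            raise ValueError("bad sender")
        text = str(env.get("text", ""))
        if "<channel" in text or "</channel>" in text:
            raise ValueError("wrapper injection")
        return env
    except (ValueError, OSError):
        os.rename(dst, poison / name)  # quarantine instead of retrying forever
        return None
```

A file left in claimed/ after a crash is still owned by nobody, which is why a later drain can adopt it safely.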
Owner communication uses several legs. imessage is the primary operator channel, email is the fallback channel, iroh carries cross-machine envelopes, and the agent bus carries inter-agent work (src/crates/netsky-cli/src/cli.rs:267-287, src/crates/netsky-io/src/lib.rs:4-9, src/crates/netsky-channels/src/email/access.rs:1-13). MCP sources are deliberately narrow. The server provides stdio MCP, tool timeouts, source observability, and mutation retry guards, but the agent source is emit-only for the bus path (src/crates/netsky-io/src/mcp.rs:1-6, src/crates/netsky-io/src/mcp.rs:443-687, src/crates/netsky-cli/src/cmd/channel.rs:1-18).
How work flows

    agent0 brief
       |
       v
    netsky clone brief briefs/foo.md --workspace foo --agent 6
       |
       +--> create workspaces/foo/repo
       +--> spawn agent6 in tmux
       +--> send bus envelope kind=brief
       +--> record clone_dispatch + lifecycle
       |
       v
    agent6 edits branch in workspace clone
       |
       v
    git commit && git push origin branch
       |
       v
    agent0 fetches local origin
       |
       v
    cherry-pick or netsky harvest
       |
       v
    ./bin/check
       |
       v
    push main to GitHub
The clone command reads a brief, normalizes a workspace name, creates the workspace through workspace::clone_workspace, allocates or uses an agent number, sends a brief envelope, waits for Codex delivery or ack when needed, and records dispatch metadata (src/crates/netsky-cli/src/cmd/clone.rs:44-94, src/crates/netsky-cli/src/cmd/clone.rs:153-267, src/crates/netsky-cli/src/cmd/clone.rs:665-730). The command family exposes brief, status, wait, kill, health, and ls because clone lifecycle is operational state, not chat convention (src/crates/netsky-cli/src/cli.rs:906-953).
Landing is commits-to-main for netsky itself. The base prompt forbids GitHub PRs for netsky and says clone work pushes to the in-tree checkout as origin, then agent0 cherry-picks to main (src/crates/netsky-prompts/prompts/base.md:7-15, src/crates/netsky-prompts/prompts/base.md:109-113). The netsky harvest command encodes that ritual: fetch a clone branch, rebase target, apply the range, run ./bin/check, push main, write harvest_events, and emit a harvest-complete bus event (src/crates/netsky-cli/src/cmd/harvest.rs:1-6, src/crates/netsky-cli/src/cmd/harvest.rs:74-88, src/crates/netsky-cli/src/cmd/harvest.rs:140-195, src/crates/netsky-cli/src/cmd/harvest.rs:324-350).
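The ordering in that ritual is the important part: nothing is pushed unless the gate passes. The sketch below shows the gate as a step list with an injected command runner. It is a Python illustration, not harvest.rs. The remote names and exact git arguments are assumptions, and the rebase step, the harvest_events row, and the bus event are omitted.

```python
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]  # runs one command, returns its exit code


def harvest(branch: str, commit_range: str, run: Runner,
            clone_remote: str = "origin", main_remote: str = "origin") -> List[str]:
    """Run landing steps in order and stop at the first non-zero exit."""
    steps = [
        ("fetch", ["git", "fetch", clone_remote, branch]),
        ("apply", ["git", "cherry-pick", commit_range]),
        ("check", ["./bin/check"]),                       # the landing gate
        ("push", ["git", "push", main_remote, "main"]),   # only reached if check passed
    ]
    completed: List[str] = []
    for name, argv in steps:
        if run(argv) != 0:
            break
        completed.append(name)
    return completed
```

Passing the runner in keeps the sketch testable without touching a real repository.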
How it stays alive

    +--------------------------------+
    | owner-visible proof            |
    | health beacon, morning, doctor |
    +---------------+----------------+
                    |
    +---------------v----------------+
    | escalation floor               |
    | iMessage, Gmail, desktop, tmux |
    +---------------+----------------+
                    |
    +---------------v----------------+
    | watchdog repair                |
    | restart, retry, hang, markers  |
    +---------------+----------------+
                    |
    +---------------v----------------+
    | shell repair                   |
    | launchd shim, LKG, self-repair |
    +---------------+----------------+
                    |
    +---------------v----------------+
    | process bounds                 |
    | tmux deadlines, child timeouts |
    +---------------+----------------+
                    |
    +---------------v----------------+
    | durable state                  |
    | meta.db, state markers, JSONL  |
    +--------------------------------+
The process layer is bounded. Tmux probes were bounded in T2 e177a7a, ticker child subprocesses were bounded in T3 be491d2, and iMessage poison recovery plus inbox size caps landed in T17 a8f6a5e (notes/2026/04/21/agent0.md:84-99, notes/2026/04/21/agent0.md:119-135). The code path now treats long waits as failures that must return control to the watchdog tick, not as permanent ownership of the tick lock (src/crates/netsky-cli/src/cmd/watchdog.rs:120-239, src/crates/netsky-core/src/consts.rs:421-447).
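The idea that a long wait is a failure to report, not a reason to hold the tick, can be shown with a bounded child process. This is a generic Python sketch using the standard library. The function name and the result labels are assumptions, and the real bounds live in the Rust watchdog and ticker code.

```python
import subprocess
from typing import List, Tuple


def run_bounded(argv: List[str], timeout_s: float) -> Tuple[str, str]:
    """Run a child with a deadline; a timeout is returned as a result, never waited out."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the caller regains control
        return ("timeout", "")
    return ("ok" if proc.returncode == 0 else "failed", proc.stdout)
```

The caller always gets an answer within roughly `timeout_s`, so a wedged probe cannot own the tick lock indefinitely.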
The watchdog layer has several loops. One tick checks restart inflight state, gave-up and crashloop guards, agentinfinity, ticker liveness, iMessage source liveness, planned restart requests, agent0, crashloop state, and hang detection (src/crates/netsky-cli/src/cmd/watchdog.rs:120-239). T10 080733d6 changed failed revive and crashloop from hard stop to cooldown retry plus re-page. T1 f142ee2 archives gave-up markers after verified restart. T21 6cd6f27 reconciles stale restart-degraded markers (notes/2026/04/21/agent0.md:119-135, notes/2026/04/21/agent0.md:142-151).
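The T10 change from hard stop to cooldown retry can be pictured as a small decision function. This is a Python sketch under stated assumptions: the function name, the action labels, and the idea that a gave-up marker carries a single timestamp are all hypothetical.

```python
from typing import Optional


def revive_decision(now: float, gave_up_at: Optional[float], cooldown_s: float) -> str:
    """Decide what a tick does when a revive has previously failed."""
    if gave_up_at is None:
        return "revive"               # no gave-up marker: normal revive path
    if now - gave_up_at >= cooldown_s:
        return "retry-and-repage"     # cooldown elapsed: try again and page the owner again
    return "cooldown"                 # still cooling down: skip this tick, do not stop forever
```

The point of the shape is that no branch returns a permanent stop, so a stale marker delays recovery but cannot end it.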
Hang detection writes per-agent pane hashes, hang-suspected markers, and hang-paged markers, then clears them when pane output moves (src/crates/netsky-core/src/paths.rs:304-325, notes/2026/04/20/agent4-resilience-review.md:5-15). T14 901bb7ae moved hang-paged to delivery success, not delivery attempt, and T15 74b9c10c made restart sweep clone and agentinfinity hang state while preserving durable markers (notes/2026/04/21/agent0.md:119-135, src/crates/netsky-cli/src/cmd/restart.rs:809-882, src/crates/netsky-cli/src/cmd/restart.rs:1127-1185).
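A minimal sketch of hash-based hang detection follows, including the T14 rule that the paged flag is set only when delivery succeeds. It is Python for illustration only. The state fields, the threshold parameter, and the `page` callback are assumptions, and the real markers are files under ~/.netsky/state, not an in-memory object.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class HangState:
    pane_hash: str = ""
    unchanged_since: float = 0.0
    suspected: bool = False
    paged: bool = False


def observe(state: HangState, pane_text: str, now: float,
            threshold_s: float, page: Callable[[], bool]) -> HangState:
    """Update hang state from one pane capture."""
    digest = hashlib.sha256(pane_text.encode()).hexdigest()
    if digest != state.pane_hash:
        # output moved: start over with cleared suspected and paged flags
        return HangState(pane_hash=digest, unchanged_since=now)
    state.suspected = (now - state.unchanged_since) >= threshold_s
    if state.suspected and not state.paged:
        state.paged = bool(page())  # mark paged only if the page was actually delivered
    return state
```

If the page fails, `paged` stays false and the next tick tries again.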
The escalation layer has four legs. netsky escalate tries iMessage, Gmail, a Desktop sentinel, and a tmux banner, then writes an incident spool for retry until ack (src/crates/netsky-cli/src/cmd/escalate.rs:37-52, src/crates/netsky-cli/src/cmd/escalate.rs:194-283, src/crates/netsky-cli/src/cmd/escalate.rs:372-454, src/crates/netsky-cli/src/cmd/escalate.rs:1087-1203). T9 7211395 added the multi-leg floor, T11 d02f3fd3 added retry-until-ack incidents, T13 3bcc95a escalates iMessage source outage, and T16 42e1c00 records owner-pages.jsonl audit rows (notes/2026/04/21/agent0.md:119-135, src/crates/netsky-cli/src/cmd/owner_pages.rs:1-17, src/crates/netsky-cli/src/cmd/owner_pages.rs:52-98).
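The multi-leg floor with a retry spool can be sketched as follows. This is a Python illustration, not escalate.rs. The record fields, the spool file naming, and the leg callables are assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def escalate(incident_id: str, subject: str,
             legs: Dict[str, Callable[[str], bool]], spool_dir: Path) -> Dict[str, bool]:
    """Try every leg, record per-leg results, and spool the incident until acked."""
    results: Dict[str, bool] = {}
    for name, send in legs.items():
        try:
            results[name] = bool(send(subject))
        except Exception:
            results[name] = False  # one broken leg must not block the others
    spool_dir.mkdir(parents=True, exist_ok=True)
    record = {"id": incident_id, "subject": subject, "legs": results, "acked": False}
    (spool_dir / f"{incident_id}.json").write_text(json.dumps(record))
    return results
```

Every leg is attempted regardless of earlier results, so a missing iMessage configuration still leaves the Gmail, Desktop, and tmux legs to carry the page.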
The repair layer assumes the live Rust binary can be broken. netsky launchd install embeds and installs netsky-watchdog-shim, and the shim probes the live binary, tries install recovery, then falls back to ~/.netsky/bin/netsky.lkg (README.md:34-53, bin/netsky-watchdog-shim:11-63, src/crates/netsky-cli/src/cmd/launchd.rs:22-40). T4 f100f7f added the shim and LKG promotion, 2f7d58f embedded the shim for cargo-installed binaries, and T5 38a0ac1 added netsky self-repair (notes/2026/04/21/agent0.md:84-99, notes/2026/04/21/agent0.md:119-135, src/crates/netsky-cli/src/cmd/self_repair.rs:34-54, src/crates/netsky-cli/src/cmd/self_repair.rs:161-176).
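The shim's fallback order (probe the live binary, attempt recovery, fall back to the last-known-good copy) can be sketched with injected probes. The real shim is a shell script. This Python version, its function name, and its callback signatures are assumptions made for illustration.

```python
from typing import Callable, Optional


def choose_binary(live: str, lkg: str,
                  probe: Callable[[str], bool],
                  recover: Callable[[], bool]) -> Optional[str]:
    """Pick which binary the watchdog tick should run."""
    if probe(live):
        return live                  # healthy live binary: normal path
    if recover() and probe(live):
        return live                  # install recovery worked; re-probe before trusting it
    if probe(lkg):
        return lkg                   # last-known-good copy
    return None                      # nothing runnable: caller must escalate another way
```

Re-probing after recovery matters: a recovery step that reports success but leaves a broken binary should still fall through to the LKG copy.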
Startup has its own layer. Claude development channels are required for server:agent, and the spawn command must avoid foreground TOS prompts. T6 546241f writes prompt-free Claude settings and tests the dev-channel consent skip (notes/2026/04/21/agent0.md:84-99, notes/2026/04/21/agentinfinity.md:69-79, src/crates/netsky-core/src/spawn.rs:178-198, src/crates/netsky-core/src/spawn.rs:236-258, src/crates/netsky-core/src/spawn.rs:342-354). Agentinfinity readiness is now a JSON payload with session identity, runtime, boot pids, and freshness, and T22 3320e5d requires freshness plus session match before treating the watchdog as idle-by-design (notes/2026/04/21/agent0.md:142-151, src/crates/netsky-cli/prompts/agentinit.md:8-16).
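The freshness-plus-session rule from T22 can be illustrated with a small check over the readiness payload. This is a Python sketch. The field names `written_at` and `session` are hypothetical, since the source lists the payload contents but not its keys.

```python
import json


def watchdog_is_ready(payload_text: str, live_session: str,
                      now: float, max_age_s: float) -> bool:
    """Treat the marker as valid only if it is fresh and names the live session."""
    try:
        payload = json.loads(payload_text)
        age = now - float(payload["written_at"])   # hypothetical field name
        fresh = 0 <= age <= max_age_s               # reject stale and future-dated markers
        return fresh and payload["session"] == live_session  # hypothetical field name
    except (ValueError, KeyError, TypeError):
        return False  # unreadable or incomplete marker counts as not ready
```

A marker left behind by an earlier session fails the session match even when its timestamp is recent.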
Observability turns failures into rows and files. T12 ec2bc38 added doctor checks for version skew and schema health. T8 67a00c4 moved health to the top of the morning brief. T20 dc0da91 made watchdog JSONL parsing tolerant. T18 64e6fa7 added netsky health beacon --once, which writes ~/.netsky/state/health.json and can mirror or webhook a snapshot (notes/2026/04/21/agent0.md:84-99, notes/2026/04/21/agent0.md:119-151, src/crates/netsky-cli/src/cmd/health.rs:72-125, src/crates/netsky-cli/src/cmd/health.rs:333-384).
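Tolerant JSONL parsing, as in T20, usually means that one torn or malformed line costs one event and not the whole file. The sketch below is a generic Python reader under that assumption. It is not the parser in the Rust code, and the skip counter is an illustrative addition.

```python
import json
from typing import Iterable, List, Tuple


def read_events(lines: Iterable[str]) -> Tuple[List[dict], int]:
    """Parse JSONL, skipping bad lines instead of failing the whole read."""
    events: List[dict] = []
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue                  # blank lines are not errors
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1              # for example, a line torn by a crash mid-write
            continue
        if isinstance(obj, dict):
            events.append(obj)
        else:
            skipped += 1              # valid JSON but not an event object
    return events, skipped
```

Returning the skip count keeps the damage visible to checks such as a doctor pass without blocking the read.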
Repo safety is also liveness. T23 b469a11 added the agent0 HEAD stability guard and receive.denyCurrentBranch=refuse checks after a same-day root HEAD shift (notes/2026/04/21/agent0.md:142-151, src/crates/netsky-cli/src/cmd/doctor.rs:1150-1211, bin/check-git-safety:19-23, .githooks/pre-push:43-49). A viable system that can rewrite its own root checkout needs a hard stop at the Git boundary.
The live proof arrived at 16:09Z. agentinfinity went hang-suspected after 7735 seconds. The iMessage leg failed because NETSKY_OWNER_IMESSAGE was absent from the launchd environment. The Gmail leg delivered to the owner, the Desktop sentinel was written, and the tmux banner was set (~/.netsky/state/escalations/b02bf6cf1362637f.json:1-29). That incident validated the defense-in-depth claim on the same day the layers landed.
How you use it

    netsky up 8
    netsky agent 4 --type codex
    netsky restart 8 --handoff /tmp/handoff.md
    netsky channel send agent4 "review restart.rs only" --from agent0
    netsky imessage send --owner "wave landed"
    netsky escalate "agent0 down" "watchdog could not revive root"
    netsky doctor
    netsky self-repair check
    netsky health beacon --once
    netsky drill hang-suspected
    netsky clone ls
    netsky clone brief briefs/t6.md --workspace viability-t6 --agent 6
    netsky clone wait 6 --timeout 900
    netsky clone kill 6
The command tree backs those verbs directly. up, restart, agent, agentinfinity, watchdog, tick, loop, launchd, self-repair, doctor, health, and drill are top-level commands (src/crates/netsky-cli/src/cli.rs:15-80, src/crates/netsky-cli/src/cli.rs:141-143, src/crates/netsky-cli/src/cli.rs:210-240). The channel, iMessage, email, Drive, calendar, iroh, and MCP source commands live in the same binary (src/crates/netsky-cli/src/cli.rs:267-287). Clone lifecycle is a subcommand family: brief, status, wait, kill, health, and ls (src/crates/netsky-cli/src/cli.rs:906-953).
Observability
| surface | path | records |
|---|---|---|
| SQL store | ~/.netsky/meta.db | messages, CLI invocations, crashes, ticks, sessions, workspaces, clone dispatches, harvests, communication, MCP calls, tool use, marker state, owner directives, token usage, watchdog events, source cursors, source events (src/crates/netsky-db/README.md:1-10, src/crates/netsky-db/README.md:27-66) |
| watchdog JSONL | ~/.netsky/logs/watchdog-events-YYYY-MM-DD.jsonl | backend-independent watchdog events written before any database write (src/crates/netsky-db/README.md:82-117, src/crates/netsky-core/src/paths.rs:381-390) |
| owner pages | ~/.netsky/state/owner-pages.jsonl | immutable owner-page attempts and legs (src/crates/netsky-cli/src/cmd/owner_pages.rs:1-17, src/crates/netsky-cli/src/cmd/owner_pages.rs:52-98) |
| state markers | ~/.netsky/state/* | restart, hang, crashloop, escalation, readiness, ticker, repair, health (src/crates/netsky-prompts/prompts/base.md:83-88, src/crates/netsky-core/src/consts.rs:301-403) |
| event delivery | events table | per-source pending, delivered, or failed delivery rows (src/crates/netsky-db/README.md:58-66) |
Schema version is 12 as of this snapshot. The v12 migration adds fulfillment edges to owner_directives, while earlier v10 and v11 migrations added tool-use, clone lifecycle, pre-push bypass, marker state, and dependency install rows (src/crates/netsky-db/README.md:27-76). Operators can query it with netsky query, and the prompt still documents scripts/meta-db.py for DuckDB summaries (src/crates/netsky-db/README.md:21-25, src/crates/netsky-prompts/prompts/base.md:90-96).
What it’s not
Netsky is not a general agent framework. It is an opinionated local control plane for one owner, one root constellation, tmux sessions, Unix files, Rust code, and narrow channels (src/crates/netsky-prompts/prompts/base.md:30-40, src/crates/netsky-prompts/prompts/base.md:98-105). It is not a model router. Runtime selection exists, but the architecture standardizes the bus and CLI, not model arbitration (src/crates/netsky-cli/src/cli.rs:624-628, src/crates/netsky-core/src/spawn.rs:17-19). It is not multi-user software. The owner-comms model names one netsky0 root and routes sibling constellations through it (src/crates/netsky-prompts/prompts/base.md:40-49).
History in one page
| date | milestone |
|---|---|
| early concept | VSM mapping gave netsky its vocabulary: policy, intelligence, control, audit, coordination, operations, watchdog (src/crates/netsky-prompts/prompts/base.md:20-28) |
| first constellation | tmux became the substrate: agent0, agent1..N, agentinfinity, and later netsky-ticker (src/crates/netsky-prompts/prompts/base.md:30-40, src/crates/netsky-core/src/consts.rs:81-88) |
| Rust consolidation | one binary became the install and control surface (README.md:7-21, src/crates/netsky-cli/src/cli.rs:8-16) |
| schema growth | meta.db reached schema v12 with clone lifecycle, marker state, owner directive fulfillment, token usage, source cursors, and iroh events (src/crates/netsky-db/README.md:27-76) |
| 2026-04-20 nuke | a stale failed-revive gave-up marker survived manual intervention and paralyzed future watchdog ticks (notes/2026/04/21/agentinfinity.md:8-31) |
| 2026-04-21 paralysis | agentinfinity itself blocked on the Claude dev-channel TOS prompt until the owner dismissed it over SSH (notes/2026/04/21/agentinfinity.md:3-16, notes/2026/04/21/agentinfinity.md:69-79) |
| 2026-04-21 defense day | 17 viability tasks landed across three waves, then the new binary and launchd shim activated (notes/2026/04/21/agent0.md:84-162) |
The recent commit line shows the shape of the day: bounded tmux e177a7a, bounded ticker be491d2, gave-up archive f142ee2, multi-leg escalation 7211395, prompt-free startup 546241f, retry spool d02f3fd, delivery-aware hang paging 901bb7a, health beacon 64e6fa7, drills 7a4a190, readiness freshness 3320e5d, and Git safety b469a11 (git log --oneline origin/main -50).
Where it’s going
- Make agentinfinity idle-by-design hang detection less noisy while still catching real wedges. T22 added freshness and session matching, but the 16:09Z proof shows the policy still needs tuning (notes/2026/04/21/agent0.md:142-151, ~/.netsky/state/escalations/b02bf6cf1362637f.json:1-29).
- Add more netsky drill scenarios beyond binary-delete, hang-suspected, escalate-fail-retry, and gave-up-archive (notes/2026/04/21/agent0.md:142-151).
- Install the health beacon as a launchd-backed phone-visible surface, not just an on-demand --once command (src/crates/netsky-cli/src/cmd/health.rs:72-125, notes/2026/04/21/agent0.md:142-151).
- Keep extending cross-machine routing over iroh while preserving the single-root owner-comms rule (src/crates/netsky-prompts/prompts/base.md:40-49, src/crates/netsky-db/README.md:58-66).
- Split high-contention watchdog code before the next wide resilience wave. Agent0 recorded watchdog merge contention as the biggest cost of the day (notes/2026/04/21/agent0.md:166-174).