the observability spine

~/.netsky/meta.db is netsky’s durable observability spine. It is a Turso-backed SQLite file with a stable Rust writer API on one side and SQL read surfaces on the other: DataFusion in-process, DuckDB by attach, and small operator CLIs on top (src/crates/netsky-db/README.md:7-10, src/crates/netsky-cli/src/cmd/query.rs:9-32).

This is not log garnish. It is the place where the system remembers what happened.

why this split exists #

The write path is OLTP. The read path is OLAP. Netsky does not pretend those are the same workload.

  • Turso SQLite handles writes to ~/.netsky/meta.db, enables WAL, and sets a 10 second busy timeout so concurrent local writers do not turn every short lock into a failure (src/crates/netsky-db/README.md:7-10, src/crates/netsky-db/src/lib.rs:1761-1774).
  • DataFusion reads Arrow RecordBatch snapshots built from the stored JSON rows and registers them as in-memory tables for SQL queries (src/crates/netsky-db/README.md:8-9, src/crates/netsky-db/src/lib.rs:2053-2068).
  • DuckDB attaches the same SQLite file and projects JSON fields into temporary views for ad hoc operator queries (scripts/meta-db.py:148-153, scripts/meta-db.py:202-225).
  • If a write misses Turso, netsky appends the failed record to ~/.netsky/logs/meta-db-errors-<date>.jsonl and returns Ok instead of dropping the event on the floor (src/crates/netsky-db/README.md:10, src/crates/netsky-db/src/lib.rs:1527-1542, src/crates/netsky-db/src/lib.rs:1966-1978).
flowchart LR
    P[process]
    R[Db::record_*]
    D["~/.netsky/meta.db"]
    F["meta-db-errors-YYYY-MM-DD.jsonl"]
    Q[DataFusion or DuckDB]

    P --> R
    R --> D
    R -. write failure .-> F
    D --> Q

The result is simple. Writes stay cheap and durable. Reads get a columnar query surface without forcing the writer path to look like a warehouse.

the writer API #

The writer surface is the contract. It records concrete system verbs, not vague “event” blobs (src/crates/netsky-db/README.md:56-76).

functioncaptures
Db::record_messagesource message envelopes across bus, iMessage, email, iroh, and demos
Db::record_clicommand, argv JSON, exit code, duration, host
Db::record_crashcrash kind, agent, detail JSON
Db::record_tickticker and watchdog ticks
Db::record_workspaceworkspace create and delete lifecycle
Db::record_sessionagent up, down, and note events
Db::record_clone_dispatchclone start, finish, branch, status, brief metadata
Db::record_harvest_eventcherry-pick and harvest results
Db::record_communication_eventnormalized agent, iMessage, and email audit trail
Db::record_mcp_tool_callMCP tool timing, success, errors, timeout races
Db::record_git_operationlocal git mutations and pushes
Db::record_owner_directivetrusted owner text and resolved action
Db::record_token_usageper-event tokens, runtime, model, cost
Db::record_token_usage_batchbulk token ingestion in one id reservation and transaction
Db::record_watchdog_eventwatchdog transitions and escalations
Db::record_source_errorbounded source error classes such as timeout and auth_failure
Db::record_iroh_eventbounded iroh handshake events with hashed peer id

Two structured helpers sit beside those row writers: source cursor reads and writes, plus event-log insert, delivery update, and tail operations (src/crates/netsky-db/README.md:75-76).

the row shape #

Most tables use one storage pattern:

CREATE TABLE IF NOT EXISTS <table> (
  id INTEGER PRIMARY KEY,
  row_json TEXT NOT NULL
)

That is the real migration code for the row-backed tables (src/crates/netsky-db/src/lib.rs:793-799). The insert path serializes a typed Rust row and writes it as row_json (src/crates/netsky-db/src/lib.rs:1522-1542).

This shape is deliberate. Fields live inside JSON so new keys can appear without forcing a table rewrite on every schema turn. The exceptions are source_cursors and events, which keep real columns because the CLI needs direct field reads and those tables have a small fixed shape (src/crates/netsky-db/src/lib.rs:814-838).

Here is one real messages.row_json object from the live database on April 19, 2026:

{
  "id": 1776625628327431,
  "ts_utc": "2026-04-19T19:07:08.285Z",
  "source": "agent",
  "direction": "inbound",
  "chat_id": "agentcron",
  "from_agent": "agentcron",
  "to_agent": null,
  "body": "hourly status: in one short paragraph, summarize active clones...",
  "raw_json": "{\"chat_id\":\"agentcron\",\"from\":\"agentcron\",\"ts\":\"2026-04-19T19:07:08.147203+00:00\"}"
}

The MessageRow writer shows the same field set in code: ts_utc, source, direction, chat_id, from_agent, to_agent, body, and raw_json (src/crates/netsky-db/src/lib.rs:846-857).

the read surfaces #

Three surfaces matter day to day.

netsky query opens the database read-only, snapshots batches, and emits plain tables or JSON envelopes (src/crates/netsky-cli/src/cmd/query.rs:9-32):

netsky query "SELECT source, COUNT(*) AS n FROM messages GROUP BY source"

scripts/meta-db.py attaches SQLite into DuckDB, builds meta_* views by extracting JSON keys, and ships a few operator subcommands on top (scripts/meta-db.py:148-153, scripts/meta-db.py:195-225, scripts/meta-db.py:568-597):

uv run scripts/meta-db.py recent --hours 6 --limit 20

netsky watchdog events reads the durable JSONL trail, not meta.db, so watchdog forensics survive a database outage (src/crates/netsky-cli/src/cmd/watchdog_events.rs:29-76, src/crates/netsky-cli/src/cmd/watchdog.rs:552-576):

netsky watchdog events --since 6h --json

There is also a daily rollup layer. netsky analytics daily aggregates one UTC day out of meta.db into JSON and HTML under ~/.netsky/analytics/ and can emit a Zorto page into website/content/analytics/ (src/crates/netsky-cli/src/cmd/analytics.rs:20-50, src/crates/netsky-cli/src/cmd/analytics.rs:136-166).

what gets captured #

Schema v7 has 19 tables. Seventeen use the (id, row_json) shape. Two are structured support tables (src/crates/netsky-db/README.md:27-46, src/crates/netsky-db/src/lib.rs:33-51, src/crates/netsky-db/src/lib.rs:814-838).

tableone-line purpose
messagesinbound and outbound message envelopes
cli_invocationsevery CLI run with argv, exit, duration, host
crashescrash kind, agent, detail
ticksticker and watchdog heartbeat rows
workspacesworkspace lifecycle
sessionsagent session lifecycle
clone_dispatchesclone brief and execution lifecycle
harvest_eventsharvest and cherry-pick outcomes
communication_eventsnormalized comms audit log
mcp_tool_callsMCP request and response timing
git_operationsgit mutations and pushes
owner_directivestrusted owner directives and resolution
token_usagemodel usage and cost rows
watchdog_eventswatchdog state transitions
netsky_taskslocal task tracker rows, exposed as tasks in DataFusion
source_errorsbounded per-source failures
iroh_eventsiroh connects, evicts, reconnects, refusals
source_cursorsdurable per-source cursor state
eventsper-source delivery log with pending, delivered, failed

That inventory is wider than “logs.” It is enough to answer operator questions about control, cost, delivery, failure, and audit without scraping tmux panes or grepping random files.

limits #

This is not a streaming telemetry stack. There is no real-time subscription surface yet. There is no retention or eviction policy yet. There are no resident dashboards. The primary interfaces are SQL, small CLIs, and generated daily reports (src/crates/netsky-db/README.md:119-126, src/crates/netsky-cli/src/cmd/analytics.rs:136-166).

That restraint is part of the design. The first job is to make the system tell the truth in one durable place.

The observability spine enables the next layer cleanly: cron-fire monitors, token-usage rollups, owner-visible audit trails, and watchdog forensics that survive the moment they are needed.