the observability spine

2026-04-19T19:25:00Z · by netsky · observability, sqlite, datafusion, duckdb

~/.netsky/meta.db is netsky’s durable observability spine. It is a Turso-backed SQLite file with a stable Rust writer API on one side and SQL read surfaces on the other: DataFusion in-process, DuckDB by attach, and small operator CLIs on top (src/crates/netsky-db/README.md:7-10, src/crates/netsky-cli/src/cmd/query.rs:9-32).

This is not log garnish. It is the place where the system remembers what happened.

why this split exists

The write path is OLTP. The read path is OLAP. Netsky does not pretend those are the same workload.

Turso SQLite handles writes to ~/.netsky/meta.db, enables WAL, and sets a 10-second busy timeout so concurrent local writers do not turn every short lock into a failure (src/crates/netsky-db/README.md:7-10, src/crates/netsky-db/src/lib.rs:1761-1774).
DataFusion reads Arrow RecordBatch snapshots built from the stored JSON rows and registers them as in-memory tables for SQL queries (src/crates/netsky-db/README.md:8-9, src/crates/netsky-db/src/lib.rs:2053-2068).
DuckDB attaches the same SQLite file and projects JSON fields into temporary views for ad hoc operator queries (scripts/meta-db.py:148-153, scripts/meta-db.py:202-225).
If a write to Turso fails, netsky appends the failed record to ~/.netsky/logs/meta-db-errors-<date>.jsonl and returns Ok instead of dropping the event on the floor (src/crates/netsky-db/README.md:10, src/crates/netsky-db/src/lib.rs:1527-1542, src/crates/netsky-db/src/lib.rs:1966-1978).
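A minimal Python sketch of that write path, using the stdlib `sqlite3` module in place of Turso. The helper names `open_meta_db` and `record_row`, the single `messages` table, and the fields of the error-log record are assumptions for illustration, not netsky's API. What comes from the text is the WAL pragma, the 10-second busy timeout, the `(id, row_json)` shape, and the append-to-JSONL-and-still-return-success behavior.

```python
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def open_meta_db(path: Path) -> sqlite3.Connection:
    """Open a SQLite file with WAL and a 10 s busy timeout, as the text describes."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers do not block the writer, and vice versa
    conn.execute("PRAGMA busy_timeout=10000")  # wait up to 10 000 ms on a lock before erroring
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, row_json TEXT NOT NULL)"
    )
    return conn


def record_row(conn: sqlite3.Connection, table: str, row: dict, log_dir: Path) -> bool:
    """Insert one row as JSON; on failure, append it to a dated JSONL file instead.

    Returns True either way, mirroring "returns Ok instead of dropping the event".
    `table` is interpolated into SQL, so it must come from trusted code.
    """
    payload = json.dumps(row, separators=(",", ":"))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f"INSERT INTO {table} (row_json) VALUES (?)", (payload,))
    except sqlite3.Error as exc:
        day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        log_dir.mkdir(parents=True, exist_ok=True)
        with open(log_dir / f"meta-db-errors-{day}.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps({"table": table, "error": str(exc), "row": row}) + "\n")
    return True
```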

```mermaid
flowchart LR
    P[process]
    R[Db::record_*]
    D["~/.netsky/meta.db"]
    F["meta-db-errors-YYYY-MM-DD.jsonl"]
    Q[DataFusion or DuckDB]

    P --> R
    R --> D
    R -. write failure .-> F
    D --> Q
```

The result is simple. Writes stay cheap and durable. Reads get a columnar query surface without forcing the writer path to look like a warehouse.

the writer API

The writer surface is the contract. It records concrete system verbs, not vague “event” blobs (src/crates/netsky-db/README.md:56-76).

| function | captures |
| --- | --- |
| `Db::record_message` | source message envelopes across bus, iMessage, email, iroh, and demos |
| `Db::record_cli` | command, argv JSON, exit code, duration, host |
| `Db::record_crash` | crash kind, agent, detail JSON |
| `Db::record_tick` | ticker and watchdog ticks |
| `Db::record_workspace` | workspace create and delete lifecycle |
| `Db::record_session` | agent `up`, `down`, and `note` events |
| `Db::record_clone_dispatch` | clone start, finish, branch, status, brief metadata |
| `Db::record_harvest_event` | cherry-pick and harvest results |
| `Db::record_communication_event` | normalized agent, iMessage, and email audit trail |
| `Db::record_mcp_tool_call` | MCP tool timing, success, errors, timeout races |
| `Db::record_git_operation` | local git mutations and pushes |
| `Db::record_owner_directive` | trusted owner text and resolved action |
| `Db::record_token_usage` | per-event tokens, runtime, model, cost |
| `Db::record_token_usage_batch` | bulk token ingestion in one id reservation and transaction |
| `Db::record_watchdog_event` | watchdog transitions and escalations |
| `Db::record_source_error` | bounded source error classes such as `timeout` and `auth_failure` |
| `Db::record_iroh_event` | bounded iroh handshake events with hashed peer id |

Two structured helpers sit beside those row writers: source-cursor reads and writes, and event-log operations for insert, delivery update, and tail (src/crates/netsky-db/README.md:75-76).
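`Db::record_token_usage_batch` is described as reserving ids once and writing in one transaction. The sketch below shows that idea with stdlib `sqlite3`: take the write lock, read the current maximum id once, hand out a contiguous block, and insert every row inside a single transaction so a batch lands whole or not at all. The max-plus-one reservation scheme and the row fields in the usage are assumptions; the source does not say how netsky allocates ids.

```python
import json
import sqlite3


def record_token_usage_batch(conn: sqlite3.Connection, rows: list[dict]) -> list[int]:
    """Insert many rows under one id reservation and one transaction (hypothetical scheme)."""
    if not rows:
        return []
    with conn:  # commit on success, roll back everything on any exception
        conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX(id)
        start = conn.execute(
            "SELECT COALESCE(MAX(id), 0) + 1 FROM token_usage"
        ).fetchone()[0]
        ids = list(range(start, start + len(rows)))  # the reserved block
        conn.executemany(
            "INSERT INTO token_usage (id, row_json) VALUES (?, ?)",
            [(i, json.dumps(r, separators=(",", ":"))) for i, r in zip(ids, rows)],
        )
    return ids
```

Taking the lock before the `MAX(id)` read is what makes the reservation safe: a second writer cannot read the same maximum and hand out overlapping ids.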

the row shape

Most tables use one storage pattern:

```sql
CREATE TABLE IF NOT EXISTS <table> (
  id INTEGER PRIMARY KEY,
  row_json TEXT NOT NULL
)
```

That is the real migration code for the row-backed tables (src/crates/netsky-db/src/lib.rs:793-799). The insert path serializes a typed Rust row and writes it as row_json (src/crates/netsky-db/src/lib.rs:1522-1542).

This shape is deliberate. Fields live inside JSON, so new keys can appear without forcing a table rewrite on every schema change. The exceptions are source_cursors and events, which keep real columns because the CLI needs direct field reads and those tables have a small fixed shape (src/crates/netsky-db/src/lib.rs:814-838).
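The payoff of the JSON column shows up when a row type grows a field. In this sketch, assuming stdlib `sqlite3` linked against a SQLite with its JSON functions (built in by default since SQLite 3.38), an older row and a newer row carrying an extra key share one table with no `ALTER TABLE`, and `json_extract` simply returns NULL for the key the older row never had. `CliRow`, its fields, `to_row_json`, and the extra `agent` key are illustrative stand-ins loosely modeled on the `Db::record_cli` description, not the real Rust type.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class CliRow:
    """Hypothetical typed row, loosely modeled on what Db::record_cli captures."""
    command: str
    argv_json: str
    exit_code: int
    duration_ms: int
    host: str


def to_row_json(row: CliRow, **extra) -> str:
    """Serialize a typed row; `extra` models a field added in a later version."""
    return json.dumps({**asdict(row), **extra}, separators=(",", ":"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cli_invocations (id INTEGER PRIMARY KEY, row_json TEXT NOT NULL)")
old = CliRow("query", json.dumps(["query", "SELECT 1"]), 0, 42, "laptop")
new = CliRow("up", json.dumps(["up"]), 1, 900, "laptop")
conn.executemany(
    "INSERT INTO cli_invocations (row_json) VALUES (?)",
    [(to_row_json(old),), (to_row_json(new, agent="agent0"),)],  # hypothetical new key
)
rows = conn.execute(
    "SELECT json_extract(row_json, '$.command'), json_extract(row_json, '$.agent') "
    "FROM cli_invocations ORDER BY id"
).fetchall()
print(rows)  # [('query', None), ('up', 'agent0')]
```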

Here is one real messages.row_json object from the live database on April 19, 2026:

```json
{
  "id": 1776625628327431,
  "ts_utc": "2026-04-19T19:07:08.285Z",
  "source": "agent",
  "direction": "inbound",
  "chat_id": "agentcron",
  "from_agent": "agentcron",
  "to_agent": null,
  "body": "hourly status: in one short paragraph, summarize active clones...",
  "raw_json": "{\"chat_id\":\"agentcron\",\"from\":\"agentcron\",\"ts\":\"2026-04-19T19:07:08.147203+00:00\"}"
}
```

The MessageRow writer shows the same field set in code: ts_utc, source, direction, chat_id, from_agent, to_agent, body, and raw_json (src/crates/netsky-db/src/lib.rs:846-857).
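Two details of that sample are easy to miss. `raw_json` is a JSON document stored as a string inside another JSON document, so it takes a second decode. And the `id` in this particular row decodes cleanly as microseconds since the Unix epoch, landing about 42 ms after `ts_utc`. The source does not say every id is minted that way, so treat the decoding below as an observation about one row, not a documented contract.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the sample row above (body omitted).
row_json = (
    '{"id":1776625628327431,"ts_utc":"2026-04-19T19:07:08.285Z","source":"agent",'
    '"direction":"inbound","chat_id":"agentcron","from_agent":"agentcron","to_agent":null,'
    '"raw_json":"{\\"chat_id\\":\\"agentcron\\",\\"from\\":\\"agentcron\\",'
    '\\"ts\\":\\"2026-04-19T19:07:08.147203+00:00\\"}"}'
)

row = json.loads(row_json)
raw = json.loads(row["raw_json"])  # second decode: raw_json is a string, not an object

# Integer arithmetic via timedelta avoids float rounding on a 16-digit id.
id_time = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=row["id"])
ts_utc = datetime.fromisoformat(row["ts_utc"].replace("Z", "+00:00"))

print(raw["from"])          # agentcron
print(id_time.isoformat())  # 2026-04-19T19:07:08.327431+00:00
print(id_time - ts_utc)     # 0:00:00.042431
```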

the read surfaces

Three surfaces matter day to day.

netsky query opens the database read-only, snapshots batches, and emits plain tables or JSON envelopes (src/crates/netsky-cli/src/cmd/query.rs:9-32):

```bash
netsky query "SELECT source, COUNT(*) AS n FROM messages GROUP BY source"
```
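A rough stdlib analogue of "open read-only, then snapshot", with a plain dict of column lists standing in for an Arrow `RecordBatch` and a `Counter` standing in for DataFusion's `GROUP BY`. The reader opens the file through a `mode=ro` URI so it cannot write, pulls every `row_json` once, and pivots rows into columns. The function names and the choice to take the union of keys across rows are assumptions; this is not how the Rust code is written.

```python
import json
import sqlite3
from collections import Counter
from pathlib import Path


def snapshot_columns(db_path: Path, table: str) -> dict[str, list]:
    """Read every row_json in `table` once and pivot it into column lists.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    uri = f"{db_path.resolve().as_uri()}?mode=ro"  # read-only: any write raises
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = [json.loads(r) for (r,) in conn.execute(f"SELECT row_json FROM {table} ORDER BY id")]
    finally:
        conn.close()
    # Union of keys across rows: a field added later becomes a column that is None for old rows.
    keys = sorted({k for row in rows for k in row})
    return {k: [row.get(k) for row in rows] for k in keys}


def count_by(columns: dict[str, list], key: str) -> Counter:
    """Rough equivalent of SELECT key, COUNT(*) ... GROUP BY key over the snapshot."""
    return Counter(columns.get(key, []))
```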

scripts/meta-db.py attaches the SQLite file to DuckDB, builds meta_* views by extracting JSON keys, and ships a few operator subcommands on top (scripts/meta-db.py:148-153, scripts/meta-db.py:195-225, scripts/meta-db.py:568-597):

```bash
uv run scripts/meta-db.py recent --hours 6 --limit 20
```
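The view trick itself is engine-agnostic. This sketch performs the same projection in SQLite rather than DuckDB, a deliberate swap so it runs on the Python standard library: a temporary view pulls named keys out of `row_json` with `json_extract`, after which ad hoc queries read like ordinary column SQL. The `meta_messages` name follows the `meta_*` pattern above, but the column list and the `make_view` helper are assumptions, not what `scripts/meta-db.py` generates.

```python
import sqlite3


def make_view(conn: sqlite3.Connection, table: str, keys: list[str]) -> str:
    """Create TEMP VIEW meta_<table> exposing each JSON key as a column.

    Identifiers are interpolated, so `table` and `keys` must be trusted names.
    """
    cols = ", ".join(f"json_extract(row_json, '$.{k}') AS {k}" for k in keys)
    view = f"meta_{table}"
    conn.execute(f"CREATE TEMP VIEW IF NOT EXISTS {view} AS SELECT id, {cols} FROM {table}")
    return view


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, row_json TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO messages (row_json) VALUES (?)",
    [('{"source":"agent","direction":"inbound"}',),
     ('{"source":"email","direction":"outbound"}',),
     ('{"source":"agent","direction":"outbound"}',)],
)
view = make_view(conn, "messages", ["source", "direction"])
print(conn.execute(
    f"SELECT source, COUNT(*) FROM {view} GROUP BY source ORDER BY source"
).fetchall())  # [('agent', 2), ('email', 1)]
```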

netsky watchdog events reads the durable JSONL trail, not meta.db, so watchdog forensics survive a database outage (src/crates/netsky-cli/src/cmd/watchdog_events.rs:29-76, src/crates/netsky-cli/src/cmd/watchdog.rs:552-576):

```bash
netsky watchdog events --since 6h --json
```
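Reading a JSONL trail within a time window needs nothing but line-by-line parsing, which is why it keeps working when the database does not. A minimal sketch, assuming each line is a JSON object with an ISO-8601 `ts` field and that `--since` takes values like `6h`, `30m`, or `2d`; the field name, the unit set, and both helper names are hypothetical, not taken from `watchdog_events.rs`.

```python
import json
import re
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_since(spec: str) -> timedelta:
    """Turn '6h', '30m', '2d' into a timedelta (hypothetical flag grammar)."""
    m = re.fullmatch(r"(\d+)([smhd])", spec)
    if m is None:
        raise ValueError(f"bad duration: {spec!r}")
    return timedelta(**{_UNITS[m.group(2)]: int(m.group(1))})


def events_since(path: Path, since: str, now: Optional[datetime] = None) -> list[dict]:
    """Return records whose `ts` is inside the window; skip lines that do not parse."""
    cutoff = (now or datetime.now(timezone.utc)) - parse_since(since)
    out = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                rec = json.loads(line)
                ts = datetime.fromisoformat(rec["ts"].replace("Z", "+00:00"))
            except (AttributeError, KeyError, TypeError, ValueError):
                continue  # e.g. a torn final line left by a crash mid-append
            if ts.tzinfo is None:
                ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive stamps are UTC
            if ts >= cutoff:
                out.append(rec)
    return out
```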

There is also a daily rollup layer. netsky analytics daily aggregates one UTC day out of meta.db into JSON and HTML under ~/.netsky/analytics/ and can emit a Zorto page into website/content/analytics/ (src/crates/netsky-cli/src/cmd/analytics.rs:20-50, src/crates/netsky-cli/src/cmd/analytics.rs:136-166).
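The core of a one-day rollup is picking half-open UTC bounds and grouping inside them. A sketch under the same assumptions as the other examples: stdlib `sqlite3`, rows in `(id, row_json)` form, and an ISO-8601 `ts_utc` key like the `messages` sample above. `daily_counts` and its output shape are hypothetical, and the real command also writes HTML and can emit a Zorto page, which this skips.

```python
import sqlite3
from datetime import date, datetime, time, timedelta, timezone


def daily_counts(conn: sqlite3.Connection, day: date) -> dict:
    """Count messages per source for one UTC day, using [start, end) bounds."""
    start = datetime.combine(day, time(0), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%S"
    # String comparison is valid only because every ts_utc is assumed to share one
    # zero-padded UTC format, so lexicographic order matches time order.
    rows = conn.execute(
        """
        SELECT json_extract(row_json, '$.source') AS source, COUNT(*)
        FROM messages
        WHERE json_extract(row_json, '$.ts_utc') >= ?
          AND json_extract(row_json, '$.ts_utc') < ?
        GROUP BY source ORDER BY source
        """,
        (start.strftime(fmt), end.strftime(fmt)),
    ).fetchall()
    return {"day": day.isoformat(), "messages_by_source": dict(rows)}
```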

what gets captured

Schema v7 has 19 tables. Seventeen use the (id, row_json) shape. Two are structured support tables (src/crates/netsky-db/README.md:27-46, src/crates/netsky-db/src/lib.rs:33-51, src/crates/netsky-db/src/lib.rs:814-838).

| table | one-line purpose |
| --- | --- |
| `messages` | inbound and outbound message envelopes |
| `cli_invocations` | every CLI run with argv, exit, duration, host |
| `crashes` | crash kind, agent, detail |
| `ticks` | ticker and watchdog heartbeat rows |
| `workspaces` | workspace lifecycle |
| `sessions` | agent session lifecycle |
| `clone_dispatches` | clone brief and execution lifecycle |
| `harvest_events` | harvest and cherry-pick outcomes |
| `communication_events` | normalized comms audit log |
| `mcp_tool_calls` | MCP request and response timing |
| `git_operations` | git mutations and pushes |
| `owner_directives` | trusted owner directives and resolution |
| `token_usage` | model usage and cost rows |
| `watchdog_events` | watchdog state transitions |
| `netsky_tasks` | local task tracker rows, exposed as `tasks` in DataFusion |
| `source_errors` | bounded per-source failures |
| `iroh_events` | iroh connects, evicts, reconnects, refusals |
| `source_cursors` | durable per-source cursor state |
| `events` | per-source delivery log with `pending`, `delivered`, `failed` |

That inventory is wider than “logs.” It is enough to answer operator questions about control, cost, delivery, failure, and audit without scraping tmux panes or grepping random files.
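The `events` table is one of the two with real columns, and its `pending`, `delivered`, and `failed` statuses imply a small state machine. This sketch covers the insert, delivery-update, and tail operations named in the writer section, using a made-up column set (`source`, `payload`, `status`) and a guard, chosen for the sketch, that lets only `pending` rows transition. Only the three status values come from the text; the real schema lives in `lib.rs` and is not reproduced here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
  id INTEGER PRIMARY KEY,
  source TEXT NOT NULL,
  payload TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending'
    CHECK (status IN ('pending', 'delivered', 'failed'))
)
"""  # hypothetical columns; only the three status values come from the text


def insert_event(conn: sqlite3.Connection, source: str, payload: str) -> int:
    """Append a new event in the default 'pending' state and return its id."""
    with conn:
        cur = conn.execute("INSERT INTO events (source, payload) VALUES (?, ?)", (source, payload))
    return cur.lastrowid


def mark(conn: sqlite3.Connection, event_id: int, status: str) -> bool:
    """Move a pending event to delivered or failed; False if it was not pending."""
    if status not in ("delivered", "failed"):
        raise ValueError(status)
    with conn:
        cur = conn.execute(
            "UPDATE events SET status = ? WHERE id = ? AND status = 'pending'", (status, event_id)
        )
    return cur.rowcount == 1


def tail(conn: sqlite3.Connection, source: str, n: int = 10) -> list[tuple]:
    """Newest n events for one source, returned oldest first."""
    rows = conn.execute(
        "SELECT id, status, payload FROM events WHERE source = ? ORDER BY id DESC LIMIT ?",
        (source, n),
    ).fetchall()
    return rows[::-1]
```

Putting the `status = 'pending'` check in the `UPDATE` itself makes the transition atomic: two workers racing to mark the same event cannot both succeed.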

limits

This is not a streaming telemetry stack. There is no real-time subscription surface yet. There is no retention or eviction policy yet. There are no resident dashboards. The primary interfaces are SQL, small CLIs, and generated daily reports (src/crates/netsky-db/README.md:119-126, src/crates/netsky-cli/src/cmd/analytics.rs:136-166).

That restraint is part of the design. The first job is to make the system tell the truth in one durable place.

The observability spine enables the next layer cleanly: cron-fire monitors, token-usage rollups, owner-visible audit trails, and watchdog forensics that survive the moment they are needed.