the observability spine
~/.netsky/meta.db is netsky’s durable observability spine. It is a Turso-backed SQLite file with a stable Rust writer API on one side and SQL read surfaces on the other: DataFusion in-process, DuckDB by attach, and small operator CLIs on top (src/crates/netsky-db/README.md:7-10, src/crates/netsky-cli/src/cmd/query.rs:9-32).
This is not log garnish. It is the place where the system remembers what happened.
why this split exists
The write path is OLTP. The read path is OLAP. Netsky does not pretend those are the same workload.
- Turso SQLite handles writes to ~/.netsky/meta.db, enables WAL, and sets a 10-second busy timeout so concurrent local writers do not turn every short lock into a failure (src/crates/netsky-db/README.md:7-10, src/crates/netsky-db/src/lib.rs:1761-1774).
- DataFusion reads Arrow RecordBatch snapshots built from the stored JSON rows and registers them as in-memory tables for SQL queries (src/crates/netsky-db/README.md:8-9, src/crates/netsky-db/src/lib.rs:2053-2068).
- DuckDB attaches the same SQLite file and projects JSON fields into temporary views for ad hoc operator queries (scripts/meta-db.py:148-153, scripts/meta-db.py:202-225).
- If a write misses Turso, netsky appends the failed record to ~/.netsky/logs/meta-db-errors-<date>.jsonl and returns Ok instead of dropping the event on the floor (src/crates/netsky-db/README.md:10, src/crates/netsky-db/src/lib.rs:1527-1542, src/crates/netsky-db/src/lib.rs:1966-1978).
flowchart LR
P[process]
R[Db::record_*]
D["~/.netsky/meta.db"]
F["meta-db-errors-YYYY-MM-DD.jsonl"]
Q[DataFusion or DuckDB]
P --> R
R --> D
R -. write failure .-> F
D --> Q
The result is simple. Writes stay cheap and durable. Reads get a columnar query surface without forcing the writer path to look like a warehouse.
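The write path and its fallback can be sketched in a few lines. This is a minimal Python illustration using the standard-library sqlite3 module, not netsky's Rust writer or Turso. The names `open_writer` and `record`, the `messages` table used for the demo, and the shape of the fallback record are assumptions made for the example.

```python
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def open_writer(db_path: Path) -> sqlite3.Connection:
    """Open a connection configured like the write path described above."""
    # timeout=10.0 asks SQLite to wait up to 10 seconds on a locked database
    # before raising, instead of failing on the first short lock.
    conn = sqlite3.connect(db_path, timeout=10.0)
    # WAL lets readers keep reading while a writer appends.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id INTEGER PRIMARY KEY, row_json TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record(conn: sqlite3.Connection, table: str, row: dict, error_dir: Path) -> bool:
    """Insert one row; on failure, append it to a dated JSONL file instead.

    Returns True if the row reached the database and False if it went to the
    fallback file. Either way the caller does not see an exception.
    `table` is assumed to come from trusted code, never from user input.
    """
    payload = json.dumps(row, sort_keys=True)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"INSERT INTO {table} (id, row_json) VALUES (?, ?)",
                (row["id"], payload),
            )
        return True
    except sqlite3.Error as exc:
        date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        error_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical fallback record shape: table, error text, original row.
        fallback = {"table": table, "error": str(exc), "row": row}
        with open(error_dir / f"meta-db-errors-{date}.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(fallback) + "\n")
        return False
```

The point of the sketch is the control flow: the caller gets a normal return in both cases, and a failed write leaves a replayable line on disk instead of disappearing.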
the writer API
The writer surface is the contract. It records concrete system verbs, not vague “event” blobs (src/crates/netsky-db/README.md:56-76).
| function | captures |
|---|---|
| Db::record_message | source message envelopes across bus, iMessage, email, iroh, and demos |
| Db::record_cli | command, argv JSON, exit code, duration, host |
| Db::record_crash | crash kind, agent, detail JSON |
| Db::record_tick | ticker and watchdog ticks |
| Db::record_workspace | workspace create and delete lifecycle |
| Db::record_session | agent up, down, and note events |
| Db::record_clone_dispatch | clone start, finish, branch, status, brief metadata |
| Db::record_harvest_event | cherry-pick and harvest results |
| Db::record_communication_event | normalized agent, iMessage, and email audit trail |
| Db::record_mcp_tool_call | MCP tool timing, success, errors, timeout races |
| Db::record_git_operation | local git mutations and pushes |
| Db::record_owner_directive | trusted owner text and resolved action |
| Db::record_token_usage | per-event tokens, runtime, model, cost |
| Db::record_token_usage_batch | bulk token ingestion in one id reservation and transaction |
| Db::record_watchdog_event | watchdog transitions and escalations |
| Db::record_source_error | bounded source error classes such as timeout and auth_failure |
| Db::record_iroh_event | bounded iroh handshake events with hashed peer id |
Two structured helpers sit beside those row writers: source cursor reads and writes, plus event-log insert, delivery update, and tail operations (src/crates/netsky-db/README.md:75-76).
the row shape
Most tables use one storage pattern:
CREATE TABLE IF NOT EXISTS <table> ( id INTEGER PRIMARY KEY, row_json TEXT NOT NULL )
That is the real migration code for the row-backed tables (src/crates/netsky-db/src/lib.rs:793-799). The insert path serializes a typed Rust row and writes it as row_json (src/crates/netsky-db/src/lib.rs:1522-1542).
This shape is deliberate. Fields live inside JSON so new keys can appear without forcing a table rewrite on every schema turn. The exceptions are source_cursors and events, which keep real columns because the CLI needs direct field reads and those tables have a small fixed shape (src/crates/netsky-db/src/lib.rs:814-838).
Here is one real messages.row_json object from the live database on April 19, 2026:
{ "id": 1776625628327431, "ts_utc": "2026-04-19T19:07:08.285Z", "source": "agent", "direction": "inbound", "chat_id": "agentcron", "from_agent": "agentcron", "to_agent": null, "body": "hourly status: in one short paragraph, summarize active clones...", "raw_json": "{\"chat_id\":\"agentcron\",\"from\":\"agentcron\",\"ts\":\"2026-04-19T19:07:08.147203+00:00\"}" }
The MessageRow writer shows the same field set in code: ts_utc, source, direction, chat_id, from_agent, to_agent, body, and raw_json (src/crates/netsky-db/src/lib.rs:846-857).
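To illustrate why the (id, row_json) shape tolerates schema drift, here is a small Python sketch using the standard-library sqlite3 module. Plain dicts stand in for the typed Rust rows. The helper names and the extra `thread_id` key are hypothetical and exist only for this example.

```python
import json
import sqlite3


def create_row_table(conn: sqlite3.Connection, table: str) -> None:
    # Same two-column shape as the migration shown above.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id INTEGER PRIMARY KEY, row_json TEXT NOT NULL)"
    )


def insert_row(conn: sqlite3.Connection, table: str, row_id: int, row: dict) -> None:
    # The whole row is serialized; the table never learns individual field names.
    conn.execute(
        f"INSERT INTO {table} (id, row_json) VALUES (?, ?)",
        (row_id, json.dumps(row)),
    )


def load_rows(conn: sqlite3.Connection, table: str) -> list:
    cursor = conn.execute(f"SELECT row_json FROM {table} ORDER BY id")
    return [json.loads(row_json) for (row_json,) in cursor]


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    create_row_table(conn, "messages")
    # A row with the documented field set.
    insert_row(conn, "messages", 1, {
        "id": 1, "ts_utc": "2026-04-19T19:07:08.285Z", "source": "agent",
        "direction": "inbound", "chat_id": "agentcron", "from_agent": "agentcron",
        "to_agent": None, "body": "hourly status", "raw_json": "{}",
    })
    # A later row carrying an extra, hypothetical key. No ALTER TABLE is needed.
    insert_row(conn, "messages", 2, {
        "id": 2, "ts_utc": "2026-04-19T20:07:08.285Z", "source": "agent",
        "direction": "inbound", "chat_id": "agentcron", "from_agent": "agentcron",
        "to_agent": None, "body": "hourly status", "raw_json": "{}",
        "thread_id": "t-1",
    })
    return load_rows(conn, "messages")
```

Readers that do not know about the new key simply ignore it, and readers that do can treat its absence in older rows as null.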
the read surfaces
Three surfaces matter day to day.
netsky query opens the database read-only, snapshots batches, and emits plain tables or JSON envelopes (src/crates/netsky-cli/src/cmd/query.rs:9-32):
netsky query "SELECT source, COUNT(*) AS n FROM messages GROUP BY source"
scripts/meta-db.py attaches SQLite into DuckDB, builds meta_* views by extracting JSON keys, and ships a few operator subcommands on top (scripts/meta-db.py:148-153, scripts/meta-db.py:195-225, scripts/meta-db.py:568-597):
uv run scripts/meta-db.py recent --hours 6 --limit 20
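The view-projection idea can be sketched without DuckDB. The example below is not the code in scripts/meta-db.py. It uses SQLite's own `json_extract` through Python's sqlite3 module as a stand-in, and the view name `meta_messages` and both helper functions are assumptions for illustration. It also assumes the SQLite build bundled with Python includes the JSON functions, which is the default in current releases.

```python
import json
import sqlite3

# Field names taken from the MessageRow description above.
MESSAGE_FIELDS = (
    "ts_utc", "source", "direction", "chat_id", "from_agent", "to_agent", "body",
)


def create_messages_view(conn: sqlite3.Connection) -> None:
    """Project JSON keys out of row_json into named columns of a temporary view."""
    columns = ", ".join(
        f"json_extract(row_json, '$.{field}') AS {field}" for field in MESSAGE_FIELDS
    )
    conn.execute(
        f"CREATE TEMP VIEW IF NOT EXISTS meta_messages AS "
        f"SELECT id, {columns} FROM messages"
    )


def count_by_source(conn: sqlite3.Connection) -> list:
    """Run the same kind of GROUP BY an operator would type against the view."""
    cursor = conn.execute(
        "SELECT source, COUNT(*) AS n FROM meta_messages "
        "GROUP BY source ORDER BY source"
    )
    return cursor.fetchall()
```

Because the view is temporary, nothing is written back to the database file; the projection exists only for the lifetime of the reading connection.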
netsky watchdog events reads the durable JSONL trail, not meta.db, so watchdog forensics survive a database outage (src/crates/netsky-cli/src/cmd/watchdog_events.rs:29-76, src/crates/netsky-cli/src/cmd/watchdog.rs:552-576):
netsky watchdog events --since 6h --json
There is also a daily rollup layer. netsky analytics daily aggregates one UTC day out of meta.db into JSON and HTML under ~/.netsky/analytics/ and can emit a Zorto page into website/content/analytics/ (src/crates/netsky-cli/src/cmd/analytics.rs:20-50, src/crates/netsky-cli/src/cmd/analytics.rs:136-166).
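As a rough picture of what a one-day rollup involves, the sketch below opens a SQLite file read-only and counts messages per source for one UTC day. It is not the implementation in analytics.rs. The function names and the output keys are assumptions, and it relies only on the `ts_utc` and `source` fields shown in the sample row.

```python
import json
import sqlite3
from collections import Counter
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    # mode=ro makes SQLite reject any write attempted on this connection.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def daily_message_rollup(conn: sqlite3.Connection, day: str) -> dict:
    """Count messages per source for one UTC day given as 'YYYY-MM-DD'.

    ts_utc is an ISO-8601 UTC timestamp, so its first ten characters are
    the UTC date and a prefix comparison is enough.
    """
    by_source = Counter()
    for (row_json,) in conn.execute("SELECT row_json FROM messages"):
        row = json.loads(row_json)
        if str(row.get("ts_utc") or "")[:10] == day:
            by_source[row.get("source") or "unknown"] += 1
    return {
        "day": day,
        "total": sum(by_source.values()),
        "messages_by_source": dict(sorted(by_source.items())),
    }
```

Calling `json.dumps(daily_message_rollup(conn, "2026-04-19"), indent=2)` would produce the kind of JSON document a report generator could then render to HTML.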
what gets captured
Schema v7 has 19 tables. Seventeen use the (id, row_json) shape. Two are structured support tables (src/crates/netsky-db/README.md:27-46, src/crates/netsky-db/src/lib.rs:33-51, src/crates/netsky-db/src/lib.rs:814-838).
| table | one-line purpose |
|---|---|
| messages | inbound and outbound message envelopes |
| cli_invocations | every CLI run with argv, exit, duration, host |
| crashes | crash kind, agent, detail |
| ticks | ticker and watchdog heartbeat rows |
| workspaces | workspace lifecycle |
| sessions | agent session lifecycle |
| clone_dispatches | clone brief and execution lifecycle |
| harvest_events | harvest and cherry-pick outcomes |
| communication_events | normalized comms audit log |
| mcp_tool_calls | MCP request and response timing |
| git_operations | git mutations and pushes |
| owner_directives | trusted owner directives and resolution |
| token_usage | model usage and cost rows |
| watchdog_events | watchdog state transitions |
| netsky_tasks | local task tracker rows, exposed as tasks in DataFusion |
| source_errors | bounded per-source failures |
| iroh_events | iroh connects, evicts, reconnects, refusals |
| source_cursors | durable per-source cursor state |
| events | per-source delivery log with pending, delivered, failed |
That inventory is wider than “logs.” It is enough to answer operator questions about control, cost, delivery, failure, and audit without scraping tmux panes or grepping random files.
limits
This is not a streaming telemetry stack. There is no real-time subscription surface yet. There is no retention or eviction policy yet. There are no resident dashboards. The primary interfaces are SQL, small CLIs, and generated daily reports (src/crates/netsky-db/README.md:119-126, src/crates/netsky-cli/src/cmd/analytics.rs:136-166).
That restraint is part of the design. The first job is to make the system tell the truth in one durable place.
The observability spine enables the next layer cleanly: cron-fire monitors, token-usage rollups, owner-visible audit trails, and watchdog forensics that survive the moment they are needed.