audit dependencies like code

Netsky is lean on the surface and heavy in the core. The audit snapshot counted 8 workspace crates, 40 direct dependencies, 638 resolved packages, a 67 MB release binary, and roughly 10 GB in target/debug/deps. The shape is not accidental. A few large bets buy real behavior, and a few of them may not earn their rent.

The workspace fits on one screen. The root manifest declares 8 crates — netsky-core, netsky-prompts, netsky-channels, netsky-ai, netsky-cli, netsky-sh, netsky-io, netsky-db (Cargo.toml:1-12) — and a short shared-dep table naming the obvious base layer: anyhow, chrono, clap, croner, dirs, fs4, indicatif, owo-colors, serde, serde_json, toml, tracing, tracing-subscriber, which (Cargo.toml:26-40).

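```shell
# Direct dependencies: unique names declared across the workspace manifests
# (path dependencies between workspace crates are included by this filter).
cargo metadata --no-deps --format-version 1 \
  | jq '[.packages[].dependencies[] | select(.source != null or .path != null) | .name] | unique | length'

# Resolved packages: one `name = ` line per lockfile entry.
rg -c '^name = ' Cargo.lock
# Build cache size, then release binary size.
du -sh target/debug/deps
ls -lh target/release/netsky
```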

That is the audit loop: count direct names, count the lockfile, look at the build cache, look at the binary. The snapshot came out at 40 direct and 638 resolved, roughly 16 resolved packages per direct dependency. That ratio is the number that matters; a branch-level off-by-one from adding or removing one small crate is not.
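The same two counts can be checked in code. This is a minimal sketch, not part of the audit tooling: `count_resolved` and `fanout` are hypothetical helper names, and the lockfile is assumed to keep one `name = ` line per package entry, as the `rg` command above relies on.

```rust
/// Count lockfile entries by counting lines that start with `name = `,
/// mirroring `rg -c '^name = ' Cargo.lock`.
fn count_resolved(lock_text: &str) -> usize {
    lock_text
        .lines()
        .filter(|line| line.starts_with("name = "))
        .count()
}

/// Resolved packages per direct dependency.
/// Returns None when there are no direct dependencies to divide by.
fn fanout(direct: usize, resolved: usize) -> Option<f64> {
    if direct == 0 {
        None
    } else {
        Some(resolved as f64 / direct as f64)
    }
}
```

With the snapshot numbers, `fanout(40, 638)` comes out at 15.95. Tracking that one value across branches is less noisy than tracking either raw count.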

```mermaid
flowchart LR
    A[8 workspace crates] --> B[40 direct deps]
    B --> C[638 resolved packages]
    C --> D[67 MB binary]
    B --> E[3 heavy bets]
    E --> F[datafusion turso iroh]
```

the heavy hitters

Most dependencies do not decide the system’s cost. Three stacks do.

datafusion brings the Arrow query stack into netsky-db (src/crates/netsky-db/Cargo.toml:10-20). Netsky uses it through one narrow path: snapshot storage tables, register them into a SessionContext, run ctx.sql(sql), collect RecordBatch output (src/crates/netsky-db/src/lib.rs:1896-1904, src/crates/netsky-db/src/lib.rs:2651-2728). No custom optimizers, no UDFs. Nothing in the observed workload needs a general analytical engine on every install.

turso is the durable database bet, next to datafusion and tokio in netsky-db (src/crates/netsky-db/Cargo.toml:10-20). Its cost is not one crate name: it is a native SQLite shape, an async database surface, crypto-adjacent transitive packages, and the future option value of remote sync. If local SQLite is all netsky needs, rusqlite wins on cost. The CLI already carries rusqlite with bundled SQLite (src/crates/netsky-cli/Cargo.toml:38-41), and the database crate lists rusqlite in its dev-dependencies (src/crates/netsky-db/Cargo.toml:22-24), so the alternative stays visible.

iroh is the network bet. It appears in netsky-channels and netsky-io pinned with default features off and metrics on (src/crates/netsky-channels/Cargo.toml:13-30, src/crates/netsky-io/Cargo.toml:14-35), buying QUIC, TLS 1.3, node identity, and cross-machine transport. Expensive, but the system has a network story — a root constellation relays to siblings, and secure machine-to-machine envelopes justify a heavy encrypted transport.

The table:

| bet | what it buys | cost shape | verdict |
| --- | --- | --- | --- |
| turso | durable SQLite-compatible store and future remote sync | native database stack plus async surface | keep on probation |
| datafusion | SQL over Arrow RecordBatch snapshots | 245-package Arrow stack in the audit probe | most likely pivot |
| iroh | QUIC/TLS cross-machine envelopes | networking and crypto transitive load | keep if the network stays first-class |
| clap | broad nested CLI parsing | derive macros and command graph machinery | keep |
| reqwest | boring HTTP, JSON, TLS, proxy behavior | medium stack, repeated in source clients | medium keep |
| tokio | async runtime where the IO surface needs it | runtime plus feature spread | keep where runtime paths use it |
| serde, anyhow, thiserror | data model and error floor | low cost and high leverage | keep |

the small cuts

Two XS cuts surfaced: tokio in netsky-channels and croner in netsky-cli, both dead per cargo machete. Hygiene work, not the prize.

Dead dependencies are lint failures. Heavy dependencies that earn their keep are architecture choices; ones that no longer do are technical debt with a manifest entry.
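Treating dead dependencies as lint failures reduces to a set difference: names declared in the manifest minus names the source actually uses. The sketch below shows only that core idea. It is not how cargo machete works internally, and `unused_deps` is a hypothetical helper that assumes the used names have already been collected some other way.

```rust
use std::collections::BTreeSet;

/// Return declared dependency names that never appear in the used set.
/// Hyphens and underscores are treated as equivalent, since a crate named
/// `owo-colors` is referenced in source as `owo_colors`.
fn unused_deps<'a>(declared: &[&'a str], used: &[&str]) -> Vec<&'a str> {
    let used: BTreeSet<String> = used.iter().map(|n| n.replace('-', "_")).collect();
    declared
        .iter()
        .copied()
        .filter(|d| !used.contains(&d.replace('-', "_")))
        .collect()
}
```

A CI step can fail the build whenever the returned list is non-empty, which is what makes the check a lint rather than a cleanup project.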

The bigger local win is source duplication. netsky-channels owns shared auth, policy, and channel clients (src/crates/netsky-channels/Cargo.toml:1-11). netsky-io owns channel servers and transports (src/crates/netsky-io/Cargo.toml:1-12), and its sources module declares first-class modules for calendar, drive, email, imessage, tasks (src/crates/netsky-io/src/sources/mod.rs:7-14). Drive lives in both trees with the same REST shape (src/crates/netsky-channels/src/drive/ops.rs:1-12, src/crates/netsky-io/src/sources/drive/ops.rs:1-13). Not a dependency-count problem — owned code duplication. Removing it cuts maintenance load without touching product behavior.

the feature gate that changes the default

The best dependency cut is the one users do not notice.

The CLI now has default = [] and a parquet-export feature that pulls in arrow, arrow-csv, and parquet only when requested (src/crates/netsky-cli/Cargo.toml:17-20, src/crates/netsky-cli/Cargo.toml:38-40). CSV analytics export stays always-on. Parquet export becomes an opt-in build.

The code mirrors the policy. Analytics export always writes CSV and routes Parquet through a feature-gated writer (src/crates/netsky-cli/src/cmd/analytics.rs:240-277, src/crates/netsky-cli/src/cmd/analytics.rs:299-337, src/crates/netsky-cli/src/cmd/analytics.rs:348-395). The test asserts both modes: CSV exists, Parquet bytes are zero when the feature is off (src/crates/netsky-cli/src/cmd/analytics.rs:4483-4524). CSV stays the default contract; Parquet is a build choice, and the shipped binary doesn’t carry a columnar export stack on the chance a dashboard wants it later.
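The gating pattern described above can be sketched as follows. This is an illustration under assumptions, not the code in analytics.rs: the `ExportReport` type, the function names, and the row shape are all hypothetical, and the Parquet branch is left unimplemented because the real writer would need the parquet crate.

```rust
/// Byte counts produced by one export run (hypothetical type).
struct ExportReport {
    csv_bytes: usize,
    parquet_bytes: usize,
}

/// CSV is always compiled in. Sketch only: no quoting or escaping.
fn write_csv(rows: &[(&str, u64)], out: &mut Vec<u8>) -> usize {
    let start = out.len();
    out.extend_from_slice(b"name,count\n");
    for (name, count) in rows {
        out.extend_from_slice(format!("{name},{count}\n").as_bytes());
    }
    out.len() - start
}

/// Compiled only when the `parquet-export` feature is enabled.
#[cfg(feature = "parquet-export")]
fn write_parquet(_rows: &[(&str, u64)], _out: &mut Vec<u8>) -> usize {
    // A real build would call into the parquet crate here; elided in this sketch.
    unimplemented!("parquet writer is only compiled with the parquet-export feature")
}

/// Default build: the Parquet path is a no-op that writes zero bytes.
#[cfg(not(feature = "parquet-export"))]
fn write_parquet(_rows: &[(&str, u64)], _out: &mut Vec<u8>) -> usize {
    0
}

/// Run both writers and report how many bytes each produced.
fn export(rows: &[(&str, u64)]) -> ExportReport {
    let mut csv = Vec::new();
    let mut parquet = Vec::new();
    ExportReport {
        csv_bytes: write_csv(rows, &mut csv),
        parquet_bytes: write_parquet(rows, &mut parquet),
    }
}
```

Callers never branch on the feature themselves. They call `export` and read the report, so a default build and an opt-in build share the same call sites.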

the big-bets verdict

clap stays. The command tree is broad, nested, and operator-facing — constellation lifecycle, watchdog, ticks, cron, loop, launchd, handoffs, iMessage, email, calendar, Drive, iroh, channels, IO, config, prompts, observability, hidden namespaces (src/crates/netsky-cli/src/cli.rs:8-90, src/crates/netsky-cli/src/cli.rs:250-299). Replacing that with a tiny parser saves dependencies and spends them back in validation bugs and help-text drift.

reqwest probably stays. The channel and IO crates use it with blocking, JSON, and rustls TLS features (src/crates/netsky-channels/Cargo.toml:22, src/crates/netsky-io/Cargo.toml:25) to buy boring correctness for OAuth-backed Google APIs, Drive uploads, Gmail, and other HTTP paths. Worth paying for until the call surface is narrow enough to own.

rustls stays indirectly, through reqwest and the network stack. Writing TLS code to win a lockfile argument is the wrong trade.

tokio stays where async is real. netsky-db uses a runtime to execute DataFusion queries (src/crates/netsky-db/src/lib.rs:1896-1904); netsky-io uses async servers. Runtime deps earn their place from runtime behavior, not habit.

turso is on probation. Right bet for remote-compatible database behavior, wrong bet if the durable store is local SQLite plus a small query layer. The migration question: what exact Turso behavior breaks if the facade moves to rusqlite?
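One way to make the migration question answerable is to keep callers behind a small storage trait, so a backend swap touches one impl block. The sketch below is a hypothetical shape, not the netsky-db API: `DurableStore`, `MemoryStore`, and `record_handoff` are invented names, the interface is synchronous for brevity (Turso's surface is async), and the in-memory map stands in for either backend.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage facade: callers see this trait, never the backend crate.
trait DurableStore {
    fn put(&mut self, key: &str, value: &[u8]) -> Result<(), String>;
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String>;
}

/// In-memory stand-in. A Turso-backed or rusqlite-backed type would
/// implement the same trait.
#[derive(Default)]
struct MemoryStore {
    rows: BTreeMap<String, Vec<u8>>,
}

impl DurableStore for MemoryStore {
    fn put(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        self.rows.insert(key.to_string(), value.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String> {
        Ok(self.rows.get(key).cloned())
    }
}

/// Caller code written against the facade only.
fn record_handoff(store: &mut dyn DurableStore, id: &str, body: &str) -> Result<(), String> {
    store.put(&format!("handoff/{id}"), body.as_bytes())
}
```

Any Turso behavior that cannot be expressed through a trait like this is a candidate answer to the question above.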

datafusion is the most likely pivot. Current use is a read snapshot, a SQL string, a collected result — 245 packages for that. A dozen fixed analytics queries probably lose to a small query facade or direct SQLite views. Arbitrary owner-authored SQL over observability tables earns DataFusion more room.
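If the workload really is a dozen fixed queries, the small query facade can be as plain as an enum that maps each report to one reviewed SQL string. This is a sketch under that assumption. The `Report` variants, table names, and column names are invented for illustration and do not come from the netsky schema.

```rust
/// Hypothetical fixed analytics queries; table and column names are invented.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Report {
    EventsPerDay,
    SlowestTicks { limit: u32 },
}

impl Report {
    /// Each variant maps to one reviewed SQL string; no caller passes raw SQL.
    fn sql(self) -> String {
        match self {
            Report::EventsPerDay => {
                "SELECT date(ts) AS day, count(*) AS n FROM events GROUP BY day ORDER BY day"
                    .to_string()
            }
            // `limit` is a u32, so formatting it into the string cannot inject SQL.
            Report::SlowestTicks { limit } => format!(
                "SELECT id, duration_ms FROM ticks ORDER BY duration_ms DESC LIMIT {limit}"
            ),
        }
    }
}
```

The strings could run against SQLite views just as well as a DataFusion context. The point of the enum is that the set of queries becomes countable, which is the evidence the pivot decision needs.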

vendor candidates

The 2026 rule: AI makes small vendored maintenance real. A 1,000-line crate with tests, a narrow API, and one owner is tractable now. A 50,000-line tree of decisions we don’t use isn’t automatically cheaper.

Ranked by payoff and blast radius:

  1. clap replacement or slim fork: 1.5k to 3k LoC. This is the biggest possible win, but the current CLI is broad enough that the replacement must preserve help, aliases, enums, defaults, and nested subcommands before it deserves a branch.
  2. reqwest replacement: 300 to 800 LoC for the actual surface if the source clients mostly need authenticated JSON, multipart upload, and a small error model. This becomes attractive after source-client dedupe.
  3. Analytics export stack: CSV-only default is already right. Keep Parquet opt-in. If the opt-in still pulls too much, isolate it behind a tiny internal writer boundary.
  4. Turso local-only facade: 600 to 1,200 LoC if local SQLite is the only durable requirement. Keep the facade honest so the backend can move without touching callers.
  5. DataFusion slim fork: not first. Pivot before vendoring — a narrow query path wants narrower code, not a private copy of a large engine.

The best vendor target isn’t the biggest dependency. It’s the one with a small used surface, stable behavior, clear tests, and a costly transitive graph.

the rule

Netsky is a systems project and buys hard things when the cost/benefit is overwhelming: TLS, QUIC, OAuth-grade HTTP, CLI help, serialization, structured errors, database durability. The rule is stricter than “keep it lean.” Every big bet needs a written reason, and every written reason should survive a fresh lockfile count, a binary-size check, and one uncomfortable question: would we rather maintain 1,500 lines we understand, or build around 245 packages we barely touch?

One heavy bet can cost weeks of build time, compounded over years. Audit them like code.