Methodology

How we turn logs into economics.

No black box. Every number Codenomics shows is derived from your agents' own logs with rules you can read, audit, and override. Here's exactly how.

The headline metric: true cost per commit

Token dashboards answer "how much did I spend?" Codenomics answers "what did each shipped change cost me?" — which is the question that actually compares one model, agent, or workflow against another.

true $/commit = ( compute $ + prompts × attention $ + active-time × hourly $ ) / commits
In the example figure, this works out to $16.60 per commit.
Every input traces to your own logs and drivers — change a driver and every metric re-derives.

Three inputs, three sources of truth:

The result: a model that costs more per token but needs fewer prompts and less of your attention to ship the same commit comes out cheaper. That inversion is the entire reason the metric exists.
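
As a rough sketch of the formula and the inversion described above, the comparison below uses two entirely hypothetical models. The function name, driver values, and every figure are illustrative assumptions, not Codenomics output.

```python
def true_cost_per_commit(compute_usd, prompts, attention_usd_per_prompt,
                         active_hours, hourly_usd, commits):
    """Apply the true $/commit formula to one reporting window."""
    if commits <= 0:
        raise ValueError("true $/commit is undefined without at least one commit")
    human_usd = prompts * attention_usd_per_prompt + active_hours * hourly_usd
    return (compute_usd + human_usd) / commits


# Hypothetical drivers: what one prompt of attention and one hour are worth.
ATTENTION_USD = 0.50
HOURLY_USD = 80.0

# Hypothetical "cheap" model: low compute bill, but many prompts and hours.
cheap = true_cost_per_commit(10.0, 200, ATTENTION_USD, 5.0, HOURLY_USD, 20)

# Hypothetical "premium" model: 4x the compute bill, far less supervision.
premium = true_cost_per_commit(40.0, 80, ATTENTION_USD, 2.0, HOURLY_USD, 20)

print(f"cheap model:   ${cheap:.2f} per commit")    # $25.50
print(f"premium model: ${premium:.2f} per commit")  # $12.00
```

With these made-up inputs, the model with the larger compute bill is less than half the cost per commit once human time is priced in.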

Compute cost is cache-aware

Naïve token counting overstates cost by ignoring caching, which dominates real agent workloads. Codenomics prices each token class separately, per vendor:

Cache reads are priced at about 0.1x, fresh input and output at 1x, 5-minute cache writes at 1.25x, and 1-hour cache writes at 2x.
Built-in rates are per vendor and overridable — set exact $/MTok the day a price changes.
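
To make the multipliers concrete, here is a minimal sketch of cache-aware pricing next to naive counting. It assumes the cache multipliers scale the model's base input rate and that output has its own rate. The $3 and $15 per MTok prices, the token counts, and all names are hypothetical.

```python
from dataclasses import dataclass

# Multipliers from the text, assumed here to scale the base input rate.
CACHE_READ_X = 0.1        # the text says "about 0.1x"
CACHE_WRITE_5M_X = 1.25
CACHE_WRITE_1H_X = 2.0


@dataclass
class Usage:
    """Token counts for one session, split by token class."""
    fresh_input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0


def cache_aware_usd(u, input_usd_per_mtok, output_usd_per_mtok):
    """Price each token class at its own multiplier."""
    weighted_input = (
        u.fresh_input
        + u.cache_read * CACHE_READ_X
        + u.cache_write_5m * CACHE_WRITE_5M_X
        + u.cache_write_1h * CACHE_WRITE_1H_X
    )
    return (weighted_input * input_usd_per_mtok
            + u.output * output_usd_per_mtok) / 1_000_000


def naive_usd(u, input_usd_per_mtok, output_usd_per_mtok):
    """Treat every input-side token as fresh input (ignores caching)."""
    all_input = u.fresh_input + u.cache_read + u.cache_write_5m + u.cache_write_1h
    return (all_input * input_usd_per_mtok
            + u.output * output_usd_per_mtok) / 1_000_000


# Hypothetical session dominated by cache reads, at hypothetical prices.
usage = Usage(fresh_input=100_000, output=20_000,
              cache_read=2_000_000, cache_write_5m=200_000)
aware = cache_aware_usd(usage, 3.0, 15.0)
naive = naive_usd(usage, 3.0, 15.0)
print(f"cache-aware ${aware:.2f} vs naive ${naive:.2f}")
```

In this made-up session the naive figure is several times the cache-aware one, which is the overstatement the section describes.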
"API-equivalent" is deliberate. On a subscription plan your marginal dollar is zero. Read these figures as a normalized compute meter that makes models and agents comparable on the same scale — not as your invoice. Built-in prices drift as vendors change them; every model is overridable in config.

Drivers: your assumptions, made explicit

The honest answer to "what does a prompt of my attention cost?" is: it depends on you. So Codenomics doesn't hard-code it. Drivers are named inputs you control — and changing one re-derives every metric instantly, because cost is never baked into stored data, only computed at read time.
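
One way to picture "computed at read time": store only raw counts, and pass drivers in whenever a metric is requested. The sketch below is an assumption about the shape of that idea, with hypothetical field names and values. It is not Codenomics' actual schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StoredTotals:
    """Raw facts only. No dollar amount is persisted."""
    tokens: int
    prompts: int
    active_seconds: int
    commits: int


@dataclass(frozen=True)
class Drivers:
    """User-controlled assumptions (hypothetical names)."""
    usd_per_mtok: float
    attention_usd_per_prompt: float
    hourly_usd: float


def derive_cost_per_commit(t, d):
    """Compute the metric on demand from raw counts plus current drivers."""
    compute = t.tokens / 1_000_000 * d.usd_per_mtok
    attention = t.prompts * d.attention_usd_per_prompt
    time_cost = t.active_seconds / 3600 * d.hourly_usd
    return (compute + attention + time_cost) / t.commits


totals = StoredTotals(tokens=10_000_000, prompts=100,
                      active_seconds=7200, commits=10)
base = Drivers(usd_per_mtok=2.0, attention_usd_per_prompt=0.5, hourly_usd=60.0)

before = derive_cost_per_commit(totals, base)
# Change one driver; the stored record is untouched, the metric re-derives.
after = derive_cost_per_commit(totals, replace(base, hourly_usd=120.0))
print(before, after)  # 19.0 31.0
```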

This is the part an API proxy structurally can't see: the human side of the equation lives where the human is — on your machine, under your assumptions.

What counts as a "deliverable"

Today the proxy for shipped work is a commit, detected when the agent actually runs git commit in its tool calls, not when it is merely asked to. We match commit only as the git subcommand, so git log, git show, git config commit.template and git commit --dry-run don't count, and chained commands (git commit … && git commit …) count once per commit. Where an agent's logs can't reveal commits (Gemini's telemetry, for instance), the field is shown as unavailable rather than faked to zero.
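
The matching rules above could be approximated along these lines. This is a hedged sketch using Python's standard shlex module, not the detector Codenomics ships. The function names are hypothetical, and the handling of global options such as -C is an added assumption.

```python
import shlex

_OPERATOR_CHARS = set("();<>|&")          # shell control characters
_GLOBAL_OPTS_WITH_ARG = {"-C", "-c"}      # git global options that take a value


def _is_real_commit(argv):
    """True if argv is `git [global opts] commit ...` without --dry-run."""
    if not argv or argv[0] != "git":
        return False
    i = 1
    while i < len(argv) and argv[i].startswith("-"):
        i += 2 if argv[i] in _GLOBAL_OPTS_WITH_ARG else 1
    if i >= len(argv) or argv[i] != "commit":
        return False
    return "--dry-run" not in argv[i + 1:]


def count_git_commits(command):
    """Count `git commit` invocations in one shell command string."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:        # e.g. an unterminated quote: count nothing
        return 0

    # Split the token stream into simple commands at &&, ||, ;, | and so on.
    # Limitation: a quoted argument that is exactly an operator (-m "&&")
    # would be misread as a separator.
    segments, current = [], []
    for tok in tokens:
        if tok and set(tok) <= _OPERATOR_CHARS:
            segments.append(current)
            current = []
        else:
            current.append(tok)
    segments.append(current)
    return sum(_is_real_commit(seg) for seg in segments)


print(count_git_commits("git commit -m a && git commit -m b"))  # 2
print(count_git_commits("git config commit.template t.txt"))    # 0
```

Like the approach described in the text, this sees only the command string, so it cannot tell whether a commit actually succeeded.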

It's a proxy, and we're upfront about its limits: transcripts rarely carry exit status, so a commit that failed (nothing staged) can still count, and a squash/amend is indistinguishable from a new commit. On the roadmap: merged-PR ground truth via GitHub, so "deliverable" graduates from a good proxy to the real thing — including review cycles and reverts.

One model across every agent

Each agent writes a different, undocumented log format. Codenomics has a dedicated collector per agent that normalizes them into one schema:

These are fast-moving private formats, so parsers are tolerant by design: malformed lines are skipped, unknown events are counted as drift you can inspect, and a file that won't parse is quarantined without breaking the run.
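
A tolerant parser in this spirit might look like the sketch below. It assumes, purely for illustration, that logs are JSON Lines with a "type" field. The event names, function names, and the use of a decode failure as the quarantine trigger are all hypothetical.

```python
import json
from collections import Counter

KNOWN_EVENTS = {"prompt", "tool_call", "usage"}  # hypothetical event names


def parse_text(text):
    """Parse one file's text; never raise on bad lines or unknown events."""
    events, drift, skipped = [], Counter(), 0
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1                  # malformed line: skip it
            continue
        kind = record.get("type") if isinstance(record, dict) else None
        if isinstance(kind, str) and kind in KNOWN_EVENTS:
            events.append(record)
        else:
            drift[str(kind)] += 1         # unknown event: count as drift
    return events, drift, skipped


def collect(sources):
    """sources maps a file name to its raw bytes."""
    events, drift, skipped, quarantined = [], Counter(), 0, []
    for name, raw in sources.items():
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            quarantined.append(name)      # unreadable file: set aside, keep going
            continue
        file_events, file_drift, file_skipped = parse_text(text)
        events.extend(file_events)
        drift.update(file_drift)
        skipped += file_skipped
    return events, drift, skipped, quarantined
```

Returning the drift counter and the quarantine list alongside the events keeps every tolerated failure inspectable instead of silent.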

Human vs machine work

Interactive sessions and headless/CI agents have completely different economics — one consumes your attention, the other doesn't. Codenomics separates them from each session's entrypoint, so attention cost is only ever charged to work a human actually supervised. It's also how reports can tell you when automated jobs are quietly running on a premium model.

The privacy model

Codenomics is local-first, and that's a load-bearing design choice, not a tagline. This tool reads every transcript on your machine — so it has to be one you can trust by inspection.

Prompts, code, transcripts and file paths never leave your machine. Only opt-in aggregates (token, commit and prompt counts per model, plus a project label) reach the cross-org benchmark, and a benchmark figure is shown only when at least 5 orgs contribute to it (k ≥ 5).
The only bytes that can leave are opt-in aggregates — inspect them with codenomics sync --json.
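
The k-of-at-least-5 rule can be pictured as a simple suppression step on the benchmark side. The sketch below is an assumption about how such a threshold could be enforced. The row shape, org identifiers, and function name are hypothetical and do not describe the real sync payload.

```python
from collections import defaultdict

K_MIN_ORGS = 5  # threshold stated in the text


def publishable_models(rows):
    """rows: iterable of (org_id, model) pairs taken from opt-in aggregates.

    Returns only the models that at least K_MIN_ORGS distinct orgs reported.
    """
    orgs_by_model = defaultdict(set)
    for org_id, model in rows:
        orgs_by_model[model].add(org_id)
    return {model for model, orgs in orgs_by_model.items()
            if len(orgs) >= K_MIN_ORGS}


# Hypothetical contributions: model-x from 5 orgs, model-y from only 2.
rows = [(f"org{i}", "model-x") for i in range(5)]
rows += [("org0", "model-y"), ("org1", "model-y")]
print(publishable_models(rows))  # {'model-x'}
```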

The benchmark: how "is that good?" gets answered (Team, planned)

Locally, Codenomics tells you your true $/commit and which of your models wins. The question one machine structurally can't answer is whether that number is good — that needs a view across many teams. The Team benchmark answers it, built from the same opt-in, aggregates-only sync described above. Because it's the one feature that sends anything off-machine, here are the rules a technical buyer should hold us to:

Status: the local metrics above ship today; the cross-org benchmark is the Team plane (Phase 2), being built from founding-cohort opt-ins. Until the cohort can support a claim, the benchmark is labeled early and shows its sample size — no number is presented as a market norm before it is one.