Codenomics — Know what each shipped change actually costs

The dashboard your logs become

One command. The whole picture.

Codenomics command center: blended true cost per commit $16.60 down 12%, model mix, attention load 58%, daily burn, best session $9.40 vs worst $48.10 per commit, a breached weekly budget, and a recommendation to route automated jobs to a cheaper model. Illustrative preview.

Local-first by design Runs on your machine No code uploaded No prompts uploaded No transcripts uploaded No proxy No account to start Open source

The blind spot

Your agents are the biggest line item. What are you getting for it?

Coding agents now out-spend every other AI tool in engineering. Teams burn tens of thousands a month and can't answer the only question that matters: is the expensive model actually worse value? Tokens and traces don't tell you. Shipped work does.

True $ / commit illustrative

$16.60

compute $ + prompts × attention $ + active time × hourly $
──────────────────────────────────
commits shipped

Why we built it

It started as one stubborn question.

When a model costs more on paper, how do you know if it's actually more expensive to ship with?

That question landed during the Fable 5 launch window. The new model burned more usage credits, so on the invoice it looked pricier. But the real question was practical: if it ships the same work with fewer prompts, fewer corrections, and less supervision, isn't it the cheaper one? The token meter couldn't say. So we built something on a laptop to find out — first as a throwaway script called ClaudeStats, then as the tool this became.

THE TRIGGER

Fable 5 launch

A "more expensive" model that might ship cheaper. No way to check.

THE EXPERIMENT

ClaudeStats

A local script reading agent logs against git activity to answer it.

THE TOOL

Codenomics, local-first

True cost per commit across Claude Code, Codex, and Gemini — on your machine.

Aggregates-only peer comparison, for design partners shaping the baseline.

The counterintuitive part

The model that burns more tokens can be the cheaper one.

A model that costs 2× per token but needs half the prompts, fewer corrections, and less of your time to ship the same commit wins on true cost. That comparison is invisible on a token meter — and it's the whole point of Codenomics.

From real usage

In one developer's 7-week dataset — 884 sessions, 811 commits across Claude Code and Codex — the model priced 2× higher per token shipped commits at ~⅓ lower true cost than a model half its token price, because it needed fewer prompts and less hands-on time. The compute bill alone wouldn't show it. How this is measured →

Single-developer sample; drivers set to $4/prompt and $300/hr; "commit" is a proxy for shipped work (see methodology). Your numbers depend on your drivers and task mix — run npx codenomics report on your own logs.

An iceberg of AI coding cost. Above the waterline, the small visible part billed to you: tokens, usage credits, plan price. Below the waterline, the large hidden part: babysitting and supervision, prompt retries, context resets, reviewing bad diffs, debugging failed runs, rework and reverts. — The biggest AI coding cost is often the human wrapped around the model — and it's the half a token meter never shows.

How it works

Three commands. No account. No upload.

Codenomics reads the logs your agents already write to disk and turns them into economics — entirely locally.

1

Init

Detects which agents you run and writes a config. Nothing leaves your machine.

$ npx codenomics init

2

Index

Parses Claude Code, Codex, and Gemini logs into per-session economics in seconds.

$ npx codenomics index

3

See it

Local dashboard, or generate weekly/monthly reports for the team and Slack.

$ npx codenomics serve

What you get

Which model actually ships cheapest?

Every model ranked by true $/commit — lowest wins. The cheapest tokens often land at the bottom: they burn more prompts and more of your time to ship the same work.

Model economics leaderboard ranked by true cost per commit: Opus wins at $14.20 despite premium tokens; Haiku reads $21.40 because it needs 14.8 prompts per commit; Gemini shows a dash because its commits are not detectable. Illustrative.

See the full dashboard — leaderboard, session box scores, daily burn →

The framing

Read your agents like a P&L.

Output on one side, what it took on the other, and a bottom line you can act on. The output is the work that shipped; the cost is compute plus the human attention and time around it. Divide one by the other and you get the only number that compares models honestly — true $/commit.

Commits are the deliverable today. PR-, bug- and revert-aware ground truth is on the roadmap — and labeled as such, never faked.

An AI coding agent P&L: output is 188 commits shipped (PRs, bugs and features marked roadmap); cost is $1,240 compute plus $1,880 attention equals $3,120 true cost; bottom line true cost per commit $16.60. Illustrative.

What's inside

Economics, not token counts.

⚖️

True cost per deliverable

Compute cost plus the human attention and time behind each commit — the one number that ranks models and agents by value, not volume.

🧩

Every agent, one view

Claude Code, Codex CLI, and Gemini CLI normalized into a single model — including subagent work most tools miss entirely.

🎛️

Drivers you control

What's a prompt of your attention worth? Your loaded hourly rate? Set the inputs; every metric updates instantly. Your economics, your assumptions.

🚦

Budgets & alerts

Dollar or token limits per day, week, or month — globally or per project. Breaches fail a one-line cron check, so overruns surface before the invoice.

📊

Reports that explain

Weekly and monthly Markdown + HTML with prior-period deltas, top sessions, and plain-English findings — "route these jobs to a cheaper model, save $X." Slack-ready.

🔒

Local-first by design

Prompts, code, and transcripts never leave your machine. The dashboard binds to localhost. Read the source and verify it yourself.

🔒

Your code never leaves your machine.

Codenomics reads logs that already exist on disk and derives metrics locally. Nothing is uploaded by any command today. When team sync arrives, it sends aggregate token/cost numbers and project labels only — never prompts, code, or transcripts (project labels are path-derived and can be hashed). See the privacy model →

Coming with Team

Your own data tells you what you spent. The benchmark tells you whether it's good.

Run it locally and you learn your true $/commit and which of your models wins. The one question a single machine can't answer is "compared to what?" — that needs a view across many teams. It's the one number you can't compute alone, and the reason Team exists.

How it stays private

The benchmark is built only from opt-in, aggregates-only sync — token and outcome counts per day, model, and project label. Prompts, code, and transcripts never leave any machine; inspect the exact payload with codenomics sync --json. Contribute anonymized aggregates, see where you stand.

Early and growing: we're seeding the baseline from design partners and a multi-month founder dataset, and we show sample size honestly — no "industry standard" claims at small n. Join the founding cohort → Design partners get 3 months of Team free and help define the baseline.

Who it's for

Built for workflow economics — not surveillance.

Codenomics measures whether your AI coding tools are saving time or quietly creating supervision debt. It is not a way to watch developers, and not another token dashboard.

✓ For

Solo builders leaning hard on AI coding agents
Engineering leads comparing models and agents on real cost
Founders managing agent spend against what actually ships
Teams asking whether AI tooling saves time or adds supervision debt

Not for

Production LLM API tracing or request-level observability
Employee surveillance or keystroke monitoring
Generic chatbot or product analytics
Teams who only want a raw token meter

Get started

Free for individuals. Forever.

The local tool is open-source and free. Team adds the benchmark — how your agent economics compare across the field — plus org-wide rollups, for the leaders who own the budget.