How Many Tokens Did You Burn Today

Early in my career, a manager at one of the big firms where I worked made a request so absurd it remains etched in my memory. I walked back to the team, repeated what he had asked, and couldn't finish the story without laughing. He wanted me to create a pie chart, of lines of code, per developer, per week.

We all lost it. Our lead developer asked if, by any chance, the manager's eyes looked glassy. We laughed even harder. Because yes. Yes, they did. He was always high.

That was twenty years ago. I've repeated that story countless times, and it always drew chuckles as we discussed the disconnect between software teams and management. Any software engineer could relate. We all knew that lines of code were a meaningless metric. A junior could write a thousand lines of spaghetti. A senior could fix the same problem with forty elegant ones.

But then, last week, I found my name at the top of a leaderboard.

My employer had been exploring productivity tools and trialed one they thought would be useful. After the trial, they were quoted $500k a year. The tool tracked developer productivity and integrated with Atlassian products, Microsoft, and many other services we used. The price was too steep, so it was dropped. A couple of months later, the same company came back with a discount. The exact same tool for just $50k a year. My employer jumped at the opportunity.

How many bytes did you use today?

I'm looking at this dashboard right now and I see my name at the top of the leaderboard. I click on the widget, and a pie chart appears. There it is: a breakdown of the total lines of code my team has produced using AI, by individual.

This isn't limited to my employer. Every company is putting something together to track AI usage and justify the investment. Instead of tracking project completions, we're tracking how many lines of code each developer generated with AI. And the joke's on me, because nobody is laughing. The whole industry is applauding and encouraging employees to use more of it.

I didn't become the champion because I have some neat agentic workflow. It was done by complete accident. While using an LLM, I accidentally selected "planning mode" for a request that had already been planned. The agent ran for several minutes, burning tokens to resolve a problem that didn't exist. Just like that, I made it to the top, without ever writing a single line of code.

If this widget is taken at face value, it won't be long before developers start gaming it deliberately. Just let the agent run overnight, and your employer can claim a 10x improvement in productivity.

We didn't use line count as a productivity metric in the past because it never made sense. Whenever we refactor code, we often end up with less than we started with. In fact, much of the time I spend modifying AI-generated code is spent deleting unnecessary things it created. Should we track negative lines of code? The better you are at programming, the worse your numbers look. We are assessing developers by the lines of code.

I've watched AI evangelists ask "how many tokens did you burn today?" They were trying to convince an audience that productivity is directly proportional to token usage. It reminds me of the transition from paper to computers. A computer evangelist of that era might have asked: "how many bytes did you use today?"

Token counts, lines of code, bytes, none of these have anything to do with actual productivity. Metrics are often entirely disconnected from what they're meant to measure. I've seen companies rely on story points only to watch employees point every ticket as high as possible. Choose lines of code as your metric, and lines of code will increase. Reward the highest contributor, and watch everyone double or triple their output by the next performance review.

It's a silly metric but it serves a purpose, just not yours. AI companies promote token usage and associate it with productivity because they directly benefit from it. Imagine an internet service provider that charges by the byte. What would their recommendation for productivity be? "Use more bytes!"

The best engineers I've ever known wrote less code, not more. They deleted things. They simplified. They understood that the goal was never the code itself. They solved problems, they made the system reliable, and they served the user. Measuring developers by output volume, whether that's lines, commits, or tokens, mistakes the exhaust for the engine.

Every era of tooling brings a new class of metric that mistakes activity for value. The spreadsheet didn't make accountants more productive just because they could fill more cells. AI won't make developers more productive just because it can generate more code.

We aren't even tracking if the right problems are being solved, and solved well. If the productivity dashboard can't answer that, it's not measuring productivity. It's measuring the subscription.

How Many Tokens Did You Burn Today

About Ibrahim Diallo

Keep Reading

In the Age of AI, Don't Forget About System 2

Jokes on You, We All Use Juicero Now

Browse the full archive for more essays and engineering notes

Join the Conversation