I keep seeing posts like this: “I’m coding in three IDEs at once now! LLMs generate code so fast, I’m basically a demon programmer!”
Cool story, bro. But let’s take a deeper look. If AI tools like Copilot actually turn average devs into 10x coders, why isn’t every open-source project drowning in PRs? Where’s the tsunami of contributions fixing Python’s 7,000+ unresolved issues or React’s 500+ issues?
Instead, GitHub feels… normal. No, making 3 commits a minute like @levelsio doesn't count. Maintainers still beg for help. Critical projects like OpenSSL still rely on a skeleton crew. And that guy with three IDEs? He’s probably just generating three times as much code to delete later.
Let’s be honest: LLMs are fantastic at the first 80% of a project. Need boilerplate for a CRUD app? Sure! A script to rename files? Absolutely. That’s why every “10x developer” demo is a todo app or a snake game.
In reality, the last 20% is where real engineering happens. The edge cases. The architecture refactors. The “oh god, I didn't think this through” moments. LLMs crumble here.
The Open Source Test: Let’s Settle This
Here’s my challenge. If AI truly makes devs 10x faster, let’s deploy it where it matters. Open source.
- Task every “10x dev” with resolving a recently reported PHP/Python/Golang bug.
- Require all PRs to include tests, docs, and maintainer approval.
Spoiler: It won’t work. Why? Because LLMs can’t handle the human part of coding.
Proponents say AI lowers the bar for contributing to OSS. But open source isn’t a coding free-for-all. It’s a curated collaboration. When the person raising the PR isn't even reading the code, it wastes everyone's time.
You don't get a free pass to just push a PR and blame it on the AI. If you asked AI to generate 2,000 LoC, you'd better review all 2,000 LoC, because you're asking your reviewers to read through all of them too. AI does not absolve a dev of responsibility for their code.
Benchmarks are currently used to measure a model's performance. Since these models end up being deployed in the real world, I'd suggest they be tested against real-world problems:
- Take an existing codebase.
- Revert a merged PR that resolved a real issue.
- Have the LLM resolve the issue.
- Test the resulting code.
Since a human solution already exists, it shouldn't be hard to check whether the LLM does what it's advertised to do. A grade like that would be far more grounded in actual capability.
Until then, the “10x developer” is just a LinkedIn fantasy.
To be fair, Copilot saves me from Googling readFileSync() for the 100th time. But claiming AI turns us into coding gods is like saying spellcheck makes you Shakespeare.
If you’re truly a 10x developer now, prove it: Go fix Chromium’s... let me quickly check... 15,000 open issues. I’ll wait.
In the meantime, I’ll be over here, gently nudging Copilot to write a function that doesn’t suck.