None of us read the specs

AI shifts the work; it doesn't eliminate it.

After using Large Language Models extensively, I keep running into the same questions. Why didn't the lawyer who used ChatGPT to draft legal briefs verify the case citations before presenting them to a judge? Why are developers using LLMs to raise issues on projects like cURL, but not verifying the generated code before pushing a pull request? Why are students using AI to write their essays, yet submitting the result without a single read-through?

The reason is simple: if you didn't have time to write it, you certainly won't spend time reading it. They are all using LLMs as a time-saving strategy. In reality, the work remains undone; they are merely shifting the burden of verification and debugging to the next person in the chain.

AI companies promise that LLMs can turn us all into 10x developers. You can produce far more output than ever before: more lines of code, more draft documents, more specifications. The core problem is that the time saved up front is almost always spent by someone else reviewing and validating your output.

At my day job, the developers who use AI to generate large swathes of code are generally lost when we ask questions during PR reviews. They can't explain the logic or the trade-offs because they didn't write it, and they didn't truly read it. Reading and understanding generated code defeats the initial purpose of using AI for speed.

Unfortunately, there is a fix for that as well. If PR reviews or verification slow the process down, the clever reviewer can also use an LLM to review the code at 10x speed.

Now, everyone has saved time. The code gets deployed faster. The metrics for velocity look fantastic. But then, a problem arises. A user experiences a critical issue.

At this point, you face a technical catastrophe: the developer is unfamiliar with the code, and so is the reviewer. You are now completely at the mercy of another LLM to diagnose the issue and produce a fix, because the human domain knowledge required to debug the problem was bypassed by both parties.

When Your Output Becomes Someone Else's Input

This issue isn't restricted to writing code. I've seen the same dangerous pattern when architects use LLMs to write technical specifications for projects.

For an architect whose job is to produce a document that developers can use as a blueprint, an LLM is a dramatic speed boost. Where it once took a day to go through notes and produce specs, an LLM can generate a draft in minutes. As far as metrics are concerned, the architect is producing more. Maybe they can even generate three or four documents a day. As an individual contributor, they are more productive.

But that output is someone else’s input, and their work depends entirely on the quality of the document.

Just because we produce more doesn't mean we are doing a better job. And we tend not to vet the LLM's output thoroughly, because it always looks good enough, until someone has to scrutinize it.

The developer implementing a feature from that blueprint now has the extra work of figuring out whether the specs even make sense. If the document contains logical flaws, missing context, or outright hallucinations, the developer must spend time reviewing and reconciling the logic.

The worst-case scenario? They decide to save time, too. They use an LLM to "read" the flawed specs and build the product, inheriting all the mistakes and simply passing the technical debt along.


LLMs are powerful tools for augmentation, but we treat them as tools for abdication. They are fantastic at getting us to a first draft, but they cannot replace the critical human function of scrutiny, verification, and ultimate ownership.

When everyone is using a tool the wrong way, you can't just say they are holding it wrong. But I don't see how we can make verification a sustainable part of the process when the whole point of using an LLM is to save time.

For now at least, we have to deliberately treat every LLM output as incorrect until it has been vetted. If we fail to do this, we're not just creating more work for others; we're actively eroding our own work and making life harder for our future selves.

