Sinclair Target

Negative Externalities of Gen-AI within Software Teams

Feb 08, 2026

I don’t have anything new to contribute to the big-picture war of words about the ethics of generative AI, or its environmental impact, or whether the software industry has reached an inflection point. I want to talk, very specifically, about how generative AI has affected my work as a software engineer so far.

I am not going to argue that using an LLM to generate code is either more or less productive than doing things the “old way.” I’m not qualified to decide that; I don’t use LLMs/agents to write code and I’m ignorant of all the best practices. I’m not opposed on principle to using coding agents and have found AI-sourced code review and chatbot rubber-ducking sessions helpful.

I am going to argue that, whether or not the benefits ultimately outweigh the drawbacks, generative AI has made some kinds of collaboration within software teams less efficient. I want to enter into the record some examples of how LLM-assisted software development has costs as well as benefits, in the hope that everyone can be made better off by acknowledging and trying to mitigate those costs.

Concision

Building software on a team requires communication skill as much as technical skill. Much of this communication happens via English prose. Software engineers write technical proposals, design specifications, documentation, and PR descriptions. When these documents are poorly written, other engineers on the team have a harder time understanding the system being described.

In my experience, engineers using coding agents to implement features often use the agent to write PR descriptions too. These PR descriptions tend to be overly long and detailed. They might include a “Quick Overview of Changes” heading followed by a bullet-point list with seven or eight entries. Then there might be a “Detailed Overview” that describes the changes made in the PR in three or four paragraphs. The information conveyed may be accurate, but the problem is the lack of context and synthesis.

By the time a feature lands in a PR, the team has often discussed it several times and maybe even met to evaluate different possible implementations. The coding agent doesn’t know this, so it can’t write, in the PR description, “implements new feature using implementation x like we agreed,” which would be a meaningful, useful shorthand to the team. Maybe one day coding agents will be able to listen in on every team meeting, but for now these contextual shorthands are out of reach. This makes much of the information in LLM-generated PR descriptions redundant. Actually useful information is harder to find in the wall of mostly unneeded text.

The PR descriptions generated by agents also lack hierarchy. By that I mean the agent throws a lot of information at you without distinguishing what is important. The description outlines the changes made in the PR, but what is the upshot for the team and for the system being built? If this PR contains seven or eight changes, what unites those changes? Is one more important than the others? Does one require the others? A human might write, “I added the new dropdown, but realized I couldn’t populate it from the endpoint we’re already using, so I had to add a new module wrapping this other endpoint that Sarah told me was available on the ‘foo’ service.” From this we immediately understand that the PR is primarily delivering a new dropdown, but also includes code to get data from a new backend service only because it was necessary for implementing the dropdown. A coding agent might present the two changes as co-equal bullet points, making the PR more confusing and therefore more time-consuming to review.

I’ve only talked about PR descriptions here, but similar problems arise when LLMs are used to generate any prose document shared between engineers.

Vigilance

Whether or not coding agents introduce fewer bugs than human authors, the kinds of bugs they introduce are different from the kinds a human would introduce, and that difference is a problem in itself.

We know humans are bad at certain things. For code written by a human, we know to look for off-by-one errors, for unhandled edge cases, for race conditions. As far as I’ve seen, coding agents introduce all possible bugs with equal likelihood. This makes code review much harder.

Recently, I reviewed a PR that, among other changes, added a command-line argument to a Python web service that disabled a certain permission check. We wanted this permission check to be required by default (the most secure option), but also wanted it to be possible to override the default only by passing a super-explicit --require-x-perm=False at the command line. A coding agent produced something like the following code:

 1  # ...
 2  # bunch of other code that might distract you
 3  # ...
 4
 5  parser.add_argument(
 6      "--require-x-perm",
 7      action="store_false",
 8      default=True,
 9      help="Require x permission (default: True)",
10  )
11
12  # ...
13  # even more code that also looks important
14  # ...

This looks correct if you just skim it! After all, we clearly have default=True on line 8 and for good measure the agent insists on it in the help string on line 9. But in fact this code implements the flag such that passing --require-x-perm at the command line would disable the permissions check, since store_false is given on line 7 instead of store_true.
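
If you reproduce the snippet in isolation, the inverted behavior is easy to confirm. This is a minimal sketch, not the actual service code; only the flag definition from the example above is kept:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--require-x-perm",
    action="store_false",
    default=True,
    help="Require x permission (default: True)",
)

# No flag given: the default holds and the permission check stays on.
print(parser.parse_args([]).require_x_perm)                    # True

# Bare flag given: store_false flips the value and silently disables the check.
print(parser.parse_args(["--require-x-perm"]).require_x_perm)  # False

# And because a store_false action takes no value, the intended invocation,
# --require-x-perm=False, is rejected by argparse outright.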

What’s frightening about this error is that it doesn’t seem like the kind of error a human would make. For one thing, store_false is so much less commonly used than store_true for a Python flag that a human wouldn’t reach for it without thinking about it. The code is also exactly wrong in a way that a human would probably notice just in the process of typing out the characters on his or her keyboard.

Arguably, the fault was mine for suggesting --require-x-perm=False as the design rather than --no-require-x-perm, which would have been more typical. But the agent didn’t throw up its hands and complain about the design. It just took misguided instructions and happily turned them into incorrect code.
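
For comparison, here is a minimal sketch of what a value-taking version of the --require-x-perm=False design could look like. The str_to_bool helper is illustrative, not something from the actual PR:

import argparse

def str_to_bool(value: str) -> bool:
    # Accept only explicit spellings, so disabling the check has to be deliberate.
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
parser.add_argument(
    "--require-x-perm",
    type=str_to_bool,
    default=True,
    help="Require x permission (default: True); pass --require-x-perm=False to disable",
)

# Omitting the flag keeps the check on; only an explicit
# --require-x-perm=False turns it off.
args = parser.parse_args()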

Having to be vigilant for these kinds of errors means that every line of code now has to be treated with equal suspicion. Even if it’s the most trivial code imaginable, as a reviewer, you can’t take shortcuts. Perhaps this is the way code review should always have been done, because bugs certainly slipped in via trivial code before LLMs came along. But, in a world of limited time and attention, being able to take some shortcuts was efficient. Not being able to do that anymore makes PR review more time-consuming and stressful.

Authorship

Software is complicated. On a team, knowledge of the workings of the codebase is often distributed across several people. We don’t have a good way of recording who knows what, though people have tried with e.g. OWNERS files. In any case, who knows what changes all the time, so any record quickly becomes out of date.

Knowing who to reach out to when you have questions about a codebase is critical, though. Much of the institutional knowledge about a codebase lives outside the code itself and can only be learned by talking to the right person.

Previously, if somebody implemented a complicated PR modifying some system in the codebase, that was a strong signal that the author had a model of the system in his or her head. Seeing that your colleague, Joe, merged in a 1000+ line PR to the caching system last week meant you could be sure that Joe would be a good person to talk to about how caching works.

Today, it’s possible that Joe generated most of that PR using an LLM. Maybe he knows some of the high-level details but can’t answer your questions about one of the modules he added. In the worst case, maybe he can’t help you at all, because despite changing the system so extensively he at no point understood it. Your time asking him questions is wasted. Without the previously strong signal of authorship to rely on, you need to spend time asking around to see who understands which part of the codebase.

This problem of murky authorship undermining previously useful signals extends to technical documents written using LLMs. It used to be the case that reading something written by somebody else could be informative on at least two different levels: It could teach you about whatever the document is actually about, but also about what its author knows and believes. Maybe your coworker writes a document proposing one third-party library over another and one of the reasons they give is that their preferred library exposes more functional APIs (in the sense of functional programming). “That’s interesting,” you might think to yourself, “they’re probably a good person to talk to about that bug in our Clojure codebase.” But it might turn out that they know nothing about functional programming and the point about the functional APIs was just the LLM talking.

If I suspect something I’m reading has been written by an LLM, I immediately feel my time spent reading it is less useful. It tells me much less about the world than a “hand-written” document would, since it tells me nothing about its author.

Negative Externalities

I think “negative externalities” is the right metaphor for these inefficiencies. Perhaps people disagree so much about whether agent-assisted coding is productive in part because the gains in efficiency redound to one person while the inefficiencies fall on others.

Maybe on balance the team still comes out ahead for using these tools. But, out of politeness, if nothing else, I think engineers using agents should address the problems agent-produced code (or prose) causes for their teammates.

Some of these inefficiencies might eventually be solved by improving technology. Maybe LLMs will get so good that they will only ever write to-the-point PR descriptions and never produce code with bugs. But I think the best solution is cultural, not technical.

Software teams should adopt a full disclosure policy when it comes to LLM usage, with the understanding that LLM-generated code or writing will be received under a different set of expectations and norms. Any PR created with substantial use of an LLM would need to be labeled as such. The same would go for any written document. This would go a long way toward alleviating the problems I’ve described.

If a PR description or technical document is verbose but also marked as “LLM-generated,” readers can decide for themselves how much time to devote to reading something that may be repeating things they already know, with confidence that there won’t be, buried somewhere in the text, an important sentence written by a human. The implicit onus on the reader to be attentive and not miss something that the author might have been trying to communicate is relaxed. If there is an important ramification to changes introduced in a PR, the expectation would be that the author must call that out explicitly outside the block of LLM-generated text in the description.

Likewise, PRs labeled as “LLM-generated” could be reviewed more methodically than non-LLM-generated PRs, allowing reviewers to adopt the appropriate review strategy depending on the provenance of the code. And labeling would make it clear whether or not a code contribution or technical document can be treated as indicative of what its author knows about a topic.

I don’t mean to disparage anyone using coding agents to produce code. I’m glad people are experimenting and think it’s likely that coding agents will allow teams to do things they couldn’t have done before. But I do want to rebut the narrative that coding agents are fueling some kind of free-lunch productivity bonanza. As far as I have seen, they make it easier to produce code but harder to collaborate. Hopefully we can find a way to preserve the former while mitigating the latter.