Where I put stuff
by Simon
LLMs change what’s scarce – namely, the ability to produce median-or-above-quality code. However, they don’t solve social or organisational problems, and the acceleration they provide is likely to make those problems worse.
What remains scarce is verification, rationale capture, senior capability (e.g. high-level architectural oversight and judgement, integration reasoning, threat modelling), and the social and epistemic machinery that keeps systems coherent over time. Some of these are at least partially solvable by throwing more LLMs at the problem; others are only harmed by that approach.
The bottleneck shifts. The trick is predicting where the next bottleneck will be, and designing a system that ideally avoids or ameliorates it – and that, at the very least, isn’t irreversibly harmed by running into it.
And you need to build (at least some of) this up-front – for example, you cannot solve rationale capture post hoc; by then it’s too late and the rationale is irretrievably lost.
Note that many of these problems either only surface in the medium to long term, or are self-masking: tidy diffs and passing tests give a surface impression of health while the system-level story rots.
Also note that almost none of these problems are new – humans suffer from all of them – but we understand their shortcomings and ameliorations in human-space. We are about to shift the system drastically, though, and the real question is: how will these existing failures play out in our future systems and workflows?
Many of these bottlenecks are not technical at all; they are organisational and incentive failures that LLMs amplify.
Here are some non-independent, overlapping thoughts about bottlenecks:
LLMs can be used to improve this situation, but such systems must be built carefully and intentionally to do so:
If the same model writes the spec, implements it, and then reviews it, you get correlated error, not independent checking. You wouldn’t accept a developer doing their own reviews; we need diversity of thought and of blind spots. We have systems built around this for humans, and it’s true for LLMs too.
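One mechanical way to get some independence is to make sure the reviewing model is never the authoring model. A minimal sketch, where the model names are purely illustrative placeholders rather than real API identifiers:

```python
import random

# Hypothetical model pool; the names are illustrative, not real APIs.
MODELS = ["model-a", "model-b", "model-c"]

def pick_reviewer(author_model: str, pool: list[str]) -> str:
    """Choose a reviewer model that is guaranteed not to be the author,
    so authoring errors and review errors are less correlated."""
    candidates = [m for m in pool if m != author_model]
    if not candidates:
        raise ValueError("need at least one model besides the author")
    return random.choice(candidates)
```

This only reduces correlation; models trained on similar data share blind spots, so it’s a floor, not a fix.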
LLMs make agreement cheap (and are rather tuned to prefer agreement), leading to convergence on shared but unexamined assumptions.
A less obvious failure mode here is that there are no obviously bad choices; it is simply that great choices may never be explored.
Does it become more difficult to learn the hard lessons in this system?
This can also lead to organisational loss of continuity: less cross-engineer discussion (since we’re each chatting with our own LLMs) means less ‘hive-mind’, less cross-pollination of ideas, less shared context, and less learning of lessons from others.
This could be seen as a process failure – but it’s a new process we’ll need.
Prompt and workflow design can force breadth – but you need to do this intentionally.
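As a sketch of what “forcing breadth intentionally” might look like: wrap every design task in a prompt template that demands several distinct alternatives, with trade-offs, before any choice is made. The function and its wording are assumptions for illustration, not a prescribed format:

```python
def breadth_prompt(task: str, n_alternatives: int = 3) -> str:
    """Build a prompt that forces exploration of multiple approaches
    before committing to one. Purely illustrative wording."""
    return (
        f"Task: {task}\n"
        f"First, propose {n_alternatives} meaningfully different approaches.\n"
        "For each, list one strength and one weakness.\n"
        "Only then pick one, and explain why it beats the others."
    )
```

The point is that the breadth lives in the workflow, not in the model’s goodwill.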
Rationale loss is very much not a new problem, just an exacerbation of an existing one.
“Why does this code exist?” is often answered with git blame plus asking that person.
“if it matters, it goes somewhere durable”.
Given that people are lazy and feckless, this could actually be an improvement.
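“Somewhere durable” could be as small as an append-only decision log checked into the repo. A minimal sketch (the file format and field names are assumptions, standing in for a fuller ADR directory or commit-trailer convention):

```python
import json
import time
from pathlib import Path

def record_decision(log: Path, decision: str, rationale: str) -> None:
    """Append a decision and its rationale to an append-only
    JSON-lines log, so the 'why' survives the person leaving."""
    entry = {
        "when": time.strftime("%Y-%m-%d"),
        "decision": decision,
        "rationale": rationale,
    }
    with log.open("a") as f:
        f.write(json.dumps(entry) + "\n")
```

The durable artefact matters more than the format; anything greppable beats a memory.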
If the failure modes predicted above are real, the design response is not more output, but explicit counter-structures.