CS & Philosophy

Naming Things Is the Last Hard Problem in Computer Science

Cache invalidation got tooling. Off-by-one errors got tests. Naming is still mostly vibes, and that has cost us more than we admit.

Phil Karlton’s joke — “there are only two hard things in computer science: cache invalidation and naming things” — is repeated so often that the actual claim has been lost. The claim is that naming is hard at the same level cache invalidation is hard: not as a polish step, but as a load-bearing engineering activity.

We have made tremendous progress on cache invalidation. We have CDN purge APIs, vector clocks, content-addressed storage. We have not made remotely as much progress on naming. Naming is still, almost everywhere, a matter of taste, of habit, of whatever the original author happened to type at 2am.

Why bad names cost so much

A bad name is not a lexical error. It is a type error in the social system that maintains the code, a recurring theme in the philosophy of computer science and one we keep underestimating. A function named processItem invites every future caller to project their own theory of what the function does, and one of them will be wrong. A boolean called flag becomes a magnet for special cases. A class called Manager accretes responsibilities until it manages the heat death of the universe.
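To make the flag point concrete, here is a minimal TypeScript sketch. `Doc`, `renderWithFlag`, `render`, and the draft/final modes are hypothetical, invented for illustration. The pattern it shows is replacing a bare boolean parameter with a named union type, so the intent is visible where the function is called.

```typescript
// Hypothetical example: Doc, renderWithFlag, render, and RenderMode are invented for illustration.
interface Doc {
  title: string;
  body: string;
}

// Before: at a call site like renderWithFlag(doc, true), what does `true` mean?
function renderWithFlag(doc: Doc, flag: boolean): string {
  return flag ? `[DRAFT] ${doc.title}\n${doc.body}` : `${doc.title}\n${doc.body}`;
}

// After: the parameter's type names the intent.
type RenderMode = "draft" | "final";

function render(doc: Doc, mode: RenderMode): string {
  switch (mode) {
    case "draft":
      return `[DRAFT] ${doc.title}\n${doc.body}`;
    case "final":
      return `${doc.title}\n${doc.body}`;
    default: {
      // If a new mode is added to RenderMode but not handled above,
      // this assignment stops compiling.
      const unreachable: never = mode;
      throw new Error(`Unhandled render mode: ${String(unreachable)}`);
    }
  }
}

const doc: Doc = { title: "Q3 report", body: "Revenue up." };
const a = renderWithFlag(doc, true); // the reader must look up what `true` means
const b = render(doc, "draft");      // the intent is readable at the call site
```

Adding a third mode now means extending `RenderMode`, and the `never` check forces every `switch` over it to handle the new case. A second special case no longer gets quietly bolted on as another boolean.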

The cost compounds because naming determines who can change the code. A reader who has to derive intent from the body before they can edit safely is a slow reader. A reader whose grasp of the system depends on memorizing nicknames is a brittle reader. Names are the project’s API for its own contributors.

```typescript
// Both compile. One is a debugging session.
function check(u: User, x: number): boolean { /* ... */ }
function userHasSufficientCredits(user: User, requiredCredits: number): boolean { /* ... */ }
```

You can argue the second is verbose. You cannot argue the second is unclear at the call site, six months later, in a postmortem.
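The two signatures above elide their bodies. The sketch below fills them in under an assumed rule (a user may proceed when `credits >= requiredCredits`) and a hypothetical `User` shape, so the two call sites can be compared directly. Both functions compute exactly the same thing; only the names differ.

```typescript
// Hypothetical User shape and credit rule, assumed for illustration only.
interface User {
  id: string;
  credits: number;
}

// Opaque name: nothing at the call site says what is being checked.
function check(u: User, x: number): boolean {
  return u.credits >= x;
}

// Descriptive name: identical logic, but the name states the rule.
function userHasSufficientCredits(user: User, requiredCredits: number): boolean {
  return user.credits >= requiredCredits;
}

const alice: User = { id: "alice", credits: 40 };
const exportCost = 50;

// Read these two lines cold, as you would in a postmortem.
const allowedA = check(alice, exportCost);
const allowedB = userHasSufficientCredits(alice, exportCost);
```

Only the second call site tells a reader, without opening the function body, that `false` here means Alice cannot afford the export.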

What “good naming” actually requires

It requires three things, none of which are tooling:

The remaining hard problem

We can cache-invalidate at scale because we treated invalidation as engineering. We can mostly avoid off-by-one errors because we treated them as engineering. The same move is now overdue for training-data provenance — also currently shrugged off as taste, also paying the same compounding bill. Naming is the last large category of bug we have collectively decided to leave to “taste”. That decision is more expensive than it looks, and the bill keeps coming due in onboarding time, in ambiguous bug reports, in the slow compounding cost of code nobody quite understands.

We could decide otherwise. It would be cheap. We just don’t, yet.