Surface Tension
AI gives engineers enough fluency to float. Production load reveals what's underneath.
We're hearing a version of the same story across engineering orgs right now. A new manager — strong pedigree, credible background, confident in every meeting — opens a PR using an AI coding tool. The tests assert that 3 == 3. An hour of explanation later, he still can't articulate why this is wrong. He isn't playing dumb. He genuinely doesn't know.
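A minimal sketch of what that kind of test looks like, in pytest-style Python with hypothetical names (the actual PR isn't ours to quote). The suite stays green on every run, but the production code is never exercised:

    # test_discounts.py: hypothetical example of a tautological test
    def apply_discount(price, rate):
        """The production code the test is supposed to exercise."""
        return price * (1 - rate)

    def test_apply_discount():
        expected = 3
        # Asserts a constant against itself; apply_discount is never called.
        assert expected == 3  # always true, so the suite stays green

A test with actual weight behind it would call the function and check its output: assert apply_discount(100, 0.25) == 75.0. The difference is invisible on a dashboard that only counts passing tests.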
This isn't a story about a bad hire. It's a story about surface tension.
AI tools give anyone enough fluency to float. You can discuss distributed systems in a first-round interview without having debugged one in production. You can open PRs without understanding the codebase. You can recite Kubernetes terminology without knowing what a security group does. The surface holds until something heavy lands on it.
What's Changed
The old signals for engineering competence were self-correcting. Someone who didn't understand what they were doing would eventually say something that revealed the gap. Code reviews caught it. Debugging sessions exposed it. Conversations moved fast enough that improvisation was required, and improvisation reveals depth.
AI tools broke this feedback loop. Now you can produce fluent-sounding code, fluent-sounding design docs, fluent-sounding interview answers — and never face the moment where you have to construct an answer from first principles. The output looks right. The explanation sounds right. The gap stays hidden until there's a production incident at 2am and nobody on the call understands the system they're trying to fix.
The deeper problem isn't that some people are using AI to fake competence. It's that the faking has become ambient. Engineers who genuinely know their craft are using the same tools as engineers who don't, producing similar-looking artifacts, and the tells that used to separate them have gone quiet.
The Weight Test
Surface tension is a property of undisturbed water. It holds small weights and breaks the moment a real load lands.
In engineering teams, weight comes from specific places: production incidents, on-call rotations, anything that requires improvisation without a prompt. The moment someone has to debug a cascading failure in a system they didn't write, explain to a customer why data was corrupted, or make an architectural call under time pressure — that's when the surface breaks.
We've watched teams discover, months after shipping, that nobody on the incident call could actually trace the call path. The code was generated fluently. The PR descriptions sounded considered. The tests passed. Nobody understood how it worked.
This is different from the technical debt problem we've always managed. Traditional technical debt accumulates from shortcuts taken under pressure — someone made a decision, knew it wasn't ideal, moved on. The intent is recoverable. AI-generated code assembled without genuine understanding often has no recoverable intent. It's a collection of fragments that happen to be in the same file, optimized for coherence at the line level, not the system level. There's no human reasoning to trace back. The rabbit hole has no end.
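A hypothetical sketch of what that can look like, with invented names: every line below is a recognizable idiom, the file runs cleanly, and yet no reasoning connects the pieces.

    # config_loader.py: hypothetical example; plausible line by line,
    # incoherent as a system
    import functools
    import time

    @functools.lru_cache(maxsize=128)  # memoizes a function with no arguments
    def load_config():
        return {"retries": 3, "timeout_s": 30}

    def with_retries(fn, attempts=3):
        # A retry loop around a pure, local function: if it fails once,
        # it fails identically every time, so the backoff can never help.
        for attempt in range(attempts):
            try:
                return fn()
            except Exception:
                time.sleep(2 ** attempt)
        return {}  # silently swallows the failure the retries were "handling"

    config = with_retries(load_config)

Caching, retries, exponential backoff: each fragment is optimized for line-level plausibility. What a post-incident reader needs, and never finds, is the reason any of it is there.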
What This Means for Leaders
The old hiring and management playbook assumed that evaluating outputs was a reasonable proxy for evaluating competence. It isn't anymore.
Code review is no longer "does this work." It's "does this person understand why this works." The distinction sounds small. It's enormous. A team that produces working code without understanding it will, eventually, produce broken code without understanding why — and have no idea where to start fixing it.
This changes what you're looking for in interviews. Fluency is cheap now. Push for reasoning. "Walk me through what would happen if this fails" tells you more than "describe your approach." "What does this break that we haven't considered?" is worth more than any solution summary. The weight test isn't about making candidates uncomfortable — it's about finding out if there's something underneath to hold them up.
It changes how you run design reviews. The confident summary is easy to generate. The question nobody's asking is the one that matters, so push past the architecture diagram and into the failure modes: what happens when a dependency times out, when the queue backs up, when the retry storm starts. If the answers are specific and grounded, you're seeing real depth. If they pivot to a different high-level concept, you're seeing surface tension.
It changes how you think about your incident response capacity. Who on your team can navigate an unfamiliar codebase under pressure — without a prompt? Not who wrote it, not who approved the PR. Who can hold the system in their head and reason through failure modes in real time? That's your actual load-bearing engineering capacity, and it may be much smaller than your headcount suggests.
The Compounding Problem
The engineers who have the pattern recognition to cut through surface tension — who can tell genuine depth from well-prompted fluency — are exactly the engineers whose value just increased. Their judgment can't be prompted into existence. But many organizations don't realize this until they've already diluted their teams with engineers who can only perform in calm water.
Here's the fractal version: if the engineers don't understand the system, the managers won't either. A leader who manages by AI output rather than by judgment will hire and promote in the same pattern. The org optimizes for artifact production while depth quietly drains away. Everything looks fine in the metrics until it doesn't.
We've watched this arc play out. A company ships fast, velocity metrics look strong, and then one quarter the system starts breaking in ways nobody predicted. Post-mortems reveal the team has been building on top of things they don't understand. The output was real. The understanding wasn't.
What to Do About It
Don't blame the tools. AI coding assistants are genuine force multipliers in the hands of engineers who understand what they're building. Surface tension isn't caused by AI — it's caused by organizations that optimize for output metrics while ignoring depth signals.
The antidote is weight. Put real production load on your team's understanding, deliberately and regularly. Run post-mortems that require people to trace exactly what happened and why — not to assign blame, but to surface where understanding actually lives. Ask code reviewers to explain a change they approved without re-reading it. Create the conditions where depth becomes visible before the incident makes it visible for you.
If your engineering leadership team can't credibly do this assessment, bring in someone who can. The pattern recognition that distinguishes surface fluency from genuine depth takes years of production experience to develop. It doesn't come from credentials or pedigree — it comes from having been in the room when the surface broke.
That's exactly what we bring to our engagements: not more output, but accurate judgment about whether the output is what it appears to be. Talk to us before the incident does it for you.