
The Hiring Crisis Nobody Saw Coming

How AI coding tools broke technical interviews and what to do about it

7 min read · By The Bushido Collective
Hiring, AI, Technical Leadership, Engineering Teams, Interviews

Engineering managers are hitting the same wall: technical interviews have stopped working.

A candidate submits a take-home that looks like it was written by a senior architect. Clean code, optimized algorithms, thoughtful abstractions. You bring them in for a live round, ask one question about trade-offs, and watch them freeze. Blank stares. Frantic typing in another tab. They can't explain the decisions in code they supposedly wrote.

You've probably seen the scene twice this quarter. Gergely Orosz at The Pragmatic Engineer has been documenting a version of it all year — candidates sailing through take-homes and collapsing the moment an interviewer asks them to explain a single line. Simon Willison has written repeatedly about how fluent AI-generated code looks to someone who can't evaluate it, and how quickly that fluency evaporates under questioning. The tools we use daily to ship faster have broken the way we evaluate the people we'd hand them to.

The Standard Loop Stopped Working

The technical interview loop was built on a simple premise. If someone can produce working code under observation, and do it again on their own time, they're probably a competent engineer. Take-home assignments tested problem-solving. Live rounds verified thinking. Neither test survives contact with Claude Code or Cursor.

Take-homes are unreliable now because the floor moved. A candidate with minimal technical knowledge can prompt an AI to generate production-quality code and submit it as their own. The code is often excellent, and it tells you almost nothing about the person who submitted it. Live rounds are harder to game in real time, but they're also artificial — nobody ships code in a vacuum anymore, not even the engineers you want to hire. Asking someone to implement a binary tree on a whiteboard tells you less than it did five years ago, and it didn't tell you much then.

Both halves of the loop leak signal. And because the format that remains, a clean artifact plus a rehearsed walk-through, is easy to produce with AI help, the candidates who look best in your process are no longer reliably the ones you most want.

AI Isn't the Enemy

We use these tools. Our teams use them. We want candidates who use them well — an engineer who can accelerate with AI in the loop is more valuable than one who can't. Charity Majors has written about this shift toward judgment as the scarce skill: the work that matters is increasingly review, decision, and debugging, not typing.

The problem isn't candidates using AI. The problem is that a standard loop can't distinguish an engineer using AI as a force multiplier from someone using it as a crutch. An engineer using Claude Code well understands what they're asking for, evaluates what comes back, catches the wrong answer, and decides whether to take the suggestion or rewrite it. When something breaks, they can trace the path. Someone leaning on AI produces a superficially similar artifact — but can't defend it, can't predict how it'll fail, and freezes when the surface gets scratched. Traditional interviews don't scratch hard enough to tell them apart.

This is the plausibility trap. AI-generated code is convincing but reveals nothing about the person who submitted it. Not their judgment. Not their failure modes. Not what they'd do when the code is subtly wrong and the tests pass anyway. And because the trap is baked into the artifact itself, tightening your rubric won't get you out — a better-scored plausible answer is still plausible.

What Actually Produces Signal Now

Across our experience building engineering teams at GigSmart, ToolWatch (now AlignOps), and Oxen.ai, the practices that still produce signal have one thing in common: they test evaluation, not generation.

The most reliable change we've made is replacing code-from-scratch with code review. Hand the candidate AI-generated code with deliberate flaws (a subtle race condition, a performance cliff, an abstraction that collapses at 10x scale) and ask them to walk you through it. Strong engineers find the problems, name them in their own words, and propose something better. Weak ones praise the structure and miss the bug. An engineer who can tear apart bad code understands how good code works; one who can't won't catch the AI's wrong answers on the job either.
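As a minimal sketch of what a seeded flaw of this kind could look like, assuming a Python codebase: the names `SeatPool`, `LockedSeatPool`, and `count_successes` are hypothetical, invented here for illustration and not taken from any real interview loop. The first class reads cleanly and contains a check-then-act race; the second shows one fix a candidate might propose.

```python
import threading


class SeatPool:
    """Hypothetical review exercise: reserve() hides a check-then-act race."""

    def __init__(self, seats):
        self.seats = seats

    def reserve(self):
        # Seeded flaw: the check and the decrement are separate steps, so two
        # threads can both pass the check when only one seat remains.
        if self.seats > 0:
            self.seats -= 1
            return True
        return False


class LockedSeatPool(SeatPool):
    """One possible fix: make the check and the decrement a single atomic step."""

    def __init__(self, seats):
        super().__init__(seats)
        self._lock = threading.Lock()

    def reserve(self):
        with self._lock:
            return super().reserve()


def count_successes(pool, attempts):
    """Run `attempts` concurrent reserve() calls and count how many succeed."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = pool.reserve()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Handed only `SeatPool`, a candidate would be expected to spot the gap between the check and the decrement and explain how two callers could both claim the last seat. A single-threaded test will not reveal the problem, and a casual concurrent run may well pass too, which is what makes the flaw a useful probe of evaluation over generation.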

Architectural reasoning under constraint is the second signal. Skip the generic system-design puzzle and use a real scenario from your codebase — the 10x traffic problem you had last quarter, the migration you're about to attempt. Strong candidates ask clarifying questions, name the trade-offs, and reason out loud. Weak ones recite patterns. Will Larson's work at staffeng.com on what senior engineers actually do makes this concrete: the job is judgment under ambiguity, and ambiguity is what scripted interviews engineer out.

Pair programming on a real problem beats coding puzzles every time. Give the candidate an actual bug or a small feature, let them use whatever tools they normally use — including AI — and watch them work. How do they navigate code they've never seen? When the AI hands them a plausible-but-wrong suggestion, do they catch it? An hour of that tells you more than any take-home because it mirrors the job. You can't AI your way through a race condition if you don't understand what's happening underneath.

The last move is the easiest and most underused: have your most experienced engineers run a deep dive on the candidate's actual work history. "Tell me about a system you designed that failed. What would you do differently?" Real experience has a texture that fabricated stories don't. Camille Fournier's writing on this kind of reference-as-interview names it well — the specifics people choose to remember reveal what they actually learned.

The Broader Shift

This hiring problem is the visible edge of something larger. AI didn't just change how code gets written — it moved what engineering competence means.

Five years ago, a strong engineer was someone who could write clean, efficient code under pressure. That's still valuable, but it's no longer the scarce skill. What's scarce now is judgment: architecting systems that solve real business problems, making trade-offs under uncertainty, debugging distributed systems whose state you can't fully hold in your head, evaluating what an AI hands you. Google Cloud's DORA research has spent years tying reliable delivery to architectural and operational capabilities, such as loosely coupled architecture and continuous delivery practices, not to how fast anyone types. Loops that select for typing are selecting for the wrong thing.

If your pipeline feels dry, your filter may be the problem. The candidates failing your algorithm challenges might be the ones who can operate on your fastest-moving team. The ones passing might collapse the first time a production incident doesn't have a Stack Overflow answer. For startups, this is existential: you can't absorb a hire who needs AI to write every function, and you can't afford to screen out the AI-native engineers who'd outship everyone else.

If your interviews have stopped producing strong hires, you're not the only one. The playbook most teams are still running was written for a constraint that no longer binds. That's the work we do: auditing hiring loops before the cost of a bad hire does the audit for you.
