Right now, most standard take-home and HackerRank/LeetCode tests are easily solved by LLMs. As a result, companies are accidentally hiring what we call "vibe coders": candidates who are phenomenal at prompting AI to generate boilerplate, but who completely freeze when the architecture gets complex, when things break, or when the AI subtly hallucinates.
We are working on a new approach and I want to validate the engineering logic with the people who actually conduct these interviews.
Instead of trying to ban AI (which is a losing battle), we want to test for "AI Steering".
The idea:
1. Drop the candidate into a real, somewhat messy sandbox codebase.
2. Let them use whatever AI they want.
3. Inject a subtle architectural shift, a breaking dependency, or an AI hallucination.
4. Measure, purely through telemetry (Git diffs, CI/CD runs, debugging paths), how they recover and fix the chaos (rough sketch of the telemetry side below).
Basically: Stop testing syntax, start testing architecture and debugging skills in the age of AI.
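To make the telemetry part concrete, here's a rough sketch of what the scoring could look like. Everything in it is a placeholder assumption: the injected break is marked with a Git tag called "injection", a commit counts as a fix when pytest goes green at that commit, and the only signals are time-to-fix and diff churn. This is not the real backend, just the shape of the idea.

```python
# Rough sketch only: scores "recovery" from the Git history of a sandbox repo.
# Hypothetical assumptions: the injected break is tagged "injection", and a
# candidate commit counts as a fix when `pytest` passes at that commit.
import subprocess

def git(repo, *args):
    """Run a git command in the sandbox repo and return stdout."""
    return subprocess.run(
        ["git", "-C", repo, *args], capture_output=True, text=True, check=True
    ).stdout.strip()

def commits_since_injection(repo):
    """List (sha, unix_timestamp) for every commit after the injected break."""
    out = git(repo, "log", "--reverse", "--format=%H %ct", "injection..HEAD")
    return [(sha, int(ts)) for sha, ts in (line.split() for line in out.splitlines())]

def tests_pass_at(repo, sha):
    """Check out a commit and run the test suite (stand-in for a CI run)."""
    git(repo, "checkout", "--quiet", sha)
    result = subprocess.run(["pytest", "-q"], cwd=repo, capture_output=True)
    return result.returncode == 0

def recovery_report(repo):
    """Time-to-fix and diff churn between the injected break and the first green commit."""
    injected_at = int(git(repo, "log", "-1", "--format=%ct", "injection"))
    churn = 0
    for sha, ts in commits_since_injection(repo):
        # Count added + deleted lines in this commit (diff churn).
        stat = git(repo, "show", "--numstat", "--format=", sha)
        churn += sum(int(n) for line in stat.splitlines()
                     for n in line.split()[:2] if n.isdigit())
        if tests_pass_at(repo, sha):
            return {"minutes_to_fix": (ts - injected_at) / 60,
                    "lines_churned": churn,
                    "fix_commit": sha}
    return {"minutes_to_fix": None, "lines_churned": churn, "fix_commit": None}

if __name__ == "__main__":
    print(recovery_report("./sandbox-repo"))
```

In practice the test suite would run in CI rather than locally, and you'd layer more signals on top (which files they touched, whether they actually removed the hallucinated code or just papered over it), but this is the kind of data we mean by "measure purely through telemetry."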
Before we spend months building out the backend for this simulation, I need a reality check from experienced leads:
1. Does testing a candidate's ability to "steer" and debug AI-generated code make more sense to you than traditional algorithm rounds?
2. How are you currently preventing these "prompt-only" developers from slipping through your own interview loops?
(Not linking anything here because there's nothing to sell yet, just looking for brutal feedback on the methodology.)