The race to “program faster” with the help of AI assistants raises an uncomfortable question: what happens to skills when the goal isn’t producing but learning? A new study by Anthropic, the company behind Claude, points to a trade-off that many companies suspected but few had measured in a controlled experiment: AI assistance can slightly speed up work, but at the cost of understanding less about what is being built.
The study is based on a randomized controlled trial with 52 software engineers (mostly junior profiles) tasked with learning a new Python library: Trio, which focuses on asynchronous programming. Half performed the exercises with an AI assistant integrated into their environment; the other half worked without AI. After completing guided exercises, all participants took a comprehension test designed to measure skills deemed critical in an increasingly AI-assisted coding era: debugging, code reading, and conceptual understanding.
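For context, Trio is built around “structured concurrency”: tasks are started inside a “nursery” that owns their lifetime and waits for all of them to finish. A minimal sketch of the kind of code participants had to get comfortable with (illustrative only, not taken from the study’s exercises):

```python
import trio

async def fetch(name: str, delay: float) -> None:
    # Stand-in for an I/O-bound operation; trio.sleep never blocks the event loop.
    await trio.sleep(delay)
    print(f"{name} finished after {delay}s")

async def main() -> None:
    # The nursery runs both tasks concurrently and only exits when both are done.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(fetch, "task-a", 0.5)
        nursery.start_soon(fetch, "task-b", 0.2)

trio.run(main)
```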
The headline from the study itself is clear: the AI group scored 17 percentage points lower in “mastery”. And this drop wasn’t justified by a speed gain large enough to compensate for it.
Results: nearly the same time, much less understanding
According to Anthropic, those who used AI finished about 2 minutes earlier on average, but the difference was not statistically significant. However, the gap in the test was: the AI group averaged 50%, compared to 67% for the hand-coded group. In other words: more dependence on the tool didn’t equate to more learning, and the productivity boost didn’t offset the loss in comprehension.
What’s even more relevant is that the biggest gap emerged in debugging, precisely the skill that, in theory, should serve as a “safety belt” when part of the code is suggested or generated by AI. If the professional doesn’t quickly identify when the code is wrong and why, the cost manifests later: superficial reviews, production failures, security issues, and more expensive maintenance.
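To make the debugging point concrete, here is a hypothetical slip that an assistant can easily introduce and that is hard to spot without a mental model of async execution: calling a coroutine without awaiting it. The snippet is illustrative only and is not taken from the study.

```python
import trio

async def save_record(record: dict) -> None:
    await trio.sleep(0.1)  # stand-in for a database write
    print("saved", record)

async def main() -> None:
    # Bug: the coroutine is created but never awaited, so the "write" never runs.
    # CPython typically only surfaces this as a "coroutine was never awaited"
    # RuntimeWarning when the object is garbage-collected.
    save_record({"id": 1})

    # Correct usage: await the call (or hand it to a nursery to run concurrently).
    await save_record({"id": 2})

trio.run(main)
```

Spotting why the first record silently disappears requires exactly the kind of understanding that the AI group scored lowest on.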
Quick summary of the experiment
| Indicator | AI group | No-AI group |
|---|---|---|
| Participants | ~26 (half of the 52) | ~26 (half of the 52) |
| Completion time | ~2 minutes faster on average (not statistically significant) | Baseline |
| Average test score | 50% | 67% |
| Largest gap between groups | Debugging (weakest area) | Debugging (clear advantage) |
The key isn’t “using AI,” but how it’s used
The study doesn’t conclude that AI is inherently harmful. What it argues is more specific: the mode of interaction makes the difference between “learning with AI” and “delegating thinking to AI”.
To analyze this, the researchers reviewed recordings and classified usage patterns. A striking finding is the time spent “chatting” with the tool: some participants dedicated up to 11 minutes — nearly a third of the available time — drafting up to 15 queries. This detail helps explain why the productivity “boost” wasn’t so large: when the goal is to learn something new, AI can become an additional friction point rather than a shortcut.
Anthropic identified six interaction patterns, three associated with low performance and three with high scores. Those performing worse share one element: replacing their own reasoning with delegated generation or debugging.
Patterns associated with low performance (delegation):
- Complete delegation: asking for the final code and simply integrating it.
- Progressive dependence: starting with few questions and ending up ceding all the work.
- Delegated debugging: using AI to verify or fix code instead of understanding the error.
Patterns associated with better performance (AI as cognitive support):
- Generate then understand: ask for code but follow up with explanation, review, and self-validation.
- Generation + explanation: request solutions with detailed reasoning.
- Conceptual questioning: ask about concepts and write code within that mental framework, accepting “stalls” and errors as part of the learning process.
The between-the-lines takeaway matters for any technical team: if the assistant is used as a substitute, less is learned; if it is used as a tutor, learning is better preserved, even if that is not the fastest route.
Why this matters to companies: human oversight becomes more expensive
The industry is entering a phase where the challenge isn’t just “writing lines,” but overseeing systems. In that context, the degradation of skills like debugging isn’t anecdotal: it’s a governance issue. If trainees learn to “close tasks” without building understanding, the risk is twofold:
- Short-term: increased likelihood of undetected errors (especially in complex dependencies, concurrency, or security).
- Medium-term: weakening of the pipeline that turns juniors into seniors capable of leading architecture, incident response, and thorough reviews.
The Anthropic study doesn’t quantify employment impacts, but the debate ties into a broader trend seen in labor market research. A report from Stanford’s Digital Economy Lab notes that, after widespread adoption of generative AI tools, workers aged 22 to 25 in highly automatable roles show a relative drop of 13% in employment (based on U.S. administrative data). This doesn’t prove direct causality or fully explain the phenomenon, but it underscores the concern: if entry-level hiring declines and learning with AI is mismanaged, the system may run out of new talent.
Practical recommendations: AI in “learning mode” isn’t AI in “production mode”
For a technical team, the value of the study lies in its operational translation. Several practices emerge as a reasonable minimum to keep the tool from replacing training:
- Separate policies by context: when learning a new library or framework, prioritize conceptual questions and explanations; for routine, well-understood tasks, allow more direct generation.
- Golden rule for reviews: if generated code is pasted in, require a brief explanation of its flow, assumptions, and potential failure points. Not as punishment, but as a quality check.
- Train debugging without assistance: reserve sessions or tasks where debugging is done without the assistant, much like teams run incident-response drills.
- Encourage “useful errors”: the group without AI made more mistakes, but that cost seems linked to greater learning: fixing one’s own errors consolidates understanding.
Anthropic even frames it with a message unusual for coding-copilot marketing: productivity isn’t a shortcut to mastery, especially when the work involves acquiring new skills.
Frequently Asked Questions
Does this mean AI assistants make programmers “worse”?
Not necessarily. The study suggests that the impact depends on how AI is used: delegating solutions tends to reduce understanding; using AI to ask questions and comprehend can help preserve learning.
Why is debugging the most affected area?
Because debugging requires building a mental model of the system and validating hypotheses against it. If AI takes over that role, the brain gets less practice at it, precisely when it is the skill most needed to supervise assisted code.
What simple policy can a company implement tomorrow?
Define a “learning mode” (AI as tutor: explanations, conceptual questions) and a “production mode” (AI as speed booster for known tasks), plus require explanations when auto-generating code during reviews.
Does this result apply to all programming with AI?
The study measures short-term learning with a specific library (Trio) and a limited sample. It’s a solid signal but not an absolute rule: effects may vary depending on experience, task type, and team discipline.
via: anthropic

