Researchers find a 19% increase in development time when using tools like Cursor Pro on real open-source projects
In an unexpected turn for AI-assisted programming, a recent study challenges one of the field’s most repeated promises: that AI coding tools automatically boost developers’ productivity. Conducted by the independent research group METR (Model Evaluation & Threat Research), the study concludes that in certain real-world settings these tools can slow work down rather than speed it up.
A test under real conditions with experienced developers
The study analyzed 16 experienced developers who regularly contribute to some of the world’s most popular open-source projects, averaging about 5 years of contributions and over 1,500 commits per participant. Each developer completed real tasks from their own repositories, both with and without AI tools, primarily Cursor Pro paired with Anthropic’s Claude 3.5 and Claude 3.7 models.
Before starting, the developers expected AI to cut their implementation time by 24%. The data showed otherwise: on average, developers took 19% longer when using AI tools. This result contradicted not only their own expectations but also those of machine learning and economics experts, who had forecast productivity improvements of up to 39%.
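To make the gap between forecast and measurement concrete, here is a back-of-the-envelope sketch that applies both percentages to a hypothetical two-hour task. The baseline duration is an assumption for illustration; only the 24% and 19% figures come from the study.

```python
# Back-of-the-envelope: forecast vs. measured effect of AI assistance.
# The 2-hour baseline is hypothetical; the percentages come from the study.

baseline_hours = 2.0       # time for a task without AI (assumed)
expected_speedup = 0.24    # developers forecast 24% less time
observed_slowdown = 0.19   # measured: 19% more time

expected_with_ai = baseline_hours * (1 - expected_speedup)   # 1.52 h
observed_with_ai = baseline_hours * (1 + observed_slowdown)  # 2.38 h

print(f"Expected with AI: {expected_with_ai:.2f} h")
print(f"Observed with AI: {observed_with_ai:.2f} h")
print(f"Forecast vs. reality gap: {observed_with_ai - expected_with_ai:.2f} h per task")
```

On these assumed numbers, a task forecast to take about an hour and a half with AI actually takes closer to two and a half, nearly an hour more per task than developers anticipated.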
How AI can slow down development
The research reveals that developers spent more time interacting with the tools than actually coding: writing prompts, waiting for responses, evaluating suggestions, and fixing generated code. In many cases the code AI produced required extensive review or was discarded altogether. Analysis showed that less than 44% of AI-generated code was accepted without significant modification, and roughly 9% of total working time went solely to cleaning up erroneous output.
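As a rough illustration of where that time goes, the sketch below assembles a hypothetical time budget for an AI-assisted task. Only the ~9% cleanup share is reported by the study; every other category share is an assumed placeholder.

```python
# Illustrative time budget for an AI-assisted task (fractions of total time).
# Only the ~9% "cleaning up AI output" share comes from the study;
# the remaining splits are hypothetical placeholders.

time_budget = {
    "writing prompts": 0.10,        # hypothetical
    "waiting for responses": 0.08,  # hypothetical
    "reviewing suggestions": 0.13,  # hypothetical
    "cleaning up AI output": 0.09,  # reported in the study (~9%)
    "actual coding": 0.60,          # hypothetical remainder
}

assert abs(sum(time_budget.values()) - 1.0) < 1e-9  # shares must total 100%

overhead = sum(v for k, v in time_budget.items() if k != "actual coding")
print(f"AI-interaction overhead: {overhead:.0%} of total time")
```

Under these placeholder splits, tool interaction consumes around 40% of the working time, which is the mechanism the study points to: the overhead of prompting, waiting, reviewing, and cleaning up can outweigh the time the generated code saves.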
Additionally, participants noted that AI-generated suggestions often lacked the implicit knowledge needed to work effectively in large, complex repositories. “AI doesn’t understand the shortcuts, internal conventions, or historical trade-offs of the project,” said one developer.
A phenomenon linked to context
The authors identified several factors explaining this outcome:
- Familiarity with the code: the more experienced a developer was with their repository, the less useful the AI became.
- Environment complexity: repositories with over a million lines of code and high quality standards posed challenges for AI models.
- Unrealistic expectations: even after completing their tasks, developers still believed AI had reduced their work time, despite data showing the opposite.
These findings, however, do not diminish the value of AI in other circumstances. The study acknowledges that in new projects, less defined tasks, or when used by less experienced developers, these tools’ benefits could be much more apparent.
What about the future?
The researchers emphasize that their experiment took place between February and June 2025 and that recent advances in foundation models could change the picture within months. They also suggest that better prompting, more domain-specific training, or autonomous agents could reverse this trend.
Notably, progress has already been observed: models like Claude 3.7 demonstrated the ability to implement the core functionality of some tasks from the study’s repositories. Still, they exhibited issues such as style violations, incomplete documentation, and insufficient testing.
Conclusion: fewer myths, more evidence
METR’s study clearly shows that enthusiasm for AI shouldn’t replace rigorous empirical evaluation. AI coding tools are not a universal magic solution. They perform better in certain contexts than in others, and their use requires maturity, judgment, and a deep understanding of the work environment.
Far from dismissing these technologies, the study invites reflection: to achieve true productivity gains, we will need not only more advanced models but also smarter and more realistic integration strategies. Artificial intelligence, on its own, does not replace experience. At least, not yet.