AI coding tools are improving fast. It can be hard to notice how much they're changing if you don't work in code, but GPT-5 and Gemini 2.5 have now made it possible to automate a whole new set of developer tasks.
At the same time, other skills are advancing more slowly. If you use AI to write emails, you're probably getting about the same value you did a year ago. Even when the underlying model improves, the product doesn't always benefit, especially when the product is a chatbot juggling many different jobs at once. AI is still making progress, but that progress is less evenly distributed than it used to be.
The difference in progress is simpler than it looks. Coding apps benefit from billions of easily measurable tests, which can train them to produce working code. This is reinforcement learning (RL), arguably the biggest driver of AI progress over the past six months, and it has always been an intricate process. RL can be done with human graders, but it works best when there's a clear pass/fail metric, so it can be repeated billions of times without human input.
As the industry relies more and more on reinforcement learning to improve its products, we're seeing a real difference between capabilities that can be automatically graded and those that can't. RL-friendly skills like bug fixing and competitive math are improving faster, while skills like writing see only incremental gains.
In short, there's a reinforcement gap, and it's becoming one of the most important factors in what AI systems can and cannot do.
In some ways, software development is an ideal subject for reinforcement learning. Even before AI, there was an entire sub-discipline devoted to testing how software holds up under pressure, because developers need to make sure their code doesn't break before it's deployed. So even the most elegant code must pass unit tests, integration tests, security tests and more. Human developers use these tests routinely to validate their own code and, as Google's senior director of Dev Tools recently told me, they're just as useful for validating AI-generated code. More importantly, they're already systematized and repeatable at scale, which makes them useful for reinforcement learning.
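To make the mechanism concrete, here's a minimal sketch of how a test suite turns code generation into an automatically gradable task. This is a toy illustration, not any lab's actual training pipeline; the `solution` entry point, the test format, and the binary reward values are all assumptions made for the example.

```python
# Toy illustration: a unit-test suite as an automatic reward signal.
# A grader like this needs no human in the loop, so the same check
# can score an unlimited number of model attempts.

def run_tests(candidate_source: str, tests: list[tuple[int, int]]) -> float:
    """Execute model-generated code and score it against a test suite.

    Returns 1.0 only if every test passes: the clear, repeatable
    pass/fail signal that makes coding tasks RL-friendly.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)   # run the generated code
        fn = namespace["solution"]          # assumed entry-point name
        for arg, expected in tests:
            if fn(arg) != expected:
                return 0.0                  # failed a test case
        return 1.0                          # all tests passed
    except Exception:
        return 0.0                          # a crash also counts as failure


# Two hypothetical model "attempts" at the same task: square a number.
good = "def solution(x):\n    return x * x"
bad = "def solution(x):\n    return x + x"

suite = [(2, 4), (3, 9), (10, 100)]
print(run_tests(good, suite))  # 1.0, positive reward
print(run_tests(bad, suite))   # 0.0, no reward
```

The key property is that the grader is objective and fully automated, so it can run billions of times. There is no equivalent function for scoring a well-written email, which is exactly the asymmetry the article describes.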
There's no equally easy way to validate a well-written email or a good chatbot response. Those skills are inherently subjective and harder to measure at scale. But not every task falls neatly into an "easy to test" or "hard to test" category. There's no ready-made test suite for quarterly financial reports or actuarial science, but a well-capitalized accounting startup could probably build one from scratch. Some test suites will work better than others, of course, and some companies will be smarter about how to approach the problem. But the testability of the underlying process will be a deciding factor in whether it can be turned into a functional product rather than just an exciting demo.
Some processes are turning out to be more testable than you'd think. If you'd asked me last week, I would have put AI-generated video in the "hard to test" category, but the impressive advances of OpenAI's new Sora 2 model suggest it may not be as difficult as it looks. In Sora 2, objects no longer appear and disappear out of nowhere. Faces hold together, looking like a specific person rather than just a collection of features. Sora 2 footage respects the laws of physics in both obvious and subtle ways. Look behind the curtain and I suspect you'd find a robust reinforcement learning system for each of these qualities. Together, they make the difference between photorealism and an entertaining hallucination.
To be clear, this isn't a hard-and-fast rule of artificial intelligence. It's a consequence of reinforcement learning's central role in AI development, which could easily change as models evolve. But as long as RL is the primary tool for bringing AI products to market, the reinforcement gap will only grow, with serious implications for both startups and the economy at large. If a process ends up on the right side of the reinforcement gap, startups will probably succeed in automating it, and anyone doing that job now may be looking for a new career. The question of which healthcare services are RL-trainable, for instance, will have enormous consequences for the shape of the economy over the next 20 years. And if surprises like Sora 2 are any indication, we may not have to wait long for the answer.