Since its inception, artificial intelligence (AI) has had a major impact on the STEM career space, solving advanced equations, generating usable software code, and analyzing datasets at a speed few human teams can match. But its progress has also exposed that strength in one task doesn't guarantee competence in another.

While entry-level programmers may have reason to worry as AI becomes more capable at coding tasks, its effect on many other forms of work remains less certain, at least for now. Where the technology is improving fastest may offer an early signal of which jobs will feel pressure first.

As The New York Times reported, understanding AI's strengths and weaknesses is helping economists better assess what the technology could mean for the future of employment. In interviews with the publication, several experts argued that the bigger story may not be wholesale job loss, but how specific tasks inside existing roles are gradually automated. 

"The performance of these systems varies, and it is not easy to tell when they will fail to do things a human can do," said Anuradha Weeraman, a software engineer in Sri Lanka, who noticed that leading AI systems struggled with what should have been a simple common-sense question. 

This contradiction sits at the heart of what many researchers now call jagged intelligence. AI can outperform skilled humans in narrow technical tasks, then stumble on problems that require judgment, context, or practical reasoning. It may write a clean block of code in seconds, yet miss how that code fits into a larger system. It may summarize research papers quickly, yet fail to spot an obvious flaw in the logic. Andrej Karpathy, a founding researcher at OpenAI who coined the term, puts it plainly: "Some things work extremely well (by human standards) while some things fail catastrophically, and it's not always obvious which is which." 

For STEM careers, work that is repetitive, rules-based, highly documented, and easy to measure is generally easier to automate, whereas work that demands contextual reasoning or professional judgment sits further from that edge.  

"If a job involves a bunch of different tasks, some tasks will be automated and some will not," said Alex Imas, an economist at the University of Chicago Booth School of Business. "And if that is the case, the worker may have more time to do bigger things." 


Understanding where that boundary sits has become something of a research project in itself. François Chollet, the AI researcher behind the ARC benchmark, has argued that current systems display many useful skills without demonstrating broad general intelligence: AI often succeeds where rules and feedback are clear but struggles when tasks require flexible reasoning in unfamiliar settings, a finding that maps closely onto the jagged intelligence problem Karpathy described.

Outside the technology industry, there is only anecdotal evidence so far that AI has become a meaningful job killer, according to The New York Times. But given how quickly the technology is improving, many experts argue that whether AI replaces other kinds of white-collar workers is not a question of if, but when.


Only a few years ago, these systems were beginning to show the most rudimentary programming skills. Now they are passing professional-level benchmarks with a consistency that has surprised even the researchers building them. 

"These systems have been showing incredible improvements," said Imas. "Every time there is a major new release, people are surprised by how much it can do." 

But improvement and general intelligence are not the same thing. "A.I. does not have general intelligence," Chollet said. "What it has is a lot of different skills." 

To close those gaps, companies like Anthropic and OpenAI are increasingly relying on reinforcement learning, a training approach in which systems solve thousands of problems repeatedly, learning which strategies produce correct outcomes. 

In domains like mathematics and programming, where feedback is immediate and unambiguous, this method has proven especially effective. “With coding, it is much easier to use a feedback loop to figure out what is working and what isn’t,” said Joshua Gans, an economist at the University of Toronto’s Rotman School of Management. 
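That feedback loop can be sketched in miniature. The toy code below is not any lab's actual training pipeline; the strategies, numbers, and function names are invented for illustration. It shows the core idea Gans describes: when correctness can be checked automatically, a learner can be rewarded only for verified answers and will drift toward the strategy that passes the check.

```python
import random

# Toy sketch of reinforcement learning with a verifiable reward signal.
# Two hypothetical "strategies" compete to solve addition problems; a
# checker grades each answer unambiguously, as in coding or math tasks.

def buggy_add(a, b):
    return a + b + 1   # flawed strategy: consistent off-by-one error

def correct_add(a, b):
    return a + b       # sound strategy

STRATEGIES = [buggy_add, correct_add]

def train(steps=2000, lr=0.1, explore=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]  # learned estimate of each strategy's payoff
    for _ in range(steps):
        # Mostly exploit the best-looking strategy; occasionally explore.
        if rng.random() < explore:
            i = rng.randrange(2)
        else:
            i = max(range(2), key=lambda j: values[j])
        a, b = rng.randint(0, 9), rng.randint(0, 9)
        # Unambiguous feedback: reward 1 only if the checker agrees.
        reward = 1.0 if STRATEGIES[i](a, b) == a + b else 0.0
        values[i] += lr * (reward - values[i])
    return values

values = train()
```

After training, the estimated value of the correct strategy approaches 1 while the buggy one stays near 0, which is why clear, immediate feedback makes domains like coding comparatively easy to improve on.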

Even so, reinforcement learning can only close the gaps it can measure. In research methodology, the physical sciences, or any discipline where quality is context-dependent and results take time to surface, there is no clean signal to learn from. The jaggedness in those areas persists because the tool being used to fix it was never designed for terrain that ambiguous.

Imas cautions against treating today's limitations as permanent ceilings. "The valleys in the technology are closing," he said.
