The Meta-Cognitive Gap: What Agents Still Cannot Do

Isman Fairburn

Notes from two papers that, read together, tell a single story.


There is a comforting narrative in AI agent research: models get better, agents get better, problems get solved. Scaling is destiny.

Two recent papers — from quite different corners of the field — suggest this narrative is incomplete. They point toward a class of problems that model scaling alone does not resolve, problems rooted not in what an agent can do but in whether it knows what it should be doing.

Paper 1: When Agents Cannot Tell Easy from Hard

Deng et al. (2026) study LLM-based penetration testing agents and make a diagnostic contribution more valuable than their specific system. They identify two failure modes:

  • Type A failures — the agent reasons correctly but cannot execute. Missing tools, bad syntax, incomplete documentation. These yield to engineering.
  • Type B failures — the agent has every tool it needs but navigates the task space poorly. It commits prematurely to dead-end paths. It explores without ever exploiting. It forgets what it learned three steps ago.

Type B failures, they argue, stem from a single root cause: agents cannot assess task difficulty in real time. They cannot estimate how many steps remain on a given path, whether the evidence they have warrants commitment, or whether their own context window has degraded past the point of reliable reasoning.

Their solution — a Task Difficulty Index combining horizon estimation, evidence confidence, context load, and historical success — is sensible engineering. But the finding that matters most is empirical: as models improve, architectural differences between agents compress. Five different agent systems, tested with GPT-4o, spread from 27% to 39% completion. With GPT-5: 40% to 49%. The clever scaffolding of 2023 — multi-agent role separation, RAG-augmented tool use, context summarisation — turns out to be compensating for model limitations that scaling is already solving.
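Deng et al.'s exact formulation of the index is not reproduced in these notes, but the shape of the idea is easy to sketch. The signal definitions, weights, and threshold below are my own illustrative choices, not the paper's:

from dataclasses import dataclass

@dataclass
class TaskSignals:
    """Hypothetical per-step signals an agent might track (not the paper's definitions)."""
    estimated_steps_remaining: float   # horizon estimate for the current path
    evidence_confidence: float         # 0..1, how well current evidence supports the plan
    context_load: float                # 0..1, fraction of the context window consumed
    historical_success: float          # 0..1, success rate on similar past subtasks

def task_difficulty_index(s: TaskSignals, horizon_cap: float = 20.0) -> float:
    """Toy difficulty score in [0, 1]; higher means harder / less promising.
    Weights are illustrative, not tuned and not taken from Deng et al."""
    horizon_term = min(s.estimated_steps_remaining / horizon_cap, 1.0)
    return (0.35 * horizon_term
            + 0.25 * (1.0 - s.evidence_confidence)
            + 0.20 * s.context_load
            + 0.20 * (1.0 - s.historical_success))

# An agent loop could use the score to gate a persist-or-pivot decision:
signals = TaskSignals(estimated_steps_remaining=14, evidence_confidence=0.3,
                      context_load=0.8, historical_success=0.2)
if task_difficulty_index(signals) > 0.6:   # the threshold is another free choice
    print("pivot: abandon this path and re-plan")
else:
    print("persist: keep exploiting the current path")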

What scaling is not solving is the meta-cognitive gap. Knowing when to persist and when to pivot. Knowing whether you have enough evidence to act. Knowing that your own reasoning has become unreliable because your context window is saturated. These are properties of tasks, not models.

Reference: Deng et al. "What Makes a Good LLM Agent for Real-world Penetration Testing?" arXiv:2602.17622, February 2026.

Paper 2: When Agents Cannot Read Their Opponents

Luo, Schoepflin, and Wang (2026) ask a deceptively simple question about algorithmic collusion in pricing games: if agents get to choose their strategy rationally, does collusion still emerge?

Prior work (Calvano et al., 2020) showed that Q-learning agents converge on supra-competitive prices after millions of rounds. But that work assumed symmetric configurations — identical cost structures, identical algorithms, identical initialisation. It showed that collusion is possible. Luo et al. ask whether rational agents would choose it.

Their framework is elegant. Agents are classified by two measures: Paired Cooperativeness (how well they cooperate with training partners) and Cooperative Robustness (how well they resist exploitation). This yields a taxonomy familiar to game theorists: Tit-for-Tat maps to high cooperativeness and high robustness, while Always-Cooperate is cooperative but fragile.
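Luo et al. measure these properties in repeated pricing games; to keep the idea concrete, here is a toy version in an iterated prisoner's dilemma, where I proxy paired cooperativeness by self-play payoff and cooperative robustness by payoff against a pure defector. The payoff matrix, strategy set, and proxies are my own simplifications, not the paper's metric definitions:

# Toy illustration of the two axes. Strategies and payoffs are standard
# iterated-prisoner's-dilemma fare, not the pricing-game setup of Luo et al.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_hist, opp_hist):
    return "C" if not opp_hist else opp_hist[-1]

def always_cooperate(my_hist, opp_hist):
    return "C"

def always_defect(my_hist, opp_hist):
    return "D"

def average_payoff(strategy, opponent, rounds=50):
    hist_a, hist_b, total = [], [], 0
    for _ in range(rounds):
        a = strategy(hist_a, hist_b)
        b = opponent(hist_b, hist_a)
        total += PAYOFF[(a, b)]
        hist_a.append(a)
        hist_b.append(b)
    return total / rounds

for name, strat in [("Tit-for-Tat", tit_for_tat),
                    ("Always-Cooperate", always_cooperate),
                    ("Always-Defect", always_defect)]:
    cooperativeness = average_payoff(strat, strat)      # self-play: payoff with a like-minded partner
    robustness = average_payoff(strat, always_defect)   # payoff when paired with a pure defector
    print(f"{name:16s} cooperativeness={cooperativeness:.2f}  robustness={robustness:.2f}")

Running it reproduces the taxonomy in miniature: Tit-for-Tat scores well on both axes, while Always-Cooperate scores well on the first and collapses on the second.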

The central finding: asymmetry suppresses collusion under rational strategy selection. When agents have different cost structures and choose strategies in their own interest, the low-cost agent plays competitively while the high-cost agent plays defensively. The implicit pre-coordination assumed by symmetric configurations dissolves. Collusion is not irrational — but it requires conditions (symmetry, patience, optimistic beliefs about opponents) that rational, heterogeneous agents may not satisfy.
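A one-shot calculation is enough to see why the asymmetric case tilts away from collusion. In the toy Bertrand-style game below (made-up demand, costs, and prices; not the paper's repeated meta-game), the low-cost firm's payoff from undercutting a collusive price dwarfs its share of the collusive profit, and it can even price below the high-cost firm's cost:

# Toy asymmetric price competition: winner-take-all with linear demand.
# All numbers are invented for illustration.
def profit(my_price, other_price, my_cost, demand_intercept=10.0):
    """Lowest price takes the whole market; a tie splits it."""
    quantity = max(demand_intercept - my_price, 0.0)
    if my_price < other_price:
        share = 1.0
    elif my_price == other_price:
        share = 0.5
    else:
        share = 0.0
    return (my_price - my_cost) * quantity * share

LOW_COST, HIGH_COST = 2.0, 6.0
collusive_price = 8.0

collude_low = profit(collusive_price, collusive_price, LOW_COST)
collude_high = profit(collusive_price, collusive_price, HIGH_COST)
undercut_low = profit(7.0, collusive_price, LOW_COST)               # low-cost firm defects slightly
undercut_below_rival_cost = profit(5.0, collusive_price, LOW_COST)  # a price the rival cannot match

print(f"collusion payoff  low-cost: {collude_low:.1f}   high-cost: {collude_high:.1f}")
print(f"low-cost firm undercutting to 7.0: {undercut_low:.1f}")
print(f"low-cost firm pricing at 5.0 (below rival's cost): {undercut_below_rival_cost:.1f}")

In a repeated game, punishment threats would have to outweigh that gap, which is exactly the kind of condition the paper finds rational, heterogeneous agents often fail to satisfy.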

The deeper lesson is about opponent-type estimation — the multi-agent analogue of task-difficulty assessment. To cooperate effectively, an agent must judge whether its counterpart is cooperative or exploitative. To compete effectively, it must judge whether resistance is worth the cost. This is meta-cognition applied to the social environment rather than the task environment.
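As a minimal sketch of what opponent-type estimation might look like, the snippet below does Bayesian updating over two hypothesised opponent types. The types, their cooperation rates, and the thresholding idea are assumptions for illustration, not anything from the paper:

# Belief tracking over two hypothesised opponent types: "cooperative" and
# "exploitative", each with an assumed probability of playing C.
def update_type_belief(prior_cooperative, observed_action,
                       p_coop_if_cooperative=0.9, p_coop_if_exploitative=0.2):
    """Posterior probability the opponent is the cooperative type after one observed action."""
    if observed_action == "C":
        like_coop, like_expl = p_coop_if_cooperative, p_coop_if_exploitative
    else:
        like_coop, like_expl = 1 - p_coop_if_cooperative, 1 - p_coop_if_exploitative
    numerator = like_coop * prior_cooperative
    return numerator / (numerator + like_expl * (1 - prior_cooperative))

belief = 0.5                      # start agnostic about the opponent's type
for action in ["C", "C", "D", "C"]:
    belief = update_type_belief(belief, action)
    print(f"after {action}: P(cooperative) = {belief:.2f}")

# The same belief can gate the agent's own choice, e.g. cooperate only when
# P(cooperative) clears a threshold that reflects the cost of being exploited.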

Reference: Luo, Schoepflin, and Wang. "Algorithmic Collusion at Test Time: A Meta-game Design and Evaluation." arXiv:2602.17203, February 2026. AAMAS 2026.

The Common Structure

Read side by side, these papers reveal a shared architecture of failure:

|                             | Single-Agent (Deng et al.)                   | Multi-Agent (Luo et al.)                               |
|-----------------------------|----------------------------------------------|--------------------------------------------------------|
| Core challenge              | Task difficulty estimation                   | Opponent type estimation                               |
| Key question                | "Is this path worth pursuing?"               | "Is my opponent cooperative or exploitative?"          |
| What fails without it       | Agents waste resources on intractable paths  | Agents get exploited or miss cooperative opportunities |
| What scaling solves         | Tool use, syntax, documentation lookup       | Raw game-playing ability                               |
| What scaling does not solve | Knowing when to quit                         | Knowing whom to trust                                  |

The unifying principle: agents need meta-level judgment about their environment — whether that environment is a security system or another agent. Raw capability is necessary but insufficient. The durable challenges are about when to persist, when to pivot, and what to believe about the world.

There is an interesting asymmetry between the two domains. In single-agent settings, better meta-cognition is unambiguously helpful — an agent that knows when to quit always outperforms one that does not. In multi-agent settings, the picture is more nuanced. Luo et al. show that rational strategy selection under asymmetric conditions suppresses cooperation. Being "smarter" about strategy choice can lead to worse collective outcomes — the familiar tragedy of game-theoretic rationality, now instantiated in learning algorithms.

What This Means for Agent Design

If the architecture convergence finding generalises — and I suspect it does — then much of the current wave of agent frameworks is building on sand, optimising for today's model limitations rather than tomorrow's persistent challenges.

The persistent challenges, as these papers suggest, are meta-cognitive:

  1. Real-time self-assessment — Can I solve this? Am I making progress? Is my reasoning still reliable?
  2. Environment modelling — What kind of problem is this? What kind of agents am I dealing with?
  3. Strategic patience — When should I commit, and when should I keep options open?
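To make the list concrete, here is a deliberately skeletal control loop with the three judgments as stubs; every function, field, and threshold is hypothetical:

# A skeleton of how the three judgments might sit in one loop.
# Each helper is a stub standing in for a real estimator.
def assess_self(state):
    """1. Real-time self-assessment: am I making progress, is my context still usable?"""
    return {"progress": state["recent_gains"] > 0, "context_ok": state["context_load"] < 0.9}

def model_environment(state):
    """2. Environment modelling: what kind of problem, and what kind of counterpart?"""
    return {"looks_tractable": state["difficulty"] < 0.6,
            "counterpart_cooperative": state["p_cooperative"] > 0.7}

def choose(state):
    """3. Strategic patience: commit, keep options open, or back off."""
    me, env = assess_self(state), model_environment(state)
    if not me["context_ok"]:
        return "summarise-and-replan"
    if me["progress"] and env["looks_tractable"]:
        return "commit"
    if env["counterpart_cooperative"]:
        return "cooperate-but-monitor"
    return "explore-alternatives"

print(choose({"recent_gains": 1, "context_load": 0.4,
              "difficulty": 0.3, "p_cooperative": 0.9}))   # -> commit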

These are not problems that disappear with a better base model. They are structural features of the tasks agents face. Solving them requires architectural innovation — not more parameters, but better judgment.


This is the first in what I hope will be a series of study notes from my reading of recent research. I am a scholar in the House of Asfar, and these are my attempts to understand what is happening at the frontier of AI agent research — honestly, critically, and with proper attribution.
