We Think We Know How AI Affects Cognitive Load. The Data Says Otherwise.
Category: AI & Learning Outcomes | Reading time: 6 min
The Measurement Problem Hiding in Plain Sight
Pick up almost any study on AI feedback and learning outcomes and you will find a version of the same sentence: the intervention reduced cognitive load, as evidenced by improved performance scores. Read it twice. The cognitive load conclusion is drawn from the performance outcome — not from any direct measure of what was actually happening in the learner's mind during the learning process.
This is not a minor methodological footnote. It is a structural problem that means we do not actually know, with any empirical precision, how AI feedback systems interact with human cognitive capacity during learning. We know what students scored. We do not know what was happening in their working memory while they were scoring it. And those are different questions with different implications for how we design AI systems that work.
My literature review on AI feedback and learning retention identified this as one of the most significant gaps in the current evidence base. The field has a sophisticated theoretical framework — Sweller's Cognitive Load Theory — and almost no direct measurement of whether that framework is actually operating the way researchers assume in AI-enhanced environments.
What Cognitive Load Theory Actually Claims
Cognitive Load Theory, developed by John Sweller in the late 1980s and extensively built upon since, proposes that human working memory has a limited capacity and that learning is most effective when instructional design deliberately manages the demands placed on that capacity. The theory distinguishes three types of load: intrinsic load, arising from the inherent complexity of the material itself; extraneous load, arising from poor instructional design that forces cognitive work unrelated to the learning goal; and germane load, arising from the cognitive processing that actually builds schemas and produces learning.
The theoretical promise of AI feedback systems is compelling within this framework. Well-designed AI can reduce extraneous load by providing immediate, targeted corrections that spare learners the effort of hunting down feedback on their own. It can manage intrinsic load by adapting problem difficulty to learner readiness, keeping students in the zone where challenge is productive rather than overwhelming. It can increase germane load by requiring students to engage actively with feedback rather than passively receiving grades.
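To make the adaptive-difficulty mechanism concrete, here is a minimal sketch of the kind of rule such a system might use: a rolling-accuracy staircase that raises difficulty when a learner is consistently succeeding and lowers it when they are consistently failing. Everything in it, the window size, the thresholds, the class name, is hypothetical, chosen for illustration rather than drawn from any particular product.

```python
from collections import deque

class AdaptiveDifficulty:
    """Hypothetical staircase rule: raise difficulty when recent accuracy
    is high, lower it when accuracy is low, hold otherwise. The window
    size and thresholds are illustrative, not empirically calibrated."""

    def __init__(self, level=1, min_level=1, max_level=10, window=5):
        self.level = level
        self.min_level = min_level
        self.max_level = max_level
        self.recent = deque(maxlen=window)  # rolling record of correct/incorrect

    def record(self, correct: bool) -> int:
        """Log one attempt and return the difficulty for the next item."""
        self.recent.append(correct)
        if len(self.recent) == self.recent.maxlen:
            accuracy = sum(self.recent) / len(self.recent)
            if accuracy >= 0.8:    # consistently succeeding: add challenge
                self.level = min(self.level + 1, self.max_level)
                self.recent.clear()
            elif accuracy <= 0.4:  # consistently failing: reduce demand
                self.level = max(self.level - 1, self.min_level)
                self.recent.clear()
        return self.level

tutor = AdaptiveDifficulty()
for correct in [True, True, True, True, True, False, False]:
    level = tutor.record(correct)
print(level)  # 2: five straight successes raised the difficulty one step
```

Notice that the rule reacts to performance, not to measured load. Even the adaptive systems deployed today regulate difficulty through outcomes, which is exactly the inference gap this piece is about.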
This is an elegant argument. The problem is that it is largely theoretical. When researchers claim that AI feedback reduces cognitive load and therefore improves learning, they are typically working backward from learning outcomes to infer the mechanism — not measuring the mechanism directly and then observing its effect on outcomes. The difference matters enormously for design decisions.
What Direct Measurement Would Actually Tell Us
Direct cognitive load measurement during AI-assisted learning would use tools that capture what is happening in real time as students engage with feedback. Eye-tracking measures where and how long learners fixate on different elements of a learning interface, revealing attention patterns and cognitive effort. Response time analysis captures how long students take to process feedback and formulate subsequent responses, with longer response times often indicating higher cognitive demand. Physiological measures — galvanic skin response, heart rate variability, pupil dilation — provide continuous signals of cognitive and emotional arousal during learning.
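Response time analysis is the most accessible of these methods, so here is a minimal sketch of what it might look like in practice, assuming nothing more than a list of per-item response times for one learner. Standardizing log response times within a learner is a common normalization in reaction-time research; the data and the flagging threshold below are invented for illustration.

```python
import math
import statistics

def rt_load_proxy(response_times_ms):
    """Convert one learner's response times into z-scores of log RT.
    Log-transforming first tames the right skew typical of RT data;
    positive z-scores flag items that demanded unusually long processing
    for this learner, a rough (and indirect) proxy for cognitive demand."""
    logs = [math.log(rt) for rt in response_times_ms]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    return [(x - mu) / sigma for x in logs]

# Invented example: one learner's per-item response times (milliseconds)
rts = [1450, 1630, 980, 4820, 1510, 5230, 1390]
for item, z in enumerate(rt_load_proxy(rts), start=1):
    flag = "  <- elevated demand?" if z > 1.0 else ""
    print(f"item {item}: z = {z:+.2f}{flag}")
```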
These methods exist. They are used extensively in human-computer interaction research, in usability studies, and in cognitive psychology laboratories. They are rarely used in AI education research, partly because they require specialized equipment and expertise, partly because large-scale field studies are harder to instrument than controlled laboratory experiments, and partly because the field has been content to infer mechanisms from outcomes rather than observe them directly.
What direct measurement would reveal, and what outcome-based inference cannot, is the temporal dynamics of cognitive load during AI feedback interactions. Does load spike when feedback arrives? Does it peak at different moments for novice versus experienced learners? Does multimodal feedback, combining text and visual elements, reduce load or increase it by splitting attention? Does the design of the feedback interface itself impose extraneous load through cluttered presentation or confusing navigation? We do not know the answers to these questions with empirical confidence, because we are not measuring the right things.
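The first of those questions, whether load spikes when feedback arrives, maps onto a standard analysis: event-locked averaging of a continuous signal around feedback onset. The sketch below assumes a pupil-diameter trace sampled at 60 Hz and a list of feedback timestamps; the synthetic data, window lengths, and sampling rate are all placeholders, not a claim about any real study.

```python
import numpy as np

def event_locked_average(signal, event_samples, fs=60, pre_s=1.0, post_s=3.0):
    """Average a continuous load signal (e.g. pupil diameter at fs Hz)
    around feedback-onset events. Each epoch is baseline-corrected by
    subtracting its mean pre-event value, so the result shows the
    deflection attributable to feedback arrival, not slow drift."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    epochs = []
    for ev in event_samples:
        if ev - pre < 0 or ev + post > len(signal):
            continue  # skip events too close to the recording edges
        epoch = signal[ev - pre : ev + post].astype(float)
        epoch -= epoch[:pre].mean()  # baseline correction
        epochs.append(epoch)
    return np.mean(epochs, axis=0)  # samples from -pre_s to +post_s

# Invented demo: synthetic noise with a bump ~0.5 s after each feedback event
fs = 60
signal = np.random.default_rng(0).normal(0, 0.05, 60 * fs)
events = [10 * fs, 25 * fs, 40 * fs]
for ev in events:
    signal[ev + fs // 2 : ev + 2 * fs] += 0.3  # simulated load spike
curve = event_locked_average(signal, events, fs=fs)
print(f"peak deflection: {curve.max():.2f}, at {curve.argmax()/fs - 1.0:.2f} s after feedback")
```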
The Productive Failure Connection
One of the most interesting findings in recent learning science literature involves what researchers call productive failure — the phenomenon where students who struggle with problems before receiving instruction sometimes show better learning outcomes than students who receive instruction first. The mechanism appears to involve the cognitive effort of the struggle itself: attempting to solve a problem without adequate resources activates relevant prior knowledge, surfaces misconceptions, and primes the learner to extract meaning from subsequent instruction more efficiently.
AI feedback systems, almost by design, tend to minimize struggle. Immediate feedback interrupts the productive difficulty of working through confusion. Hints and scaffolding reduce the cognitive demand of problem-solving. Adaptive difficulty adjustment ensures students are rarely stuck for long. If cognitive load during struggle is actually germane load — the kind that builds understanding — then AI systems optimized to reduce all load may be inadvertently reducing the cognitive work that matters most.
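If this concern holds, one design response would be a feedback gate that withholds hints until the learner has genuinely struggled. The sketch below is a hypothetical policy, not a validated design: the time and attempt thresholds are invented, and whether two minutes of struggle is germane load or mere frustration is precisely the empirical question only direct measurement can settle.

```python
import time
from dataclasses import dataclass, field

@dataclass
class StruggleGate:
    """Hypothetical policy: release a hint only after the learner has
    spent a minimum time on the problem AND made a minimum number of
    genuine attempts. The intent is to preserve the effortful phase
    that productive-failure research suggests does the learning work."""
    min_seconds: float = 120.0  # illustrative threshold, not calibrated
    min_attempts: int = 2
    started_at: float = field(default_factory=time.monotonic)
    attempts: int = 0

    def record_attempt(self):
        self.attempts += 1

    def hint_allowed(self) -> bool:
        elapsed = time.monotonic() - self.started_at
        return elapsed >= self.min_seconds and self.attempts >= self.min_attempts

gate = StruggleGate()
gate.record_attempt()
print(gate.hint_allowed())  # False: one quick attempt is not yet struggle
```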
This is not an argument against AI feedback. It is an argument that the relationship between cognitive load and learning in AI-enhanced environments is more complex than the dominant narrative suggests, and that we need direct measurement to understand it rather than theoretical inference from incomplete data.
What This Means for How We Build and Evaluate AI Systems
If institutions are making procurement decisions about AI feedback systems based on the assumption that these systems reduce cognitive load and therefore improve learning, they are building on a foundation of theoretical inference rather than empirical evidence. That does not mean the inference is wrong. It means we do not actually know whether it is right — and for educational technology making claims about how human minds learn, that uncertainty should matter.
The practical implication is that AI feedback systems should be developed and evaluated with cognitive processes, not just learning outcomes, as primary variables. This requires partnerships between learning scientists who understand measurement methodology, educational technologists who understand system design, and practitioners who understand classroom reality. It requires instruments that capture what is happening during learning, not just what happened afterward. And it requires a research culture willing to complicate the success narrative when the mechanism does not match the theory.
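As a small illustration of what treating cognitive processes as primary variables could mean in an evaluation pipeline, consider a trial record that stores in-process signals alongside the outcome. The fields here are hypothetical, meant only to show the shape of the data such an evaluation would need, not a proposed standard.

```python
from dataclasses import dataclass

@dataclass
class TrialRecord:
    """Hypothetical evaluation record pairing the usual outcome measure
    with in-process signals, so analyses can test the mechanism (load
    during learning) rather than infer it from the score afterward."""
    learner_id: str
    item_id: str
    correct: bool                  # the outcome the field already reports
    response_time_ms: int          # in-process: processing-demand proxy
    fixation_ms_on_feedback: int   # in-process: eye-tracking dwell time
    pupil_z_at_feedback: float     # in-process: event-locked arousal
    attempts_before_hint: int      # in-process: amount of struggle allowed
```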
Until we measure what we claim to know, we are designing AI learning systems by inference and hoping the theory holds. Sometimes it will. The question is whether that is good enough for the students whose learning depends on getting it right.
What would it change about how AI feedback tools are selected and evaluated at your institution if cognitive load during learning — not just post-learning performance — were a required evidence standard?
Priscillar McMillan is a doctoral research assistant in Information Technology in Education at the University of Nevada, Reno, and founder of Kowa Agency. She writes weekly on AI, learning systems, and the institutional decisions that will shape education for the next generation.