Your Students Can Do It With AI. The Question Is Whether They Can Do It Without.
Category: AI & Learning Outcomes

The Problem Nobody Is Measuring
There is a version of the AI-in-education story that looks like success from every angle. Students complete assignments. Grades improve. Engagement metrics trend upward. Institutions publish case studies. And then, six months later, those same students sit in the next course, the one that assumes they retained what they learned, and something has quietly gone missing.
This is the transfer problem, and it is the most important thing the AI-in-education debate is not talking about. We have invested enormous energy in the question of whether AI helps students perform better in the short term. We have invested almost nothing in the question of whether that performance represents learning that actually moves — that carries forward into new contexts, new problems, new disciplines.
I study AI adoption in higher education at UNR, and I teach BUS 101. In both roles I keep running into the same gap: the research on AI feedback and learning outcomes almost universally measures retention on problems similar to the ones practiced, within weeks of instruction. That is a narrow definition of learning, and it is doing quiet damage to how institutions interpret their own AI pilots.
What Transfer Actually Means
Transfer, in the cognitive science literature, refers to the ability to apply knowledge or skills acquired in one context to a meaningfully different context. Near transfer involves applying learning to situations closely resembling the original. Far transfer involves applying learning to situations that look substantially different — different problems, different disciplines, different professional settings.
The distinction matters because most AI feedback systems are optimized for near-transfer performance. An intelligent tutoring system in introductory statistics helps students solve the specific problem types the system covers. It gives immediate feedback on those problems, and students get better at those problems. What it does not reliably do — and what the research largely does not measure — is whether students can take statistical reasoning into a business case, a policy brief, or a research design they have never seen before.
The Wharton AI tutoring study that I wrote about in an earlier post is the clearest public evidence of this gap. Students using AI performed 48% better on assisted practice and 17% worse on an unassisted test. And that test was still closely related to the practice material. The transfer gap, measured across genuinely novel contexts, would almost certainly be wider — and we do not have the data to know by how much, because researchers rarely follow students far enough forward to find out.
Why Institutions Are Not Asking This Question
Part of the answer is structural. Measuring far transfer requires following students across time, across courses, and sometimes across institutions. That is expensive, logistically complicated, and difficult to publish in clean experimental designs. Measuring whether a student who used an AI writing tutor in freshman composition can construct a persuasive argument in a senior capstone is genuinely hard research. Measuring whether they scored higher on the next essay in the same course is easy research. The field has optimized for what is easy to measure.
Part of the answer is also incentive-driven. EdTech vendors selling AI feedback platforms have no commercial interest in funding studies that might reveal their products produce performance gains that do not transfer. Institutional administrators under pressure to demonstrate AI ROI have no political interest in commissioning research that complicates the success narrative. The people with the resources to fund transfer research are often the people who benefit most from its absence.
And part of the answer is conceptual. There is a quiet assumption embedded in much AI-in-education discourse that learning is essentially the accumulation of correct responses — that if a student has answered enough questions correctly with the help of AI feedback, something durable has been deposited in their understanding. Cognitive science has known for decades that this is not how human memory works. Correct performance and durable understanding are related but distinct, and the conditions that produce one do not automatically produce the other.
What the Evidence Does Say
The literature on retrieval practice offers the clearest evidence-based path forward. When students are required to actively retrieve information from memory — rather than receiving it through re-reading or AI-generated summaries — retention and transfer both improve substantially. The mechanism is not mysterious: the act of retrieval strengthens memory traces in ways that passive reception does not. AI systems that incorporate retrieval practice into their feedback loops, requiring students to attempt recall before receiving assistance, show more durable learning than systems that provide assistance on demand.
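To make the mechanism concrete, here is a minimal sketch in Python of what retrieval-gated assistance can look like. This illustrates the design pattern, not any vendor's actual system; PracticeItem, submit, and MIN_RETRIEVAL_ATTEMPTS are hypothetical names, and the one-attempt threshold is an assumption chosen for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Attempt:
    answer: str
    correct: bool

@dataclass
class PracticeItem:
    prompt: str
    solution: str
    attempts: list[Attempt] = field(default_factory=list)

MIN_RETRIEVAL_ATTEMPTS = 1  # assumption: one honest recall attempt before any hint

def submit(item: PracticeItem, answer: str) -> str:
    """Record a retrieval attempt, then decide how much feedback to release."""
    correct = answer.strip().lower() == item.solution.strip().lower()
    item.attempts.append(Attempt(answer, correct))
    if correct:
        return "Correct. Now explain your reasoning in one sentence."
    if len(item.attempts) <= MIN_RETRIEVAL_ATTEMPTS:
        # Withhold assistance: the failed retrieval itself strengthens memory.
        return "Not quite. Try recalling it once more before asking for a hint."
    # Assistance is released only after the retrieval requirement is met.
    return "Hint unlocked: revisit the definition before re-answering."
```

The design choice that matters is that the amount of help is a function of the student's attempt history, not just the current answer: assistance arrives after retrieval, never instead of it.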
Spaced practice — distributing learning across time rather than concentrating it immediately before assessment — similarly improves both retention and transfer. AI systems that introduce deliberate spacing into their adaptive sequencing, revisiting earlier concepts after intervals rather than moving linearly through the curriculum, show stronger long-term outcomes. The irony is that neither retrieval practice nor spacing is technologically difficult to implement in AI systems. They are pedagogically demanding because they require institutions to build curricula around learning science rather than administrative convenience.
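To show the contrast with linear sequencing, a spacing policy can be sketched in a few lines. This is a simplified expanding-interval scheduler in the spirit of the Leitner system, not a production algorithm; Concept, review, and todays_queue are illustrative names, and doubling the interval after each success is an assumption made for clarity.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Concept:
    name: str
    interval_days: int = 1                         # current spacing gap
    due: date = field(default_factory=date.today)  # next scheduled review

def review(concept: Concept, recalled: bool, today: date) -> None:
    """Update the schedule after a retrieval attempt."""
    if recalled:
        concept.interval_days *= 2   # expand the gap after each success
    else:
        concept.interval_days = 1    # a lapse brings the concept back soon
    concept.due = today + timedelta(days=concept.interval_days)

def todays_queue(concepts: list[Concept], today: date) -> list[Concept]:
    """Revisit earlier concepts when they fall due, rather than moving linearly."""
    return [c for c in concepts if c.due <= today]
```

The point of the sketch is the queue: the system decides what to resurface by due date, so earlier concepts return on a schedule that a linear curriculum would never impose.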
The deeper implication is this: AI feedback that is designed around performance optimization will produce performance. AI feedback that is designed around transfer will produce understanding. These are different design goals, and achieving the second requires institutions to be explicit about which one they are actually pursuing.
The Accountability Question
I want to end on something that I think the research has not yet reckoned with honestly. If students are leaving courses with AI-assisted performance gains that do not transfer to subsequent learning, the cost of that gap does not show up immediately. It shows up later, in the next course, in the internship, in the first year of a job, in the graduate program that expected preparation that students technically completed but did not genuinely acquire.
By then, the AI feedback system that produced the gap has been credited with the performance improvement and evaluated against that credit. The transfer failure is invisible to the data that institutions track, because institutions do not track it. This is how a technology can simultaneously improve short-term outcomes and degrade long-term educational quality — and why the absence of transfer research is not a neutral gap. It is a gap that systematically favors certain conclusions over others.
What would it change about how your institution evaluates its AI investments if transfer — not just completion and grades — were a required outcome metric? And who in your organization has the standing to make that demand?
Priscillar McMillan is a doctoral research assistant in Information Technology in Education at the University of Nevada, Reno, and founder of Kowa Agency. She writes weekly on AI, learning systems, and the institutional decisions that will shape education for the next generation.