The Wharton AI Study Every Educator Should Read — and What It Actually Means

By Priscillar Banda


Category: AI & Learning Outcomes   |   Reading time: 6 min
The Number That Should Be Keeping Provosts Up at Night

Imagine you ran a study where students using AI scored 48% better on practice problems — and then you took the AI away and watched their test scores drop by 17% compared to the control group. That is not a hypothetical. That is what researchers found when they studied the effect of an AI tutor on student learning at a business school — and the education world has spent far too little time sitting with what it actually means.

As faculty at the University of Nevada, Reno, I spend my doctoral research hours studying exactly this question: what happens to learning when AI enters the room? That Wharton number, the 48% gain that became a 17% loss, is the one I keep returning to, because it tells you everything about where the conversation is going wrong.

What the Study Actually Found

The research, conducted at Wharton, introduced an AI tutor into a business problem-solving curriculum. Students with access to the AI performed dramatically better during the assisted practice phase. On the surface, this looks like a win. AI advocates point to that 48% and call it transformation. And they are not wrong that something significant happened.

But when those same students sat down for an unassisted test — no AI, just the knowledge they had actually retained — they performed worse than the students who had never used the AI tutor at all. The gap was not marginal. Seventeen percent worse is a meaningful, measurable erosion of learning.

A complementary study out of Budapest deepened the finding. Students who used AI chatbots to study for exams performed significantly worse on follow-up assessments than students who had studied without AI assistance. The mechanism appears to be what cognitive scientists call cognitive offloading: when the brain consistently delegates a task to an external tool, it stops building the neural infrastructure to perform that task independently.

The Misread That Is Shaping Institutional Policy

Here is where I want to push back on the dominant interpretation, because I think institutions are about to make a category error that will cost students real learning.

The common read of studies like this is that AI is a crutch, therefore dangerous, therefore restrict it. That is the wrong conclusion. The problem is not AI. The problem is that we inserted AI into a learning environment that was never designed to account for it — and then we measured performance using assessments designed for a world before it existed.

Think about it this way. When calculators became available, the education response was not to ban them. It was to redesign what mathematics education was actually for. We stopped testing whether students could perform long division by hand and started asking whether they could reason about numerical relationships, interpret results, and know when an answer was nonsense. The assessment changed because what we valued changed.

AI demands the same recalibration, at a scale and speed that no institution is currently matching.

What This Means for How We Design Learning

The Wharton finding is not an argument against AI in education. It is an argument that AI-assisted practice and AI-free assessment, sitting side by side in an unredesigned curriculum, will produce exactly this result: students who can perform brilliantly with the tool and struggle without it.

Three implications follow directly from this, and they are aimed at administrators and curriculum designers more than individual teachers.

First, assessment design has to change before AI deployment, not after. If an institution is piloting AI tutoring tools while still measuring learning through closed-book, AI-free exams, it has created a structural contradiction. Students are being trained in one mode and tested in another. That is not a student failure — it is a design failure.

Second, the practice-to-retention pipeline needs deliberate interruption. The research on cognitive offloading suggests that learners need regular, structured moments of unassisted retrieval, not as punishment but as the actual mechanism by which knowledge transfers from assisted performance into durable understanding; a sketch of what that interleaving might look like follows this list. AI-enhanced curricula that skip this step are not accelerating learning; they are accelerating forgetting.

Third, the 48% number is not the story; the 17% is. Institutions under pressure to demonstrate AI ROI will be tempted to lead with the practice gains and bury the retention loss. Researchers, educators, and accreditation bodies need to insist on the full picture. Assisted performance and genuine learning are not the same thing, and conflating them is how an edtech investment becomes a credential inflation machine.
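To make the second implication concrete, here is a minimal sketch, in Python, of one way a curriculum sequencer might interleave AI-assisted practice with unassisted retrieval checkpoints. It is purely illustrative: the names and structure (PracticeBlock, build_schedule, the retrieval_every parameter) are my own invention, not drawn from the Wharton study or from any particular platform.

```python
# Illustrative sketch only: a hypothetical curriculum sequencer that
# interleaves AI-assisted practice with unassisted retrieval checkpoints.
# Nothing here comes from the Wharton study; all names are invented.

from dataclasses import dataclass

@dataclass
class PracticeBlock:
    topic: str
    assisted: bool  # True = AI tutor available; False = unassisted retrieval

def build_schedule(topics, retrieval_every=2):
    """Return a schedule where every `retrieval_every` assisted blocks are
    followed by unassisted retrieval of the same topics -- the step the
    offloading research suggests turns assisted performance into retention."""
    schedule = []
    pending = []  # topics practiced since the last retrieval checkpoint
    for topic in topics:
        schedule.append(PracticeBlock(topic, assisted=True))
        pending.append(topic)
        if len(pending) == retrieval_every:
            schedule.extend(PracticeBlock(t, assisted=False) for t in pending)
            pending = []
    # Close out any topics that have not yet been retrieved unassisted.
    schedule.extend(PracticeBlock(t, assisted=False) for t in pending)
    return schedule

if __name__ == "__main__":
    for block in build_schedule(["ratios", "margins", "breakeven"]):
        mode = "AI-assisted practice" if block.assisted else "unassisted retrieval"
        print(f"{block.topic}: {mode}")
```

The design choice worth noticing is that the unassisted blocks are scheduled, not optional: retrieval is built into the sequence itself rather than deferred to whatever the final exam eventually demands.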

The Question Underneath the Question

I want to end with something that the Wharton study surfaces but does not resolve, because I think it is the real animating question for everyone working in education right now.

What is education actually for?

If the answer is performance on proximate tasks — getting students through the next exam, the next module, the next course — then AI tutors are a reasonable tool and the 48% gain is the headline. If the answer is durable human capacity — the ability to reason, to retrieve, to think under pressure without a scaffold — then the 17% loss is the most important number in the room.

Those two answers demand entirely different institutional architectures. The tragedy is that most institutions have not yet decided which answer they are building toward. They are deploying AI into a purpose vacuum, and hoping the outcomes will sort themselves out.

They will not.

What would it take for your institution to answer that question before the next AI pilot goes live — and who in your organization has both the authority and the evidence to lead that conversation?

Priscillar McMillan is a doctoral research assistant in Information Technology in Education at the University of Nevada, Reno, and founder of Kowa Agency. She writes weekly on AI, learning systems, and the institutional decisions that will shape education for the next generation.