The Best Tutor Is an Algorithm
What makes a tutor effective? The ability to engage, to recognize when a student is struggling, to guide them toward a solution without simply handing it over. In a word: empathy. Or more precisely, pedagogical empathy. A quality we’ve long considered uniquely human — and one that now risks being matched, or even surpassed, by an artificial tutor. This isn’t science fiction: it’s the concrete result of a European study that challenges some of our most deeply held assumptions about education.
In the working paper “Educators’ Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-Only Setting,” six authors — Sankalan Pal Chowdhury, Terry Jingchen Zhang and Mrinmaya Sachan (ETH Zurich), Dirk Hovy and Donya Rooein (Bocconi University), Tanja Käser (École Polytechnique Fédérale de Lausanne) — designed an experiment to compare the perceived effectiveness of human tutors and those based on large language models (LLMs) in a strictly blind setting. No names, no interfaces, just plain text.
“We wanted to understand how LLM tutors are perceived — not in terms of learning outcomes, but in terms of the latent traits that make educational interactions effective,” explains Dirk Hovy, Associate Professor of Natural Language Processing and Computational Social Science and Dean for Digital Transformation and Artificial Intelligence at Bocconi University. “It’s a kind of pedagogical Turing Test in reverse: we’re not asking who the human is, we’re asking who’s the better tutor.”
Teachers vs AI, judged by educators
The experiment involved 210 pairs of tutor-student dialogues, all based on elementary-level math word problems. On one side were conversations generated by human teachers interacting with a simulated student — an LLM designed to make typical mistakes. On the other, responses generated by MWPTutor, a GPT-4-based system with built-in guardrails and structure to ensure correctness and consistent student guidance.
The dialogues were evaluated by 35 annotators, all with teaching experience. They were asked to judge — without knowing whether a response was human- or AI-generated — which tutor was better according to four criteria: engagement, empathy, scaffolding (i.e. the ability to guide the student toward the solution without providing it directly) and conciseness.
The result was clear. MWPTutor outperformed the human tutor on every metric. The most striking gap was in empathy, where 80% of evaluators preferred the AI tutor. Scaffolding and conciseness followed, both with statistically significant advantages for the LLM. Only for engagement was the result more mixed, and even there humans did not come out ahead.
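Preference comparisons of this kind are typically checked with a sign test: if evaluators had no real preference, each blind judgment would be a coin flip. Here is a minimal sketch of that logic — the counts below are illustrative only (the paper reports roughly 80% preference on empathy, not these exact raw numbers):

```python
from math import comb

def binomial_sign_test(successes: int, trials: int, p: float = 0.5) -> float:
    """One-sided exact binomial test: the probability of seeing at least
    `successes` preferences for one tutor if every evaluator chose at random."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Illustrative: suppose 28 of 35 evaluators (80%) preferred the AI tutor
# on empathy. Under the null hypothesis of no preference (p = 0.5),
# such a lopsided split is very unlikely.
p_value = binomial_sign_test(28, 35)
print(f"p = {p_value:.5f}")
```

A p-value far below 0.05 is what lets the authors call the AI's advantage statistically significant rather than noise from a small evaluator pool.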
The (apparent) power of artificial empathy
The idea that an algorithm could be seen as more empathetic than a human teacher is surprising, to say the least. “The most counterintuitive result is that the AI was perceived as more empathetic,” says Hovy. “Of course, it doesn’t actually feel empathy — it simulates it. But it does so well enough to convince those evaluating it.” Conciseness was another non-trivial finding. Although the LLM’s dialogues were often longer, they were perceived as more focused and goal-oriented. The AI conveyed a clearer sense of progress, while the human-generated dialogues tended to appear fragmented or meandering. Add to this the matter of form: human texts included typos, grammatical slips and stylistic inconsistencies, while the AI was polished, fluid and coherent. That difference — even unconsciously — may have influenced perceptions of professionalism and authority.
Tired teachers, tireless AI
But there’s a deeper explanation. Showing empathy, engaging a student and adapting to their level of understanding is not just a matter of pedagogy — it demands mental effort and emotional availability. “Empathy is hard work,” Hovy adds. “A human teacher can be tired, overwhelmed or frustrated. AI can’t. It can be endlessly patient, cheerful, and empathetic, 24/7, without ever losing its cool.” There’s also contextual bias to consider. Human tutors in the study knew they were interacting with a simulated student. That awareness may have dampened their motivation or attentiveness. The AI, of course, doesn’t care — or even know the difference.
Can a machine teach?
The answer is: to some extent, yes. The study shows that a well-designed LLM can successfully perform certain types of text-based tutoring, especially tasks that are repetitive and rule-based. This doesn't mean AI is ready to replace human teachers — but it can become a powerful ally. “Our hope is that teachers will use tools like MWPTutor to offload part of the workload, not to be replaced,” Hovy concludes. “AI is good at simulating empathy, but it will never truly understand a student’s human and social context. And that, fortunately, is still our job.”

For educators, the message is clear: artificial intelligence won’t make teaching obsolete. Rather, it can free up time and energy to focus on what machines can’t do. For those designing educational systems, the point is equally sharp: if we want AI to serve education, we must build it not only to deliver correct answers, but to speak — and respond — like a good teacher.