
NYU professor shares unexpected results after using AI-powered exams to test students: 'How learning is supposed to work'

"Fight fire with fire."


Accessible artificial intelligence tools have made once-daunting assignments as simple as crafting the correct prompt, but, as Business Insider recently reported, a clever professor constructed an elaborate means to detect AI — with AI.

The advent of artificial intelligence has rocked several areas of modern life, with education and academia among the most affected.

At the same time, AI's creeping presence has proved disruptive to the job market, particularly in creative fields, and has fueled a separate problem: a data center boom. AI is resource-hungry, and as the technology has grown, data centers have been erected to power it.

Data centers strain public water resources and emit pollution, but by far their biggest and most direct impact has been on energy costs. As data centers have multiplied, utilities have passed the added expense on to ratepayers, sending electric bills skyrocketing and further taxing an overworked grid.

Moreover, at nearly all levels of schooling, educators have increasingly feared not only that they cannot reliably detect AI use in assignments, but also that the technology's prevalence is undermining the learning process, enabling students to feign understanding convincingly.

New York University professor Panos Ipeirotis of the Stern School of Business observed that his students' submissions were pristine, but their understanding of the curriculum didn't always match.

On Dec. 29, Ipeirotis published a blog post with a title neatly summarizing the issue he faced: "Fighting Fire with Fire: Scalable Personalized Oral Exams with an ElevenLabs Voice AI Agent."

Ipeirotis had noticed that his students' assignments were neat, comprehensive, and extraordinarily well-edited — a red flag to any seasoned educator. Suspicious, he began putting students on the spot in class.

"Many students who had submitted thoughtful, well-structured work could not explain basic choices in their own submission after two follow-up questions. Some could not participate at all," Ipeirotis wrote, underscoring AI's potential to undermine actual learning.

He considered oral exams, which he noted were a "logistical nightmare," one he likened to a "month-long hostage situation … unless you cheat."


Inspiration struck, and Ipeirotis realized that the once-unworkable challenges presented by oral exams were now surmountable — with the help of AI. He assembled a "council" of large language models, comprising Anthropic's Claude, OpenAI's ChatGPT, and Google's Gemini.

Each of the three assisted with testing and grading, and Ipeirotis conceded that the method assessed students more stringently while revealing the coursework's "own teaching gaps." 

Ipeirotis noted that an LLM's ability to generate novel questions was particularly useful for testing students.

"That is ... actually how learning is supposed to work. Fight fire with fire," he concluded.

