The art of cheating in exams has come a long way since the days of scribbling a few notes on your wrist.
In fact, a new study suggests AI chatbots are making cheating more efficient than ever.
Even experienced examiners now struggle to spot the difference between AI and real human students, researchers have found.
The experts from the University of Reading secretly added responses entirely generated by ChatGPT to a real undergraduate psychology exam.
And, despite using AI in the simplest and most obvious manner, unsuspecting markers failed to spot the AI responses in 94 per cent of cases.
One of these essay samples was written by a real human, while the other was generated by MailOnline in ChatGPT using the researchers’ methodology. Can you tell which is which? (Answers in box below)
Researchers have found that even experienced examiners fail to tell the difference between real human responses and AI in actual exams (stock image)
Even more worryingly, the AI actually outperformed human students on average – achieving high 2:1 and 1st-level grades.
The rapid advancement of text-generating AIs such as ChatGPT has created a serious risk that AI-powered cheating could undermine the examination process.
To see just how bad this problem could be, Associate Professor Peter Scarfe and Professor Etienne Roesch attempted to ‘infiltrate’ a real examination with AI.
The researchers created 33 fake student profiles which they registered to sit at-home online exams in various undergraduate psychology modules.
Using ChatGPT-4, the researchers created completely artificial answers to both 200-word short-answer questions and full 1,500-word essays.
These answers were then submitted alongside responses from real students on the School of Psychology and Clinical Language Sciences exam system.
To show how difficult it can be to tell the difference, MailOnline has generated our own example essays.
For data privacy reasons, the researchers weren’t able to share any of the exam answers with us, but we have used their exact AI prompt to answer an example undergraduate psychology question using ChatGPT.
One of the pictured samples is generated by AI while the other is a human example taken from a University of South Australia essay writing guide.
So, can you tell which is which? The answers are in a factbox below.
Researchers secretly added AI-generated responses from 33 fake students to a real undergraduate psychology exam to see if any would be detected (stock image)
None of the markers were aware that any experiment was taking place and there was nothing to indicate that the AI papers were any different.
Of the 63 AI-generated papers submitted, only 6 per cent were flagged by examiners as potentially suspicious, while the remaining 94 per cent went completely unnoticed.
The AI achieved higher average grades than real students, in some modules outperforming their human classmates by a full grade boundary.
In 83 per cent of cases, the AI’s grades beat those of a randomly selected group of real students.
In other words, a random group of students who had actually studied and sat the paper themselves would have outscored the AI only around 16 per cent of the time.
Of the 63 AI-generated papers submitted, 94 per cent went completely unnoticed by the human examiners
Using ChatGPT-4, the researchers simply asked the AI to respond to the exam question and submitted the unedited text. This is the simplest and most obvious way that any student could use AI (stock image)
In fact, as the researchers point out, there is a very real chance that some of the genuine students in this very online exam also used AI to cheat and pass.
Since the pandemic, many universities have been moving away from traditional examinations towards an online take-home exam model.
Lead researcher Dr Scarfe says: ‘Many institutions have moved away from traditional exams to make assessment more inclusive.’
The advantage is that these exams generally test more than the ability to cram information and are more accessible to those with mental or physical health issues.
However, this shift has coincided with the rise of ‘generative’ AI, which lets users create reams of text from nothing more than a simple prompt.
Because students sit these exams at home, away from the eye of an invigilator, using AI to cheat is far easier.
And while AI detectors do exist, these have proven to be extremely unreliable in real-life situations.
For example, a detector created by Turnitin, a program for managing students’ work, was found to be less than 20 per cent accurate when used on actual students.
Even with a very simple use of ChatGPT, the AI papers (blue) outperformed their human counterparts (orange) on almost every paper. In one module, P1-M2, the AI did better by an entire grade boundary
The researchers say this could spell the end of traditional exams as we know them, with universities forced to adapt.
Dr Scarfe says: ‘We won’t necessarily go back fully to hand-written exams, but the global education sector will need to evolve in the face of AI.’
In their paper, the researchers suggest that exams may even need to start permitting the use of AI in order to avoid becoming outdated.
Since AI is almost impossible to detect, and using it looks increasingly likely to become a necessary skill, the researchers argue that exams should not fight this new technology, much as calculators have gradually become accepted in exams.
The researchers write: ‘A “new normal” integrating AI appears inevitable. An “authentic form of assessment” will be one in which AI is used.’
Professor McCrum adds: ‘Solutions include moving away from outmoded ideas of assessment and towards those that are more aligned with the skills that students will need in the workplace, including making use of AI.’