Many teachers are unhappy with the AI revolution, and it's hard to blame them: ChatGPT has proven that you can hand the AI a prompt for, say, a high school essay, and get results back in seconds. Sure, that essay may be riddled with errors, but hey, the homework is done. So when AI checkers promote themselves as a new line of defense against AI cheating, it's no surprise that affected teachers start using them. The problem is, they're not perfect, and those imperfections are hurting people.
How Artificial Intelligence Detectors Work
All of the AI programs that are popular today (e.g. ChatGPT) are based on Large Language Models (LLMs). LLMs are trained on large amounts of text and draw on that training to respond to you. In short, all the AI does is predict which word should come next, based on the patterns it learned during training. It doesn't know what "snow is cold and fire is hot" means, but it knows that "hot" often follows "fire is". (Again, this is a super simple explanation.)
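The prediction idea above can be sketched with a toy bigram model: count, in a tiny "training corpus", which word most often follows which. This is a drastically simplified stand-in of my own for a real LLM, which uses a neural network trained on billions of tokens, not a lookup table:

```python
from collections import Counter, defaultdict

# Toy sketch of next-word prediction: count which word follows which in a
# tiny "training corpus", then predict the most common successor. A real
# LLM is far more sophisticated, but the core idea (predict the next
# token from learned statistics) is the same.
corpus = "fire is hot . snow is cold . the stove is hot . ice is cold ."
words = corpus.split()

successors = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    successors[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    if word not in successors:
        return None
    return successors[word].most_common(1)[0][0]

print(predict_next("fire"))  # "is"
```

The model has no idea what fire is; it has simply seen "is" follow "fire" before, which is the point of the paragraph above.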
AI detectors work in a similar way: they are trained on text datasets, some written by humans, some by bots, and some by both, and they look for telltale signs of LLM-generated text, such as repetitive language, the predictability of word choices, and tone. Just as developers train an AI to recognize patterns in language, they train AI detectors to recognize whether those patterns suggest the text was written by a bot.
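To make that concrete, here is a minimal sketch of the kinds of surface signals a detector might compute. Both metrics here (sentence-length variance as a rough proxy for how "even" the writing is, and a repeated-word ratio) are illustrative assumptions on my part; commercial tools like Turnitin and GPTZero use trained models and do not publish their exact features:

```python
import statistics

def burstiness(text):
    """Variance of sentence lengths; human writing tends to vary more."""
    cleaned = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in cleaned.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pvariance(lengths) if len(lengths) > 1 else 0.0

def repetition_ratio(text):
    """Share of words that are repeats; flat, even text scores higher."""
    words = text.lower().split()
    return 1 - len(set(words)) / len(words) if words else 0.0

flat = "Ice cream is good. Ice cream is cold. Ice cream is sweet."
varied = "I love ice cream! Especially on a hot July afternoon, nothing beats a scoop of mint chip."

print(burstiness(flat), burstiness(varied))  # 0.0 vs. a much larger variance
print(repetition_ratio(flat) > repetition_ratio(varied))  # True
```

A detector scoring on signals like these would call the flat text "more AI-like", which also hints at the failure mode to come: plenty of humans write flat, even prose on purpose.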
These developers claim that their checkers can do this. But I’m not so sure. In fact, I asked ChatGPT if it could identify whether a given text was written by an AI or a human. It responded:
As an AI language model, I can generate text that may be difficult to distinguish from human-written text. However, it is not always possible to determine definitively whether a text was generated by an AI or written by a human, especially if the text has been well-written and edited.
There are some indicators of whether a text was generated by AI, such as unusual or repetitive patterns, lack of coherence, or unusual word choices. However, these indicators are not always present and can be overcome by advanced AI techniques.
It is also worth noting that the quality of AI-generated text is constantly improving, so distinguishing between human and AI-generated text may become increasingly difficult in the future. Ultimately, the best way to determine whether a text is generated by AI is to have it carefully analyzed by a trained expert in the field.
This is not exactly a ringing endorsement from the most famous AI bot itself. But let's take a look at how these detectors hold up in practice:
How Artificial Intelligence Detectors Perform in the Field
Turnitin, whose plagiarism checker I remember stoking fears back in school, claims that its software can now detect AI-generated content with 98% confidence. That's certainly high if true, and when BestColleges tested the software, it was able to distinguish between human-written, AI-generated, and hybrid writing. But even at 98%, roughly one in 50 students could have a teacher wrongly accuse them of cheating with AI tools.
The Washington Post's experience with Turnitin, however, was worse than that. Students helped the Post put together 16 writing samples consisting of human-written, AI-generated, and hybrid text. In the end, Turnitin got more than half of the samples at least partially wrong. It tagged six correctly, but dropped the ball on three others entirely. If this had been a real class, Turnitin would have made a real mess.
GPTZero was one of the first AI detectors to go viral, and it failed my accuracy test with very little experimentation. I tested it by writing a paragraph about ice cream in a neutral tone. It told me: "Your text is likely to be written entirely by AI."
Ice cream is a popular frozen dessert around the world. It is mainly eaten in warm or hot weather, such as in the summer, but can be enjoyed all year round. Ice cream comes in a variety of flavors and is often served with toppings such as candy, nuts, fruit or syrup.
But my bland paragraph is just the beginning. Another detector called ZeroGPT was stumped when a Redditor decided to run the U.S. Constitution through it. According to ZeroGPT, the Constitution is 92.26% written by AI. Who knew that the Philadelphia Convention relied so heavily on AI in drafting the law of the land? (I guess that explains some of the amendments, anyway.)
However, one way to fool a detector is to pass AI-generated text through another AI. For example, QuillBot can rewrite text for you, and students have used it to evade checkers. If an AI detector looks for the "evenness" of a given text to determine whether it was written by an AI, then having another AI add more variety to the text will throw the system off. QuillBot comes up constantly in the comments on TikToks discussing AI detectors in schools. Kids will find a way.
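Why rewording works is easy to see with a toy example: if a detector keys on repeated phrasing, swapping in synonyms lowers the repetition signal. The synonym table and rewriter below are invented for this illustration; QuillBot's actual paraphraser uses neural models, not a lookup table:

```python
import random

# Hypothetical synonym table, invented purely for this illustration.
SYNONYMS = {
    "good": ["tasty", "delightful", "pleasant"],
    "cold": ["chilly", "icy", "frosty"],
}

def paraphrase(text, rng=None):
    """Swap known words for random synonyms; a crude stand-in for an AI rewriter."""
    rng = rng or random.Random(0)
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w
                    for w in text.split())

def repeated_word_share(text):
    """The 'evenness' signal: how much of the text is repeated words."""
    words = text.lower().split()
    return 1 - len(set(words)) / len(words)

original = "ice cream is good and ice cream is cold and ice cream is good"
reworded = paraphrase(original)

# Rewording never raises this repetition signal, and usually lowers it.
print(repeated_word_share(reworded) <= repeated_word_share(original))  # True
```

If the detector's signal drops below its threshold after one pass through a rewriter, the "AI-generated" flag disappears, which is exactly the loophole students are sharing.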
Artificial Intelligence Detectors Are Hurting Innocent Students
So far, these examples are all hypotheticals. But these checkers aren't prototypes. They're here, and they're being used against real students. I'm sure these detectors have caught many students who cheated with tools like ChatGPT, but they also wrongly accuse innocent students of doing the same, and that causes real damage:
This tweet, at least, had a "good" ending, as the instructors admitted their mistake and retracted their accusations. But in other cases, teachers treat the AI checker as gospel and refuse to discuss any "AI-generated" result:
I don't deny that we are facing a new world. Large language models mean that students can plug in a paper prompt and receive a full paper in return (of varying quality). But perhaps the fact that students can so easily fool the system suggests that the system needs to change, rather than relying on a Band-Aid solution that punishes innocent students as easily as guilty ones.
Turnitin calls these moments "false positives": cases in which Turnitin flags human-written text as AI-generated. Lovely. But the company emphasizes that it "did not make a finding of misconduct"; rather, it provided the data and left the final decision to educators. Many people seem to have forgotten this disclaimer: in their eyes, if the AI checker says you're a cheater, you're a cheater.