Turnitin Terrors!

As an editor for academics, postgraduates, and nonfiction writers, I was given the opportunity in the last couple of months to see just how unreliable and terrifying Turnitin is. For those who don’t know, Turnitin is a standard plagiarism detector that many academic institutions use to measure the extent to which a writer has presented a published author’s work as their own.

When I majored in Anthropology in the early 80s, the threshold for a flagged match was more than four words in a row; now it is five. Some institutions set the acceptable similarity score at 10%, others at 12%. These are arbitrary measures.

Witness This

The candidate made her first submission to Turnitin. The similarity score was 9%, acceptable for that institution, but swathes of text were flagged in a meticulously constructed 200-page literature review of already-developed models and frameworks. As her editor, I supported her decision to rework some of those passages, and she did. To our surprise, the score rose by four percentage points, beyond what the institution would accept.

Intrigued, I compared the two reports. In the second report, pages and pages that had not been flagged before were now flagged. Most intriguing, the opening sentence was flagged in the first report but not in the second. In the first report, literally nothing in the methodology chapter was flagged; in the second, passages were flagged, especially the section on sampling. There are only so many ways one can explain the difference between probability and non-probability sampling before one runs out of options. More intriguing still, the research objectives and questions were flagged, as were some verbatim interviews with participants and arbitrary phrases like “in Table 5.3.”

I wondered, “Has someone published her work in the month we have been working on it?” With the third report, at the eleventh hour, we managed to get the similarity score down to 10%, the very edge of what that institution would accept.

The arbitrariness of the measure (five words) and the percentage permitted is just part of the problem.

How many ways can a standard research claim be made?

First, it is easy to write five words in a row in the same order as a published author because of how English is structured. Consider, for example, research report statements like, “A qualitative phenomenological approach was used in this study.” I suppose one can replace “used” with “applied” or “employed,” or lead with the introductory phrase “in this study,” but I know those variants have already appeared in numerous research reports because I have edited literally hundreds of them over the last 20 years, and there are not hundreds of ways to disclose the approach employed.

Second, due to the “publish or perish” mentality in academic contexts, so much has been published about so many topics in so many fields that every plausible way to phrase the same ideas and link them has been used up. There is no new way of conveying your ideas and findings without being flagged by Turnitin, especially in a literature review. A writer setting out to survey a much-written-about field, like stress or leadership, is at a disadvantage because many more published authors have already tried to find different ways of saying the same thing. Moreover, in academic research, one is working with constructs and concepts that have been woven into models, frameworks, and theories. How many ways can the five components of a model be listed without presenting them in the same order as an already-published author?

What about Voice and Cadence?

So what makes a formal research report original rather than stolen? I would claim it is the cadence of the voice and the consistency of that cadence, bearing in mind that any writer is influenced by the voices of those they read.

Cadence is a level of language that Turnitin is not tuned into. And the irony? Turnitin is itself an AI program, and generative AI is the biggest plagiariser of all: it steals the content and voice of whatever has ever been published on the Web by anyone you could imagine. How cruel of academia to terrify students with such arbitrary measures of their contribution to knowledge and truth, and crueller still to leave the judgment to a machine that has no understanding of voice or of how a voice is developed.
