Chatbots and human-like language: can plagiarism checkers and output detectors spot AI-written text?
A professor at a UK university often runs assignments through a plagiarism-detection program before handing them back to students, and he submitted a ChatGPT-written essay to see whether his methods could spot the fakes. May warns that we are in an arms race with automation as artificial-intelligence platforms get better at mimicking human writing. The GPT-2 Output Detector is one of the automated tools used to identify chatbot-generated content. In one preprint posted last month1, chatbot-written abstracts fooled both humans and software tools: an online plagiarism checker failed to flag any of the submitted abstracts, while the GPT-2 Output Detector and human readers each missed about one-third.
“At the moment, it’s looking a lot like the end of essays as an assignment for education,” says Lilian Edwards, who studies law, innovation and society at Newcastle University, UK. Dan Gillmor, a journalism scholar at Arizona State University in Tempe, told the newspaper The Guardian that he had fed ChatGPT a homework question that he often assigns his students, and that the article it produced in response would have earned a student a good grade.
Its open-access user interface and ability to answer questions in human-like language are among its most revolutionary qualities. The tool can not only discuss a wide range of topics using data scraped from the internet, but also perform a number of linguistic tricks, such as writing in different styles and genres, from medieval quatrains to sitcom scripts.
“The quality of writing was appalling. The wording was awkward and lacked complexity. I just logically can’t imagine a student using writing that was generated through ChatGPT for a paper or anything when the content is just plain bad.”
The survey participants were asked to share their thoughts on the potential of generative Artificial Intelligence and its use. Some predicted that the tools would have the biggest beneficial impacts on research by helping with tasks that can be boring, onerous or repetitive, such as crunching numbers or analysing large data sets; writing and debugging code; and conducting literature searches. “It’s a good tool to do the basics so you can concentrate on ‘higher thinking’ or customization of the AI-created content,” says Jessica Niewint-Gori, a researcher at INDIRE, the Italian ministry of education’s institute for educational research and innovation in Florence.
Edward Tian, a computer-science undergraduate at Princeton University in New Jersey, released an AI-detection tool in December of last year. The tool analyses text in two ways. One is ‘perplexity’, a measure of how familiar the text seems to an LLM. Tian’s tool uses an earlier model, called GPT-2; if it finds most of the words and sentences predictable, the text is likely to have been AI-generated. The other is ‘burstiness’: the variation in perplexity across a text. Writing that varies more in tone, cadence and perplexity is more likely to have come from a human.
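The two measures can be illustrated with a toy sketch. Here a hand-made unigram probability table stands in for GPT-2, and the vocabulary, floor probability and example sentences are invented for illustration; Tian's actual model and thresholds differ.

```python
import math

# Toy 'language model': word -> probability. A real detector uses a neural
# model (such as GPT-2) in place of this illustrative table.
MODEL = {
    "the": 0.20, "cat": 0.05, "sat": 0.04, "on": 0.10,
    "mat": 0.03, "quantum": 0.001, "entanglement": 0.0005,
}
FLOOR = 1e-6  # probability assigned to out-of-vocabulary words


def perplexity(sentence: str) -> float:
    """Perplexity = exp(mean negative log-probability per word).
    Low perplexity means the model finds the text predictable."""
    words = sentence.lower().split()
    nll = sum(-math.log(MODEL.get(w, FLOOR)) for w in words)
    return math.exp(nll / len(words))


def burstiness(sentences: list[str]) -> float:
    """Standard deviation of per-sentence perplexity. Human writing tends
    to vary more from sentence to sentence than LLM output does."""
    scores = [perplexity(s) for s in sentences]
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
```

In this sketch, a sentence full of words the model expects scores a low perplexity, an unusual one scores high, and a passage whose sentences swing between the two has high burstiness.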
How many people will try to use ChatGPT? The New York City Department of Education blocks access over fears for student learning
How necessary that will be depends on how many people use the chatbot. In its first week, more than one million people tried it. And although the current version is free, it might not stay that way, and some students might find the idea of paying distasteful.
She thinks that education providers will adapt. She says that there is a lot of panic around new technology. “It’s the responsibility of academics to have a healthy amount of distrust — but I don’t feel like this is an insurmountable challenge.”
The New York City Department of Education has blocked access to ChatGPT on its networks and devices over fears the AI tool will harm students’ education.
A spokesperson for the department, Jenna Lyle, told Chalkbeat New York, the education-focused news site that first reported the story, that the ban was due to potential “negative impacts on student learning, and concerns regarding the safety and accuracy of content”.
“While the tool may be able to provide quick and easy answers to questions, it does not build critical-thinking and problem-solving skills, which are essential for academic and lifelong success,” said Lyle.
However, ChatGPT also suffers from failures common to all of the most recent AI language systems (known as large language models, or LLMs). Because it’s trained on data scraped from the internet, it often repeats and amplifies prejudices like sexism and racism in its answers. The system can make up information, from historical dates to scientific laws, and present it as fact.
But such adaptations will take time, and it’s likely that other education systems will ban AI-generated writing in the near future as well. Already, some online platforms, such as the coding Q&A site Stack Overflow, have banned ChatGPT over fears that the tool will pollute the accuracy of their sites.
May is considering adding oral components to his written assignments, and expects anti-plagiarism programs such as Turnitin to incorporate AI-detection into their scans; according to a post on the company’s website, Turnitin is working to do so. Novak now assigns steps that document the writing process, such as outlines and drafts.
“Someone can have great English, but if you are not a native English speaker, there is this spark or style that you miss,” she says. With the chatbot’s help, a paper can definitely shine.
As part of the Nature poll, respondents were asked to share their thoughts on AI-based text-generation systems and how they can be used or misused. A selection of their responses follows.
‘Going back to handwritten examinations’: reflections on writing in the age of AI
“I’m concerned that students will seek the outcome of an A paper without seeing the value in the struggle that comes with creative work and reflection.”
“There were students struggling with writing before the OpenAI release. Will this platform further erode their ability to communicate using written language? [Going] ‘back to handwritten exams’ raises so many questions regarding equity, ableism, and inclusion.”
“I got my first paper yesterday. Quite obvious. Adapted the syllabus to note that oral defense of work that is suspected of not being original work of the author may be required.”
In late December of his sophomore year, Rutgers University student Kai Cobbs came to a conclusion he never thought possible: Artificial intelligence might just be dumber than humans.
Using AI to improve scientific research manuscripts: a case study from Greene and Pividori
Various publishers (including Springer Nature, which publishes Nature) have scrambled to make clear their policies on using generative-text models. These models are often viewed as methods rather than co-authors, because they cannot take responsibility for the work. But questions remain. How do we know when a model is quoting from a protected source, given that the vast amount of language used to train it will include copyrighted material?
Hipchen is not the only one in this situation. Alison Daily, chair of the Academic Integrity Program at Villanova University, is also grappling with the idea of treating an algorithm as a person, specifically when the algorithm generates text.
Daily believes that professors and students will eventually need to understand that digital tools that generate text, rather than merely collect facts, fall under the umbrella of things that can be plagiarized from.
In December, computational biologists Casey Greene and Milton Pividori embarked on an unusual experiment: they asked an assistant who was not a scientist to help them improve three of their research papers. Their aide suggested revisions in seconds, and each manuscript took about five minutes to review. In one biology manuscript, the helper even spotted a mistake in an equation. The trial didn’t always run smoothly, but the final manuscripts were easier to read, and the fees were modest: less than US$0.50 per document.
I have used artificial intelligence for science writing before. My first real use of AI chatbots (beyond asking one to write lyrics to a song called ‘Eggy Eggy Woof Woof’ for my daughter) was when I got fed up with writing one part of a grant application. I was asked to explain the world-changing ‘impact’ that my science would have, if I was lucky enough to receive funding.
After ChatGPT’s release in November of last year, the free tool became famous because it was easy and quick to use. Other generative AIs can produce images or sounds.
Some researchers think that LLMs are well suited to speeding up tasks such as writing papers or grants, as long as there is human oversight. Almira Osmanovic Thunström, a researcher at Sahlgrenska University Hospital, says that scientists will no longer write long introductions for grant applications. “They’re just going to ask systems to do that.”
But researchers emphasize that LLMs are fundamentally unreliable at answering questions, sometimes generating false responses. “We need to be wary when we use these systems to produce knowledge,” says Osmanovic Thunström.
The tools could also mislead naive users. In December, Stack Overflow temporarily banned the use of ChatGPT because site moderators found themselves flooded with incorrect but seemingly persuasive answers. This could be a nightmare for search engines.
The researcher-focused search engine Elicit gets around this by using LLMs to guide queries for relevant literature and then briefly summarize each of the websites and documents that the engine finds, producing an output of apparently referenced content.
The companies building LLMs are well aware of the problems. Last September, DeepMind published a paper on a dialogue chatbot called Sparrow, which the firm later told Time magazine it would release in a private beta this year. Anthropic, DeepMind and OpenAI declined interviews for this article, but the companies say they have solved some of the issues raised in the past.
Concerns about the reliability of the tools and the chance of misuse were enough to temper the optimism. Many respondents were worried about the potential for errors or bias in the results provided by AI. “ChatGPT once created a completely fictitious literature list for me,” says Sanas Mir-Bashiri, a molecular biologist at the Ludwig Maximilian University of Munich in Germany. “None of the publications actually existed. I think it is very misleading.”
For years, ethicists have pointed out a safety concern: that without output controls, LLMs can easily be used to generate hate speech and other harmful content, such as racist and sexist material.
OpenAI tried to skirt many of these issues when deciding to openly release ChatGPT. It installed filters intended to make the tool refuse to produce content for sensitive or toxic questions, and restricted its knowledge base to 2021. Labelling screeds of toxic text requires human moderators, and journalists have reported that these workers are poorly paid and that some have suffered trauma. Similar concerns over worker exploitation have been raised about social-media firms that have employed people to train automated bots for flagging toxic content.
BLOOM is an alternative LLM that was released last year. Its developers tried to reduce harmful outputs by training it on a smaller selection of higher-quality, multilingual text sources, and the team made its training data fully open. Researchers have urged big tech firms to responsibly follow this example, but it’s unclear whether they’ll comply.
The legal status of some LLMs, which were trained on content found on the internet with sometimes less-than-clear permission, is murky. Copyright and licensing laws currently cover direct copies of pixels, text and software, but not imitations of their style. There is a wrinkle, however, when those imitations come from programs trained by ingesting the originals. The creators of some AI art programs, including Stable Diffusion and Midjourney, are currently being sued by artists and photography agencies; OpenAI and Microsoft (along with its subsidiary tech site GitHub) are also being sued for software piracy over the creation of their AI coding assistant Copilot. The outcry might force a change in laws, says Lilian Edwards, a specialist in Internet law at Newcastle University, UK.
Setting boundaries for these tools, then, could be crucial, some researchers say. Edwards suggests that existing laws on discrimination and bias (as well as planned regulation of dangerous uses of AI) will help to keep the use of LLMs honest, transparent and fair. “There’s loads of law out there,” she says, “and it’s just a matter of applying it or tweaking it very slightly.”
A separate idea is that AI content would come with its own watermark. Last November, computer scientist Scott Aaronson announced that he and OpenAI were working on a method of watermarking ChatGPT output. It has not yet been released, but a 24 January preprint6 from a team led by computer scientist Tom Goldstein at the University of Maryland in College Park suggested one way of making a watermark. The idea is to use random-number generators at particular moments when the LLM is generating its output, to create lists of plausible alternative words that the LLM is instructed to choose from. This leaves a trace of chosen words in the final text that can be identified statistically but is not obvious to a reader. Editing could defeat this trace, but Goldstein suggests that the edits would have to change more than half the words.
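A minimal sketch of how such a ‘green-list’ watermark might work. This is an illustration, not Goldstein’s implementation: it assumes a toy vocabulary, uses a hash-seeded random generator in place of a real LLM’s sampling step, and makes the green list a hard constraint rather than the soft score boost a real scheme would use.

```python
import hashlib
import random


def green_list(prev_word: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    """Deterministically partition the vocabulary using a random generator
    seeded by the previous word, so a detector can rebuild the same
    'green list' later without access to the model."""
    seed = int(hashlib.sha256(prev_word.encode()).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    pool = sorted(vocab)
    rng.shuffle(pool)
    return set(pool[: int(len(pool) * fraction)])


def generate_watermarked(n_words: int, vocab: list[str], seed: int = 0) -> list[str]:
    """Stand-in for an LLM: at each step, choose the next word only from
    the green list determined by the word before it."""
    rng = random.Random(seed)
    words = ["<start>"]
    for _ in range(n_words):
        words.append(rng.choice(sorted(green_list(words[-1], vocab))))
    return words[1:]


def green_fraction(words: list[str], vocab: list[str]) -> float:
    """Detector: share of words that fall in their predecessor's green
    list. Roughly 0.5 for ordinary text, near 1.0 for watermarked text."""
    hits = sum(w in green_list(prev, vocab) for prev, w in zip(words, words[1:]))
    return hits / (len(words) - 1)
```

Because the partition depends only on the preceding word, detection needs no access to the model itself: ordinary text lands in the green list about half the time by chance, while watermarked text does so almost always, a difference that becomes statistically unmistakable over a few dozen words.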
Other products aim to detect AI-written content. OpenAI had already released a detector for GPT-2, and it released another detection tool in January. For academic settings, a tool being developed by Turnitin, a maker of anti-plagiarism software, could be particularly significant, because Turnitin’s products are already used by schools, universities and scholarly publishers around the world. The company says it plans to release the new software in the first half of this year.
An advantage of watermarking is that it rarely produces false positives, Aaronson points out: if the watermark is present, the text was almost certainly produced with AI. Still, it won’t be infallible, he says. “There are certainly ways to defeat just about any watermarking scheme if you are determined enough.” Detection tools and watermarking only make it harder to use AI dishonestly; they cannot make it impossible.
Generative AI tools for scientific research: a Nature survey of readers’ use of chatbots, and their concerns about false information
In the future, Eric Topol believes, cross-checking academic literature against images of the body might be a way to aid diagnoses of cancer and understanding of disease. But this would all need judicious oversight from specialists, he emphasizes.
Researchers are keen to experiment with generative AI tools such as the advanced chatbot ChatGPT to help with their work, according to a survey of Nature readers, but they are also worried about the potential for false information.
A considerable proportion of respondents, 57%, said they use ChatGPT or similar tools for “creative fun not related to research”. The most common research-related use, reported by 27% of respondents, was brainstorming research ideas. Almost 24% said they use generative AI tools for writing computer code, and around 16% each said they use the tools to help write research manuscripts, produce presentations or conduct literature reviews; 10% said they use them to help write grant applications. (These numbers are based on a subset of around 500 responses; a technical error in the poll initially prevented people from selecting more than one option.)
The goal was to give a quick initial framework that could be updated into a more detailed final version.
Generative language models help people like me who don’t know English very well. It helps me write a lot more fluently and quicker than ever before. It is like having an editor by my side while I am writing a paper.
ChatGPT and the grant-writing process: ‘What impact could vaccine research have?’
The key, many agreed, is to see AI as a tool to help with work, rather than a replacement for it. “It can be used as a tool, but it has to remain one of the tools. Its limitations and defects always have to be clearly kept in mind and governed,” says Maria Grazia Lampugnani, a retired biologist from Milan, Italy.
In my opinion, ChatGPT has the potential to revolutionize the process of writing scientific grants. Traditionally, writing a scientific grant has been a time-consuming and often frustrating process. Researchers spend countless hours crafting proposals, only to have them rejected by funding agencies. This can make it hard to progress in scientific research. ChatGPT has the potential to change all of this. By using natural language processing and machine learning, ChatGPT can help researchers write more compelling and effective grant proposals. It can also help reviewers assess grant proposals more efficiently, allowing for a more efficient and fair grant review process. Of course, ChatGPT is not a magic solution to all of the challenges facing scientific research. But it could be used to improve the grant-writing and review process.
So I asked ChatGPT: “What impact could vaccine research have?” and got 250 words of generic fluff. It suggested reducing the burden of disease, saving lives, improving global health and supporting economic development. None of what it said was in any way original or enormously surprising, but it was an excellent starting point, which I could then flesh out with specifics.
Can artificial intelligence really write a grant application? Reflections on time saved, and Nick Cave’s verdict
Our organization is committed to promoting and maintaining an equitable environment for all individuals and that’s why diversity is a core value.
This makes me wonder: if a section of a grant application can be written by an artificial intelligence, did that section ever really serve a purpose? If a computer can churn out something deeply generic that still answers the question (more or less), why are we asking people to address the question at all? I think the answer is clear: these sections never really did serve a purpose, and they certainly don’t now. For science and grant writing to be improved, two things need to change: first, the pointless sections should be eliminated; and second, the sections that remain need to be rethought.
How should we use the time given to us? In the 1970s, time freed up by the automatic washing machine was taken up with other household tasks; in 1974, the sociologist Joann Vanek argued that new devices had not changed the total amount of time spent on housework. The key question, then, is what impact time-saving devices actually have. Will we fill the time saved by AI with other low-value tasks, or will it free us to be more disruptive in our thinking and doing?
I have some unrealistically high hopes of what AI can deliver. I would like to have less of the low-engagement tasks take up my day, allowing me to do more of what I need to. I will be able to go home earlier because I will have more of the thought, writing and discussion done, rather than having to fit them all around the edges.
We are unlikely to arrive at these sunlit uplands without some disruption. Artificial intelligence will change the labour market in much the way that domestic appliances changed the need for domestic staff: some things currently done by people will be done by an AI instead. The aim of the game is not to have a job that can be replaced by an AI program. I hope I have convinced you that although artificial intelligence can write, it won’t replace me or others in my profession immediately. Nick Cave put it much more succinctly. The good news is that artificial intelligence is not very good at telling jokes; I will leave you with its best effort.