AI chatbots in the classroom: rethinking how to teach and assess learning
There are still problems to be ironed out. Questions remain about whether LLMs can be made accurate and reliable enough to be trusted as learning assistants. It is too early to know what their impact on education will be, but institutions must explore their advantages and drawbacks, and teach students about their pitfalls, or they risk missing out on a powerful tool.
But after the initial shock, schools have started studying the potential benefits of the chatbot. Many schools and universities are running experiments to work out how best to use it in education. There are risks, but some educators think that ChatGPT and other large language models (LLMs) can be powerful learning tools. They could help students by giving them a private tutoring experience that might be accessible to more students than a human tutor would be. Or they could help teachers and students by making information and concepts normally confined to textbooks much easier to find and digest.
In a February preprint, researchers reported that, on a benchmark of relatively simple mathematical problems usually solved by students aged 12–17, the chatbot answered only about half of the questions correctly. On more complex problems, such as calculations requiring four or more additions, it was likely to fail.
Some teachers were alarmed when ChatGPT launched a year ago. The artificial-intelligence (AI) chatbot can write lucid, apparently well-researched essays in response to assignment questions, forcing educators around the world to rethink their assessment methods. Some institutions brought back pen-and-paper exams. And some schools are ‘flipping’ the classroom model: students learn about a subject at home, then do their assignments at school.
Tawil, who has more than two decades of experience in education, says that understanding the limitations of artificial intelligence is vital. At the same time, LLMs are now so bound up in human endeavours that, he says, it is essential to rethink how to teach and assess learning. “It’s redefining what makes us human, what is unique about our intelligence.”
He likens the attention that LLMs are attracting to that previously lavished on massive open online courses (MOOCs) and on educational uses of the 3D virtual worlds known as the metaverse, both of which can still deliver some of the powers once predicted for them. “In a sense, this is going to be the same. It’s not bad. It’s not perfect. It isn’t everything.” It is, he says, simply something new.
An important question around the use of AI in education is who will have access to it, and whether paid services such as Khanmigo will exacerbate existing inequalities in educational resources. DiCerbo says Khan Academy is now looking for philanthropists and grants to help to pay for computing power and to provide access for under-resourced schools, having prioritized such schools in the pilot phase. “We are working to make sure that digital divide doesn’t happen,” she says.
Retrieval-augmented generation (RAG) is being used at Arizona State University (ASU), one of the universities adopting LLMs most readily. After an initial narrow release for testing, ASU launched a toolbox in October that enables its faculty members to experiment with LLMs in education through a web interface, with access to several LLMs, including GPT-3.5 and GPT-4, as well as RAG-based tools.
Merlyn Mind is an artificial-intelligence company based in New York City that focuses on education. Like ChatGPT, Merlyn Mind’s LLM is initially trained on a large body of text not related specifically to education; this is what gives it its conversational ability.
But unlike ChatGPT, when the LLM answers a query, it does not rely only on what it has learnt during training. It also refers to a specific, curated body of information, which makes hallucination less likely, according to the company’s chief executive. To resist hallucination further, Merlyn Mind tunes its LLMs to acknowledge when they do not have a high-quality response, and works on producing better answers.
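Grounding answers in a fixed body of text can be sketched roughly as follows. This is a generic retrieval-augmented pattern, not Merlyn Mind’s actual pipeline: the corpus, the crude word-overlap ranking (a stand-in for real vector search) and all names here are illustrative assumptions.

```python
# Minimal sketch of retrieval-grounded answering (an assumed, generic
# mechanism, not any company's real pipeline). The model is shown only
# passages drawn from a curated corpus, anchoring its answer to that text.

CURATED_CORPUS = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Mitochondria are the site of cellular respiration.",
]

def retrieve(query, corpus, k=1):
    """Rank passages by crude word overlap with the query (a stand-in
    for a real vector search) and return the top k."""
    qwords = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(qwords & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query):
    """Instruct the model to answer only from the retrieved passages,
    and to say so when they do not contain the answer."""
    context = "\n".join(retrieve(query, CURATED_CORPUS))
    return (
        "Answer using ONLY the passages below. If they do not contain "
        "the answer, say 'I don't know.'\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}"
    )

prompt = build_prompt("What does photosynthesis convert?")
print(prompt)
```

Because the prompt forbids answers from outside the retrieved passages, a well-behaved model has far less room to invent facts, which is the essence of the grounding approach described above.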
One-on-one tutoring is the most effective way to teach, but it is expensive and hard to scale, according to Theodore Gray, co-founder of Wolfram Research. “People have tried software, and it generally doesn’t work very well. There’s now a real possibility that one could make educational software that works.” Gray declined to give details of Wolfram Research’s work on an LLM-based tutor.
But ChatGPT could also lead students astray. Despite excelling in a host of business, legal and academic exams1, the bot is notoriously brittle: it gets things wrong if a question is phrased slightly differently, and it even makes things up, an issue known as hallucination.
But whether Khanmigo can truly revolutionize education is still unclear. LLMs are trained to predict the next most likely word in a sentence, not to check facts, so they can get things wrong. To improve its accuracy, the prompt that Khanmigo sends to GPT-4 now includes the correct answers for guidance, says DiCerbo. Users are asked to let the organization know when it makes a mistake.
Khan Academy says that more than 28,000 US teachers and students are using Khanmigo this school year. Users include private subscribers as well as more than 30 school districts. Individuals pay US$99 a year to cover the LLM’s computing costs, and school districts pay $60 a year per student for access. OpenAI has agreed not to use Khanmigo data for training.
Khanmigo works differently from ChatGPT. It appears on a student’s computer screen, and students can discuss the problem that they are working on with it. The tool automatically adds a prompt before it sends the student’s query to GPT-4, instructing the bot not to give away answers and instead to ask lots of questions.
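The prompt-prepending mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not Khan Academy’s actual code: the function name, the instruction text and the inclusion of the correct answer as hidden guidance are assumptions, modelled on the common system/user chat-message convention.

```python
# Hypothetical sketch of a tutoring wrapper that prepends instructions
# to a student's query before sending it to an LLM. All names and
# prompt wording are invented for illustration.

TUTOR_INSTRUCTIONS = (
    "You are a tutor. Do not give away answers. "
    "Instead, ask guiding questions that help the student reason it out."
)

def build_messages(student_query, correct_answer=None):
    """Assemble the message list sent to the model. If the correct
    answer is known (e.g. from an exercise database), include it as
    hidden guidance to improve accuracy; the model is still told
    never to reveal it."""
    system = TUTOR_INSTRUCTIONS
    if correct_answer is not None:
        system += (
            " For reference only (never reveal it): "
            f"the correct answer is {correct_answer}."
        )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": student_query},
    ]

msgs = build_messages("What is 3/4 + 1/8?", correct_answer="7/8")
print(msgs[0]["role"])  # prints "system"
```

The student only ever types the question; the wrapping happens invisibly, which is why the bot behaves like a Socratic tutor rather than an answer machine.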
PyrEval’s scores also help students to reflect on their work: if the AI doesn’t detect a theme that the student thought they had included, it could indicate that the idea needs to be explained more clearly or that they made small conceptual or grammatical errors, she says. The team is now asking ChatGPT and other LLMs to do the same task and is comparing the results.
With help from educational psychologist Sadhana Puntambekar at the University of Wisconsin–Madison, PyrEval has scored physics essays5 written during science classes by around 2,000 middle-school students a year for the past three years. The essays are not given conventional grades, but PyrEval enables teachers to quickly check whether assignments include key themes and to provide feedback during the class itself, something that would otherwise be impossible, says Puntambekar.
Companies have used OpenAI’s technology to build commercial assistants, such as MagicSchool and Eduaide, that schools use to plan lesson activities and assess students’ work. Academics have produced other tools, such as PyrEval4, created by computer scientist Rebecca Passonneau’s team at Pennsylvania State University in State College, to read essays and extract the key ideas.
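The idea of automatically checking an essay for expected themes can be illustrated crudely. To be clear, this is not PyrEval’s algorithm, which performs semantic analysis of key ideas; the keyword sets, theme names and function below are invented purely for illustration.

```python
# Crude keyword-based stand-in for theme detection in student essays.
# Real tools like PyrEval use semantic analysis; this sketch only shows
# the shape of the task: flag expected themes the essay never mentions.

THEMES = {
    "energy transfer": {"energy", "transfer", "convert"},
    "forces": {"force", "friction", "gravity"},
}

def missing_themes(essay):
    """Return the names of expected themes for which no keyword
    appears anywhere in the essay."""
    words = set(essay.lower().split())
    return [name for name, kws in THEMES.items() if not (kws & words)]

feedback = missing_themes(
    "The ball slows because friction is a force acting on it."
)
print(feedback)  # prints ['energy transfer']
```

A teacher could use such a report to prompt a student to add the missing idea, mirroring the rapid in-class feedback loop the article describes.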
“Are there positive uses?” asks Collin Lynch, a computer scientist at North Carolina State University in Raleigh who specializes in educational systems. “Absolutely. Are there any risks? There are huge risks and concerns. But I think there are ways to mitigate those.”
Privacy is another hurdle: students might be put off working regularly with LLMs once they realize that everything they type into them is being stored by OpenAI and might be used to train the models.
LLMs’ ability to digest large amounts of text could save time for students and teachers, freeing them to focus on learning. ChatGPT’s ability to lucidly discuss nearly any topic raises the prospect of using LLMs to create a personalized, conversational educational experience. Some educators see them as potential ‘thought partners’ that might cost less than a human tutor and — unlike people — are always available.
Many educators fear that the rise of ChatGPT will make it easier for students to cheat on assignments. Yet Beghetto, who is based in Tempe, and others are exploring the potential of large language models (LLMs), such as ChatGPT, as tools to enhance education.
Last month, educational psychologist Ronald Beghetto asked a group of graduate students and teaching professionals to discuss their work in an unusual way. They talked to each other, as well as conversing with a collection of creativity-focused chatbots that Beghetto designed, which will soon be hosted on a platform run by Arizona State University.