
Chatbots vs Humans: The Winner Is…

Is there a definitive winner in the battle between artificial intelligence and human intelligence?

Artificial intelligence (AI) has been on the radar for close to two decades now, but the euphoria (or the fear, depending on which side one is on) took center stage only after ChatGPT appeared last November. However, a new study now claims that while AI is passing one capability test after another, it is still no match for human creativity.

The study, published in Scientific Reports, indicates that while AI has almost drawn level with the average human's ability to generate ideas, it is nowhere close to the best minds. Even so, AI chatbots achieved higher average scores than humans in the Alternate Uses Task, a widely used test of divergent thinking.

When chatbots aced the Alternate Uses Task

Typically, such tests include questions like “Describe as many uses as you can think of for a box, a pencil, or a candle.” In the current study, the researchers put such questions to OpenAI’s ChatGPT and GPT-4, and to Copy.Ai (built on GPT-3), giving each just thirty seconds to respond. And the results were quite surprising.

Prompted this way, the chatbots, all built on large language models, came up with original and creative uses for the items. When the researchers tweaked the prompts to value quality over quantity, the outcomes were even better. Each chatbot was tested eleven times with four specific objects. The study also included 256 human participants.

It’s a computer test devised by humans 

Of course, the researchers did offer a caveat about what it actually means for a computer to pass tests devised by humans. “On the basis of the study, the clearest weakness in humans’ performance lies in the relatively high proportion of poor-quality ideas, which were absent in chatbots’ responses,” says the report in its summary.

“This weakness may be due to normal variations in human performance, including failures in associative and executive processes, as well as motivational factors. It should be noted that creativity is a multifaceted phenomenon, and we have focused here only on performance in the most used task measuring divergent thinking,” the report added. 

Does it mean AI is developing human creativity?

While reviewing the research findings, a report published in the MIT Technology Review notes that they do not necessarily indicate that AIs are developing an ability to do something uniquely human. It could just be that AIs can pass creativity tests, not that they are actually creative in the way we understand the term.

Of course, such research efforts could give the world a better understanding of how humans and machines approach creative tasks. The researchers used two methods to assess the responses. The first used an algorithm to rate how close each suggested use was to the object’s original purpose.
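The article does not say which algorithm the study used, but this kind of “semantic distance” scoring is often done with text embeddings. As a rough illustration only — the embedding model, the purpose sentence, and the originality_score helper below are assumptions, not details from the study — the idea can be sketched in a few lines of Python:

```python
# A minimal sketch of semantic-distance scoring for the Alternate Uses Task.
# Ideas semantically far from the object's ordinary purpose score as more
# original. The model choice here is illustrative, not the study's method.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

def originality_score(object_purpose: str, suggested_use: str) -> float:
    """Return 1 - cosine similarity: higher means further from the usual use."""
    purpose_vec, use_vec = model.encode([object_purpose, suggested_use])
    return 1.0 - float(cos_sim(purpose_vec, use_vec))

# Example: score a few candidate uses for a candle against its usual purpose.
purpose = "a candle is used to provide light"
for use in ["light a dark room",
            "use the wax to seal an envelope",
            "carve the wax into chess pieces"]:
    print(f"{use!r}: {originality_score(purpose, use):.3f}")
```

On this kind of measure, mundane answers (“light a dark room”) sit close to the object’s purpose and score low, while unexpected ones score high — which is roughly what an automated originality rating needs to capture.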

The second was a bit more involved: six human assessors, who were not told which answers came from chatbots, rated each response on a scale of 1 to 5 for both creativity and originality. From these ratings, average scores for humans and AIs were calculated. And this is where the best human responses outscored the AI.

It’s more about testing memory than creativity

The research team also clarified that the study was not an effort to prove the efficacy of AI systems in replacing humans. Simone Grassini, associate professor of psychology at the University of Bergen (Norway), who co-led the research, says, “We’ve shown that in the past few years, technology has taken a very big leap forward when we talk about imitating human behavior.” He adds, “These models are continuously evolving.”

However, Ryan Burnell, a research associate at the Alan Turing Institute, was quoted by MIT Technology Review as saying that showing machines can perform well in tasks designed to measure creativity in humans does not demonstrate that they are capable of anything approaching original thought. Grassini makes much the same point, though a bit more philosophically.

Burnell goes on to suggest that since the research team does not know what data the chatbots were trained on, the test may well be measuring not creativity but the models’ prior exposure to such tasks. Of course, none of these arguments question the ability of AI and GenAI to reduce human effort in mundane tasks.

Anna Ivanova, an MIT postdoctoral researcher who studies large language models, says the research is useful because it compares how machines and humans approach certain problems. She notes that although chatbots are good at completing specific requests, a slight tweak to a prompt can be enough to stop them from performing as well.

She holds the view that the next step after such studies should be to examine the link between the tasks AI models are asked to complete and the cognitive capacities researchers are trying to measure. One cannot assume that people and models solve problems in the same way, she concludes. So, who wins? For now, it looks like nobody does.
