On the 16th of January, the trade body of Sweden’s recording industry removed a song from the national charts, explaining: “if it is a song that is mainly AI-generated, it does not have the right to be on the top list.” The offending tune is credited to an artist called Jacub; however, it transpires it may originate from a production team at Stellar Music, a Danish music publisher and marketer. The team at Stellar claim that AI assisted in the process but that the ideas and themes of the music came from a human team. 

Similar examples of AI-generated music have gained popularity elsewhere. Commentators often highlight the generic style to which many of these tracks conform, and yet ‘bands’ such as Velvet Sundown, a country folk group, amass millions of streams on Spotify. Only once the details of their artificial production are revealed do many listeners sour on them. 

Whether you are an acolyte or a Luddite, AI music forces us to ask: can AI be creative?

Defining creativity

Several attempts have been made to define creativity. Margaret Boden, a professor of cognitive science, settled on “new”, “surprising” and “valuable” as the three essential criteria. An earlier attempt by Morris Stein, an American psychologist, characterised a creative work as a “novel work that is accepted as tenable or useful or satisfying by a group in some point in time”. Many attempts to define the term have coalesced around a ‘standard’ definition, as outlined by Mark Runco and Garrett Jaeger in the Creativity Research Journal. Creativity, argue the pair, is generally accepted to require both originality and effectiveness. 

Originality feels necessary, but not sufficient. Truly creative works must comprise some degree of originality, yet novelty alone is not enough. Four random words lifted from a dictionary may make something original, yet, for most, this would struggle to constitute creativity. 

Something else is therefore required, and this is captured in the effectiveness criterion. Boden frames effectiveness through the lens of value. Stein stresses, more specifically, that usefulness can be specific to a particular group. This nuance gets right to the heart of the difficulty in defining creativity: whether something is considered creative has a lot to do with the perspective from which one is looking. 

What, then, makes something ‘effective’ for a particular group? Many emphasise motivation. The motivation for creating something original is, for many, the reason to classify it as truly creative. It seems reasonable to assume that Tracey Emin’s My Bed would struggle for acclaim without an understanding of the motivations for creating it. 

We’re left then looking for something both original and effective, with that efficacy for a particular group often stemming from the motivation for the original work. 

Truly creative works must comprise some degree of originality, yet novelty alone is not enough. 

Large Language Models 101

Before we can judge AI against this definition of creativity, we need a basic understanding of how Large Language Models (LLMs) ‘create’. LLMs are often described as next word prediction machines, but what does this actually mean?

Creating an LLM involves two phases: pre-training and post-training. Pre-training creates what is called the ‘base model’ and, to date, this is where most data-centre capacity has been used. Put simply, the LLM ingests a huge amount of information, most often sourced from large portions of the internet, from which it learns. Oversimplifying slightly, the LLM tries to draw relationships between words. How it does this isn’t important here; what matters is that the LLM distils a very large dataset by understanding the relationships between words and uses this to probabilistically guess the next word in a sentence.

If we entered the phrase ‘the cat sat on the’, the model might, based on all the historical data it has seen, guess that the next word is ‘mat’. The model has not been explicitly told this but has inferred it from the language it has seen. 
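The prediction mechanism described above can be illustrated with a toy sketch. The code below counts which word follows which in a tiny made-up corpus and predicts the most frequent follower; this is an illustrative assumption only — real LLMs learn these relationships with neural networks over billions of documents — but the statistical intuition is the same.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for 'large portions of the internet'.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat sat on the mat ."
).split()

# Count how often each word follows each other word (a bigram model).
counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    counts[current_word][next_word] += 1

def predict_next(word):
    """Return the word that most frequently followed `word` in the corpus."""
    return counts[word].most_common(1)[0][0]

print(predict_next("on"))   # 'the' -- the only word ever seen after 'on'
print(predict_next("sat"))  # 'on'
```

The model is never told that mats follow cats; it simply tallies what it has seen and picks the most probable continuation, which is the essence of next-word prediction.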

With pre-training complete, we have a base model: it has captured the statistical regularities of the dataset, which it can use for next-word prediction. We can now move to post-training. 

In post-training we fine-tune the model. In the case of systems such as ChatGPT, we train the base model on a smaller but higher-quality dataset of interactions that illustrate how a helpful agent might interact. From these examples the LLM learns the appropriate style of response. 
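Continuing the toy word-counting sketch, fine-tuning can be pictured as further counting on a small, curated dataset that is weighted more heavily, so the curated style comes to dominate the model’s predictions. This is an illustrative assumption only: real fine-tuning adjusts a neural network’s weights, not word counts.

```python
from collections import Counter, defaultdict

def count_bigrams(words, counts, weight=1):
    """Add (optionally weighted) word-pair counts from a list of words."""
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += weight

counts = defaultdict(Counter)

# 'Pre-training' on generic text, then 'fine-tuning' on a curated example
# of how a helpful agent replies, weighted more heavily.
count_bigrams("thanks thanks a lot mate".split(), counts)
count_bigrams("thanks happy to help further".split(), counts, weight=5)

# The heavily weighted curated style now wins the prediction.
print(counts["thanks"].most_common(1)[0][0])  # 'happy'
```

The base knowledge is still there in the counts; the curated examples simply tilt the probabilities toward the desired style of response.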

Post-training can also include reinforcement learning (RL), which can take several forms and is easiest to understand with a maths example. The model is given a series of problems along with the answer to each, but no workings. It must therefore try multiple solution paths and learn which approaches are most effective. This process significantly improves the model’s ability to answer complex questions. 
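The answers-but-no-workings idea can be sketched in a few lines. In this toy (an illustrative assumption, not how production RL is implemented), the ‘model’ holds several candidate strategies, tries each on problems where only the final answer is known, and boosts the weight of whichever strategies got it right.

```python
# Each problem is a pair of inputs plus the known correct answer -- no workings.
problems = [((2, 3), 5), ((4, 1), 5), ((7, 2), 9)]

# Candidate 'solution paths' the model might try.
strategies = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "subtract": lambda a, b: a - b,
}
weights = {name: 1.0 for name in strategies}

for (a, b), answer in problems:
    for name, strategy in strategies.items():
        # The reward depends only on the outcome: a strategy is boosted
        # whenever its final answer matches the known correct one.
        if strategy(a, b) == answer:
            weights[name] *= 2.0

best = max(weights, key=weights.get)
print(best)  # 'add' -- the only strategy rewarded on every problem
```

Over many such problems the effective approaches accumulate reward and the ineffective ones fall away, which is the feedback loop RL relies on.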

When we interact with an LLM it can also draw on tools, such as an internet browser, to augment its responses, but much of its behaviour remains grounded in the pre-training and post-training steps. 

What is hopefully clear from this oversimplified account is that LLMs are fundamentally probabilistic and that the probabilities driving next-word prediction are derived, in large part, from the historical data they are trained on. 

Can AI create?

With our understanding of LLMs, it seems reasonable to suspect that they should be capable of originality. Novelty, as Stein describes it, is the “reintegration of already existing materials or knowledge, but when it is completed, it contains elements that are new”. The probabilistic process by which LLMs generate material is bound to produce novel combinations of existing material. 

Take, for example, AlphaGo, an AI program developed by DeepMind, a subsidiary of Google. AlphaGo was developed to play the board game Go, which is played on a 19x19 grid and is far more complex than chess. AlphaGo shocked viewers when, whilst playing the world-class player Lee Sedol in 2016, it played an unconventional move which many thought to be a mistake. The move later proved to be strategically brilliant and swung the game in AlphaGo’s favour. It had not been contemplated by Go-playing professionals and demonstrates some level of AI-generated originality.

Yet, whilst Go has a desired outcome that is easy to define, many creative pursuits don’t. What makes a song or painting truly effective? 

For some, once they have seen behind the cloak, the music, film or artwork doesn’t feel the same. 

Returning to our definition, it comes down to who this is effective for, and the Velvet Sundown example points to where AI might fail to meet it. After the band was revealed to be AI-generated, some fans’ enthusiasm cooled. Having listened to AI music and watched AI-generated video content, I think you would often be hard pressed to tell the difference; what seems to matter is motivation.

For some, once they have seen behind the cloak, the music, film or artwork doesn’t feel the same. It is for this reason that AI may never truly monopolise creativity. Yet with 34% of music uploaded to Deezer (a streaming site) now AI-generated, and with one in ten of the fastest-growing YouTube channels showing solely AI-generated content, this group might be smaller than we expect. For a large portion of daily content consumption, AI-generated content is probably ‘good enough’. 

In addition, the humans prompting these LLMs are arguably still the same motivated creators. Take films, for example. It seems fair to assume that a not insignificant part of the motivation embodied in a film comes from the writer, director and producer. Does the absence of actors and whole production teams then reduce the motivation of the work? A recent AI-generated work from directors Samir Mallal and Bouha Kazmi, depicting the US attacks on Iran’s nuclear sites, doesn’t lack motivation.

And yet I remain convinced by the argument put forward by, amongst others, Azeem Azhar, tech analyst, writer and podcaster. His theory is that perfection in creation may become increasingly commoditised and that both criteria of our creativity definition (originality and effectiveness) may even be met in many cases. But whilst AI-generated content might be slick and increasingly error-free, precisely what will be missing is human imperfection. 

Pottery has been mass produced since the late 18th century and yet hand thrown clay remains popular. Something about the human motivation and involvement, and the imperfections this creates, resonates. These imperfections are a feature not a bug. Whether AI replaces 20% or 90% of creative output, it seems unlikely to replace the extortionate artisanal pottery my mother so loves to receive for Christmas. 
