Go large or go home
Among the most hyped developments in tech in recent years is the increasing prevalence of AI systems that can understand and generate text, known as language models. It’s understandable why — apart from seeming like it has leapt from the pages of a sci-fi novel, natural language processing (NLP) promises plenty of benefits and uses across many sectors. By making it easier for computers to interpret human language (and even talk back), NLP opens up a variety of problems that were once difficult or impossible to tackle with software alone.
Pattr’s Conversational AI, which powers much of what we do, is driven by natural language processing. Our goal is to facilitate healthier, more productive online conversations — and to do that, we need to be able to understand all the weird and wonderful ways people talk to each other on social media. Our Conversation Health system relies on language models to rapidly identify harmful and abusive content on Facebook and Instagram — even when it is disguised in ways that a basic language filter would miss. To do this, it needs to correctly identify intent, not just banned words.
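To see why intent matters, consider what a basic banned-word filter actually does. The sketch below is purely illustrative (the word list and examples are hypothetical, not Pattr’s actual system): it flags only exact matches, so obfuscated spellings and sarcasm sail straight through — exactly the gap a model that understands intent is meant to close.

```python
import re

# Hypothetical banned-word list for illustration only.
BANNED_WORDS = {"idiot", "loser"}

def keyword_filter(text: str) -> bool:
    """Flag text only if it contains an exact banned word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in BANNED_WORDS for token in tokens)

print(keyword_filter("you absolute idiot"))   # True  — exact match is caught
print(keyword_filter("you absolute 1d1ot"))   # False — obfuscated spelling slips through
print(keyword_filter("nice one, genius"))     # False — sarcastic intent slips through
```

A language model trained on real conversations can learn that all three messages carry the same hostile intent, even though only the first contains a listed word.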
Much of the buzz and hype in the media has been directed at what are called large language models, or LLMs. To put it simply, large language models are trained on vast amounts of text data — often terabytes’ worth — and are often tens of gigabytes in size themselves. Many big tech companies, like Meta and Google, have either created or are working on their own LLMs.
Perhaps the most famous example is GPT-3, a deep learning model with 175 billion parameters, released by OpenAI in 2020 and now licensed exclusively to Microsoft. It can generate human-like text and even complete computer code from a short prompt. A New York Times review described GPT-3’s abilities as “amazing”, “spooky” and “humbling”.
Thanks to the vast datasets these large language models use for training, they can be surprisingly good at handling input they haven’t explicitly been trained on, and can interpret and respond to a wide variety of scenarios. This ability to accept a huge variety of inputs is part of what makes them seem so ‘spooky’. A single, gigantic model can be used quite effectively for all sorts of different tasks.
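One way to picture this versatility: the same model can be steered toward completely different tasks just by changing how the input is phrased. The sketch below shows this idea with hypothetical prompt templates; `call_model` is a stand-in for a real LLM API call, not an actual endpoint.

```python
# Illustrative only: one generative model, many tasks, selected purely by prompt.
TEMPLATES = {
    "summarise": "Summarise the following text in one sentence:\n{text}",
    "translate": "Translate the following text into French:\n{text}",
    "classify":  "Is the following comment abusive? Answer yes or no:\n{text}",
}

def build_prompt(task: str, text: str) -> str:
    """Turn a task name and input text into a prompt for a single model."""
    return TEMPLATES[task].format(text=text)

def call_model(prompt: str) -> str:
    # Stand-in for a real model call (e.g. an HTTP request to a hosted LLM).
    return f"<model response to a {len(prompt)}-char prompt>"

for task in TEMPLATES:
    prompt = build_prompt(task, "The weather is lovely today.")
    print(task, "->", call_model(prompt))
```

No retraining happens between tasks — the model’s behaviour is shaped entirely by the prompt, which is why one LLM can stand in for what used to be several special-purpose systems.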