Meta’s AI lab has created a massive new language model that shares both the remarkable abilities and the harmful flaws of OpenAI’s pioneering neural network, Generative Pre-trained Transformer 3 (GPT-3).
Additionally, the big tech company is letting researchers study the model, along with details on how it was built and trained.
“Language AI models are part of a generative pre-trained transformer,” said Shashank Srivastava, who is leading Amazon’s A.I.-based anomaly detection product as senior product manager.
According to experts from analyst firm Info-Tech, a language model is a probabilistic model that learns to predict the next word in a sequence of words based on the preceding words. The model learns the associations between words, the patterns, and context of a sequence of words in phrases, sentences, or paragraphs. The most complex models can learn dependencies between words in the text where the words can occur in different parts of the text.
It’s part of the technology that predicts the next word you want to type on your mobile phone, allowing you to create a message quickly. More complex models can generate a summary of an article or even write poetry.
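To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It assumes a toy "bigram" model that only looks at the single preceding word and counts how often each word follows it. This is far simpler than GPT-3 or OPT, which use deep neural networks, and the function names and sample text are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word follows each preceding word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    followers = counts.get(prev.lower())
    if not followers:
        return {}  # word never seen, or never followed by anything
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


def predict_next(counts, prev):
    """Return the most likely next word, or None if `prev` is unknown."""
    probs = next_word_probabilities(counts, prev)
    if not probs:
        return None
    return max(probs, key=probs.get)


# Hypothetical training text; real models learn from vastly more data.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # cat
```

In this sample text, "the" is followed by "cat" twice and by "mat" and "fish" once each, so the sketch suggests "cat" with a probability of 0.5. Larger models apply the same principle but condition on much longer stretches of preceding text.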
The latest language models are built using deep learning algorithms. For example, the research lab OpenAI created GPT, an autoregressive language model built with deep learning.
“Facebook recently released their own open models and these models are trained on different datasets from the internet so, for example, emotion, sentiment, surveys, live chat logs. All the information that is available either within an organization or available on the internet,” Srivastava said.
This is one of the first times that a fully trained large language model will be made available to any researcher who wants to study it.
“We strongly believe that the ability for others to scrutinize your work is an important part of research. We really invite that collaboration,” says Joelle Pineau, the managing director at Meta AI, quoted in MIT Technology Review.
According to Srivastava, it’s not usual for companies to develop their own language models.
“I would say it’s not normal for companies to do so. But there have been a lot of shifts towards standardizing the AI ML model with unbiased inputs.”
For example, if you train a model with billions of images of a cat on a tree, now the model knows all the combinations of cat on a tree and what that will look like.
However, by letting researchers examine its language model, Meta is looking to remove the bias that can come with certain language models.
“A cat on a tree will have its own bias. Like what kind of cat it is, or what colour is it? I like a certain species of cat or breed of cat and I’m just training the model with that. So we have an open model, like the one Meta published, and the intent is to remove the bias.”
Srivastava said Meta’s new language AI model was trained using all data available on the internet.
According to Info-Tech AI experts Anu Ganesha and Irina Sedenko, both OpenAI’s GPT-3 and Meta’s Open Pre-trained Transformer (OPT) are data-driven and rely on large-capacity models to fit massive amounts of data.
This means they are heavily influenced by the quality of their training data, which can even lead to catastrophic outcomes. Unlike OPT, GPT-3 has been relatively well tested and has been shown to produce biased and disturbing content because of the limited quality of internet data.
Since the company’s everyday operations affect many users, it’s important that Meta’s OPT is well tested outside the company. Having the language model tested by independent researchers will help Meta avoid the kind of heavy criticism it has faced over the way it operates its current algorithms.
For the benefit of Meta and its users, it is important that it allows researchers not affiliated with Meta to study its new AI language model OPT, Ganesha and Sedenko said.
Srivastava also thinks allowing the language model to be researched is a good thing.
“Publishing something to the research community, as a baseline, is a more positive thing that Meta has done. Now, the research community can take that and dig deeper into the model… and come up with a few more other insights and undiscovered patterns. It’s a good initiative from Meta’s side,” he said.