Are we at an inflection point on the road to developing more innovative AI/ML techniques, such as large language models (LLMs) and related software? All the hype around ChatGPT and the subsequent announcements of similar products and massive investments by competitors suggest we are at the beginning of many exciting advances.
On the other hand, if you ignore the hype, is ChatGPT just one of many incremental steps that have occurred without fanfare during the past several years? Perhaps the only difference between the ChatGPT step and its predecessors is that it’s a well-promoted step.
I believe ChatGPT and its peers are a significant AI/ML advance. Even more important, we are at the beginning of many exciting AI/ML advances. What will the next generation of LLMs look like? What should CIOs consider as they evaluate the potential of AI/ML advances for applications that can help their organizations? Here are the emerging advances.
Models will generate their own training data to improve themselves
Today’s LLMs are limited to the information their developers have scraped from the web. It’s tempting to conclude that the web contains all the information that exists or that is needed to answer LLM queries. However, that’s not true. For example:
- A lot of information sits behind a login screen, making it unavailable for scraping.
- A considerable amount of historical information exists only on paper in off-site storage boxes.
- Some literature, government documents and business information exists only on paper.
- Small amounts of information exist only as artifacts in museums.
Emerging LLMs are developing techniques to enhance their training data and thereby compensate for these gaps. For example:
- Some emerging LLMs can further process their answers to augment their training data to improve their subsequent query accuracy.
- Another emerging LLM method replaces the current manual fine-tuning of models with more advanced models that can generate their own natural language fine-tuning instructions.
- At least one research group asks the LLM to reflect on its answer and refine it before presenting it.
This ability to generate their own training data will produce an enormous leap forward in the accuracy of LLMs’ output.
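The reflect-and-refine idea in the list above can be sketched in a few lines of Python. This is a minimal illustration, not any research group's actual method: `refine_answer`, `toy_model`, the prompt wording and the `max_rounds` limit are all assumptions made for this example, and `toy_model` is a scripted stand-in for a real LLM call.

```python
from typing import Callable


def refine_answer(
    question: str,
    generate: Callable[[str], str],
    max_rounds: int = 2,
) -> str:
    """Draft an answer, ask the model to critique it, then revise if needed."""
    answer = generate(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        critique = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any errors in this answer, or reply OK if there are none."
        )
        if critique.strip().upper() == "OK":
            break  # the model found nothing to fix
        answer = generate(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nRevised answer:"
        )
    return answer


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, scripted for this demo."""
    if "Revised answer:" in prompt:
        return "Paris"
    if "List any errors" in prompt:
        return "OK" if "Answer: Paris" in prompt else "Lyon is not the capital."
    return "Lyon"  # deliberately wrong first draft
```

With these stand-ins, `refine_answer("What is the capital of France?", toy_model)` drafts "Lyon", receives a critique, and returns the revised answer "Paris". In a real system, `generate` would call an actual model, and the critique step would be only as reliable as that model.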
Models will fact-check themselves to reduce inaccuracies
Today’s LLMs regularly produce inaccurate, misleading or false output, even though they display it confidently. Researchers call such output “hallucinations.” This problem exists because the training data from the web includes innocent mistakes and deliberate misinformation. In addition, today’s LLMs:
- Are limited to the information their software derived previously from scraping the web up to a cut-off date selected by the LLM developer.
- Cannot access data from anywhere in real-time to answer queries.
Emerging LLMs are developing the ability to:
- Provide references and citations to support the accuracy of their output.
- Demonstrate that their output is based on credible sources.
- Submit queries to a search engine and then base their output on its results.
These developments will increase confidence in LLM output and overcome the current risk of unreliability.
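The search-then-answer pattern with citations can be illustrated with a toy example. Everything here is an assumption made for illustration: the `Source` type, the three-entry `CORPUS` with placeholder example.com URLs, and the word-overlap ranking in `search` stand in for a real search engine and a real model.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    text: str


# Hypothetical mini corpus standing in for live search results.
CORPUS = [
    Source("https://example.com/sparse",
           "Sparse models activate only part of their parameters per query."),
    Source("https://example.com/dense",
           "Dense models process every parameter for every query."),
    Source("https://example.com/chips",
           "Custom chips accelerate matrix computations."),
]


def search(query: str, corpus: list[Source], top_k: int = 2) -> list[Source]:
    """Rank sources by the number of words they share with the query."""
    query_words = set(query.lower().split())
    scored = []
    for source in corpus:
        source_words = set(source.text.lower().rstrip(".").split())
        overlap = len(query_words & source_words)
        if overlap:
            scored.append((overlap, source))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [source for _, source in scored[:top_k]]


def answer_with_citations(query: str, corpus: list[Source]) -> str:
    """Build an answer only from retrieved text and list the supporting URLs."""
    hits = search(query, corpus)
    if not hits:
        return "No supporting source found."
    body = [f"{source.text} [{i}]" for i, source in enumerate(hits, start=1)]
    refs = [f"[{i}] {source.url}" for i, source in enumerate(hits, start=1)]
    return "\n".join(body + refs)
```

The design choice worth noting is that the function refuses to answer when nothing is retrieved, rather than falling back on unsupported text. A production system would pass the retrieved passages to the model as context instead of echoing them directly.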
Models will become sparser to improve performance
Today’s LLMs all use a surprisingly similar, dense architecture. Dense means that the model processes all of its many billions of parameters for every query.
As the value of size, or the number of parameters, has become apparent, LLMs have become larger and consume more computing resources to respond to queries. These factors have increased costs, caused congestion and slowed response times. To counteract these undesirable outcomes, the idea of sparse models has emerged.
Emerging LLMs divide their parameters into subject domains to create sparse models or multiple sub-models. LLMs then process just the sub-model parameters corresponding to the query’s domain.
The further development of sparse models will support LLMs with an increasing number of parameters while being less consumptive of computing resources.
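A toy router shows why processing only one sub-model reduces the work per query. This sketch rests on stated assumptions: the three domains, their keyword sets, `PARAMS_PER_EXPERT` and the function names are hypothetical, and the keyword matching is purely illustrative. Production sparse models typically learn their routing rather than relying on keywords.

```python
# Hypothetical sub-models keyed by subject domain, each with trigger keywords.
EXPERT_KEYWORDS = {
    "finance": {"invoice", "budget", "revenue"},
    "legal": {"contract", "clause", "liability"},
    "medical": {"dose", "symptom", "diagnosis"},
}
PARAMS_PER_EXPERT = 1_000  # illustrative size, not a real parameter count


def route(query: str) -> str:
    """Pick the domain whose keywords overlap most with the query.

    Ties, including queries with no matches, fall back to the first domain.
    """
    words = set(query.lower().split())
    scores = {
        domain: len(words & keywords)
        for domain, keywords in EXPERT_KEYWORDS.items()
    }
    return max(scores, key=scores.get)


def plan_query(query: str) -> tuple[str, float]:
    """Return the chosen domain and the share of all parameters it touches."""
    domain = route(query)
    total_params = PARAMS_PER_EXPERT * len(EXPERT_KEYWORDS)
    return domain, PARAMS_PER_EXPERT / total_params
```

For the query "Review the contract clause", `plan_query` selects the "legal" sub-model and touches one third of the total parameters, whereas a dense model would touch all of them.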
Models will learn more about reasoning to become smarter
Today’s LLMs have achieved excellent performance for various tasks. However, LLMs:
- Have no awareness of the meaning of their output.
- Require extensive supervision effort to fine-tune their output.
- Respond poorly to queries that require reasoning, common sense, and implicitly learned skills.
Humans operate differently. We read various sources of information and then reflect on the topic or think through a problem. Unlike humans, LLMs cannot generate novel ideas and insights from data.
Emerging LLMs improve their performance by self-evaluating multiple reasoning paths available to produce the query output. Two related prompting techniques are called “chain-of-thought (CoT)” and “zero-shot chain-of-thought.” CoT prompts include worked examples of step-by-step reasoning, while zero-shot CoT simply adds a “Let’s think step by step” instruction to the query. The benefit of these techniques is that they improve the accuracy of LLM output without requiring enhancements to the LLM.
While no one is suggesting that this small change produces LLMs that learn, this technique is a small step toward the goal of LLMs that can reason. We can expect more progress as research continues.
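The prompt change and the idea of comparing several reasoning paths can be sketched as follows. This is a minimal illustration under assumptions: the function names, the "Q:/A:" prompt layout, and the convention that each completion ends with "Answer: <value>" are all hypothetical, and no real model is called.

```python
from collections import Counter
from typing import Iterable

COT_SUFFIX = "Let's think step by step."


def build_cot_prompt(question: str) -> str:
    """Append the zero-shot chain-of-thought instruction to a question."""
    return f"Q: {question}\nA: {COT_SUFFIX}"


def extract_final_answer(completion: str) -> str:
    """Take the text after the last 'Answer:' marker (an assumed convention)."""
    return completion.rsplit("Answer:", 1)[-1].strip()


def majority_answer(completions: Iterable[str]) -> str:
    """Choose the final answer that most sampled reasoning paths agree on."""
    votes = Counter(extract_final_answer(text) for text in completions)
    return votes.most_common(1)[0][0]
```

In use, the same prompt from `build_cot_prompt` would be sent to a model several times with sampling enabled, and `majority_answer` would pick the answer that most of the sampled reasoning paths reach.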
Models will run on purpose-built chips to become faster
Today’s LLMs run on large server farms with general-purpose high-performance CPUs and attached graphics processing units (GPUs). Critical features of high-performance servers that can handle the voracious appetite for computing resources that LLMs exhibit include:
- Multiple PCI-Express lanes to connect CPUs to GPUs.
- Multiple channels between the memory and each CPU.
Continuing developments in hardware that will improve LLM performance even as their demands for computing resources increase include:
- GPUs with increased capacity and performance with modest increases in cooling requirements and physical size.
- GPUs with increased video random access memory (VRAM) to hold larger models and their working data.
- Field-programmable gate arrays (FPGAs) containing configurable programmable logic blocks that can accelerate the processing of the most prevalent instructions in AI/ML applications.
- Custom-designed application-specific integrated circuits (ASICs) for the matrix computations that AI/ML uses intensely.
- Links between GPUs to reduce CPU load.
- Higher-performance storage systems, such as fast NVMe drives, to minimize data access time.
These hardware advances will allow LLM applications to maintain quick response times even though the underlying models are growing rapidly.
These and other AI/ML advances will improve the accuracy of the next generation of LLMs. That accuracy will build confidence and widen the implementation of LLMs for production applications.
What ideas can you contribute to help organizations evaluate the potential of AI/ML advances? We’d love to read your opinion.