A realistic overview of Large Language Models
By Umut Can Alaçam
Artificial Intelligence (AI) has long been a fascinating area in computer science. The release of transformer-based models like GPT-3 and the public launch of ChatGPT in late 2022 showed that AI is no longer confined to research labs; it has become a major focus for the tech industry. Yet, as with any technology surrounded by hype, it’s essential to approach AI’s perceived capabilities with skepticism and examine its true potential.
This post explores the history of AI, the evolution of its definitions, what Large Language Models (LLMs) are, and their capabilities, limitations, and potential impacts on life and industry.
A brief history of AI
To fully appreciate the AI models available today, it’s helpful to understand the historical development of AI.
The term “Artificial Intelligence” was coined by John McCarthy at the 1956 Dartmouth Conference, where he defined AI as:
“the science and engineering of making intelligent machines, especially intelligent computer programs.”
In the early days, programs like the Logic Theorist, the General Problem Solver, and Samuel’s Checkers Program relied on symbolic logic and rule-based systems with limited learning capabilities. Interest in AI surged through the late 1950s and 1960s, with advances such as Frank Rosenblatt’s Perceptron (1958), which laid the foundation for modern neural networks. However, by the mid-1970s, AI funding and enthusiasm waned in what is now called the first “AI winter,” largely due to the failure of early systems to meet public and industry expectations.
AI reemerged in the 1980s with expert systems, regaining attention and ending the first AI winter. The 1990s introduced machine learning and neural networks, leading to applications like IBM’s Deep Blue, which famously defeated chess champion Garry Kasparov. By the 2000s, AI began integrating into daily technologies like search algorithms and robotic vacuum cleaners. The 2010s saw a further AI transformation, with the explosion of Big Data fueling advances in data engineering and machine learning.
The 2017 Google Brain paper, Attention Is All You Need by Vaswani et al., introduced the transformer model, the core of modern language models like ChatGPT. OpenAI’s GPT-3, launched in 2020, marked a new frontier, showcasing the impressive abilities of transformers in natural language understanding. Today, the impact of AI is more pronounced than ever, yet it’s crucial to understand what LLMs truly are, their limitations, and what we’ve achieved so far.
AI in Media: Shifting Perceptions
The term “AI” has evolved over time, shaped by media and marketing trends. Years ago, AI was closely associated with neural networks in applications like image recognition. Today, however, AI is often equated with creative tasks, such as generating text, images, and even music. This change in perception means that technologies once considered groundbreaking, such as image processing or voice recognition, are now seen as standard.
Thus, it’s essential to define what we mean when discussing AI, as media and marketing often exaggerate the capabilities and implications of new developments.
Recent advancements in AI
Recent AI advancements have centered around generative models, which leverage vast amounts of human-generated data to produce new content, such as text, images, and videos. The field of Natural Language Processing (NLP) has particularly flourished with the introduction of transformer models, which outperformed previous models like Long Short-Term Memory (LSTM) networks. This innovation marked a leap forward in NLP, yet it has also raised concerns about the potential impact on professions like design, translation, and engineering, as some fear AI may replace these jobs.
Capabilities of Large Language Models (LLMs)
Recent advancements in NLP, especially the development of transformers, have given LLMs impressive capabilities across various domains.
Research: LLMs can answer questions on topics ranging from simple recipes to complex theories, making them useful for everyday learning and exploratory research. However, they are prone to “hallucinations,” where they generate plausible but incorrect information. LLMs also lack the ability to evaluate sources, making them more suitable for general knowledge than rigorous research.
Writing: LLMs excel at generating coherent text, helping with grammar correction and improving narrative flow.
Translation: Transformer models’ context-awareness allows for more accurate translations, although they can struggle with nuanced or culturally specific expressions.
Code Generation: LLMs can generate code snippets and automate routine programming tasks, simplifying development processes. However, they are less reliable for complex tasks that require deep understanding or long-term code maintenance, and the code they generate can contain inefficiencies or overlook security best practices. They are best suited for routine tasks and boilerplate code rather than for designing robust, production-ready software, as sketched below.
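As a concrete illustration of that routine-task use case, here is a minimal sketch of prompting an LLM for a small utility function through the OpenAI Python client; the model name and prompt are placeholders, and any comparable chat-completion API would work the same way.

```python
# A minimal sketch of prompting an LLM for boilerplate code through the
# OpenAI Python client (openai >= 1.0). The model name and prompt are
# placeholders; this is an illustration, not a recommended setup.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute whatever is available
    messages=[
        {"role": "system", "content": "You are a coding assistant. Return only code."},
        {"role": "user", "content": "Write a Python function that reads a CSV file "
                                    "and returns its rows as a list of dictionaries."},
    ],
)

# The generated snippet still needs the same review as any human contribution.
print(response.choices[0].message.content)
```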
Limitations of Transformers
Although transformer-based LLMs have revolutionized NLP, they have limitations like any other machine learning model.
One of the most significant limitations of transformers is their limited context window: the model can only capture contextual meaning across a bounded amount of text. Current LLMs therefore struggle to capture the context of long texts, such as textbooks or large software projects. Scaling the models up pushes this limit further, but scalability is not a strength of transformers either, because the memory required by the attention mechanism grows quadratically with the length of the input.
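To make that scaling problem concrete, here is a rough back-of-the-envelope sketch; the head count and numeric precision are illustrative assumptions, not figures from any particular model, and real implementations add further overhead.

```python
# A back-of-the-envelope sketch of why long contexts are expensive: standard
# self-attention builds an n x n score matrix per head, so memory for the
# scores alone grows quadratically with the sequence length n.
# Head count and precision below are illustrative assumptions.

def attention_score_memory_gib(seq_len, num_heads=16, bytes_per_value=2):
    """Memory for one layer's attention score matrices, in GiB."""
    return num_heads * seq_len * seq_len * bytes_per_value / 1024**3

for n in (1_024, 8_192, 65_536):
    print(f"{n:>6} tokens -> {attention_score_memory_gib(n):8.2f} GiB per layer")

# Approximate output:
#   1024 tokens ->     0.03 GiB per layer
#   8192 tokens ->     2.00 GiB per layer
#  65536 tokens ->   128.00 GiB per layer
```

Quadrupling the context length (8k to 65k tokens is an 8x jump here) multiplies the score memory by 64, which is why simply scaling the context window is not a cheap fix.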
Additionally, training and operating LLMs is quite expensive due to the high computational cost of the models, which makes their profitability questionable. However, computational power and efficiency are also growing rapidly, so this problem may be overcome in the near future.
Besides the technical limitations of the moment, there are several criticisms of LLMs. One of the most significant concerns their reasoning capability. Like other machine learning models, transformers do not perform real reasoning the way a human does; they make estimations based on statistics. This limits their creativity, since an LLM does not actually know anything we don’t know. It does not know the true meaning of words; it only generates an appropriate sequence of tokens for a given sequence of tokens. There are hyperparameters for reducing repetition and improving the creativity of the output, but they rely on randomness, and human creativity is far more complicated than randomness.
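A tiny sketch may help show what those “creativity” hyperparameters actually do; the vocabulary and logits below are made up, and temperature is only one of several such knobs (top-k, top-p, and repetition penalties work in a similar spirit).

```python
# A minimal sketch of what the sampling "creativity" knobs do: temperature
# rescales the model's output scores (logits) before sampling, so higher
# values flatten the distribution and make unlikely tokens more probable.
# The vocabulary and logits below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "sat", "mat", "ran"]
logits = np.array([2.0, 1.5, 0.5, 0.2, -1.0])  # hypothetical next-token scores

def sample(logits, temperature):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # softmax, numerically stabilised
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]

for t in (0.2, 1.0, 2.0):
    draws = " ".join(sample(logits, t) for _ in range(8))
    print(f"temperature={t}: {draws}")
# Low temperature almost always picks "cat"; high temperature varies more,
# but the variation is still weighted randomness over learned statistics.
```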
Another criticism of LLMs is that we may already be reaching the limits of the transformer architecture with the data we have. This skepticism is supported by recent developments in LLMs. When GPT-3 was released in 2020, it was impressive how it could produce reasonable results from very few examples, but it had issues with bias and hallucinations. OpenAI later added guardrails to reduce these issues and used a training data set tailored to a chat context for the ChatGPT product. The underlying approach is the same, and we assume the design is similar; the main changes are the scale of the training data and the additional mechanisms. We have to speculate in assuming the design is similar, since OpenAI has kept the technology proprietary and has not shared the architecture of its newer models.
Additionally, many researchers are skeptical about whether current neural network architectures are capable of achieving Artificial General Intelligence (AGI).
Conclusion
The introduction of Large Language Models (LLMs) with the Transformer model architecture has revolutionized AI, significantly enhancing the capabilities of machine learning models. Modern language models, like GPT, have proven that AI tools can be useful across various fields, reducing the mental effort required for cognitive and creative tasks. However, it is essential to understand the capabilities of modern machine learning technologies and recognize their limitations. LLMs operate as advanced statistical engines, synthesizing patterns in the data they were trained on, but they lack genuine understanding, reasoning, or creativity akin to human intelligence.
The marketing of AI often exaggerates model capabilities, leading the media and public to be misled by illusions of reasoning and creativity that machine learning models appear to exhibit. It is also important to note that the meaning of “AI” in media has evolved over time, with every significant advancement in AI seen as heralding an age of robots and the potential end of human dominance.
While most concerns about AI are focused on fears of artificial intelligence taking over or rendering human labor obsolete, a more immediate risk lies in the control of information by proprietary language models. Language models like ChatGPT are fully controlled by tech companies, and their outputs can be adjusted according to corporate policies. If language models become the primary source of everyday information, they could enable select groups to control the information presented, reducing diversity of perspectives. Unlike the Internet, where everyone can share thoughts and information, language models are accessible only on the terms their operators set and are not truly democratic.
It is expected that language models will reduce the effort required for cognitive tasks in the near future. However, human creativity remains vital, as real advancements arise not only from analyzing existing knowledge but also from genuine reasoning and creativity. Therefore, we should aim to incorporate AI tools into our workflows to increase the value we create, improve efficiency, and enhance our creativity.
References
- Vaswani et al., Attention Is All You Need, https://arxiv.org/abs/1706.03762
- Chollet, On the Measure of Intelligence, https://arxiv.org/abs/1911.01547
- LLM Leaderboard, https://artificialanalysis.ai/leaderboards/models
- The impact of code generation on code quality, https://arc.dev/talent-blog/impact-of-ai-on-code/