Unraveling the Magic: A Guide to Understanding Large Language Models for the Non-Tech Savvy

“Unlock the secrets of Large Language Models: Journey from your fingertips to AI-generated wisdom in milliseconds.”

source: Nano Banana

Introduction

In my previous article, I discussed Building AI Systems with Atomic Agents: Reducing Costs and Boosting Efficiency. Today I want to take a step back and return to the basics. Let’s look at what a “Large Language Model” (LLM) is, and what’s so magical about it.

In this article we will take a deep dive into LLMs. I will explain each step in plain, simple language, with images. By the end of this article, you should be able to understand and explain how LLMs work, and be blown away by their capabilities. All this knowledge will help you become a better prompt engineer, so you can really use the magic of LLMs. Let’s go.

From Prompt to Prediction

You have probably heard the terms LLM and Gen AI a lot these days and nodded firmly during a conversation, without really knowing how it works. You know you can type in some characters and words and get an answer back. Strictly speaking, it’s not really an answer; it’s a prediction. The LLM predicts what should follow the words you have given it.

How do you get your answer in a matter of seconds? This is where I want to shed some light on the inner workings of an LLM, so you can learn about and appreciate the work behind the curtain without getting lost in technical jargon.

Step 1: The Prompt

The first thing that happens is that you take action by typing words into a chat window. Keep that “you take action” part in mind, because it already cuts short those dystopian predictions of Terminators riding around on motorcycles and shooting. Today’s AI can’t initiate an action by itself; it needs input, your input.

Let’s continue. The words you provide are called a “Prompt”. The prompt is the input to the Large Language Model. It can be a single word, a sentence or even a complete book, as long as it fits within the amount of text the model can handle at once. This prompt is the key that unlocks the model’s power to predict and generate a response. In general, the more relevant context and examples you provide, the better the answer tends to be.

Step 2: Tokenization Means Breaking Down the Prompt

Just as programmers break complex problems into smaller ones until each is small enough to act on, the model breaks your prompt into smaller parts, which we call “Tokens”.

These tokens can be whole words, parts of words or even single characters, depending on how the model is designed. This process is known as “Tokenization”, and it turns the prompt into units the model can process, as you can see in the screenshot below.

source: 3Blue1Brown video “But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning” [https://www.youtube.com/watch?v=wjZofJX0v4M&t=749s](https://www.youtube.com/watch?v=wjZofJX0v4M&t=749s)
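For readers who like to see things in code, here is a minimal Python sketch of the idea. It assumes a tiny, made-up vocabulary and a simple “take the longest piece you recognize” rule; real tokenizers (such as the byte-pair encoding used by GPT models) learn their vocabularies from data and differ in the details.

```python
# A toy subword tokenizer: at each position, take the longest piece of text
# that appears in the vocabulary. The vocabulary below is hypothetical.
TOY_VOCAB = {"this", "is", "un", "believ", "able", " "}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into tokens using greedy longest-match against vocab."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary match: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize("this is unbelievable", TOY_VOCAB))
# ['this', ' ', 'is', ' ', 'un', 'believ', 'able']
```

Notice how “unbelievable”, which is not in the vocabulary, is split into three smaller pieces that the vocabulary does know.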

Step 3: Embedding is Transforming Words into Numbers

Here comes a bit of the magic. It’s not really magic, it’s math, but for some of us, like me, math is magic. The model looks up the ID number of each token and converts the token into a list of numbers (a vector) that represents it. We call this vector an “Embedding”. Tokens with similar meanings end up with similar vectors.

Think of it like a secret code that you and your friend agree on: you decide which numbers represent each word, so only the two of you understand the meaning and context of each word, and only the two of you can decipher the message. The image below makes it a bit more visual.

source: 3Blue1Brown video “But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning” [https://www.youtube.com/watch?v=wjZofJX0v4M&t=749s](https://www.youtube.com/watch?v=wjZofJX0v4M&t=749s)
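Here is a minimal Python sketch of that lookup. The token IDs and the three-number vectors are invented for illustration; real models learn vectors with hundreds or thousands of numbers during training. The cosine similarity function measures how closely two vectors point in the same direction, which is how “similar meaning” shows up in the math.

```python
import math

# Hypothetical token IDs and 3-dimensional embeddings, made up for this example.
TOKEN_IDS = {"king": 0, "queen": 1, "banana": 2}
EMBEDDING_TABLE = [
    [0.90, 0.80, 0.10],  # row 0: "king"
    [0.85, 0.75, 0.20],  # row 1: "queen"
    [0.10, 0.05, 0.95],  # row 2: "banana"
]


def embed(token: str) -> list[float]:
    """Look up the token's ID, then return the vector stored in that row."""
    return EMBEDDING_TABLE[TOKEN_IDS[token]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Close to 1.0 when two vectors point the same way, around 0.0 when unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity(embed("king"), embed("queen")), 3))   # high: similar meaning
print(round(cosine_similarity(embed("king"), embed("banana")), 3))  # low: unrelated
```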

Step 4: A Neural Network is the Model’s Brain

Our brain is made up of neurons: billions of them, connected to each other and passing signals from one neuron to the next. This is how we humans work. Whatever input we get, whether sound, scent, touch, taste or vision, is converted into signals that travel through our brain via the neurons, and we come to a conclusion.

The model works in a similar way, because its design is loosely inspired by the workings of our brain. The model’s neural network is responsible for processing the embeddings and generating an answer. It has several layers, and each layer refines the model’s understanding of the embeddings, their context and their meaning.
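A single artificial “neuron” is simpler than it sounds: it multiplies each input by a weight, adds the results plus a bias, and passes the total through an activation function. The minimal Python sketch below stacks two such layers. The weights and the choice of the ReLU activation are assumptions for illustration; a real LLM has billions of learned weights, and its layers also contain the attention step described next.

```python
def relu(x: float) -> float:
    """A common activation function: pass positive values, block negative ones."""
    return max(0.0, x)


def dense_layer(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """Every neuron computes a weighted sum of all inputs plus its bias, then applies ReLU."""
    return [
        relu(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical weights: layer 1 has two neurons, layer 2 has one.
embedding = [1.0, 2.0]
hidden = dense_layer(embedding, weights=[[0.5, -0.25], [1.0, 1.0]], biases=[0.0, -1.0])
output = dense_layer(hidden, weights=[[1.0, 0.5]], biases=[0.5])
print(hidden, output)  # [0.0, 2.0] [1.5]
```

Each layer’s output becomes the next layer’s input, which is the “signals passed from neuron to neuron” idea in its simplest form.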

Step 5: Attention Means Focusing on What Matters

One of the key components of modern language models is the attention mechanism. It lets the model focus on the parts of the prompt that are most relevant to generating an answer. Think of it as the language model’s ability to pay attention to specific words or phrases, just like we do when engaging in a conversation.

The image below gives you an idea of how this works. Start at the back, where you see the prompt. It then passes through several layers, and step by step the model discovers more of the meaning and context of each embedding.

source: 3Blue1Brown video “But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning” [https://www.youtube.com/watch?v=wjZofJX0v4M&t=749s](https://www.youtube.com/watch?v=wjZofJX0v4M&t=749s)
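To make “paying attention” a bit more concrete, here is a minimal Python sketch of scaled dot-product self-attention, the core calculation inside transformer models. It is simplified on purpose: real models first turn each embedding into separate “query”, “key” and “value” vectors using learned weights, run many attention heads in parallel, and in GPT-style models prevent each word from looking at the words that come after it. Here the embeddings are used directly, and the example vectors are made up.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn arbitrary scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """For each vector, score every vector (itself included), turn the scores
    into attention weights, and return the weighted mix of all vectors."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # The dot product measures how relevant each vector is to this one.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up 2-dimensional embeddings.
tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
_, attention = self_attention(tokens)
for row in attention:
    print([round(w, 2) for w in row])
```

The first two vectors pay the most attention to themselves, while the third, which overlaps equally with everything, spreads its attention evenly across all three.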

Step 6: Probability Distribution — Weighing the Options

As the model processes the embeddings through its neural network, it produces a score, called a “Logit”, for each possible next word (strictly speaking, each possible next token). These scores are then converted into probabilities that indicate how likely each word is to be the best choice for the next word in the sentence. The model is weighing its options, considering many possibilities before making a decision.
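The conversion from logits to probabilities is usually done with a function called softmax. Here is a minimal Python sketch; the candidate words and their logit values are made up for illustration.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the word after "The sky is".
candidates = ["blue", "clear", "falling"]
logits = [3.0, 1.0, 0.0]

for word, p in zip(candidates, softmax(logits)):
    print(f"{word}: {p:.1%}")
# blue: 84.4%
# clear: 11.4%
# falling: 4.2%
```

The order of the scores is preserved, but the gaps are exaggerated: a logit only two points higher ends up with several times the probability.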

Step 7: The Model’s Decision-Making Process

The model has been trained in advance on an enormous amount of text, so it has learned how words relate to each other. Based on the processed embeddings, it calculates a probability for every candidate word and looks for the one most likely to come next. This happens through several layers, and the model keeps the previously generated words as part of the context. In a sense, the model is “thinking”: it weighs each candidate word and builds an answer to your prompt, word by word.

Below you can see that “Snape” has the highest probability of being the next word after “Professor”. Those who know Harry Potter know that Harry is Snape’s least favorite student, and as you can see, so does the model.

source: 3Blue1Brown video “But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning” [https://www.youtube.com/watch?v=wjZofJX0v4M&t=749s](https://www.youtube.com/watch?v=wjZofJX0v4M&t=749s)
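The word-by-word loop can be sketched in a few lines of Python. The probability table below is a hypothetical stand-in for a trained model, and unlike a real LLM it only looks at the last word instead of the whole context. The loop itself shows the key idea: pick the most probable next word (so-called greedy decoding), append it to the context, and repeat until the model signals the end.

```python
# Hypothetical next-word probabilities standing in for a trained model.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.8, "<end>": 0.2},
    "down": {"<end>": 0.9, "quickly": 0.1},
}


def toy_model(context: list[str]) -> dict[str, float]:
    """Return next-word probabilities. This toy only uses the last word;
    a real LLM conditions on the entire context."""
    return NEXT_WORD_PROBS.get(context[-1], {"<end>": 1.0})


def generate(prompt: list[str], max_new_words: int = 10) -> list[str]:
    """Greedy decoding: repeatedly append the single most probable next word."""
    context = list(prompt)
    for _ in range(max_new_words):
        probs = toy_model(context)
        next_word = max(probs, key=probs.get)
        if next_word == "<end>":
            break
        context.append(next_word)  # the new word becomes part of the context
    return context


print(" ".join(generate(["the"])))  # the cat sat down
```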

Step 8: Response Generation is when The Model Speaks

Now the model combines the words it selected, typically those with the highest probability, into a complete response to your prompt. This generated text aims to be relevant and meaningful, reflecting the language model’s understanding of the context and its ability to generate human-like language. And voilà! You get your answer.
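In practice, many chatbots do not always take the single most probable word; they often sample from the probabilities, which is one reason the same prompt can produce different answers. A common control for this is a “temperature” setting. Below is a minimal Python sketch of temperature sampling with made-up logits; the function name and values are illustrative, not any particular product’s API.

```python
import math
import random


def sample_next_word(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick a word at random, weighted by softmax(logits / temperature).
    Low temperature: almost always the top word. High temperature: more variety."""
    words = list(logits)
    scaled = [logits[w] / temperature for w in words]
    largest = max(scaled)
    weights = [math.exp(s - largest) for s in scaled]  # unnormalized probabilities
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
logits = {"blue": 3.0, "clear": 1.0, "falling": 0.0}  # hypothetical values
print([sample_next_word(logits, temperature=1.0, rng=rng) for _ in range(5)])
print([sample_next_word(logits, temperature=0.1, rng=rng) for _ in range(5)])
```

Dividing the logits by a small temperature stretches the gaps between them, so the top word wins almost every time; a larger temperature flattens the gaps and lets less likely words through more often.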

The Marvels of Language Models

A language model is a fascinating entity that bridges the gap between human language and machine understanding. It lets us interact with computers, and machines in general, in a more natural and intuitive way, opening up a world of possibilities for communication and problem-solving.

As we continue to advance artificial intelligence, it’s essential to approach this technology with an open mind. We must recognize its potential to enhance our lives while also understanding its limitations and the importance of human judgment and ethics.

So, the next time you encounter a language model, whether it’s in a chatbot, a virtual assistant, or a creative writing tool, remember the incredible journey your words take from prompt to answer.

Embrace the magic of these models, but always remember the wisdom and creativity that define us as humans.

If you want to dive deeper into this topic, I recommend the 3Blue1Brown series “But what is a neural network?”; the work they have done is amazing. If you are as amazed as I am, do support them at https://3b1b.co/support.

AI Glossary: Mastering the Language of Artificial Intelligence

Here is a glossary of relevant AI terms and their definitions:

1. Artificial Intelligence (AI): Computer systems that can perform tasks that normally require human intelligence.
2. Machine Learning: A subset of AI where systems automatically learn and improve from experience, without being explicitly programmed.
3. Deep Learning: An advanced form of machine learning that uses neural networks with multiple layers.
4. Natural Language Processing (NLP): The ability of AI to understand, interpret, and generate human language.
5. Algorithm: A set of instructions or rules followed by an AI system to perform a task or solve a problem.
6. Neural Networks: Computer systems loosely modeled after the human brain, used in deep learning.
7. Big Data: Large, complex datasets used to train and inform AI systems.
8. AI Chatbot or AI Chat Agent: An AI program designed to simulate human conversations.
9. Machine Vision: The ability of AI to interpret and analyze visual information.
10. Predictive Analytics: The use of AI to predict future trends and outcomes based on historical data.
11. Automation: The use of AI to perform tasks with minimal human intervention.
12. Data-Driven Decision Making: The use of AI analyses to inform and guide business decisions.
13. Personalization: The use of AI to tailor experiences or recommendations for individual users.
14. Sentiment Analysis: AI techniques to determine the emotional tone of text or speech.
15. Robotics: The integration of AI into physical machines to perform tasks.
16. Logit: A raw, unnormalized score that a model assigns to each possible outcome, such as each possible next token. Logits are converted into probabilities, typically with a function called softmax.
17. Tokens: Small pieces of text that AI systems use to understand and process language. They can be words, parts of words, or even punctuation marks.
18. Embedding: A representation of a word or phrase as a list of numbers (a vector) that computers can work with. Words with similar meanings get similar vectors.
19. Large Language Model (LLM): A type of AI that has been trained on vast amounts of text data. It can understand and generate human-like text.