Decoding AI Jargons With Chai

Artificial Intelligence (AI) is the umbrella field in computer science that’s all about building machines capable of things we usually associate with human smarts: learning, reasoning, recognizing patterns, and understanding language. A big breakthrough in AI has been Large Language Models (LLMs), which are essentially massive neural networks trained on huge troves of text so they can predict and generate human-like sentences. GPT (which stands for “Generative Pre-trained Transformer”) is one of the best-known families of these LLMs. First it “pre-trains” on a mountain of text to pick up grammar, facts, and wordplay; then it’s fine-tuned for specific jobs like chatting, translating, or summarizing. In short, GPT models are cutting-edge examples of LLMs, and LLMs are some of the most powerful tools in AI today.

Transformers

Back in 2017, Ashish Vaswani and his co-authors blew everyone away with a new deep-learning design in their paper “Attention Is All You Need.” They called it the Transformer, and it quickly became a game-changer for anything involving language, from BERT to ChatGPT and other generative AI tools.

What makes the Transformer so clever is its “self-attention” trick: instead of reading words one after another, it looks at every word in a sentence at once and decides which ones matter most for understanding each of the others. So in a sentence like “The cat that the dog chased was black,” it easily figures out that “was black” is talking about “the cat,” not “the dog.” That kind of long-distance understanding is exactly what older models struggled with, and it is why Transformers have taken over the AI world.

Tokenization

Think of tokenization as the way a language model “cuts up” text into bite-sized pieces it can actually work with. Before you feed a sentence into a transformer-based model, you need to convert it into a stream of numbers, because the model can’t work with raw characters or whole words directly. Tokenization sits in the middle, doing two jobs:

Splitting text into tokens
- A token might be a whole word (“elephant”), a common word-part (“elep” + “hant”), or even a single character or punctuation mark.
- Modern systems usually use subword schemes (like Byte-Pair Encoding or WordPiece) so rare words get broken into familiar chunks, while common words stay intact.
Mapping tokens to IDs
- Each token from the vocabulary is assigned a unique integer.
- Your model’s first step is to look up those integers and turn them into vectors it can crunch.

A quick example

Take the sentence:

“Transformers tokenize text cleverly.”

A subword tokenizer might produce:

["Transform", "er", "s", "token", "ize", "text", "clever", "ly", "."]

And then map that to IDs:

[1287, 54, 9, 3421, 17, 76, 881, 62, 4]

Those numbers become the first inputs your generative model “sees” when predicting the next word or crafting a response.
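
The lookup step can be sketched in a few lines of Python. This is a toy illustration only: the vocabulary reuses the made-up IDs from the example above rather than anything from a real tokenizer, and `encode` and `decode` are hypothetical helper names.

```python
# Toy vocabulary: the IDs mirror the illustrative numbers used above and
# do not come from any real tokenizer.
tokens = ["Transform", "er", "s", "token", "ize", "text", "clever", "ly", "."]
ids = [1287, 54, 9, 3421, 17, 76, 881, 62, 4]

token_to_id = dict(zip(tokens, ids))
id_to_token = {i: t for t, i in token_to_id.items()}


def encode(pieces):
    """Look up the integer ID for each token string."""
    return [token_to_id[p] for p in pieces]


def decode(token_ids):
    """Map IDs back to their token strings."""
    return [id_to_token[i] for i in token_ids]


encoded = encode(["token", "ize", "s"])
print(encoded)          # [3421, 17, 9]
print(decode(encoded))  # ['token', 'ize', 's']
```

A real tokenizer does the splitting for you as well; here the pieces are handed in already split so the ID mapping stands out.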

In short, tokenization is the unsung hero that bridges human words and machine-readable data, making it possible for GenAI to learn language patterns and spin out fluent text.

Code example on how to tokenize and detokenize text with OpenAI’s tiktoken

import tiktoken

encoder = tiktoken.encoding_for_model("gpt-4o")

print("Vocab Size", encoder.n_vocab)  # 200,019 vocab size

text = "The cat sat on the mat"
tokens = encoder.encode(text)

print(tokens)  # [976, 9059, 10139, 402, 290, 2450]

my_tokens = [976, 9059, 10139, 402, 290, 2450]

# decoding the tokens back to text

decoded_text = encoder.decode(my_tokens)

print(decoded_text)  # The cat sat on the mat

Vocab Size

Vocab size is the size of the “dictionary” a language model carries around. It’s simply the total number of unique tokens (words, word-pieces, punctuation marks, etc.) that the tokenizer knows. A bigger vocab means the model can represent more words (even rare ones) in a single piece, but it also makes the dictionary heavier to carry. A smaller vocab forces the tokenizer to chop uncommon words into smaller chunks, so instead of “transformers” you might get “transform” + “ers”. That keeps the dictionary compact but can add a few extra tokens here and there.
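
To see that trade-off in action, here is a minimal sketch that splits a word by greedily taking the longest piece found in the vocabulary. It is a simplification made for illustration (real BPE and WordPiece tokenizers learn their vocabularies from data and use more involved rules), and `greedy_tokenize` plus both tiny vocabularies are made up for this example.

```python
def greedy_tokenize(word, vocab):
    """Split `word` by repeatedly taking the longest prefix found in `vocab`.

    Single characters are always allowed as a fallback, so this never fails.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocab or is one character.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


large_vocab = {"transformers", "transform", "ers"}
small_vocab = {"transform", "ers"}

print(greedy_tokenize("transformers", large_vocab))  # ['transformers']
print(greedy_tokenize("transformers", small_vocab))  # ['transform', 'ers']
```

The same word costs one token with the larger vocab and two with the smaller one.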

Encoder - Decoder

Encoder–Decoder is a two-step process. The encoder first takes whatever you give it (say a sentence, an image, or some data) and turns it into a neat, secret code (a list of numbers) that captures all the important flavors. Then the decoder comes in, reads that secret code, and whips up whatever you asked for: maybe a translation, a summary, or brand-new text. In generative AI, this setup lets the model understand any input in its own “language” and then generate the right output, all in one smooth flow.

Vectors

Imagine each word or piece of text as a point on a map, but instead of latitude and longitude, you’ve got hundreds or even thousands of numbered coordinates. Those coordinates make up a vector, which is just a fancy name for a list of numbers. In generative AI, every token (a word, part of a word, or punctuation) gets turned into one of these vectors so the model can do math on them. Nearby vectors mean similar meaning (“king” sits close to “queen”), and adding and subtracting them lets the AI figure out relationships (“king” - “man” + “woman” ≈ “queen”). All those vectors are what the model actually works with when it learns language patterns and spins out new text.
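
Here is a small sketch of that vector arithmetic. The three-dimensional vectors are hand-made for this example (roughly "royalty", "male", "female") so the analogy works out exactly; real embeddings are learned and far larger, and the match is only approximate.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hand-made 3-D vectors, roughly [royalty, male, female]. Purely illustrative.
vectors = {
    "king":  [2.0, 1.0, 0.0],
    "queen": [2.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}

# king - man + woman, computed coordinate by coordinate.
target = [
    k - m + w
    for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])
]

# Pick the word whose vector points in the direction closest to the target.
closest = max(vectors, key=lambda word: cosine_similarity(vectors[word], target))

print(target)   # [2.0, 0.0, 1.0]
print(closest)  # queen
```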

Semantic Meaning

Think of semantic meaning as the “what it really means” underneath the words. In generative AI, it’s not just about matching tokens or spelling out sentences; it’s about capturing the ideas, emotions, and context behind them. When you feed a sentence into a model, it turns each token into a vector and then refines that vector using the surrounding words, so “bank” in “river bank” and “bank” in “savings bank” end up in very different neighborhoods in vector space. That way, the AI can figure out what you actually mean and generate responses that make sense, not just string together words that look right.

Embeddings

Embeddings are the magic translator that turns words, sentences, or even images into lists of numbers so the AI can “feel” how they relate. In a good embedding space, similar things end up as neighbors: “coffee” and “tea” sit close, while “coffee” and “planet” hang out far apart. When you feed text through an embedding model, it packs the meaning, tone, topic, and context into those vectors. Down the line, the AI can compare those vectors to find the most relevant responses, pull in related documents, or cluster ideas together, all because it’s doing math on those embedding coordinates instead of wrestling with raw text.
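
A rough sketch of that comparison step: rank items by how similar their embeddings are to a query. The three vectors below are invented for illustration (a real embedding model would produce them), and `rank_by_similarity` is a hypothetical helper, not a library function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-D embeddings; a real model would compute these.
embeddings = {
    "coffee": [0.9, 0.8, 0.1],
    "tea":    [0.8, 0.9, 0.2],
    "planet": [0.1, 0.0, 0.9],
}


def rank_by_similarity(query, embeddings):
    """Return the other items sorted from most to least similar to `query`."""
    query_vec = embeddings[query]
    others = [name for name in embeddings if name != query]
    return sorted(
        others,
        key=lambda name: cosine_similarity(query_vec, embeddings[name]),
        reverse=True,
    )


print(rank_by_similarity("coffee", embeddings))  # ['tea', 'planet']
```

Document search over embeddings follows the same pattern, only with many more vectors and a faster index.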

Positional Encoding

Imagine you’ve handed your AI a bag of mixed-up Scrabble tiles: it knows each letter, but it has no clue what order they go in. Positional encoding is like tagging each tile with a little GPS coordinate before you toss them into the game, so the model can tell “H” came first, “E” second, and so on. In Transformer-based GenAI, we add these position tags (often with clever sine and cosine waves) to each word’s vector so the model sees not just the words themselves but also where they sit in the sentence. That way, it knows “I love chocolate” isn’t the same as “chocolate love I,” and it can weave together meaning in the right order when it writes or translates text.
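
The sine-and-cosine version can be sketched as below, following the formula from “Attention Is All You Need”: even dimensions use sine, odd dimensions use cosine, and each pair shares a frequency. The function names are made up for this example, and many newer models use other position schemes.

```python
import math


def positional_encoding(position, d_model):
    """Sinusoidal position tag for one position, as a list of d_model numbers."""
    encoding = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share one frequency.
        exponent = (2 * (i // 2)) / d_model
        angle = position / (10000 ** exponent)
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def add_position(token_vector, position):
    """Add the position tag to a token vector, element by element."""
    tag = positional_encoding(position, len(token_vector))
    return [x + p for x, p in zip(token_vector, tag)]


print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]

# The same token vector ends up different at positions 0 and 1.
same_word = [0.5, 0.5, 0.5, 0.5]
print(add_position(same_word, 0) != add_position(same_word, 1))  # True
```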

Self Attention

Self-attention works like your brain’s way of zeroing in on the most important parts of a sentence while you read it. Imagine you’re reading “When Sarah picked up the phone, she realized it wasn’t hers.” Your mind automatically links “she” back to “Sarah” and tunes out other words. Self-attention does the same thing in a Transformer: for each word, it looks at every other word in the sentence, scores how relevant they are, and then mixes all those words together, weighted by importance, to build a richer understanding of each one. That lets the model capture long-range connections (“phone” ↔ “hers”) and keep track of context no matter how far apart words appear, which is exactly why Transformers excel at generating coherent, on-point text.
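
The score-then-mix idea can be sketched in plain Python. This is a stripped-down version: it uses each token vector directly as query, key, and value, whereas a real Transformer first multiplies the inputs by learned weight matrices. The token vectors are invented for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score every token against the current one.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Mix all token vectors together, weighted by importance.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, vectors))
            for j in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up token vectors: the first two are similar, the third is not.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence; the first token puts more weight on its look-alike neighbor than on the third token.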

Multi head attention

Imagine you’re reading a story through several pairs of special glasses at once: one pair highlights emotional words, another zeroes in on names, a third tracks action verbs, and so on. That’s basically what multi-head attention does in a Transformer. Instead of computing one single set of importance scores between words, it runs several “attention heads” in parallel, each learning to spot different kinds of relationships. By combining all those perspectives, the model builds a richer, more nuanced picture of the text, so it can catch tone, syntax, and long-distance links all at once rather than trying to squeeze everything through one narrow lens.
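
As a rough sketch, multi-head attention can be imitated by giving each head its own slice of every token vector, running attention on each slice, and gluing the results back together. This is a simplification: real Transformers give each head its own learned projections and add a final output projection, all of which are skipped here. The function names and token vectors are made up for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors):
    """Single-head scaled dot-product self-attention (no learned weights)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]
        )
    return outputs


def multi_head_attention(vectors, num_heads):
    """Run attention separately on each slice of the vectors, then concatenate."""
    dim = len(vectors[0])
    if dim % num_heads != 0:
        raise ValueError("vector size must divide evenly across heads")
    head_dim = dim // num_heads
    head_outputs = []
    for h in range(num_heads):
        # Each head sees only its own slice of every token vector.
        head_slice = [v[h * head_dim:(h + 1) * head_dim] for v in vectors]
        head_outputs.append(attend(head_slice))
    # Concatenate the heads' outputs token by token.
    return [
        sum((head[t] for head in head_outputs), [])
        for t in range(len(vectors))
    ]


tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
result = multi_head_attention(tokens, num_heads=2)
print(len(result), len(result[0]))  # 3 4
```

Because each head scores tokens using different coordinates, the heads can disagree about which tokens matter, which is the point of having several.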

Softmax

Think of softmax as the stage manager in a generative AI system: it takes the model’s raw “enthusiasm” scores for every possible next word and turns them into a neat lineup of probabilities that add up to 100%. Words the model really favors get bigger slices of the pie, while less likely words get smaller ones, but nothing disappears entirely. When you ask the AI to continue a sentence, it rolls the dice against that probability list to pick the next token. In short, softmax takes rough scores and smooths them into an interpretable probability distribution that your AI can actually sample from.
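
Here is a minimal sketch of softmax followed by sampling. The candidate words and their raw scores are invented for the example; a real model scores every token in its vocabulary.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three candidate next words.
candidates = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]

# "Roll the dice": pick one word according to those probabilities.
next_word = random.choices(candidates, weights=probs, k=1)[0]
print(next_word)
```

Run it a few times and “mat” shows up most often, but “sofa” and “moon” still get picked now and then.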

Temperature

Temperature is the “creativity dial” on your AI generator: it tweaks how bold or cautious the model is when picking its next word. At a low temperature (say 0.2), the AI sticks close to its top favorite words, so output stays predictable and safe. At 1.0 the model’s probabilities are used as they are; crank it higher, and the AI spreads its bets, giving less-likely words a real shot so you get more surprising, colorful, or even off-the-wall responses. In short, temperature controls how “peaky” or “flat” the model’s probability curve is before it makes its choice, dialing in anything from buttoned-up accuracy to wild creativity.
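
In code, the usual way to apply temperature is to divide the raw scores by it before running softmax. The sketch below assumes that approach; the scores are made up, and hosted APIs may treat special values such as 0 in their own way.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate words

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.2 almost all of the probability lands on the top word, while at 2.0 the three options sit much closer together.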

Knowledge Cutoff

A model’s knowledge cutoff is the date it “froze” its training data: everything it learned about the world stops at that point. So if your AI’s cutoff is December 2023, it won’t know about events, discoveries, or pop-culture moments that happened after then. The model can still chat fluently, but it can’t give you the “latest” news or updates beyond its cutoff, and it might even guess wildly (or hallucinate) if you ask about more recent happenings.
