The only AI glossary you'll need this year

Artificial intelligence is reshaping the world and simultaneously crafting a new lexicon to explain its methods. In today’s product meetings, pitches, or panels, terms like LLMs, RAG, and RLHF are frequently mentioned, often leaving even tech-savvy individuals feeling a bit uncertain. This glossary aims to clarify these concepts, offering plain-English definitions of the AI terms you’re most likely to encounter, whether you’re involved in building, investing, or just trying to stay informed through sources like JS or related podcasts. We regularly update it as the field progresses, making it a living document akin to the AI systems it describes.

Artificial general intelligence, or AGI, is a vague concept, generally referring to AI with capabilities surpassing the average human in many tasks. OpenAI’s CEO, Sam Altman, described AGI as the “equivalent of a median human that you could hire as a co-worker.” Meanwhile, OpenAI’s charter sees AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s definition is slightly different, viewing AGI as “AI that’s at least as capable as humans at most cognitive tasks.” If you’re confused, you’re not alone—experts at the forefront of AI research are, too.

An AI agent is a tool that employs AI technologies to complete a series of tasks on your behalf, beyond the capabilities of a basic AI chatbot. These tasks can include filing expenses, booking a restaurant table, or even writing and maintaining code. However, the term “AI agent” can mean different things to different people due to the evolving nature of the field. The infrastructure to support these capabilities is still under development. Essentially, an AI agent implies an autonomous system that may utilize multiple AI systems to perform complex tasks.

Think of API endpoints as “buttons” on the back of software that other programs can press to make it perform actions. Developers use these interfaces to create integrations, such as enabling one application to pull data from another or allowing an AI agent to control third-party services without manual operation. Most smart home devices and connected platforms have these hidden buttons, even if users don’t directly interact with them. As AI agents become more advanced, they increasingly find and use these endpoints autonomously, opening up new possibilities for automation.

A human brain can answer simple questions without much thought, like “which animal is taller, a giraffe or a cat?” Other problems need intermediate steps. If a farmer has chickens and cows with 40 heads and 120 legs between them, working out how many of each there are (20 chickens and 20 cows) usually means writing down a simple equation.

In AI, chain-of-thought reasoning means a large language model breaks a problem into smaller intermediate steps to improve the quality of the final result. The process is slower, but the answers are more likely to be accurate, especially in logic or coding contexts. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking through reinforcement learning.
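The farmer puzzle can be worked through as explicit intermediate steps, which is the kind of breakdown chain-of-thought reasoning aims for. The sketch below is only an illustration in Python; the function name is hypothetical, and it assumes every chicken has two legs and every cow has four.

```python
def solve_heads_and_legs(heads: int, legs: int) -> tuple[int, int]:
    """Return (chickens, cows) for the given head and leg counts."""
    # Step 1: chickens + cows = heads
    # Step 2: 2 * chickens + 4 * cows = legs
    # Step 3: substitute chickens = heads - cows into step 2,
    #         which gives 2 * cows = legs - 2 * heads.
    doubled_cows = legs - 2 * heads
    if doubled_cows < 0 or doubled_cows % 2 != 0:
        raise ValueError("no whole-number solution")
    cows = doubled_cows // 2
    chickens = heads - cows
    if chickens < 0:
        raise ValueError("no whole-number solution")
    return chickens, cows


print(solve_heads_and_legs(40, 120))  # prints (20, 20)
```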

(See: Large language model)

A coding agent is a specialized AI agent applied to software development. Unlike a basic AI that suggests code for humans to review, a coding agent can autonomously write, test, and debug code, managing the iterative work developers typically handle. These agents can operate across entire codebases, identifying bugs, running tests, and applying fixes with minimal human oversight. Think of it as a tireless intern who requires some human review of their work.

Although somewhat ambiguous, the term “compute” typically refers to the computational power that enables AI models to function. This processing capability is essential for training and deploying powerful AI models. The term often serves as shorthand for the hardware providing this power, such as GPUs, CPUs, TPUs, and other infrastructure that forms the foundation of modern AI.

Deep learning is a subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural network (ANN) structure. This allows them to make more complex correlations compared to simpler machine learning models, such as linear models or decision trees. Deep learning algorithms draw inspiration from the interconnected pathways of neurons in the human brain.

Deep learning models can identify crucial characteristics in data independently, eliminating the need for human engineers to define these features. The structure supports algorithms that can learn from errors, improving outputs through repetition and adjustment. However, deep learning systems require large data sets (millions of data points) to produce good results and typically take longer to train, leading to higher development costs.

(See: Neural network)

Diffusion is the technology at the core of many AI models that generate art, music, and text. Inspired by physics, diffusion systems gradually “destroy” the structure of data (such as photos or songs) by adding noise until nothing remains. In physics, diffusion is spontaneous and irreversible, but diffusion systems in AI learn a “reverse diffusion” process that restores the destroyed data, giving them the ability to recover data from noise.
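The noising half of this process can be sketched in a few lines. The example below is a toy illustration, not any particular model's implementation: it assumes a variance-preserving schedule with a fixed hypothetical `beta`, and it omits the learned reverse process entirely, since that requires a trained neural network.

```python
import math
import random


def forward_diffusion(signal, steps, beta=0.1, seed=0):
    """Return the signal after each noising step (index 0 is the clean input)."""
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - beta)   # how much of the previous step survives
    noise = math.sqrt(beta)        # how much fresh Gaussian noise is mixed in
    history = [list(signal)]
    for _ in range(steps):
        previous = history[-1]
        history.append([keep * x + noise * rng.gauss(0.0, 1.0) for x in previous])
    return history


clean = [1.0, -1.0, 0.5, 0.0]
trajectory = forward_diffusion(clean, steps=50)
# After 50 steps only about 0.9 ** 25 (roughly 7%) of the original signal remains;
# the rest is noise.
print(trajectory[-1])
```

A diffusion model would be trained to undo one of these steps at a time, starting from pure noise.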

Distillation is a technique to extract knowledge from a large AI model using a ‘teacher-student’ framework. Developers send requests to a teacher model and record the outputs, sometimes comparing them with a dataset for accuracy. These outputs train the student model to approximate the teacher’s behavior.

Distillation can create a smaller, more efficient model based on a larger one with minimal distillation loss. This approach was likely used to develop GPT-4 Turbo, a faster version of GPT-4. While all AI companies use distillation internally, some may also use it to catch up with leading models. However, distilling from a competitor may violate the terms of service of AI APIs and chat assistants.
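One common way to measure how closely a student approximates a teacher is to compare their output distributions. The sketch below assumes both models expose raw scores (logits) over the same set of classes and uses a temperature-softened KL divergence; the function names and the temperature value are illustrative choices, not details taken from any specific lab's pipeline.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)


# A student that matches the teacher scores 0; a mismatched one scores higher.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))
```

Training the student would mean adjusting its parameters to push this number toward zero across many recorded teacher outputs.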

Fine-tuning involves further training an AI model to enhance its performance on a more specific task or area than its initial training focused on, typically by introducing new, specialized data. Many AI startups take large language models as the starting point for commercial products, aiming to increase utility for a target sector by fine-tuning on their own domain-specific knowledge and expertise.

(See: Large language model [LLM])

A GAN, or generative adversarial network, is a machine learning framework behind several important advances in generative AI’s ability to produce realistic data, including deepfake tools. GANs use a pair of neural networks: one, the generator, produces output based on its training data, and the other, the discriminator, evaluates that output.

These models are programmed to compete against each other. The generator tries to get its output past the discriminator, while the discriminator tries to identify artificially generated data. This contest optimizes AI outputs to be more realistic without human intervention. GANs work best for specific applications, like producing realistic photos or videos, rather than general-purpose AI.

Hallucination is the AI industry’s term for models generating incorrect information. It’s a significant issue for AI quality. Hallucinations produce GenAI outputs that might be misleading and could lead to real-life risks—such as a health query returning harmful medical advice.

The problem of AI fabricating information is thought to stem from gaps in training data. This has driven a shift towards more specialized or vertical AI models—domain-specific AIs with narrower expertise—to reduce the risk of knowledge gaps and misinformation.

Inference is the process of running an AI model, allowing it to make predictions or draw conclusions from previously seen data. However, inference can’t occur without training; a model must learn patterns from a data set before effectively extrapolating from it.

Various hardware can perform inference, from smartphone processors to powerful GPUs and custom AI accelerators. However, not all can run models equally well. Very large models would take considerable time to make predictions on a laptop compared to a cloud server with high-end AI chips.

(See: Training)

Large language models, or LLMs, are the AI models powering popular assistants like ChatGPT, Claude, Google’s Gemini, Meta’s Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, a large language model processes your request, either directly or with the help of tools such as web browsing or code interpreters.

LLMs are deep neural networks consisting of billions of numerical parameters (or weights, see below) that learn the relationships between words and phrases, creating a language representation—a multidimensional map of words.

These models encode patterns found in billions of books, articles, and transcripts. When you prompt an LLM, it generates the most likely continuation of the prompt, predicting one token at a time based on everything that came before.
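The idea of generating the most likely continuation can be shown with a deliberately tiny stand-in. The sketch below counts which word follows which in a short text and always picks the most frequent follower. Real LLMs use neural networks over tokens rather than word counts, so this is only an analogy, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def generate(model, start, length):
    """Repeatedly append the most frequent follower of the last word."""
    words = [start]
    for _ in range(length):
        options = model.get(words[-1])
        if not options:
            break  # the last word was never followed by anything in training
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


corpus = "the cat sat on the mat . the cat sat on the rug"
model = train_bigram_model(corpus)
print(generate(model, "the", 4))  # prints: the cat sat on the
```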

(See: Neural network)

Memory caching is an optimization technique that makes inference (the process by which AI generates a response to a user query) more efficient. AI relies on complex mathematical calculations that consume power each time they’re executed. Caching reduces the number of calculations a model needs to perform by storing the results of certain calculations for reuse in future queries and operations. Among the various kinds of memory caching, KV (or key-value) caching is one of the best known. It operates in transformer-based models, boosting efficiency and driving faster results by decreasing the time (and computational effort) needed to generate user responses.
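A rough sketch of why KV caching saves work: when generating token by token, the keys and values for earlier tokens do not change, so they can be stored instead of recomputed. The code below is a simplified single-head illustration. `project_key` and `project_value` are hypothetical stand-ins for learned projections, and each token vector doubles as its own query, which a real transformer would not do.

```python
import math


def project_key(vec):
    # Hypothetical stand-in for a learned key projection.
    return [2.0 * x for x in vec]


def project_value(vec):
    # Hypothetical stand-in for a learned value projection.
    return [x + 1.0 for x in vec]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [
        sum((e / total) * value[i] for e, value in zip(exps, values))
        for i in range(len(values[0]))
    ]


def generate_without_cache(tokens):
    """Re-project every earlier token at every step."""
    outputs, projections = [], 0
    for step in range(1, len(tokens) + 1):
        prefix = tokens[:step]
        keys = [project_key(v) for v in prefix]
        values = [project_value(v) for v in prefix]
        projections += len(prefix)
        outputs.append(attend(prefix[-1], keys, values))
    return outputs, projections


def generate_with_cache(tokens):
    """Project each token once and keep its key and value for later steps."""
    outputs, projections = [], 0
    keys, values = [], []
    for vec in tokens:
        keys.append(project_key(vec))
        values.append(project_value(vec))
        projections += 1
        outputs.append(attend(vec, keys, values))
    return outputs, projections


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow, slow_count = generate_without_cache(tokens)
fast, fast_count = generate_with_cache(tokens)
print(slow == fast, slow_count, fast_count)  # prints: True 10 4
```

Both versions produce the same outputs, but the cached one projects each token once instead of once per later step, a gap that widens as the sequence grows.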

(See: Inference)

Model Context Protocol, or MCP, is an open standard allowing AI models to connect to external tools and data—like your files, databases, or apps like Slack and Google Drive—without requiring a developer to create a custom connector for each pairing. It’s similar to a USB-C port for AI. Anthropic introduced MCP in 2024 before passing it to the Linux Foundation. It has since been adopted by OpenAI, Google, and Microsoft, becoming one of the fastest-spreading standards in recent AI history.

Mixture of Experts is a model architecture that divides a neural network into several smaller specialized sub-networks, or “experts,” and activates only a few for a specific task. Instead of routing every request through the entire model—like calling in your whole office for every question—an MoE model has a “router” that selects the right specialists for the task. This enables the construction of large models that remain relatively fast and cost-effective since only a fraction of the network is active at any one time. Mistral AI’s Mixtral model is a well-known example; OpenAI’s newer GPT models are also believed to employ this approach, although the company hasn’t officially confirmed it.
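The routing idea can be sketched with plain functions standing in for expert sub-networks. In the example below, the router scores, the four toy experts, and the choice of `top_k=2` are all made up for illustration; production MoE layers route per token inside a neural network and learn the router during training.

```python
import math


def pick_experts(scores, top_k):
    """Return the indices of the top_k highest router scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


def mixture_of_experts(x, experts, router, top_k=2):
    """Run only the experts the router selects and blend their outputs."""
    scores = router(x)
    chosen = pick_experts(scores, top_k)
    # Softmax over the chosen scores only, so the blend weights sum to 1.
    peak = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    output = sum((e / total) * experts[i](x) for e, i in zip(exps, chosen))
    return output, chosen


# Four toy "experts"; only two of them run for any given input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
router = lambda x: [0.1, 2.0, 0.3, 1.0]  # hypothetical fixed router scores
output, chosen = mixture_of_experts(3.0, experts, router)
print(chosen)  # prints [1, 3]
```

The unselected experts are never called, which is where the speed and cost savings come from.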

(See: Neural network, Deep learning)

A neural network refers to the multi-layered algorithmic structure underpinning deep learning—and more broadly, the surge in generative AI tools following the development of large language models.

While the idea of modeling data processing algorithms on the brain’s interconnected pathways dates back to the 1940s, it was the rise of graphics processing units (GPUs), driven by the video game industry, that unlocked this theory’s potential. These chips are well-suited for training algorithms with many layers, allowing neural network-based AI systems to achieve superior performance in various domains, including voice recognition, autonomous navigation, and drug discovery.

(See: Large language model [LLM])

Open source refers to software—or increasingly, AI models—where the underlying code is publicly available for anyone to use, inspect, or modify. In the AI world, Meta’s Llama family of models is a notable example; Linux is the historical parallel in operating systems. Open source approaches enable researchers, developers, and companies worldwide to build on each other’s work, accelerating progress and enabling independent safety audits that closed systems can’t easily provide. Closed source means the code is private—you can use the product but can’t see how it works, as with OpenAI’s GPT models—a distinction that has become a major debate in the AI industry.

Parallelization involves performing multiple tasks simultaneously rather than sequentially, like having ten employees work on different parts of a project at once instead of one employee doing everything in sequence. In AI, parallelization is crucial for both training and inference: modern GPUs are designed to perform thousands of calculations concurrently, which is why they became the industry’s hardware backbone. As AI systems grow more complex and models increase in size, the ability to parallelize work across many chips and machines has become vital in determining how quickly and cost-effectively models can be built and deployed. Research into improved parallelization strategies is now a distinct field of study.
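The split-the-work pattern looks roughly like this in code. The sketch uses Python's standard `concurrent.futures` thread pool only to show the shape of the idea (divide, process chunks at once, combine); it is not how GPUs are programmed, and in standard CPython, CPU-heavy work like this would typically need processes or native code to see a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """The unit of work each worker performs on its own slice of the data."""
    return sum(x * x for x in chunk)


def split(items, parts):
    """Divide items into at most `parts` contiguous chunks."""
    size = -(-len(items) // parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_sum_of_squares(items, workers=4):
    if not items:
        return 0
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is handled by a separate worker; results are then combined.
        return sum(pool.map(sum_of_squares, chunks))


numbers = list(range(1000))
print(parallel_sum_of_squares(numbers) == sum_of_squares(numbers))  # prints True
```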

RAMageddon humorously refers to a serious trend sweeping the tech industry: an increasing shortage of random access memory, or RAM chips, which power almost all the tech products we use daily. As the AI industry has flourished, major tech companies and AI labs—competing to have the most powerful and efficient AI—are purchasing so much RAM for their data centers that little is left for others. This supply bottleneck means what’s left is becoming more expensive, impacting industries such as gaming (where companies have had to raise console prices due to memory chip shortages), consumer electronics (where memory shortages could cause the largest dip in smartphone shipments in over a decade), and general enterprise computing (as companies can’t secure enough RAM for their data centers). The price surge is expected to cease only after the shortage ends, but there’s no clear sign of that happening soon.

Like AGI, recursive self-improvement is a threshold for AI’s potential intelligence and independence from humans. In the RSI scenario, AI models begin improving themselves without human intervention, rapidly increasing their capabilities and autonomy. Some view this as a cataclysmic moment akin to the singularity—when AI models become immune to outside intervention. However, RSI also describes a basic capability—can an AI model design its own successor?—which makes it easier for engineers to attempt its construction. Several AI startups have pursued recursively self-improving models, but most dismiss the apocalyptic implications, presenting RSI as the next research frontier.

Reinforcement learning is a method of training AI where a system learns through trial and error, receiving rewards for correct answers—similar to training a pet with treats, but the “pet” here is a neural network, and the “treat” is a mathematical signal indicating success. Unlike supervised learning, where a model is trained on a fixed dataset of labeled examples, reinforcement learning allows a model to explore its environment, take actions, and continuously update its behavior based on feedback. This approach has proven particularly effective for training AI to play games, control robots, and more recently, enhance the reasoning ability of large language models. Techniques like reinforcement learning from human feedback, or RLHF, are now central to how leading AI labs fine-tune their models to be more helpful, accurate, and safe.
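Trial-and-error learning can be sketched with a classic toy problem: choosing among several options whose rewards are unknown at the start. The example below uses an epsilon-greedy strategy, one simple textbook approach, with made-up fixed rewards. It is far simpler than the methods used to train language models and is only meant to show the explore, act, update loop.

```python
import random


def run_bandit(rewards, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best by trying actions and tracking averages."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # current belief about each action's reward
    pulls = [0] * len(rewards)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: otherwise take the action that currently looks best.
            action = max(range(len(estimates)), key=lambda i: estimates[i])
        reward = rewards[action]      # the "treat": a numeric feedback signal
        pulls[action] += 1
        # Update the running average for the chosen action.
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates, pulls


estimates, pulls = run_bandit([0.2, 0.5, 0.9])
print(pulls)  # the third action, with the highest reward, ends up chosen most
```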

Human-machine communication poses clear challenges: humans communicate using natural language, while AI programs execute tasks through complex data-driven algorithmic processes. Tokens bridge this gap. They are the fundamental building blocks of human-AI communication, representing discrete segments of data processed or produced by an LLM. Tokens are created through a process called tokenization, which breaks raw text into units a language model can digest, much as a compiler translates human-readable source code into binary code a computer can execute. In enterprise settings, tokens also determine cost: most AI companies charge for LLM usage on a per-token basis, so the more a business uses, the more it pays.
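To make tokenization and per-token pricing concrete, here is a toy sketch. It uses a greedy longest-match lookup against a tiny hand-written vocabulary, which is a simplification of how production tokenizers work, and the price passed to `usage_cost` is a made-up figure rather than any vendor's actual rate.

```python
def tokenize(text, vocabulary):
    """Split text into the longest known pieces; unknown characters stand alone."""
    longest = max((len(piece) for piece in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def usage_cost(token_count, price_per_million_tokens):
    """Cost of processing token_count tokens at a given per-million price."""
    return token_count * price_per_million_tokens / 1_000_000


vocabulary = {"token", "ization", "un", "believ", "able"}
print(tokenize("tokenization", vocabulary))   # prints ['token', 'ization']
print(tokenize("unbelievable", vocabulary))   # prints ['un', 'believ', 'able']
print(usage_cost(2_000_000, 3.0))             # prints 6.0
```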

Tokens are small chunks of text, often parts of words rather than entire words, that AI language models break language into before processing it. They are roughly analogous to “words” as a unit for measuring AI workloads. Throughput refers to the volume processed within a given timeframe, so token throughput measures how many tokens a system can process in a given period. High token throughput is a crucial objective for AI infrastructure teams, as it dictates how many users a model can serve simultaneously and how quickly each receives a response. AI researcher Andrej Karpathy has described feeling anxious when his AI subscriptions sit idle, echoing how he felt in grad school when expensive computer hardware wasn’t fully utilized, which illustrates why maximizing token throughput has become a field-wide obsession.

Developing machine learning AI involves a process known as training, where data is fed into a model to learn patterns and generate useful outputs. It’s the system’s response to data characteristics that allows it to adapt outputs toward a desired goal—whether identifying cat images or producing a haiku on demand.

Training can be expensive because it requires large volumes of input data, and those volumes have been increasing. Hybrid approaches, such as fine-tuning a rules-based AI with targeted data, help manage costs without starting from scratch.

(See: Inference)

Transfer learning involves using a previously trained AI model as the starting point for developing a new model for a different but related task—allowing knowledge from earlier training cycles to be reapplied.

Transfer learning can drive efficiency savings by streamlining model development. It’s also useful when data for the target task is limited. However, the approach has limitations. Models relying on transfer learning to gain generalized capabilities will likely need additional training on domain data to perform well in their focus area.

(See: Fine-tuning)

Validation loss is a metric indicating how well an AI model is learning during training, measured on held-out data the model does not train on; a lower score is better. Researchers closely monitor it as a real-time report card, using it to decide when to stop training, adjust hyperparameters, or investigate potential issues. A key concern it flags is overfitting, a condition where a model memorizes its training data rather than learning patterns that generalize to new situations. Think of it as the difference between a student who genuinely understands the material and one who merely memorized last year’s exam: validation loss helps reveal which type your model is becoming.
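One common use of validation loss is early stopping: halting training once the score stops improving. The sketch below assumes a list of per-epoch validation losses has already been recorded. The `patience` parameter and the loss values are illustrative, not taken from a real training run.

```python
def best_stopping_epoch(validation_losses, patience=2):
    """Return the epoch with the lowest loss seen before patience ran out."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            # Still improving: remember this epoch and reset the counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            # No improvement: a rising loss here is a typical overfitting signal.
            waited += 1
            if waited >= patience:
                break
    return best_epoch


losses = [0.90, 0.70, 0.60, 0.62, 0.66, 0.50]
print(best_stopping_epoch(losses, patience=2))  # prints 2
print(best_stopping_epoch(losses, patience=3))  # prints 5
```

The two calls show the trade-off: a short patience stops early and saves compute, while a longer one tolerates temporary bumps and may find a better model later.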

Weights are crucial in AI training because they determine the importance (or weight) assigned to different features (or input variables) in the data used for training, shaping the AI model’s output.

In essence, weights are numerical parameters defining what’s most significant in a dataset for a given training task. They function by multiplying inputs. Model training generally starts with randomly assigned weights, but these weights adjust as the model strives to achieve an output that closely matches the target.

For instance, an AI model predicting housing prices trained on historical real estate data for a target location might include weights for features like the number of bedrooms and bathrooms, whether a property is detached or semi-detached, or whether it has parking or a garage.

Ultimately, the weights the model assigns to each input reflect their influence on a property’s value based on the dataset.
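The housing example can be sketched as a small linear model. Everything numeric below is invented for illustration (prices are in arbitrary units), the starting weights are fixed rather than random to keep the output predictable, and the update rule is a single step of plain gradient descent on squared error, which is one standard way weights get adjusted.

```python
def predict_price(features, weights, bias):
    """Multiply each input by its weight and add everything up."""
    return bias + sum(weights[name] * value for name, value in features.items())


def train_step(features, target, weights, bias, learning_rate=0.01):
    """Nudge each weight in the direction that shrinks the prediction error."""
    error = predict_price(features, weights, bias) - target
    new_weights = {
        name: weights[name] - learning_rate * error * value
        for name, value in features.items()
    }
    new_bias = bias - learning_rate * error
    return new_weights, new_bias


# Hypothetical property: 3 bedrooms, 2 bathrooms, with parking.
house = {"bedrooms": 3, "bathrooms": 2, "has_parking": 1}
weights = {"bedrooms": 50.0, "bathrooms": 20.0, "has_parking": 10.0}
bias = 100.0

print(predict_price(house, weights, bias))  # prints 300.0

# The property actually sold for 320, so one training step adjusts the weights.
weights, bias = train_step(house, 320.0, weights, bias)
print(predict_price(house, weights, bias))  # closer to 320 than before
```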

This article is updated regularly with new information.

