By Eric Mersch

This AI glossary contains 100+ clear, plain English definitions of AI concepts, from foundational terms like machine learning and neural networks to emerging ideas like Generative Engine Optimization and synthetic data. Where relevant, we connect definitions to their implications for governance, compliance, and ROI. 

Use this AI glossary to: 

  • Speak the language of AI fluently in executive discussions. 
  • Evaluate technology proposals with clarity. 
  • Guide strategic planning, vendor selection, and risk assessment.

Bookmark this as your AI reference guide. Make smarter strategic decisions, evaluate vendors with confidence, and engage in informed conversations with tech leaders.

A

Activation Function 

An activation function is a mathematical operation used in neural networks to introduce non-linearity, allowing the network to learn complex patterns. Without it, the model would only learn simple, linear relationships.  

Examples of non-linear relationships:

  • Economics: Diminishing returns – output increases at a decreasing rate. 
  • Physics: The distance an object falls grows with the square of time. 

Why it matters: Activation functions enable models to recognise complex shapes in images, detect sarcasm in language, or understand cause-and-effect patterns in data.
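
To make the idea concrete, here is a minimal Python sketch (standard library only, with illustrative numbers) of two common activation functions, ReLU and sigmoid, applied to a neuron's weighted sum:

```python
import math

def relu(x):
    # ReLU passes positive inputs through and zeroes out negative ones,
    # adding the non-linear "bend" that lets networks model complex patterns.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid squashes any input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

weighted_sum = -0.7  # hypothetical weighted sum of a neuron's inputs plus bias
print(relu(weighted_sum))     # 0.0
print(sigmoid(weighted_sum))  # ~0.33
```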

Algorithm

A step-by-step procedure for solving a problem or achieving a specific task. Algorithms are the foundation of all AI systems.

American Invitational Mathematics Examination (AIME) 

AIME is a high-school exam and qualifier for the USA Mathematical Olympiad. AI researchers use AIME to measure AI model performance, specifically mathematical reasoning ability. ChatGPT-5 is reported to have scored 94.6% on AIME 2025, ahead of other AI models such as Grok 4 (90.6%), Gemini 2.5 (85.8%), Claude Sonnet 4 (76.3%), and DeepSeek R1 (74.0%).

AI Agent

An AI agent is a system that perceives its environment, makes decisions, and takes actions to achieve goals, often autonomously. 

Four key capabilities: 

  1. Perception: Collecting information via sensors, APIs, databases, or user queries.
  2. Reasoning: Deciding what to do using rules, logic, or search algorithms. 
  3. Action: Executing tasks in the physical or digital world. 
  4. Learning: Improving performance over time (see Reinforcement Learning).

Artificial Intelligence Optimization (AIO)

(See Generative Engine Optimization) 

Artificial General Intelligence (AGI)

AGI is a theoretical form of AI that can understand, learn, and apply knowledge across any task at or above human-level intelligence. Unlike today’s specialized AI, AGI would be adaptable to any domain.

Key capabilities required for AGI:

  1. General learning: Mastering knowledge across diverse contexts.

  2. Transfer learning: Applying lessons from one domain to another.

  3. Autonomous reasoning: Making decisions and planning independently.

  4. (Debated) Self-awareness: A controversial fourth criterion, akin to science fiction concepts like Skynet in the Terminator movie franchise.

Timeline: Experts estimate AGI could emerge between 2040 and 2060. Until then, AI remains in the realm of Artificial Narrow Intelligence.

Artificial Narrow Intelligence (ANI) or Weak AI

A type of artificial intelligence that is designed and trained to perform a specific task or a narrow set of functions. ANI lacks the general reasoning skills of humans or AGI systems. 

Examples: 

  • Image recognition software
  • Fraud detection systems monitoring credit card transactions

Attention Mechanism

A neural network technique that allows the model to focus on different parts of its input when making predictions. 

Example: In language translation, attention helps the model link each word in the source sentence to its correct counterpart in the target language.
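
For readers who want to see the mechanics, here is a minimal NumPy sketch of scaled dot-product attention, the core operation behind the technique; the tiny random matrices are purely illustrative, not taken from a real model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each query is compared with every key; the scores are scaled,
    # turned into weights with softmax, and used to blend the values.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# Three tokens, each represented by a 4-dimensional vector (illustrative values).
Q = K = V = np.random.rand(3, 4)
output, weights = attention(Q, K, V)
print(weights.round(2))  # each row sums to 1: how much one token attends to the others
```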

B

Backpropagation

A method used to train artificial neural networks by adjusting weights to minimize the error between predicted and actual outputs.

Why it matters: Backpropagation enables the iterative learning that makes AI models more accurate over time.
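
As a rough illustration (not a production implementation), here is one backpropagation step for a single linear neuron with a squared-error loss, using made-up numbers:

```python
x, target = 2.0, 1.0   # one training example: input and desired output
w, b = 0.3, 0.0        # initial weight and bias
lr = 0.1               # learning rate

pred = w * x + b                    # forward pass: 0.6
loss = (pred - target) ** 2         # squared error: 0.16

# Backward pass: the chain rule gives the gradient of the loss for each parameter.
dloss_dpred = 2 * (pred - target)   # -0.8
dw = dloss_dpred * x                # -1.6
db = dloss_dpred                    # -0.8

# Adjust the parameters in the direction that reduces the error.
w -= lr * dw                        # 0.46
b -= lr * db                        # 0.08
print(w, b, loss)
```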

Bayesian Network

 A probabilistic graphical model that represents a set of variables and their conditional dependencies using a directed acyclic graph.

Example: In finance, a Bayesian network might model the probability of default based on interconnected risk factors like market volatility, credit history, and interest rates. 

Bias (Three Definitions)

Bias Definition #1 – AI Output Bias 

Errors in an AI system that favor certain outcomes due to flawed or incomplete training data or model design.

Common types:

  • Historical Bias: Past prejudices in training data (e.g., biased hiring data).

  • Sample Bias: Training data not representative of the real-world population.

  • Measurement Bias: Using poor proxies (e.g., healthcare costs as a measure of need).

  • Aggregation Bias: Combining dissimilar groups inappropriately.

  • Confirmation Bias: Favoring data that supports pre-existing beliefs.

  • Algorithmic Bias: Model design unintentionally favors certain results.

Bias Definition #2 – Technical Parameter:
A constant added to the weighted sum of inputs before applying an activation function. It shifts the output curve, helping the model fit real-world data more closely.

Bias Definition #3 – Content Summarization Bias 

AI models tend to selectively prioritize and reframe certain details over others when summarizing content in a generative search answer, which can distort emphasis, omit key facts, or reduce brand visibility. These biases can significantly affect how much value (traffic, brand credit, authority) you gain from a citation, so we aim to structure content in a way that minimizes their negative impact. Below are some common content summarization biases: 

  • Omission bias: The AI model may leave out brand names or unique qualifiers to make the summary shorter or more general.
  • Framing bias: The AI may highlight secondary points over the primary ones you wanted to emphasize as it tries to optimize for a user query. 
  • Attribution loss: The AI might paraphrase content without explicitly mentioning the source.
  • Neutralization: The AI model may replace distinctive or opinionated language in your content with neutral, generic statements that dilute your branding. 

Black Box Conundrum 

The challenge of understanding how complex AI systems, especially deep learning models, reach their decisions. Inputs and outputs are visible, but the reasoning in between is often opaque.

Relevance: In regulated industries like finance, the inability to explain AI decisions can create compliance and governance risks.

C

Chain-of-Thought (CoT) Prompting

 A prompt engineering method in artificial intelligence that improves the reasoning abilities of large language models (LLMs) by encouraging them to generate intermediate reasoning steps. 
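
Example: instead of simply asking "What is 15% of 240?", a CoT prompt adds an instruction such as "Show your reasoning step by step before giving the final answer," encouraging the model to work through the intermediate calculation (0.15 × 240 = 36) rather than jumping straight to a conclusion.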

Chatbot

A software program that simulates human conversation using AI, particularly natural language processing (NLP) and machine learning. It understands and responds to text or voice inputs in a manner that mimics human interaction.

Prominent examples include OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini. 

Closed Models

LLMs that restrict access to their weights and training parameters; they are usually accessible only via paid APIs. The software industry equivalent would be proprietary code. Closed models differ from open-weight models, which make the weights publicly available and allow for downloading and customization (see Open-Weight Models). 

Common Crawl

 An open-source project that has been collecting scraped web data for 12 years, accumulating petabytes of data. It serves as a significant source of data for AI training. The current volume of Common Crawl data exceeds 250 billion web pages, equivalent to between 100 trillion and 500 trillion Tokens. 

Convolutional Neural Network (CNN)

 A type of neural network particularly effective for image and video recognition tasks.

Curation, or Human-in-the-Loop (HITL) 

A process where humans review and improve data quality rather than relying solely on automated collection. This ensures accuracy, diversity, and relevance in AI training datasets.

D

Deep Learning

A subset of machine learning that uses multi-layered neural networks to identify patterns in large datasets.

Example: Fraud detection in banking by learning from millions of transaction patterns.

Deep Web

Internet content that is not indexed by search engines, such as academic databases and private archives.

Examples: BloombergGPT was trained on proprietary financial datasets, a type of deep web data. LexisNexis developed Lexis+ AI using its massive, proprietary legal databases, including case law, statutes, and regulatory documents.

Decision Tree

A predictive model that uses a tree-like graph of decisions and their possible consequences.

Distillation

A process where a smaller, more efficient AI model is trained to replicate the behavior of a larger, more complex one.

Benefits include lower inference costs and energy usage compared to the original LLM. In the distillation process, a high-performing and resource-intensive model (the teacher) is used to generate outputs based on inputs. The student model is then trained to mimic these teacher outputs by matching the full probability distribution, allowing the smaller model to learn both decision-making patterns and uncertainty. 

There is some evidence that the DeepSeek model used OpenAI's LLM as a teacher. Notes: first, distillation is an entirely different process from fine-tuning (see Fine-Tuning); second, the distillation process is also referred to as transfer learning (see Transfer Learning).
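
A minimal NumPy sketch of the idea, with hypothetical logits rather than outputs from any real teacher or student model: the student is trained to shrink the gap (measured here with KL divergence) between its probability distribution and the teacher's softened distribution.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits for one training example over four classes.
teacher_logits = np.array([4.0, 1.5, 0.5, -1.0])
student_logits = np.array([2.0, 1.0, 0.8, 0.2])

T = 2.0  # a temperature above 1 softens the teacher's distribution, exposing its uncertainty
teacher_probs = softmax(teacher_logits, T)
student_probs = softmax(student_logits, T)

# Distillation loss: KL divergence between teacher and student distributions.
# Training updates the student's weights to make this number shrink.
kl = np.sum(teacher_probs * np.log(teacher_probs / student_probs))
print(round(float(kl), 4))
```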

E

Elo Score

A numerical rating system used to compare performance. It was originally designed for chess by Hungarian-American physicist Arpad Elo and is now applied to AI model benchmarking to compare which models outperform others on average.
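
A small sketch of the standard Elo update, assuming the usual 400-point scale and a K-factor of 32; the 1,440 rating for the losing model is hypothetical, while 1,481 echoes the LMArena figure cited later in this glossary under LMArena.

```python
def expected_score(rating_a, rating_b):
    # Probability that A beats B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, a_won, k=32):
    # The winner gains points and the loser drops them; upsets move ratings more.
    ea = expected_score(rating_a, rating_b)
    score = 1.0 if a_won else 0.0
    return rating_a + k * (score - ea)

# Hypothetical head-to-head: a 1481-rated model beats a 1440-rated model.
print(round(update(1481, 1440, a_won=True), 1))  # rating rises slightly, to ~1495
```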

Ensemble Learning

 A technique that combines predictions from multiple machine learning models to improve accuracy and robustness.

Explainable AI (XAI) (Interpretable AI)

Techniques that make AI decisions understandable to humans, addressing the black box conundrum. Especially important in sectors like finance and healthcare, where regulatory compliance depends on transparency. Enterprise XAI toolkits in use include IBM's AI Explainability 360, Google's What‑If Tool, and Microsoft's InterpretML.

F

Feature Extraction

The process of transforming raw data into a set of meaningful features that machine learning algorithms can use.

Example: In credit risk analysis, extracting features like income, payment history, and debt-to-income ratio to train a loan approval model.

Few-Shot Prompting 

A prompt engineering technique that involves adding examples within the prompt to demonstrate the desired format or style, guiding the AI to produce similar outputs.​
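
Example: a few-shot prompt might include two labeled examples before the real question, for instance "Invoice 30 days overdue → Risk: High" and "Invoice paid on time → Risk: Low," and then ask the model to classify a new invoice in the same format.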

Fine-Tuning 

The process of taking a pre-trained model and continuing its training, typically using labeled, task-specific data, to specialize it for a new domain or function. The architecture and parameter count remain unchanged, but the model’s weights are updated so it learns new terminology, style, or domain-specific knowledge. Thus, the fine-tuned model retains the original model’s power but can surpass its general accuracy on the target task.

Example: Fine-tuning a general-purpose LLM with financial reports to make it better at summarizing quarterly earnings calls.

Floating-Point Operation (FLOP)

A single mathematical calculation, such as multiplying two numbers. 

Example: 3.14159 × 2.71828 ≈ 8.5397. This multiplication is one FLOP.

FLOPS per second (FLOPs/S) 

A compute capacity metric that measures the number of mathematical operations a processor can perform every second. This is sometimes expressed as GFLOPS, TFLOPS, PFLOPS, or EFLOPS, which correspond to billions, trillions, quadrillions, or quintillions of FLOPs per second, respectively. It’s important to note that FLOPs/S is merely the theoretical maximum performance and that real-world performance will be lower. 

FLOPs per Watt (FLOPs/W)

An energy efficiency metric that CFOs use to estimate compute expense. The most energy-efficient processors operate in the single-digit teraFLOPs-per-watt range. Nvidia’s H100 operates at 1.4 teraFLOPs per watt. 

FrontierMath 

A benchmark used to evaluate AI’s performance in advanced mathematics once models reach the performance ceiling of earlier tests like the Massive Multitask Language Understanding (MMLU).

FrontierMath, by Epoch AI, comprises approximately 300 original math problems covering the most significant branches of the subject. Half of the problems require a graduate-level education in math to solve, while the most challenging 25% consist of the most advanced problems within their specific topics, meaning only today’s top experts could crack them, and even then over multiple days.

Fuzzy Logic

A mathematical system that deals with reasoning that is approximate rather than precisely defined.

G

Generative AI (GenAI)

A form of artificial intelligence technology that creates new content by understanding the underlying patterns in the training data and then using this knowledge to generate new, original data points.

Example: Automatically generating financial commentary based on raw market data.

Generative Adversarial Network (GAN)

An AI process that utilizes two neural networks that compete against each other to improve their output over time. One generates data (the generator), and the other evaluates it (the discriminator), pushing each other toward better performance.

Generative Engine Optimization (GEO)

A methodology used to enhance the visibility of a company’s digital assets in AI-based web searches. With the increasing use of LLMs by consumers for web searches, GEO enables companies to ensure that their web-based information appears in search results. GEO is similar to Search Engine Optimization (SEO) for Google-based searches. Also referred to as Artificial Intelligence Optimization, or AIO.   

Example: Updating governance policies and investor materials so they appear prominently when a CFO or board member queries an AI system about your company.

FLG Partner Insights: Why CFOs Must Rethink ROI in the Age of AI Search

Gradient Descent

An optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient.
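
As a toy example, the sketch below minimizes f(x) = (x − 3)², whose gradient is 2(x − 3); repeated steps against the gradient walk x toward the minimum at 3.

```python
def gradient(x):
    # Derivative of f(x) = (x - 3)^2
    return 2 * (x - 3)

x = 0.0              # arbitrary starting point
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * gradient(x)   # step in the direction of steepest descent
print(round(x, 4))   # converges toward 3.0
```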

H

Hallucination

When an AI generates false or misleading information, but presents it as fact.

Examples: Fabricated references, incorrect facts, imaginary concepts, or an AI citing a non-existent financial report as evidence. 

Jules White, Senior Advisor to the Chancellor on Generative AI and Professor of Computer Science at Vanderbilt University, disagrees with the term hallucination because the LLM is not making up facts, but rather trying its best to answer questions with the data available. He offers a metaphor: a hallucinated response is like a college student who, to avoid leaving an answer blank, writes down information they know even though it is not relevant to the test question. 

Holistic Evaluation of Language Models (HELM)

A benchmark measuring AI performance across multiple dimensions, including accuracy, bias, and efficiency.

Heuristic

A practical problem-solving method that is “good enough” for immediate goals but not necessarily optimal.

Humanity’s Last Exam (HLE)

A challenging benchmark with 2,500 questions across 100+ subjects, designed to test the limits of AI reasoning.

 As of this writing in the summer of 2025, the best LLMs that achieve more than 80% on the MMLU only achieve accuracies of 2% to 20% on the HLE. 

Human-in-the-Loop (HITL) Curation

See Curation — involves human oversight in AI data training or decision-making.

I

Indexed Web Data 

The publicly available portion of the internet that search engines crawl and store for fast retrieval.

The total volume of data is estimated at 500 trillion tokens. This data set falls into the following categories: general webpages, blogs, and articles; books, reference works, and encyclopedias; academic papers (e.g., arXiv, PubMed); social media and forums (Twitter, Reddit, Quora, etc.); code repositories (GitHub, StackOverflow); and other structured or niche sources (news archives, legal documents, product data). By media type, approximately 40% of the data consists of text, 40% images, and 20% videos. Media type is important because text is the primary source of Tokens for AI training. 

Inference

The process of using a trained AI model to make predictions or decisions on new data.

Example: Predicting a company’s quarterly earnings based on recent performance indicators.

Instance-Based Learning

 A type of machine learning where the model stores instances of previously seen data and makes predictions based on similarity to these instances.

Internet Population

The total volume of human-generated text available online, both indexed and deep web. Measured in tokens.

The current internet population is estimated at 3,100 trillion tokens with an annual growth estimate of between 5% and 15%. This estimate is the sum of data from the indexed web and the deep web, although not all of this data meets the quality requirements for AI training. This estimate does not include several sources of data. The stocks of images, estimated at 300 trillion tokens, and video, at 1,350 trillion tokens, are excluded from peak data calculations because AI data training sets are primarily text-based. Images and videos are essential for training multimodal models, but are less critical to achieving AGI. The estimate does not include AI-generated data, which is expected to comprise an increasingly large portion of the internet population data but is currently a small component of internet data. 

Interpretable AI

See Explainable AI (XAI) – methods that make AI’s decision-making process understandable to humans.

K

Knowledge Cutoff

The latest point in time from which an AI model’s training data was drawn, which determines whether your content is known to the model. Estimates for the main GPT‑5 model’s knowledge cutoff center around October 1, 2024, while its smaller variants (like “mini” and “nano”) have an earlier cutoff, estimated at around May 31, 2024. This means that GPT-5’s understanding reflects the world as of mid-to-late 2024. Anything that occurred or was published after that, such as events, developments, or new content, is not part of its internal training. As of this writing, Anthropic’s Claude Sonnet 4, Claude Opus 4, and Claude Opus 4.1 models were trained on data up to March 2025, but the company states its reliable knowledge cutoff is the end of January 2025. 

K‑Shot Learning

A training approach where a model learns to perform tasks with only a small number (k) of examples per class.

 This contrasts with traditional settings that require thousands of examples per class. Related concepts include one-shot learning (a case where the model learns from a single example), few-shot learning, where the model learns from fewer than ten examples, and meta-learning, or “learning to learn,” where the model improves its own learning processes by learning from previous learning experiences.

L

Large Language Models (LLMs) 

An AI system trained on vast amounts of text to understand and generate human language. LLMs are also referred to as foundation models because they can be distilled into smaller models, known as Small Language Models (SLMs), which can be run with fewer compute resources.  

Examples: LLMs can automate due diligence, summarize financial reports, and draft governance documents.

Learning Rate

A hyperparameter in machine learning that controls how much to adjust the model in response to the estimated error each time the model weights are updated.

Linear Regression

A statistical method for modeling the linear relationship between a dependent variable and one or more independent variables.

Example: Predicting revenue growth based on historical sales data.
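
A small NumPy sketch of the idea, using made-up sales and revenue figures rather than real company data:

```python
import numpy as np

# Hypothetical history: prior-quarter sales (in $M) and next-quarter revenue (in $M).
sales   = np.array([10.0, 12.0, 15.0, 18.0, 21.0])
revenue = np.array([11.5, 13.8, 17.0, 20.1, 23.9])

# Ordinary least squares fit: revenue ≈ slope * sales + intercept.
slope, intercept = np.polyfit(sales, revenue, deg=1)

# Use the fitted line to predict revenue for a quarter with $25M in sales.
print(round(slope * 25.0 + intercept, 2))
```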

LMArena (formerly Chatbot Arena)

A platform that hosts anonymous, crowd-sourced, head-to-head AI model comparisons and calculates Elo scores from these results. As of August 2025, ChatGPT-5 is the leader with a 1481 Elo Score. You can keep track with the LMArena leaderboard. See Elo Score.

M

Machine Learning (ML)

A subset of Artificial Intelligence that focuses on developing algorithms and statistical models that enable systems to automatically improve their performance on a specific task by learning from data, without being explicitly programmed.

Example: Detecting fraudulent transactions by learning from historical fraud cases.

Massive Multitask Language Understanding (MMLU)

A benchmark created to assess the performance of large language models (LLMs). MMLU challenges AI systems with 15,908 multiple-choice questions spanning 57 diverse subjects, including mathematics, history, law, medicine, and more, and provides a grade representing accuracy. Human domain experts can achieve an accuracy rate of 89%. By mid-2024, the best LLMs had achieved an accuracy of 88%, and ChatGPT-o1 scored 92.3%.

Going forward, new benchmarks, such as FrontierMath, Humanity’s Last Exam (HLE), and SWE-bench, are emerging with more complex, real-world tasks that will be needed to assess LLM performance. 

Model Context Protocol (MCP)

An open standard introduced by Anthropic in November 2024. It defines how AI models (clients) can connect to external tools or data sources like APIs, databases, web scrapers, or even crawlers using a consistent, interoperable framework. Think of MCP like a USB-C port for AI: it standardizes how AI connects with everything else, whether it’s Google Drive, Slack, GitHub, a calendar API, or a web crawling service, without writing custom connectors for each case.

Mixture-of-Experts (MoE)

An AI architecture where multiple specialized models (experts) handle different parts of a task.

Engineers train each expert sub-model to specialize in a particular subset of the input data or a specific function, enabling the overall model to manage complex tasks more efficiently. MoE does not incorporate human curation; it’s purely a machine function.

Model Routing

The process of deciding which model (or set of models) should handle a given request, based on factors such as: 

  • Task type, e.g., routing text classification requests to a classifier and image tasks to a vision model.
  • Complexity, e.g., sending easy queries to a smaller, cheaper model and harder ones to a larger, more capable model.
  • Domain specialty, e.g., routing legal text to a law-specialized LLM and medical text to a healthcare-tuned model.
  • Performance constraints, e.g., picking the fastest model that still meets accuracy requirements for the request.

ChatGPT-5 is the most prominent example. It consists of several models that optimize for cost efficiency, response speed, and quality. 
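
A hypothetical sketch of a simple rule-based router; the model names and the 200-word complexity threshold are illustrative, not any provider's actual routing logic:

```python
def route(request: dict) -> str:
    # Pick a model based on task type, domain, and a crude complexity heuristic.
    if request["task"] == "image":
        return "vision-model"
    if request["domain"] == "legal":
        return "legal-tuned-llm"
    if len(request["prompt"].split()) > 200:   # long prompts go to the larger model
        return "large-llm"
    return "small-fast-llm"

print(route({"task": "text", "domain": "general", "prompt": "Summarize this memo."}))
# -> small-fast-llm
```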

Multimodal

AI that can process multiple types of input, such as text, images, and audio, in a single model.

This capability enables AI to understand and generate information across various formats, much like humans use their different senses to perceive and interact with the world.

N

Neural Network

A system of interconnected nodes (“neurons”) that processes data in layers, inspired by the human brain.

The number of input neurons in a neural network is typically based on the number of data attributes in a data set. 

Example: A dataset with 10 attributes (e.g., age, income, etc.) will have 10 input neurons. 

Neuron

A single computational unit in a neural network that processes inputs and passes outputs to the next layer.

First, it receives a set of inputs from the network. Second, it multiplies each input by a weight, a measure of the input’s importance (see Weights). Third, the neuron adds a bias (see Bias, Definition 2), a learned constant that helps the output more closely resemble real-world data. Finally, the neuron applies an activation function, which produces the final output of this four-step process. See Activation Function.
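
The four steps map directly to a few lines of Python; the numbers below are arbitrary illustrations, not learned values:

```python
import math

inputs  = [0.5, 1.2, -0.3]   # step 1: receive inputs from the previous layer
weights = [0.8, -0.4, 0.2]   # step 2: each input has a learned weight
bias    = 0.1                # step 3: a learned constant shifts the result

weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
output = 1.0 / (1.0 + math.exp(-weighted_sum))   # step 4: activation function (sigmoid)
print(round(output, 3))
```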

Natural Language Processing (NLP)

 A subfield of AI that deals with the interactions between computers and humans in natural language. NLP involves the development of algorithms, models, and systems that can understand, interpret, and generate human language in a way that is meaningful and useful.

O

Overfitting

 A modeling error in statistical learning where a function is too closely fit to a limited set of data points.

Open-Weight Models 

AI models that make their learned parameters (“weights”) available for download and customization.

Think of these as open-source software. These models can be downloaded for offline use and customized for proprietary applications. Open-Weight Models differ from Closed Models, which restrict access to weights and are usually accessible only via paid APIs. Open-source models differ from Open-Weight and Closed Models by releasing the model architecture, training code, and often dataset details, in addition to the weights. Open Weight Model examples include Meta’s LLaMA and Mistral 7B/Mistral 8x7B, widely used by developers and researchers; DeepSeek R1, a distillation model developed by Chinese researchers; and OpenAI’s Open Weight Model, announced in August of 2025, which is expected to enable self-hosting and offline adaptation. 

P

Parameter

A value that the model learns during training to improve performance.

There are two types of parameters: weights, which determine the influence one neuron has on another, and biases, constants that shift a neuron’s output (see Bias, Definition 2). Parameter count is a proxy for an AI model’s performance: the greater the number of parameters a model has, the more capacity it has to model complex patterns. The same dynamic applies to resource intensity, meaning that the higher the parameter count, the greater the need for compute and data. Large language models are typically characterized by their parameter counts. As examples, ChatGPT-3 has 175 billion parameters, while ChatGPT-4 is estimated to have over 1 trillion parameters. See Weights.

Peak Data

The point at which AI has been trained on all available high-quality human-generated data, thereby creating limits on future training improvements.

Currently, AI models have been trained on only a fraction of the total stock of high-quality, human-generated data. According to the publication What’s in GPT-5? A Comprehensive Analysis of Datasets Likely Used to Train GPT-5 by Alan D. Thompson of LifeArchitect.ai, ChatGPT-5 was trained on an estimated 70 trillion tokens, which is about 20% of all high-quality, human-generated data and a significant increase over the training data sets of recent models. 

However, forecasts suggest that the demand for training data will continue to grow exponentially and that we will run out of high-quality data by a median year of 2028, with estimates ranging from roughly 2 to 7 years out. 

Perceptron

 A simple algorithm for binary classification, considered the foundation of neural networks.

Prompt Context Window

The maximum amount of text (measured in Tokens) that an AI model can process at one time when generating a response. 

It includes everything sent to the model in a single interaction: the user’s input, any prior conversation history, system or developer instructions, plus the model’s internal formatting. Window size is measured in tokens. A small-window model such as GPT-3.5 has ~16k tokens, which equates to about 12,000 words. A large-window model, such as GPT-4o or Claude Opus 4, can handle up to 200k–300k tokens, which equates to about 150,000–225,000 words. So, if a user uploads a 100-page PDF into a model with a 32k-token limit, content at the end may be ignored when the model responds to the query. 
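
A rough sketch of the kind of back-of-the-envelope check this implies, using the common rule of thumb of roughly four characters per token (actual tokenizers vary):

```python
def estimate_tokens(text: str) -> int:
    # Crude estimate: about 4 characters per token for English text.
    return max(len(text) // 4, 1)

def fits(text: str, window_tokens: int, reserved_for_reply: int = 1000) -> bool:
    # Leave room in the window for instructions, history, and the model's answer.
    return estimate_tokens(text) + reserved_for_reply <= window_tokens

document = "..." * 20000   # stand-in for a long PDF's extracted text
print(estimate_tokens(document), fits(document, window_tokens=32_000))
```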

Prompt Engineering

Crafting inputs to guide AI toward accurate, relevant, and consistent responses.

See Chain-of-Thought (CoT), Few-Shot, Zero Shot, and Role Assignment prompting for examples. Some of the prompts I use help me extract financial data and SaaS metrics from PDFs and create an Excel file. I detail how in this video: 

R

Retrieval-Augmented Generation (RAG)

An AI technique that combines a language model with an information retrieval system to improve accuracy and relevance.

Process:

  1. Retrieval — Find relevant external information.

  2. Augmentation — Add this information to the user’s query.

  3. Generation — Produce a final response using both the model’s training and the retrieved data.

Example: A due diligence tool that pulls the latest SEC filings before generating a company risk profile.
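
A hypothetical sketch of the three steps; the retriever and llm objects are placeholders, not a specific library's API:

```python
def answer_with_rag(question, retriever, llm, top_k=3):
    # 1. Retrieval: find the most relevant external documents.
    documents = retriever.search(question, top_k=top_k)

    # 2. Augmentation: add the retrieved text to the user's query.
    context = "\n\n".join(doc.text for doc in documents)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # 3. Generation: the model answers from both its training and the retrieved data.
    return llm.generate(prompt)
```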

Reinforcement Learning (RL)

A learning method where an AI agent interacts with an environment, receiving rewards for good actions and penalties for bad ones.

Example: Training an autonomous trading bot to optimize returns while managing risk exposure.

Recurrent Neural Network (RNN)

A type of neural network designed to recognize patterns in sequential data, such as time series, speech, text, or video, by maintaining a memory of previous inputs through recurrent connections. This means the network can take into account not just the current input, but also information “learned” from earlier in the sequence. 

Example: Forecasting quarterly earnings based on past financial trends.

Responsible AI (RAI)

A framework for designing, building, and utilizing AI systems that align with human values and societal norms, ensuring fairness, transparency, accountability, privacy, reliability, and ethical compliance throughout the AI lifecycle.

Role Assignment

A prompt engineering technique in which the user instructs the AI to assume a specific role (e.g., “You are a Wall Street analyst”), allowing it to tailor responses to fit particular contexts or audiences. I use this prompt to have the LLM chatbot review earnings call transcripts for my comparable company set. 

S

Small Language Model (SLM)

A type of natural language processing (NLP) model with a relatively small number of parameters compared to larger, more complex models. Compared to LLMs, SLMs require fewer computational resources, generate responses more quickly, and are easier to deploy. SLM performance has increased rapidly over the past three years and now approaches that of LLMs, making SLMs a viable alternative. 

Spider 2.0 

An evaluation framework designed to assess the performance of language models on real-world, enterprise-level text‑to‑SQL tasks. While large language models have conquered knowledge work in mathematics, coding, and reasoning, text-to-SQL remains stubbornly difficult. Venture capitalist Tomasz Tunguz describes the challenges of coding SQL in his August 2025 article, Why AI Can’t Crack Your Database.

Supervised Fine-Tuning (SFT)

A training process in which a pre-trained model is further trained using supervised learning on a specific labeled dataset.

Synthetic Data 

Artificially generated data that mimics real-world datasets but contains no actual personal records. The use of synthetic data may increase as the industry approaches peak data (see Peak Data).

Example: Using synthetic transaction data to train fraud detection models without exposing customer information.

Surface Web (Indexed Web)

The portion of the internet that is publicly accessible and indexed by standard search engines, such as Google, Bing, and Yahoo. This means that any web page one can find through a typical search engine query, such as news sites, blogs, e-commerce platforms, and social media, is part of the Surface Web.

SWE-bench (Software Engineering Benchmark)

A performance benchmark, emerging as a successor to MMLU, that AI researchers use to evaluate an AI model’s ability to fix real bugs or implement missing functionality in real open-source software projects. SWE-bench is to coding what AIME is to math reasoning. The top-performing models as of August 17, 2025, are Claude Sonnet 4 (65%), Grok 4 (56.4%), and Gemini 2.5 (46.8%). ChatGPT-5 has not yet been tested. See AIME.

T

Token

A unit of text processed by an AI model, often representing a word, subword, or character.

Tokens are a good way to measure data processing because LLMs operate on sequences of tokens. Each token is converted into a numerical vector through an embedding process, allowing the model to learn patterns and relationships within the data. Additionally, measuring data in tokens provides a standardized metric to assess computational requirements and calculate costs. I think of a Token as the atomic unit of LLM activity. 

Transfer Learning

A machine learning method where a model trained on one task is reused as a starting point for a model on a second task. See Distillation.

Transformer

A neural network architecture that takes the context of a query into account to comprehend the entire text. In technical terms, the transformer architecture allows the model to weigh the importance of different words in an input sequence, regardless of their position. The “T” in ChatGPT stands for Transformer. 

U

Underfitting

A situation in machine learning where a model cannot capture the underlying trend of the data. As a result, the model performs poorly on both the training data and new data.

V

Validation

The process of evaluating a machine learning model on a separate dataset (validation set) to tune hyperparameters and prevent overfitting.

Vectorization

The process of converting data into numerical vectors that machine learning algorithms can process. Retrieval-Augmented Generation (RAG) requires input data to be vectorized.  

Vibe Coding

An AI-assisted software development style popularized by Andrej Karpathy in early 2025. It describes a collaborative human-chatbot-based approach to creating software, where the developer describes a project or task to a large language model (LLM) and then evaluates the result, asking the LLM for improvements. Andrej describes the approach as “fully giving in to the vibes, embracing exponentials, and forgetting that the code even exists.”

W

Weak Learning

A learning process where each model performs slightly better than random guessing; multiple weak learners can be combined into a stronger model.

Weights

Numerical values in a neural network that determine the importance of each input in generating the output.

Example: When a user writes a prompt, each input is multiplied by a weight that the LLM assigns based on prior training. In effect, the LLM estimates the importance of each input, which produces a more accurate response. Mathematically, as an overly simplistic example, the LLM response to a query will look like this: response = (weight1 × input1) + (weight2 × input2) + bias. Note: In this context, “bias” refers to a constant term added to the weighted sum of the inputs in a neural network calculation. See definitions for Bias (Definition 2), Activation Function, and Neuron.

Word Embedding

A technique for mapping words into vector space so that semantic relationships between words are preserved.

In this technique, words are represented in a dense vector space where the distance and direction between vectors reflect the similarity and relationships among the corresponding words. The idea is to train the model to learn relationships between words: for example, “king” and “queen” or “dog” and “cat” will have similar vectors.
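
A small NumPy sketch of how similarity is read off embeddings with cosine similarity; the three-dimensional vectors are made up for illustration (real models use hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Near 1.0 means the vectors point the same way; near 0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tiny, made-up embeddings.
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.88, 0.82, 0.15])
car   = np.array([0.10, 0.20, 0.95])

print(round(cosine_similarity(king, queen), 3))  # high: related words
print(round(cosine_similarity(king, car), 3))    # low: unrelated words
```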

Z

Zero-Shot Chain of Thought (Zero-Shot CoT)

A prompting method that tells the AI to reason step-by-step without seeing examples first, improving logical consistency.
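
Example: appending a simple instruction such as “Let’s think step by step” to a question, with no worked examples included, often nudges the model to lay out intermediate reasoning before answering.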

***

This glossary isn’t just a reference. Leverage it as a foundation for informed decision-making. Bookmark it! Use it to guide conversations with your teams, assess emerging technologies, and shape thoughtful, forward-looking strategies. 

As AI evolves, new terms and techniques will emerge. Keep this glossary updated and revisit it often to ensure your knowledge remains current. In a market where AI capabilities can change quarterly, fluency in the language of AI is a strategic advantage.

About the Author

Eric Mersch has 25 years of finance experience in the technology industry, including CFO roles at public companies and numerous venture capital and private equity portfolio companies. He has worked with over 40 different SaaS companies and compiled his experience into his book, Hacking SaaS—An Insider’s Guide to Managing Software Business Success. His goal in writing the book is to educate SaaS professionals, thus shortening the apprenticeship of those new to SaaS.

His book, Hacking SaaS – An Insider’s Guide to Managing Software Success, is available on Amazon.

 
