LLM Glossary:
Adversarial Examples: Inputs deliberately crafted to mislead an LLM, typically to expose weaknesses in the model.
Agents: In the LLM context, an agent is a system built around a language model that can interact with its environment and carry out tasks.
Attention Mechanism: A technique allowing LLMs to focus on specific parts of the input, helping them process long and complex sequences.
Autoregressive: A type of model that produces text one word or token at a time, conditioned on the previously generated output.
Backpropagation: The core algorithm for training neural networks. It propagates the error at the output backward through the network to compute gradients, which are then used to adjust the weights and biases.
Beam Search: A decoding strategy that keeps several of the most probable candidate sequences at each step, rather than only the single best one, aiming to find high-quality generations.
Bias: Unintended tendencies in an LLM that can lead to unfair or harmful outputs, often reflecting biases present in the training data.
BPE (Byte Pair Encoding): Originally a data compression technique, adapted for tokenizing text. It repeatedly merges the most frequent pair of adjacent symbols into a new subword unit, which is useful for handling rare words.
Calibration: Ensuring that an LLM's confidence scores align with the actual likelihood that its responses are correct.
Common Crawl: A massive dataset of web-scraped text, often used to train LLMs.
Context Window: The maximum amount of text, measured in tokens, that an LLM can consider at once when making predictions.
Cosine Similarity: A measure of similarity between two vectors, frequently used to compare word embeddings or text representations.
Cross-entropy Loss: A common loss function for training and evaluating language models, measuring the difference between the predicted token distribution and the actual one.
Data Augmentation: Techniques used to increase the size and diversity of a training dataset by creating variations of existing data points.
Dataset: A collection of data used to train or evaluate a machine learning model.
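Decoder: In a transformer model, the component that generates output text. In encoder-decoder models it conditions on the encoder's output; decoder-only models such as GPT use it without a separate encoder.
Embedding: A dense numerical vector representing a word or token, capturing its semantic meaning and its relationships to other words or tokens.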
Encoder: In a transformer model, the component that processes input text and creates a contextualized representation.
Evaluation: The process of assessing the performance of an LLM using different metrics.
F1 Score: A metric that combines precision and recall as their harmonic mean, often used to evaluate information retrieval or question answering tasks.
Factuality: An LLM's ability to produce text that is truthful and supported by evidence.
Fine-tuning: Adapting a pre-trained LLM to a specific task by training it on additional data.
Generation: The process of an LLM producing text output.
GPT (Generative Pre-trained Transformer): A series of powerful autoregressive language models developed by OpenAI, renowned for their impressive text generation abilities.
Hallucination: When an LLM generates text that is nonsensical, factually incorrect, or unsupported by its input, often while sounding plausible.
Inference: The process of using a trained LLM to generate text, translate languages, or answer questions.
KL Divergence (Kullback-Leibler): An asymmetric measure of how one probability distribution differs from a reference distribution, used in machine learning to compare a model's predicted distribution with a target.
Knowledge Graph: A structured representation of facts and relationships between entities, which can be used to augment LLM training.
Labeling: The process of annotating data with correct answers or categories for supervised learning.
LaMDA (Language Model for Dialogue Applications): A language model developed by Google, known for its conversational abilities.
Large Language Model (LLM): A neural network with a very large number of parameters, trained on massive amounts of text data and capable of generating text, translating, writing creative content, and answering questions informatively.
Masked Language Modeling (MLM): A training objective where the model learns to predict masked (hidden) tokens within a text sequence.
Meta-learning: A technique where a model 'learns to learn', enabling it to quickly adapt to new tasks.
Model Architecture: The structural design of a neural network, including the types and configurations of layers used.
Multi-Task Learning: Training a model on several related tasks at once, with the goal of improving performance on each.
Natural Language Generation (NLG): The process of generating coherent, natural language text based on some input.
Natural Language Processing (NLP): The field of study focused on the interaction between computers and humans through natural language.
Natural Language Understanding (NLU): The ability of a machine to understand and interpret human language as it is spoken or written.
Neural Network: A computational system inspired by the network of neurons in the brain, designed to recognize patterns and solve complex problems.
N-gram: A contiguous sequence of n items (e.g., words) from a given sample of text or speech, used in statistical language models.
Overfitting: When a model learns the training data too well, including its noise and outliers, resulting in poor performance on new data.
Parameter: A variable of the model that the training process optimizes.
Perplexity: A measure of how well a probability model predicts a sample, computed as the exponential of the average negative log-likelihood per token. Lower perplexity indicates a better language model.
Pre-training: The process of training a model on a large dataset before fine-tuning it on a specific task, to transfer knowledge and improve performance.
Prompt Engineering: The art of designing prompts to effectively interact with LLMs, optimizing the quality of their outputs.
Quantization: A technique that reduces the numerical precision of a model's parameters (for example, from 32-bit floating point to 8-bit integers) to decrease model size and speed up inference.
Recall: A metric that measures the fraction of all relevant instances in the dataset that a model correctly identifies.
Recurrent Neural Network (RNN): A type of neural network that processes a sequence one step at a time, carrying a hidden state forward so that earlier inputs can influence later outputs.
Reinforcement Learning: A type of machine learning where an agent learns to make decisions by performing actions and receiving feedback in the form of rewards or penalties.
Self-Attention: A mechanism in transformers that allows each position in a sequence to attend to all positions in the same sequence, as represented by the previous layer of the model.
Semantic Analysis: The process of understanding the meaning and interpretation of words, phrases, and sentences in their context.
Sequence-to-Sequence (Seq2Seq) Models: Models that take a sequence of items (e.g., words) as input and produce another sequence of items as output, commonly used in translation and summarization.
Supervised Learning: A machine learning approach where the model is trained on a labeled dataset, learning to predict the labels from the input data.
Tokenization: The process of converting text into tokens, which are small pieces of the text such as words, subwords, or characters.
Transfer Learning: A technique where a model developed for one task is reused as the starting point for a model on a second task.
Transformer: A type of model architecture that relies on self-attention mechanisms to process sequences of data, foundational to most current LLMs.
Unsupervised Learning: A type of machine learning that looks for previously undetected patterns in a dataset without pre-existing labels.
Vocabulary: The set of unique words, tokens, or characters that a model can recognize and generate.
Weight: A learnable parameter within a neural network that scales the signal passed between units in the model's layers.
Zero-Shot Learning: The ability of a model to correctly perform tasks it has not explicitly been trained on, using knowledge gained during training.
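
Several of the measures defined in this glossary (Cosine Similarity, Cross-entropy Loss, Perplexity, KL Divergence, F1 Score) reduce to short formulas. The following is a minimal sketch of each in plain Python, intended only as an illustration. The function names and toy inputs are hypothetical, and the code assumes non-zero vectors and probabilities strictly greater than zero where a logarithm or division is taken.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_entropy(true_token_probs):
    """Average negative log-probability (in nats) that a model assigned
    to the tokens that actually occurred."""
    return -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)


def perplexity(true_token_probs):
    """Exponential of the average negative log-likelihood per token."""
    return math.exp(cross_entropy(true_token_probs))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes.
    Terms with p_i == 0 contribute nothing; q_i is assumed > 0 elsewhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, from raw counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy embeddings: identical directions versus orthogonal directions.
    print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))
    # A model that gives every correct token probability 0.25 is as
    # uncertain as a uniform choice among four options.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    # KL divergence is asymmetric: swapping the arguments changes the value.
    print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
    print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```

Note that perplexity here is derived directly from cross-entropy, which is why the two are often reported together, and that the two KL divergence calls return different values because the measure is not symmetric.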