General
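- Ill-posed: a problem is ill-posed if it fails at least one of Hadamard's conditions for well-posedness: a solution exists, the solution is unique, and the solution changes continuously with the input. In practice, an ill-posed problem may have no solution, several solutions, or a solution that is highly sensitive to small changes in the input.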
- Form factor: particularly in hardware design and user interface design, form factor refers to the size, shape, and physical dimensions of a device or component. It often describes how easily a device can fit into a particular environment or how it can be handled by users.
- For example, in computing, form factor often refers to the size and shape of components such as motherboards, hard drives, or graphics cards. A device might be described as having a "small form factor" if it is designed to be compact and space-efficient, while a "rack-mounted form factor" suggests that a device is designed to fit into standard server racks.
- The term is most common in discussions of hardware design, industrial design, and user experience.
- Fault tolerance: the property of a system that lets it continue operating correctly when some of its components fail. It is typically achieved through redundancy, error detection, and error recovery mechanisms.
- Ground truth: the reference data or labels accepted as correct for training or evaluation purposes. It serves as the standard against which the performance of algorithms or models is measured, although in practice it can itself contain labeling errors.
- Heuristics: heuristics are problem-solving techniques or rules of thumb that are used to quickly find approximate solutions when an exact solution is impractical or unknown.
- Heuristics are commonly used in algorithm design and optimization.
- No-Free-Lunch Theorem: The No-Free-Lunch (NFL) Theorem is a result in optimization theory and machine learning stating that, averaged over all possible problems, every optimization algorithm performs equally well. Consequently, no single algorithm outperforms all others across all problems.
- There is no universally best optimization algorithm; how well an algorithm performs depends on how well its assumptions match the specific characteristics of the problem it is applied to.
- Feature Engineering: In machine learning and data analysis, feature engineering is the process of selecting, transforming, or creating new features from raw data to improve the performance of machine learning models. Feature engineering involves identifying relevant information in the data and representing it in a way that is suitable for modeling.
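The sensitivity described in the Ill-posed entry can be sketched with a small linear system. This is only an illustration under stated assumptions: the matrix `A`, the right-hand sides, and the helper `solve_2x2` are hypothetical names and values chosen so that the two equations are almost identical, which makes the solution react strongly to a tiny change in the input.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b with Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - a21 * b[0]) / det
    return x1, x2


# Hypothetical nearly singular system: the two rows are almost identical.
A = [[1.0, 1.0], [1.0, 1.0001]]

original = solve_2x2(A, [2.0, 2.0001])   # close to (1, 1)
perturbed = solve_2x2(A, [2.0, 2.0002])  # close to (0, 2)

# Compare the size of the input change with the size of the output change.
input_change = 0.0001
output_change = max(abs(p - o) for p, o in zip(perturbed, original))
amplification = output_change / input_change

print(original, perturbed, amplification)
```

Changing one input by 0.0001 moves the solution by roughly 1, an amplification on the order of 10,000. A well-conditioned system would show an amplification close to 1.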
LLM-specific
- Chain of Thought (CoT): A prompting technique where a language model is prompted to generate intermediate reasoning steps, not just the final answer, leading to more interpretable and potentially more accurate results.
- Retrieval-Augmented Generation (RAG): A hybrid approach that first retrieves relevant documents or knowledge from an external source and then generates a response using a language model that conditions on the retrieved information.
- Zero-shot Learning: The ability of a model to understand and perform tasks that it hasn't been explicitly trained to do, based on previously learned knowledge and context.
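- Few-shot Learning: Adapting a model to a task from only a handful of examples. With large language models, the examples are usually placed directly in the prompt (in-context learning) and the model's weights are not updated; the model relies on knowledge from its training data to generalize from these few examples.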
- Prompt Engineering: The process of carefully crafting prompts to effectively elicit desired responses from a language model.
- Attention Mechanism: A neural network component, central to transformers, that lets the model weight different parts of the input according to their relevance to the element being processed, loosely analogous to the selective attention humans give to chunks of information.
- Transformer Architecture: A neural network design that uses self-attention instead of recurrence to relate the tokens of a sequence to one another, predominantly used in state-of-the-art language models.
- Tokenization: The process of converting input text into tokens that can be fed into a language model. Tokens are the atomic units the model operates on and may be words, sub-words, or characters.
- Self-Attention: The form of attention used in transformers, where every position in a sequence attends to every position of the same sequence. Each token's representation is computed as a weighted combination of all tokens, with the weights reflecting how relevant each token is to it.
- Fine-tuning: Adjusting the weights of an already pre-trained model on a new, typically smaller, dataset for better performance on a specific task.
- Language Model Pre-training: The process of training a language model on a large corpus of text without specific tasks in mind, allowing the model to learn a wide range of language patterns and knowledge.
- Autoregressive Models: A type of language model that generates text one token at a time, predicting each token conditioned on all the tokens before it.
- Masked Language Modeling (MLM): A training technique where a fraction of the input tokens is masked out and the model is trained to predict the masked tokens from the surrounding context. Used in the pre-training of models like BERT.
- Perplexity: A measure of how well a probability model predicts a sample, defined for language models as the exponential of the average negative log-likelihood per token. Lower perplexity means the model assigns higher probability to the observed text.
- Beam Search: A search algorithm that, at each step, keeps only a fixed number (the beam width) of the most promising partial sequences and expands those. Often used in natural language generation tasks to find high-probability sequences of tokens.
- Language Model Adaptation: The process of adapting a large, generic language model to a specific domain or task by additional training, often with data from that domain.
- Generation Strategies: Methods for choosing the next token when generating text from a language model, including greedy decoding, beam search, and nucleus sampling.
- Nucleus Sampling: Also called top-p sampling. A text generation strategy that samples the next token from the smallest set of most probable tokens whose cumulative probability reaches a threshold p. It tends to produce more diverse and human-like text than greedy decoding while excluding the tail of very unlikely tokens.
- Embedding Space: A vector space where words or phrases from the vocabulary are mapped to vectors of real numbers, capturing semantic similarity, syntactic usage, and other linguistic patterns.
- Transfer Learning: Reusing a pre-trained model and adapting it to a new but related task, typically by fine-tuning the model on a new dataset.
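The Perplexity and Nucleus Sampling entries can be made concrete with a short sketch. It assumes perplexity is computed as the exponential of the average negative log-probability of the observed tokens, and that nucleus sampling keeps the smallest set of most probable tokens whose cumulative probability reaches a threshold. The function names, the threshold name `top_p`, and the toy distribution `next_token_probs` are hypothetical and not taken from any particular library.

```python
import math
import random


def perplexity(token_probs):
    """exp of the average negative log-probability the model gave each observed token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


def nucleus_candidates(probs, top_p):
    """Keep the most probable tokens until their total probability reaches top_p,
    then renormalize the kept probabilities so they sum to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    return {token: p / total for token, p in kept}


def nucleus_sample(probs, top_p, rng):
    """Draw one token at random from the renormalized nucleus."""
    nucleus = nucleus_candidates(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution, for illustration only.
next_token_probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "xylophone": 0.05}

uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
nucleus = nucleus_candidates(next_token_probs, top_p=0.75)
sampled = nucleus_sample(next_token_probs, top_p=0.75, rng=random.Random(0))

print(uniform_ppl, nucleus, sampled)
```

A model that assigns probability 0.25 to every observed token has a perplexity of about 4, as if it were choosing uniformly among four options. With `top_p=0.75`, only "cat" and "dog" remain in the nucleus, so the unlikely tail can never be sampled.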
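The Attention Mechanism and Self-Attention entries describe weighting parts of the input by relevance. A minimal single-head sketch of scaled dot-product self-attention follows. As a simplifying assumption, it uses the input vectors directly as queries, keys, and values, whereas a real transformer applies learned projection matrices first. The function names and the toy input `tokens` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys, and values are all the raw input vectors
    (no learned projections), and there is a single attention head.
    """
    dim = len(x[0])
    outputs, weights = [], []
    for query in x:
        # Similarity of this token to every token in the same sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in x
        ]
        row = softmax(scores)
        weights.append(row)
        # New representation: weighted average of all token vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, x)) for i in range(dim)]
        )
    return outputs, weights


# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how much one token attends to every token in the sequence, itself included. All rows are computed independently of one another, which is why transformers can process positions in parallel rather than one after another.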