Lexicon

Actuator

A component that moves or controls a mechanism in response to input from a controller.

AdaptiveAI™

An AI system that creates a Configuration Space within inference engines so that inference and training outputs can be adjusted in real time to meet performance and energy goals. Functionally, AdaptiveAI first makes inference engines configurable and then makes them computationally Self-Aware, treating goals as first-class objects under software control.

AI Inference Optimization

The process of improving the speed, accuracy, or resource efficiency of machine learning inference tasks.

Algorithmic Complexity

A measure of the computational resources required for an algorithm to solve a problem.

Approximate Computing

A computing technique that trades accuracy for reduced resource consumption, improving energy efficiency and performance.

Artificial Intelligence (AI)

The simulation of human intelligence in machines, allowing them to perform tasks such as learning, reasoning, and problem-solving.

Asynchronous Processing

Execution of tasks independently and without waiting for others, improving performance in distributed systems.

Backpropagation

An algorithm used to train neural networks: it propagates errors backward through the network to compute the gradient of the loss with respect to each weight, which is then used to adjust the weights.

Bandwidth

The amount of data that can be transferred over a network or system connection in a given time.

Bayesian Model

A statistical model based on Bayes’ theorem, updating probability estimates as new evidence becomes available.

big.LITTLE Architecture

A processor architecture combining high-performance and energy-efficient cores to balance power and performance.

Cache Coherency

A property of multiprocessor systems that keeps copies of shared data held in separate CPU caches consistent, so that every processor sees up-to-date values.

Cache Miss

Occurs when the data requested by the processor is not found in the cache, causing delays.

CALOREE

A Seqitur™ system optimizing resource usage in heterogeneous mobile architectures by dynamically adjusting power and performance.

Cloud Computing

The delivery of computing services (e.g., storage, databases) over the internet, allowing on-demand resource access.

Concurrency

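The execution of multiple tasks or processes over overlapping time periods, whether interleaved on one processor or run simultaneously on several, improving system responsiveness and utilization.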

Concurrency Control

Ensures multiple tasks or transactions can occur simultaneously without causing conflicts or errors.

Congestion Control

Managing network traffic to prevent overloading and ensure efficient data transmission.

Configuration Space

The set of all possible system configurations that can be adjusted to optimize performance or energy use.
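
As a minimal sketch of this idea, the configuration space can be enumerated as the Cartesian product of every knob's allowed settings. The knob names and values below are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from itertools import product

# Hypothetical knobs and their allowed settings (illustrative values only).
knobs = {
    "cores": [1, 2, 4],
    "freq_ghz": [1.0, 1.5, 2.0],
    "precision": ["fp32", "fp16"],
}

# The configuration space is the Cartesian product of all knob settings.
config_space = [dict(zip(knobs, values)) for values in product(*knobs.values())]

print(len(config_space))  # 3 * 3 * 2 = 18 configurations
print(config_space[0])    # {'cores': 1, 'freq_ghz': 1.0, 'precision': 'fp32'}
```

The space grows multiplicatively with each added knob, which is why real systems usually search or model it rather than measure every point.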

Constraint

A limit or requirement placed on a system’s behavior, such as energy consumption or accuracy.

Control Theory

A branch of engineering that deals with the behavior of dynamic systems, using feedback loops to maintain desired system states.
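
To make the feedback idea concrete, here is a small sketch of an integral controller that repeatedly measures an output, compares it with a target, and corrects a single knob. The system model (output equals twice the knob setting), the gain, and the function name are assumptions made for this example only.

```python
def run_controller(target, gain, steps):
    """Integral controller: nudge the knob in proportion to the observed error."""
    knob = 0.0
    history = []
    for _ in range(steps):
        output = 2.0 * knob      # assumed system model: output is twice the knob
        error = target - output  # feedback: compare the measurement with the goal
        knob += gain * error     # correct the knob using the error
        history.append(output)
    return knob, history

knob, history = run_controller(target=10.0, gain=0.25, steps=20)
print(round(knob, 3))         # settles near 5.0, where the output reaches 10.0
print(round(history[-1], 3))
```

With this gain the error halves on every step, so the output converges smoothly; a gain that is too large for the system would overshoot or oscillate instead.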

Core

The processing unit of a CPU that performs computational tasks; multicore processors have multiple cores for parallel processing.

Core Affinity

Binding a specific process to a particular core in a multicore processor, optimizing task performance.

Cost-Performance Trade-off

The balance between minimizing costs and maximizing system performance through resource allocation.

Cross-layer Optimization

An optimization approach considering multiple system stack levels (hardware, software, application) to achieve the best results.

Data Center

A facility housing computing systems and associated components for large-scale data processing.

Data Parallelism

A parallel computing technique where the same operation is applied to multiple data elements simultaneously.

Decision Tree

A machine learning model that predicts or classifies data using a tree-like structure of decisions.

Deep Learning

A subset of machine learning using multi-layered neural networks to model complex patterns in large datasets.

Distributed Learning

A method where machine learning tasks are spread across multiple machines to scale up training or inference.

Distributed System

A system in which multiple computers work together, sharing resources and tasks across a network.

Dynamic Configuration

Adjusting system settings (e.g., power, performance) during operation to optimize resource usage and performance.

Dynamic Optimization

The continuous adjustment of system parameters to maintain optimal performance during changing conditions.

Dynamic Power Management

Adjusting the power usage of a system in real time to balance energy efficiency and performance.

Dynamic Task Allocation

The ability to distribute and reassign tasks among system resources in real time based on workload changes.

Dynamic Voltage and Frequency Scaling (DVFS)

A power management technique that adjusts a processor’s supply voltage and clock frequency at runtime to reduce energy consumption.
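
The benefit of scaling voltage together with frequency follows from the standard approximation for dynamic (switching) power, P = C * V^2 * f. The sketch below applies that formula to two hypothetical operating points; the capacitance, voltages, and frequencies are assumed values, and static (leakage) power is ignored.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic (switching) power in watts: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency

C = 1e-9  # assumed effective switched capacitance, in farads

# Hypothetical operating points: voltage in volts, frequency in hertz.
high = dynamic_power(C, 1.0, 2.0e9)
low = dynamic_power(C, 0.8, 1.0e9)
ratio = low / high

print(round(high, 2), round(low, 2), round(ratio, 2))  # 2.0 0.64 0.32
```

Because voltage enters quadratically, halving the frequency while lowering the voltage by 20% cuts dynamic power to roughly a third in this example, rather than to a half.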

Edge Computing

A computing model where data processing occurs near the source of the data, closer to users, which reduces latency.

Elastic Computing

A system’s ability to automatically scale resources up or down based on workload demands.

Embedded System

A computing system dedicated to specific tasks within larger systems, often with real-time constraints.

Energy Budget

The total amount of energy allocated for use by a system, typically a key factor in mobile or low-power devices.

Energy Efficiency

Achieving performance objectives while minimizing energy consumption, essential in power-constrained systems.

Energy Harvesting

Capturing and storing energy from external sources like solar or kinetic to power devices.

Energy-Proportional Computing

A design principle where energy consumption scales in direct proportion to system utilization.

Energy Scaling

Adjusting a system’s energy consumption based on workload or performance requirements to improve efficiency.

Energy-Quality Trade-off

The balance between reducing energy consumption and maintaining the quality of computational outputs.

Exascale Computing

High-performance computing systems capable of at least one exaFLOPS, that is, 10^18 (a billion billion) floating-point operations per second.

Execution Time

The amount of time taken by a computer to execute a task or run a program from start to finish.

Execution Pipeline

The series of stages through which instructions pass in a processor, from fetching to execution, affecting overall performance.

Exponential Backoff

A strategy where the time between retries is increased exponentially to prevent network congestion.
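
A minimal sketch of the retry schedule follows. The function name, parameters, and the optional jitter are assumptions for illustration; randomized jitter is a common companion technique that keeps many clients from retrying at the same instant.

```python
import random

def backoff_delays(base, factor, max_delay, retries, jitter=False, rng=None):
    """Return the wait (in seconds) before each retry, growing exponentially."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(max_delay, base * factor ** attempt)
        if jitter:
            # Optional: pick a random delay in [0, delay] to spread retries out.
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays

print(backoff_delays(base=0.5, factor=2, max_delay=8.0, retries=6))
# [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

The cap matters in practice: without it, a handful of retries would push the wait to minutes or hours.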

Failure Recovery

The system’s ability to restore normal operation after a failure without losing data or causing significant downtime.

Fault Detection

Methods for identifying and diagnosing errors or failures within a system to prevent larger issues.

Fault Management

Detecting, diagnosing, and responding to hardware or software faults to maintain system reliability.

Fault Tolerance

The system’s ability to continue operating correctly even if some components fail.

Feature Extraction

Identifying and selecting relevant data attributes to improve machine learning model performance.

Federated Learning

Distributed machine learning where data is kept locally, and models are trained collectively without sharing sensitive data.

Feedback Loop

A mechanism that uses a system’s current output to adjust its future behavior. A feedback loop alone makes the system responsive to changes in the workload, but it does not amount to dynamic configuration toward goals (the system is not self-aware).

Floating Point Operations Per Second (FLOPS)

A measure of a computer’s performance: the number of floating-point (real-number) arithmetic operations it can perform per second.

Goal-Oriented Computing

A computational approach where systems are designed to meet specific, often changing, objectives like performance, accuracy, or energy use.

Goal Specification

The process of defining what a system should achieve, often in terms of constraints such as energy, latency, or performance. Goals may be simple or complex, may be applied globally or distributed across components, and may be as diverse as the initial system design specification or take any other form.

GPU (Graphics Processing Unit)

A specialized processor designed for rendering graphics, increasingly used for general-purpose computing in AI.

Gradient Descent

An optimization algorithm that iteratively adjusts model parameters to minimize an error (loss) function by stepping in the direction of steepest descent, the negative of the gradient.
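
As a small worked sketch, the loop below minimizes the one-dimensional function f(x) = (x - 3)^2, whose gradient is 2(x - 3). The function, starting point, learning rate, and step count are all assumptions chosen to keep the example short.

```python
def gradient_descent(grad, x0, learning_rate, steps):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move opposite to the gradient
    return x

# Minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0, learning_rate=0.1, steps=100)
print(round(x_min, 6))  # 3.0
```

Here each step shrinks the distance to the minimum by a factor of 0.8; for this particular function a learning rate above 1.0 would make the iterates diverge.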

Graph Neural Network (GNN)

A neural network model designed to operate on graph-structured data.

Green Computing

Designing, using, and disposing of computers in ways that minimize environmental impact.

Hardware Acceleration

Using specialized hardware to perform specific computing tasks faster or more efficiently than a general-purpose CPU.

Heterogeneous Computing

Using different types of processors (e.g., CPU, GPU) to handle different parts of a workload for better performance.

Heterogeneous Memory Systems

Systems that combine different types of memory to optimize performance and storage capacity.

Heterogeneous Systems

Computing systems composed of different types of processors (e.g., CPU, GPU), each optimized for specific tasks to improve performance and energy efficiency.

High Availability (HA)

Systems designed for operational continuity with minimal downtime through redundancy and failover.

High-Performance Computing (HPC)

The use of supercomputers or parallel processing to perform large-scale computations quickly.

Hyperparameter Tuning

Optimizing the parameters that govern a machine learning model’s training process to improve performance.

Idle Power Consumption

The power consumed by a system when it is not actively processing tasks but still powered on.

Inference

The process of using a trained machine learning model to make predictions or decisions based on new data.

Inference Latency

The delay between submitting input to an AI model and receiving output, critical in time-sensitive applications.

Inference Time

The time taken by a machine learning model to make predictions or decisions based on new data inputs after training is complete.

Infrastructure as a Service (IaaS)

A cloud computing model that provides virtualized computing resources over the internet.

Interconnect

The communication link between processors, memory, and other components in a system, affecting data transfer rates.

I/O Bound

When a system’s performance is limited by its input/output operations rather than its computational power.

JouleGuard

A method for managing energy and performance trade-offs, using machine learning and control theory for energy-efficient computing.

Knobs (Configurable Parameters)

Adjustable settings in a system (e.g., clock speed, core usage) that control performance and resource allocation.

Latency

The time delay experienced between the initiation of a process (e.g., user input) and its completion or response.

Latency Constraint

A limit placed on how much time a system has to respond to inputs or complete a process.

Latency Overhead

Delays introduced by system overhead, increasing the time taken to complete tasks or data processing.

Latency Sensitivity

The degree to which system performance depends on minimizing delays in processing or communication.

Learning Rate

A hyperparameter that determines how quickly a machine learning model adjusts its parameters based on error measurements.

Load Balancing

The distribution of workloads across multiple computing resources to optimize performance and prevent overloading.

Load Distribution

The balancing of tasks across different system resources to prevent overloading any single component.

Load Shedding

Dropping or postponing lower-priority tasks to ensure high-priority tasks are processed efficiently during overload.
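
One simple way to picture this is a greedy policy that admits tasks in priority order until capacity runs out. The task names, priorities, costs, and the greedy policy itself are hypothetical; real systems may use deadlines, fairness, or admission control instead.

```python
def shed_load(tasks, capacity):
    """Keep the highest-priority tasks that fit within capacity; shed the rest."""
    kept, shed, used = [], [], 0
    # Consider tasks from highest to lowest priority.
    for name, priority, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        if used + cost <= capacity:
            kept.append(name)
            used += cost
        else:
            shed.append(name)
    return kept, shed

# Each task is (name, priority, cost); higher priority means more important.
tasks = [("logging", 1, 2), ("checkout", 9, 3), ("search", 5, 4), ("analytics", 2, 3)]
kept, shed = shed_load(tasks, capacity=8)
print(kept)  # ['checkout', 'search']
print(shed)  # ['analytics', 'logging']
```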

Low-Latency Optimization

Techniques to minimize delays in data processing, critical in real-time systems like video streaming.

Low-Power Mode

A system state where power consumption is reduced, typically extending battery life at the cost of reduced performance.

Machine Learning

A field of artificial intelligence where systems learn from data and improve their performance without being explicitly programmed for every task.

Memory Bandwidth

The amount of data that can be read or written to memory by a processor in a given time.

Memory Hierarchy

The organization of different types of memory (registers, cache, RAM, disk) to balance speed and cost efficiency.

Memory Management Unit (MMU)

A hardware component that manages memory access and virtual address translation in a system.

Microarchitecture

The internal structure of a processor, determining how instructions are executed and how performance is optimized.

Model Compression

Techniques to reduce the size and complexity of machine learning models, improving their efficiency.

Multicore Processor

A single computing unit with two or more independent processing cores for performing parallel processing.

Multithreading

Executing multiple threads concurrently within a single process to improve performance.

Network Latency

The delay in sending and receiving data across a network, affecting performance in time-sensitive applications.

Node

A single computing device or server within a distributed system or network.

Non-Volatile Memory (NVM)

Memory that retains stored data even when the system is powered off, such as flash storage.

Normalization

A data preprocessing technique that adjusts data values to a common scale, improving machine learning model training.
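
Min-max scaling is one common form of normalization; the sketch below rescales a list of values to the range [0, 1]. The function name and sample values are illustrative, and other schemes (such as standardizing to zero mean and unit variance) are equally valid choices.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 20, 15, 30]))  # [0.0, 0.5, 0.25, 1.0]
```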

Offloading

Transferring computational tasks from one system or component to another, typically for optimization or load balancing.

On-Demand Computing

A computing model where resources are dynamically provided based on the current system or user needs.

Optimization

The process of making a system or design as effective or functional as possible under given constraints, such as performance or energy.

Optimization Algorithm

A mathematical method for adjusting system parameters to achieve the best possible performance.

Overfitting

When a machine learning model performs well on training data but poorly on unseen data, often due to excessive complexity.

Overhead

The extra processing or resource costs associated with managing or maintaining a system’s operation.

Packet Loss

Data packets lost during network transmission, leading to degraded performance and the need for retransmission.

Parallel Processing

The simultaneous execution of multiple tasks by dividing them among multiple processing units to improve speed.

Parallelism Granularity

The size of tasks executed in parallel, with fine-grained parallelism involving small tasks and coarse-grained parallelism involving larger ones.

Pipelining

A technique where multiple instruction stages are overlapped to improve instruction throughput in processor design.

Performance Bottleneck

A limitation in a system’s design that restricts its overall performance, often due to resource contention.

Performance Hash Table (PHT)

A data structure used to store and predict the performance and energy characteristics of different system configurations.

Performance Monitoring

Tracking system performance metrics (e.g., CPU usage) to ensure optimal operation.

Performance Prediction

Using models or algorithms to estimate system performance under different configurations or workloads.

Power Budget

The total amount of power allocated to a system or process, critical in optimizing energy efficiency.

Power Capping

Limiting a system’s power consumption to stay within a power budget or prevent overheating.

Power Efficiency

Maximizing system performance while consuming the least possible energy.

Predictive Analytics

Using data and machine learning techniques to predict future events or outcomes based on historical data.

Predictive Control

Using models to predict future system behavior and adjust actions accordingly to meet performance goals.

Predictive Modeling

The use of statistical techniques and machine learning to forecast future system states or behavior based on historical data.

Proactive Scaling

Adjusting system resources in anticipation of future workload increases or performance needs.

Quantum Computing

A computing paradigm that uses quantum bits (qubits) and the principles of quantum mechanics to perform certain complex calculations faster than classical computers can.

QoR (Quality of Result)

A measure of how well a system meets its performance or accuracy targets, especially in approximate computing.

Quality of Service (QoS)

A measure of the overall system performance in terms of reliability, responsiveness, and availability.

Real-Time Adjustment

The ability of a system to modify its behavior based on current data or conditions.

Redundancy

Extra components or systems that ensure continued operation in case of failure, improving reliability and availability.

Reinforcement Learning

A machine learning approach where an agent learns to make decisions by receiving rewards or penalties for its actions.

Reinforcement Signal

Feedback used by a reinforcement learning agent to guide its decision-making toward desired outcomes.

Resource Allocation

Distributing available resources (e.g., CPU, memory) to different tasks or processes based on demand.

Resource Contention

A situation where multiple tasks compete for the same computing resources, leading to reduced performance.

Resource Contention Management

Techniques to manage competition for resources between multiple tasks or processes to optimize performance.

Resource Efficiency

The effective use of system resources to achieve optimal performance with minimal waste.

Response Time

The total time taken by a system to react to input, including latency and processing time.

Runtime Optimization

The real-time adjustment of system parameters or resources to improve performance during execution.

Scalability

The ability of a system to handle increased workloads or to be expanded by adding resources without compromising performance.

Scalable Vector Extension (SVE)

An extension to the Arm instruction set architecture that improves vector-processing performance, used in scientific computing.

Scaling Factor

A ratio used to increase or decrease system performance or resource usage based on demand.

Scheduling Algorithm

A set of rules that determine the order in which tasks are executed by a processor, often used to optimize performance or resource usage.

Self-Aware™ Computing

A system’s ability to monitor and reason about its own performance and to adjust its configuration autonomously so that it continues to meet goals, such as energy efficiency, as the workload changes from moment to moment.

Self-Aware™ Capable Computing

A system whose variable Configuration Space has been designed to be computationally accessible, so that goals can manipulate it in the future.

Self-Optimizing System

A system that automatically adjusts its configuration and behavior to achieve optimal performance under changing conditions.

Server Cluster

A group of linked servers working together to handle workloads, improving system performance, reliability, or scalability.

Simulation

Using models to replicate system behavior under different conditions, often for performance testing or optimization.

Software as a Service (SaaS)

A cloud computing model providing software applications over the internet, allowing on-demand access without local installation.

Sparse Representation

A data representation where only a small subset of values is non-zero, reducing computation and memory requirements.
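
The saving can be sketched with a dictionary that stores only the non-zero entries of a vector, keyed by index. The helper names and sample vectors are hypothetical; production code would more likely use a dedicated sparse-matrix library.

```python
def to_sparse(dense):
    """Store only the non-zero entries of a vector as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(a, b):
    """Dot product that touches only indices present in both sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b[i] for i, v in a.items() if i in b)

u = to_sparse([0, 0, 3, 0, 0, 0, 2, 0])
v = to_sparse([1, 0, 4, 0, 0, 0, 0, 5])
print(u)                 # {2: 3, 6: 2}
print(sparse_dot(u, v))  # 12
```

Only two of the eight positions in `u` are stored, and the dot product performs a single multiplication instead of eight.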

Surrogate Model

A simplified model used in place of a more complex one to approximate performance or behavior with less resource use.

System Bottleneck

A limiting factor within a system that reduces its overall efficiency or speed.

Task Parallelism

A form of parallel computing where different tasks or processes are executed simultaneously across multiple processors.

Task Scheduling

The process of assigning tasks to system resources at specific times to optimize performance.

Temporal Scalability

A system’s ability to maintain performance over time, even as workloads or conditions change.

Thermal Throttling

A mechanism reducing system performance to prevent overheating when temperatures exceed safe thresholds.

Throughput

The amount of work a system can process in a given period, often used to measure system performance.

Throughput Optimization

Maximizing the rate at which a system processes tasks or data to improve overall performance.

Trade-off Space

The set of possible compromises between conflicting objectives (e.g., performance vs. energy consumption) in system optimization.
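
When two objectives conflict, the useful part of the trade-off space is the set of configurations that no other configuration beats on both objectives at once (often called the Pareto front). The sketch below filters hypothetical (latency, energy) measurements, where lower is better on both axes; the numbers are invented for illustration.

```python
def pareto_front(points):
    """Return points not dominated by any other (lower is better on both axes)."""
    front = []
    for p in points:
        # p is dominated if some other point is at least as good on both axes.
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical measurements: (latency in ms, energy in joules) per configuration.
configs = [(10, 5.0), (8, 7.0), (12, 4.0), (9, 8.0), (11, 6.0)]
front = pareto_front(configs)
print(front)  # [(10, 5.0), (8, 7.0), (12, 4.0)]
```

Choosing among the points on the front is a matter of goals: a latency constraint favors (8, 7.0), while an energy budget favors (12, 4.0).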

Training Data

A dataset used to train a machine learning model, helping it learn patterns for predictions or decisions.

Underfitting

A machine learning problem where a model is too simple to capture data patterns, resulting in poor performance.

Virtual Machine (VM)

A software-emulated computer system allowing multiple operating systems to run on a single physical machine.

Virtualization

Creating a virtual version of a resource (e.g., server, storage) to allow multiple systems to share the same hardware.

Voltage Scaling

Reducing energy consumption by lowering the voltage supplied to a processor, usually at the cost of reduced performance.

Workload

The total set of tasks a system must handle, affecting performance and resource allocation.

Workload-Aware Optimization

Adjusting system configurations based on the specific characteristics and demands of the current workload.

Workload Diversity

The variety of tasks handled by a system, affecting optimization strategies for performance and energy efficiency.

Workload Sensitivity

How a system’s performance or energy use is affected by different types of workloads.