Glossary

This glossary defines key terms, acronyms, and concepts used throughout the MobilityCorp Architecture Documentation.

A

ADR (Architecture Decision Record)

Documented architectural decisions with context, alternatives, and rationale
Format: ADR-XX: Title
Example: ADR-15: Cloud Provider Selection

Agent (AI Agent)

Autonomous AI system that can perceive environment, make decisions, and take actions
Uses LLMs with tool calling capabilities
Examples: Trip Planning Assistant, Operations Copilot

Airflow (Apache Airflow)

Open-source workflow orchestration platform
DAG-based scheduling for batch pipelines
Used for ETL jobs, ML training pipelines, data validation
Integrated with Apache Beam for distributed processing

API Gateway

Service that handles all API requests, provides rate limiting, authentication
AWS API Gateway used in our architecture

Aurora PostgreSQL

AWS managed relational database, PostgreSQL-compatible
Used for transactional data (bookings, users, payments)

B

Batch Inference

ML predictions run on large datasets at scheduled intervals
Contrast with real-time inference
Examples: Weekly battery degradation predictions

Beam (Apache Beam)

Unified programming model for batch and streaming data processing
Provides distributed ETL capabilities
Works with Airflow for large-scale data pipelines
Used in Data Lakehouse transformations

Bedrock (AWS Bedrock)

AWS managed service for foundation models (LLMs)
Provides Claude, Llama, Mistral, Titan models
Used for conversational AI agents

Blue-Green Deployment

Deployment strategy with two identical environments (blue/current, green/new)
Traffic switched instantly from blue to green after validation
Enables zero-downtime updates

Bronze Layer

First layer in Medallion Architecture
Raw, unprocessed data from sources
Append-only, schema-on-read
Retention: 30 days

C

Canary Deployment

Gradual rollout strategy starting with small percentage of traffic
Monitor metrics before increasing traffic
Example: 5% → 10% → 25% → 50% → 100%

CAN Bus (Controller Area Network)

Vehicle communication protocol for sensors and control units
Provides rich telemetry (speed, RPM, battery, diagnostics)
Used in cars and vans, not scooters/eBikes

CQRS (Command Query Responsibility Segregation)

Pattern separating read and write operations
Optimizes for different access patterns
Used in high-traffic services (booking, fleet)

Claude (Claude 3.5 Sonnet)

Large Language Model by Anthropic
200K context window, excellent tool use
Our primary LLM via AWS Bedrock

D

Data Drift

Change in input data distribution over time
Causes model accuracy degradation
Detected by SageMaker Model Monitor

Data Lakehouse

Hybrid architecture combining data lake (S3) + data warehouse
Enables both batch analytics and real-time queries
Our implementation uses Medallion Architecture

Delta Lake

Open-source storage format providing ACID transactions on data lakes
Adds reliability and performance to Parquet files
Used in Silver and Gold layers

DynamoDB

AWS managed NoSQL database
Low-latency key-value store
Used for real-time operational data (vehicle status, conversations)

E

eBike (Electric Bike)

Electric bicycle with pedal assist
Max speed: ~25 km/h (EU regulation)
Battery-powered, requires manual battery swaps

Edge Computing

Processing data on device (vehicle) instead of cloud
Reduces latency, works offline, saves bandwidth
Used for safety-critical ML models

Event-Driven Architecture

Architecture where components communicate via events
Asynchronous, loosely coupled
Kafka/MSK as event backbone

Event Sourcing

Pattern storing all changes as sequence of events
Enables audit trail and temporal queries
Used for booking and payment services

F

Feature Store

Repository of ML features for training and serving
Online store (low-latency) + Offline store (historical)
SageMaker Feature Store used

Fitness Function

Quantifiable metric measuring architectural success
Examples: P95 latency < 200ms, uptime > 99.95%
Automated checks prevent architectural drift

G

GDPR (General Data Protection Regulation)

EU regulation for data protection and privacy
Requirements: data residency, right to erasure, consent
All our data stays in EU (Frankfurt + Ireland)

Geofence

Virtual geographic boundary
Triggers alerts when vehicles enter/exit zones
Used for operational areas and parking validation

Gold Layer

Third layer in Medallion Architecture
Business-ready aggregations and metrics
Optimized for BI and reporting
Retention: 3 years

H

HLD (High-Level Design)

Detailed technical documentation of system components
Includes data flow diagrams, sequence diagrams
More detailed than ADRs

Hybrid AI

Combination of traditional ML (XGBoost, LightGBM) + GenAI (LLMs)
Leverages strengths of each approach
Example: ML for predictions, LLM for explanations

I

IAM (Identity and Access Management)

AWS service for authentication and authorization
Role-based access control (RBAC)
Used for all service-to-service auth

IMU (Inertial Measurement Unit)

Sensor measuring acceleration and rotation
Includes accelerometer + gyroscope
Used for collision detection

IoT Core (AWS IoT Core)

AWS managed MQTT broker
Handles 50K+ vehicle connections
Routes telemetry to Kafka

K

Kafka (Apache Kafka)

Distributed event streaming platform
MSK (Managed Streaming for Kafka) on AWS
Our event backbone for telemetry and events

KPI (Key Performance Indicator)

Measurable value demonstrating business effectiveness
Examples: vehicle utilization rate, revenue per vehicle
Tracked in Gold layer aggregations

KYC (Know Your Customer)

Identity verification process
Required for user registration
Documents stored encrypted in S3

L

Lake Formation (AWS Lake Formation)

AWS service for data lake governance
Fine-grained access control (row/column level)
Tracks data lineage

Lambda (AWS Lambda)

AWS serverless functions
Event-driven, auto-scaling
Used for lightweight processing, API endpoints

LangChain

Framework for building LLM applications
Provides agents, tools, memory, chains
Our agentic AI framework

LightGBM

Gradient boosting framework by Microsoft
Fast, efficient for tabular data
Used for demand forecasting

LLM (Large Language Model)

AI model trained on vast text data
Examples: Claude 3.5, GPT-4, Llama 3.1
Used for conversational AI

M

MAPE (Mean Absolute Percentage Error)

ML accuracy metric for regression
Target: < 15% for demand forecasting
Lower is better

Medallion Architecture

Data lakehouse pattern with Bronze/Silver/Gold layers
Each layer adds value (cleaning, transformation, aggregation)
Our data platform design

Microservices

Architectural style with small, independent services
Each service owns its data and API
Enables independent scaling and deployment

MLOps

Practices for ML model lifecycle management
Training, versioning, deployment, monitoring
Implemented with SageMaker Pipelines

Model Drift

Degradation of model accuracy over time
Caused by data drift or concept drift
Triggers model retraining

Model Registry

Repository of trained ML models with metadata
Tracks versions, metrics, lineage
SageMaker Model Registry used

MSK (Managed Streaming for Kafka)

AWS fully managed Apache Kafka service
Our event streaming backbone
3-broker cluster in production

MQTT (Message Queuing Telemetry Transport)

Lightweight pub/sub protocol for IoT
Used for vehicle-to-cloud communication
IoT Core as MQTT broker

N

NFR (Non-Functional Requirement)

System quality attribute (performance, security, scalability)
Examples: 99.95% uptime, P95 latency < 200ms
Enforced by fitness functions

O

OCU (OpenSearch Compute Unit)

Pricing unit for OpenSearch Serverless
~$0.24/OCU-hour
Auto-scales based on workload

OpenSearch

AWS managed search and analytics engine
Used as vector database for RAG
Stores knowledge base embeddings

OTA (Over-The-Air)

Remote software/model updates
No physical access required

P

Parquet

Columnar storage format
Efficient for analytics workloads
Used in Bronze layer

PII (Personally Identifiable Information)

Data that identifies individuals
Examples: name, email, phone, address
Must be protected per GDPR

Point-in-Time Correctness

Feature Store capability ensuring features match training data time
Prevents data leakage
Critical for accurate ML models

Prompt Engineering

Crafting LLM prompts for optimal responses
Includes system prompt, few-shot examples, context
Iterative optimization process

Q

Quantization

Reducing ML model precision (FP32 → INT8)
4x smaller models, faster inference
Used for edge deployment

R

RAG (Retrieval-Augmented Generation)

Technique combining LLM with external knowledge base
Retrieves relevant documents, then generates response
Used for FAQ chatbot

ReAct (Reasoning + Acting)

Agent pattern alternating between thinking and tool use
Format: Thought → Action → Observation → repeat
Our agentic AI approach

Redshift

AWS data warehouse
SQL-based analytics on petabyte scale
Used for BI dashboards (Gold layer)

Redis (Amazon ElastiCache)

In-memory cache
Sub-millisecond latency
Used for hot data (availability, pricing)

RUL (Remaining Useful Life)

Predicted time until component failure
Example: Battery RUL = 45 days
Enables proactive maintenance

S

S3 (Simple Storage Service)

AWS object storage
Unlimited scale, 11 nines durability
Foundation of our data lakehouse

SageMaker

AWS ML platform
End-to-end: data prep, training, deployment, monitoring
Our MLOps backbone

Scooter (Electric Scooter)

Stand-up electric scooter
Max speed: ~20 km/h
Battery-powered, requires manual swaps

Silver Layer

Second layer in Medallion Architecture
Cleaned, validated, deduplicated data
Delta Lake format with ACID
Retention: 2 years

Spot Instances

AWS EC2 instances at 70-90% discount
Can be interrupted with 2-minute notice
Used for ML training jobs

STRIDE

Threat modeling framework
Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
Used in our threat model

T

Telemetry

Automatic measurement and transmission of data
Vehicle telemetry: GPS, battery, speed, IMU

Temporal

Open-source workflow orchestration platform
Event-driven, reliable execution with automatic retries
Used for real-time AI workflows and retraining triggers
Complements Airflow for event-based orchestration

TensorFlow Lite

Lightweight ML framework for mobile/edge
Optimized for constrained devices
Used for collision detection on edge

Time-Series Database

Database optimized for time-stamped data
TimescaleDB used for vehicle telemetry
Efficient queries on time ranges

U

Utilization Rate

Percentage of time vehicle is rented
Formula: (rental_time / total_available_time) × 100%
Target: +15% YoY improvement

V

VictoriaMetrics

High-performance time-series database
Drop-in replacement for Prometheus with better efficiency
Lower memory footprint and resource usage at scale
Used as primary metrics data store
Fully compatible with Prometheus protocols and Grafana

VPC (Virtual Private Cloud)

Isolated network in AWS
Security groups, subnets, route tables
All services deployed within VPC

X

X-Ray (AWS X-Ray)

Distributed tracing service
Tracks requests across microservices
Identifies performance bottlenecks

XGBoost

Gradient boosting library
Fast, accurate for structured data
Used for dynamic pricing model

Z

Zone

Geographic area for operations
Examples: "Frankfurt Downtown", "Airport District"
Used for demand forecasting and pricing

Acronyms Quick Reference

Acronym	Full Form
ADR	Architecture Decision Record
API	Application Programming Interface
ACID	Atomicity, Consistency, Isolation, Durability
AWS	Amazon Web Services
CAN	Controller Area Network
CDN	Content Delivery Network
CI/CD	Continuous Integration/Continuous Deployment
CQRS	Command Query Responsibility Segregation
DPU	Data Processing Unit
ECS	Elastic Container Service
ETL	Extract, Transform, Load
GDPR	General Data Protection Regulation
GPS	Global Positioning System
HA	High Availability
HLD	High-Level Design
IAM	Identity and Access Management
IMU	Inertial Measurement Unit
IoT	Internet of Things
KPI	Key Performance Indicator
KYC	Know Your Customer
LLM	Large Language Model
MAPE	Mean Absolute Percentage Error
ML	Machine Learning
MLOps	Machine Learning Operations
MQTT	Message Queuing Telemetry Transport
MSK	Managed Streaming for Kafka
NFR	Non-Functional Requirement
NLP	Natural Language Processing
OCU	OpenSearch Compute Unit
OTA	Over-The-Air
PII	Personally Identifiable Information
RAG	Retrieval-Augmented Generation
RBAC	Role-Based Access Control
REST	Representational State Transfer
RUL	Remaining Useful Life
S3	Simple Storage Service (AWS)
SLA	Service Level Agreement
SQL	Structured Query Language
TCO	Total Cost of Ownership
TLS	Transport Layer Security
VPC	Virtual Private Cloud

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Glossary

A

B

C

D

E

F

G

H

I

K

L

M

N

O

P

Q

R

S

T

U

V

X

Z

Acronyms Quick Reference

Related Documentation

FilesExpand file tree

GLOSSARY.md

Latest commit

History

GLOSSARY.md

File metadata and controls

Glossary

A

B

C

D

E

F

G

H

I

K

L

M

N

O

P

Q

R

S

T

U

V

X

Z

Acronyms Quick Reference

Related Documentation