| 1 | +# Dynamic Profiling Data Model |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The Dynamic Profiling feature maps profiling requests made at various hierarchy levels (service, job, namespace) to specific host-level commands, while keeping heartbeat response times under one second at 165k QPM (queries per minute, roughly 2,750 per second). |
| 6 | + |
| 7 | +## Architecture |
| 8 | + |
| 9 | +The dynamic profiling system consists of: |
| 10 | + |
| 11 | +1. **Profiling Requests** - API-level requests specifying targets at various hierarchy levels |
| 12 | +2. **Profiling Commands** - Host-specific commands sent to agents |
| 13 | +3. **Host Heartbeats** - Real-time host availability tracking (optimized for 165k QPM) |
| 14 | +4. **Profiling Executions** - Audit trail of profiling executions |
| 15 | +5. **Hierarchical Mappings** - Denormalized tables for fast query performance |
| 16 | + |
| 17 | +## Data Model |
| 18 | + |
| 19 | +``` |
| 20 | +┌─────────────────────────────────────────────────────────────────┐ |
| 21 | +│                  DYNAMIC PROFILING DATA MODEL                   │ |
| 22 | +├─────────────────────────────────────────────────────────────────┤ |
| 23 | +│                                                                 │ |
| 24 | +│  ┌──────────────────┐         ┌──────────────────────┐          │ |
| 25 | +│  │ ProfilingRequest │────────▶│   ProfilingCommand   │          │ |
| 26 | +│  │                  │         │                      │          │ |
| 27 | +│  │ - Request ID     │         │ - Command ID         │          │ |
| 28 | +│  │ - Service/Job/   │         │ - Host ID (indexed)  │          │ |
| 29 | +│  │   Namespace/Pod  │         │ - Target Containers  │          │ |
| 30 | +│  │ - Duration       │         │ - Target Processes   │          │ |
| 31 | +│  │ - Sample Rate    │         │ - Command Type       │          │ |
| 32 | +│  │ - Status         │         │ - Status             │          │ |
| 33 | +│  └──────────────────┘         └──────────────────────┘          │ |
| 34 | +│           │                              │                      │ |
| 35 | +│           │                              │                      │ |
| 36 | +│           ▼                              ▼                      │ |
| 37 | +│  ┌──────────────────────┐       ┌─────────────────┐             │ |
| 38 | +│  │ ProfilingExecutions  │       │ HostHeartbeats  │             │ |
| 39 | +│  │   (Audit Trail)      │       │                 │             │ |
| 40 | +│  │                      │       │ - Host ID       │             │ |
| 41 | +│  │ - Execution ID       │       │ - Service       │             │ |
| 42 | +│  │ - Request/Command    │       │ - Containers    │             │ |
| 43 | +│  │ - Host Name          │       │ - Workloads     │             │ |
| 44 | +│  │ - Status             │       │ - Last Seen ⚡  │             │ |
| 45 | +│  └──────────────────────┘       └─────────────────┘             │ |
| 46 | +│                                                                 │ |
| 47 | +│  ┌────────────────────────────────────────────────────────┐     │ |
| 48 | +│  │              HIERARCHICAL MAPPING TABLES               │     │ |
| 49 | +│  │       (Denormalized for Fast Query Performance)        │     │ |
| 50 | +│  ├────────────────────────────────────────────────────────┤     │ |
| 51 | +│  │                                                        │     │ |
| 52 | +│  │  • NamespaceServices  - Namespace → Service            │     │ |
| 53 | +│  │  • ServiceContainers  - Service → Container            │     │ |
| 54 | +│  │  • JobContainers      - Job → Container                │     │ |
| 55 | +│  │  • ContainerProcesses - Container → Process            │     │ |
| 56 | +│  │  • ContainersHosts    - Container → Host               │     │ |
| 57 | +│  │  • ProcessesHosts     - Process → Host                 │     │ |
| 58 | +│  │                                                        │     │ |
| 59 | +│  └────────────────────────────────────────────────────────┘     │ |
| 60 | +│                                                                 │ |
| 61 | +└─────────────────────────────────────────────────────────────────┘ |
| 62 | +``` |
| 63 | + |
| 64 | +## Key Features |
| 65 | + |
| 66 | +### 1. Hierarchical Request Mapping |
| 67 | + |
| 68 | +Requests can target any level of the hierarchy: |
| 69 | +- **Namespace Level**: Profile all services in a namespace |
| 70 | +- **Service Level**: Profile all containers in a service |
| 71 | +- **Job Level**: Profile specific job workloads |
| 72 | +- **Container Level**: Profile specific containers |
| 73 | +- **Process Level**: Profile specific processes |
| 74 | +- **Host Level**: Profile specific hosts |
| 75 | + |
| 76 | +### 2. Sub-Second Heartbeat Performance |
| 77 | + |
| 78 | +The `HostHeartbeats` table is optimized for 165k QPM: |
| 79 | +- Indexed on `host_id` for fast point lookups (O(log n) with PostgreSQL's default B-tree index) |
| 80 | +- Indexed on `timestamp_last_seen` for quick staleness checks |
| 81 | +- Partial indexes on `service_name` and `namespace` for filtered queries |
| 82 | +- Lightweight updates with minimal locking |
| 83 | + |
| 84 | +### 3. Audit Trail |
| 85 | + |
| 86 | +The `ProfilingExecutions` table maintains a complete audit history: |
| 87 | +- Links to original requests and commands |
| 88 | +- Tracks execution lifecycle |
| 89 | +- Enables troubleshooting and compliance |
| 90 | + |
| 91 | +### 4. Denormalized Mappings |
| 92 | + |
| 93 | +Hierarchical mapping tables trade storage for query speed: |
| 94 | +- Pre-computed relationships |
| 95 | +- Eliminates complex JOINs |
| 96 | +- Enables fast request-to-host resolution |
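
As a rough illustration of how pre-computed mappings could resolve a service-level request to hosts, the sketch below uses plain dictionaries in place of the `ServiceContainers` and `ContainersHosts` tables. The function name `resolve_service_to_hosts` and the sample data are hypothetical and are not taken from the schema or the backend code.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the ServiceContainers and
# ContainersHosts mapping tables (sample data invented for illustration).
SERVICE_CONTAINERS = {
    "web-api": ["web-api-container-1", "web-api-container-2"],
}
CONTAINERS_HOSTS = {
    "web-api-container-1": "host-12345",
    "web-api-container-2": "host-67890",
}


def resolve_service_to_hosts(service_name):
    """Return {host_id: [container, ...]} for every container of a service."""
    targets = defaultdict(list)
    for container in SERVICE_CONTAINERS.get(service_name, []):
        host_id = CONTAINERS_HOSTS.get(container)
        if host_id is not None:  # skip containers with no known host
            targets[host_id].append(container)
    return dict(targets)


resolved = resolve_service_to_hosts("web-api")
print(resolved)
```

Each step is a direct key lookup, which is the property the denormalized tables are meant to provide; in the database the same two lookups would be indexed reads against the mapping tables.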
| 97 | + |
| 98 | +## Database Tables |
| 99 | + |
| 100 | +### Core Tables |
| 101 | + |
| 102 | +#### ProfilingRequest |
| 103 | +Stores profiling requests from API calls. |
| 104 | + |
| 105 | +**Key Fields:** |
| 106 | +- `request_id` (UUID) - Unique identifier |
| 107 | +- `service_name`, `job_name`, `namespace`, etc. - Target specification |
| 108 | +- `profiling_mode` - CPU, memory, allocation, native |
| 109 | +- `duration_seconds` - How long to profile |
| 110 | +- `sample_rate` - Sampling frequency (1-1000) |
| 111 | +- `status` - pending, in_progress, completed, failed, cancelled |
| 112 | + |
| 113 | +**Constraints:** |
| 114 | +- At least one target specification must be provided |
| 115 | +- Duration must be positive |
| 116 | +- Sample rate must be between 1 and 1000 |
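
The three constraints above can be sketched as application-side checks. This is a minimal standard-library illustration, not the project's Pydantic model: the class `ProfilingRequestDraft` and helper `is_valid` are hypothetical, only three of the target fields are shown, and the sample-rate bounds are assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfilingRequestDraft:
    """Hypothetical stand-in for a request; not the real Pydantic model."""
    duration_seconds: int
    sample_rate: int
    service_name: Optional[str] = None
    job_name: Optional[str] = None
    namespace: Optional[str] = None

    def __post_init__(self):
        # At least one target specification must be provided.
        if not any([self.service_name, self.job_name, self.namespace]):
            raise ValueError("at least one target must be specified")
        # Duration must be positive.
        if self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        # Sample rate must be between 1 and 1000 (assumed inclusive).
        if not 1 <= self.sample_rate <= 1000:
            raise ValueError("sample_rate must be between 1 and 1000")


def is_valid(**fields):
    """Return True if the fields satisfy all three constraints."""
    try:
        ProfilingRequestDraft(**fields)
        return True
    except ValueError:
        return False
```

For example, `is_valid(service_name="web-api", duration_seconds=60, sample_rate=100)` passes, while omitting every target field or passing `sample_rate=1001` fails.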
| 117 | + |
| 118 | +#### ProfilingCommand |
| 119 | +Commands sent to agents on specific hosts. |
| 120 | + |
| 121 | +**Key Fields:** |
| 122 | +- `command_id` (UUID) - Unique identifier |
| 123 | +- `profiling_request_id` - Links to original request |
| 124 | +- `host_id` - Target host (indexed for fast lookup) |
| 125 | +- `target_containers`, `target_processes` - Specific targets |
| 126 | +- `command_type` - start, stop, reconfigure |
| 127 | +- `command_json` - Serialized command for agent |
| 128 | + |
| 129 | +#### HostHeartbeats |
| 130 | +Tracks host availability and status. |
| 131 | + |
| 132 | +**Key Fields:** |
| 133 | +- `host_id` - Unique host identifier (indexed) |
| 134 | +- `host_name`, `host_ip` - Host details |
| 135 | +- `service_name`, `namespace` - Contextual info |
| 136 | +- `containers`, `jobs`, `workloads` - Current state |
| 137 | +- `timestamp_last_seen` - Last heartbeat (indexed) |
| 138 | +- `last_command_id` - Last command received |
| 139 | + |
| 140 | +**Performance:** |
| 141 | +- Designed for 165k QPM |
| 142 | +- Sub-second response times |
| 143 | +- Optimized indexes for common queries |
| 144 | + |
| 145 | +#### ProfilingExecutions |
| 146 | +Audit trail of profiling executions. |
| 147 | + |
| 148 | +**Key Fields:** |
| 149 | +- `execution_id` (UUID) - Unique identifier |
| 150 | +- `profiling_request_id` - Original request |
| 151 | +- `profiling_command_id` - Executed command |
| 152 | +- `host_name` - Where it executed |
| 153 | +- `started_at`, `completed_at` - Execution timeline |
| 154 | +- `status` - Execution status |
| 155 | + |
| 156 | +### Mapping Tables |
| 157 | + |
| 158 | +All mapping tables follow a similar pattern: |
| 159 | +- Primary key (ID) |
| 160 | +- Mapping fields (indexed) |
| 161 | +- Timestamps (created_at, updated_at) |
| 162 | +- Unique constraint on the mapping |
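
To make the pattern concrete, here is a small sketch of one mapping table with its unique constraint. It uses SQLite from the Python standard library as a stand-in for PostgreSQL so that it runs anywhere; the column names `service_name` and `container_name`, the index name, and the helper `add_mapping` are assumptions rather than the actual schema.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE servicecontainers (
        id INTEGER PRIMARY KEY,
        service_name TEXT NOT NULL,
        container_name TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP,
        updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (service_name, container_name)
    )
    """
)
conn.execute("CREATE INDEX idx_sc_service ON servicecontainers (service_name)")


def add_mapping(service_name, container_name):
    """Insert a mapping; the unique constraint turns repeats into no-ops."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO servicecontainers (service_name, container_name) "
            "VALUES (?, ?) "
            "ON CONFLICT (service_name, container_name) DO NOTHING",
            (service_name, container_name),
        )


add_mapping("web-api", "web-api-container-1")
add_mapping("web-api", "web-api-container-1")  # duplicate, ignored
row_count = conn.execute("SELECT COUNT(*) FROM servicecontainers").fetchone()[0]
```

The `ON CONFLICT ... DO NOTHING` clause is also valid PostgreSQL syntax, so the insert statement should carry over with only the parameter placeholders changed for the driver in use.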
| 163 | + |
| 164 | +#### NamespaceServices |
| 165 | +Maps namespaces → services |
| 166 | + |
| 167 | +#### ServiceContainers |
| 168 | +Maps services → containers |
| 169 | + |
| 170 | +#### JobContainers |
| 171 | +Maps jobs → containers |
| 172 | + |
| 173 | +#### ContainerProcesses |
| 174 | +Maps containers → processes (includes process name) |
| 175 | + |
| 176 | +#### ContainersHosts |
| 177 | +Maps containers → hosts |
| 178 | + |
| 179 | +#### ProcessesHosts |
| 180 | +Maps processes → hosts |
| 181 | + |
| 182 | +## Setup and Installation |
| 183 | + |
| 184 | +### 1. Apply Database Schema |
| 185 | + |
| 186 | +```bash |
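| 187 | +# Option 1: run the schema non-interactively |
| 188 | +psql -U postgres -d gprofiler -f scripts/setup/postgres/dynamic_profiling_schema.sql |
| 189 | + |
| 190 | +# Option 2: from inside an interactive psql session (psql -U postgres -d gprofiler), run: |
| 191 | +#   \i scripts/setup/postgres/dynamic_profiling_schema.sql |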
| 192 | +``` |
| 193 | + |
| 194 | +### 2. Verify Tables Created |
| 195 | + |
| 196 | +```sql |
| 197 | +SELECT table_name |
| 198 | +FROM information_schema.tables |
| 199 | +WHERE table_schema = 'public' |
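| 200 | +  AND table_name IN ('profilingrequest', 'profilingcommand', 'hostheartbeats', 'profilingexecutions', 'namespaceservices', |
| 201 | +                     'servicecontainers', 'jobcontainers', 'containerprocesses', 'containershosts', 'processeshosts'); |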
| 202 | +``` |
| 203 | + |
| 204 | +Expected tables: |
| 205 | +- profilingrequest |
| 206 | +- profilingcommand |
| 207 | +- hostheartbeats |
| 208 | +- profilingexecutions |
| 209 | +- namespaceservices |
| 210 | +- servicecontainers |
| 211 | +- jobcontainers |
| 212 | +- containerprocesses |
| 213 | +- containershosts |
| 214 | +- processeshosts |
| 215 | + |
| 216 | +### 3. Verify Indexes |
| 217 | + |
| 218 | +```sql |
| 219 | +SELECT tablename, indexname |
| 220 | +FROM pg_indexes |
| 221 | +WHERE tablename IN ('profilingrequest', 'profilingcommand', 'hostheartbeats', 'profilingexecutions') |
| 222 | +ORDER BY tablename, indexname; |
| 223 | +``` |
| 224 | + |
| 225 | +## API Models |
| 226 | + |
| 227 | +Python Pydantic models are available in: |
| 228 | +``` |
| 229 | +src/gprofiler/backend/models/dynamic_profiling_models.py |
| 230 | +``` |
| 231 | + |
| 232 | +### Key Models |
| 233 | + |
| 234 | +#### Request Models |
| 235 | +- `ProfilingRequestCreate` - Create new profiling request |
| 236 | +- `ProfilingRequestResponse` - Response with all fields |
| 237 | +- `ProfilingRequestUpdate` - Update request status |
| 238 | + |
| 239 | +#### Command Models |
| 240 | +- `ProfilingCommandCreate` - Create command for agent |
| 241 | +- `ProfilingCommandResponse` - Command details |
| 242 | +- `ProfilingCommandUpdate` - Update command status |
| 243 | + |
| 244 | +#### Heartbeat Models |
| 245 | +- `HostHeartbeatCreate` - Register/update host heartbeat |
| 246 | +- `HostHeartbeatResponse` - Heartbeat details |
| 247 | +- `HostHeartbeatUpdate` - Update heartbeat timestamp |
| 248 | + |
| 249 | +#### Execution Models |
| 250 | +- `ProfilingExecutionCreate` - Create audit entry |
| 251 | +- `ProfilingExecutionResponse` - Execution details |
| 252 | +- `ProfilingExecutionUpdate` - Update execution status |
| 253 | + |
| 254 | +#### Query Models |
| 255 | +- `ProfilingRequestQuery` - Filter profiling requests |
| 256 | +- `HostHeartbeatQuery` - Filter heartbeats |
| 257 | +- `ProfilingExecutionQuery` - Filter executions |
| 258 | + |
| 259 | +## Usage Examples |
| 260 | + |
| 261 | +### Creating a Profiling Request |
| 262 | + |
| 263 | +```python |
| 264 | +from dynamic_profiling_models import ProfilingRequestCreate, ProfilingMode |
| 265 | +from datetime import datetime, timedelta, timezone |
| 266 | + |
| 267 | +# Profile all containers in a service |
| 268 | +request = ProfilingRequestCreate( |
| 269 | +    service_name="web-api", |
| 270 | +    profiling_mode=ProfilingMode.CPU, |
| 271 | +    duration_seconds=60, |
| 272 | +    sample_rate=100, |
| 273 | +    start_time=datetime.now(timezone.utc), |
| 274 | +    stop_time=datetime.now(timezone.utc) + timedelta(seconds=60) |
| 275 | +) |
| 276 | +``` |
| 277 | + |
| 278 | +### Recording a Host Heartbeat |
| 279 | + |
| 280 | +```python |
| 281 | +from dynamic_profiling_models import HostHeartbeatCreate |
| 282 | + |
| 283 | +heartbeat = HostHeartbeatCreate( |
| 284 | +    host_id="host-12345", |
| 285 | +    host_name="worker-node-01", |
| 286 | +    host_ip="10.0.1.42", |
| 287 | +    service_name="web-api", |
| 288 | +    namespace="production", |
| 289 | +    containers=["web-api-container-1", "web-api-container-2"], |
| 290 | +    jobs=["data-processing-job"], |
| 291 | +    executors=["pyspy", "perf"] |
| 292 | +) |
| 293 | +``` |
| 294 | + |
| 295 | +### Querying Active Hosts |
| 296 | + |
| 297 | +```python |
| 298 | +from dynamic_profiling_models import HostHeartbeatQuery |
| 299 | +from datetime import datetime, timedelta, timezone |
| 300 | + |
| 301 | +# Find hosts seen in the last 5 minutes in the production namespace |
| 302 | +query = HostHeartbeatQuery( |
| 303 | +    namespace="production", |
| 304 | +    last_seen_after=datetime.now(timezone.utc) - timedelta(minutes=5), |
| 305 | +    limit=100 |
| 306 | +) |
| 307 | +``` |
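
The semantics of such a query can be illustrated without a database. The sketch below applies the same three filters to a list of dictionaries; the function `filter_active_hosts`, the dictionary layout, and the newest-first ordering are assumptions made for the example, not behavior confirmed by the backend.

```python
from datetime import datetime, timedelta, timezone


def filter_active_hosts(heartbeats, namespace, last_seen_after, limit):
    """Return up to `limit` heartbeats in `namespace` seen after the cutoff."""
    matches = [
        hb for hb in heartbeats
        if hb["namespace"] == namespace
        and hb["timestamp_last_seen"] > last_seen_after
    ]
    # Most recently seen hosts first (an assumed ordering).
    matches.sort(key=lambda hb: hb["timestamp_last_seen"], reverse=True)
    return matches[:limit]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed time for the demo
heartbeats = [
    {"host_id": "host-1", "namespace": "production",
     "timestamp_last_seen": now - timedelta(minutes=1)},
    {"host_id": "host-2", "namespace": "production",
     "timestamp_last_seen": now - timedelta(minutes=10)},
    {"host_id": "host-3", "namespace": "staging",
     "timestamp_last_seen": now - timedelta(minutes=2)},
]
active = filter_active_hosts(
    heartbeats, "production", now - timedelta(minutes=5), limit=100
)
active_ids = [hb["host_id"] for hb in active]
```

Here only `host-1` qualifies: `host-2` is stale and `host-3` is in a different namespace.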
| 308 | + |
| 309 | +## Performance Considerations |
| 310 | + |
| 311 | +### Heartbeat Optimization (165k QPM Target) |
| 312 | + |
| 313 | +1. **Use Connection Pooling**: Minimize connection overhead |
| 314 | +2. **Batch Updates**: Update multiple heartbeats in a single transaction |
| 315 | +3. **Index Usage**: Queries should use `host_id` or `timestamp_last_seen` indexes |
| 316 | +4. **Avoid Full Scans**: Always filter on indexed columns |
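
The batching advice can be sketched as a single-transaction upsert. SQLite (standard library) stands in for PostgreSQL so the example is runnable as written; the table is trimmed to two columns, `host_id` is made the primary key for simplicity, and `flush_heartbeats` is a hypothetical helper.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the table is trimmed to two columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hostheartbeats ("
    "host_id TEXT PRIMARY KEY, timestamp_last_seen TEXT NOT NULL)"
)

UPSERT_SQL = (
    "INSERT INTO hostheartbeats (host_id, timestamp_last_seen) VALUES (?, ?) "
    "ON CONFLICT (host_id) DO UPDATE SET "
    "timestamp_last_seen = excluded.timestamp_last_seen"
)


def flush_heartbeats(batch):
    """Write a batch of (host_id, timestamp) pairs in one transaction."""
    with conn:  # a single commit for the whole batch
        conn.executemany(UPSERT_SQL, batch)


flush_heartbeats([
    ("host-1", "2024-01-01T12:00:00Z"),
    ("host-2", "2024-01-01T12:00:01Z"),
])
flush_heartbeats([("host-1", "2024-01-01T12:00:30Z")])  # updates the existing row
rows = conn.execute(
    "SELECT host_id, timestamp_last_seen FROM hostheartbeats ORDER BY host_id"
).fetchall()
```

Grouping writes this way pays the commit cost once per batch instead of once per heartbeat. How large a batch can be before heartbeat latency suffers would need to be measured against the real workload.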
| 317 | + |
| 318 | +### Query Performance |
| 319 | + |
| 320 | +1. **Use Denormalized Tables**: Mapping tables eliminate expensive JOINs |
| 321 | +2. **Limit Result Sets**: Always use pagination with `limit` and `offset` |
| 322 | +3. **Index Coverage**: Ensure queries can use existing indexes |
| 323 | +4. **Monitor Query Plans**: Use `EXPLAIN ANALYZE` to verify performance |
| 324 | + |
| 325 | +### Maintenance |
| 326 | + |
| 327 | +1. **Regular VACUUM**: Prevent table bloat on frequently updated tables |
| 328 | +2. **Analyze Statistics**: Keep query planner statistics up to date |
| 329 | +3. **Monitor Index Usage**: Check `pg_stat_user_indexes` |
| 330 | +4. **Archive Old Data**: Consider partitioning or archiving old executions |
| 331 | + |
| 332 | +## Migration Path |
| 333 | + |
| 334 | +### Phase 1: Schema Deployment (Current) |
| 335 | +✅ Create database tables |
| 336 | +✅ Create Python models |
| 337 | +✅ Add indexes and constraints |
| 338 | + |
| 339 | +### Phase 2: API Endpoints (Next) |
| 340 | +- POST /api/profiling/requests |
| 341 | +- GET /api/profiling/requests |
| 342 | +- POST /api/profiling/heartbeats |
| 343 | +- GET /api/profiling/heartbeats |
| 344 | +- GET /api/profiling/executions |
| 345 | + |
| 346 | +### Phase 3: Agent Integration |
| 347 | +- Agent heartbeat loop |
| 348 | +- Command polling/pulling |
| 349 | +- Execution reporting |
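
One possible shape for the agent side is sketched below. It assumes that a pending command, if any, is returned in the heartbeat response; the document does not confirm that design, and `run_heartbeat_loop` and both callback parameters are hypothetical.

```python
import time


def run_heartbeat_loop(send_heartbeat, handle_command, interval_seconds, max_iterations):
    """Send heartbeats at a fixed interval and act on any returned command.

    `send_heartbeat` returns a command dict or None; both callables are
    hypothetical hooks supplied by the caller.
    """
    handled = []
    for _ in range(max_iterations):
        command = send_heartbeat()
        if command is not None:
            handle_command(command)
            handled.append(command["command_id"])
        time.sleep(interval_seconds)
    return handled


# Demo with fake hooks: the second heartbeat returns a "start" command.
responses = iter([None, {"command_id": "cmd-1", "command_type": "start"}, None])
executed = []
handled_ids = run_heartbeat_loop(
    send_heartbeat=lambda: next(responses),
    handle_command=executed.append,
    interval_seconds=0,
    max_iterations=3,
)
```

A real agent would loop indefinitely, handle network errors, and report execution results; the bounded loop and injected callables are only there to keep the sketch self-contained.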
| 350 | + |
| 351 | +### Phase 4: Request Resolution |
| 352 | +- Map requests to hosts |
| 353 | +- Generate host commands |
| 354 | +- Track execution lifecycle |
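
The "generate host commands" step might look roughly like the following, which emits one `start` command per host from an already-resolved host-to-containers map. The function `build_host_commands` and the layout of the `command_json` payload are assumptions; only the field names come from the `ProfilingCommand` table described earlier.

```python
import json
import uuid


def build_host_commands(request_id, targets_by_host, duration_seconds, sample_rate):
    """Create one 'start' command per host from resolved targets."""
    commands = []
    for host_id, containers in sorted(targets_by_host.items()):
        # Hypothetical payload layout for the agent.
        payload = {
            "duration_seconds": duration_seconds,
            "sample_rate": sample_rate,
            "target_containers": containers,
        }
        commands.append({
            "command_id": str(uuid.uuid4()),
            "profiling_request_id": request_id,
            "host_id": host_id,
            "target_containers": containers,
            "command_type": "start",
            "command_json": json.dumps(payload, sort_keys=True),
        })
    return commands


commands = build_host_commands(
    request_id="req-1",
    targets_by_host={"host-b": ["c2"], "host-a": ["c1", "c3"]},
    duration_seconds=60,
    sample_rate=100,
)
```

Emitting exactly one command per host keeps the agent's lookup by `host_id` simple, since each heartbeat needs to find at most one pending command for a given request.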
| 355 | + |
| 356 | +## References |
| 357 | + |
| 358 | +- Google Doc: [Dynamic Profiling Design](https://docs.google.com/document/d/1iwA_NN1YKDBqfig95Qevw0HcSCqgu7_ya8PGuCksCPc/edit) |
| 359 | +- SQL Schema: `scripts/setup/postgres/dynamic_profiling_schema.sql` |
| 360 | +- Python Models: `src/gprofiler/backend/models/dynamic_profiling_models.py` |
| 361 | + |
| 362 | +## Contributing to Intel Open Source |
| 363 | + |
| 364 | +This implementation is part of gProfiler Performance Studio's contribution to Intel's open source initiative. The dynamic profiling capability will enable: |
| 365 | + |
| 366 | +1. **Hierarchical Profiling**: Profile at any level (namespace, service, job, container, process, host) |
| 367 | +2. **Scalability**: Support 165k QPM with sub-second response times |
| 368 | +3. **Auditability**: Complete execution history for compliance and troubleshooting |
| 369 | +4. **Flexibility**: Extensible architecture for future profiling modes |
| 370 | + |
| 371 | +## License |
| 372 | + |
| 373 | +Copyright (C) 2023 Intel Corporation |
| 374 | + |
| 375 | +Licensed under the Apache License, Version 2.0 (the "License"); |
| 376 | +you may not use this file except in compliance with the License. |
| 377 | +You may obtain a copy of the License at |
| 378 | + |
| 379 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 380 | + |
| 381 | +Unless required by applicable law or agreed to in writing, software |
| 382 | +distributed under the License is distributed on an "AS IS" BASIS, |
| 383 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 384 | +See the License for the specific language governing permissions and |
| 385 | +limitations under the License. |
| 386 | + |
| 387 | + |
| 388 | + |
| 389 | + |