
Commit 86e6ca7

Add dynamic profiling data models and PostgreSQL schemas
- Add PostgreSQL schema for dynamic profiling tables
- Add Python Pydantic models for API requests/responses
- Add comprehensive documentation for dynamic profiling
- Add test scripts for profiling models
- Add unit tests for profiling host status

This contribution adds the foundational data model for Intel's dynamic profiling feature, including database schemas and API models.
1 parent ebfd0ab commit 86e6ca7

8 files changed: +3106 -0 lines changed
.gitignore

Lines changed: 4 additions & 0 deletions
@@ -70,3 +70,7 @@ src/gprofiler/frontend/.vscode/settings.json
 
 # package build
 /src/gprofiler-dev/build
+
+# Testing and temporary files
+ROUTE_CHANGE/
+TEMP_FILES/

docs/DYNAMIC_PROFILING.md

Lines changed: 389 additions & 0 deletions
@@ -0,0 +1,389 @@
# Dynamic Profiling Data Model

## Overview

The Dynamic Profiling feature maps profiling requests made at various hierarchy levels (service, job, namespace) to specific host-level commands, while keeping heartbeat response times under one second at 165k QPM (queries per minute, about 2,750 per second).

## Architecture

The dynamic profiling system consists of:

1. **Profiling Requests** - API-level requests specifying targets at various hierarchy levels
2. **Profiling Commands** - Host-specific commands sent to agents
3. **Host Heartbeats** - Real-time host availability tracking (optimized for 165k QPM)
4. **Profiling Executions** - Audit trail of profiling executions
5. **Hierarchical Mappings** - Denormalized tables for fast query performance

## Data Model

```
┌─────────────────────────────────────────────────────────────────┐
│                  DYNAMIC PROFILING DATA MODEL                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────────┐         ┌──────────────────────┐          │
│  │ ProfilingRequest │────────▶│ ProfilingCommand     │          │
│  │                  │         │                      │          │
│  │ - Request ID     │         │ - Command ID         │          │
│  │ - Service/Job/   │         │ - Host ID (indexed)  │          │
│  │   Namespace/Pod  │         │ - Target Containers  │          │
│  │ - Duration       │         │ - Target Processes   │          │
│  │ - Sample Rate    │         │ - Command Type       │          │
│  │ - Status         │         │ - Status             │          │
│  └──────────────────┘         └──────────────────────┘          │
│           │                              │                      │
│           │                              │                      │
│           ▼                              ▼                      │
│  ┌──────────────────────┐       ┌─────────────────┐             │
│  │ ProfilingExecutions  │       │ HostHeartbeats  │             │
│  │ (Audit Trail)        │       │                 │             │
│  │                      │       │ - Host ID       │             │
│  │ - Execution ID       │       │ - Service       │             │
│  │ - Request/Command    │       │ - Containers    │             │
│  │ - Host Name          │       │ - Workloads     │             │
│  │ - Status             │       │ - Last Seen ⚡  │             │
│  └──────────────────────┘       └─────────────────┘             │
│                                                                 │
│  ┌────────────────────────────────────────────────────────┐     │
│  │              HIERARCHICAL MAPPING TABLES               │     │
│  │       (Denormalized for Fast Query Performance)        │     │
│  ├────────────────────────────────────────────────────────┤     │
│  │                                                        │     │
│  │  • NamespaceServices  - Namespace → Service            │     │
│  │  • ServiceContainers  - Service → Container            │     │
│  │  • JobContainers      - Job → Container                │     │
│  │  • ContainerProcesses - Container → Process            │     │
│  │  • ContainersHosts    - Container → Host               │     │
│  │  • ProcessesHosts     - Process → Host                 │     │
│  │                                                        │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Key Features

### 1. Hierarchical Request Mapping

Requests can target any level of the hierarchy:
- **Namespace Level**: Profile all services in a namespace
- **Service Level**: Profile all containers in a service
- **Job Level**: Profile specific job workloads
- **Container Level**: Profile specific containers
- **Process Level**: Profile specific processes
- **Host Level**: Profile specific hosts

### 2. Sub-Second Heartbeat Performance

The `HostHeartbeats` table is optimized for 165k QPM:
- Indexed on `host_id` for fast point lookups (O(log n) with PostgreSQL's default B-tree index type)
- Indexed on `timestamp_last_seen` for quick staleness checks
- Partial indexes on `service_name` and `namespace` for filtered queries
- Lightweight updates with minimal locking
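
As a rough illustration of the staleness check that the `timestamp_last_seen` index is meant to support, the sketch below filters hosts by last heartbeat time in plain Python. The five-minute threshold and the names `is_host_stale` and `active_hosts` are assumptions made for this example; they are not part of the schema or the Pydantic models.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed threshold for illustration; the real value is a deployment decision.
DEFAULT_MAX_AGE = timedelta(minutes=5)


def is_host_stale(
    timestamp_last_seen: datetime,
    now: datetime,
    max_age: timedelta = DEFAULT_MAX_AGE,
) -> bool:
    """Return True when the last heartbeat is older than max_age."""
    return now - timestamp_last_seen > max_age


def active_hosts(
    last_seen_by_host: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = DEFAULT_MAX_AGE,
) -> List[str]:
    """Return the sorted host IDs whose heartbeat is still fresh."""
    return sorted(
        host_id
        for host_id, last_seen in last_seen_by_host.items()
        if not is_host_stale(last_seen, now, max_age)
    )
```

In the database the same filter would be a range condition on `timestamp_last_seen`, which is why that column is indexed.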

### 3. Audit Trail

The `ProfilingExecutions` table maintains a complete audit history:
- Links to original requests and commands
- Tracks execution lifecycle
- Enables troubleshooting and compliance

### 4. Denormalized Mappings

Hierarchical mapping tables trade storage for query speed:
- Pre-computed relationships
- Eliminates complex JOINs
- Enables fast request-to-host resolution
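
To make request-to-host resolution concrete, here is a minimal sketch that walks pre-computed mappings from a namespace or service down to hosts. The dictionaries are hypothetical in-memory stand-ins for the `NamespaceServices`, `ServiceContainers`, and `ContainersHosts` tables, and the sample names are invented; the actual resolution logic is not part of this commit.

```python
from typing import Dict, Set

# Hypothetical stand-ins for the denormalized mapping tables.
NAMESPACE_SERVICES: Dict[str, Set[str]] = {
    "production": {"web-api", "billing"},
}
SERVICE_CONTAINERS: Dict[str, Set[str]] = {
    "web-api": {"web-api-1", "web-api-2"},
    "billing": {"billing-1"},
}
CONTAINERS_HOSTS: Dict[str, Set[str]] = {
    "web-api-1": {"host-1"},
    "web-api-2": {"host-2"},
    "billing-1": {"host-2"},
}


def hosts_for_service(service_name: str) -> Set[str]:
    """Resolve a service to the hosts running any of its containers."""
    hosts: Set[str] = set()
    for container in SERVICE_CONTAINERS.get(service_name, set()):
        hosts |= CONTAINERS_HOSTS.get(container, set())
    return hosts


def hosts_for_namespace(namespace: str) -> Set[str]:
    """Resolve a namespace to hosts by expanding each of its services."""
    hosts: Set[str] = set()
    for service_name in NAMESPACE_SERVICES.get(namespace, set()):
        hosts |= hosts_for_service(service_name)
    return hosts
```

Each level is a direct key lookup, which is the property the denormalized tables are designed to provide.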

## Database Tables

### Core Tables

#### ProfilingRequest
Stores profiling requests from API calls.

**Key Fields:**
- `request_id` (UUID) - Unique identifier
- `service_name`, `job_name`, `namespace`, etc. - Target specification
- `profiling_mode` - CPU, memory, allocation, native
- `duration_seconds` - How long to profile
- `sample_rate` - Sampling frequency (1-1000)
- `status` - pending, in_progress, completed, failed, cancelled

**Constraints:**
- At least one target specification must be provided
- Duration must be positive
- Sample rate must be between 1 and 1000
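
The three constraints above can be mirrored at the application layer. The sketch below uses a standard-library dataclass rather than the actual Pydantic models, so the class name `ProfilingRequestSketch` and its reduced field set are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfilingRequestSketch:
    """Hypothetical, reduced stand-in for a profiling request."""

    duration_seconds: int
    sample_rate: int
    service_name: Optional[str] = None
    job_name: Optional[str] = None
    namespace: Optional[str] = None

    def __post_init__(self) -> None:
        # Constraint 1: at least one target specification.
        if not any((self.service_name, self.job_name, self.namespace)):
            raise ValueError("at least one target specification must be provided")
        # Constraint 2: positive duration.
        if self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        # Constraint 3: sample rate within 1..1000 inclusive.
        if not 1 <= self.sample_rate <= 1000:
            raise ValueError("sample_rate must be between 1 and 1000")
```

Validating in the model as well as in the table lets the API reject a bad request before it reaches the database.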

#### ProfilingCommand
Commands sent to agents on specific hosts.

**Key Fields:**
- `command_id` (UUID) - Unique identifier
- `profiling_request_id` - Links to original request
- `host_id` - Target host (indexed for fast lookup)
- `target_containers`, `target_processes` - Specific targets
- `command_type` - start, stop, reconfigure
- `command_json` - Serialized command for agent
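
As one possible way to assemble such a record, the sketch below builds a command dictionary for a single host and serializes the agent payload into `command_json`. The function name `build_command` and the payload layout are assumptions; the source does not specify what the agent expects inside `command_json`.

```python
import json
import uuid
from typing import Any, Dict, List

# Command types listed in the field description above.
VALID_COMMAND_TYPES = {"start", "stop", "reconfigure"}


def build_command(
    profiling_request_id: str,
    host_id: str,
    target_containers: List[str],
    command_type: str = "start",
) -> Dict[str, Any]:
    """Build a hypothetical ProfilingCommand row for one host."""
    if command_type not in VALID_COMMAND_TYPES:
        raise ValueError(f"unsupported command_type: {command_type!r}")
    # Assumed payload shape, for illustration only.
    payload = {
        "command_type": command_type,
        "target_containers": target_containers,
    }
    return {
        "command_id": str(uuid.uuid4()),
        "profiling_request_id": profiling_request_id,
        "host_id": host_id,
        "target_containers": target_containers,
        "command_type": command_type,
        "command_json": json.dumps(payload, sort_keys=True),
    }
```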

#### HostHeartbeats
Tracks host availability and status.

**Key Fields:**
- `host_id` - Unique host identifier (indexed)
- `host_name`, `host_ip` - Host details
- `service_name`, `namespace` - Contextual info
- `containers`, `jobs`, `workloads` - Current state
- `timestamp_last_seen` - Last heartbeat (indexed)
- `last_command_id` - Last command received

**Performance:**
- Designed for 165k QPM
- Sub-second response times
- Optimized indexes for common queries

#### ProfilingExecutions
Audit trail of profiling executions.

**Key Fields:**
- `execution_id` (UUID) - Unique identifier
- `profiling_request_id` - Original request
- `profiling_command_id` - Executed command
- `host_name` - Where it executed
- `started_at`, `completed_at` - Execution timeline
- `status` - Execution status

### Mapping Tables

All mapping tables follow a similar pattern:
- Primary key (ID)
- Mapping fields (indexed)
- Timestamps (created_at, updated_at)
- Unique constraint on the mapping

#### NamespaceServices
Maps namespaces → services

#### ServiceContainers
Maps services → containers

#### JobContainers
Maps jobs → containers

#### ContainerProcesses
Maps containers → processes (includes process name)

#### ContainersHosts
Maps containers → hosts

#### ProcessesHosts
Maps processes → hosts

## Setup and Installation

### 1. Apply Database Schema

```bash
# Connect to PostgreSQL and run the schema file
# (equivalent to running \i <file> at the psql prompt)
psql -U postgres -d gprofiler -f scripts/setup/postgres/dynamic_profiling_schema.sql
```

### 2. Verify Tables Created

```sql
-- Parentheses keep the schema filter applied to every name condition
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
  AND (table_name LIKE '%profiling%'
       OR table_name LIKE '%heartbeat%'
       OR table_name IN ('namespaceservices', 'servicecontainers', 'jobcontainers',
                         'containerprocesses', 'containershosts', 'processeshosts'));
```

Expected tables:
- profilingrequest
- profilingcommand
- hostheartbeats
- profilingexecutions
- namespaceservices
- servicecontainers
- jobcontainers
- containerprocesses
- containershosts
- processeshosts

### 3. Verify Indexes

```sql
SELECT tablename, indexname
FROM pg_indexes
WHERE tablename IN ('profilingrequest', 'profilingcommand', 'hostheartbeats', 'profilingexecutions')
ORDER BY tablename, indexname;
```

## API Models

Python Pydantic models are available in:
```
src/gprofiler/backend/models/dynamic_profiling_models.py
```

### Key Models

#### Request Models
- `ProfilingRequestCreate` - Create new profiling request
- `ProfilingRequestResponse` - Response with all fields
- `ProfilingRequestUpdate` - Update request status

#### Command Models
- `ProfilingCommandCreate` - Create command for agent
- `ProfilingCommandResponse` - Command details
- `ProfilingCommandUpdate` - Update command status

#### Heartbeat Models
- `HostHeartbeatCreate` - Register/update host heartbeat
- `HostHeartbeatResponse` - Heartbeat details
- `HostHeartbeatUpdate` - Update heartbeat timestamp

#### Execution Models
- `ProfilingExecutionCreate` - Create audit entry
- `ProfilingExecutionResponse` - Execution details
- `ProfilingExecutionUpdate` - Update execution status

#### Query Models
- `ProfilingRequestQuery` - Filter profiling requests
- `HostHeartbeatQuery` - Filter heartbeats
- `ProfilingExecutionQuery` - Filter executions

## Usage Examples

### Creating a Profiling Request

```python
from datetime import datetime, timedelta

from dynamic_profiling_models import ProfilingMode, ProfilingRequestCreate

# Capture the time once so start_time and stop_time are exactly 60 seconds apart
start = datetime.utcnow()

# Profile all containers in a service
request = ProfilingRequestCreate(
    service_name="web-api",
    profiling_mode=ProfilingMode.CPU,
    duration_seconds=60,
    sample_rate=100,
    start_time=start,
    stop_time=start + timedelta(seconds=60),
)
```

### Recording a Host Heartbeat

```python
from dynamic_profiling_models import HostHeartbeatCreate

heartbeat = HostHeartbeatCreate(
    host_id="host-12345",
    host_name="worker-node-01",
    host_ip="10.0.1.42",
    service_name="web-api",
    namespace="production",
    containers=["web-api-container-1", "web-api-container-2"],
    jobs=["data-processing-job"],
    executors=["pyspy", "perf"],
)
```

### Querying Active Hosts

```python
from datetime import datetime, timedelta

from dynamic_profiling_models import HostHeartbeatQuery

# Find hosts seen in the last 5 minutes in the production namespace
query = HostHeartbeatQuery(
    namespace="production",
    last_seen_after=datetime.utcnow() - timedelta(minutes=5),
    limit=100,
)
```

## Performance Considerations

### Heartbeat Optimization (165k QPM Target)

1. **Use Connection Pooling**: Minimize connection overhead
2. **Batch Updates**: Update multiple heartbeats in a single transaction
3. **Index Usage**: Queries should use the `host_id` or `timestamp_last_seen` indexes
4. **Avoid Full Scans**: Always filter on indexed columns
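
For scale, 165k QPM works out to 2,750 heartbeats per second. The sketch below shows one way to group incoming heartbeats so that each group can be written in a single transaction. The helper name `batched` and any batch size you pass to it are assumptions for illustration; the database write itself is left out.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

HEARTBEATS_PER_MINUTE = 165_000
HEARTBEATS_PER_SECOND = HEARTBEATS_PER_MINUTE / 60  # 2750.0


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly shorter, batch.
        yield batch
```

A caller would open one transaction per yielded batch, trading a small amount of added latency for far fewer commits.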

### Query Performance

1. **Use Denormalized Tables**: Mapping tables eliminate expensive JOINs
2. **Limit Result Sets**: Always use pagination with `limit` and `offset`
3. **Index Coverage**: Ensure queries can use existing indexes
4. **Monitor Query Plans**: Use `EXPLAIN ANALYZE` to verify performance

### Maintenance

1. **Regular VACUUM**: Prevent table bloat on frequently updated tables
2. **Analyze Statistics**: Keep query planner statistics up to date
3. **Monitor Index Usage**: Check `pg_stat_user_indexes`
4. **Archive Old Data**: Consider partitioning or archiving old executions

## Migration Path

### Phase 1: Schema Deployment (Current)
- ✅ Create database tables
- ✅ Create Python models
- ✅ Add indexes and constraints

### Phase 2: API Endpoints (Next)
- POST /api/profiling/requests
- GET /api/profiling/requests
- POST /api/profiling/heartbeats
- GET /api/profiling/heartbeats
- GET /api/profiling/executions

### Phase 3: Agent Integration
- Agent heartbeat loop
- Command polling/pulling
- Execution reporting

### Phase 4: Request Resolution
- Map requests to hosts
- Generate host commands
- Track execution lifecycle

## References

- Google Doc: [Dynamic Profiling Design](https://docs.google.com/document/d/1iwA_NN1YKDBqfig95Qevw0HcSCqgu7_ya8PGuCksCPc/edit)
- SQL Schema: `scripts/setup/postgres/dynamic_profiling_schema.sql`
- Python Models: `src/gprofiler/backend/models/dynamic_profiling_models.py`

## Contributing to Intel Open Source

This implementation is part of gProfiler Performance Studio's contribution to Intel's open source initiative. The dynamic profiling capability will enable:

1. **Hierarchical Profiling**: Profile at any level (namespace, service, job, container, process, host)
2. **Scalability**: Support 165k QPM with sub-second response times
3. **Auditability**: Complete execution history for compliance and troubleshooting
4. **Flexibility**: Extensible architecture for future profiling modes

## License

Copyright (C) 2023 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
