Commit 7a1c288

feat: add support for multiple TCX file inputs
Enable processing multiple TCX workout files with automatic sample merging and conflict resolution. Includes foundation for upcoming Final Cut Pro timeline synchronization.
1 parent 97fb293 commit 7a1c288

8 files changed: +131581 -21 lines changed

docs/conversations/2025-05-25-sample-index-data-structure.md

Lines changed: 559 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
---
status: accepted
date: 2025-06-02
decision-makers: @limulus
---

# Sample Index Data Structure for Timeline Synchronization

## Context and Problem Statement

The tcx2webvtt tool needs to synchronize TCX workout data with video editor timeline exports from Final Cut Pro. The challenge is to efficiently map timestamped workout samples to video clips, where:

- Multiple TCX files may be involved (different clips from different workouts)
- Clips may span multiple workouts
- Workouts may overlap (e.g., multiple devices recording the same activity)
- Samples need to be extracted in sequential order for WebVTT output
- Only samples that correspond to actual video time should be included

How should we structure the sample data to enable efficient range queries and sequential output generation?

## Decision Drivers

- **Efficient range queries**: Need to quickly find all samples within a time range
- **Sequential access**: Must iterate through samples in chronological order for WebVTT generation
- **Multiple source handling**: Support samples from multiple TCX files with potential overlaps
- **Merge capability**: Handle conflicts when multiple workouts provide the same metric at the same time
- **Simplicity**: Keep the implementation straightforward and maintainable

## Considered Options

- **Heap-based structure**: Store all samples in a min-heap sorted by timestamp
- **Sorted array with binary search**: Maintain a sorted array of samples with binary search for range queries
- **Time-indexed map**: Use a Map keyed by timestamp with grouped samples

## Decision Outcome

Chosen option: "Sorted array with binary search", because it provides the best balance of query performance and implementation simplicity for our use case.

### Consequences

- Good, because binary search enables O(log n) lookup for range boundaries
- Good, because sorted arrays provide efficient sequential iteration
- Good, because the structure is simple to understand and maintain
- Bad, because insertions require re-sorting (but this happens only during initial load)
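
The re-sorting cost noted in the last consequence can be confined to a single pass at load time. The following is a minimal sketch of that idea, not the tool's actual code: `LoadedSample` and `loadAndSortOnce` are hypothetical names, and the sample shape is simplified to a numeric value.

```typescript
// Hypothetical, simplified sample shape (the project's Sample class is generic).
interface LoadedSample {
  time: Date
  metric: string
  value: number
}

// Gather the samples of every workout first, then sort exactly once.
// One sort costs O(n log n); keeping the array sorted after every single
// insertion would cost O(n) per sample instead.
function loadAndSortOnce(workouts: LoadedSample[][]): LoadedSample[] {
  const all: LoadedSample[] = []
  for (const samples of workouts) {
    for (const sample of samples) {
      all.push(sample)
    }
  }
  all.sort((a, b) => a.time.getTime() - b.time.getTime())
  return all
}
```

Because every file is known before any query runs, nothing is inserted after this point and the array never needs sorting again.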

## Pros and Cons of the Options

### Heap-based structure

A min-heap that maintains samples in priority order by timestamp.

- Good, because extracting the minimum element is efficient
- Bad, because it cannot efficiently query ranges without extracting all elements
- Bad, because it offers no random access for finding samples at specific times
- Bad, because iterating in order would require extracting and re-inserting elements

### Sorted array with binary search

Maintain all samples in a sorted array and use binary search for range queries.

```typescript
class SampleIndex {
  private samples: IndexedSample[] = []
  private sortedTimestamps: number[] = []

  addWorkout(filename: string, samples: Sample[], ignoredMetrics?: Set<SampleMetric>) {
    // Add samples and maintain sorted order
  }

  getSamplesInRange(start: Date, end: Date): Sample[] {
    // Binary search to find range boundaries
  }
}
```

- Good, because O(log n) time complexity for finding range boundaries
- Good, because simple sequential iteration through ranges
- Good, because memory-efficient (just arrays)
- Neutral, because requires sorting after bulk insertions
- Bad, because O(n) insertion time if adding samples incrementally (not an issue for our batch loading)
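
To make the range query concrete, here is one way the boundary search could look. It is a sketch under stated assumptions rather than the `SampleIndex` implementation: `TimedSample`, `lowerBound`, and `samplesInRange` are hypothetical names, and the half-open interval `[start, end)` is an assumption, since the ADR does not say whether the end of a range is inclusive.

```typescript
// Hypothetical stand-in for the project's Sample class.
interface TimedSample {
  time: Date
  metric: string
  value: number
}

// Index of the first sample whose time is >= targetMs,
// or sorted.length when every sample is earlier. Runs in O(log n).
function lowerBound(sorted: readonly TimedSample[], targetMs: number): number {
  let lo = 0
  let hi = sorted.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (sorted[mid]!.time.getTime() < targetMs) {
      lo = mid + 1
    } else {
      hi = mid
    }
  }
  return lo
}

// All samples with start <= time < end (assumed half-open), in chronological order.
function samplesInRange(sorted: readonly TimedSample[], start: Date, end: Date): TimedSample[] {
  const from = lowerBound(sorted, start.getTime())
  const to = lowerBound(sorted, end.getTime())
  // slice returns an empty array when from >= to, e.g. for an inverted range.
  return sorted.slice(from, to)
}
```

Two binary searches locate the boundaries, and the slice between them is already in output order, which is the property the WebVTT generation step depends on.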

### Time-indexed map

Use a Map keyed by timestamp in milliseconds, with each entry grouping the samples at that instant by metric.

```typescript
private samplesByTime: Map<number, Map<SampleMetric, Sample[]>> = new Map()
```

- Good, because O(1) lookup for exact timestamps
- Good, because natural grouping of samples at the same time
- Bad, because requires additional index for sorted iteration
- Bad, because more complex data structure to maintain
- Bad, because higher memory overhead
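
The "additional index" drawback follows from how a JavaScript `Map` iterates: in insertion order, not key order. The sketch below illustrates this with a flattened, hypothetical variant of the map (string labels instead of the nested per-metric map); `labelsByTime` and `addLabel` are illustrative names only.

```typescript
// Hypothetical simplification: one list of labels per timestamp in milliseconds.
const labelsByTime = new Map<number, string[]>()

function addLabel(timeMs: number, label: string): void {
  const bucket = labelsByTime.get(timeMs)
  if (bucket) {
    bucket.push(label)
  } else {
    labelsByTime.set(timeMs, [label])
  }
}

// Samples arrive out of chronological order, as they may across several files.
addLabel(3000, "heartRate=142")
addLabel(1000, "heartRate=140")
addLabel(3000, "cadence=88")

// Exact-timestamp lookup is a single hash access.
const atThreeSeconds = labelsByTime.get(3000) ?? []

// Iteration follows insertion order, so the keys come back as 3000, 1000.
const insertionOrder = Array.from(labelsByTime.keys())

// Chronological output needs a separately sorted key list: 1000, 3000.
const chronological = Array.from(labelsByTime.keys()).sort((a, b) => a - b)
```

That sorted key list is effectively the sorted array of the chosen option, maintained alongside the map.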

## More Information

The implementation will include:

1. **Conflict Resolution**: When multiple workouts provide the same metric at the same timestamp, the tool will support:

   - Ignoring specific metrics from certain files via CLI options
   - Merging samples with appropriate conflict resolution

2. **Sample Structure**: The existing Sample class structure (one metric per Sample object) will be preserved:

   ```typescript
   export class Sample<T extends SampleMetric = SampleMetric> {
     constructor(
       public readonly time: Date,
       public readonly metric: T,
       public readonly value: SampleValue<T>
     ) {}
   }
   ```

3. **CLI Interface**: Support for specifying ignored metrics per file:
   ```bash
   tcx2webvtt --fcp project.fcpxmld \
     workout1.tcx \
     workout2.tcx:ignore=heartRate,cadence \
     workout3.tcx:ignore=location
   ```
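
The `path:ignore=metric,metric` form in the CLI example could be split into a path and a set of ignored metrics roughly as follows. This is a hypothetical parser written for illustration; `WorkoutInput` and `parseWorkoutArg` are assumed names, metric names are kept as plain strings rather than validated against `SampleMetric`, and the tool's real argument handling may differ.

```typescript
// Hypothetical result shape for one positional argument.
interface WorkoutInput {
  path: string
  ignoredMetrics: Set<string>
}

// Splits "file.tcx:ignore=a,b" into a path and the metrics to ignore.
// Searching for the full ":ignore=" marker, rather than the first colon,
// keeps paths that contain colons themselves (e.g. Windows drive letters) intact.
function parseWorkoutArg(arg: string): WorkoutInput {
  const marker = ":ignore="
  const at = arg.lastIndexOf(marker)
  if (at === -1) {
    return { path: arg, ignoredMetrics: new Set<string>() }
  }
  const metrics = arg
    .slice(at + marker.length)
    .split(",")
    .map((metric) => metric.trim())
    .filter((metric) => metric.length > 0)
  return { path: arg.slice(0, at), ignoredMetrics: new Set<string>(metrics) }
}
```

The resulting set maps directly onto the optional `ignoredMetrics` parameter of `addWorkout` shown earlier in this document.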

This design prioritizes query performance and simplicity, which aligns with the tool's primary use case of sequentially processing video timeline clips and extracting corresponding workout samples.
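
The conflict-resolution item above leaves the merge policy open. As one possible reading, the sketch below drops each file's ignored metrics and then, when two workouts still report the same metric at the same timestamp, keeps the sample from the workout listed first. That first-wins rule is purely an assumption for illustration, as are the names `MergeSample`, `WorkoutSource`, and `mergeWorkouts`.

```typescript
// Hypothetical, simplified sample and workout shapes.
interface MergeSample {
  time: Date
  metric: string
  value: number
}

interface WorkoutSource {
  samples: MergeSample[]
  ignoredMetrics: Set<string>
}

// Merges workouts into one chronologically sorted array.
// Assumed policy: for an identical (timestamp, metric) pair, the earliest-listed workout wins.
function mergeWorkouts(workouts: WorkoutSource[]): MergeSample[] {
  const seen = new Set<string>()
  const merged: MergeSample[] = []
  for (const workout of workouts) {
    for (const sample of workout.samples) {
      // Per-file ignore list, as supplied on the command line.
      if (workout.ignoredMetrics.has(sample.metric)) continue
      const key = `${sample.time.getTime()}|${sample.metric}`
      // A sample for this metric and instant was already accepted.
      if (seen.has(key)) continue
      seen.add(key)
      merged.push(sample)
    }
  }
  merged.sort((a, b) => a.time.getTime() - b.time.getTime())
  return merged
}
```

Other policies, such as averaging the conflicting values or preferring a particular device, would fit the same structure by changing only the branch that handles an already-seen key.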
