
Commit 9f45947

Alpharelotas authored and committed
Add RFC189
1 parent 990f947 commit 9f45947


3 files changed, +133 -0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -67,3 +67,4 @@ See [mechanics](mechanics.md) for more detail.
| RFC#177 | [Skip CI in github integration](rfcs/0177-Skip-ci-integrations.md) |
| RFC#180 | [Github cancel previous tasks](rfcs/0180-Github-cancel-previous-tasks.md) |
| RFC#182 | [Allow remote references to .taskcluster.yml files processed by Taskcluster-GitHub](rfcs/0182-taskcluster-yml-remote-references.md) |
+| RFC#189 | [Batch APIs for task definition, status and index path](rfcs/0189-batch-task-apis.md) |

rfcs/0189-batch-task-apis.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# RFC 189 - Batch APIs for task definition, status and index path
* Comments: [#189](https://github.com/taskcluster/taskcluster-rfcs/pull/189)
* Proposed by: @Alphare and @ahal

# Summary

Add API endpoints that return the definitions or statuses of multiple tasks, or
the tasks indexed at multiple index paths, in a single call.

## Motivation

Profiles of Gecko Decision tasks showed that nearly 70% of the runtime (~3
minutes) was spent waiting on queries to two Taskcluster APIs:

1. `/task/<taskId>/status` (Queue)
2. `/task/<indexPath>` (Index)

Each individual call is fairly quick, but Gecko Taskgraph's optimization phase
can make thousands of these requests. An API that returns all the information
Taskgraph needs in a handful of requests would greatly reduce both the time the
Queue and Index services spend looking things up in the database and the time
Gecko Decision tasks spend waiting on the network.

Note: Taskgraph doesn't actually use the `/task/<taskId>` endpoint here, but
that endpoint is closely related to the other two, so for consistency it may
make sense to implement a batch API for it as well.

### Proof of Concept

A proof of concept was created in which the requests to Taskcluster were
simulated so that all data could be obtained in a single API call. This reduced
the overall Decision task time by ~3 minutes.

# Details

The following new APIs will be created:

## `queue.tasks([<taskId>])`

- Endpoint: `/tasks`
- HTTP GET:
  - Request body consisting of a JSON object:
    ```
    {
      "taskIds": [<taskId>]
    }
    ```
  - Response body:
    ```
    {
      "tasks": {
        <taskId>: <same format as `queue.task(<taskId>)`>
      },
      "continuationToken": <continuation token>
    }
    ```
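
As a minimal client-side sketch, assuming the request and response shapes above, the following Python builds a `/tasks` request body and reports which requested task IDs are absent from the returned `tasks` map. This RFC does not say how the service treats duplicate or unknown task IDs, so the deduplication and the "missing IDs" check are assumptions, and both helper names are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_tasks_request(task_ids: List[str]) -> str:
    """Serialize the proposed `/tasks` request body (hypothetical helper)."""
    # Drop duplicate IDs while keeping the caller's order; whether the
    # service would reject duplicates is an assumption, not part of the RFC.
    unique_ids = list(dict.fromkeys(task_ids))
    return json.dumps({"taskIds": unique_ids})


def parse_tasks_response(
    body: str, requested: List[str]
) -> Tuple[Dict[str, Any], List[str], Optional[str]]:
    """Return (tasks by ID, requested IDs absent from this page, continuation token)."""
    payload = json.loads(body)
    tasks: Dict[str, Any] = payload.get("tasks", {})
    # The response is keyed by taskId, so absent IDs are easy to spot.
    # On a paginated response, an "absent" ID may simply be on a later page.
    missing = [task_id for task_id in dict.fromkeys(requested) if task_id not in tasks]
    return tasks, missing, payload.get("continuationToken")
```

Keying the response by `taskId` (rather than returning a list) is what makes the missing-ID check a simple membership test.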

## `queue.statuses([<taskId>])`

- Endpoint: `/tasks/status`
- HTTP GET:
  - Request body consisting of a JSON object:
    ```
    {
      "taskIds": [<taskId>]
    }
    ```
  - Response body:
    ```
    {
      "statuses": {
        <taskId>: <same format as `queue.status(<taskId>)`>
      },
      "continuationToken": <continuation token>
    }
    ```
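
Because each value in `statuses` has the same shape as today's `queue.status(<taskId>)` response (a `status` object carrying a `state` field), a consumer can classify many tasks from a single response. The sketch below is illustrative only: it does not reproduce Taskgraph's actual optimization logic, and `group_by_state` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_state(statuses: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group task IDs from a `/tasks/status` response by their `state`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for task_id, entry in statuses.items():
        # Each value is assumed to match the single-task status response:
        # {"status": {"taskId": ..., "state": ..., "runs": [...], ...}}
        groups[entry["status"]["state"]].append(task_id)
    return dict(groups)
```

For example, `group_by_state(response["statuses"]).get("completed", [])` would list the tasks that finished successfully, without issuing one status request per task.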

## `index.findTasksAtIndexes([<indexPath>])`

- Endpoint: `/tasks/indexes`
- HTTP GET:
  - Request body consisting of a JSON object:
    ```
    {
      "indexes": [<indexPath>]
    }
    ```
  - Response body:
    ```
    {
      "tasks": [<same format as `index.findTask(<indexPath>)`>],
      "continuationToken": <continuation token>
    }
    ```
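
Unlike the two Queue endpoints, this response is a list rather than a map keyed by the request. Assuming each entry has the same shape as today's `index.findTask` response, which includes the `namespace` the task was found under, and assuming index paths with no task are simply omitted from the list, a client could rebuild a lookup keyed by index path as sketched below. The helper name is hypothetical, and the sketch presumes exact index paths (see the open questions).

```python
from typing import Any, Dict, List, Optional


def tasks_by_index_path(
    requested: List[str], found: List[Dict[str, Any]]
) -> Dict[str, Optional[str]]:
    """Map each requested index path to a taskId, or None if it was not found."""
    # Each entry is assumed to look like a `findTask` result:
    # {"namespace": ..., "taskId": ..., "rank": ..., "data": ..., "expires": ...}
    by_namespace = {entry["namespace"]: entry["taskId"] for entry in found}
    return {path: by_namespace.get(path) for path in requested}
```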

Each endpoint will return up to 1000 results. If a request matches more than
1000 results, the response will also include a `continuationToken` for
retrieving the rest.
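
A caller that needs every result would keep requesting until no `continuationToken` comes back. The RFC does not specify how the token is sent back to the service, so in this sketch `fetch_page` is a hypothetical callable standing in for the HTTP request, and `fake_fetch_page` is an in-memory stand-in that exists only to exercise the loop (real continuation tokens would be opaque strings, not offsets).

```python
from typing import Any, Callable, Dict, List, Optional

PAGE_LIMIT = 1000  # Per-response limit proposed in this RFC.

# Hypothetical transport: takes the request body and an optional continuation
# token, and returns one decoded response page.
FetchPage = Callable[[Dict[str, Any], Optional[str]], Dict[str, Any]]


def fetch_all_statuses(task_ids: List[str], fetch_page: FetchPage) -> Dict[str, Any]:
    """Follow continuation tokens until the full `statuses` map is collected."""
    body = {"taskIds": task_ids}
    results: Dict[str, Any] = {}
    token: Optional[str] = None
    while True:
        page = fetch_page(body, token)
        results.update(page.get("statuses", {}))
        token = page.get("continuationToken")
        if not token:
            return results


def fake_fetch_page(body: Dict[str, Any], token: Optional[str]) -> Dict[str, Any]:
    """In-memory stand-in for the service; uses the token as a numeric offset."""
    start = int(token) if token else 0
    page_ids = body["taskIds"][start : start + PAGE_LIMIT]
    page: Dict[str, Any] = {
        "statuses": {t: {"status": {"taskId": t, "state": "completed"}} for t in page_ids}
    }
    if start + PAGE_LIMIT < len(body["taskIds"]):
        page["continuationToken"] = str(start + PAGE_LIMIT)
    return page
```

With 2,500 task IDs, this loop makes three calls instead of 2,500 individual status requests.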

There are no compatibility or security concerns: the new APIs are essentially
wrappers around existing APIs.

## Open Questions

1. Do we bother implementing `/tasks` as well, even though Taskgraph wouldn't
   benefit much from it?
2. Should `/tasks/indexes` also allow listing multiple tasks under multiple
   namespaces, or should we require index paths that point to specific tasks?
3. Should we bother with continuation tokens, or simply set a limit and require
   consumers to chunk their own task IDs and index paths when they exceed it?
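
If the third option were chosen (a hard limit and no continuation tokens), each consumer would split its own inputs before calling the batch endpoints. A minimal sketch, assuming the limit stays at 1000; `chunked` is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH = 1000  # Assumed server-side limit, matching the 1000 results above.


def chunked(items: Sequence[T], size: int = MAX_BATCH) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start : start + size])
```

Each chunk would become one request body such as `{"taskIds": chunk}`, so 2,500 task IDs would take three requests. The trade-off is that the limit then becomes part of every client's contract, whereas continuation tokens keep the paging policy on the server.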

# Implementation

<Once the RFC is decided, these links will provide readers a way to track the
implementation through to completion, and to know if they are running a new
enough version to take advantage of this change. It's fine to update this
section using short PRs or pushing directly to master after the RFC is
decided>

* [Original feature request issue](https://github.com/taskcluster/taskcluster/issues/6738)

# Addendum

1. Command used for Gecko profiling:
   ```
   py-spy record -F --idle --format speedscope -o output.json -- ./mach taskgraph morphed -p taskcluster/test/params/mc-onpush.yml
   ```
2. Profiling results: ![20231211_10h54m47s_grim](https://github.com/taskcluster/taskcluster/assets/9445758/62c400cc-a125-4f08-b7dd-c8bc9a9e9a6d)
3. Proof of concept profiling results: ![20231211_10h55m01s_grim](https://github.com/taskcluster/taskcluster/assets/9445758/1849c8a1-fcc0-403b-acaf-ea997c875505)

rfcs/README.md

Lines changed: 1 addition & 0 deletions
@@ -55,3 +55,4 @@
| RFC#177 | [Skip CI in github integration](0177-Skip-ci-integrations.md) |
| RFC#180 | [Github cancel previous tasks](0180-Github-cancel-previous-tasks.md) |
| RFC#182 | [Allow remote references to .taskcluster.yml files processed by Taskcluster-GitHub](0182-taskcluster-yml-remote-references.md) |
+| RFC#189 | [Batch APIs for task definition, status and index path](0189-batch-task-apis.md) |
