
@Kobzol
Member

@Kobzol Kobzol commented Jul 6, 2025

We have started hitting this error a few times today when uploading the self-profile artifacts to S3:

```
upload failed: ../../../tmp/.tmp1aGYxm to s3://rustc-perf/self-profile/67394/derive/check/full/self-profile-32456021.mm_profdata.sz SSL validation failed for https://rustc-perf.s3.us-west-1.amazonaws.com/self-profile/67394/derive/check/full/self-profile-32456021.mm_profdata.sz EOF occurred in violation of protocol (_ssl.c:2406)
```

I don't think it's worth crashing/interrupting the whole collection when a self-profile file can't be uploaded. We use them very sparsely anyway, and if the unlikely case happened that an important self-profile case was missing, we can just rerun locally.
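Treating the upload as best-effort roughly means replacing a panic with a logged error. A minimal sketch, assuming the upload step is wrapped in a closure that returns a `Result`; `upload_best_effort` is a hypothetical helper, not the actual rustc-perf collector code:

```rust
use std::fmt::Display;

/// Hypothetical best-effort upload wrapper: on failure the error is logged
/// to stderr instead of panicking, so the rest of the collection continues.
/// Returns `true` if the upload succeeded.
fn upload_best_effort<E: Display>(
    artifact: &str,
    upload: impl FnOnce() -> Result<(), E>,
) -> bool {
    match upload() {
        Ok(()) => true,
        Err(error) => {
            // Self-profiles are rarely inspected, so a missing one is
            // reported but not treated as fatal for the benchmark run.
            eprintln!("Failed to upload self-profile `{artifact}`: {error}");
            false
        }
    }
}
```

The trade-off discussed in this thread is exactly what this hides: a failed upload leaves only a log line, so the self-profile for that run is silently missing.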

@Kobzol Kobzol requested a review from Mark-Simulacrum July 6, 2025 10:22
@Mark-Simulacrum
Member

> unlikely case happened that an important self-profile case was missing, we can just rerun locally

I'm not sure I agree. Running locally isn't completely trivial -- you need to stand up the site, figure out the right arguments, etc.?

I believe this error is not entirely unknown (rust-lang/promote-release#77), but unfortunately we've never been able to narrow down the exact conditions that trigger it (e.g. which IPs we're hitting, or a tcpdump of a failing connection) to help AWS fix it.

I think we should add a retry to the upload for now rather than just ignoring the upload failure. If there's something more streamlined than panicking that we should do (maybe treating it as a benchmark failure?), that's fine with me too.
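A retry along these lines could look like the sketch below. `retry_with_backoff` and its parameters are hypothetical, not part of rustc-perf; it assumes the upload is exposed as a fallible closure and that the SSL EOF error is transient, so a later attempt may succeed. Returning the last error instead of panicking leaves the caller free to record it as a benchmark failure.

```rust
use std::fmt::Display;
use std::thread;
use std::time::Duration;

/// Hypothetical retry helper: calls `op` up to `max_attempts` times,
/// sleeping between attempts with exponential backoff (base, 2x, 4x, ...).
/// Returns the first `Ok`, or the last error once all attempts fail.
fn retry_with_backoff<T, E: Display>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(error) if attempt < max_attempts => {
                eprintln!("Attempt {attempt}/{max_attempts} failed: {error}; retrying in {delay:?}");
                thread::sleep(delay);
                delay *= 2; // exponential backoff between attempts
                attempt += 1;
            }
            // Out of attempts: hand the error back instead of panicking.
            Err(error) => return Err(error),
        }
    }
}
```

A caller would wrap the existing upload command in the closure and then decide whether an `Err` after the final attempt should be logged, reported as a benchmark error, or (as before) treated as fatal.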

@Kobzol
Member Author

Kobzol commented Jul 6, 2025

> I'm not sure I agree. Running locally isn't completely trivial -- you need to stand up the site, figure out the right arguments, etc.?

I mean, sure, but for this to happen, you would have to see a benchmark run with a non-trivial regression where we failed to upload the self-profile for that regression specifically. We don't look at something like 99% of uploaded self-profiles; they are just never accessed.

We have already had 8 failures today, and each one costs a lot of time rerunning the benchmarks. Not sure what's happening.

@Kobzol Kobzol mentioned this pull request Jul 6, 2025
@Kobzol
Member Author

Kobzol commented Jul 6, 2025

Merging as a hotfix to unblock the collector. S3 uploads seem to be acting up today.

@Kobzol Kobzol merged commit ee0b821 into rust-lang:master Jul 6, 2025
11 checks passed
@Kobzol Kobzol deleted the dont-panic-s3 branch July 6, 2025 14:06