Contributor

@doanac doanac commented Apr 8, 2025

No description provided.

@doanac doanac force-pushed the aws-runners branch 2 times, most recently from 217fe1f to 686c66c on April 21, 2025 18:33
Contributor Author

doanac commented Apr 21, 2025

@lool - this is probably going to be close to what I'll try to get merged. It would be good to get your initial feedback.

Ignore how long the workflow takes. It's a known network configuration issue the IT team is working on fixing.

Contributor

lool commented Apr 21, 2025

@doanac IIUC, in theory this should pretty much be the same as before, except running in AWS instead of GCP

The YAML looks fine. I see the arm64 runner has arm64 in its name, which is good; how will the amd64 ones be named? It would be nice if they had amd64 in their names.

The path to the fileserver has changed, but it's pretty much transparent here.

The only difficulty I have is in accessing the build log. NB: I currently don't have a Qualcomm AWS account.

My dream would be:

  • anyone can access build logs with no login, and we can opt to make build logs private (or vice-versa); perhaps this should follow the private status of the repo?
  • the build log is sent live to github
  • the build log is as nice to read as the github one, with bookmarks / foldable sections

Contributor Author

doanac commented Apr 21, 2025

All the build logs are viewable except for the "AWS CodeBuild" log. That's a known issue they are trying to address, but its contents have nothing to do with our build and can be ignored.

Contributor

lool commented Apr 21, 2025

Ah, never mind, indeed, the GitHub log is just there :)

Comparing the two, I guess we'll have the same expected github features and UX.

Looking at build performance, the AWS runners are 30% slower in the largest step of building the image, but the artifacts upload phase is 6x faster. Overall the build was 15% slower, which is acceptable.
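For anyone repeating this comparison on later runs, here is a minimal sketch of how the per-step and overall percentages can be derived from step durations. The function names and the durations in the usage note are hypothetical placeholders, not measurements from this pull request.

```python
def relative_change(baseline_s: float, candidate_s: float) -> float:
    """Percentage change of candidate vs. baseline (positive means slower)."""
    if baseline_s <= 0:
        raise ValueError("baseline duration must be positive")
    return (candidate_s - baseline_s) * 100.0 / baseline_s


def summarize(steps: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map each step name to its relative change and add an 'overall' entry.

    `steps` maps a step name to (baseline_seconds, candidate_seconds).
    """
    report = {name: relative_change(b, c) for name, (b, c) in steps.items()}
    total_baseline = sum(b for b, _ in steps.values())
    total_candidate = sum(c for _, c in steps.values())
    report["overall"] = relative_change(total_baseline, total_candidate)
    return report
```

Calling `summarize({"build image": (1000.0, 1300.0), "upload artifacts": (600.0, 100.0)})` with made-up durations like these returns one percentage per step plus an `overall` entry, which is computed from the summed durations rather than by averaging the per-step percentages.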

What's the plan, do you want us to run these builds next to the github runner ones, or should we switch the default to the AWS ones when we're ready?

I'm landing a few changes related to fileserver in the RB1 pull request which I would really like to land, but otherwise happy to start using AWS arm64 runners.

Contributor

lool commented Apr 21, 2025

Forgot to ask: do we have access to all instance types? I'm curious if there's one with nested virt that would perform better than QEMU for this particular workflow.
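One rough way to find out on a given runner is to check whether the KVM device is exposed to the job. This is a minimal sketch assuming a Linux runner; `kvm_available` is a hypothetical helper name, and a positive result only says that `/dev/kvm` is usable by the current process, not that a particular instance type officially supports nested virtualization.

```python
import os


def kvm_available(device: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node exists and this process can read and write it."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    # Print a short hint that can be read back from the job log.
    print("KVM usable" if kvm_available() else "KVM not usable; QEMU will fall back to emulation")
```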

Contributor Author

doanac commented Apr 21, 2025

I don't know enough about AWS machinery to know if it's possible to do nested virt. I'm doubtful.

On performance - most of the slowness is network I/O. They are routing traffic in an inefficient way but know what to change to make that better. I think they'll be almost the same. On file upload - it happens during a "magic" step that's async to this - so there's no real performance number you'll get on this right now.

I was thinking you could run them side by side for a bit to get confidence in it. You'll also need them side by side because LAVA isn't yet ready to use their output. However, it does duplicate code during the interim.

doanac added 3 commits May 2, 2025 11:56
This is an exact copy of debos.yml so that we can have a clear view of
what is being changed for AWS while also making it easier to rebase onto
future changes that might happen while this is being reviewed and
tested.

Signed-off-by: Andy Doan <[email protected]>
Contributor

lool commented May 5, 2025

Thanks for refreshing! As you saw, workflows are seeing somewhat large updates; there's another effort with LAVA CI that will require shuffling things a bit, and then I think it will be quieter at least on the image workflows for a while.

When I looked at the first proposal, the differences between the GCP hosted runners and the AWS ones were very minor (IIRC, basically the tags to select runners and the path to the volumes passed to the container image), so it should be easy to forward port this at that point.

(Do let me know if you think we should merge this now so as to give it enough exposure, though.)

Contributor Author

doanac commented May 5, 2025

It's close, but I found out on Friday there's a new builder they are going to give us that should be a little better than this one. I'm waiting for the info on that and then I'll let you know.

@doanac doanac closed this May 14, 2025