GitHub CI notifications and main branch

I know things are in flux, but some early feedback on whatever is being setup right now.

I got an email after I a merged someone’s PR:

I don’t have much context here, but:

  1. it reports an issue with a “premerge” run which is confusing since it is actually a “push” on master.
  2. There isn’t much information about the actual commit, I don’t know which one of mine this is about, there is the hash in the email title but it’s not clickable and I also wouldn’t want to need to click to have this information.
  3. The only link I have is: [MLIR] Documentation updates and typo fixes (#125273) · llvm/llvm-project@3266f9c · GitHub ; where scrolling somehow tells me that this is an infra failure.
  4. The buildbot only notifies when the build was green and my change make it red. Here it seems it’s not the case: that would be a lot of noise for little signal. Figuring out the state of main isn’t straightforward from the only link I have, I needed to figure out that when I got back to the workflow, I must filter branch:main at the top right, then click “next page” until I find my change to see if the previous one was already broken.
  5. This filtering and history of the main branch would lump all the test/configs in this workflow, right now both linux and windows. That means that if they are broken in an interleaved fashion it’s impossible to easily figure out the range of broken commit on one or the other.
1 Like

This is coming from the premerge pipeline being run against main. We are planning on running postcommit testing against main using Github actions to eliminate any infrastructure differences that could potentially produce different results. This has been a significant problem for the existing premerge infrastructure where the configurations differ enough that there have been a multitude of instances where one was broken or the other was not. I believe this was communicated at some point in the past on Discourse and was discussed during our premerge talk at the 2024 US Dev meeting.

I don’t believe there is an easy for us to change the job name based on whether or not it is precommit or postcommit testing. I’m not sure this is a large regression over the current setup. The buildbots that currently test the premerge infra are named premerge-monolithic-linux and premerge-monolithic-windows which could be confusing unless you already know that buildbots only run postcommit.

This just seems to be poor UI on Github’s side. More information in the email would probably be nice, but the link does have the rest of the information/links to everything else.

That page should have just about everything needed for diagnosis. That was an infra failure that has since been fixed. There are also links to the individual jobs and logs. If it was a build failure or test failures rather than an infra failure, there would have been more detailed information available directly on that page.

Yes, that is a regression over the current system. It should be possible to make it only report failures on newly breaking changes. Github doesn’t give us precise control over email notifications from within the workflow though, so we would just have to make it flag the job as green/tell it to ignore failures in the step. Ideally we want to work on advancing efforts like [RFC] Add a warning when bypassing the premerge testing so that main is broken quite rarely. It should also be possible to add links to easily check the state of main at the previous commit if that’s helpful.

I’m not really sure what you mean? They show up as separate jobs when looking at the jobs on an individual commit:

Like all of the significant changes to workflows impacting the community, this deserves a proper RFC IMO.

That should only indicate a different docker image? I don’t quite see the coupling to using GitHub to trigger the build script?

Yes, I’m slightly concerned about the “poor UI from GitHub” which will keep biting us all the way right now.

OK, is this already planned?

I’m not looking at knowing the state of one commit in particular, instead I want the state of one config across the history of the repo.
This the view I am using the most for post-merge, every single time a bot is broken and I have to understand why I am looking for this view: Buildbot

This view is a must for a post-merge system IMO, and I don’t know where to find this in the GitHub UI?

Alright. I guess I’ll write one up, hopefully in time for the area infra meeting this Thursday.

It’s a completely different testing setup too. One we can theoretically replicate through buildbot, but it’s not guaranteed and we get additional maintenance overhead from having to maintain both.

I do not think it is too terrible in most cases, just something that will take a bit of time to get used to.

Not soon. If it is something that the community deems blocking/quite important I can move it up the priority list.

It isn’t as easily accessible, but definitely exists.

Do you have an example to make it concrete what you mean? If you have the same docker image, and you run the same script to trigger a build+test, why would one pass but not the other?

That’s good to know! Can you point out where/how to get this?

Maybe we can remove “pre-merge” from the name then? There is nothing specific left to “pre-merge” in the job at this point.

That should work fine. I think the main issue is that the current premerge just uses a generic CMake builder inside zorg rather than running the actual premerge script. I’m not entirely sure that’s the only difference though, and the last couple of times differences have come up I have not spent a lot of time digging into them. We lose some nice to haves like autoscaling by going that route though that I will comment on more thoroughly when I write up an RFC.

Actions tab → premerge option in the left hand column → add branch:main event:push to the search query.

I’m not sure Github supports different job names between events, although it might be possible. We’re using the same exact workflow file and just setting it to run both on pull requests and pushes to main.

Right: I’m saying we can just name it “windows build & testing” without “pre-merge”, for both the pre-merge and post-merge. Because adding “pre-merge” to the name becomes not longer relevant for this job, and the context where we see it is enough to discriminate (when you see “windows build” on your PR, there is no ambiguity that it is “pre-merge testing”).

Thanks!

By the way isn’t this incompatible with the following? We would never see a range as far as I can understand from what you’re proposing?

Yes. I’m trying to say we might not be able to trivially do that. I will look into what Github supports and see if we can make that happen.

The following would be interfacing with the Github API where we do not run into any UI problems. All the data is there, it’s just a matter of whether or not it’s easily visible.

1 Like

You were saying we can’t name it differently between event, but I’m not suggesting this, I was thinking of just changing the name everywhere.
We may be talking past each other right now though.

Looking at the implementation it may just be this diff:

diff --git a/.ci/metrics/metrics.py b/.ci/metrics/metrics.py
index a5ee893650d6..db5961e0c810 100644
--- a/.ci/metrics/metrics.py
+++ b/.ci/metrics/metrics.py
@@ -21,7 +21,7 @@ SCRAPE_INTERVAL_SECONDS = 5 * 60
 # Lists the Github workflows we want to track. Maps the Github job name to
 # the metric name prefix in grafana.
 # This metric name is also used as a key in the job->name map.
-GITHUB_WORKFLOW_TO_TRACK = {"LLVM Premerge Checks": "github_llvm_premerge_checks"}
+GITHUB_WORKFLOW_TO_TRACK = {"Build & Test": "github_llvm_premerge_checks"}
 
 # Lists the Github jobs to track for a given workflow. The key is the stable
 # name (metric name) of the workflow (see GITHUB_WORKFLOW_TO_TRACK).
@@ -29,8 +29,8 @@ GITHUB_WORKFLOW_TO_TRACK = {"LLVM Premerge Checks": "github_llvm_premerge_checks
 # name.
 GITHUB_JOB_TO_TRACK = {
     "github_llvm_premerge_checks": {
-        "Linux Premerge Checks (Test Only - Please Ignore Results)": "premerge_linux",
-        "Windows Premerge Checks (Test Only - Please Ignore Results)": "premerge_windows",
+        "Linux Checks (Test Only - Please Ignore Results)": "premerge_linux",
+        "Windows Checks (Test Only - Please Ignore Results)": "premerge_windows",
     }
 }

I suspect that would be enough to fix all the labels in the screenshot I posted in the initial message of this thread.

Similarly, these labels would look good on a pull-request here I believe:

would look like:

Ah, sorry for misunderstanding you initially. I think that definitely makes sense to do. I’ll put a patch up (it’s slightly more complicated than what you have there).

1 Like