@rfredette
Contributor

Before attempting to publish a domain to a zone, check if that domain is already being published to the same zone.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2025
@openshift-ci
Contributor

openshift-ci bot commented May 7, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

for _, existingRecord := range records.Items {
// we only care if the domain name is published by a different record, so ignore the matching record if it
// already exists.
// TODO: There's got to be a better way to match the same object
Contributor

Compare by UID instead of name.

Contributor Author

This comment is still relevant, but I moved the function, which made GitHub think it's outdated. I'll make this change in my next update.
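
To illustrate the suggestion, here is a minimal sketch of why matching by UID is safer than matching by name. The `object` type and `sameObject` helper are hypothetical stand-ins, not code from this PR; the assumption is only that each object instance carries a unique UID while names can repeat across namespaces or be reused after a delete and re-create.

```go
package main

import "fmt"

// object is a hypothetical stand-in for the metadata of a Kubernetes object.
type object struct {
	Namespace string
	Name      string
	UID       string
}

// sameObject reports whether a and b refer to the same object instance.
// It deliberately ignores the name, which may collide between objects.
func sameObject(a, b object) bool {
	return a.UID == b.UID
}

func main() {
	a := object{Namespace: "ns-a", Name: "wildcard", UID: "uid-1"}
	b := object{Namespace: "ns-b", Name: "wildcard", UID: "uid-2"}

	fmt.Println("names equal:", a.Name == b.Name)
	fmt.Println("same object:", sameObject(a, b))
}
```

With a name-only comparison the two records above would be treated as the same object and the conflict would be missed.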

}
} else if isRecordPublished {
condition, err = r.replacePublishedRecord(zones[i], record)
} else if isDomainPublished, err = domainIsAlreadyPublishedInZone(context.Background(), r.cache, record, &zones[i]); err != nil {
Contributor

@Miciah Miciah May 7, 2025

You don't need to declare isDomainPublished outside the else if clause. Better to keep isDomainPublished and err scoped to the else if clauses.

Suggested change
- } else if isDomainPublished, err = domainIsAlreadyPublishedInZone(context.Background(), r.cache, record, &zones[i]); err != nil {
+ } else if isDomainPublished, err := domainIsAlreadyPublishedInZone(context.Background(), r.cache, record, &zones[i]); err != nil {

Edit: Discussed on a call. Line 388 uses the err value. This logic is a bit subtle and could use some refactoring.
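
The subtlety mentioned in the edit can be reproduced in isolation. The sketch below is not the controller code; `check`, `withAssign`, and `withShortDecl` are hypothetical names. It shows that `:=` in an `else if` header declares new variables scoped to the chain, so an outer `err` that is read later never sees the failure, whereas `=` updates the outer variable.

```go
package main

import (
	"errors"
	"fmt"
)

// check is a hypothetical helper that always fails.
func check() (bool, error) {
	return false, errors.New("lookup failed")
}

// withAssign uses "=", so the outer err is updated and is still visible
// after the if/else-if chain.
func withAssign(skip bool) error {
	var err error
	var found bool
	if skip {
		// nothing to do
	} else if found, err = check(); err != nil {
		_ = found
	}
	return err
}

// withShortDecl uses ":=", which declares new found and err variables
// scoped to the chain. The outer err is shadowed and stays nil.
func withShortDecl(skip bool) error {
	var err error
	if skip {
		// nothing to do
	} else if found, err := check(); err != nil {
		_ = found
	}
	return err
}

func main() {
	fmt.Println("with =: ", withAssign(false))
	fmt.Println("with :=:", withShortDecl(false))
}
```

Scoping with `:=` is the tidier option only when nothing after the chain depends on the error, which is why the original suggestion did not apply directly here.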

@rfredette rfredette force-pushed the no-conflicting-dns branch from 9fece4b to d38f7bf Compare May 15, 2025 17:33
func (r *reconciler) MapOnRecordDelete(ctx context.Context, o client.Object) []reconcile.Request {
deletedRecord, ok := o.(*iov1.DNSRecord)
if !ok {
log.Info("failed to read DNS record")
Contributor

Suggested change
- log.Info("failed to read DNS record")
+ log.Infof("Got unexpected object; expected type DNSRecord, got type %T", o)

Contributor

Or something more like this:

Suggested change
- log.Info("failed to read DNS record")
+ log.Error(nil, "Got unexpected type of object", "expected", "DNSRecord", "actual", fmt.Sprintf("%T", o))
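
Both suggestions rely on the `%T` verb to report the dynamic type of the object that failed the type assertion. A minimal standard-library sketch of that pattern follows; `dnsRecord` and `asRecord` are hypothetical names used only for illustration, and the real code would pass the message to the controller's logger instead of returning it.

```go
package main

import "fmt"

// dnsRecord is a hypothetical stand-in for iov1.DNSRecord.
type dnsRecord struct {
	Name string
}

// asRecord attempts the type assertion. On failure it returns a message
// that names both the expected and the actual dynamic type.
func asRecord(o any) (*dnsRecord, string) {
	r, ok := o.(*dnsRecord)
	if !ok {
		return nil, fmt.Sprintf("got unexpected object; expected type %T, got type %T", &dnsRecord{}, o)
	}
	return r, ""
}

func main() {
	if _, msg := asRecord("not a record"); msg != "" {
		fmt.Println(msg)
	}
	if r, msg := asRecord(&dnsRecord{Name: "default-wildcard"}); msg == "" {
		fmt.Println("ok:", r.Name)
	}
}
```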

Comment on lines 86 to 88
// When a DNS record is deleted, there may be a conflicting record that should be published. Watch exclusively for
// deletes, and queue a reconcile request for the appropriate conflicting record, if applicable.
if err := c.Watch(source.Kind[client.Object](operatorCache, &iov1.DNSRecord{}, handler.EnqueueRequestsFromMapFunc(reconciler.MapOnRecordDelete), predicate.Funcs{
Contributor

Make the comment a little more explicit that, yes, we have two watches on the same resource (dnsrecords), and the reason is so that we can have a predicate and mapfunc to do something special for deletes.

Comment on lines 86 to 91
// When a DNS record is deleted, there may be a conflicting record that should be published. Watch exclusively for
// deletes, and queue a reconcile request for the appropriate conflicting record, if applicable.
if err := c.Watch(source.Kind[client.Object](operatorCache, &iov1.DNSRecord{}, handler.EnqueueRequestsFromMapFunc(reconciler.MapOnRecordDelete), predicate.Funcs{
CreateFunc: func(e event.CreateEvent) bool { return false },
DeleteFunc: func(e event.DeleteEvent) bool { return true },
Contributor

Add a comment with your findings that the delete event happens when the object is actually deleted, not when it is merely marked for deletion (that is, when deletionTimestamp is set).
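
A rough model of the delete-only watch, using only the standard library. The `filter` type is a hypothetical stand-in for `predicate.Funcs`, and it assumes (as I understand controller-runtime's behavior) that an unset function lets the event through, which is why every non-delete case has to be set to return false explicitly.

```go
package main

import "fmt"

// eventKind is a hypothetical enumeration of the events a watch can deliver.
type eventKind int

const (
	createEvent eventKind = iota
	updateEvent
	deleteEvent
	genericEvent
)

// filter is a simplified stand-in for predicate.Funcs. A nil field lets the
// event through (an assumption about the real type's default).
type filter struct {
	Create, Update, Delete, Generic func() bool
}

// allows reports whether an event of kind k would reach the handler.
func (f filter) allows(k eventKind) bool {
	var fn func() bool
	switch k {
	case createEvent:
		fn = f.Create
	case updateEvent:
		fn = f.Update
	case deleteEvent:
		fn = f.Delete
	case genericEvent:
		fn = f.Generic
	}
	if fn == nil {
		return true
	}
	return fn()
}

func never() bool  { return false }
func always() bool { return true }

// deleteOnly drops everything except delete events, so this second watch on
// the same resource only ever feeds deletions to its map function.
var deleteOnly = filter{Create: never, Update: never, Delete: always, Generic: never}

func main() {
	for _, k := range []eventKind{createEvent, updateEvent, deleteEvent, genericEvent} {
		fmt.Println(k, deleteOnly.allows(k))
	}
}
```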

// Test_publishRecordToZonesMergesStatus verifies that publishRecordToZones
// correctly merges status updates.
- func TestPublishRecordToZonesMergesStatus(t *testing.T) {
+ func Test_publishRecordToZonesMergesStatus(t *testing.T) {
Contributor

@Miciah Miciah May 15, 2025

TestPublishRecordToZonesMergesStatus is an appropriate name for the test as there is no publishRecordToZonesMergesStatus function.

Member

I may be missing something here, but is there a new test case that checks whether the condition is set?

@rfredette rfredette changed the title Don't publish duplicate DNS records OCPBUGS-31521: Don't publish duplicate DNS records Aug 13, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Aug 13, 2025
@openshift-ci-robot
Contributor

@rfredette: This pull request references Jira Issue OCPBUGS-31521, which is invalid:

  • expected the bug to target either version "4.20." or "openshift-4.20.", but it targets "4.19.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@rfredette rfredette marked this pull request as ready for review August 13, 2025 21:21
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 13, 2025
@openshift-ci openshift-ci bot requested review from Thealisyed and miheer August 13, 2025 21:22
iov1.AddToScheme(scheme)
fakeClient := fake.NewClientBuilder().
WithScheme(scheme).
WithIndex(&iov1.DNSRecord{}, dnsRecordIndexFieldName, func(o client.Object) []string {
Member

nit (no need to fix now!): would it be worth extracting this indexer function into a shared utility that both operatorCache.IndexField and the fake cache use, to keep them consistent?

Contributor

Do you mean a util function that the controller logic and test logic would share? That does make sense, though I do caution against re-using controller logic in tests if doing so could mask a defect in the controller logic.

Member

Yes, the idea was to make a shared utility function, but I also see your concern. It makes sense not to share it: if the main reconciliation logic changes, a test with its own cache logic will catch the regression.

Contributor Author

The intent here is to use the same index function that was added in lines 117-122. Since the fake cache doesn't go through all the setup steps that the actual one does, it needed to be added manually. In this case, having the logic match what's used in the actual controller probably is the way to go.
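
For reference, the shared-function option discussed in this thread could look roughly like the sketch below. Everything here is hypothetical: `dnsRecord`, `indexByDNSName`, and `toyCache` are illustrative names, and indexing on the DNS name is an assumption, since the thread does not show which field the real index uses. The point is only that the controller's cache and the test's fake cache would both be handed the same function.

```go
package main

import "fmt"

// dnsRecord is a hypothetical, simplified stand-in for iov1.DNSRecord.
type dnsRecord struct {
	Name    string
	DNSName string
}

// indexByDNSName is a hypothetical shared index function. Indexing on the
// DNS name is an assumption made for this sketch.
func indexByDNSName(r *dnsRecord) []string {
	if r == nil || r.DNSName == "" {
		return nil
	}
	return []string{r.DNSName}
}

// toyCache is a minimal in-memory index that accepts any index function,
// the way a real or fake cache would.
type toyCache struct {
	indexFn func(*dnsRecord) []string
	byKey   map[string][]*dnsRecord
}

func newToyCache(fn func(*dnsRecord) []string) *toyCache {
	return &toyCache{indexFn: fn, byKey: map[string][]*dnsRecord{}}
}

// add stores r under every key the index function produces for it.
func (c *toyCache) add(r *dnsRecord) {
	for _, k := range c.indexFn(r) {
		c.byKey[k] = append(c.byKey[k], r)
	}
}

// lookup returns all records stored under key.
func (c *toyCache) lookup(key string) []*dnsRecord {
	return c.byKey[key]
}

func main() {
	cache := newToyCache(indexByDNSName)
	cache.add(&dnsRecord{Name: "default-wildcard", DNSName: "*.apps.example.com."})
	cache.add(&dnsRecord{Name: "gateway-wildcard", DNSName: "*.apps.example.com."})
	cache.add(&dnsRecord{Name: "other", DNSName: "*.other.example.com."})

	for _, r := range cache.lookup("*.apps.example.com.") {
		fmt.Println(r.Name)
	}
}
```

The trade-off raised above still applies: sharing the function keeps the two caches consistent, but a bug in it would then be invisible to the test.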

@candita
Contributor

candita commented Aug 20, 2025

/assign @rikatz

@openshift-ci
Contributor

openshift-ci bot commented Aug 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rikatz. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Before attempting to publish a domain to a zone, check if that domain is
already being published to the same zone.
@rikatz
Member

rikatz commented Aug 28, 2025

/cc
Re-requesting my own review; I will do it first thing tomorrow morning!

@openshift-ci openshift-ci bot requested a review from rikatz August 28, 2025 20:06
oldestExistingRecord := iov1.DNSRecord{}
for _, existingRecord := range otherRecords.Items {
// Exclude records that are marked for deletion.
if !existingRecord.DeletionTimestamp.IsZero() {
Member

Just for safety (sorry for realizing this only now): DeletionTimestamp is a nullable field, that is, a pointer (https://github.com/kubernetes/apimachinery/blob/d74026bbe3beeff64c3dc7259a29be7708aa834f/pkg/apis/meta/v1/types.go#L209), so I would recommend checking whether it is nil before checking whether it is zero.

Contributor

The IsZero method has a nil check on its receiver, so I think the caller can omit the nil check?

// IsZero returns true if the value is nil or time is zero.
func (t *Time) IsZero() bool {
	if t == nil {
		return true
	}
	return t.Time.IsZero()
}

I would be happy with a unit test case in lieu of a nil check.
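
As a sketch of what such a test case would exercise, the standalone example below models the nil-receiver behavior with a hypothetical `stamp` type (patterned after the quoted `IsZero`) and a hypothetical `oldestLive` helper that skips records marked for deletion. None of these names come from the PR, and picking the oldest remaining record is an assumption based on the `oldestExistingRecord` variable in the hunk above.

```go
package main

import (
	"fmt"
	"time"
)

// stamp is a hypothetical wrapper modeled on the quoted Time type.
type stamp struct {
	time.Time
}

// IsZero returns true if the receiver is nil or wraps the zero time, so it
// is safe to call on a nil *stamp.
func (s *stamp) IsZero() bool {
	if s == nil {
		return true
	}
	return s.Time.IsZero()
}

// newStamp builds a non-zero stamp from Unix seconds.
func newStamp(unixSeconds int64) *stamp {
	return &stamp{Time: time.Unix(unixSeconds, 0)}
}

// record is a hypothetical, simplified DNS record.
type record struct {
	Name              string
	CreatedUnix       int64  // creation time in Unix seconds (simplified)
	DeletionTimestamp *stamp // nil when the record is not being deleted
}

// oldestLive returns the oldest record that is not marked for deletion.
// The boolean is false when no such record exists.
func oldestLive(records []record) (record, bool) {
	var oldest record
	found := false
	for _, r := range records {
		// Safe even when DeletionTimestamp is nil.
		if !r.DeletionTimestamp.IsZero() {
			continue
		}
		if !found || r.CreatedUnix < oldest.CreatedUnix {
			oldest, found = r, true
		}
	}
	return oldest, found
}

func main() {
	records := []record{
		{Name: "deleting", CreatedUnix: 100, DeletionTimestamp: newStamp(500)},
		{Name: "newer", CreatedUnix: 300},
		{Name: "older", CreatedUnix: 200},
	}
	if r, ok := oldestLive(records); ok {
		fmt.Println("oldest live record:", r.Name)
	}
}
```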

@Miciah
Contributor

Miciah commented Sep 22, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 22, 2025
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-31521, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

@rhamini3
Contributor

marking bug as verified, since it is fixed through pre-merge testing

  1. Create a gateway with the same ingress controller domain and check if it is published
  status:
    observedGeneration: 1
    zones:
    - conditions:
      - message: Domain name is already in use
        reason: DomainAlreadyInUse
        status: "False"
        type: Published
      dnsZone:
        tags:
          Name: ci-ln-ti0k73k-76ef8-jkss7-int
          kubernetes.io/cluster/ci-ln-ti0k73k-76ef8-jkss7: owned
    - conditions:
      - message: Domain name is already in use
        reason: DomainAlreadyInUse
        status: "False"
        type: Published
      dnsZone:
        id: Z00287062J1ITQ61DDU2Z
  2. Delete the gateway and confirm that upstream dnsrecord and routes are not affected
% oc get dnsrecord -A
NAMESPACE                    NAME               AGE
openshift-ingress-operator   default-wildcard   145m

% curl -I oauth-openshift.apps.ci-ln-ti0k73k-76ef8.aws-2.ci.openshift.org
HTTP/1.1 302 Found
content-length: 0
location: https://oauth-openshift.apps.ci-ln-ti0k73k-76ef8.aws-2.ci.openshift.org/
cache-control: no-cache

% curl -I canary-openshift-ingress-canary.apps.ci-ln-ti0k73k-76ef8.aws-2.ci.openshift.org
HTTP/1.1 302 Found
content-length: 0
location: https://canary-openshift-ingress-canary.apps.ci-ln-ti0k73k-76ef8.aws-2.ci.openshift.org/
cache-control: no-cache

/verified by rhamini3

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Sep 22, 2025
@openshift-ci-robot
Contributor

@rhamini3: This PR has been marked as verified by rhamini3.

@openshift-ci
Contributor

openshift-ci bot commented Sep 26, 2025

@rfredette: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                      Commit   Details  Required  Rerun command
ci/prow/e2e-gcp-ovn            4822baa  link     false     /test e2e-gcp-ovn
ci/prow/okd-scos-e2e-aws-ovn   4822baa  link     false     /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aws-operator       4822baa  link     true      /test e2e-aws-operator
ci/prow/e2e-aws-ovn            4822baa  link     true      /test e2e-aws-ovn
ci/prow/hypershift-e2e-aks     4822baa  link     true      /test hypershift-e2e-aks
ci/prow/e2e-aws-ovn-serial     4822baa  link     true      /test e2e-aws-ovn-serial
ci/prow/e2e-aws-ovn-upgrade    4822baa  link     true      /test e2e-aws-ovn-upgrade
ci/prow/e2e-azure-operator     4822baa  link     true      /test e2e-azure-operator

@candita
Contributor

candita commented Nov 6, 2025

E2E test failure seen on this, reported by @rikatz in https://issues.redhat.com/browse/OCPBUGS-64675.

 --- FAIL: TestAll/serial/TestGatewayAPI/testGatewayAPIDNS (341.29s)
            --- PASS: TestAll/serial/TestGatewayAPI/testGatewayAPIDNS/multipleGatewaysSameListenerHostname (161.16s)
            --- FAIL: TestAll/serial/TestGatewayAPI/testGatewayAPIDNS/gatewayListenersWithOverlappingHostname (180.09s)

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1e2adcf]

Labels

jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. verified Signifies that the PR passed pre-merge verification criteria

6 participants