@MatthiasWerning MatthiasWerning commented Jul 25, 2024

... by maintaining only one registration across all connection recoveries

Proposed Changes

Replaced the registrations in TimerBasedCredentialRefresher with a custom class so that only one registration is maintained across connection recoveries. The callback of that registration is updated on each subsequent Register call.

Since multiple dictionary operations now have to be atomic, locking is done via a helper object owned by each TimerBasedCredentialRefresher instance. The same object is passed to the registration, which uses it to check whether the Unregister method has already disposed it.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

Checklist

The CLA will be an issue within my company, and I cannot promise that it will be resolved within a few days (this was already an issue with another CLA, which still isn't fully resolved). Unfortunately, I'm not able to make decisions on behalf of my company.

  • I have read the CONTRIBUTING.md document
  • I have signed the CLA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

@pivotal-cla

@MatthiasWerning Please sign the Contributor License Agreement!

@lukebakken lukebakken self-requested a review July 25, 2024 18:22
@lukebakken lukebakken self-assigned this Jul 25, 2024
@lukebakken lukebakken added this to the 7.0.0 milestone Jul 25, 2024
@MatthiasWerning
Author

Okay, so everything is done from my side, including local testing from within my components and adding a test case for the fixed behavior.

I have contacted everyone necessary within my company to get the CLA signed; I hope this can be resolved next week at the latest.

If you have any feedback or suggestions for code changes let me know.

@MatthiasWerning MatthiasWerning marked this pull request as ready for review July 26, 2024 12:59

public ICredentialsRefresher.NotifyCredentialRefreshedAsync Callback { get; set; }

public TimerRegistration(object lockObj, ICredentialsRefresher.NotifyCredentialRefreshedAsync callback)
Contributor

Uh, this is tricky and potentially hard to debug if errors happen: you use the same lock object here as in your main type.
The general recommendation is not to use the same lock object for multiple resources, and that's the case here: in the main type it guards your dictionary, and here it guards your elapsed logic.

https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/lock#guidelines

Author

This is actually something I noticed as well, but I'm quite unsure what an alternative would be. Unregistering may happen at any time, even while the timer's Elapsed event is being handled in parallel. Without the lock, the worst case would be that the timer runs one more cycle and then shuts itself down. I guess this is acceptable in order to avoid passing the lock object down. Any other ideas?

}

provider.Refresh();
ScheduleTimer(provider);
Contributor

why is this called again?

Author

The timer is initialized with AutoReset = false, so its Elapsed event is raised only once and never again. Therefore a new timer needs to be scheduled to keep the "chain" of timers running until it is unregistered by the owning TimerBasedCredentialRefresher.

I did not alter this behavior from the version before this PR, see here:
https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/main/projects/RabbitMQ.Client/client/api/ICredentialsRefresher.cs#L126
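
For illustration, the re-arming pattern described above could look roughly like the sketch below. It is not the code from this PR: `OneShotRefreshLoop`, `Schedule` and the `Func<TimeSpan>` refresh delegate are hypothetical names, and the only library type assumed is `System.Timers.Timer` with `AutoReset = false`.

```csharp
using System;

// Hypothetical sketch, not the PR's implementation.
public sealed class OneShotRefreshLoop : IDisposable
{
    private readonly object _gate = new object();
    private readonly Func<TimeSpan> _refresh; // refreshes credentials, returns the next delay
    private System.Timers.Timer _timer;
    private bool _disposed;

    public OneShotRefreshLoop(Func<TimeSpan> refresh)
    {
        _refresh = refresh;
    }

    // Arms a new one-shot timer and disposes the one it replaces.
    public void Schedule(TimeSpan delay)
    {
        // AutoReset = false: Elapsed is raised once, so each cycle needs a new timer.
        var next = new System.Timers.Timer(delay.TotalMilliseconds) { AutoReset = false };
        next.Elapsed += (sender, args) => Schedule(_refresh());

        System.Timers.Timer old;
        lock (_gate)
        {
            if (_disposed)
            {
                // Unregistered in the meantime: break the chain.
                next.Dispose();
                return;
            }
            old = _timer;
            _timer = next;
            next.Start();
        }
        old?.Dispose();
    }

    public void Dispose()
    {
        System.Timers.Timer current;
        lock (_gate)
        {
            _disposed = true;
            current = _timer;
            _timer = null;
        }
        current?.Dispose();
    }
}
```

The delay passed to `Schedule` has to be greater than zero, because the `System.Timers.Timer(double)` constructor rejects non-positive intervals. Since the next timer is only armed after the refresh delegate returns, two refreshes can never overlap in this sketch.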

Contributor

@Tornhoof Tornhoof Jul 26, 2024

Ah okay, I'm not really familiar with that specific timer, there are simply too many in .NET.
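If you call ScheduleTimer again, you need to make sure you dispose any existing old timer, so something like this might be necessary (line 180+):

{
    // Swap in the new timer first, then dispose the one it replaces.
    var oldTimer = _timer;
    _timer = newTimer;
    oldTimer?.Dispose(); // null-conditional: there is no old timer on the first call
}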

I'm not sure, but I think using it with AutoReset = true might be cleaner.

Just for my understanding: there are possibly multiple ICredentialsProvider instances, and each might have a different refresh timeout? That's why there are multiple timers, and why it's not possible to use one shared timer with a common tick interval (other than 1s or something similar). And since old versions of .NET are supported, there is no PriorityQueue, which might simplify that.

Author

Good to know the timer has to be disposed; the next commit will include that.

From my perspective (I'm not a C# expert), using a chain of self-triggering timers is a cleaner approach than a single persistent timer firing at a fixed interval. This is because the Elapsed handler may take an indefinite amount of time to complete. So theoretically, if the handler requires more time than the interval itself, we get some kind of congestion where multiple handlers run and update state at the same time. Of course, in this scenario we would have other problems as well, since the handler must never take longer than 1/3 of the credential's lifetime (a JWT, for example).
Also see this article for a more in-depth example (JS, but basically transferable to this use case): https://reallifejs.com/brainchunks/repeated-events-timeout-or-interval/

Contributor

Yeah, that makes sense. It might also make sense to look into making the ICredentialsProvider properly async in 8.0 and then look at better ways to do periodic refresh, e.g. PeriodicTimer, or maybe even a dumb refresh/delay loop might be enough.
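
As a rough idea of what such a "dumb refresh/delay loop" could look like, here is a minimal sketch. `RefreshLoop`, `RunAsync` and the `refreshAsync` delegate are hypothetical names and not part of the client; the sketch only assumes `Task.Delay` with a `CancellationToken`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RefreshLoop
{
    // refreshAsync is a hypothetical delegate: it refreshes the credentials
    // and returns how long to wait before the next refresh.
    public static async Task RunAsync(
        Func<CancellationToken, Task<TimeSpan>> refreshAsync,
        CancellationToken cancellationToken)
    {
        try
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                TimeSpan delay = await refreshAsync(cancellationToken).ConfigureAwait(false);
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way to stop the loop.
        }
    }
}
```

Because the delay is recomputed after every refresh, a changing token lifetime is handled without re-creating any timer, and cancelling the token is the single, deterministic way to stop the loop.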

Collaborator

it might also make sense to look into making the ICredentialsProvider properly async in 8.0

No time like the present ... before version 7 ships!

Contributor

No time like the present ... before version 7 ships!

Yeah, I just took a good long look at it. There is quite a bit of change there, including figuring out the problem this PR solves, just with the added complexity of async and different target frameworks. Only .NET 6 and later support PeriodicTimer, which is the only real async-capable timer, and we don't know the timer period until the token has been refreshed at least once (it might also change while the timer is running). On top of that, a lot of interfaces and related types need to be changed.
And probably a different concept for static credentials (basic auth) vs. dynamic stuff (oauth2).

I tried out the above here:
Tornhoof@8ec63c1

I personally don't know how much the OAuth2 stuff is in use; I've never used it with RabbitMQ. IMHO the OAuthClient requires quite a bit of overhaul (see my changes above): a lot of disposes are missing, the use of System.Text.Json is questionable, and the whole sync-over-async code is questionable at best.

If you really want to release 7.0 this month, I'd recommend moving this to 8.

Collaborator

I appreciate your input, and I'll discuss with @MarcialRosales and other Team RabbitMQ members next week.

{
if (provider.ValidUntil == null)
{
throw new ArgumentNullException("ValidUntil of " + nameof(provider) + " was null");
Contributor

nameof(provider) will just say "provider", you probably want provider.GetType().Name
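
A small sketch of the difference, using a hypothetical `OAuth2TokenProvider` type and a hypothetical `NameDemo` helper that are not part of the client:

```csharp
using System;

public sealed class OAuth2TokenProvider { } // hypothetical provider type

public static class NameDemo
{
    public static string ParameterName(object provider)
    {
        return nameof(provider);        // compile-time name of the parameter
    }

    public static string RuntimeTypeName(object provider)
    {
        return provider.GetType().Name; // runtime type of the argument
    }
}
```

For an `OAuth2TokenProvider` argument, `ParameterName` returns "provider" and `RuntimeTypeName` returns "OAuth2TokenProvider". It may also be worth keeping in mind that the single-string constructor of `ArgumentNullException` treats its argument as the parameter name, not as the message, so a full sentence passed there ends up in `ParamName`.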

Author

Will resolve this in the next commit.

@lukebakken
Collaborator

@Tornhoof @MatthiasWerning thanks for looking into this issue. The original author, @MarcialRosales is also reviewing.

@MatthiasWerning
Author

Made code changes and refactorings as suggested by @Tornhoof.

I hope my manager will be able to sign the CLA this week; it's pretty much out of my hands as of right now.

@danielmarbach
Collaborator

Isn't there something off with the refresher design that requires addressing instead?

  • It seems unregister is never called, but it is required to be implemented by the implementor of the interface
  • The relationship between the provider, the callback and the connection seems to be not expressed in the design and might cause very complicated implementations

Let me try to explain myself a bit more.

I had a brief look at the issue description and at this PR. I thought that, in order to have one timer, it would be possible to change the concurrent dictionary to hold a Lazy<Timer> with the ExecutionAndPublication thread-safety mode. Then only one timer is ever created per connection provider. This might make sense from a relationship perspective, but then I tried to reason about what the relationship between the provider and the callbacks is, and found out that essentially there will be one timer per provider but one callback per connection.

The callbacks per connection though are never really expressed, which makes it difficult to have deterministic behavior when to remove callbacks etc.

I might not have spent enough time with the design but something is smelly.
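
To make the Lazy<Timer> idea concrete, here is a minimal sketch under assumptions: `TimerRegistry` and `GetOrCreate` are hypothetical names, the dictionary is keyed by the provider instance, and the caller supplies the timer factory.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class TimerRegistry
{
    private readonly ConcurrentDictionary<object, Lazy<System.Threading.Timer>> _timers =
        new ConcurrentDictionary<object, Lazy<System.Threading.Timer>>();

    public System.Threading.Timer GetOrCreate(object provider, Func<System.Threading.Timer> createTimer)
    {
        // GetOrAdd may invoke its value factory more than once under contention,
        // but only one Lazy is stored. Value is read only from the stored Lazy,
        // and ExecutionAndPublication runs its factory at most once.
        Lazy<System.Threading.Timer> lazy = _timers.GetOrAdd(
            provider,
            _ => new Lazy<System.Threading.Timer>(createTimer, LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value;
    }
}
```

This only addresses timer creation. It does not answer when a timer is disposed or how per-connection callbacks are removed, which is the design gap described above.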

@danielmarbach
Collaborator

@lukebakken it seems AutoRecovery tries to abort the inner connection if it was opened but never disposes the inner connection. Is that intentional?

Because if the auto-recovery connection deterministically disposed the inner connection, this could be used to unregister the callback.

But then you'd still have two problems: there is no deterministic point at which to dispose the timer, and while the new inner connection is being opened, refresh might be called on the old connection when the timer is due, which might not be desirable.

I also wonder if it wouldn't be simpler to switch to System.Threading.Timer or PeriodicTimer. At least with System.Threading.Timer you could use the Change method in the callback to make the timer fire again and wouldn't have to create new timers again and again, which might also simplify the code. But even those timers are disposable and would need to be disposed, which brings me back to thinking the refresher concept might need some more thinking.
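
A minimal sketch of the System.Threading.Timer variant, assuming a hypothetical `ReArmingTimer` wrapper and a refresh delegate that returns the next delay; neither exists in the client.

```csharp
using System;
using System.Threading;

public sealed class ReArmingTimer : IDisposable
{
    private readonly System.Threading.Timer _timer;
    private readonly Func<TimeSpan> _refresh; // refreshes credentials, returns the next delay

    public ReArmingTimer(Func<TimeSpan> refresh, TimeSpan firstDelay)
    {
        _refresh = refresh;
        // Create the timer unarmed so _timer is assigned before the callback can run.
        _timer = new System.Threading.Timer(OnTick, null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
        _timer.Change(firstDelay, Timeout.InfiniteTimeSpan);
    }

    private void OnTick(object state)
    {
        TimeSpan next = _refresh();
        try
        {
            // One-shot again: due time = next, no periodic signaling.
            _timer.Change(next, Timeout.InfiniteTimeSpan);
        }
        catch (ObjectDisposedException)
        {
            // Disposed while the callback was running: stop quietly.
        }
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

A single timer instance is reused for every cycle, so there is exactly one object to dispose, but something still has to own it and call `Dispose` at a well-defined point.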

@lukebakken
Collaborator

it seems AutoRecovery tries to abort the inner connection if it was opened but never disposes the inner connection. Is that intentional?

I don't even see where the inner connection is aborted, to be honest. Not disposing the inner connection is certainly an oversight.

@MarcialRosales @MatthiasWerning I have opened the following PR to this PR:

MatthiasWerning#1

I couldn't see the rationale for having multiple credentials provider registrations in TimerBasedCredentialRefresher so I removed that. There are also some other changes that I'd like to get reviewed.

I'm going to discuss this feature with the original implementor, @MarcialRosales, tomorrow.

@danielmarbach
Collaborator

danielmarbach commented Jul 30, 2024

@lukebakken I had another brief look, and the whole existing design seems a bit raw. For example, the token refresh in any production-grade implementation would always do some sort of IO-bound call, which is quite likely async. The credential provider isn't.

The IOAuth2Client implementation is also not async, and the token uses a weird mixture of casing on its properties.

Next up, it seems there is currently always a one-to-one relationship between the refresher and the provider in the code. So why can't the design be simplified to basically initialize the refresher with the provider and let the code register delegates? Just spitballing ideas off the top of my head, something like:

    public interface ICredentialsRefresher
    {
        void Initialize(ICredentialsProvider provider);
        IDisposable Register(NotifyCredentialRefreshed callback);
    }

Then the connection can dispose the returned registration, which then removes it from the internal registration list in a thread safe manner.
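
One way the disposable registration could be sketched is shown below. `CallbackRegistry`, `NotifyAll` and the nested `Registration` class are hypothetical, and the `bool` parameter of the delegate is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public delegate void NotifyCredentialRefreshed(bool succeeded); // assumed signature

public sealed class CallbackRegistry
{
    private readonly object _gate = new object();
    private readonly List<NotifyCredentialRefreshed> _callbacks = new List<NotifyCredentialRefreshed>();

    public IDisposable Register(NotifyCredentialRefreshed callback)
    {
        lock (_gate) { _callbacks.Add(callback); }
        return new Registration(this, callback);
    }

    public void NotifyAll(bool succeeded)
    {
        NotifyCredentialRefreshed[] snapshot;
        lock (_gate) { snapshot = _callbacks.ToArray(); }
        // Invoke outside the lock so a callback can safely dispose its own registration.
        foreach (NotifyCredentialRefreshed callback in snapshot) { callback(succeeded); }
    }

    private void Remove(NotifyCredentialRefreshed callback)
    {
        lock (_gate) { _callbacks.Remove(callback); }
    }

    private sealed class Registration : IDisposable
    {
        private CallbackRegistry _owner;
        private readonly NotifyCredentialRefreshed _callback;

        public Registration(CallbackRegistry owner, NotifyCredentialRefreshed callback)
        {
            _owner = owner;
            _callback = callback;
        }

        public void Dispose()
        {
            // Exchange makes Dispose idempotent even when called from several threads.
            CallbackRegistry owner = Interlocked.Exchange(ref _owner, null);
            owner?.Remove(_callback);
        }
    }
}
```

With this shape, a connection would keep the `IDisposable` returned by `Register` and dispose it when it closes, which gives the deterministic removal point that a separate Unregister method lacks.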

Another approach could be to essentially marry everything together by having only a token provider that the client always calls when it needs a token, and leave it up to the implementation to do the necessary async fetching, caching, etc.

See for example how Azure does it https://github.com/search?q=repo%3AAzure/azure-sdk-for-net%20GetTokenAsync&type=code

https://github.com/Azure/azure-sdk-for-net/blob/2030bb1452084ecc16de0637ec320e0af64ae4d2/sdk/core/Azure.Core/src/TokenCredential.cs#L22

I think having a single TokenProvider interface that returns a ValueTask with a token is probably the best approach to simplify the code

Again I have only spent a few minutes looking at things due to time constraints on my end so take this with a grain of salt.

@lukebakken
Collaborator

@danielmarbach thank you for the feedback!

@danielmarbach
Collaborator

FYI you are aware the v6.x version has the same problem?

@lukebakken
Collaborator

FYI you are aware the v6.x version has the same problem?

Yes, of course.

@lukebakken
Collaborator

@MatthiasWerning - thanks for all of the work you have put into this. I thought about the whole "credential refresh" feature and I think I have a totally different implementation in mind. I'll open a new PR with that when it's done and will tag you, @danielmarbach and @MarcialRosales when it's ready (later today, hopefully).

I'm going to close this PR but please don't delete your branch just yet, in case I'm way off-base 😸

@lukebakken lukebakken closed this Jul 31, 2024
@lukebakken lukebakken removed this from the 7.0.0 milestone Jul 31, 2024