draft: feat: blue/green #244

Open
JuanLeee wants to merge 1 commit into aws:main from JuanLeee:feat/bg

Conversation

@JuanLeee

Summary

Description

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

});

if (EnabledFileLog)
if (true)

Reminder to revert this change before merge unless this is intentional

{
return this.pluginManager.IsPluginActive(pluginName);
}
catch

Curious, could this.pluginManager.IsPluginActive(pluginName) actually throw errors?


protected static readonly string AuroraPostgreSqlBgStatusQuery =
"SELECT * FROM " +
$"pg_catalog.get_blue_green_fast_switchover_metadata('aws_dotnet_driver')";

nit: could we use the full wrapper name instead of aws_dotnet_driver?


public static bool IsBlueGreenConnectionDialect(IDialect dialect)
{
return dialect is AuroraMySqlDialect or AuroraPgDialect or RdsMySqlDialect or RdsPgDialect;

Say a customer is using RdsMySqlDialect. Will this code error out on dialect is AuroraMySqlDialect because AuroraMySqlDialect is not registered?
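
For reference, a minimal sketch of how a type pattern behaves at runtime. The classes below are hypothetical stand-ins, not the driver's dialect types, and the sketch assumes all of the types compile into the same assembly. The check is a plain runtime type test, so a non-matching instance gives false rather than an exception.

```csharp
using System;

// Hypothetical stand-ins for the driver's dialect types, for illustration only.
public interface IDialect { }
public class AuroraMySqlDialect : IDialect { }
public class RdsMySqlDialect : IDialect { }
public class PlainMySqlDialect : IDialect { }

public static class DialectCheckSketch
{
    // Each alternative in the "or" pattern is a runtime type test.
    // A non-matching or null instance yields false; nothing is thrown.
    public static bool IsBlueGreenDialect(IDialect dialect)
    {
        return dialect is AuroraMySqlDialect or RdsMySqlDialect;
    }
}
```

Whether the real dialects are "registered" should not matter to the pattern itself, as long as the types are referenced at compile time.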

try
{
using var cmd = conn.CreateCommand();
cmd.CommandText = existenceQuery;

Do we need to set a command timeout for all these internal queries?

@kenrickyap

I think the csproj files are missing for the BlueGreenConnection and BlueGreenConnection.Tests projects.


namespace AwsWrapperDataProvider.Plugin.BlueGreenConnection.BlueGreenConnection;

public class BlueGreenConnectionPluginService : PluginService

when is this BlueGreenConnectionPluginService used?


+1, I don't see it being used so we can probably remove it.


protected int GetValueHash(int currentHash, string val)
{
return currentHash * 31 + val.GetHashCode();

Why do we multiply by 31 and then add val.GetHashCode()? I see JDBC also does this, but it seems quite arbitrary.
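
Some context on the pattern, as a hedged sketch (class and method names are hypothetical): multiplying the running hash by a small odd prime before adding the next value is the conventional polynomial accumulation, and it makes the result depend on the order of the inputs. .NET also ships System.HashCode.Combine as a built-in alternative.

```csharp
using System;

public static class HashSketch
{
    // Polynomial accumulation: the running hash is scaled by 31 before the
    // next value's hash is added, so (a, b) and (b, a) produce different results.
    public static int Accumulate(int currentHash, int valueHash)
    {
        unchecked
        {
            return (currentHash * 31) + valueHash;
        }
    }

    // Built-in alternative from System.HashCode.
    public static int Combine(int currentHash, int valueHash)
    {
        return HashCode.Combine(currentHash, valueHash);
    }
}
```

One caveat: string.GetHashCode() is randomized per process on .NET Core and later, so either approach is only suitable when the combined hash is not persisted or compared across processes.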


if (!BlueGreenStatusMapping.TryGetValue(value.ToUpperInvariant(), out var phase))
{
throw new ArgumentException($"Unknown blue/green deployment status: {value}");

Do we need to add this message to the resx file?


Logger.LogTrace(Resources.SuspendConnectRouting_Apply_SwitchoverCompleteContinueWithConnect, (this.GetNanoTime() - holdStartTime) / 1_000_000);
}
catch (OperationCanceledException)
@kenrickyap, Mar 5, 2026

Is an OperationCanceledException ever thrown by Delay? base delay seems to just return when cancellationToken.IsCancellationRequested.

This might be an issue as we might be returning null when cts is cancelled, which assumes the apply was successful.
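
A minimal sketch of the two behaviours being contrasted, with hypothetical helper names (this is not the PR's Delay). A polling delay returns normally on cancellation, so a catch (OperationCanceledException) around it never runs. Calling ThrowIfCancellationRequested afterwards is one way to surface the cancellation to the caller.

```csharp
using System;
using System.Threading;

public static class CancellationSketch
{
    // Polling style: returns normally when cancelled, so the caller cannot
    // distinguish "delay elapsed" from "cancelled" without checking the token.
    public static void PollingDelay(int delayMs, CancellationToken token)
    {
        long deadline = Environment.TickCount64 + delayMs;
        while (Environment.TickCount64 < deadline && !token.IsCancellationRequested)
        {
            Thread.Sleep(1);
        }
    }

    // Throwing style: same wait, but cancellation is reported as
    // OperationCanceledException once the wait ends.
    public static void ThrowingDelay(int delayMs, CancellationToken token)
    {
        PollingDelay(delayMs, token);
        token.ThrowIfCancellationRequested();
    }
}
```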

return DateTime.UtcNow.Ticks * 100;
}

protected void Delay(long delayMs, BlueGreenStatus? bgStatus, string bgdId, CancellationToken cancellationToken)

It might be worth turning Delay into an async function too and using await Task.Delay instead of Thread.Sleep. Task.Delay releases the thread back to the thread pool while waiting so other work can use it, whereas Thread.Sleep keeps hold of the thread.
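
A rough sketch of what an async version could look like, under the assumption that callers can await it. DelayAsync and its bool return value are hypothetical and not part of the PR.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDelaySketch
{
    // Returns true if the full delay elapsed, false if it was cancelled.
    // Task.Delay does not block a thread while waiting, and awaiting a
    // cancelled delay throws TaskCanceledException (an OperationCanceledException).
    public static async Task<bool> DelayAsync(int delayMs, CancellationToken token)
    {
        try
        {
            await Task.Delay(delayMs, token);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }
}
```

Returning a flag is only one option; letting the exception propagate would also make the cancelled case visible to the routing code.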


Logger.LogTrace(Resources.SuspendExecuteRouting_Apply_SwitchoverCompletedContinueWithMethod, methodName, (this.GetNanoTime() - holdStartTime) / 1_000_000);
}
catch (OperationCanceledException)

same comment as above

@kenrickyap left a comment

NOICE WORK :):) added some comments

protected static readonly string RdsPgBgStatusQuery =
$"SELECT * FROM rds_tools.show_topology('aws_dotnet_driver')";



Extra blank line

"SELECT * FROM " +
$"pg_catalog.get_blue_green_fast_switchover_metadata('aws_dotnet_driver')";

protected static readonly string RdsMySqlTopologyTableExistsQuery =

Aurora and RDS MySQL have the same queries. Maybe we can just use the same variables here.


namespace AwsWrapperDataProvider.Plugin.BlueGreenConnection.BlueGreenConnection;

public delegate void OnBlueGreenStatusChange(BlueGreenRoleType role, BlueGreenInterimStatus interimStatus);

Curious if we could just move this to BlueGreenStatusMonitor since it is only used there.


private int monitorResetOnInProgressCompleted; // 0 = false, 1 = true
private int monitorResetOnTopologyCompleted;


extra blank line

: long.Parse(PropertyDefinition.BgIntervalIncreasedMs.DefaultValue!);
this.statusCheckIntervalMap[BlueGreenIntervalRate.HIGH] =
!props.TryGetValue(PropertyDefinition.BgIntervalHighMs.Name, out var high)
? long.Parse(PropertyDefinition.BgIntervalHighMs.DefaultValue!)

nit: there's a trailing space at the end of this line

List<IConnectRouting> connectRouting,
List<IExecuteRouting> executeRouting,
IDictionary<string, BlueGreenRoleType> roleByHost,
IDictionary<string, (HostSpec, HostSpec)> correspondingNodes)

IDictionary<string, (HostSpec, HostSpec?)>

if (!this.blueDnsUpdateCompleted || Interlocked.CompareExchange(ref this.allGreenNodesChangedName, 0, 0) == 0)
{
// New connect calls to blue nodes should be routed to green nodes.
foreach (var x in this.roleByHost.Where(x => x.Value == BlueGreenRoleType.SOURCE && this.correspondingNodes.ContainsKey(x.Key)))

Curious if you could just do

var (blueHost, value)

instead of var x here for easier processing later on
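
To illustrate the suggestion, a small self-contained sketch. The enum, the method, and the string-based host tuples are hypothetical stand-ins for BlueGreenRoleType and the HostSpec pairs. KeyValuePair supports deconstruction, so it still works after the Where filter.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for BlueGreenRoleType.
public enum RoleSketch { Source, Target }

public static class DeconstructSketch
{
    public static List<string> RouteSourceHosts(
        IDictionary<string, RoleSketch> roleByHost,
        IDictionary<string, (string Blue, string Green)> correspondingNodes)
    {
        var routes = new List<string>();

        // Deconstruct each KeyValuePair into named variables instead of using x.Key / x.Value.
        foreach (var (blueHost, role) in roleByHost.Where(
                     x => x.Value == RoleSketch.Source && correspondingNodes.ContainsKey(x.Key)))
        {
            var (_, greenHost) = correspondingNodes[blueHost];
            routes.Add($"{blueHost} ({role}) -> {greenHost}");
        }

        return routes;
    }
}
```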


// Check whether green host is already been connected with blue (no-prefixes) IAM host name.
List<HostSpec> iamHosts;
if (this.IsAlreadySuccessfullyConnected(greenHost, blueHost))

HostSpec iamBlueHost = this.hostSpecBuilder.CopyFrom(greenHostSpec).WithHost(BlueGreenConnectionUtils.RemoveGreenInstancePrefix(greenHost)).Build();
if (this.IsAlreadySuccessfullyConnected(greenHost, iamBlueHost.Host))

Comment on lines +1056 to +1057



extra blank lines

}

public override async Task<DbConnection> ForceOpenConnection(
HostSpec hostSpec,

HostSpec?

}

public override async Task<DbConnection> OpenConnection(
HostSpec hostSpec,

Same as above

: await pluginService.OpenConnection(this.SubstituteHostSpec, props, plugin, false);
}

bool iamInUse = pluginService.IsPluginInUse("IamAuthPlugin");

We should use constants instead of hard-coded strings.

Also, I don't think we should be using the class name; it would be better to use the plugin code.


public bool IsPluginActive(string pluginName)
{
return this.plugins.Any(p => p.GetType().Name.Contains(pluginName, StringComparison.OrdinalIgnoreCase));

I'm not the biggest fan of using the class name as a string.

Perhaps something like this would be better?

public bool IsPluginActive<T>() where T : IConnectionPlugin
{
    return this.plugins.Any(p => p is T);
}

bool iamInUse = pluginService.IsPluginInUse<IamAuthPlugin>();

This would require a dependency on IamAuthPlugin, though. If you can't import the class the way Java does, then we should do what the Go driver does and use the plugin code instead. That's better because it stays consistent with what users configure.

public bool IsPluginActive(string pluginCode)
{
    return this.activePluginCodes.Contains(pluginCode);
}

where activePluginCodes holds the plugin codes in use. You can make this an array, or a hash set to be more efficient. You would probably need to build it when constructing the plugin chain and parsing the DSN.
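
A possible shape for that lookup, as a sketch only. PluginRegistrySketch is a hypothetical class, and it assumes the codes arrive as one comma-separated string; the code values used in the test are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PluginRegistrySketch
{
    private readonly HashSet<string> activePluginCodes;

    // Assumption: plugin codes arrive as a comma-separated string.
    public PluginRegistrySketch(string pluginCodes)
    {
        this.activePluginCodes = new HashSet<string>(
            pluginCodes
                .Split(',', StringSplitOptions.RemoveEmptyEntries)
                .Select(code => code.Trim())
                .Where(code => code.Length > 0),
            StringComparer.OrdinalIgnoreCase);
    }

    // Constant-time lookup by plugin code rather than by class name.
    public bool IsPluginActive(string pluginCode)
    {
        return this.activePluginCodes.Contains(pluginCode);
    }
}
```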

@JuanLeee (author)

Ya, I agree. This was only done because I wanted to make a separate package, but that doesn't seem needed.

Comment on lines +22 to +24
protected static readonly string AuroraMySqlBgTopologyExistsQuery =
"SELECT 1 AS tmp FROM information_schema.tables WHERE" +
" table_schema = 'mysql' AND table_name = 'rds_topology'";

Curious, what's the purpose of making these into helpers rather than putting them in the dialect classes?
Also, curious why BG is a separate project? For visibility, the main reason the other drivers separate the plugins is to avoid pulling in SDK dependencies when they aren't needed. I don't think blue/green has these dependencies.

Comment on lines +52 to +55
AuroraMySqlDialect => CheckExistenceQueries(connection, AuroraMySqlBgTopologyExistsQuery),
AuroraPgDialect => CheckExistenceQueries(connection, AuroraPostgreSqlBgTopologyExistsQuery),
RdsMySqlDialect => CheckExistenceQueries(connection, RdsMySqlTopologyTableExistsQuery),
RdsPgDialect => CheckExistenceQueries(connection, RdsPgTopologyTableExistsQuery),

This isn't very OOP-like imo. Rather than switching on the concrete type here, we should depend on each dialect's own implementation and use the dialect class.

@JuanLeee (author)

Will be fixed when it is moved to one package

Comment on lines +31 to +35
// Network-bound methods that might fail and trigger failover
"DbConnection.Open",
"DbConnection.OpenAsync",
"DbConnection.BeginDbTransaction",
"DbConnection.BeginDbTransactionAsync",

We should really keep a list of these in the driver dialect or something to make it all consistent

this.pluginService = pluginService;
this.props = props;
this.providerSupplier = providerSupplier;
this.bgdId = PropertyDefinition.BgdId.GetString(this.props);

Should we normalize this like JDBC does? Require it to be non-null, trim it, and lowercase it?

And curious, can users ever pass in null through the properties? This may blow up with a NullReferenceException if we pass it in to routing.apply. Perhaps we should make bgdId string instead of string? if possible.
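
One way the normalization could look, as a sketch. NormalizeBgdId is a hypothetical helper, and rejecting blank values with ArgumentException is an assumption rather than the JDBC behaviour.

```csharp
using System;

public static class BgdIdSketch
{
    // Rejects null or blank ids up front so later code can treat the id as a
    // non-null string, then trims and lowercases it for consistent lookups.
    public static string NormalizeBgdId(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
        {
            throw new ArgumentException("Blue/green deployment id must not be null or blank.", nameof(raw));
        }

        return raw.Trim().ToLowerInvariant();
    }
}
```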


protected virtual long GetNanoTime()
{
return DateTime.UtcNow.Ticks * 100;

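Is this the right function or unit? A tick is already 100 nanoseconds, so Ticks * 100 does give nanoseconds, but for current dates the product exceeds long.MaxValue and wraps around, and DateTime.UtcNow is not a monotonic clock.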

If we want to be more precise we should use Stopwatch.GetTimestamp() instead.
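
A sketch of a Stopwatch-based version, assuming the value is only ever used to measure elapsed time and not as a wall-clock timestamp. The class name is hypothetical.

```csharp
using System.Diagnostics;

public static class MonotonicClockSketch
{
    // Converts the monotonic Stopwatch timestamp to nanoseconds.
    // Whole seconds and the remainder are scaled separately so the
    // intermediate products stay within the range of long.
    public static long GetNanoTime()
    {
        long timestamp = Stopwatch.GetTimestamp();
        long frequency = Stopwatch.Frequency; // Stopwatch ticks per second
        long seconds = timestamp / frequency;
        long remainder = timestamp % frequency;
        return (seconds * 1_000_000_000L) + (remainder * 1_000_000_000L / frequency);
    }

    // Elapsed milliseconds since a value previously returned by GetNanoTime.
    public static long ElapsedMs(long startNano)
    {
        return (GetNanoTime() - startNano) / 1_000_000;
    }
}
```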

Comment on lines +71 to +74
public static readonly AwsWrapperProperty BgConnectTimeout = new(
"bgConnectTimeoutMs",
"30000",
"Connect timeout (in msec) during Blue/Green Deployment switchover.");

I also see this in jdbc, but it doesn't look like this is configurable to the user?

protected readonly HostSpec initialHostSpec;

private readonly CancellationTokenSource concellationTokenSource = new();
private readonly SemaphoreSlim sleepWaitObj = new(0);

Is using a semaphore the correct approach? These are often used for counts. If we have multiple NotifyAlls and it accumulates, then the subsequent delays will be omitted.
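
A small sketch of the accumulation concern, with a hypothetical helper name. Each Release on a SemaphoreSlim raises its count, so signals sent while nobody is waiting are stored, and later waits return immediately instead of sleeping.

```csharp
using System.Threading;

public static class SemaphoreSketch
{
    // Releases the semaphore `signals` times with no waiter present, then
    // counts how many of `waits` zero-timeout waits succeed immediately.
    public static int CountImmediateWakeups(int signals, int waits)
    {
        using var semaphore = new SemaphoreSlim(0);

        for (int i = 0; i < signals; i++)
        {
            semaphore.Release();
        }

        int immediate = 0;
        for (int i = 0; i < waits; i++)
        {
            if (semaphore.Wait(0))
            {
                immediate++;
            }
        }

        return immediate;
    }
}
```

With three stored signals, the next three waits would be skipped, which matches the concern about subsequent delays being omitted.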

}
});

await this.openConnectionTask;

fyi, this will wait for the task to finish. At that point, is there a reason to make it async? JDBC does this entirely on a separate thread, in parallel. If you want, we can make it simpler by making this process synchronous and removing the Task.Run().

Comment on lines +621 to +624
this.props[PropertyDefinition.Host.Name] = connectionHostSpecCopy.Host;
this.props[PropertyDefinition.Port.Name] = connectionHostSpecCopy.Port.ToString();
this.hostListProvider = this.pluginService.Dialect.HostListProviderSupplier(this.props, (PluginService)this.pluginService, this.pluginService);
}

this is mutating the actual props, did you mean to use the copy?
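
If a copy was intended, a minimal sketch could look like this. The helper and the "Host" / "Port" keys are hypothetical; the PR uses PropertyDefinition.Host.Name and PropertyDefinition.Port.Name.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class PropsCopySketch
{
    // Returns a new dictionary with host and port overridden,
    // leaving the caller's dictionary untouched.
    public static Dictionary<string, string> WithHost(
        Dictionary<string, string> props, string host, int port)
    {
        var copy = new Dictionary<string, string>(props);
        copy["Host"] = host;
        copy["Port"] = port.ToString(CultureInfo.InvariantCulture);
        return copy;
    }
}
```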

this.role,
hostListProperties["ClusterId"]);

var connectionHostSpecCopy = this.connectionHostSpec;

Is this a copy? Or are we just setting the reference to the same object? We need this to be a full copy.
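
For clarity, a sketch of the difference using a hypothetical HostSpecSketch class, not the driver's HostSpec. Assigning a class instance to a new variable only copies the reference. Elsewhere in this PR a copy is built with hostSpecBuilder.CopyFrom(...).Build(), which may be the better fit here.

```csharp
// Hypothetical stand-in for HostSpec, for illustration only.
public sealed class HostSpecSketch
{
    public string Host { get; set; }
    public int Port { get; set; }

    public HostSpecSketch(string host, int port)
    {
        this.Host = host;
        this.Port = port;
    }

    // Creates an independent object with the same values.
    public HostSpecSketch Copy()
    {
        return new HostSpecSketch(this.Host, this.Port);
    }
}
```

With this class, var alias = spec; leaves alias and spec pointing at the same object, while spec.Copy() yields an object that can be changed without affecting the original.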

protected readonly Dictionary<BlueGreenIntervalRate, long> statusCheckIntervalMap;
protected readonly HostSpec initialHostSpec;

private readonly CancellationTokenSource concellationTokenSource = new();

typo: should be cancellationTokenSource


protected long GetNanoTime()
{
return DateTime.UtcNow.Ticks * 100;

Same comment with ticks here.

+ divider
+ string.Join("\n", this.phaseTimeNano
.OrderBy(y => y.Value.TimestampNano)
.Select(x => string.Format("{0,28} {1,18} ms {{2," + maxEventNameLength + "}}",

Are the double braces in "{{2," + maxEventNameLength + "}}" intentional?
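
A short sketch of the difference, with hypothetical method names. In a composite format string, {{ and }} are escapes for literal braces, so the doubled form prints the placeholder text itself instead of the third argument.

```csharp
public static class FormatSketch
{
    // Doubled braces are escapes: the format string becomes "{0} {{1,N}}",
    // which prints the literal text "{1,N}" and ignores the second argument.
    public static string Escaped(string name, int width)
    {
        return string.Format("{0} {{1," + width + "}}", "x", name);
    }

    // Single braces build a real placeholder "{1,N}" with a dynamic width,
    // so the second argument is right-aligned in N characters.
    public static string Dynamic(string name, int width)
    {
        return string.Format("{0} {1," + width + "}", "x", name);
    }
}
```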

4 participants