
feat: bulk operations, connection pooling, and migration CLI enhancements#19

Merged

joshsmithxrm merged 49 commits into main from feature/cli-implementation on Dec 23, 2025

Conversation

@joshsmithxrm (Owner)

Summary

Major feature release adding high-performance bulk operations, intelligent connection pooling, and migration CLI tooling.

Bulk Operations (PPDS.Dataverse)

  • Parallel batch processing using CreateMultiple, UpdateMultiple, UpsertMultiple APIs
  • Progress reporting with IProgress<ProgressSnapshot> for real-time throughput metrics
  • Server-recommended parallelism via RecommendedDegreesOfParallelism (can be overridden for higher throughput)

Connection Pooling

  • Throttle-aware connection selection - routes around throttled connections
  • Connection health management - marks invalid connections, prevents pool leaks
  • Dynamic pool expansion with spin-wait to prevent over-creation
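The throttle-aware selection bullet can be sketched as follows. This is an illustrative sketch only: the type and member names (`PooledConnection`, `IsThrottled`, `ThrottleExpiry`) are assumptions, not the library's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical connection wrapper; real pool internals may differ.
public sealed class PooledConnection
{
    public DateTimeOffset ThrottleExpiry { get; set; }
    public bool IsThrottled => ThrottleExpiry > DateTimeOffset.UtcNow;
}

public static class ThrottleAwareSelector
{
    public static PooledConnection? Select(IReadOnlyList<PooledConnection> pool)
    {
        // Prefer any connection that is not currently throttled.
        var healthy = pool.FirstOrDefault(c => !c.IsThrottled);
        if (healthy != null)
            return healthy;

        // All connections throttled: pick the one whose throttle expires
        // soonest, so the caller waits the minimum amount of time.
        return pool.OrderBy(c => c.ThrottleExpiry).FirstOrDefault();
    }
}
```

The "route around, wait only as a last resort" shape is what lets bulk operations keep moving on unthrottled connections instead of stalling the whole batch.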

Resilience & Retry Logic

  • TVP race condition retry (SQL 3732) - handles transient errors on new tables ✅ production-validated
  • SQL deadlock retry (SQL 1205) - handles contention under high parallelism
  • Service protection throttle retry (429) - exponential backoff with intelligent routing
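The retry pattern behind these three bullets can be sketched as below. This is a hedged sketch, not the library's implementation: the error-code constants come from the bullets above, but the `IsTransient` classification here is a simplistic stand-in for whatever fault inspection the real code performs.

```csharp
using System;
using System.Threading.Tasks;

public static class TransientRetry
{
    public const int SqlTvpRace = 3732;   // transient error on newly created tables
    public const int SqlDeadlock = 1205;  // deadlock victim under high parallelism

    // Simplistic stand-in for real fault-code inspection.
    private static bool IsTransient(Exception ex) =>
        ex.Message.Contains(SqlTvpRace.ToString()) ||
        ex.Message.Contains(SqlDeadlock.ToString());

    public static async Task<T> ExecuteWithRetryAsync<T>(
        Func<Task<T>> operation, int maxAttempts = 3)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex))
            {
                // Exponential backoff: 1s, 2s, 4s, ...
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt - 1)));
            }
        }
    }
}
```

For 429 service-protection faults, the real implementation additionally honors the server-supplied Retry-After and prefers rerouting to an unthrottled connection before waiting.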

Migration CLI (ppds-migrate)

  • Schema generation from live Dataverse environments
  • M2M relationship export/import with CMT format compatibility
  • User mapping for cross-environment migrations
  • Plugin disable/enable during imports
  • Attribute filtering for selective exports

Documentation

  • Performance benchmarks (validated ~1.8M records/hour at parallelism 50)
  • Architecture Decision Records (ADRs)
  • Detailed specs for throttle detection, connection health, TVP retry

CI/CD

  • Added CodeQL security scanning workflow

Test Results

  • 42,366 records upserted at parallelism 50
  • 100% success rate (0 failures)
  • Duration: 83 seconds (~508 records/sec)
  • TVP retry triggered and succeeded during test run

Validation Status

Feature Status
Bulk operations ✅ Production-validated
Connection pooling ✅ Production-validated
TVP retry (3732) ✅ Production-validated
Deadlock retry (1205) ✅ Code-complete (follows validated TVP pattern)
Throttle retry (429) ✅ Code-complete (not triggered in test - no 429s hit)

Breaking Changes

None. All changes are additive.

Commits

  • feat(dataverse): add SQL deadlock retry to bulk operations
  • refactor(dataverse): centralize error handling in batch execution wrapper
  • feat(dataverse): implement connection health management and TVP retry
  • feat(dataverse): implement throttle detection and intelligent routing
  • feat(dataverse): use RecommendedDegreesOfParallelism by default
  • fix(dataverse): fix connection pool leak and retry logic
  • feat(dataverse): add progress reporting to bulk operations
  • feat: add parallel batch processing, progress tracking, and bug fixes
  • feat: implement M2M relationship export/import
  • feat: add schema generation and improve dependency detection
  • ci: add security scanning workflows

🤖 Generated with Claude Code

joshsmithxrm and others added 30 commits December 20, 2025 11:37
- Add CodeQL workflow for C# static analysis (security-and-quality queries)
- Add dependency-review-action to block PRs with high severity vulnerabilities
- Add dotnet list package --vulnerable check to build workflow
- Dependency check warns but doesn't fail (handles transitive dependency issues)
CLI Implementation:
- Add ServiceFactory for DI container setup
- Wire ExportCommand to IExporter (ParallelExporter)
- Wire ImportCommand to IImporter (TieredImporter)
- Wire AnalyzeCommand to ICmtSchemaReader + IDependencyGraphBuilder
- Wire MigrateCommand for export→import flow
- Fix import error order (validate file before connection)

CMT Compatibility Fixes:
- Add 'etc' attribute (ObjectTypeCode) to entity element
- Change 'm2m' to 'manyToMany' for relationships
- Add full relationship attributes (referencingEntity, referencingAttribute, etc.)
- Add 'primaryKey' attribute support for fields
- Add IsPrimaryKey property to FieldSchema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add `schema generate` command to create migration schemas from Dataverse metadata
- Add `schema list` command to browse available entities with filtering
- Write CMT-compatible schema format (etc, primaryidfield, lookupType, relationships)
- Handle polymorphic lookups (e.g., account|contact) in dependency analysis
- Add entityreference type to IsLookup detection
- Include explicit <relationships> section for enhanced dependency detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add dateMode="absolute" attribute to schema root (required by CMT)
- Add <entityImportOrder> section to schema (required by CMT)
- Change data.xml to use element content for values instead of attributes
- Update date format to CMT-compatible "yyyy-MM-dd HH:mm:ss"
- Update boolean format to "1"/"0" instead of "true"/"false"

Fixes "Invalid import file" error when importing into CMT.
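The value-formatting rules in this commit can be condensed into one sketch. The function name is hypothetical; the formats ("1"/"0" booleans, "yyyy-MM-dd HH:mm:ss" dates) are taken directly from the commit message.

```csharp
using System;

// Sketch of the CMT-compatible value formatting described above.
static string FormatCmtValue(object? value) => value switch
{
    bool b => b ? "1" : "0",                            // booleans as "1"/"0"
    DateTime dt => dt.ToString("yyyy-MM-dd HH:mm:ss"),  // CMT date format
    null => string.Empty,
    _ => value.ToString() ?? string.Empty
};
```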

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Read lookup GUID from element content (CMT format)
- Add "entityreference" to lookup type handling
- Fix boolean parsing to handle "1"/"0" values

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add [Content_Types].xml to ZIP (required by CMT)
- Use value attribute instead of element content for field values
- Add xmlns:xsd and xmlns:xsi namespaces to data.xml root
- Use displayname attribute on entity (not recordcount)
- Include primary ID field in record output
- Add m2mrelationships element after records
- Fix DateTime format to ISO 8601 with 7 decimal places
- Remove dateMode attribute and entityImportOrder from schema
- Remove customfield attribute from field elements
- HTML-encode filter content using WriteElementString
- Fix field attribute order: displayname, name, type

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ibility

- Add IsReflexive and TargetEntityPrimaryKey to RelationshipSchema
- Create ManyToManyRelationshipData class with grouped target model
- Update CmtSchemaReader/Writer to handle M2M attributes (isreflexive, m2mTargetEntityPrimaryKey)
- Update CmtDataReader to parse <m2mrelationships> element
- Update CmtDataWriter to write <m2mrelationships> with grouped targets
- Add M2M export to ParallelExporter (queries intersect entities)
- Rewrite ProcessRelationshipsAsync in TieredImporter with role entity lookup
- Support AssociateRequest with multiple targets per source

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Attribute Filtering:
- Add IncludeAttributes, ExcludeAttributes, ExcludeAttributePatterns to SchemaGeneratorOptions
- Update DataverseSchemaGenerator to apply attribute filtering
- Add CLI options: --include-attributes, --exclude-attributes, --exclude-patterns

User Mapping:
- Add UserMapping and UserMappingCollection models
- Add UserMappingReader for XML user mapping files
- Update TieredImporter to apply user mappings for systemuser/team references
- Add CLI option: --user-mapping

Plugin Disable/Enable:
- Add PluginStepManager to query and disable/enable plugin steps
- Update TieredImporter to disable plugins during import based on schema disableplugins attribute
- Re-enable plugins in finally block to ensure cleanup
- Register IPluginStepManager in DI container

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The M2M relationship Entity1/Entity2 must be relative to the current
entity being processed, not the fixed order from Dataverse metadata.

Bug: When Dataverse returned Entity1LogicalName != current entity,
we would set Entity2 to the wrong entity, causing:
- systemuser M2M to team → exported with targetentityname="systemuser"
- Import then looks for team with systemuser's GUID → fails

Fix: Determine source/target based on which entity matches metadata.LogicalName
- Also set IsReflexive for self-referencing M2M relationships
- Set TargetEntityPrimaryKey from the target intersect attribute
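The orientation fix reduces to a small pure function. `Entity1LogicalName`/`Entity2LogicalName` are real fields of Dataverse `ManyToManyRelationshipMetadata`; the tuple shape and function name here are assumptions for illustration.

```csharp
// Orient an M2M relationship relative to the entity currently being processed.
static (string Source, string Target, bool IsReflexive) Orient(
    string currentEntity, string entity1, string entity2)
{
    bool currentIsEntity1 = currentEntity == entity1;
    return (
        Source: currentIsEntity1 ? entity1 : entity2,
        Target: currentIsEntity1 ? entity2 : entity1,
        // Self-referencing M2M: both sides are the same entity.
        IsReflexive: entity1 == entity2);
}
```

With this, a systemuser-to-team relationship exports with targetentityname="team" regardless of which side Dataverse metadata listed first.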

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bumps [actions/setup-dotnet](https://github.com/actions/setup-dotnet) from 4 to 5.
- [Release notes](https://github.com/actions/setup-dotnet/releases)
- [Commits](actions/setup-dotnet@v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-dotnet
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
---
updated-dependencies:
- dependency-name: coverlet.collector
  dependency-version: 6.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: coverlet.collector
  dependency-version: 6.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: coverlet.collector
  dependency-version: 6.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
---
updated-dependencies:
- dependency-name: FluentAssertions
  dependency-version: 8.8.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
---
updated-dependencies:
- dependency-name: Microsoft.Extensions.DependencyInjection
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: Microsoft.Extensions.DependencyInjection
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
…Microsoft.Extensions.Logging.Abstractions

Bumps Microsoft.Extensions.DependencyInjection.Abstractions from 8.0.2 to 10.0.1
Bumps Microsoft.Extensions.Logging.Abstractions from 8.0.3 to 10.0.1

---
updated-dependencies:
- dependency-name: Microsoft.Extensions.DependencyInjection.Abstractions
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: Microsoft.Extensions.Logging.Abstractions
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: Microsoft.Extensions.DependencyInjection.Abstractions
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: Microsoft.Extensions.Logging.Abstractions
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
…Microsoft.Extensions.Options

Bumps Microsoft.Extensions.DependencyInjection.Abstractions from 8.0.2 to 10.0.1
Bumps Microsoft.Extensions.Options from 8.0.2 to 10.0.1

---
updated-dependencies:
- dependency-name: Microsoft.Extensions.DependencyInjection.Abstractions
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: Microsoft.Extensions.Options
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: Microsoft.Extensions.DependencyInjection.Abstractions
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: Microsoft.Extensions.Options
  dependency-version: 10.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Add validation in DataverseConnectionPool that detects when connections
target different Dataverse organizations and logs a warning. This prevents
the common misconfiguration of putting Dev/QA/Prod in the same pool,
which causes requests to be randomly load-balanced across environments.

Also add documentation in README explaining:
- Why multiple orgs in one pool is wrong
- The correct pattern for multi-environment scenarios
- When multiple connections in one pool IS appropriate

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
PPDS.Dataverse:
- Add ProgressTracker utility for thread-safe progress calculation
- Add ProgressSnapshot record with instant/overall rates and ETA
- Add MaxParallelBatches option to BulkOperationOptions
- Implement parallel batch processing in BulkOperationExecutor
- Convert BulkOperationResult to record for immutability

PPDS.Migration:
- Fix Entity.Id not used by UpsertMultiple (add as attribute)
- Add SuccessCount/FailureCount to ProgressEventArgs
- Fix TieredImporter to report success/failure breakdown
- Fix deferred fields progress to set Field, Current, Total
- Pass Errors array to progress.Complete()

ConsoleProgressReporter:
- Show success/failure breakdown during import
- Display detailed errors in completion summary (up to 10)
- Handle null Field gracefully in deferred fields display
- Add color-coded output for success/warnings/errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document the duplicate key error that occurs when setting alternate key
columns in both KeyAttributes and Attributes. The server's internal
ClassifyEntitiesForUpdateAndCreateV2 processor copies KeyAttributes into
Attributes, causing Dictionary.Insert to fail with "An item with the
same key has already been added".
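The workaround implied by this note is to keep alternate key columns out of `Attributes` when they already appear in `KeyAttributes`. `Entity` and `KeyAttributes` are the real Microsoft.Xrm.Sdk types; the entity and key names below are made-up examples.

```csharp
using Microsoft.Xrm.Sdk;

var entity = new Entity("account");
entity.KeyAttributes["accountnumber"] = "ACME-001"; // alternate key column
entity["name"] = "Acme Corp";                       // regular data column
// Do NOT also set entity["accountnumber"]: the server copies KeyAttributes
// into Attributes, and the duplicate key triggers the error described above.
```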

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add IProgress<ProgressSnapshot> parameter to IBulkOperationExecutor methods
- Wire up ProgressTracker in BulkOperationExecutor for real-time progress
- Report progress after each batch completion (both parallel and sequential)

Usage:
var progress = new Progress<ProgressSnapshot>(snapshot =>
    Console.WriteLine($"{snapshot.Processed}/{snapshot.Total} ({snapshot.PercentComplete:F1}%)"));

await executor.UpsertMultipleAsync(entityName, entities, options, progress);

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add LogDebug calls before and after each batch execution in
BulkOperationExecutor for CreateMultiple, UpdateMultiple, and
UpsertMultiple operations.

With --verbose (Debug level logging), users will now see:
  dbug: Executing UpsertMultiple batch. Entity: account, BatchSize: 1000
  dbug: UpsertMultiple batch completed. Entity: account, Success: 1000

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…play

- Add RatePerSecond property that returns OverallRatePerSecond
- Add warning to InstantRatePerSecond about volatility in batch operations
- Add BULK_OPERATIONS_BENCHMARKS.md to document performance testing

The InstantRatePerSecond fluctuates wildly when batches complete in bursts.
Use RatePerSecond (overall rate) for display and throughput reporting.
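The two rates differ only in their denominator, which is why the instant rate spikes when batches land in bursts. The arithmetic below is illustrative; the property names mirror the description above but this is not the library's exact code.

```csharp
using System;

public static class Rates
{
    // Smooth: total processed over total elapsed time. Use for display.
    public static double OverallRatePerSecond(long processed, TimeSpan elapsed) =>
        elapsed.TotalSeconds > 0 ? processed / elapsed.TotalSeconds : 0;

    // Spiky: processed within a short rolling window. A whole batch completing
    // at once inflates this momentarily, then it drops toward zero.
    public static double InstantRatePerSecond(long processedInWindow, TimeSpan window) =>
        window.TotalSeconds > 0 ? processedInWindow / window.TotalSeconds : 0;
}
```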

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix InitializeMinimumConnections to count active connections, not just
  idle ones. Previously, validation loop created new connections every
  60s when active connections weren't counted toward the minimum.
- Add PoolExhaustedException for typed exception handling
- Add retry logic with exponential backoff (1s, 2s) for pool exhaustion
- Fix dead code in retry loop where final attempt was unreachable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When pool is empty but connections are active (about to be returned),
add a brief 5ms spin-wait before creating a new connection. This handles
the race condition where:
1. Worker A logs "batch completed" but hasn't returned connection yet
2. Worker B starts new batch and sees empty pool
3. Worker B creates new connection (unnecessary)
4. Worker A's connection is then returned (now we have an extra)

The spin-wait allows in-flight connection returns to complete, reducing
unnecessary pool expansion from 50 connections down to ~5-8 for a
4-worker parallel batch operation.
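The spin-wait can be sketched as a checkout path; the field names (`_idle`, `_activeCount`) and the object-typed client are assumptions, not the pool's real internals.

```csharp
using System.Collections.Concurrent;
using System.Threading;

public sealed class SpinWaitPoolSketch
{
    private readonly ConcurrentQueue<object> _idle = new();
    private int _activeCount; // connections currently checked out

    public object? TryCheckout()
    {
        if (_idle.TryDequeue(out var client))
            return client;

        // Pool is empty but connections are checked out: wait briefly for an
        // in-flight return instead of expanding the pool immediately.
        if (Volatile.Read(ref _activeCount) > 0)
        {
            Thread.Sleep(5); // ~5ms, per the commit message
            if (_idle.TryDequeue(out client))
                return client;
        }

        return null; // caller falls back to creating a new connection
    }
}
```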

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…osal flag

The PooledClient._disposed flag was set on first Dispose() and never
reset, causing subsequent checkouts of the same connection to silently
skip returning to pool. This caused connection exhaustion under load.

Changes:
- Rename _disposed to _returned to clarify semantics (returning to pool, not disposing)
- Reset _returned flag in Reset() when connection is returned to pool
- Add _poolLock for synchronized queue access
- Use Interlocked operations for thread-safe flag management
- Pass CancellationToken through Parallel.ForEachAsync to batch methods
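The double-return guard can be sketched with an `Interlocked` flag. The names `_returned` and `Reset` follow the commit message; the pool callback is an assumption.

```csharp
using System;
using System.Threading;

public sealed class PooledClientSketch : IDisposable
{
    private int _returned;
    private readonly Action<PooledClientSketch> _returnToPool;

    public PooledClientSketch(Action<PooledClientSketch> returnToPool) =>
        _returnToPool = returnToPool;

    public void Dispose()
    {
        // Only the first Dispose() per checkout returns the connection;
        // a second Dispose() is a no-op rather than a double return.
        if (Interlocked.Exchange(ref _returned, 1) == 1)
            return;
        _returnToPool(this);
    }

    // Called by the pool on checkout so the next Dispose() returns again.
    internal void Reset() => Interlocked.Exchange(ref _returned, 0);
}
```

Without the `Reset` call, the flag stayed set after the first checkout cycle, which is exactly the silent connection leak this commit fixes.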

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Results from testing connection pool vs single ServiceClient:
- Connection pool is 5% faster than single ServiceClient (47.7/s vs 45.4/s)
- Batch size 100 is 3% faster than batch size 1000
- Updates are ~23% slower than creates (alternate key lookup overhead)

Optimal config: Connection Pool + Batch Size 100 = 47.7 records/sec

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update CLAUDE.md with PPDS.Dataverse project structure and performance rules
- Add Microsoft's reference throughput benchmarks (10M records/hour for bulk APIs)
- Add batch size guidance based on our benchmarks (100 vs 1000)
- Add RecommendedDegreesOfParallelism / x-ms-dop-hint guidance
- Add Performance Settings section documenting auto-applied settings
- Add Microsoft Learn reference links to all ADRs
- Document service protection limits (6K requests, 20min execution, 52 concurrent)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change BatchSize default from 1000 to 100 (benchmarked optimal)
- Change MaxParallelBatches to nullable, default null
- When null, automatically use ServiceClient.RecommendedDegreesOfParallelism
  (from x-ms-dop-hint header per Microsoft recommendation)
- Add ResolveParallelismAsync helper to resolve parallelism for all bulk ops
- Fall back to sequential (1) if RecommendedDegreesOfParallelism unavailable
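The resolution order can be sketched as below. `RecommendedDegreesOfParallelism` is a real `ServiceClient` property (populated from the x-ms-dop-hint header); `BulkOperationOptions` is the project's own type, and the helper shape here is an assumption.

```csharp
using Microsoft.PowerPlatform.Dataverse.Client;

static int ResolveParallelism(int? maxParallelBatches, ServiceClient client)
{
    // An explicit caller override always wins.
    if (maxParallelBatches is int explicitValue)
        return explicitValue;

    // Otherwise use the server's recommendation, falling back to sequential.
    var recommended = client.RecommendedDegreesOfParallelism;
    return recommended > 0 ? recommended : 1;
}
```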

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Log warning if MaxParallelBatches > MaxPoolSize since effective
parallelism will be limited by available pool connections.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two comprehensive specification documents for enterprise-grade features:

1. THROTTLE_DETECTION_AND_ROUTING.md
   - Wire up existing ThrottleTracker infrastructure
   - Detect service protection errors (all 3 error codes)
   - Extract Retry-After from OrganizationServiceFault.ErrorDetails
   - Intelligent routing: try different connection first, wait only when all throttled
   - Expose throttle statistics to consumers

2. CONNECTION_HEALTH_MANAGEMENT.md
   - Health-based connection recycling (not fixed timer)
   - Connection validation on checkout (IsReady, age, validity)
   - Auth and connection failure detection and recovery
   - MarkInvalid() for explicit connection invalidation
   - Retry failed operations with new connections
   - Configurable MaxLifetime (default 60min vs current 30min)

Both specs include:
- Problem statement and Microsoft guidance with references
- Current state analysis
- Detailed implementation with code examples
- Testing requirements and acceptance criteria
- Risk mitigations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello @joshsmithxrm, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a significant feature release focused on dramatically improving performance, reliability, and developer experience when interacting with Dataverse. It introduces a suite of high-performance bulk operations, an intelligent connection pooling system, and advanced resilience features to handle transient errors gracefully. Furthermore, the migration CLI has been substantially enhanced with new schema management and data migration capabilities, making it a more powerful tool for Dataverse developers and administrators.

Highlights

  • High-Performance Bulk Operations: Introduced CreateMultiple, UpdateMultiple, and UpsertMultiple APIs with parallel batch processing, real-time progress reporting via IProgress<ProgressSnapshot>, and automatic server-recommended parallelism for optimal throughput.
  • Intelligent Connection Pooling: Implemented a robust connection pool featuring throttle-aware connection selection to route around throttled connections, comprehensive connection health management to prevent leaks and mark invalid connections, and dynamic pool expansion with spin-wait for efficient resource utilization.
  • Enhanced Resilience and Retry Logic: Added specialized retry mechanisms for transient Dataverse errors, including handling TVP race conditions (SQL 3732) on new tables, SQL deadlocks (SQL 1205) under high contention, and service protection throttles (429) with exponential backoff and intelligent routing.
  • Migration CLI Tooling (ppds-migrate): Expanded the CLI with new capabilities for schema generation directly from live Dataverse environments, export/import of Many-to-Many (M2M) relationships compatible with CMT format, user mapping for cross-environment migrations, and the ability to disable/enable plugins during imports for performance.
  • Comprehensive Documentation & Benchmarks: Included detailed performance benchmarks (validating ~1.8M records/hour at parallelism 50), Architecture Decision Records (ADRs), and specifications for throttle detection, connection health, and TVP retry, providing deep insights into the new features.
  • CI/CD Security Integration: Integrated CodeQL security scanning into the CI/CD pipeline to proactively identify and mitigate potential security vulnerabilities.
Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/build.yml
    • .github/workflows/codeql.yml

@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

Comment on lines +309 to +317
catch (Exception ex)
{
ConsoleOutput.WriteError($"Schema generation failed: {ex.Message}", json);
if (verbose)
{
Console.Error.WriteLine(ex.StackTrace);
}
return ExitCodes.Failure;
}

Code scanning / CodeQL check notice: Generic catch clause.

Copilot Autofix:

In general, to fix a “generic catch clause” you either (1) catch more specific exception types that you expect and can handle meaningfully, or (2) keep a broad catch but explicitly rethrow critical exceptions that should not be swallowed (like OutOfMemoryException, ThreadAbortException, StackOverflowException, etc.).

For this CLI command, we should preserve the existing behavior for “normal” failures (still print a friendly message, optionally print the stack trace when verbose is true, and return ExitCodes.Failure), but avoid swallowing fatal exceptions. The most compatible change is to keep catch (Exception ex) but, at the start of that block, check whether ex is a critical exception type and, if so, rethrow it. This keeps the public behavior unchanged for recoverable errors while ensuring catastrophic conditions are not obscured.

Concretely, in ExecuteGenerateAsync in src/PPDS.Migration.Cli/Commands/SchemaCommand.cs, modify the catch (Exception ex) block starting at line 309 to add a small helper invocation that rethrows critical exceptions. Because we can only edit within the shown snippet, and no new file‑level helpers are shown, we’ll implement the rethrow logic inline using an if that checks is OutOfMemoryException or ThreadAbortException. This requires no new imports and does not affect normal exception handling.

Suggested changeset 1: src/PPDS.Migration.Cli/Commands/SchemaCommand.cs

Autofix patch. Run the following command in your local git repository to apply it:
cat << 'EOF' | git apply
diff --git a/src/PPDS.Migration.Cli/Commands/SchemaCommand.cs b/src/PPDS.Migration.Cli/Commands/SchemaCommand.cs
--- a/src/PPDS.Migration.Cli/Commands/SchemaCommand.cs
+++ b/src/PPDS.Migration.Cli/Commands/SchemaCommand.cs
@@ -308,6 +308,12 @@
         }
         catch (Exception ex)
         {
+            // Rethrow critical exceptions that should not be swallowed.
+            if (ex is OutOfMemoryException || ex is ThreadAbortException)
+            {
+                throw;
+            }
+
             ConsoleOutput.WriteError($"Schema generation failed: {ex.Message}", json);
             if (verbose)
             {
EOF
Comment on lines +304 to +314
foreach (var batch in batches)
{
var batchResult = await ExecuteUpsertMultipleBatchAsync(
entityLogicalName, batch, options, cancellationToken);

successCount += batchResult.SuccessCount;
allErrors.AddRange(batchResult.Errors);

tracker.RecordProgress(batchResult.SuccessCount, batchResult.FailureCount);
progress?.Report(tracker.GetSnapshot());
}

Code scanning / CodeQL check notice: This foreach loop immediately maps its iteration variable to another variable; consider mapping the sequence explicitly using '.Select(...)'.

Copilot Autofix: Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or, if the problem persists, contact support.

Comment on lines +237 to +247
foreach (var batch in batches)
{
var batchResult = await ExecuteUpdateMultipleBatchAsync(
entityLogicalName, batch, options, cancellationToken);

successCount += batchResult.SuccessCount;
allErrors.AddRange(batchResult.Errors);

tracker.RecordProgress(batchResult.SuccessCount, batchResult.FailureCount);
progress?.Report(tracker.GetSnapshot());
}

Code scanning / CodeQL check notice: This foreach loop immediately maps its iteration variable to another variable; consider mapping the sequence explicitly using '.Select(...)'.

Copilot Autofix: Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or, if the problem persists, contact support.

Copilot AI left a comment

Pull request overview

This is a major feature release adding high-performance bulk operations, intelligent connection pooling, and comprehensive migration CLI tooling. The PR includes parallel batch processing with server-recommended parallelism, throttle-aware connection routing, TVP race condition retry logic, connection health management, M2M relationship export/import, and user mapping capabilities for cross-environment migrations.

Key Changes:

  • Parallel bulk operations with IProgress<ProgressSnapshot> for real-time metrics
  • Throttle detection and intelligent routing with retry-after handling
  • Connection health management with invalid connection detection
  • TVP race condition and SQL deadlock retry logic (production-validated)
  • Schema generation from live Dataverse with attribute filtering
  • M2M relationship export/import in CMT-compatible format
  • User mapping for cross-environment migrations
  • Plugin disable/enable during imports

Reviewed changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated 14 comments.

Show a summary per file:

  • BulkOperationExecutor.cs - Major refactor: parallel execution, throttle detection, TVP/deadlock retry, connection health
  • DataverseConnectionPool.cs - Connection health validation, invalid connection disposal, multi-org warning
  • PooledClient.cs - Invalid connection marking, double-dispose prevention
  • ThrottleTracker.cs - Added ThrottledConnectionCount, GetShortestExpiry for routing
  • ProgressTracker.cs - Thread-safe progress tracking with rolling-window rate calculation
  • ProgressSnapshot.cs - Immutable progress state with instant/overall rates
  • SchemaGeneratorOptions.cs - Attribute filtering with whitelist/blacklist/pattern support
  • DataverseSchemaGenerator.cs - Live schema generation from Dataverse metadata
  • TieredImporter.cs - Primary key attribute setting, user mapping, plugin management, M2M import
  • PluginStepManager.cs - Plugin step disable/enable for import operations
  • UserMappingReader.cs - XML user mapping file reader
  • CmtDataWriter/Reader.cs - CMT format compatibility with M2M relationship support
  • Migration CLI commands - Schema generation, list entities, full migration pipeline
  • Specs & docs - TVP retry, throttle detection, connection health, benchmarks, ADRs

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a massive and impressive feature release that adds critical capabilities for high-performance data operations and migration. The introduction of a sophisticated connection pool with throttle awareness and health management, robust retry logic for transient errors (TVP, deadlocks, throttling), and a comprehensive migration CLI tool are all excellent additions.

The code is generally well-structured, with good separation of concerns, especially in the BulkOperationExecutor which now centralizes complex error handling and retry logic. The documentation, ADRs, and spec files are outstanding and provide great context.

I've identified a few key areas for improvement related to exception handling, connection pool robustness, and a potential issue in the upsert preparation logic. Addressing these will make the new features even more resilient and reliable.

joshsmithxrm and others added 10 commits December 23, 2025 03:34
Add parallelism 50 results showing 508.6 rec/s (8.4x improvement over
server-recommended). Document when to use standard vs high-throughput
mode and note that multi-app-registration pooling is untested but
could potentially increase throughput further.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused variable in ParallelExporter.cs (error)
- Refactor foreach loops to use LINQ Where/Select patterns
- Improve exception handling in DataverseSchemaGenerator.cs with specific catch blocks

Files changed:
- ParallelExporter.cs: Remove dead code, use LINQ for filtering
- SchemaGeneratorOptions.cs: Replace foreach+if with LINQ Any
- CmtDataReader.cs: Use LINQ for parsing target IDs
- ThrottleTracker.cs: Simplify cleanup with LINQ Where
- DependencyGraphBuilder.cs: Use LINQ pipeline for edge creation
- DataverseConnectionPool.cs: Use LINQ for connection string parsing
- DataverseSchemaGenerator.cs: Add specific exception catches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add direct reference to System.Security.Cryptography.Pkcs 8.0.x to
override the vulnerable 6.0.1 version pulled in by Dataverse.Client.

Resolves: GHSA-555c-2p6r-68mm (High severity)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ExecuteAsync to pool that retries indefinitely on service protection
- PooledClient auto-records throttle via callback to pool
- GetClientAsync waits for non-throttled connection BEFORE acquiring semaphore
- ThrottleAwareStrategy filters throttled connections, falls back to shortest expiry
- BulkOperationExecutor simplified: no MaxThrottleRetries, pool handles waiting
- Service protection is transient - never fail, only CancellationToken stops retry

Key fix: Throttle waiting no longer holds semaphore slots, preventing
PoolExhaustedException when all connections are throttled.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
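The retry behavior this commit describes can be sketched as a single loop on the pool. This is a minimal sketch, not the actual PPDS.Dataverse surface: `GetClientAsync`, `IsServiceProtection`, and `RecordThrottle` are assumed names standing in for whatever the pool actually exposes.

```csharp
// Sketch of "retry indefinitely on service protection" (assumed API names).
public async Task<T> ExecuteAsync<T>(
    Func<PooledClient, Task<T>> operation,
    CancellationToken cancellationToken)
{
    while (true)
    {
        // Wait for a non-throttled connection BEFORE taking a semaphore
        // slot, so throttle waits can never exhaust the pool.
        using var client = await GetClientAsync(cancellationToken);
        try
        {
            return await operation(client);
        }
        catch (Exception ex) when (IsServiceProtection(ex))
        {
            // Transient by definition: record the throttle so routing avoids
            // this connection, then loop. Only cancellation exits.
            client.RecordThrottle(ex);
            cancellationToken.ThrowIfCancellationRequested();
        }
    }
}
```

The design choice worth noting is that the wait happens outside the semaphore: a fully throttled pool makes callers queue at `GetClientAsync` rather than hold slots, which is the stated fix for PoolExhaustedException.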
@joshsmithxrm joshsmithxrm force-pushed the feature/cli-implementation branch from f56dacd to fd32687 Compare December 23, 2025 21:28
Removed the unused `lastException` variable and its assignments in
ExecuteBatchWithThrottleHandlingAsync. The variable was assigned in
4 catch blocks but never read - each catch block either throws the
original exception or continues to retry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@joshsmithxrm (Owner, Author)

@gemini-code-assist review

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a massive and impressive feature release that adds high-performance bulk operations, intelligent connection pooling, and a comprehensive migration CLI. The level of detail in the new documentation, ADRs, and specifications is excellent and provides great context for the design decisions. The implementation is generally of very high quality, introducing sophisticated and resilient patterns for parallelism, throttle handling, and error recovery. I've identified one critical issue regarding a potential stack overflow in the connection pool and a medium-severity maintainability issue in a constructor. Overall, this is a fantastic contribution that significantly enhances the SDK's capabilities.

Copilot AI left a comment


Copilot reviewed 68 out of 68 changed files in this pull request and generated 4 comments.

joshsmithxrm and others added 2 commits December 23, 2025 16:04
Replaced recursive call in GetConnectionFromPoolCore with a while loop
that drains invalid connections. This eliminates theoretical stack
overflow risk while maintaining identical behavior:

- Loop continues until valid connection found OR pool is empty
- Invalid connections are disposed and removed from pool
- When pool exhausted, creates new connection
- Slightly more efficient (no function call overhead per iteration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
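The recursion-to-loop change described above can be sketched as follows. Member names (`_available`, `IsConnectionValid`, `GetConnectionFromPoolCore`) are assumptions for illustration, not the actual pool internals.

```csharp
// Sketch of the iterative drain that replaced the recursive call.
// Each iteration either returns a healthy connection or disposes an
// invalid one, so stack depth is constant regardless of pool state.
private ServiceClient? GetConnectionFromPoolCore()
{
    while (_available.TryTake(out var connection))
    {
        if (IsConnectionValid(connection))
            return connection;   // first healthy connection wins

        connection.Dispose();    // invalid: dispose and keep draining
    }

    // Pool drained without a valid connection: signal the caller to
    // create a new one (subject to the pool's size limits).
    return null;
}
```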
- ADR-0005: Pool sizing per connection (MaxConnectionsPerUser default)
- Structured Configuration Spec: Typed config with Key Vault/env var support
- Multi-Environment Spec: Named environments and live migration design

These specs document the approved designs for implementation on a future branch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@joshsmithxrm joshsmithxrm merged commit 4d72360 into main Dec 23, 2025
5 checks passed
@joshsmithxrm joshsmithxrm deleted the feature/cli-implementation branch December 25, 2025 10:10
joshsmithxrm pushed a commit that referenced this pull request Jan 8, 2026
…es (#19)

Bumps the development-dependencies group with 4 updates: [@typescript-eslint/eslint-plugin](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/eslint-plugin), [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser), [@vscode/vsce](https://github.com/Microsoft/vsce) and [typescript-eslint](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/typescript-eslint).


Updates `@typescript-eslint/eslint-plugin` from 8.47.0 to 8.48.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/eslint-plugin/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.48.0/packages/eslint-plugin)

Updates `@typescript-eslint/parser` from 8.47.0 to 8.48.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.48.0/packages/parser)

Updates `@vscode/vsce` from 3.7.0 to 3.7.1
- [Release notes](https://github.com/Microsoft/vsce/releases)
- [Commits](microsoft/vscode-vsce@v3.7.0...v3.7.1)

Updates `typescript-eslint` from 8.47.0 to 8.48.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/typescript-eslint/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.48.0/packages/typescript-eslint)

---
updated-dependencies:
- dependency-name: "@typescript-eslint/eslint-plugin"
  dependency-version: 8.48.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development-dependencies
- dependency-name: "@typescript-eslint/parser"
  dependency-version: 8.48.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development-dependencies
- dependency-name: "@vscode/vsce"
  dependency-version: 3.7.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development-dependencies
- dependency-name: typescript-eslint
  dependency-version: 8.48.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>