Fix issue 394 and update info #481
Merged: ooples merged 277 commits into master from claude/fix-issue-394-011CV3HkgfwwbaSAdrzrKd58 on Nov 17, 2025.
Commits (250 of 277 shown):
bc5f451
fix: remove readonly from all RL agents and correct DeepReinforcement…
claude 20aef17
fix: update all existing deep RL agents to inherit from DeepReinforce…
claude f4a4ce3
feat: add classical RL implementations (Tabular Q-Learning and SARSA)
claude 1930a77
feat: add more classical RL algorithms (Expected SARSA, First-Visit MC)
claude 13be43a
feat: add classical RL implementations (Expected SARSA, First-Visit MC)
claude 28dff3a
feat: add n-step SARSA classical RL implementation
claude 6d1a961
fix: update deep RL agents with .NET Framework compatibility and miss…
claude f43eec7
feat: add 5 classical RL implementations (MC and DP methods)
claude 0fae751
feat: add Modified Policy Iteration (6/29 classical RL)
claude e948012
wip: add 15 options files and 1 agent for remaining classical RL algo…
claude 51fdeb8
feat: add 3 eligibility trace algorithms (SARSA(λ), Q(λ), Watkins Q(λ))
claude b65b8c2
chore: prepare for final 12 classical RL algorithm implementations
claude 17c41cc
feat: add 3 Planning algorithms (Dyna-Q, Dyna-Q+, Prioritized Sweeping)
claude 80fbb36
feat: add 4 Bandit algorithms (ε-Greedy, UCB, Thompson Sampling, Grad…
claude 4a76080
feat: add final 5 Advanced RL algorithms (Actor-Critic, Linear Q/SARS…
claude 5090cc7
fix: use count instead of length for list assertion in uniform replay…
ooples 85c3553
fix: correct loss function type name and collection syntax in td3options
ooples e1dc8ba
fix: correct loss function type name and collection syntax in ddpgopt…
ooples 9a2f713
fix: validate ddpg options before base constructor call
ooples 9584472
fix: validate double dqn options before base constructor and sync tar…
ooples dd6e242
fix: validate dqn options before base constructor call
ooples 54b8aa3
fix: correct ornstein-uhlenbeck diffusion term sign
ooples 4205cd6
fix: throw notsupportedexception in ddpg computegradients and applygr…
ooples d97191f
fix: return actual gradients not parameters in double dqn computegrad…
ooples 3abe2a2
fix: apply gradient descent update in dueling dqn applygradients
ooples 6c0ac1e
fix: return actual gradients not parameters in dueling dqn computegra…
ooples 86b1b5b
fix: persist nextstate in trpo trajectory buffer
ooples 9c87416
fix: run a3c workers sequentially to prevent environment corruption
ooples 1d8c40b
fix: correct expectile gradient calculation in iql value function update
ooples bf36489
fix: apply correct mse gradient sign in iql q-network updates
ooples 7e0f8b1
fix: include conservative penalty gradient in cql q-network updates
ooples fdc4830
fix: negate policy gradient for q-value maximization in cql
ooples e7b39cb
fix: mark sac policy gradient as not implemented with proper exception
ooples 1506aaf
fix: mark reinforce policy gradient as not implemented with proper ex…
ooples 0d3315d
fix: mark a2c as needing backpropagation implementation before updates
ooples 86b905c
fix: mark a3c gradient computation as not implemented
ooples fe202df
fix: mark trpo policy update as not implemented with proper exception
ooples f9d5b21
fix: mark ddpg actor update as not implemented with proper exception
ooples 93ee5bb
fix: remove unused aiDotNet.LossFunctions using directive from maddpg…
ooples cfef0bb
feat: implement production-ready reinforce policy gradient with prope…
ooples cddcf13
feat: implement production-ready a2c backpropagation with proper grad…
ooples 7dbb227
feat: implement production-ready sac policy gradient with reparameter…
ooples 065d366
feat: implement production-ready ddpg deterministic policy gradient
ooples 404428a
feat: implement production-ready a3c gradient computation
ooples e04169d
feat: implement production-ready trpo importance-weighted policy grad…
ooples d59aace
fix: correct syntax errors - missing semicolon and params keyword
ooples b2ff5d7
fix: correct activation functions namespace import
ooples e39f13b
fix: net462 compatibility - add IsExternalInit shim and fix ambiguous…
ooples 3ddd7aa
fix: remove duplicate SequenceContext class definition from DecisionT…
ooples c095e34
feat: implement Save/Load methods for SAC, REINFORCE, and A2C agents
ooples 8b6c92e
fix: correct API method names and remove List<T> in Advanced RL agents
claude 04e4128
docs: add comprehensive XML documentation to Advanced RL Options
claude 0a729d7
fix: correct ModelMetadata properties in Advanced RL agents
claude 1411dde
fix: batch replace incorrect API method names across all RL agents
claude 564419b
fix: correct ModelMetadata properties across all RL agents
claude 038eba1
fix: add IActivationFunction casts and fix collection expressions
claude 62d13a8
fix: remove List<T> usage from GetParameters in 6 RL agents
claude 52efe00
fix: remove redundant epsilon properties from 16 RL Options classes
claude 66cd888
fix: qualify Experience type in SACAgent to resolve ambiguity
claude 55b1a8b
fix: remove invalid override keywords from PredictAsync and TrainAsync
claude 96d111d
fix: replace ReplayBuffer<T> with UniformReplayBuffer<T> and fix MCTS…
claude 672b037
fix: rename Save/Load to SaveModel/LoadModel to match IModelSerialize…
claude 427ebb6
fix: change base class to use Vector<T> instead of Matrix<T> and add …
claude 1687bac
fix: add missing abstract method implementations to A3C, TD3, CQL, IQ…
claude ef8ce78
fix: correct Matrix/Vector usage in deep RL agent parameter methods
claude c8018f6
fix: correct Matrix/Vector usage in all remaining RL agent parameter …
claude 25d7370
fix: correct GetActiveFeatureIndices and ComputeGradients signatures …
ooples 27532b8
fix: update all RL agent ComputeGradients methods to return Vector<T>…
ooples ef9ce0a
fix: replace NumericOperations<T>.Instance with MathHelper.GetNumeric…
ooples 7f878b4
fix: disambiguate denselayer constructor calls with explicit iactivat…
ooples 2f956d5
fix: replace mathhelper exp log with numops exp log for generic type …
ooples 2922e3c
fix: remove non-existent modelmetadata properties from rl agents
ooples e4977a9
fix: replace tasktype with neuralnetworktasktype for correct enum ref…
ooples 7c085e7
fix: correct experience property names to capitalized (state/nextstat…
ooples a49ef3e
fix: replace updateweights with updateparameters for correct neural n…
ooples ba15a8b
fix: replace takelast with skip take pattern for net462 compatibility
ooples ae52038
fix: replace backward with backpropagate for correct neural network api
ooples 545f758
fix: resolve actor-critic agents vector/tensor errors
ooples aa1f194
fix: resolve dqn family vector/tensor errors
ooples bb9c325
fix: resolve policy gradient agents vector/tensor errors
ooples ae14e77
fix: resolve cql agent vector/tensor conversion and api signature errors
ooples aae5129
fix: resolve constructor, type reference, and property errors
ooples 93a77de
fix: resolve worldmodelsagent vector/tensor api conversion errors
ooples a048320
fix: resolve maddpg agent build errors - network architecture and ten…
ooples 05eaaa6
fix: resolve planning agent computegradients vector/matrix type errors
ooples 5ef84a2
fix: resolve epsilon greedy bandit agent matrix to vector conversion …
ooples 2d37dad
fix: resolve ucb bandit agent matrix to vector conversion errors
ooples 844c3d3
fix: resolve thompson sampling agent matrix to vector conversion errors
ooples f0d630c
fix: resolve gradient bandit agent matrix to vector conversion errors
ooples d7a4a40
fix: resolve qmix agent build errors - network architecture and tenso…
ooples e9823ec
fix: resolve monte carlo agent build errors - modeltype enum and vect…
ooples 9558bf3
fix: resolve reinforce agent build errors - network architecture and …
ooples ca6fa32
fix: resolve sarsa lambda agent build errors - null assignment and lo…
ooples 9a6ca7d
fix: apply batch fixes to rl agents - experience api and using direct…
ooples 5391bad
fix: replace linearactivation with identityactivation and fix loss fu…
ooples e08cf16
fix: correct backpropagate calls to use single argument and initializ…
ooples c753cd6
fix: add activation function casts and fix experience property names …
ooples 748d90a
fix: resolve 36 iqlAgent errors using proper api patterns
ooples 8aed3ed
fix(rl): complete maddpgagent api migration to tensor-based neural ne…
ooples 7fda0b0
fix(rl): complete td3agent api migration to tensor-based neural networks
ooples b13109b
fix(rl): complete a3c/trpo/sac/qmix api migration to tensor-based neu…
ooples d9ca7e9
fix(rl): complete muzero api migration and resolve remaining errors
ooples 9446361
fix(rl): complete rainbowdqn api migration and resolve remaining errors
ooples 9939a95
fix(rl): complete dreameragent api migration to tensor-based neural n…
ooples 44beae2
fix(rl): complete batch api migration for duelingdqn and classical rl…
ooples 17b1685
fix: resolve cs1503 type conversion errors in cql and ppo agents
ooples f973be0
fix: resolve CS8618 and CS1061 errors in reinforcement learning agent…
ooples 7eca77b
fix: resolve all cs1061 missing member errors
ooples 70f608f
fix: complete decisiontransformeragent tensor conversions and modelty…
ooples e70136f
fix: correct initializers in STLDecompositionOptions and ProphetOptions
ooples e8eb882
fix: resolve 32 errors in 4 RL agent files
ooples 7362c9e
fix: resolve compilation errors in DDPG, QMIX, TRPO, MuZero, TabularQ…
ooples 8365fb9
fix: manual error fixes for pr #481
ooples 1e41347
feat: add core policy and exploration strategy interfaces
claude 6d50476
feat: implement epsilon-greedy, gaussian noise, and no-exploration st…
claude 8105265
feat: implement discrete and continuous policy classes
claude 77b636b
feat: add policy options configuration classes
claude 7c1659b
fix: correct numops usage and net462 compatibility in policy files
ooples 22a0876
docs: add comprehensive policy base classes implementation prompt
ooples 5be9a04
feat: add core policy and exploration strategy interfaces
claude f3d0128
feat: implement epsilon-greedy, gaussian noise, and no-exploration st…
claude d5c6bb6
feat: implement discrete and continuous policy classes
claude b8ad0a6
feat: add policy options configuration classes
claude 90d070c
refactor: update policies and exploration strategies to inherit from …
claude 160e890
feat: add advanced exploration strategies and policy implementations
claude 5b6ebce
fix: update policy options classes with sensible default implementations
claude e49d4be
fix: pass vector<T> to cartpole step method in tests
ooples 9d2b0dc
feat: complete comprehensive RL policy architecture
claude 3a7b49a
fix: use vector<T> instead of tensor<T> in uniformreplaybuffertests
ooples 8eda795
fix: remove epsilongreedypolicytests for non-existent type
ooples 64fe6a7
docs: add comprehensive documentation to DiscretePolicyOptions and Co…
claude cbf5e8a
Merge remote-tracking branch 'origin/claude/pr-481-followup-01S72jB7k…
ooples a1a77d7
fix: complete production-ready fixes for qlambdaagent with all 6 issu…
ooples 3d4f48a
fix: resolve all 6 critical issues in muzeroagent implementation
ooples 6cf111c
fix: format predict method in duelingdqnagent for proper code structure
ooples 81f933f
fix(rl): complete dreamer agent - all 9 pr review issues addressed
ooples 309333b
fix(rl): complete agents 2-10 - all 47 pr review issues addressed
ooples 520090d
fix(RL): implement agents 11-12 fixes (11 issues, 3 critical)
ooples 44b33fb
fix(sarsa-lambda): implement serialization, fix clone, add random ins…
ooples bafd20b
fix(monte-carlo): implement serialization, fix clone, add random inst…
ooples e0b4595
fix: implement production fixes for sarsaagent (agent #16/17)
ooples fdb5955
fix(rl): address misc agent issues in dreameroptions, iql, td3, cartp…
ooples 0a76c48
fix(q-learning): implement production fixes for doubleq, nstep, lstd …
ooples fd8721f
fix(monte-carlo): implement production fixes for first-visit, on-poli…
ooples 86f68bc
fix(monte-carlo): complete production fixes for on-policy and every-v…
ooples c039bb1
fix(n-step-sarsa): implement production fixes for n-step sarsa agent …
ooples 59c59a2
fix(dynamic-programming): implement production fixes for all dp agent…
ooples 8691525
fix(qlambda): implement production fixes for q(lambda) agent (#28)
ooples 4f414db
fix: implement planning agents production-ready fixes
ooples 0653f0d
fix: implement bandit agents production-ready fixes
ooples 915a7ff
fix: implement tabular qlearning production-ready fixes
ooples 647388a
fix: add override keywords to policy class methods
ooples a91ec81
fix: correct policy dispose method hiding
ooples 43b37ac
fix: remove empty dispose methods from policy classes
ooples ab3d1ad
fix: initialize random fields in bandit agents
ooples fe2d24d
fix: initialize _random field in bandit agent constructors
ooples 058b8d4
fix: add validation to dreamer options and implement serialization fo…
ooples b3dc534
fix: preserve learned arm statistics in thompsonsampling clone
ooples 2d9185a
fix: update setparameters to preserve qtable state keys
ooples 403fe32
fix: implement PPO clipped objective with importance sampling ratio
ooples 9e8630f
fix: implement serialization and clone for watkinsqlambda
ooples 6e06267
fix: add previous action field to decisiontransformer trajectory buffer
ooples f655ae5
fix(ucb-bandit): clone now copies learned state (q-values, counts, st…
ooples 41ad89e
fix(lspi): implement serialize/deserialize for model persistence
ooples 53f7f27
fix: correct mcts backup to compute returns before updating qvalues
ooples 8cf33bb
refactor: clarify setparameters logic in sarsaagent
ooples 487858c
fix: use shared random instance from base class
ooples e1b68b5
fix: make td3options inherit from base and use init-only properties
ooples d61efd2
fix(modified-pi): normalize probabilities to prevent blow-up
ooples 709ec33
fix: add validation for numarms in epsilongreedybandit
ooples f5f4a28
fix: implement deep copy of q-table in tabularqlearningagent clone me…
ooples 94281c7
fix: preserve preferences and baseline in gradientbandit clone
ooples 1528148
fix(n-step-q): epsilon decay per episode instead of per step
ooples 52971aa
fix: use greedy action for n-step sarsa bootstrap value
ooples 3cc93d3
fix: clear existing transitions before adding new one in deterministi…
ooples a704330
fix: copy network parameters in rainbowdqn clone
ooples ade3dc7
fix(qmix): implement proper td gradient flow through mixer and agents
ooples 9c91db6
fix: implement applygradients for linear q-learning agent
ooples 7c4595b
fix(trpo): correctly use nextstate from trajectory buffer
ooples a7cdd07
fix: make sac applygradients throw notsupportedexception
ooples cb66489
docs(sac-options): verify and document mse loss correctness
ooples daa8e1f
fix: add validation for nstepqlearningoptions properties
ooples cb193d0
docs(dreamer): clarify representation network training
ooples ba2bd67
docs(expected-sarsa): clarify deep copy in clone method
ooples d63237a
fix: initialize discountfactor in a3coptions constructor
ooples 101bcfc
fix: lspiagent clone now copies learned weights and samples
ooples bc9ff09
fix: cql policy gradient now includes variance action gradient component
ooples cdea3a3
fix: dreamer gradient calculations for dynamics, representation, and …
ooples e5d7e6a
fix: modified policy iteration serialization now handles tuple transi…
ooples 2af32f7
fix: expectedsarsa modelmetadata now includes featurecount and comple…
ooples 652db72
fix: qmix agent serialization and save/load now properly implemented
ooples 9d5c324
fix: prevent negative infinity in thompson sampling beta distribution
ooples c35daf2
fix: tabular qlearning setparameters now preserves state keys
ooples d0ccd86
fix: resolve 10 P0 critical issues for agent 5 work package
ooples 327243e
fix: correct weighted importance sampling and watkins q lambda trace …
ooples df48c1a
fix: add maddpg validation and prevent a2c null options crash in base…
ooples 54669d4
fix(dreamer): correct batch processing and gradient calculations
ooples 9f0fc48
style(dreamer): remove fix comments from code
ooples 2dec9f3
fix(maddpg): use target actors for target Q computation
ooples 5e67039
fix(qmix): enforce monotonicity in mixing network weights
ooples 18c322a
fix(dqn): sync target network in setparameters
ooples da35264
fix(sarsa): setparameters cannot restore Q-values without state keys
ooples 6ca2e95
fix(doubleqlearning): setparameters cannot restore without state info…
ooples d50af0d
fix(decisiontransformer): use optimizer for gradient application
ooples eb23984
fix(nstepqlearning): setparameters cannot restore without state infor…
ooples 7fbee16
style(epsilongreedybanditoptions): remove unused using directives
ooples 498cbe8
style(sarsalambdaoptions): remove unused using directives
ooples 9027482
fix(dynaq): applygradients throws exception for unsupported operation
ooples 5d10250
fix(dreamer): applygradients throws exception for multi-network compl…
ooples d3ec55a
fix(montecarlo): savemodel/loadmodel throw exception for unsupported …
ooples 73131cf
fix: remove resetgradients call from ddpg agent
ooples afd239d
fix: correct ppoagent cliprange property and linearsarsaagent deseria…
ooples 386e34d
fix: add comprehensive null checks for td3agent discountfactor and mi…
ooples c289924
chore: remove investigation/report files and temporary scripts per CL…
ooples 6905d24
Delete POLICY_BASE_CLASSES_PROMPT.md
ooples 0af9c52
Delete src/ReinforcementLearning/INTEGRATION_PLAN.md
ooples c3e785a
Delete fix-ppo-rainbow-dueling-muzero.sh
ooples 725e524
fix: resolve merge conflicts with master
ooples 03ed1fd
fix: add savestate and loadstate methods to reinforcement learning ag…
ooples 3d2ca7e
Merge branch 'claude/fix-issue-394-011CV3HkgfwwbaSAdrzrKd58' of https…
ooples ed16f3e
fix: add missing using directive for jsonconvert in lspi agent
ooples a8b244d
fix: add validation for state and action size in expected sarsa options
ooples 5a932d9
fix: add defensive validation for expectedsarsaagent options
ooples e073989
fix: add constructor validation to expectedsarsaoptions
ooples e074d7a
fix: add deployment configuration to rl training path
ooples c297646
fix: replace dummy gradient vector with notsupportedexception in duel…
ooples e72d71e
fix: change applygradients to throw notsupportedexception in doubledq…
ooples eea9928
fix: preserve isfirstaction flag in montecarloexploringstarts clone m…
ooples bc6b317
fix: serialize/deserialize _isfirstaction flag in montecarloexploring…
ooples 2801f59
fix: implement proper maddpg critic gradient descent updates
ooples e1b91ba
fix: throw notsupportedexception in maddpg applygradients
ooples 873a506
fix: use base class random instance in everyvisitmontecarlo constructor
ooples e675b41
fix: use base random instance in firstvisitmontecarlo constructor
ooples 2d47db4
fix: use base random instance in montecarloexploringstarts constructor
ooples 790a1af
fix: use base class random in sarsalambdaagent
ooples 72b7ef3
fix: add options validation in iql agent constructor
ooples e753aaa
fix: add options validation to maddpg agent constructor
ooples 8928771
fix: pass seeded random to get normal random in iql agent
ooples 32f6bf7
fix: add random parameter to mathhelper get normal random
ooples 06ae821
fix: add validate method to iql options
ooples fa1840c
fix: use network.getgradients() for parameter updates in dqn and doub…
ooples b56650a
feat: add computeaverage helper method to firstvisitmontecarloagent
ooples 367dc89
fix: implement proper deterministic policy gradient in maddpg actor u…
ooples 776ca19
feat: include target networks in maddpg get/setparameters and synchro…
ooples f5f7ce2
fix: validate gradient vector length in dqn applygradients to prevent…
ooples f0fef0d
fix: increase state key precision from f4 to f8 in firstvisitmontecar…
ooples 87612a2
fix(maddpg): implement per-agent reward tracking for competitive scen…
ooples 39b6c3a
fix(maddpg): correct critic gradient application using per-parameter …
ooples 401e9b4
fix(maddpg): use backpropagate return value for input gradients and c…
ooples 471fc18
fix(maddpg): rename inner loop variable to avoid shadowing outer scop…
ooples
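One of the commits above implements the PPO clipped objective with an importance sampling ratio. The sketch below illustrates that objective on plain `double` arrays: it computes r = π_new(a|s) / π_old(a|s) from log-probabilities and takes min(r·A, clip(r, 1 - ε, 1 + ε)·A). It is not the PR's PPO agent code. `PpoClipSketch`, `ClippedSurrogate`, and `BatchObjective` are hypothetical names chosen for this example.

```csharp
using System;

// Hypothetical helper for illustration; not the PR's PPO agent code.
public static class PpoClipSketch
{
    // Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    // where r = pi_new(a|s) / pi_old(a|s) is computed from log-probabilities.
    // This objective is maximized; a training loss would use its negative.
    public static double ClippedSurrogate(double logProbNew, double logProbOld, double advantage, double clipEpsilon)
    {
        double ratio = Math.Exp(logProbNew - logProbOld); // importance sampling ratio
        // Math.Clamp is unavailable on .NET Framework 4.6.2, so clip with Min/Max instead.
        double clippedRatio = Math.Max(1.0 - clipEpsilon, Math.Min(ratio, 1.0 + clipEpsilon));
        return Math.Min(ratio * advantage, clippedRatio * advantage);
    }

    // Mean clipped surrogate over a batch of samples.
    public static double BatchObjective(double[] logProbsNew, double[] logProbsOld, double[] advantages, double clipEpsilon)
    {
        if (logProbsNew.Length == 0)
            throw new ArgumentException("Batch must not be empty.", nameof(logProbsNew));
        if (logProbsOld.Length != logProbsNew.Length || advantages.Length != logProbsNew.Length)
            throw new ArgumentException("All input arrays must have the same length.");

        double sum = 0.0;
        for (int i = 0; i < logProbsNew.Length; i++)
        {
            sum += ClippedSurrogate(logProbsNew[i], logProbsOld[i], advantages[i], clipEpsilon);
        }
        return sum / logProbsNew.Length;
    }

    public static void Main()
    {
        // r = e^0.5 (about 1.65) with a positive advantage: the clipped term (1.2 * A) is the minimum.
        Console.WriteLine(ClippedSurrogate(0.5, 0.0, 1.0, 0.2));
        // Same ratio with a negative advantage: the unclipped term (about -1.65) is the minimum.
        Console.WriteLine(ClippedSurrogate(0.5, 0.0, -1.0, 0.2));
    }
}
```

Taking the minimum makes the objective pessimistic. Clipping stops the update from gaining anything by pushing r beyond 1 ± ε when the advantage is positive. When the advantage is negative, the larger unclipped penalty is still applied.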
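Several commits above touch IQL's value-function update, including a fix to the expectile gradient. IQL fits V(s) by expectile regression on u = Q(s, a) - V(s), using the loss L = |τ - 1(u < 0)| · u². Its derivative with respect to V is therefore dL/dV = -2 · |τ - 1(u < 0)| · u.

The sketch below works on plain scalars to make the sign and the asymmetric weight explicit. `ExpectileSketch`, `Loss`, and `GradientWrtValue` are hypothetical names, and the code is not the PR's implementation.

```csharp
using System;

// Hypothetical helper for illustration; not the PR's IQL agent code.
public static class ExpectileSketch
{
    // Asymmetric weight |tau - 1(u < 0)|: tau when u >= 0, (1 - tau) when u < 0.
    public static double Weight(double u, double tau) => u < 0 ? 1.0 - tau : tau;

    // Expectile loss |tau - 1(u < 0)| * u^2, with u = Q(s, a) - V(s).
    public static double Loss(double q, double v, double tau)
    {
        double u = q - v;
        return Weight(u, tau) * u * u;
    }

    // Derivative of the loss with respect to V(s): -2 * weight * u.
    // The minus sign comes from u = Q - V.
    public static double GradientWrtValue(double q, double v, double tau)
    {
        double u = q - v;
        return -2.0 * Weight(u, tau) * u;
    }

    public static void Main()
    {
        // Q above V with tau = 0.7: the gradient is negative, so gradient descent raises V.
        Console.WriteLine(GradientWrtValue(q: 2.0, v: 1.0, tau: 0.7));
        // Q below V: the residual is weighted by (1 - tau) and the gradient is positive.
        Console.WriteLine(GradientWrtValue(q: 0.0, v: 1.0, tau: 0.7));
    }
}
```

With τ > 0.5, under-estimates of Q are penalized more heavily than over-estimates. This pushes V toward an upper expectile of Q rather than its mean.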
`C:\Users\cheat\source\repos\AiDotNet\.github\ISSUE_333_CRITICAL_FINDINGS.md`: 77 changes (0 additions, 77 deletions). This file was deleted.
Diff hunk `@@ -0,0 +1,14 @@` (14 added lines):

```csharp
// Compatibility shim for init-only setters in .NET Framework 4.6.2
// This type is required for C# 9+ init accessors to work in older frameworks
// See: https://github.com/dotnet/runtime/issues/45510

namespace System.Runtime.CompilerServices
{
    /// <summary>
    /// Reserved for use by the compiler for tracking metadata.
    /// This class allows the use of init-only setters in .NET Framework 4.6.2.
    /// </summary>
    internal static class IsExternalInit
    {
    }
}
```
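The shim only declares the type. The C# compiler references `System.Runtime.CompilerServices.IsExternalInit` whenever it emits an `init` accessor, so a target such as net462 compiles once the type exists. The sketch below shows an options class using init-only properties under the following assumptions:

- `LangVersion` is 9.0 or later.
- The project is built with the .NET 5+ SDK, which defines `NET5_0_OR_GREATER` so the shim is compiled only where the framework lacks the type.

`ShimDemo` and `ExampleAgentOptions` are hypothetical names, not types from this PR.

```csharp
using System;

#if !NET5_0_OR_GREATER
namespace System.Runtime.CompilerServices
{
    // Same shim as in the diff; compiled only when the target framework lacks the type.
    internal static class IsExternalInit
    {
    }
}
#endif

namespace ShimDemo
{
    // Hypothetical options class: values can be set in an object initializer,
    // then become read-only for the rest of the object's lifetime.
    public sealed class ExampleAgentOptions
    {
        public double LearningRate { get; init; } = 0.001;
        public double DiscountFactor { get; init; } = 0.99;
    }

    public static class Program
    {
        public static void Main()
        {
            var options = new ExampleAgentOptions { LearningRate = 0.0003 };
            // options.LearningRate = 0.1; // does not compile: init-only properties can only be set during initialization
            Console.WriteLine(options.LearningRate);
            Console.WriteLine(options.DiscountFactor);
        }
    }
}
```

Guarding the shim with `#if` avoids declaring a second copy of the type on frameworks that already ship it.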