[BUG] initialize-q-learning: uses only self-transition instead of cross-action transition #98

Looking at the code, the bug is in the initialize_q_value() method, in the loop on lines 118-121, which overwrites a variable set on line 103:

  #### Line 103: Sets the target action we want to initialize
  action_type_str = str(action_type).split('.')[-1]

  #### Lines 118-121: The problematic loop
  prob_sum = 0
  for action, count in action_counts.items():
      action_type_str = action.split('.')[-1]  # BUG: Line 120 overwrites the target!
      prob_sum += self.transition_probabilities.get(action_type_str, {}).get(action_type_str, 0) * count

  The specific problem:
  - Line 103 sets action_type_str to the target action (e.g., "ExfiltrateData")
  - Line 120 overwrites action_type_str with the source action (e.g., "FindData")
  - Line 121 then looks up transition_probabilities["FindData"]["FindData"] instead of transition_probabilities["FindData"]["ExfiltrateData"]

  As a result, the initialization sums only self-transition probabilities (source action to the same source action) instead of the intended cross-action transition probabilities (source action to target action).
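
One possible fix is to give the loop its own variable so the target set on line 103 is never overwritten. The sketch below is a standalone illustration, not the project's code: the function name weighted_transition_sum, the local names target_str and source_str, and the sample data (including the "ScanNetwork" entry) are assumptions made for the example. It also assumes transition_probabilities is a nested dict keyed by source action and then target action, as the lookups above suggest.

```python
def weighted_transition_sum(transition_probabilities, action_counts, action_type):
    """Sum P(source -> target) * count over all previously seen source actions.

    Hypothetical helper; it mirrors the loop from the issue with the
    variable shadowing removed.
    """
    # Target action we want to initialize (the value set on line 103).
    target_str = str(action_type).split('.')[-1]

    prob_sum = 0.0
    for action, count in action_counts.items():
        # Use a separate name for the source so the target is preserved.
        source_str = str(action).split('.')[-1]
        prob_sum += transition_probabilities.get(source_str, {}).get(target_str, 0) * count
    return prob_sum


# Illustrative data only; these probabilities are made up for the example.
transition_probabilities = {
    "FindData": {"FindData": 0.1, "ExfiltrateData": 0.6},
    "ScanNetwork": {"ExfiltrateData": 0.2},
}
action_counts = {"ActionType.FindData": 2, "ActionType.ScanNetwork": 1}

result = weighted_transition_sum(
    transition_probabilities, action_counts, "ActionType.ExfiltrateData"
)
print(result)
```

With this data the sum uses the FindData -> ExfiltrateData and ScanNetwork -> ExfiltrateData entries, whereas the buggy loop would have used only FindData -> FindData and ScanNetwork -> ScanNetwork.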


