
[BUG] initialize-q-learning: use only self-transition instead of cross-action transition #98

@harpomaxx

Description

Looking at the code, the bug is in the initialize_q_value() method, specifically in lines 118-121:

Line 103: Sets the target action type we want to initialize

action_type_str = str(action_type).split('.')[-1]

Lines 118-121: The problematic loop

prob_sum = 0
for action, count in action_counts.items():
    action_type_str = action.split('.')[-1]  # BUG: Line 120 overwrites the target!
    prob_sum += self.transition_probabilities.get(action_type_str, {}).get(action_type_str, 0) * count
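    # Both .get() keys are now the same string, so this always reads the self-transition P(source -> source)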

The specific problem:

  • Line 103 sets action_type_str to the target action (e.g., "ExfiltrateData")
  • Line 120 overwrites action_type_str with the source action (e.g., "FindData")
  • Line 121 then looks up transition_probabilities["FindData"]["FindData"] instead of transition_probabilities["FindData"]["ExfiltrateData"]

This makes the initialization use only self-transition probabilities instead of the intended cross-action transition probabilities.
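A minimal sketch of a fix, assuming the rest of initialize_q_value() stays as-is: bind the source action to its own variable (source_action_str is a name introduced here for illustration) so the target set on line 103 is never clobbered.

prob_sum = 0
for action, count in action_counts.items():
    source_action_str = action.split('.')[-1]  # source action, e.g. "FindData"
    # Cross-action lookup: action_type_str still holds the target from line 103
    prob_sum += self.transition_probabilities.get(source_action_str, {}).get(action_type_str, 0) * count

With this change, the example above resolves to transition_probabilities["FindData"]["ExfiltrateData"], as intended.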
