Commit 42a9943

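Move calc_H_D out of the for loop
The value returned by calc_H_D does not change inside the for loop, so it only needs to be computed once before the loop. In testing, this is about 100 s faster.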
1 parent 158a9ca commit 42a9943
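
The reasoning behind the change can be written out with the information-gain definition that the code comments cite (Algorithm 5.1). The notation below is the usual textbook form and is an assumption, since the commit itself does not spell it out: $C_k$ is the set of samples in class $k$, and $D_i$ is the subset of $D$ where feature $A$ takes its $i$-th value.

```latex
g(D, A) = H(D) - H(D \mid A),
\qquad
H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log \frac{|C_k|}{|D|},
\qquad
H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)
```

Only $H(D \mid A)$ depends on the feature $A$. $H(D)$ is a function of the labels alone, so it is the same for every iteration of the loop over features and can be evaluated once.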

1 file changed: +3 -2 lines changed

DecisionTree/DecisionTree.py

Lines changed: 3 additions & 2 deletions
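@@ -142,11 +142,12 @@ def calcBestFeature(trainDataList, trainLabelList):
     maxG_D_A = -1
     # Initialize the feature with the maximum information gain
     maxFeature = -1
+    # 1. Compute the empirical entropy H(D) of dataset D
+    H_D = calc_H_D(trainLabelArr)
     # Iterate over every feature
     for feature in range(featureNum):
         # Step 1 of "Algorithm 5.1 (information gain algorithm)" in "5.2.2 Information gain":
-        # 1. Compute the empirical entropy H(D) of dataset D
-        H_D = calc_H_D(trainLabelArr)
+
         # 2. Compute the empirical conditional entropy H(D|A)
         # Computing the empirical conditional entropy involves only the labels and the current feature, so to speed things up (operating on a matrix built
         # from all samples is too slow, so the unneeded parts must be dropped), the dataset matrix is sliced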
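
The pattern in this diff, hoisting a loop-invariant call out of the loop, can be sketched in isolation. The code below is a minimal, pure-Python illustration and not the repository's implementation: `calc_H_D` here is a stand-in that shares only its name with the real function, while `calc_H_D_A` and `best_feature` are hypothetical helpers. Measuring entropy in bits (`log2`) is also an assumption.

```python
from collections import Counter
from math import log2


def calc_H_D(labels):
    """Empirical entropy H(D) of a label list, in bits (stand-in, assumed form)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def calc_H_D_A(column, labels):
    """Empirical conditional entropy H(D|A) for one feature column (hypothetical helper)."""
    n = len(labels)
    groups = {}
    # Group the labels by the value the feature takes on each sample.
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Weight each group's entropy by the fraction of samples in it.
    return sum(len(g) / n * calc_H_D(g) for g in groups.values())


def best_feature(rows, labels):
    """Return (feature index, information gain) for the highest-gain feature."""
    # H(D) depends only on the labels, so compute it once, before the loop.
    H_D = calc_H_D(labels)
    best_gain, best_idx = -1.0, -1
    for feature in range(len(rows[0])):
        column = [row[feature] for row in rows]
        gain = H_D - calc_H_D_A(column, labels)
        if gain > best_gain:
            best_gain, best_idx = gain, feature
    return best_idx, best_gain


if __name__ == "__main__":
    rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
    labels = [0, 0, 1, 1]
    print(best_feature(rows, labels))
```

In this toy dataset feature 0 determines the label exactly and feature 1 carries no information, so feature 0 is selected. With the call inside the loop, `calc_H_D(labels)` would run once per feature and return the same value each time. The saving therefore grows with the number of features and the size of the label array.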
