The code to create the data set is as follows:
''' Create the test data set '''
def createDataset():
    dataSet = [['sunny', 'hot', 'high', 'weak', 'no'],
               ['sunny', 'hot', 'high', 'strong', 'no'],
               ['overcast', 'hot', 'high', 'weak', 'yes'],
               ['rain', 'mild', 'high', 'weak', 'yes'],
               ['rain', 'cool', 'normal', 'weak', 'yes'],
               ['rain', 'cool', 'normal', 'strong', 'no'],
               ['overcast', 'cool', 'normal', 'strong', 'yes'],
               ['sunny', 'mild', 'high', 'weak', 'no'],
               ['sunny', 'cool', 'normal', 'weak', 'yes'],
               ['rain', 'mild', 'normal', 'weak', 'yes'],
               ['sunny', 'mild', 'normal', 'strong', 'yes'],
               ['overcast', 'mild', 'high', 'strong', 'yes'],
               ['overcast', 'hot', 'normal', 'weak', 'yes'],
               ['rain', 'mild', 'high', 'strong', 'no']]  # the data set
    labels = ['outlook', 'temperature', 'humidity', 'wind']  # the attribute labels
    return dataSet, labels
The data set created by this code corresponds row by row to the table above.
Entropy measures the degree of disorder of a system: the larger the entropy, the more mixed the data; the smaller the entropy, the purer the data.
Suppose the current sample set $D$ contains $\ell$ classes of samples, and the proportion of the $k$-th class is $p_k$ ($k = 1, 2, \dots, \ell$). The information entropy of $D$ is then defined as

$$\mathrm{Ent}(D) = -\sum_{k=1}^{\ell} p_k \log_2 p_k$$

Take the activity column as an example. The activity attribute takes two values: cancelled (5 samples) and held (9 samples), so H(activity), the information entropy of the activity, can be calculated as

$$H(\text{activity}) = -\frac{5}{14}\log_2\frac{5}{14} - \frac{9}{14}\log_2\frac{9}{14} \approx 0.940$$
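As a quick sanity check (this snippet is not part of the original code, and the variable names are only illustrative), the same value can be computed directly with math.log:

from math import log

p_cancel, p_hold = 5 / 14, 9 / 14                        # proportions of 'no' and 'yes' samples
H = -(p_cancel * log(p_cancel, 2) + p_hold * log(p_hold, 2))
print(round(H, 3))                                       # prints 0.94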
The corresponding code is as follows:
from math import log  # log is needed for the entropy formula

''' Calculate the Shannon entropy '''
def shannonEnt(dataSet):
    len_dataSet = len(dataSet)  # number of rows in the data set
    labelCounts = {}  # dictionary used to count how many times each class label appears
    shannonEnt = 0.0  # initial value of the Shannon entropy
    for element in dataSet:  # go through the data row by row
        currentLabel = element[-1]  # extract the class label (the last column)
        if currentLabel not in labelCounts.keys():  # use the class label as a key of labelCounts
            labelCounts[currentLabel] = 0  # initialise the count for a new label to 0
        labelCounts[currentLabel] += 1  # increase the count by one to record how often the label occurs
    for key in labelCounts:  # traverse the keys of the dictionary
        proportion = float(labelCounts[key]) / len_dataSet
        shannonEnt -= proportion * log(proportion, 2)  # accumulate the Shannon entropy according to the formula
    print('Number of occurrences of each class label: {}'.format(labelCounts))
    print('The information entropy of the activity is: {}'.format(shannonEnt))
    return shannonEnt
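A minimal usage sketch (assuming createDataset and shannonEnt above are in the same file): calling the function on the full data set should report an entropy of roughly 0.940.

dataSet, labels = createDataset()
print(shannonEnt(dataSet))   # expected to be roughly 0.940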
Suppose a discrete attribute $a$ has $m$ possible values $\{a_1, a_2, \dots, a_m\}$. If $a$ is used to split the sample set $D$, $m$ branch nodes are produced, where the $i$-th branch node contains all samples in $D$ whose value on attribute $a$ is $a_i$; denote this subset by $D_i$. The information entropy of each $D_i$ can be computed with the formula above. Since different branch nodes contain different numbers of samples, each branch is weighted by $|D_i|/|D|$, i.e. the more samples a branch node holds, the larger its influence. The information gain obtained by splitting the sample set $D$ on attribute $a$ is then

$$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{i=1}^{m} \frac{|D_i|}{|D|}\,\mathrm{Ent}(D_i)$$

In other words, the information gain is the information entropy minus the conditional entropy.
Take the information gain of the weather (outlook) attribute as an example.
The weather attribute has three values: sunny (5 samples), overcast (4 samples) and rain (5 samples).
From this we can calculate the weighted average entropy after splitting on the weather attribute:

$$\frac{5}{14}\mathrm{Ent}(D_{\text{sunny}}) + \frac{4}{14}\mathrm{Ent}(D_{\text{overcast}}) + \frac{5}{14}\mathrm{Ent}(D_{\text{rain}}) = \frac{5}{14}\times 0.971 + \frac{4}{14}\times 0 + \frac{5}{14}\times 0.971 \approx 0.694$$

So the information gain Gain(activity, weather) is:

$$\mathrm{Gain}(\text{activity}, \text{weather}) = 0.940 - 0.694 \approx 0.247$$

In the same way we can calculate Gain(activity, temperature) ≈ 0.029, Gain(activity, humidity) ≈ 0.152 and Gain(activity, wind) ≈ 0.048. Comparing these values, the weather attribute yields the largest information gain, so it is selected as the root node of the decision tree.
The code is as follows:
'''
Split the data set so that the conditional entropy is easier to compute.
dataSet: the data set
i: index of the attribute used to split the data set (e.g. 0 for the weather attribute)
value: the attribute value whose samples should be returned (e.g. 'sunny')
'''
def splitDataSet(dataSet, i, value):
    splitDataSet = []  # list that holds the split data set
    for example in dataSet:  # traverse the given data set
        if example[i] == value:
            splitExample = example[:i]
            splitExample.extend(example[i+1:])
            splitDataSet.append(splitExample)  # remove column i to build the new, split data set
    return splitDataSet  # return the split data set
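A usage sketch (assuming the functions above): splitting on attribute index 0 with the value 'sunny' should return the five sunny samples with the outlook column removed.

dataSet, labels = createDataset()
sunnySubset = splitDataSet(dataSet, 0, 'sunny')
print(len(sunnySubset))   # 5
print(sunnySubset[0])     # ['hot', 'high', 'weak', 'no'] (the outlook column is gone)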
''' Calculate the information gain and select the best attribute (label) '''
def chooseBestFeature(dataSet):
    numFeature = len(dataSet[0]) - 1  # number of attributes
    baseEntropy = shannonEnt(dataSet)  # Shannon entropy (information entropy) of the whole data set
    bestInfoGain = 0.0  # initial value of the maximum information gain
    bestFeature = -1  # index of the attribute with the maximum information gain; -1 is a placeholder outside the valid range [0, numFeature)
    for i in range(numFeature):
        featList = [example[i] for example in dataSet]  # all values of the i-th attribute
        featValue = set(featList)  # build a set (no duplicate elements) to see every distinct attribute value
        newEntropy = 0.0  # initial value of the conditional entropy
        for value in featValue:  # traverse every attribute value
            subDataSet = splitDataSet(dataSet, i, value)  # split the data set
            proportion = float(len(subDataSet)) / len(dataSet)
            newEntropy += proportion * shannonEnt(subDataSet)  # conditional entropy, computed with the formula
        infoGain = baseEntropy - newEntropy  # information gain, computed with the formula
        print('The information gain of attribute %s is %.3f' % (labels[i], infoGain))  # print the gain of each attribute (assumes a module-level labels list)
        if infoGain > bestInfoGain:  # keep the maximum information gain and the index of its attribute
            bestInfoGain = infoGain
            bestFeature = i
    print('The attribute with the maximum gain is: %s' % labels[bestFeature])
    return bestFeature
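A usage sketch (assuming a module-level labels list, which the print statements inside the function rely on): on this data set the printed gains should be roughly 0.247 for outlook, 0.029 for temperature, 0.152 for humidity and 0.048 for wind, and the returned index should be 0.

dataSet, labels = createDataset()
bestIndex = chooseBestFeature(dataSet)
print(bestIndex)   # 0, i.e. the outlook column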
(Figure: diagram of the functions.)
The ID3 decision tree algorithm:
import operator  # needed for operator.itemgetter

''' Return the class label that occurs most often in classList '''
def maxLabel(classList):
    classCount = {}
    for vote in classList:  # count how many times each element appears in classCount
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # sort by the dictionary values in descending order; the result is a list of tuples
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]  # return the most frequent element in classCount
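A one-line usage sketch:

print(maxLabel(['yes', 'no', 'yes', 'yes', 'no']))   # prints yes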
''' Build the decision tree '''
def creatTree(dataSet, labels, featLabels):
    classList = [example[-1] for example in dataSet]  # class labels (yes or no)
    if classList.count(classList[0]) == len(classList):  # if all class labels are identical, stop splitting
        return classList[0]
    if len(dataSet[0]) == 1:  # if every attribute has been used, return the most frequent class label
        return maxLabel(classList)
    bestFeature = chooseBestFeature(dataSet)  # index of the attribute with the maximum information gain
    bestFeatureLabel = labels[bestFeature]  # the attribute with the maximum information gain (e.g. outlook)
    featLabels.append(bestFeatureLabel)
    myTree = {bestFeatureLabel: {}}  # grow the tree from the label with the maximum information gain
    del(labels[bestFeature])  # remove the attribute that has already been used
    featureList = [example[bestFeature] for example in dataSet]  # all values of the best attribute in the data set (e.g. sunny, rain)
    featureValue = set(featureList)  # build a set to remove duplicate attribute values
    for value in featureValue:
        subLabels = labels[:]  # new attribute label list (compared with labels, the used attribute has been removed)
        # recursively call creatTree to build the subtree
        myTree[bestFeatureLabel][value] = creatTree(splitDataSet(dataSet, bestFeature, value), subLabels, featLabels)
    # print(featLabels)
    return myTree
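On this data set the finished tree should resemble the nested dictionary below (a sketch; the key order may differ between runs, since set iteration order is not fixed):

dataSet, labels = createDataset()
myTree = creatTree(dataSet, labels[:], [])   # build from a copy so labels stays intact
print(myTree)
# {'outlook': {'overcast': 'yes',
#              'sunny': {'humidity': {'high': 'no', 'normal': 'yes'}},
#              'rain': {'wind': {'weak': 'yes', 'strong': 'no'}}}}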
''' Use the decision tree to classify a sample '''
def classify(myTree, featLabels, testData):
    classLabel = None  # default result if no branch matches the test sample
    firstStr = next(iter(myTree))  # root node of the decision tree
    # print(firstStr)
    secondDict = myTree[firstStr]  # the next-level dictionary
    # print(secondDict)
    featIndex = featLabels.index(firstStr)  # index of the root-node attribute in the test sample
    # print(featIndex)
    for key in secondDict.keys():
        if testData[featIndex] == key:
            if type(secondDict[key]).__name__ == 'dict':  # type().__name__ checks the data type; a dict means an internal node
                classLabel = classify(secondDict[key], featLabels, testData)
            else:
                classLabel = secondDict[key]
    return classLabel
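Note that classify looks firstStr up in whatever label list it is given, so that list must line up with the column order of testData. A minimal usage sketch that keeps the indices aligned is to build the tree from a copy of labels and classify with the original, unmodified list:

dataSet, labels = createDataset()
myTree = creatTree(dataSet, labels[:], [])              # build from a copy so labels is not modified
print(classify(myTree, labels, ['sunny', 'hot', 'high', 'weak']))   # expected: no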
# -*- coding: UTF-8 -*-
"""
# @Time: 2022/6/18 16:19
# @Author: Sleepy CV jun
# @CSDN: https://blog.csdn.net/qq_44921056
"""
from math import log
import operator

''' Create the test data set '''
def createDataset():
    dataSet = [['sunny', 'hot', 'high', 'weak', 'no'],
               ['sunny', 'hot', 'high', 'strong', 'no'],
               ['overcast', 'hot', 'high', 'weak', 'yes'],
               ['rain', 'mild', 'high', 'weak', 'yes'],
               ['rain', 'cool', 'normal', 'weak', 'yes'],
               ['rain', 'cool', 'normal', 'strong', 'no'],
               ['overcast', 'cool', 'normal', 'strong', 'yes'],
               ['sunny', 'mild', 'high', 'weak', 'no'],
               ['sunny', 'cool', 'normal', 'weak', 'yes'],
               ['rain', 'mild', 'normal', 'weak', 'yes'],
               ['sunny', 'mild', 'normal', 'strong', 'yes'],
               ['overcast', 'mild', 'high', 'strong', 'yes'],
               ['overcast', 'hot', 'normal', 'weak', 'yes'],
               ['rain', 'mild', 'high', 'strong', 'no']]  # the data set
    labels = ['outlook', 'temperature', 'humidity', 'wind']  # the attribute labels
    return dataSet, labels
''' Calculate the Shannon entropy '''
def shannonEnt(dataSet):
    len_dataSet = len(dataSet)  # number of rows in the data set
    labelCounts = {}  # dictionary used to count how many times each class label appears
    shannonEnt = 0.0  # initial value of the Shannon entropy
    for element in dataSet:  # go through the data row by row
        currentLabel = element[-1]  # extract the class label (the last column)
        if currentLabel not in labelCounts.keys():  # use the class label as a key of labelCounts
            labelCounts[currentLabel] = 0  # initialise the count for a new label to 0
        labelCounts[currentLabel] += 1  # increase the count by one to record how often the label occurs
    for key in labelCounts:  # traverse the keys of the dictionary
        proportion = float(labelCounts[key]) / len_dataSet
        shannonEnt -= proportion * log(proportion, 2)  # accumulate the Shannon entropy according to the formula
    # print('Number of occurrences of each class label: {}'.format(labelCounts))
    # print('The information entropy of the activity is: {}'.format(shannonEnt))
    return shannonEnt

'''
Split the data set so that the conditional entropy is easier to compute.
dataSet: the data set
i: index of the attribute used to split the data set (e.g. 0 for the weather attribute)
value: the attribute value whose samples should be returned (e.g. 'sunny')
'''
def splitDataSet(dataSet, i, value):
    splitDataSet = []  # list that holds the split data set
    for example in dataSet:  # traverse the given data set
        if example[i] == value:
            splitExample = example[:i]
            splitExample.extend(example[i+1:])
            splitDataSet.append(splitExample)  # remove column i to build the new, split data set
    return splitDataSet  # return the split data set
''' Calculate the information gain and select the best attribute (label) '''
def chooseBestFeature(dataSet):
    numFeature = len(dataSet[0]) - 1  # number of attributes
    baseEntropy = shannonEnt(dataSet)  # Shannon entropy (information entropy) of the whole data set
    bestInfoGain = 0.0  # initial value of the maximum information gain
    bestFeature = -1  # index of the attribute with the maximum information gain; -1 is a placeholder outside the valid range [0, numFeature)
    for i in range(numFeature):
        featList = [example[i] for example in dataSet]  # all values of the i-th attribute
        featValue = set(featList)  # build a set (no duplicate elements) to see every distinct attribute value
        newEntropy = 0.0  # initial value of the conditional entropy
        for value in featValue:  # traverse every attribute value
            subDataSet = splitDataSet(dataSet, i, value)  # split the data set
            proportion = float(len(subDataSet)) / len(dataSet)
            newEntropy += proportion * shannonEnt(subDataSet)  # conditional entropy, computed with the formula
        infoGain = baseEntropy - newEntropy  # information gain, computed with the formula
        # print('The information gain of attribute %s is %.3f' % (labels[i], infoGain))  # print the gain of each attribute (assumes a module-level labels list)
        if infoGain > bestInfoGain:  # keep the maximum information gain and the index of its attribute
            bestInfoGain = infoGain
            bestFeature = i
    # print('The attribute with the maximum gain is: %s' % labels[bestFeature])
    return bestFeature

''' Return the class label that occurs most often in classList '''
def maxLabel(classList):
    classCount = {}
    for vote in classList:  # count how many times each element appears in classCount
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # sort by the dictionary values in descending order; the result is a list of tuples
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]  # return the most frequent element in classCount
''' Build the decision tree '''
def creatTree(dataSet, labels, featLabels):
    classList = [example[-1] for example in dataSet]  # class labels (yes or no)
    if classList.count(classList[0]) == len(classList):  # if all class labels are identical, stop splitting
        return classList[0]
    if len(dataSet[0]) == 1:  # if every attribute has been used, return the most frequent class label
        return maxLabel(classList)
    bestFeature = chooseBestFeature(dataSet)  # index of the attribute with the maximum information gain
    bestFeatureLabel = labels[bestFeature]  # the attribute with the maximum information gain (e.g. outlook)
    featLabels.append(bestFeatureLabel)
    myTree = {bestFeatureLabel: {}}  # grow the tree from the label with the maximum information gain
    del(labels[bestFeature])  # remove the attribute that has already been used
    featureList = [example[bestFeature] for example in dataSet]  # all values of the best attribute in the data set (e.g. sunny, rain)
    featureValue = set(featureList)  # build a set to remove duplicate attribute values
    for value in featureValue:
        subLabels = labels[:]  # new attribute label list (compared with labels, the used attribute has been removed)
        # recursively call creatTree to build the subtree
        myTree[bestFeatureLabel][value] = creatTree(splitDataSet(dataSet, bestFeature, value), subLabels, featLabels)
    # print(featLabels)
    return myTree

''' Use the decision tree to classify a sample '''
def classify(myTree, featLabels, testData):
    classLabel = None  # default result if no branch matches the test sample
    firstStr = next(iter(myTree))  # root node of the decision tree
    # print(firstStr)
    secondDict = myTree[firstStr]  # the next-level dictionary
    # print(secondDict)
    featIndex = featLabels.index(firstStr)  # index of the root-node attribute in the test sample
    # print(featIndex)
    for key in secondDict.keys():
        if testData[featIndex] == key:
            if type(secondDict[key]).__name__ == 'dict':  # type().__name__ checks the data type; a dict means an internal node
                classLabel = classify(secondDict[key], featLabels, testData)
            else:
                classLabel = secondDict[key]
    return classLabel
if __name__ == '__main__':
    dataSet, labels = createDataset()  # create the test data set and its attribute labels
    featLabels = []  # stores the best attributes in the order they are chosen
    myTree = creatTree(dataSet, labels[:], featLabels)  # build the tree from a copy of labels so the original list stays intact
    # print('The decision tree is: {}'.format(myTree))
    testData = ['sunny', 'hot', 'high', 'weak']  # test sample, given in the original attribute order
    result = classify(myTree, labels, testData)  # classify with the full label list so attribute indices line up with testData
    print('The decision result: {}'.format(result))
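With the label handling above, running demo.py should send the test sample ['sunny', 'hot', 'high', 'weak'] down the sunny branch and then the high-humidity branch, so the printed decision should be no.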
Here, demo is the file name of the decision-tree source code written in the previous part.
The plotting functions used below are documented at: https://matplotlib.org/stable/tutorials/text/annotations.html#sphx-glr-tutorials-text-annotations-py
# -*- coding: UTF-8 -*-
"""
# @Time: 2022/6/19 16:44
# @Author: Sleepy CV jun
# @CSDN: https://blog.csdn.net/qq_44921056
"""
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt
import demo

# define the text box and arrow formats
decisionNode = dict(boxstyle='round4', fc='0.8')
leafNode = dict(boxstyle='circle', fc='0.8')
arrow_args = dict(arrowstyle='<-')
# set a Chinese font
font = FontProperties(fname=r"C:\Windows\Fonts\simsun.ttc", size=14)
""" Function description : Get the number of leaf nodes of the decision tree Parameters: myTree - Decision tree Returns: numLeafs - The number of leaf nodes of the decision tree """
def getNumLeafs(myTree):
numLeafs = 0 # Initialize leaf
# python3 in myTree.keys() The return is dict_keys, No, it's not list, So it can't be used myTree.keys()[0] Method to obtain node properties ,
# have access to list(myTree.keys())[0]
firstStr = next(iter(myTree))
secondDict = myTree[firstStr] # Get the next set of dictionaries
for key in secondDict.keys():
if type(secondDict[key]).__name__ == 'dict': # Test whether the node is a dictionary , If it's not a dictionary , Represents that this node is a leaf node
numLeafs += getNumLeafs(secondDict[key])
else:
numLeafs += 1
return numLeafs
""" Function description : Gets the number of layers of the decision tree Parameters: myTree - Decision tree Returns: maxDepth - The number of layers of the decision tree """
def getTreeDepth(myTree):
maxDepth = 0 # Initialize decision tree depth
# python3 in myTree.keys() The return is dict_keys, No, it's not list, So it can't be used myTree.keys()[0] Method to obtain node properties ,
# have access to list(myTree.keys())[0]
firstStr = next(iter(myTree))
secondDict = myTree[firstStr] # Get the next Dictionary
for key in secondDict.keys():
if type(secondDict[key]).__name__ == 'dict': # Test whether the node is a dictionary , If it's not a dictionary , Represents that this node is a leaf node
thisDepth = 1 + getTreeDepth(secondDict[key])
else:
thisDepth = 1
if thisDepth > maxDepth:
maxDepth = thisDepth # Update the number of layers
# print(maxDepth)
return maxDepth
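A quick usage sketch (assuming demo.py from the previous part): for the tree built from the weather data set, the two helpers should report 5 leaf nodes and a depth of 2.

dataSet, labels = demo.createDataset()
exampleTree = demo.creatTree(dataSet, labels[:], [])
print(getNumLeafs(exampleTree), getTreeDepth(exampleTree))   # expected: 5 2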
""" Function description : Draw nodes Parameters: nodeTxt - Node name centerPt - Text location parentPt - The arrow position of the dimension nodeType - Node format """
def plotNode(nodeTxt, centerPt, parentPt, nodeType):
arrow_args = dict(arrow) # Define the arrow format
font = FontProperties(fname=r"C:\Windows\Fonts\simsun.ttc", size=14) # Set Chinese font
createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction', # Draw nodes
xytext=centerPt, textcoords='axes fraction',
va="center", ha="center", bbox=nodeType, arrowprops=arrow_args, fontproperties=font)
""" Function description : Label the directed edge attribute value Parameters: cntrPt、parentPt - Used to calculate dimension position txtString - What is marked """
def plotMidText(cntrPt, parentPt, txtString):
xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0] # Calculate dimension position
yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]
createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)
""" Function description : Draw decision tree Parameters: myTree - Decision tree ( Dictionaries ) parentPt - What is marked nodeTxt - Node name """
def plotTree(myTree, parentPt, nodeTxt):
decisionNode = dict(box, fc="0.8") # Set node format
leafNode = dict(box, fc="0.8") # Format leaf nodes
numLeafs = getNumLeafs(myTree) # Get the number of decision leaf nodes , Determines the width of the tree
depth = getTreeDepth(myTree) # Get the number of decision tree layers
firstStr = next(iter(myTree)) # Next dictionary
cntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff) # Center position
plotMidText(cntrPt, parentPt, nodeTxt) # Label the directed edge attribute value
plotNode(firstStr, cntrPt, parentPt, decisionNode) # Draw nodes
secondDict = myTree[firstStr] # Next Dictionary , That is, continue drawing child nodes
plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalD # y The offset
for key in secondDict.keys():
if type(secondDict[key]).__name__ == 'dict': # Test whether the node is a dictionary , If it's not a dictionary , Represents that this node is a leaf node
plotTree(secondDict[key], cntrPt, str(key)) # Not leaf nodes , The recursive call continues to draw
else: # If it's a leaf node , Draw leaf nodes , And marked with directed edge attribute value
plotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalW
plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)
plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))
plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD
""" Function description : Create a paint panel Parameters: inTree - Decision tree ( Dictionaries ) """
def createPlot(inTree):
fig = plt.figure(1, facecolor='white') # establish fig
fig.clf() # Empty fig
axprops = dict(xticks=[], yticks=[])
createPlot.ax1 = plt.subplot(111, frameon=False, **axprops) # Get rid of x、y Axis
plotTree.totalW = float(getNumLeafs(inTree)) # Get the number of decision leaf nodes
plotTree.totalD = float(getTreeDepth(inTree)) # Get the number of decision tree layers
plotTree.xOff = -0.5 / plotTree.totalW
plotTree.yOff = 1.0 # x The offset
plotTree(inTree, (0.5, 1.0), '') # Draw decision tree
plt.show()
if __name__ == '__main__':
    dataSet, labels = demo.createDataset()
    featLabels = []
    myTree = demo.creatTree(dataSet, labels, featLabels)
    createPlot(myTree)
This is the end of the article, but the story has no ending.
If this article helped you, please remember to like it; that is the greatest encouragement for the author.
If anything is lacking, feel free to point it out in the comments, and I will fix it as soon as I see it.
Author: Sleepy CV jun
CSDN: https://blog.csdn.net/qq_44921056
This article is for learning and exchange only. Reproduction without the author's permission is prohibited, as is use for any other purpose; violators will be held responsible.