
Commit e4d8870

Rebuild dev docs at master=5a58e56
1 parent d67799d commit e4d8870


234 files changed, +2237 -1369 lines changed

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
"""
=========================================
Understanding the decision tree structure
=========================================

The decision tree structure can be analysed to gain further insight into the
relation between the features and the target to predict. In this example, we
show how to retrieve:

- the binary tree structure;
- the depth of each node and whether or not it is a leaf;
- the nodes that were reached by a sample using the ``decision_path`` method;
- the leaf that was reached by a sample using the ``apply`` method;
- the rules that were used to predict a sample;
- the decision path shared by a group of samples.

"""
import numpy as np

from sklearn.cross_validation import train_test_split
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimator = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0)
estimator.fit(X_train, y_train)

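# As a quick sanity check (a minimal sketch, using only the estimator fitted
# above), the accuracy of the tree on the held-out split can be inspected with:
#
#     print(estimator.score(X_test, y_test))
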
# The decision estimator has an attribute called tree_ which stores the entire
# tree structure and allows access to low-level attributes. The binary tree
# tree_ is represented as a number of parallel arrays. The i-th element of each
# array holds information about the node `i`. Node 0 is the tree's root. NOTE:
# Some of the arrays only apply to either leaves or split nodes; in that case,
# the values of nodes of the other type are arbitrary!
#
# Among those arrays, we have:
#   - left_child, id of the left child of the node
#   - right_child, id of the right child of the node
#   - feature, feature used for splitting the node
#   - threshold, threshold value at the node
#

# Using those arrays, we can parse the tree structure:

n_nodes = estimator.tree_.node_count
children_left = estimator.tree_.children_left
children_right = estimator.tree_.children_right
feature = estimator.tree_.feature
threshold = estimator.tree_.threshold
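
# As a side note, leaves can also be spotted directly from these arrays, since
# a leaf node stores -1 for both of its children ids. A minimal sketch of an
# equivalent check, reusing the arrays defined above:
#
#     leaf_mask = (children_left == -1) & (children_right == -1)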


# The tree structure can be traversed to compute various properties such
# as the depth of each node and whether or not it is a leaf.
node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
is_leaves = np.zeros(shape=n_nodes, dtype=bool)
stack = [(0, -1)]  # seed is the root node id and its parent depth
while len(stack) > 0:
    node_id, parent_depth = stack.pop()
    node_depth[node_id] = parent_depth + 1

    # If we have a test node
    if (children_left[node_id] != children_right[node_id]):
        stack.append((children_left[node_id], parent_depth + 1))
        stack.append((children_right[node_id], parent_depth + 1))
    else:
        is_leaves[node_id] = True

print("The binary tree structure has %s nodes and has "
72+
"the following tree structure:"
73+
% n_nodes)
74+
for i in range(n_nodes):
75+
if is_leaves[i]:
76+
print("%snode=%s leaf node." % (node_depth[i] * "\t", i))
77+
else:
78+
print("%snode=%s test node: go to node %s if X[:, %s] <= %ss else to "
79+
"node %s."
80+
% (node_depth[i] * "\t",
81+
i,
82+
children_left[i],
83+
feature[i],
84+
threshold[i],
85+
children_right[i],
86+
))
87+
print()
88+
89+
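# Another way to inspect the same structure (a minimal sketch, assuming the
# Graphviz dot tool is available to render the output) is to export the
# fitted tree in Graphviz format with sklearn.tree.export_graphviz:
#
#     from sklearn.tree import export_graphviz
#     export_graphviz(estimator, out_file="tree.dot",
#                     feature_names=iris.feature_names)
#
# and then run, for example, dot -Tpng tree.dot -o tree.png.
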
# First let's retrieve the decision path of each sample. The decision_path
# method allows us to retrieve the node indicator functions. A non-zero element
# of the indicator matrix at position (i, j) indicates that sample i goes
# through node j.

node_indicator = estimator.decision_path(X_test)

# Similarly, we can also retrieve the ids of the leaves reached by each sample.

leave_id = estimator.apply(X_test)
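
# A quick consistency check (a minimal sketch reusing the objects above): the
# leaf returned by apply for a sample must lie on that sample's decision path,
# e.g. for the first test sample:
#
#     assert leave_id[0] in node_indicator[0].indices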

# Now, it's possible to get the tests that were used to predict a sample or
# a group of samples. First, let's do it for a single sample.

sample_id = 0
node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                    node_indicator.indptr[sample_id + 1]]

print('Rules used to predict sample %s: ' % sample_id)
for node_id in node_index:
    # skip the leaf reached by the sample: a leaf holds no test
    if leave_id[sample_id] == node_id:
        continue

    if (X_test[sample_id, feature[node_id]] <= threshold[node_id]):
        threshold_sign = "<="
    else:
        threshold_sign = ">"

    print("decision id node %s : (X[%s, %s] (= %s) %s %s)"
          % (node_id,
             sample_id,
             feature[node_id],
             X_test[sample_id, feature[node_id]],
             threshold_sign,
             threshold[node_id]))

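# To relate these rules to an outcome (a minimal sketch), the class predicted
# for this sample can be printed alongside them:
#
#     print("predicted class: %s"
#           % iris.target_names[estimator.predict(X_test[sample_id:sample_id + 1])[0]])
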
# For a group of samples, we have the following common nodes.
sample_ids = [0, 1]
common_nodes = (node_indicator.toarray()[sample_ids].sum(axis=0) ==
                len(sample_ids))

common_node_id = np.arange(n_nodes)[common_nodes]

print("\nThe following samples %s share the node %s in the tree"
      % (sample_ids, common_node_id))
print("It is %s %% of all nodes." % (100 * len(common_node_id) / n_nodes,))

dev/_images/unveil_tree_structure.png

3.03 KB

dev/_sources/auto_examples/index.txt

Lines changed: 25 additions & 0 deletions
@@ -5442,6 +5442,31 @@ Examples concerning the :mod:`sklearn.tree` module.
     tree/plot_iris
 
 
+
+.. raw:: html
+
+    <div class="thumbnailContainer" tooltip="The decision tree structure can be analysed to gain further insight on the relation between the...">
+
+.. only:: html
+
+    .. figure:: tree/images/thumb/unveil_tree_structure.png
+       :target: ./tree/unveil_tree_structure.html
+
+       :ref:`example_tree_unveil_tree_structure.py`
+
+
+.. raw:: html
+
+    </div>
+
+
+
+.. toctree::
+   :hidden:
+
+   tree/unveil_tree_structure
+
+
 .. raw:: html
 
     <div class="clearer"></div>
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@


.. _example_tree_unveil_tree_structure.py:


=========================================
Understanding the decision tree structure
=========================================

The decision tree structure can be analysed to gain further insight into the
relation between the features and the target to predict. In this example, we
show how to retrieve:

- the binary tree structure;
- the depth of each node and whether or not it is a leaf;
- the nodes that were reached by a sample using the ``decision_path`` method;
- the leaf that was reached by a sample using the ``apply`` method;
- the rules that were used to predict a sample;
- the decision path shared by a group of samples.




**Python source code:** :download:`unveil_tree_structure.py <unveil_tree_structure.py>`

.. literalinclude:: unveil_tree_structure.py
    :lines: 18-

dev/auto_examples/index.html

Lines changed: 6 additions & 0 deletions
@@ -1535,6 +1535,12 @@ <h2>Tutorial exercises<a class="headerlink" href="#tutorial-exercises" title="Pe
 </div>
 </div><div class="toctree-wrapper compound">
 </div>
+<div class="thumbnailContainer" tooltip="The decision tree structure can be analysed to gain further insight on the relation between the..."><div class="figure">
+<a class="reference external image-reference" href="./tree/unveil_tree_structure.html"><img alt="../_images/unveil_tree_structure.png" src="../_images/unveil_tree_structure.png" /></a>
+<p class="caption"><a class="reference internal" href="tree/unveil_tree_structure.html#example-tree-unveil-tree-structure-py"><em>Understanding the decision tree structure</em></a></p>
+</div>
+</div><div class="toctree-wrapper compound">
+</div>
 <div class="clearer"></div></div>
 </div>

dev/auto_examples/tree/plot_iris.html

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
 <link rel="author" title="About these documents" href="../../about.html" />
 <link rel="top" title="scikit-learn 0.18.dev0 documentation" href="../../index.html" />
 <link rel="up" title="Examples" href="../index.html" />
-<link rel="next" title="API Reference" href="../../modules/classes.html" />
+<link rel="next" title="Understanding the decision tree structure" href="unveil_tree_structure.html" />
 <link rel="prev" title="Multi-output Decision Tree Regression" href="plot_tree_regression_multioutput.html" />
 
 