- "import numpy as np\n\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.datasets import load_iris\nfrom sklearn.tree import DecisionTreeClassifier\n\niris = load_iris()\nX = iris.data\ny = iris.target\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n\nestimator = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0)\nestimator.fit(X_train, y_train)\n\n# The decision estimator has an attribute called tree_ which stores the entire\n# tree structure and allows access to low level attributes. The binary tree\n# tree_ is represented as a number of parallel arrays. The i-th element of each\n# array holds information about the node `i`. Node 0 is the tree's root. NOTE:\n# Some of the arrays only apply to either leaves or split nodes, resp. In this\n# case the values of nodes of the other type are arbitrary!\n#\n# Among those arrays, we have:\n# - left_child, id of the left child of the node\n# - right_child, id of the right child of the node\n# - feature, feature used for splitting the node\n# - threshold, threshold value at the node\n#\n\n# Using those arrays, we can parse the tree structure:\n\nn_nodes = estimator.tree_.node_count\nchildren_left = estimator.tree_.children_left\nchildren_right = estimator.tree_.children_right\nfeature = estimator.tree_.feature\nthreshold = estimator.tree_.threshold\n\n\n# The tree structure can be traversed to compute various properties such\n# as the depth of each node and whether or not it is a leaf.\nnode_depth = np.zeros(shape=n_nodes)\nis_leaves = np.zeros(shape=n_nodes, dtype=bool)\nstack = [(0, -1)] # seed is the root node id and its parent depth\nwhile len(stack) > 0:\n node_id, parent_depth = stack.pop()\n node_depth[node_id] = parent_depth + 1\n\n # If we have a test node\n if (children_left[node_id] != children_right[node_id]):\n stack.append((children_left[node_id], parent_depth + 1))\n stack.append((children_right[node_id], parent_depth + 1))\n else:\n is_leaves[node_id] = True\n\nprint(\"The binary tree structure has %s nodes and has \"\n \"the following tree structure:\"\n % n_nodes)\nfor i in range(n_nodes):\n if is_leaves[i]:\n print(\"%snode=%s leaf node.\" % (node_depth[i] * \"\\t\", i))\n else:\n print(\"%snode=%s test node: go to node %s if X[:, %s] <= %ss else to \"\n \"node %s.\"\n % (node_depth[i] * \"\\t\",\n i,\n children_left[i],\n feature[i],\n threshold[i],\n children_right[i],\n ))\nprint()\n\n# First let's retrieve the decision path of each sample. The decision_path\n# method allows to retrieve the node indicator functions. A non zero element of\n# indicator matrix at the position (i, j) indicates that the sample i goes\n# through the node j.\n\nnode_indicator = estimator.decision_path(X_test)\n\n# Similarly, we can also have the leaves ids reached by each sample.\n\nleave_id = estimator.apply(X_test)\n\n# Now, it's possible to get the tests that were used to predict a sample or\n# a group of samples. 
# Now, it's possible to get the tests that were used to predict a sample or
# a group of samples. First, let's do it for a single sample.

sample_id = 0
node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                    node_indicator.indptr[sample_id + 1]]

print('Rules used to predict sample %s: ' % sample_id)
for node_id in node_index:
    # Leaf nodes carry no test, so skip the leaf reached by this sample
    if leave_id[sample_id] == node_id:
        continue

    if X_test[sample_id, feature[node_id]] <= threshold[node_id]:
        threshold_sign = "<="
    else:
        threshold_sign = ">"

    print("decision id node %s : (X_test[%s, %s] (= %s) %s %s)"
          % (node_id,
             sample_id,
             feature[node_id],
             X_test[sample_id, feature[node_id]],
             threshold_sign,
             threshold[node_id]))

# For a group of samples, we can find the nodes that all of them go through.
sample_ids = [0, 1]
common_nodes = (node_indicator.toarray()[sample_ids].sum(axis=0) ==
                len(sample_ids))

common_node_id = np.arange(n_nodes)[common_nodes]

print("\nThe following samples %s share the node %s in the tree"
      % (sample_ids, common_node_id))
print("This is %s %% of all nodes." % (100 * len(common_node_id) / n_nodes,))
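# For comparison, scikit-learn also ships a built-in text exporter,
# sklearn.tree.export_text (available from version 0.21 onwards; an assumption
# about the installed version). It renders the same split structure without
# walking tree_ by hand:

from sklearn.tree import export_text

print(export_text(estimator, feature_names=iris.feature_names))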