Commit eb14106

Pushing the docs to dev/ for branch: main, commit 6f36a860ec2d94e1aadc14cb09ea1dd7fe665787
1 parent 28b6598 commit eb14106

1,316 files changed

+6475
-5876
lines changed

dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 837e3b264c6b0087245d25d256a33f76
+config: 863036b5c08a0041ad9992072466972c
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.

dev/_downloads/6b00e458f3e282f1cc421f077b2fcad1/plot_spectral_biclustering.ipynb

Lines changed: 124 additions & 2 deletions
@@ -4,7 +4,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "\n# A demo of the Spectral Biclustering algorithm\n\nThis example demonstrates how to generate a checkerboard dataset and\nbicluster it using the Spectral Biclustering algorithm.\n\nThe data is generated with the ``make_checkerboard`` function, then\nshuffled and passed to the Spectral Biclustering algorithm. The rows\nand columns of the shuffled matrix are rearranged to show the\nbiclusters found by the algorithm.\n\nThe outer product of the row and column label vectors shows a\nrepresentation of the checkerboard structure.\n"
+        "\n# A demo of the Spectral Biclustering algorithm\n\nThis example demonstrates how to generate a checkerboard dataset and bicluster\nit using the :class:`~sklearn.cluster.SpectralBiclustering` algorithm. The\nspectral biclustering algorithm is specifically designed to cluster data by\nsimultaneously considering both the rows (samples) and columns (features) of a\nmatrix. It aims to identify patterns not only between samples but also within\nsubsets of samples, allowing for the detection of localized structure within the\ndata. This makes spectral biclustering particularly well-suited for datasets\nwhere the order or arrangement of features is fixed, such as in images, time\nseries, or genomes.\n\nThe data is generated, then shuffled and passed to the spectral biclustering\nalgorithm. The rows and columns of the shuffled matrix are then rearranged to\nplot the biclusters found.\n"
       ]
     },
     {
@@ -15,7 +15,129 @@
       },
       "outputs": [],
       "source": [
-        "# Author: Kemal Eren <[email protected]>\n# License: BSD 3 clause\n\nimport numpy as np\nfrom matplotlib import pyplot as plt\n\nfrom sklearn.datasets import make_checkerboard\nfrom sklearn.cluster import SpectralBiclustering\nfrom sklearn.metrics import consensus_score\n\n\nn_clusters = (4, 3)\ndata, rows, columns = make_checkerboard(\n    shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=0\n)\n\nplt.matshow(data, cmap=plt.cm.Blues)\nplt.title(\"Original dataset\")\n\n# shuffle clusters\nrng = np.random.RandomState(0)\nrow_idx = rng.permutation(data.shape[0])\ncol_idx = rng.permutation(data.shape[1])\ndata = data[row_idx][:, col_idx]\n\nplt.matshow(data, cmap=plt.cm.Blues)\nplt.title(\"Shuffled dataset\")\n\nmodel = SpectralBiclustering(n_clusters=n_clusters, method=\"log\", random_state=0)\nmodel.fit(data)\nscore = consensus_score(model.biclusters_, (rows[:, row_idx], columns[:, col_idx]))\n\nprint(\"consensus score: {:.1f}\".format(score))\n\nfit_data = data[np.argsort(model.row_labels_)]\nfit_data = fit_data[:, np.argsort(model.column_labels_)]\n\nplt.matshow(fit_data, cmap=plt.cm.Blues)\nplt.title(\"After biclustering; rearranged to show biclusters\")\n\nplt.matshow(\n    np.outer(np.sort(model.row_labels_) + 1, np.sort(model.column_labels_) + 1),\n    cmap=plt.cm.Blues,\n)\nplt.title(\"Checkerboard structure of rearranged data\")\n\nplt.show()"
+        "# Author: Kemal Eren <[email protected]>\n# License: BSD 3 clause"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Generate sample data\nWe generate the sample data using the\n:func:`~sklearn.datasets.make_checkerboard` function. Each pixel within\n`shape=(300, 300)` represents with its color a value from a uniform\ndistribution. The noise is added from a normal distribution, where the value\nchosen for `noise` is the standard deviation.\n\nAs you can see, the data is distributed over 12 cluster cells and is\nrelatively well distinguishable.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "from sklearn.datasets import make_checkerboard\nfrom matplotlib import pyplot as plt\n\nn_clusters = (4, 3)\ndata, rows, columns = make_checkerboard(\n    shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=42\n)\n\nplt.matshow(data, cmap=plt.cm.Blues)\nplt.title(\"Original dataset\")\n_ = plt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "We shuffle the data and the goal is to reconstruct it afterwards using\n:class:`~sklearn.cluster.SpectralBiclustering`.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "import numpy as np\n\n# Creating lists of shuffled row and column indices\nrng = np.random.RandomState(0)\nrow_idx_shuffled = rng.permutation(data.shape[0])\ncol_idx_shuffled = rng.permutation(data.shape[1])"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "We redefine the shuffled data and plot it. We observe that we lost the\nstructure of the original data matrix.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "data = data[row_idx_shuffled][:, col_idx_shuffled]\n\nplt.matshow(data, cmap=plt.cm.Blues)\nplt.title(\"Shuffled dataset\")\n_ = plt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Fitting `SpectralBiclustering`\nWe fit the model and compare the obtained clusters with the ground truth. Note\nthat when creating the model we specify the same number of clusters that we\nused to create the dataset (`n_clusters = (4, 3)`), which helps to\nobtain a good result.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "from sklearn.cluster import SpectralBiclustering\nfrom sklearn.metrics import consensus_score\n\nmodel = SpectralBiclustering(n_clusters=n_clusters, method=\"log\", random_state=0)\nmodel.fit(data)\n\n# Compute the similarity of two sets of biclusters\nscore = consensus_score(\n    model.biclusters_, (rows[:, row_idx_shuffled], columns[:, col_idx_shuffled])\n)\nprint(f\"consensus score: {score:.1f}\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "The score is between 0 and 1, where 1 corresponds to a perfect matching. It\nshows the quality of the biclustering.\n\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Plotting results\nNow, we rearrange the data based on the row and column labels assigned by the\n:class:`~sklearn.cluster.SpectralBiclustering` model in ascending order and\nplot again. The `row_labels_` range from 0 to 3, while the `column_labels_`\nrange from 0 to 2, representing a total of 4 clusters per row and 3 clusters\nper column.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "# Reordering first the rows and then the columns.\nreordered_rows = data[np.argsort(model.row_labels_)]\nreordered_data = reordered_rows[:, np.argsort(model.column_labels_)]\n\nplt.matshow(reordered_data, cmap=plt.cm.Blues)\nplt.title(\"After biclustering; rearranged to show biclusters\")\n_ = plt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "As a last step, we want to demonstrate the relationships between the row\nand column labels assigned by the model. Therefore, we create a grid with\n:func:`numpy.outer`, which takes the sorted `row_labels_` and `column_labels_`\nand adds 1 to each to ensure that the labels start from 1 instead of 0 for\nbetter visualization.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "plt.matshow(\n    np.outer(np.sort(model.row_labels_) + 1, np.sort(model.column_labels_) + 1),\n    cmap=plt.cm.Blues,\n)\nplt.title(\"Checkerboard structure of rearranged data\")\nplt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "The outer product of the row and column label vectors shows a representation\nof the checkerboard structure, where different combinations of row and column\nlabels are represented by different shades of blue.\n\n"
       ]
     }
   ],
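
The notebook above describes the consensus score as lying between 0 and 1, with 1 meaning a perfect matching. A minimal sketch of that behaviour on toy data follows; the 4x4 indicator arrays and the names `score_same` and `score_disjoint` are illustrative assumptions and are not part of this commit. Only `sklearn.metrics.consensus_score` is taken from the example itself.

```python
import numpy as np
from sklearn.metrics import consensus_score

# Two toy biclusters over a 4x4 matrix (hypothetical data), given as boolean
# indicator arrays of shape (n_biclusters, n_rows) and (n_biclusters, n_columns).
rows = np.array([[True, True, False, False],
                 [False, False, True, True]])
cols = np.array([[True, False, True, False],
                 [False, True, False, True]])

# The same biclusters listed in reverse order: the best one-to-one matching
# pairs each bicluster with its identical copy.
score_same = consensus_score((rows, cols), (rows[::-1], cols[::-1]))

# Same row sets but swapped column sets: no bicluster shares a cell with any other.
score_disjoint = consensus_score((rows, cols), (rows, cols[::-1]))

print(f"identical sets: {score_same:.1f}, disjoint sets: {score_disjoint:.1f}")
```

Because the score is computed on the best matching between the two sets, the order in which biclusters are listed does not matter, which is why the example can compare `model.biclusters_` directly with the shuffled ground truth.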
Binary file not shown.

dev/_downloads/ac19db97f4bbd077ccffef2736ed5f3d/plot_spectral_biclustering.py

Lines changed: 88 additions & 31 deletions
@@ -3,63 +3,120 @@
 A demo of the Spectral Biclustering algorithm
 =============================================
 
-This example demonstrates how to generate a checkerboard dataset and
-bicluster it using the Spectral Biclustering algorithm.
-
-The data is generated with the ``make_checkerboard`` function, then
-shuffled and passed to the Spectral Biclustering algorithm. The rows
-and columns of the shuffled matrix are rearranged to show the
-biclusters found by the algorithm.
-
-The outer product of the row and column label vectors shows a
-representation of the checkerboard structure.
-
+This example demonstrates how to generate a checkerboard dataset and bicluster
+it using the :class:`~sklearn.cluster.SpectralBiclustering` algorithm. The
+spectral biclustering algorithm is specifically designed to cluster data by
+simultaneously considering both the rows (samples) and columns (features) of a
+matrix. It aims to identify patterns not only between samples but also within
+subsets of samples, allowing for the detection of localized structure within the
+data. This makes spectral biclustering particularly well-suited for datasets
+where the order or arrangement of features is fixed, such as in images, time
+series, or genomes.
+
+The data is generated, then shuffled and passed to the spectral biclustering
+algorithm. The rows and columns of the shuffled matrix are then rearranged to
+plot the biclusters found.
 """
 
 # Author: Kemal Eren <[email protected]>
 # License: BSD 3 clause
 
-import numpy as np
-from matplotlib import pyplot as plt
-
+# %%
+# Generate sample data
+# --------------------
+# We generate the sample data using the
+# :func:`~sklearn.datasets.make_checkerboard` function. Each pixel within
+# `shape=(300, 300)` represents with its color a value from a uniform
+# distribution. The noise is added from a normal distribution, where the value
+# chosen for `noise` is the standard deviation.
+#
+# As you can see, the data is distributed over 12 cluster cells and is
+# relatively well distinguishable.
 from sklearn.datasets import make_checkerboard
-from sklearn.cluster import SpectralBiclustering
-from sklearn.metrics import consensus_score
-
+from matplotlib import pyplot as plt
 
 n_clusters = (4, 3)
 data, rows, columns = make_checkerboard(
-    shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=0
+    shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=42
 )
 
 plt.matshow(data, cmap=plt.cm.Blues)
 plt.title("Original dataset")
+_ = plt.show()
 
-# shuffle clusters
+# %%
+# We shuffle the data and the goal is to reconstruct it afterwards using
+# :class:`~sklearn.cluster.SpectralBiclustering`.
+import numpy as np
+
+# Creating lists of shuffled row and column indices
 rng = np.random.RandomState(0)
-row_idx = rng.permutation(data.shape[0])
-col_idx = rng.permutation(data.shape[1])
-data = data[row_idx][:, col_idx]
+row_idx_shuffled = rng.permutation(data.shape[0])
+col_idx_shuffled = rng.permutation(data.shape[1])
+
+# %%
+# We redefine the shuffled data and plot it. We observe that we lost the
+# structure of the original data matrix.
+data = data[row_idx_shuffled][:, col_idx_shuffled]
 
 plt.matshow(data, cmap=plt.cm.Blues)
 plt.title("Shuffled dataset")
+_ = plt.show()
+
+# %%
+# Fitting `SpectralBiclustering`
+# ------------------------------
+# We fit the model and compare the obtained clusters with the ground truth. Note
+# that when creating the model we specify the same number of clusters that we
+# used to create the dataset (`n_clusters = (4, 3)`), which helps to
+# obtain a good result.
+from sklearn.cluster import SpectralBiclustering
+from sklearn.metrics import consensus_score
 
 model = SpectralBiclustering(n_clusters=n_clusters, method="log", random_state=0)
 model.fit(data)
-score = consensus_score(model.biclusters_, (rows[:, row_idx], columns[:, col_idx]))
-
-print("consensus score: {:.1f}".format(score))
 
-fit_data = data[np.argsort(model.row_labels_)]
-fit_data = fit_data[:, np.argsort(model.column_labels_)]
-
-plt.matshow(fit_data, cmap=plt.cm.Blues)
+# Compute the similarity of two sets of biclusters
+score = consensus_score(
+    model.biclusters_, (rows[:, row_idx_shuffled], columns[:, col_idx_shuffled])
+)
+print(f"consensus score: {score:.1f}")
+
+# %%
+# The score is between 0 and 1, where 1 corresponds to a perfect matching. It
+# shows the quality of the biclustering.
+
+# %%
+# Plotting results
+# ----------------
+# Now, we rearrange the data based on the row and column labels assigned by the
+# :class:`~sklearn.cluster.SpectralBiclustering` model in ascending order and
+# plot again. The `row_labels_` range from 0 to 3, while the `column_labels_`
+# range from 0 to 2, representing a total of 4 clusters per row and 3 clusters
+# per column.
+
+# Reordering first the rows and then the columns.
+reordered_rows = data[np.argsort(model.row_labels_)]
+reordered_data = reordered_rows[:, np.argsort(model.column_labels_)]
+
+plt.matshow(reordered_data, cmap=plt.cm.Blues)
 plt.title("After biclustering; rearranged to show biclusters")
-
+_ = plt.show()
+
+# %%
+# As a last step, we want to demonstrate the relationships between the row
+# and column labels assigned by the model. Therefore, we create a grid with
+# :func:`numpy.outer`, which takes the sorted `row_labels_` and `column_labels_`
+# and adds 1 to each to ensure that the labels start from 1 instead of 0 for
+# better visualization.
 plt.matshow(
     np.outer(np.sort(model.row_labels_) + 1, np.sort(model.column_labels_) + 1),
     cmap=plt.cm.Blues,
 )
 plt.title("Checkerboard structure of rearranged data")
-
 plt.show()
+
+# %%
+# The outer product of the row and column label vectors shows a representation
+# of the checkerboard structure, where different combinations of row and column
+# labels are represented by different shades of blue.
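
The reordering step in the script above relies on `np.argsort` of the label vectors to bring rows and columns of the same cluster next to each other. The sketch below shows the same idea on a tiny matrix; the label vectors and the names `toy`, `row_order`, and `col_order` are hypothetical and chosen only for illustration, not taken from the model's output.

```python
import numpy as np

# Hypothetical cluster labels for a 6x4 matrix (not produced by a fitted model).
row_labels = np.array([1, 0, 1, 0, 2, 2])
col_labels = np.array([1, 0, 1, 0])

# Toy matrix whose entry (i, j) depends only on the row and column labels,
# mimicking a shuffled checkerboard.
toy = np.outer(row_labels + 1, col_labels + 1)

# argsort returns the positions that would sort the labels, so indexing with
# it groups rows (then columns) of the same cluster together.
row_order = np.argsort(row_labels, kind="stable")
col_order = np.argsort(col_labels, kind="stable")
reordered = toy[row_order][:, col_order]

print(reordered)
```

After reordering, equal values sit in contiguous blocks, which is the checkerboard structure the final plot of the example visualizes with `numpy.outer` on the sorted labels.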

dev/_downloads/scikit-learn-docs.zip

25.1 KB
Binary file not shown.
182 Bytes
243 Bytes
321 Bytes
216 Bytes
