
Commit 0e6a507

Rebuild dev docs at master=ec7d1db
1 parent 0a40f9d commit 0e6a507


232 files changed (+1364 −1364 lines)


dev/_sources/modules/neural_networks_supervised.txt

Lines changed: 55 additions & 55 deletions
@@ -8,7 +8,7 @@ Neural network models (supervised)
 
 
 .. _multilayer_perceptron:
-
+
 Multi-layer Perceptron
 ======================
 

@@ -28,7 +28,7 @@ output.
 
 **Figure 1 : One hidden layer MLP.**
 
-The leftmost layer, known as the input layer, consists of a set of neurons
+The leftmost layer, known as the input layer, consists of a set of neurons
 :math:`\{x_i | x_1, x_2, ..., x_m\}` representing the input features. Each
 neuron in the hidden layer transforms the values from the previous layer with
 a weighted linear summation :math:`w_1x_1 + w_2x_2 + ... + w_mx_m`, followed
@@ -38,42 +38,42 @@ last hidden layer and transforms them into output values.
 
 The module contains the public attributes ``coefs_`` and ``intercepts_``.
 ``coefs_`` is a list of weight matrices, where weight matrix at index
-:math:`i` represents the weights between layer :math:`i` and layer
+:math:`i` represents the weights between layer :math:`i` and layer
 :math:`i+1`. ``intercepts_`` is a list of bias vectors, where the vector
 at index :math:`i` represents the bias values added to layer :math:`i+1`.
 
 The advantages of Multi-layer Perceptron are:
 
 + Capability to learn non-linear models.
 
-+ Capability to learn models in real-time (on-line learning)
++ Capability to learn models in real-time (on-line learning)
 using ``partial_fit``.
-
-
+
+
 The disadvantages of Multi-layer Perceptron (MLP) include:
 
-+ MLP with hidden layers have a non-convex loss function where there exists
-more than one local minimum. Therefore different random weight
++ MLP with hidden layers have a non-convex loss function where there exists
+more than one local minimum. Therefore different random weight
 initializations can lead to different validation accuracy.
 
-+ MLP requires tuning a number of hyperparameters such as the number of
++ MLP requires tuning a number of hyperparameters such as the number of
 hidden neurons, layers, and iterations.
 
 + MLP is sensitive to feature scaling.
 
-Please see :ref:`Tips on Practical Use <mlp_tips>` section that addresses
+Please see :ref:`Tips on Practical Use <mlp_tips>` section that addresses
 some of these disadvantages.
 
 
 Classification
 ==============
 
 Class :class:`MLPClassifier` implements a multi-layer perceptron (MLP) algorithm
-that trains using `Backpropagation <http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm>`_.
+that trains using `Backpropagation <http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm>`_.
 
-MLP trains on two arrays: array X of size (n_samples, n_features), which holds
-the training samples represented as floating point feature vectors; and array
-y of size (n_samples,), which holds the target values (class labels) for the
+MLP trains on two arrays: array X of size (n_samples, n_features), which holds
+the training samples represented as floating point feature vectors; and array
+y of size (n_samples,), which holds the target values (class labels) for the
 training samples::
 
 >>> from sklearn.neural_network import MLPClassifier
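
The hunk above quotes the description of ``coefs_``, ``intercepts_`` and on-line learning via ``partial_fit``. A minimal sketch of how those pieces fit together (not part of this commit; the layer sizes and ``random_state`` are chosen arbitrarily)::

    from sklearn.neural_network import MLPClassifier

    X = [[0., 0.], [1., 1.]]   # two samples, two features
    y = [0, 1]                 # class labels

    clf = MLPClassifier(hidden_layer_sizes=(5, 2), random_state=1)
    clf.fit(X, y)

    # coefs_[i] holds the weights between layer i and layer i+1;
    # intercepts_[i] holds the biases added to layer i+1.
    print([coef.shape for coef in clf.coefs_])   # expected: [(2, 5), (5, 2), (2, 1)]
    print([b.shape for b in clf.intercepts_])    # expected: [(5,), (2,), (1,)]

    # On-line learning: feed mini-batches through partial_fit
    # (the full list of classes must be given on the first call).
    clf_online = MLPClassifier(hidden_layer_sizes=(5,), random_state=1)
    clf_online.partial_fit(X, y, classes=[0, 1])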
@@ -91,10 +91,10 @@ training samples::
 
 After fitting (training), the model can predict labels for new samples::
 
->>> clf.predict([[2., 2.], [-1., -2.]])
+>>> clf.predict([[2., 2.], [-1., -2.]])
 array([1, 0])
 
-MLP can fit a non-linear model to the training data. ``clf.coefs_``
+MLP can fit a non-linear model to the training data. ``clf.coefs_``
 contains the weight matrices that constitute the model parameters::
 
 >>> [coef.shape for coef in clf.coefs_]
@@ -108,27 +108,27 @@ use :meth:`MLPClassifier.decision_function`::
 >>> clf.decision_function([[2., 2.], [1., 2.]]) # doctest: +ELLIPSIS
 array([ 47.6..., 47.6...])
 
-Currently, :class:`MLPClassifier` supports only the
-Cross-Entropy loss function, which allows probability estimates by running the
+Currently, :class:`MLPClassifier` supports only the
+Cross-Entropy loss function, which allows probability estimates by running the
 ``predict_proba`` method.
 
 MLP trains using Backpropagation. More precisely, it trains using some form of
 gradient descent and the gradients are calculated using Backpropagation. For
 classification, it minimizes the Cross-Entropy loss function, giving a vector
-of probability estimates :math:`P(y|x)` per sample :math:`x`::
+of probability estimates :math:`P(y|x)` per sample :math:`x`::
 
 >>> clf.predict_proba([[2., 2.], [1., 2.]]) # doctest: +ELLIPSIS
 array([[ 0., 1.],
 [ 0., 1.]])
 
-:class:`MLPClassifier` supports multi-class classification by
+:class:`MLPClassifier` supports multi-class classification by
 applying `Softmax <http://en.wikipedia.org/wiki/Softmax_activation_function>`_
-as the output function.
+as the output function.
 
-Further, the algorithm supports :ref:`multi-label classification <multiclass>`
-in which a sample can belong to more than one class. For each class, the output
-of :meth:`MLPClassifier.decision_function` passes through the
-logistic function. Values larger or equal to `0.5` are rounded to `1`,
+Further, the algorithm supports :ref:`multi-label classification <multiclass>`
+in which a sample can belong to more than one class. For each class, the output
+of :meth:`MLPClassifier.decision_function` passes through the
+logistic function. Values larger or equal to `0.5` are rounded to `1`,
 otherwise to `0`. For a predicted output of a sample, the indices where the
 value is `1` represents the assigned classes of that sample::
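
The multi-label behaviour described in the hunk above can be exercised by passing a binary indicator matrix as the target; a small sketch (illustrative only, values arbitrary)::

    from sklearn.neural_network import MLPClassifier

    X = [[0., 0.], [1., 1.]]
    Y = [[0, 1], [1, 1]]       # sample 1 -> class 1; sample 2 -> classes 0 and 1

    clf = MLPClassifier(hidden_layer_sizes=(15,), random_state=1)
    clf.fit(X, Y)

    # Each output unit passes through the logistic function;
    # values >= 0.5 are rounded to 1, otherwise to 0.
    print(clf.predict([[1., 2.]]))   # e.g. array([[1, 1]])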

@@ -148,7 +148,7 @@ value is `1` represents the assigned classes of that sample::
 >>> clf.predict([0., 0.])
 array([[0, 1]])
 
-See the examples below and the doc string of
+See the examples below and the doc string of
 :meth:`MLPClassifier.fit` for further information.
 
 .. topic:: Examples:
@@ -165,7 +165,7 @@ which can also be seen as using the identity function as activation function.
 Therefore, it uses the square error as the loss function, and the output is a
 set of continuous values.
 
-:class:`MLPRegressor` also supports multi-output regression, in
+:class:`MLPRegressor` also supports multi-output regression, in
 which a sample can have more than one target.
 
 
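For the multi-output regression mentioned in the hunk above, the target array simply has one column per output; a minimal sketch (not part of this commit, values arbitrary)::

    from sklearn.neural_network import MLPRegressor

    X = [[0., 0.], [1., 1.], [2., 2.]]
    Y = [[0., 1.], [1., 2.], [2., 3.]]   # two real-valued targets per sample

    reg = MLPRegressor(hidden_layer_sizes=(10,), random_state=1)
    reg.fit(X, Y)
    print(reg.predict([[1.5, 1.5]]).shape)   # (1, 2): one prediction per target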

@@ -175,61 +175,61 @@ Algorithms
 MLP trains using `Stochastic Gradient Descent
 <http://en.wikipedia.org/wiki/Stochastic_gradient_descent>`_,
 `Adam <http://arxiv.org/abs/1412.6980>`_, or
-`L-BFGS <http://en.wikipedia.org/wiki/Limited-memory_BFGS>`_.
+`L-BFGS <http://en.wikipedia.org/wiki/Limited-memory_BFGS>`_.
 Stochastic Gradient Descent (SGD) updates parameters using the gradient of the
 loss function with respect to a parameter that needs adaptation, i.e.
 
 .. math::
 
 w \leftarrow w - \eta (\alpha \frac{\partial R(w)}{\partial w}
 + \frac{\partial Loss}{\partial w})
-
+
 where :math:`\eta` is the learning rate which controls the step-size in
 the parameter space search. :math:`Loss` is the loss function used
 for the network.
 
-More details can be found in the documentation of
-`SGD <http://scikit-learn.org/stable/modules/sgd.html>`_
+More details can be found in the documentation of
+`SGD <http://scikit-learn.org/stable/modules/sgd.html>`_
 
 Adam is similar to SGD in a sense that it is a stochastic optimization
 algorithm, but it can automatically adjust the amount to update parameters
 based on adaptive estimates of lower-order moments.
 
 With SGD or Adam, training supports online and mini-batch learning.
 
-L-BFGS is a fast learning algorithm that approximates the Hessian matrix which
+L-BFGS is a fast learning algorithm that approximates the Hessian matrix which
 represents the second-order partial derivative of a function. Further it
 approximates the inverse of the Hessian matrix to perform parameter updates.
-The implementation uses the Scipy version of
+The implementation uses the Scipy version of
 `L-BFGS <http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html>`_..
 
-If the selected algorithm is 'L-BFGS', training does not support online nor
+If the selected algorithm is 'L-BFGS', training does not support online nor
 mini-batch learning.
 
 
 Complexity
 ==========
 
-Suppose there are :math:`n` training samples, :math:`m` features, :math:`k`
+Suppose there are :math:`n` training samples, :math:`m` features, :math:`k`
 hidden layers, each containing :math:`h` neurons - for simplicity, and :math:`o`
-output neurons. The time complexity of backpropagation is
-:math:`O(n\cdot m \cdot h^k \cdot o \cdot i)`, where :math:`i` is the number
+output neurons. The time complexity of backpropagation is
+:math:`O(n\cdot m \cdot h^k \cdot o \cdot i)`, where :math:`i` is the number
 of iterations. Since backpropagation has a high time complexity, it is advisable
-to start with smaller number of hidden neurons and few hidden layers for
+to start with smaller number of hidden neurons and few hidden layers for
 training.
 
 
 Mathematical formulation
 ========================
 
-Given a set of training examples :math:`(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)`
-where :math:`x_i \in \mathbf{R}^n` and :math:`y_i \in \{0, 1\}`, a one hidden
-layer one hidden neuron MLP learns the function :math:`f(x) = W_2 g(W_1^T x + b_1) + b_2`
-where :math:`W_1 \in \mathbf{R}^m` and :math:`W_2, b_1, b_2 \in \mathbf{R}` are
-model parameters. :math:`W_1, W_2` represent the weights of the input layer and
+Given a set of training examples :math:`(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)`
+where :math:`x_i \in \mathbf{R}^n` and :math:`y_i \in \{0, 1\}`, a one hidden
+layer one hidden neuron MLP learns the function :math:`f(x) = W_2 g(W_1^T x + b_1) + b_2`
+where :math:`W_1 \in \mathbf{R}^m` and :math:`W_2, b_1, b_2 \in \mathbf{R}` are
+model parameters. :math:`W_1, W_2` represent the weights of the input layer and
 hidden layer, resepctively; and :math:`b_1, b_2` represent the bias added to
-the hidden layer and the output layer, respectively.
-:math:`g(\cdot) : R \rightarrow R` is the activation function, set by default as
+the hidden layer and the output layer, respectively.
+:math:`g(\cdot) : R \rightarrow R` is the activation function, set by default as
 the hyperbolic tan. It is given as,
 
 .. math::
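
The hunk above quotes the formulation :math:`f(x) = W_2 g(W_1^T x + b_1) + b_2` with one hidden neuron and :math:`g = \tanh`, which can be written out directly; a NumPy sketch (illustrative only, random values, not scikit-learn code)::

    import numpy as np

    rng = np.random.RandomState(0)
    m = 3                    # number of input features
    W1 = rng.randn(m)        # weights of the input layer, W_1 in R^m
    b1 = rng.randn()         # bias added to the hidden neuron
    W2 = rng.randn()         # weight of the hidden layer
    b2 = rng.randn()         # bias added to the output layer

    def f(x):
        return W2 * np.tanh(np.dot(W1, x) + b1) + b2

    print(f(rng.randn(m)))   # a single real-valued output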
@@ -255,15 +255,15 @@ belong to each class. The output is the class with the highest probability.
 In regression, the output remains as :math:`f(x)`; therefore, output activation
 function is just the identity function.
 
-MLP uses different loss functions depending on the problem type. The loss
+MLP uses different loss functions depending on the problem type. The loss
 function for classification is Cross-Entropy, which in binary case is given as,
 
 .. math::
 
 Loss(\hat{y},y,W) = -y \ln {\hat{y}} - (1-y) \ln{(1-\hat{y})} + \alpha ||W||_2^2
 
 where :math:`\alpha ||W||_2^2` is an L2-regularization term (aka penalty)
-that penalizes complex models; and :math:`\alpha > 0` is a non-negative
+that penalizes complex models; and :math:`\alpha > 0` is a non-negative
 hyperparameter that controls the magnitude of the penalty.
 
 For regression, MLP uses the Square Error loss function; written as,
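
The binary Cross-Entropy loss with L2 penalty quoted above can be evaluated directly; a NumPy sketch (``alpha`` and the sample values are arbitrary)::

    import numpy as np

    def cross_entropy_l2(y_hat, y, W, alpha=1e-4):
        data_loss = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)
        return data_loss + alpha * np.sum(W ** 2)   # + alpha * ||W||_2^2

    W = np.array([0.5, -0.3, 0.8])
    print(cross_entropy_l2(0.9, 1, W))   # small loss: confident and correct
    print(cross_entropy_l2(0.9, 0, W))   # large loss: confident but wrong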
@@ -273,7 +273,7 @@ For regression, MLP uses the Square Error loss function; written as,
 Loss(\hat{y},y,W) = \frac{1}{2}||\hat{y} - y ||_2^2 + \alpha ||W||_2^2
 
 
-Starting from initial random weights, multi-layer perceptron (MLP) minimizes
+Starting from initial random weights, multi-layer perceptron (MLP) minimizes
 the loss function by repeatedly updating these weights. After computing the
 loss, a backward pass propagates it from the output layer to the previous
 layers, providing each weight parameter with an update value meant to decrease
@@ -287,8 +287,8 @@ More formally, this is expressed as,
 W^{i+1} = W^i - \epsilon \nabla {Loss}_{W}^{i}
 
 
-where :math:`i` is the iteration step, and :math:`\epsilon` is the learning rate
-with a value larger than 0.
+where :math:`i` is the iteration step, and :math:`\epsilon` is the learning rate
+with a value larger than 0.
 
 The algorithm stops when it reaches a preset maximum number of iterations; or
 when the improvement in loss is below a certain, small number.
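
The update rule :math:`W^{i+1} = W^i - \epsilon \nabla {Loss}_{W}^{i}` quoted above amounts to one gradient-descent step; a NumPy sketch with a placeholder gradient (not one computed by backpropagation)::

    import numpy as np

    epsilon = 0.01                      # learning rate, > 0
    W = np.array([0.2, -0.1, 0.4])      # current weights
    grad = np.array([0.5, -0.3, 0.1])   # hypothetical gradient of the loss w.r.t. W

    W = W - epsilon * grad              # one iteration of the update
    print(W)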
@@ -303,8 +303,8 @@ Tips on Practical Use
 * Multi-layer Perceptron is sensitive to feature scaling, so it
 is highly recommended to scale your data. For example, scale each
 attribute on the input vector X to [0, 1] or [-1, +1], or standardize
-it to have mean 0 and variance 1. Note that you must apply the *same*
-scaling to the test set for meaningful results.
+it to have mean 0 and variance 1. Note that you must apply the *same*
+scaling to the test set for meaningful results.
 You can use :class:`StandardScaler` for standardization.
 
 >>> from sklearn.preprocessing import StandardScaler # doctest: +SKIP
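
The scaling advice quoted in the hunk above boils down to fitting the scaler on the training data only and reusing it on the test data; a short sketch (values arbitrary)::

    from sklearn.preprocessing import StandardScaler

    X_train = [[0., 10.], [1., 12.], [2., 14.]]
    X_test = [[1.5, 11.]]

    scaler = StandardScaler().fit(X_train)     # learn mean and variance on the training set
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)   # the *same* scaling applied to the test set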
@@ -350,14 +350,14 @@ or want to do additional monitoring, using ``warm_start=True`` and
 Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams.
 
 * `"Stochastic Gradient Descent" <http://leon.bottou.org/projects/sgd>`_ L. Bottou - Website, 2010.
-
+
 * `"Backpropagation" <http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm>`_
 Andrew Ng, Jiquan Ngiam, Chuan Yu Foo, Yifan Mai, Caroline Suen - Website, 2011.
 
-* `"Efficient BackProp" <yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf>`_
+* `"Efficient BackProp" <http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf>`_
 Y. LeCun, L. Bottou, G. Orr, K. Müller - In Neural Networks: Tricks
 of the Trade 1998.
-
+
 * `"Adam: A method for stochastic optimization."
 <http://arxiv.org/pdf/1412.6980v8.pdf>`_
 Kingma, Diederik, and Jimmy Ba. arXiv preprint arXiv:1412.6980 (2014).

dev/_sources/modules/sgd.txt

Lines changed: 1 addition & 1 deletion
@@ -269,7 +269,7 @@ Tips on Practical Use
 
 .. topic:: References:
 
-* `"Efficient BackProp" <yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf>`_
+* `"Efficient BackProp" <http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf>`_
 Y. LeCun, L. Bottou, G. Orr, K. Müller - In Neural Networks: Tricks
 of the Trade 1998.
 

dev/modules/classes.html

Lines changed: 1 addition & 1 deletion
@@ -1889,7 +1889,7 @@ <h3>Multiclass and multilabel classification strategies<a class="headerlink" hre
 <span id="sklearn-neural-network-neural-network-models"></span><span id="neural-network-ref"></span><h2><a class="reference internal" href="#module-sklearn.neural_network" title="sklearn.neural_network"><tt class="xref py py-mod docutils literal"><span class="pre">sklearn.neural_network</span></tt></a>: Neural network models<a class="headerlink" href="#module-sklearn.neural_network" title="Permalink to this headline"></a></h2>
 <p>The <a class="reference internal" href="#module-sklearn.neural_network" title="sklearn.neural_network"><tt class="xref py py-mod docutils literal"><span class="pre">sklearn.neural_network</span></tt></a> module includes models based on neural
 networks.</p>
-<p><strong>User guide:</strong> See the <a class="reference internal" href="neural_networks_unsupervised.html#neural-network"><em>Neural network models (unsupervised)</em></a> section for further details.</p>
+<p><strong>User guide:</strong> See the <a class="reference internal" href="neural_networks_supervised.html#neural-network"><em>Neural network models (supervised)</em></a> section for further details.</p>
 <table border="1" class="longtable docutils">
 <colgroup>
 <col width="10%" />

dev/modules/generated/sklearn.base.BaseEstimator.html

Lines changed: 3 additions & 3 deletions
@@ -172,7 +172,7 @@
 <h1><a class="reference internal" href="../classes.html#module-sklearn.base" title="sklearn.base"><tt class="xref py py-mod docutils literal"><span class="pre">sklearn.base</span></tt></a>.BaseEstimator<a class="headerlink" href="#sklearn-base-baseestimator" title="Permalink to this headline"></a></h1>
 <dl class="class">
 <dt id="sklearn.base.BaseEstimator">
-<em class="property">class </em><tt class="descclassname">sklearn.base.</tt><tt class="descname">BaseEstimator</tt><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/44c8519/sklearn/base.py#L169"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator" title="Permalink to this definition"></a></dt>
+<em class="property">class </em><tt class="descclassname">sklearn.base.</tt><tt class="descname">BaseEstimator</tt><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/ec7d1db/sklearn/base.py#L169"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator" title="Permalink to this definition"></a></dt>
 <dd><p>Base class for all estimators in scikit-learn</p>
 <p class="rubric">Notes</p>
 <p>All estimators should specify all the parameters that can be set
@@ -201,7 +201,7 @@ <h1><a class="reference internal" href="../classes.html#module-sklearn.base" tit
 
 <dl class="method">
 <dt id="sklearn.base.BaseEstimator.get_params">
-<tt class="descname">get_params</tt><big>(</big><em>deep=True</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/44c8519/sklearn/base.py#L206"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator.get_params" title="Permalink to this definition"></a></dt>
+<tt class="descname">get_params</tt><big>(</big><em>deep=True</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/ec7d1db/sklearn/base.py#L206"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator.get_params" title="Permalink to this definition"></a></dt>
 <dd><p>Get parameters for this estimator.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
@@ -226,7 +226,7 @@ <h1><a class="reference internal" href="../classes.html#module-sklearn.base" tit
 
 <dl class="method">
 <dt id="sklearn.base.BaseEstimator.set_params">
-<tt class="descname">set_params</tt><big>(</big><em>**params</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/44c8519/sklearn/base.py#L243"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator.set_params" title="Permalink to this definition"></a></dt>
+<tt class="descname">set_params</tt><big>(</big><em>**params</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/ec7d1db/sklearn/base.py#L243"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator.set_params" title="Permalink to this definition"></a></dt>
 <dd><p>Set the parameters of this estimator.</p>
 <p>The method works on simple estimators as well as on nested objects
 (such as pipelines). The former have parameters of the form
