Commit 8e63973

SLEP007 update -- make explicit that verbose_feature_names isn't required
2 parents 8445d6c + 6b556cf commit 8e63973

1 file changed (+29, -35 lines)

slep007/proposal.rst

@@ -111,19 +111,6 @@ original features:
 This proposal talks about how feature names are generated and not how they are
 propagated.

-verbose_feature_names
-*********************
-
-``verbose_feature_names`` controls the verbosity of the generated feature names
-and it can be ``True`` or ``False``. Alternative solutions could include:
-
-- an integer: fine tuning the verbosity of the generated feature names.
-- a ``callable`` which would give further flexibility to the user to generate
-  user defined feature names.
-
-These alternatives may be discussed and implemented in the future if deemed
-necessary.
-
 Scope
 #####

@@ -152,9 +139,7 @@ A fitted estimator exposes the output feature names through the
 feature names are generated. Since for most estimators there are multiple ways
 to generate feature names, this SLEP does not intend to define how exactly
 feature names are generated for all of them. It is instead a guideline on how
-they could generally be generated. Furthermore, that specific behavior of a
-given estimator may be tuned via the ``verbose_feature_names`` parameter, as
-detailed below.
+they could generally be generated.

 As detailed below, some generated output feature names are the same as or
 derived from the input feature names. In such cases, if no input feature names
@@ -172,17 +157,12 @@ Feature Generating Transformers
 *******************************

 The simplest category of transformers in this section consists of those which
-generate a column based on a single given column. The generated output column
-in this case is a sensible transformation of the input feature name. For
-instance, a ``LogTransformer`` can do ``'age' -> 'log(age)'``, and a
-``OneHotEncoder`` could do ``'gender' -> 'gender_female', 'gender_fluid',
-...``. An alternative is to leave the feature names unchanged when each output
-feature corresponds to exactly one input feature. Whether or not to modify the
-feature name, *e.g.* ``log(x0)`` vs. ``x0`` may be controlled via the
-``verbose_feature_names`` to the constructor. The default value of
-``verbose_feature_names`` can be different depending on the transformer. For
-instance, ``StandardScaler`` can have it as ``False``, whereas
-``LogTransformer`` could have it as ``True`` by default.
+generate a column based on a single given column. These would simply
+preserve the input feature name if a single new feature is generated,
+as in ``StandardScaler``, which would map ``'age'`` to ``'age'``.
+If an input feature maps to multiple new features, a suffix is added,
+so that ``OneHotEncoder`` might map ``'gender'`` to ``'gender_female'``,
+``'gender_fluid'``, etc.

 Transformers where each output feature depends on a fixed number of input
 features may generate descriptive names as well. For instance, a
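The naming rule added in this hunk (keep the input name when one input column yields one output column; append a suffix when one input column yields several) can be sketched in plain Python. The helpers ``one_to_one_names`` and ``one_hot_names`` are hypothetical and are not scikit-learn API; the underscore separator is taken from the ``'gender_female'`` example.

```python
from typing import Dict, List, Sequence


def one_to_one_names(input_features: Sequence[str]) -> List[str]:
    """Hypothetical helper: each input column yields exactly one output
    column (as in a scaler), so the input names are kept unchanged."""
    return list(input_features)


def one_hot_names(
    input_features: Sequence[str], categories: Dict[str, Sequence[str]]
) -> List[str]:
    """Hypothetical helper: each input column yields one output column per
    category, so each category is appended to the input name as a suffix."""
    names: List[str] = []
    for feature in input_features:
        for category in categories[feature]:
            names.append(f"{feature}_{category}")
    return names


print(one_to_one_names(["age"]))  # ['age']
print(one_hot_names(["gender"], {"gender": ["female", "fluid"]}))
# ['gender_female', 'gender_fluid']
```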
@@ -210,11 +190,6 @@ indicating the name of the transformer applied to them. If a column is in the ou
 as a part of ``passthrough``, it won't be prefixed since no operation has been
 applied on it.

-This is the default behavior, and it can be tuned by constructor parameters if
-the meta estimator allows it. For instance, a ``verbose_feature_names=False``
-may indicate that a ``ColumnTransformer`` should not prefix the generated
-feature names with the name of the step.
-
 Examples
 ########

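The prefixing rule in the context lines of this hunk (generated names are prefixed with the name of the transformer step, ``passthrough`` columns are not) can be sketched as below. The ``prefixed_feature_names`` helper, the simplified ``(step_name, transformer, output_names)`` tuple, the example step names, and the underscore separator (taken from names such as ``'cat_make_ABC'`` in the examples) are illustrative assumptions, not scikit-learn's implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical, simplified step tuple: (step_name, transformer, output_names).
# The string "passthrough" in the transformer slot means the columns are
# forwarded without any operation; other transformers are plain labels here.
Step = Tuple[str, str, Sequence[str]]


def prefixed_feature_names(steps: Sequence[Step]) -> List[str]:
    """Hypothetical helper: prefix each generated name with the name of the
    step that produced it, leaving passthrough columns unprefixed."""
    names: List[str] = []
    for step_name, transformer, output_names in steps:
        for name in output_names:
            if transformer == "passthrough":
                # No operation was applied, so the input name is kept as-is.
                names.append(name)
            else:
                names.append(f"{step_name}_{name}")
    return names


steps = [
    ("cat", "OneHotEncoder", ["make_ABC", "make_XYZ"]),
    ("num", "PCA", ["pca0", "pca1", "pca2"]),
    ("keep", "passthrough", ["model"]),
]
print(prefixed_feature_names(steps))
# ['cat_make_ABC', 'cat_make_XYZ', 'num_pca0', 'num_pca1', 'num_pca2', 'model']
```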
@@ -255,8 +230,7 @@ names::
         'cat_make_ABC', 'cat_make_XYZ', ...,
         'num_pca0', 'num_pca1', 'num_pca2']

-However, the following examples produce a somewhat redundant feature names,
-and hence the relevance of ``verbose_feature_names=False``::
+However, the following examples produce somewhat redundant feature names::

     [model, make, numeric0, ..., numeric100] ->
         ColumnTransformer([
@@ -267,7 +241,18 @@ and hence the relevance of ``verbose_feature_names=False``::
         'ohe_make_ABC', 'ohe_make_XYZ', ...,
         'pca_pca0', 'pca_pca1', 'pca_pca2']

-If desired, the user can remove the prefixes::
+Extensions
+##########
+
+verbose_feature_names
+*********************
+To provide more control over feature names, we could add a boolean
+``verbose_feature_names`` constructor argument to certain transformers.
+The default would reflect the description above, but changing it would allow
+more verbose names in some transformers, say having ``StandardScaler`` map ``'age'`` to ``'scale(age)'``.
+
+In the case of the ``ColumnTransformer`` example above, ``verbose_feature_names=False``
+could remove the estimator names, leading to shorter and less redundant names::

     [model, make, numeric0, ..., numeric100] ->
         make_column_transformer(
@@ -279,6 +264,15 @@ If desired, the user can remove the prefixes::
         'make_ABC', 'make_XYZ', ...,
         'pca0', 'pca1', 'pca2']

+Alternative solutions to a boolean flag could include:
+
+- an integer: fine-tuning the verbosity of the generated feature names.
+- a ``callable`` which would give further flexibility to the user to generate
+  user-defined feature names.
+
+These alternatives may be discussed and implemented in the future if deemed
+necessary.
+
 Backward Compatibility
 ######################

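The ``verbose_feature_names`` extension and the ``callable`` alternative listed in this hunk can be sketched together in plain Python. Everything here is hypothetical: the function names, the per-transformer defaults (``False`` for the scaler and ``True`` for the column transformer, so that the defaults reproduce the naming described earlier in the proposal), the ``scale(...)`` wrapping from the ``StandardScaler`` example, the underscore separator, and the ``(step_name, feature_name)`` signature of the callable are illustrative assumptions, not scikit-learn API.

```python
from typing import Callable, List, Sequence, Tuple, Union

# Hypothetical option type: either the proposed boolean flag or the callable
# alternative; the callable is assumed to receive (step_name, feature_name).
NameOption = Union[bool, Callable[[str, str], str]]


def scaler_feature_names(input_features: Sequence[str],
                         verbose_feature_names: bool = False) -> List[str]:
    """Hypothetical helper: the default keeps names unchanged; verbose mode
    wraps each name, e.g. 'age' -> 'scale(age)'."""
    if verbose_feature_names:
        return [f"scale({name})" for name in input_features]
    return list(input_features)


def column_transformer_feature_names(
        steps: Sequence[Tuple[str, Sequence[str]]],
        verbose_feature_names: NameOption = True) -> List[str]:
    """Hypothetical helper: the default prefixes each name with its step name,
    False drops the prefix, and a callable builds each name itself."""
    names: List[str] = []
    for step_name, output_names in steps:
        for name in output_names:
            if callable(verbose_feature_names):
                names.append(verbose_feature_names(step_name, name))
            elif verbose_feature_names:
                names.append(f"{step_name}_{name}")
            else:
                names.append(name)
    return names


steps = [("ohe", ["make_ABC", "make_XYZ"]), ("pca", ["pca0", "pca1", "pca2"])]
print(scaler_feature_names(["age"]))                              # ['age']
print(scaler_feature_names(["age"], verbose_feature_names=True))  # ['scale(age)']
print(column_transformer_feature_names(steps))
# ['ohe_make_ABC', 'ohe_make_XYZ', 'pca_pca0', 'pca_pca1', 'pca_pca2']
print(column_transformer_feature_names(steps, verbose_feature_names=False))
# ['make_ABC', 'make_XYZ', 'pca0', 'pca1', 'pca2']
print(column_transformer_feature_names(steps, lambda step, name: f"{step}:{name}"))
# ['ohe:make_ABC', 'ohe:make_XYZ', 'pca:pca0', 'pca:pca1', 'pca:pca2']
```

Accepting either a ``bool`` or a ``callable`` in one parameter keeps the default case simple while leaving room for user-defined names; the integer alternative would additionally need a definition of what each verbosity level means, which the proposal leaves open.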