@@ -111,19 +111,6 @@ original features:
This proposal talks about how feature names are generated and not how they are
propagated.

- verbose_feature_names
- *********************
-
- ``verbose_feature_names`` controls the verbosity of the generated feature names
- and it can be ``True`` or ``False``. Alternative solutions could include:
-
- - an integer: fine tuning the verbosity of the generated feature names.
- - a ``callable`` which would give further flexibility to the user to generate
- user defined feature names.
-
- These alternatives may be discussed and implemented in the future if deemed
- necessary.
-
Scope
#####
@@ -152,9 +139,7 @@ A fitted estimator exposes the output feature names through the
feature names are generated. Since for most estimators there are multiple ways
to generate feature names, this SLEP does not intend to define how exactly
feature names are generated for all of them. It is instead a guideline on how
- they could generally be generated. Furthermore, that specific behavior of a
- given estimator may be tuned via the ``verbose_feature_names`` parameter, as
- detailed below.
+ they could generally be generated.

As detailed below, some generated output feature names are the same as or
derived from the input feature names. In such cases, if no input feature names
@@ -172,17 +157,12 @@ Feature Generating Transformers
*******************************

The simplest category of transformers in this section consists of those which
- generate a column based on a single given column. The generated output column
- in this case is a sensible transformation of the input feature name. For
- instance, a ``LogTransformer`` can do ``'age' -> 'log(age)'``, and a
- ``OneHotEncoder`` could do ``'gender' -> 'gender_female', 'gender_fluid',
- ...``. An alternative is to leave the feature names unchanged when each output
- feature corresponds to exactly one input feature. Whether or not to modify the
- feature name, *e.g.* ``log(x0)`` vs. ``x0`` may be controlled via the
- ``verbose_feature_names`` to the constructor. The default value of
- ``verbose_feature_names`` can be different depending on the transformer. For
- instance, ``StandardScaler`` can have it as ``False``, whereas
- ``LogTransformer`` could have it as ``True`` by default.
+ generate a column based on a single given column. These simply preserve the
+ input feature name when a single output feature is generated from it, as in
+ ``StandardScaler``, which would map ``'age'`` to ``'age'``. If an input
+ feature maps to multiple output features, a suffix is added, so that
+ ``OneHotEncoder`` might map ``'gender'`` to ``'gender_female'``,
+ ``'gender_fluid'``, etc.
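To make the default rule above concrete, here is a minimal plain-Python sketch of it. The helpers ``one_to_one_names`` and ``one_to_many_names`` are hypothetical and not scikit-learn API; the categories passed to the second helper stand in for whatever an encoder would learn during ``fit``.

```python
from typing import List, Sequence


def one_to_one_names(input_features: Sequence[str]) -> List[str]:
    """One output column per input column (e.g. StandardScaler): keep names."""
    return list(input_features)


def one_to_many_names(
    input_features: Sequence[str], categories: Sequence[Sequence[str]]
) -> List[str]:
    """Several output columns per input column (e.g. OneHotEncoder):
    append one suffix per generated column."""
    if len(input_features) != len(categories):
        raise ValueError("need one list of categories per input feature")
    names: List[str] = []
    for feature, cats in zip(input_features, categories):
        names.extend(f"{feature}_{cat}" for cat in cats)
    return names


print(one_to_one_names(["age"]))  # ['age']
print(one_to_many_names(["gender"], [["female", "fluid"]]))
# ['gender_female', 'gender_fluid']
```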

Transformers where each output feature depends on a fixed number of input
features may generate descriptive names as well. For instance, a
@@ -210,11 +190,6 @@ indicating the name of the transformer applied to them. If a column is in the ou
as part of ``passthrough``, it won't be prefixed since no operation has been
applied to it.
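The prefixing rule for meta-estimators can be sketched the same way. For illustration only, this assumes each step reports its own output names and whether it is a ``passthrough``; the ``Step`` class, the ``column_transformer_names`` helper and the ``year`` column are hypothetical, while the ``ohe``/``pca`` step names mirror the examples in this SLEP.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Step:
    name: str                    # step name given to the meta-estimator
    output_names: Sequence[str]  # names generated by the step's transformer
    passthrough: bool = False    # True if columns are forwarded unchanged


def column_transformer_names(steps: Sequence[Step]) -> List[str]:
    names: List[str] = []
    for step in steps:
        if step.passthrough:
            # No operation was applied, so the names are not prefixed.
            names.extend(step.output_names)
        else:
            # Prefix with the step name, e.g. 'pca' + 'pca0' -> 'pca_pca0'.
            names.extend(f"{step.name}_{n}" for n in step.output_names)
    return names


steps = [
    Step("ohe", ["make_ABC", "make_XYZ"]),
    Step("pca", ["pca0", "pca1", "pca2"]),
    Step("remainder", ["year"], passthrough=True),  # hypothetical column
]
print(column_transformer_names(steps))
# ['ohe_make_ABC', 'ohe_make_XYZ', 'pca_pca0', 'pca_pca1', 'pca_pca2', 'year']
```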

- This is the default behavior, and it can be tuned by constructor parameters if
- the meta estimator allows it. For instance, a ``verbose_feature_names=False``
- may indicate that a ``ColumnTransformer`` should not prefix the generated
- feature names with the name of the step.
-
Examples
########
@@ -255,8 +230,7 @@ names::
'cat_make_ABC', 'cat_make_XYZ', ...,
'num_pca0', 'num_pca1', 'num_pca2']

- However, the following examples produce a somewhat redundant feature names,
- and hence the relevance of ``verbose_feature_names=False``::
+ However, the following example produces somewhat redundant feature names::

[model, make, numeric0, ..., numeric100] ->
ColumnTransformer([
@@ -267,7 +241,18 @@ and hence the relevance of ``verbose_feature_names=False``::
'ohe_make_ABC', 'ohe_make_XYZ', ...,
'pca_pca0', 'pca_pca1', 'pca_pca2']

- If desired, the user can remove the prefixes::
+ Extensions
+ ##########
+
+ verbose_feature_names
+ *********************
+
+ To provide more control over feature names, we could add a boolean
+ ``verbose_feature_names`` constructor argument to certain transformers.
+ The default would reproduce the naming described above, and changing it
+ would adjust the verbosity of the names: for instance, ``StandardScaler``
+ could then map ``'age'`` to ``'scale(age)'``.
+
+ In the case of the ``ColumnTransformer`` example above, setting
+ ``verbose_feature_names=False`` could drop the transformer-name prefixes,
+ leading to shorter and less redundant names::

[model, make, numeric0, ..., numeric100] ->
make_column_transformer(
@@ -279,6 +264,15 @@ If desired, the user can remove the prefixes::
'make_ABC', 'make_XYZ', ...,
'pca0', 'pca1', 'pca2']

+ Alternative solutions to a boolean flag could include:
+
+ - an integer: fine-tuning the verbosity of the generated feature names.
+ - a ``callable`` which would give the user further flexibility to generate
+ user-defined feature names.
+
+ These alternatives may be discussed and implemented in the future if deemed
+ necessary.
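A minimal sketch of how the proposed ``verbose_feature_names`` flag, and the callable alternative listed above, could affect the ``ColumnTransformer`` prefixes. The ``prefixed_names`` function and its ``name_fn`` argument are hypothetical, and the callable's ``(step_name, feature_name)`` signature is an assumption, since the proposal leaves it open.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical callable signature: (step_name, feature_name) -> output name.
NameFn = Callable[[str, str], str]


def prefixed_names(
    steps: Sequence[Tuple[str, Sequence[str]]],
    verbose_feature_names: bool = True,
    name_fn: Optional[NameFn] = None,
) -> List[str]:
    names: List[str] = []
    for step_name, output_names in steps:
        for feature in output_names:
            if name_fn is not None:
                # Callable alternative: the user builds each name.
                names.append(name_fn(step_name, feature))
            elif verbose_feature_names:
                # Default: prefix with the step name.
                names.append(f"{step_name}_{feature}")
            else:
                # verbose_feature_names=False: keep the step's own names.
                names.append(feature)
    if len(set(names)) != len(names):
        # Dropping prefixes can make names from different steps collide.
        raise ValueError("generated feature names are not unique")
    return names


steps = [("ohe", ["make_ABC", "make_XYZ"]), ("pca", ["pca0", "pca1"])]
print(prefixed_names(steps))
# ['ohe_make_ABC', 'ohe_make_XYZ', 'pca_pca0', 'pca_pca1']
print(prefixed_names(steps, verbose_feature_names=False))
# ['make_ABC', 'make_XYZ', 'pca0', 'pca1']
print(prefixed_names(steps, name_fn=lambda step, f: f"{step}__{f}"))
# ['ohe__make_ABC', 'ohe__make_XYZ', 'pca__pca0', 'pca__pca1']
```

The uniqueness check is a design choice made for this sketch only: once the prefixes are gone, two steps that emit the same name would otherwise yield ambiguous output columns.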
+
Backward Compatibility
######################