.. _slep_012:
- ==========
- InputArray
- ==========
+ =======================
+ SLEP012: ``InputArray``
+ =======================
:Author: Adrin Jalali
:Status: Draft
:Type: Standards Track
:Created: 2019-12-20
Motivation
- **********
+ ##########
This proposal provides a solution for propagating feature names through
transformers, pipelines, and the column transformer. Ideally, we would have::
@@ -39,7 +39,7 @@ transformer, would not break. This SLEP focuses on *feature names* as the only
meta-data attached to the data. Support for other meta-data can be added later.
Backward/NumPy/Pandas Compatibility
- ***********************************
+ ###################################
Since transformers currently return a ``numpy`` or a ``scipy`` array, backward
compatibility in this context means the operations which are valid on those
@@ -59,13 +59,13 @@ which ``pandas`` does not provide a clean API at the moment. Alternatively,
relevant meta-data attached.
Feature Names
- *************
+ #############
Feature names are an object ``ndarray`` of strings aligned with the columns.
They can be ``None``.
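As a minimal sketch of what "an object ``ndarray`` of strings aligned with the columns, or ``None``" could mean in practice, the hypothetical helper ``as_feature_names`` below (not part of the proposed API) normalizes the names and checks that there is exactly one string per column.

```python
import numpy as np


def as_feature_names(feature_names, n_columns):
    """Hypothetical helper: normalize feature names to an object ndarray.

    Returns None when no names are given; otherwise checks that there is
    exactly one string per column.
    """
    if feature_names is None:
        return None
    names = np.asarray(feature_names, dtype=object)
    if names.ndim != 1 or names.shape[0] != n_columns:
        raise ValueError(
            f"expected {n_columns} feature names, got shape {names.shape}"
        )
    if not all(isinstance(name, str) for name in names):
        raise TypeError("feature names must be strings")
    return names


X = np.zeros((3, 2))
names = as_feature_names(["age", "income"], X.shape[1])
print(names.dtype, list(names))  # object ['age', 'income']
print(as_feature_names(None, X.shape[1]))  # None
```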
Operations
- **********
+ ##########
Estimators understand the ``InputArray`` and extract the feature names from the
given data before applying the operations and transformations on the data.
@@ -75,20 +75,20 @@ The way feature names are generated is discussed in *SLEP007 - The Style of The
Feature Names*.
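To illustrate "extract the names first, then operate on the data", here is a toy sketch. It uses a ``pandas.DataFrame`` as a stand-in for an ``InputArray``; ``extract_feature_names`` and ``CenteringTransformer`` are hypothetical, and the ``feature_names`` attribute lookup is only an assumption about how an ``InputArray`` might expose its names.

```python
import numpy as np
import pandas as pd


def extract_feature_names(X):
    """Hypothetical helper: return the names of X as an object array, or None."""
    if isinstance(X, pd.DataFrame):
        return np.asarray(X.columns, dtype=object)
    # Assumption: an InputArray-like object exposes a ``feature_names`` attribute.
    names = getattr(X, "feature_names", None)
    return None if names is None else np.asarray(names, dtype=object)


class CenteringTransformer:
    """Toy transformer: subtracts column means and carries the input names."""

    def fit(self, X):
        # Extract the names first, then compute on a plain float ndarray.
        self.input_feature_names_ = extract_feature_names(X)
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        Xt = np.asarray(X, dtype=float) - self.mean_
        if self.input_feature_names_ is None:
            return Xt
        # Re-attach the names; a DataFrame is used as the output container here.
        return pd.DataFrame(Xt, columns=list(self.input_feature_names_))


df = pd.DataFrame({"a": [1.0, 3.0], "b": [10.0, 20.0]})
out = CenteringTransformer().fit(df).transform(df)
print(list(out.columns))  # ['a', 'b']
print(out.to_numpy().tolist())  # [[-1.0, -5.0], [1.0, 5.0]]
```

With a plain ``np.ndarray`` input no names are found, so the sketch falls back to returning a plain array.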
Sparse Arrays
- *************
+ #############
Ideally sparse arrays follow the same pattern, but since ``scipy.sparse`` does
not provide the kind of API provided by ``numpy``, we may need to find
compromises.
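One possible compromise, sketched under the assumption that the names are simply kept in a separate object array next to the ``scipy.sparse`` matrix: since the matrix itself carries no column labels, every column selection has to index both objects the same way. ``select_columns`` is a hypothetical helper.

```python
import numpy as np
import scipy.sparse as sp


def select_columns(X, feature_names, columns):
    """Hypothetical helper: slice a sparse matrix and its names together.

    ``scipy.sparse`` matrices carry no column labels, so the names are kept
    in a separate object ndarray that must be indexed the same way.
    """
    columns = np.asarray(columns)
    X_sub = sp.csr_matrix(X)[:, columns]
    names_sub = None if feature_names is None else feature_names[columns]
    return X_sub, names_sub


X = sp.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))
names = np.array(["a", "b", "c"], dtype=object)
X_sub, names_sub = select_columns(X, names, [0, 2])
print(X_sub.toarray().tolist())  # [[1, 2], [0, 0]]
print(list(names_sub))  # ['a', 'c']
```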
Factory Methods
- ***************
+ ###############
There will be factory methods creating an ``InputArray`` given a
``pandas.DataFrame`` or an ``xarray.DataArray`` or simply an ``np.ndarray`` or
an ``sp.SparseMatrix`` and a given set of feature names.
- An ``InputArray`` can also be converted to a `pandas.DataFrame`` using a
+ An ``InputArray`` can also be converted to a ``pandas.DataFrame`` using a
``todataframe()`` method.
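A rough sketch of the factory and conversion methods, under stated assumptions: ``InputArraySketch`` is a hypothetical stand-in rather than the proposed ``InputArray`` class, this ``make_inputarray`` only mirrors the ``make_inputarray(X, feature_names)`` call used in this proposal, and ``xarray.DataArray`` input is left out to keep the sketch dependency-light.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


class InputArraySketch:
    """Hypothetical stand-in for ``InputArray``: data plus aligned names."""

    def __init__(self, data, feature_names=None):
        if feature_names is not None:
            feature_names = np.asarray(feature_names, dtype=object)
            if feature_names.shape != (data.shape[1],):
                raise ValueError("expected one feature name per column")
        self.data = data
        self.feature_names = feature_names

    def todataframe(self):
        # Sparse data is densified here, which allocates a dense copy.
        data = self.data.toarray() if sp.issparse(self.data) else self.data
        return pd.DataFrame(data, columns=self.feature_names)


def make_inputarray(X, feature_names=None):
    """Hypothetical factory: names come from a DataFrame unless given."""
    if isinstance(X, pd.DataFrame):
        if feature_names is None:
            feature_names = X.columns
        return InputArraySketch(X.to_numpy(), feature_names)
    if sp.issparse(X):
        return InputArraySketch(X, feature_names)
    return InputArraySketch(np.asarray(X), feature_names)


df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
X = make_inputarray(df)
print(list(X.feature_names))  # ['x', 'y']

S = make_inputarray(sp.csr_matrix([[0, 1], [2, 0]]), ["p", "q"])
frame = S.todataframe()
print(list(frame.columns))  # ['p', 'q']
print(frame.to_numpy().tolist())  # [[0, 1], [2, 0]]
```

Note that ``DataFrame.to_numpy()`` may copy the data, for example when the columns have different dtypes, which is the same memory concern raised about ``pandas`` in the alternative solutions.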
``X`` being an ``InputArray``::
@@ -103,7 +103,7 @@ feature names, one can make the right ``InputArray`` using::
    >>> make_inputarray(X, feature_names)
Alternative Solutions
- *********************
+ #####################
Since we expect the feature names to be attached to the data given to an
estimator, there are a few potential approaches we can take:
@@ -114,7 +114,7 @@ estimator, there are a few potential approaches we can take:
  is not a feasible solution since ``pandas`` plans to move to a per-column
  representation, which means ``pd.DataFrame(np.asarray(df))`` has two
  guaranteed memory copies.
- - ``XArray``: we could accept a `pandas.DataFrame``, and use
+ - ``XArray``: we could accept a ``pandas.DataFrame``, and use
  ``xarray.DataArray`` as the output of transformers, including feature names.
  However, ``xarray`` has a hard dependency on ``pandas``, and uses
  ``pandas.Index`` to handle row labels and aligns rows when an operation