|
5 | 5 | :suppress:
|
6 | 6 |
|
7 | 7 | from pandas import *
|
| 8 | + import numpy.random as random |
| 9 | + from numpy import * |
8 | 10 | options.display.max_rows=15
|
9 | 11 |
|
10 | 12 | Comparison with R / R libraries
|
@@ -98,12 +100,193 @@ xts
|
98 | 100 | plyr
|
99 | 101 | ----
|
100 | 102 |
|
| 103 | +``plyr`` is an R library for the split-apply-combine strategy for data |
| 104 | +analysis. The functions revolve around three data structures in R, ``a`` |
| 105 | +for ``arrays``, ``l`` for ``lists``, and ``d`` for ``data.frame``. The |
| 106 | +table below shows how these data structures could be mapped in Python. |
| 107 | + |
| 108 | ++------------+-------------------------------+ |
| 109 | +| R | Python | |
| 110 | ++============+===============================+ |
| 111 | +| array | list | |
| 112 | ++------------+-------------------------------+ |
| 113 | +| lists | dictionary or list of objects | |
| 114 | ++------------+-------------------------------+ |
| 115 | +| data.frame | dataframe | |
| 116 | ++------------+-------------------------------+ |
| 117 | + |
| 118 | +|ddply|_ |
| 119 | +~~~~~~~~ |
| 120 | + |
| 121 | +An expression using a data.frame called ``df`` in R where you want to |
| 122 | +summarize ``x`` by ``month`` and ``week``: |
| 123 | + |
| 124 | + |
| 125 | + |
| 126 | + .. code-block:: r |
| 127 | +
|
| 128 | + require(plyr) |
| 129 | + df <- data.frame( |
| 130 | + x = runif(120, 1, 168), |
| 131 | + y = runif(120, 7, 334), |
| 132 | + z = runif(120, 1.7, 20.7), |
| 133 | + month = rep(c(5,6,7,8),30), |
| 134 | + week = sample(1:4, 120, TRUE) |
| 135 | + ) |
| 136 | +
|
| 137 | + ddply(df, .(month, week), summarize, |
| 138 | + mean = round(mean(x), 2), |
| 139 | + sd = round(sd(x), 2)) |
| 140 | +
|
| 141 | +In ``pandas`` the equivalent expression, using the |
| 142 | +:meth:`~pandas.DataFrame.groupby` method, would be: |
| 143 | + |
| 144 | + |
| 145 | + |
| 146 | + .. ipython:: python |
| 147 | +
|
| 148 | + df = DataFrame({ |
| 149 | + 'x': random.uniform(1., 168., 120), |
| 150 | + 'y': random.uniform(7., 334., 120), |
| 151 | + 'z': random.uniform(1.7, 20.7, 120), |
| 152 | + 'month': [5,6,7,8]*30, |
| 153 | + 'week': random.randint(1, 5, 120) |
| 154 | + }) |
| 155 | +
|
| 156 | + grouped = df.groupby(['month','week']) |
| 157 | + print(grouped['x'].agg([mean, std])) |
| 158 | +
|
| 159 | +
|
| 160 | +For more details and examples see :ref:`the groupby documentation |
| 161 | +<groupby.aggregate>`. |
| 162 | + |
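As a self-contained sketch of the same aggregation with explicit imports (the names ``rng`` and ``summary`` and the seed are assumptions made for illustration, and only the ``x`` column is kept for brevity):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible

df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 120),
    "month": [5, 6, 7, 8] * 30,
    # integers() excludes the upper bound, so 5 gives weeks 1..4
    "week": rng.integers(1, 5, 120),
})

# One row per (month, week) group, with the mean and standard deviation of x
summary = df.groupby(["month", "week"])["x"].agg(["mean", "std"]).round(2)
print(summary)
```

Passing the aggregations as strings selects the pandas implementations; the pandas ``std`` uses the sample standard deviation (``ddof=1``), which matches R's ``sd``.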
101 | 163 | reshape / reshape2
|
102 | 164 | ------------------
|
103 | 165 |
|
| 166 | +|meltarray|_ |
| 167 | +~~~~~~~~~~~~~ |
| 168 | + |
| 169 | +An expression using a 3 dimensional array called ``a`` in R where you want to |
| 170 | +melt it into a data.frame: |
| 171 | + |
| 172 | + .. code-block:: r |
| 173 | +
|
| 174 | + a <- array(c(1:23, NA), c(2,3,4)) |
| 175 | + data.frame(melt(a)) |
| 176 | +
|
| 177 | +In Python, since ``a`` is a NumPy array, you can iterate over it with ``ndenumerate`` in a list comprehension. |
| 178 | + |
| 179 | + .. ipython:: python |
| 180 | + a = array(list(range(1,24))+[nan]).reshape(2,3,4) |
| 181 | + DataFrame([tuple(list(x)+[val]) for x, val in ndenumerate(a)]) |
| 182 | +
|
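A minimal standalone sketch of melting the array, with explicit imports. The column names ``dim1``, ``dim2``, ``dim3`` and ``value`` are hypothetical labels chosen for this example:

```python
import numpy as np
import pandas as pd

# 2 x 3 x 4 array holding 1..23 followed by a missing value
a = np.array(list(range(1, 24)) + [np.nan]).reshape(2, 3, 4)

# np.ndenumerate yields ((i, j, k), value) pairs in row-major order
rows = [idx + (val,) for idx, val in np.ndenumerate(a)]
melted = pd.DataFrame(rows, columns=["dim1", "dim2", "dim3", "value"])
print(melted.head())
```

Note that the indices are 0-based here, whereas R's are 1-based, and that NumPy fills the array in row-major order while R fills it in column-major order, so individual cells do not line up one-to-one with the R result.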
| 183 | +|meltlist|_ |
| 184 | +~~~~~~~~~~~~ |
| 185 | + |
| 186 | +An expression using a list called ``a`` in R where you want to melt it |
| 187 | +into a data.frame: |
| 188 | + |
| 189 | + .. code-block:: r |
| 190 | +
|
| 191 | + a <- as.list(c(1:4, NA)) |
| 192 | + data.frame(melt(a)) |
| 193 | +
|
| 194 | +In Python, this list would be a list of tuples, so the |
| 195 | +:class:`~pandas.DataFrame` constructor converts it to a dataframe as required. |
| 196 | + |
| 197 | + .. ipython:: python |
| 198 | +
|
| 199 | + a = list(enumerate(list(range(1,5))+[nan])) |
| 200 | + DataFrame(a) |
| 201 | +
|
| 202 | +For more details and examples see :ref:`the Intro to Data Structures |
| 203 | +documentation <basics.dataframe.from_items>`. |
| 204 | + |
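The same conversion as a standalone sketch with named columns. The labels ``position`` and ``value`` are assumptions for illustration and are not produced by R's ``melt``:

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 4, np.nan]

# enumerate pairs each value with its 0-based position in the list
melted = pd.DataFrame(list(enumerate(values)), columns=["position", "value"])
print(melted)
```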
| 205 | +|meltdf|_ |
| 206 | +~~~~~~~~~~~~~~~~ |
| 207 | + |
| 208 | +An expression using a data.frame called ``cheese`` in R where you want to |
| 209 | +reshape the data.frame: |
| 210 | + |
| 211 | + .. code-block:: r |
| 212 | +
|
| 213 | + cheese <- data.frame( |
| 214 | + first = c('John', 'Mary'), |
| 215 | + last = c('Doe', 'Bo'), |
| 216 | + height = c(5.5, 6.0), |
| 217 | + weight = c(130, 150) |
| 218 | + ) |
| 219 | + melt(cheese, id=c("first", "last")) |
| 220 | +
|
| 221 | +In Python, the :func:`~pandas.melt` function is the equivalent of R's ``melt``: |
| 222 | + |
| 223 | + .. ipython:: python |
| 224 | +
|
| 225 | + cheese = DataFrame({'first' : ['John', 'Mary'], |
| 226 | + 'last' : ['Doe', 'Bo'], |
| 227 | + 'height' : [5.5, 6.0], |
| 228 | + 'weight' : [130, 150]}) |
| 229 | + melt(cheese, id_vars=['first', 'last']) |
| 230 | + cheese.set_index(['first', 'last']).stack() # alternative way |
| 231 | +
|
| 232 | +For more details and examples see :ref:`the reshaping documentation |
| 233 | +<reshaping.melt>`. |
| 234 | + |
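To make the difference between the two approaches concrete, here is a short sketch with explicit imports (the names ``long`` and ``stacked`` are chosen for this example):

```python
import pandas as pd

cheese = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# melt returns a flat DataFrame with 'variable' and 'value' columns
long = pd.melt(cheese, id_vars=["first", "last"])

# stack returns a Series indexed by (first, last, column name)
stacked = cheese.set_index(["first", "last"]).stack()

print(long)
print(stacked)
```

Both hold the same four values. ``melt`` lists them one variable at a time, while ``stack`` lists them one row at a time and keeps the identifiers in the index rather than in columns.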
| 235 | +|cast|_ |
| 236 | +~~~~~~~ |
| 237 | + |
| 238 | +An expression using a data.frame called ``df`` in R to cast into a higher |
| 239 | +dimensional array: |
| 240 | + |
| 241 | + .. code-block:: r |
| 242 | +
|
| 243 | + df <- data.frame( |
| 244 | + x = runif(12, 1, 168), |
| 245 | + y = runif(12, 7, 334), |
| 246 | + z = runif(12, 1.7, 20.7), |
| 247 | + month = rep(c(5,6,7),4), |
| 248 | + week = rep(c(1,2), 6) |
| 249 | + ) |
| 250 | +
|
| 251 | + mdf <- melt(df, id=c("month", "week")) |
| 252 | + acast(mdf, week ~ month ~ variable, mean) |
| 253 | +
|
| 254 | +In Python, one way to do this is with :func:`~pandas.pivot_table`: |
| 255 | + |
| 256 | + .. ipython:: python |
| 257 | +
|
| 258 | + df = DataFrame({ |
| 259 | + 'x': random.uniform(1., 168., 12), |
| 260 | + 'y': random.uniform(7., 334., 12), |
| 261 | + 'z': random.uniform(1.7, 20.7, 12), |
| 262 | + 'month': [5,6,7]*4, |
| 263 | + 'week': [1,2]*6 |
| 264 | + }) |
| 265 | + mdf = melt(df, id_vars=['month', 'week']) |
| 266 | + pivot_table(mdf, values='value', rows=['variable','week'], |
| 267 | + cols=['month'], aggfunc=mean) |
| 268 | +
|
| 269 | +For more details and examples see :ref:`the reshaping documentation |
| 270 | +<reshaping.pivot>`. |
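The ``rows`` and ``cols`` keywords used above were later renamed; more recent pandas releases spell them ``index`` and ``columns``. A sketch of the same cast under that assumption, with explicit imports and a seeded generator (``rng`` and ``table`` are names chosen for this example):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible

df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 12),
    "y": rng.uniform(7.0, 334.0, 12),
    "z": rng.uniform(1.7, 20.7, 12),
    "month": [5, 6, 7] * 4,
    "week": [1, 2] * 6,
})

# Long format first: one row per (month, week, variable) observation
mdf = pd.melt(df, id_vars=["month", "week"])

# Then average the values for each (variable, week) row and month column
table = pd.pivot_table(mdf, values="value", index=["variable", "week"],
                       columns=["month"], aggfunc="mean")
print(table)
```

The result is a two-dimensional table with a hierarchical row index rather than a true three-dimensional array as returned by ``acast``.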
104 | 271 |
|
105 | 272 | .. |with| replace:: ``with``
|
106 | 273 | .. _with: http://finzi.psych.upenn.edu/R/library/base/html/with.html
|
107 | 274 |
|
108 | 275 | .. |subset| replace:: ``subset``
|
109 | 276 | .. _subset: http://finzi.psych.upenn.edu/R/library/base/html/subset.html
|
| 277 | + |
| 278 | +.. |ddply| replace:: ``ddply`` |
| 279 | +.. _ddply: http://www.inside-r.org/packages/cran/plyr/docs/ddply |
| 280 | + |
| 281 | +.. |meltarray| replace:: ``melt.array`` |
| 282 | +.. _meltarray: http://www.inside-r.org/packages/cran/reshape2/docs/melt.array |
| 283 | + |
| 284 | +.. |meltlist| replace:: ``melt.list`` |
| 285 | +.. _meltlist: http://www.inside-r.org/packages/cran/reshape2/docs/melt.list |
| 286 | +
|
| 287 | +.. |meltdf| replace:: ``melt.data.frame`` |
| 288 | +.. _meltdf: http://www.inside-r.org/packages/cran/reshape2/docs/melt.data.frame |
| 289 | +
|
| 290 | +.. |cast| replace:: ``cast`` |
| 291 | +.. _cast: http://www.inside-r.org/packages/cran/reshape2/docs/cast |
| 292 | +
|