Commit 26123e9

minor edits per joel
1 parent a3a662e commit 26123e9

1 file changed: +8 -8 lines changed

source/wrangling.md

Lines changed: 8 additions & 8 deletions
@@ -44,8 +44,7 @@ By the end of the chapter, readers will be able to do the following:
 - Recall and use the following functions for their
 intended data wrangling tasks:
 - `agg`
-- `apply`
-- `assign`
+- `assign` (as well as regular column assignment)
 - `groupby`
 - `melt`
 - `pivot`
@@ -782,7 +781,7 @@ Is this data set now tidy? If we recall the three criteria for tidy data:
 
 We can see that this data now satisfies all three criteria, making it easier to
 analyze. But we aren't done yet! Although we can't see it in the data frame above, all of the variables are actually
-"object" data types. We can check this using the `info` method.
+`object` data types. We can check this using the `info` method.
 ```{code-cell} ipython3
 tidy_lang.info()
 ```
@@ -795,20 +794,21 @@ Python read these columns in as string types, and by default, `str.split` will
 return columns with the `object` data type.
 
 It makes sense for `region`, `category`, and `language` to be stored as an
-`object` type. However, suppose we want to apply any functions that treat the
+`object` type since they hold categorical values. However, suppose we want to apply any functions that treat the
 `most_at_home` and `most_at_work` columns as a number (e.g., finding rows
 above a numeric threshold of a column).
 That won't be possible if the variable is stored as an `object`.
-Fortunately, the `pandas.to_numeric` function provides a natural way to fix problems
-like this: it will convert the columns to the best numeric data types. Note that below
+Fortunately, the `astype` method from `pandas` provides a natural way to fix problems
+like this: it will convert the column to a selected data type. In this case, we choose the `int`
+data type to indicate that these variables contain integer counts. Note that below
 we *assign* the new numerical series to the `most_at_home` and `most_at_work` columns
 in `tidy_lang`; we have seen this syntax before in {numref}`ch1-adding-modifying`,
 and we will discuss it in more depth later in this chapter in {numref}`pandas-assign`.
 
 ```{code-cell} ipython3
 :tags: ["output_scroll"]
-tidy_lang["most_at_home"] = pd.to_numeric(tidy_lang["most_at_home"])
-tidy_lang["most_at_work"] = pd.to_numeric(tidy_lang["most_at_work"])
+tidy_lang["most_at_home"] = tidy_lang["most_at_home"].astype("int")
+tidy_lang["most_at_work"] = tidy_lang["most_at_work"].astype("int")
 tidy_lang
 ```
 
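
Note on the last hunk: as a rough illustration of what the switch from `pd.to_numeric` to `astype("int")` changes in practice, the sketch below runs both on small made-up series of count strings. The names `counts` and `messy` and their values are hypothetical and are not taken from `tidy_lang`. Assuming clean integer strings, both approaches should produce an integer dtype; the difference only shows up when a value is not an integer.

```python
import pandas as pd

# Hypothetical stand-in for a column of counts that was read in as strings.
counts = pd.Series(["15", "2030", "7"], dtype="object")

# Old approach in this diff: let pandas infer a numeric dtype.
via_to_numeric = pd.to_numeric(counts)

# New approach in this diff: request an integer dtype explicitly.
via_astype = counts.astype("int")

print(via_to_numeric.dtype, via_astype.dtype)

# With a non-integer string, to_numeric falls back to floats,
# while astype("int") refuses the conversion.
messy = pd.Series(["15", "1.5"], dtype="object")
inferred = pd.to_numeric(messy)
print(inferred.dtype)

try:
    messy.astype("int")
    astype_failed = False
except ValueError:
    astype_failed = True
print(astype_failed)
```

Under these assumptions, the explicit `astype("int")` call states the intended type in the code and fails loudly on unexpected values, which seems consistent with the commit's wording that the columns "contain integer counts".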