@@ -44,8 +44,7 @@ By the end of the chapter, readers will be able to do the following:
 - Recall and use the following functions for their
   intended data wrangling tasks:
     - `agg`
-    - `apply`
-    - `assign`
+    - `assign` (as well as regular column assignment)
     - `groupby`
     - `melt`
     - `pivot`
@@ -782,7 +781,7 @@ Is this data set now tidy? If we recall the three criteria for tidy data:
 
 We can see that this data now satisfies all three criteria, making it easier to
 analyze. But we aren't done yet! Although we can't see it in the data frame above, all of the variables are actually
-"object" data types. We can check this using the `info` method.
+`object` data types. We can check this using the `info` method.
 ```{code-cell} ipython3
 tidy_lang.info()
 ```
@@ -795,20 +794,21 @@ Python read these columns in as string types, and by default, `str.split` will
 return columns with the `object` data type.
 
 It makes sense for `region`, `category`, and `language` to be stored as an
-`object` type. However, suppose we want to apply any functions that treat the
+`object` type since they hold categorical values. However, suppose we want to apply any functions that treat the
 `most_at_home` and `most_at_work` columns as a number (e.g., finding rows
 above a numeric threshold of a column).
 That won't be possible if the variable is stored as an `object`.
-Fortunately, the `pandas.to_numeric` function provides a natural way to fix problems
-like this: it will convert the columns to the best numeric data types. Note that below
+Fortunately, the `astype` method from `pandas` provides a natural way to fix problems
+like this: it will convert the column to a selected data type. In this case, we choose the `int`
+data type to indicate that these variables contain integer counts. Note that below
 we *assign* the new numerical series to the `most_at_home` and `most_at_work` columns
 in `tidy_lang`; we have seen this syntax before in {numref}`ch1-adding-modifying`,
 and we will discuss it in more depth later in this chapter in {numref}`pandas-assign`.
 
 ```{code-cell} ipython3
 :tags: ["output_scroll"]
-tidy_lang["most_at_home"] = pd.to_numeric(tidy_lang["most_at_home"])
-tidy_lang["most_at_work"] = pd.to_numeric(tidy_lang["most_at_work"])
+tidy_lang["most_at_home"] = tidy_lang["most_at_home"].astype("int")
+tidy_lang["most_at_work"] = tidy_lang["most_at_work"].astype("int")
 tidy_lang
 ```
 
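Note on the last hunk: the change swaps `pd.to_numeric` for `astype("int")`. The sketch below is one way to compare the two calls side by side. The small data frame `df` and its values are hypothetical stand-ins for `tidy_lang`, not data from the chapter.

```python
import pandas as pd

# Hypothetical stand-in for two tidy_lang columns; the values are made up.
df = pd.DataFrame({
    "language": ["A", "B", "C"],
    "most_at_home": ["50", "1265", "0"],  # counts stored as strings
})

# astype converts the column to the data type we name explicitly.
as_int = df["most_at_home"].astype("int")

# pd.to_numeric instead infers a numeric type from the values.
inferred = pd.to_numeric(df["most_at_home"])

# Once numeric, the column supports comparisons such as a threshold filter.
above = df[as_int > 100]
print(above)
```

For whole-number strings like these, both calls yield the same integer values. The practical difference is that `astype` states the intended type in the code, whereas `pd.to_numeric` picks one based on the data.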