Commit 721b047

more sensible proportion assignment and bonus assignment to loc
1 parent ef00ec3 commit 721b047

1 file changed: +16 -12 lines changed

source/wrangling.md

Lines changed: 16 additions & 12 deletions
@@ -1631,9 +1631,8 @@ five_cities
 The data frame above shows that the populations of the five cities in 2016 were
 5928040 (Toronto), 4098927 (Montréal), 2463431 (Vancouver), 1392609 (Calgary), and 1321426 (Edmonton).
 Next, we will add this information to a new data frame column called `city_pops`.
-Once again, we will illustrate how to do this using both regular column assignment
-and the `assign` method, starting with the latter.
-Once again we specify the new column name (`city_pops`) as the argument, followed by the equal symbol `=`,
+Once again, we will illustrate how to do this using both the `assign` method and regular column assignment.
+We specify the new column name (`city_pops`) as the argument, followed by the equals symbol `=`,
 and finally the data in the column.
 Note that the order of the rows in the `english_lang` data frame is Montréal, Toronto, Calgary, Edmonton, Vancouver.
 So we will create a column called `city_pops` where we list the populations of those cities in that
@@ -1693,19 +1692,24 @@ the `merge` function, which lets you combine two data frames. We will show you a
 example using `merge` at the end of the chapter!
 ```
 
-Now we have a new column with the population for each city. Finally, we calculate the
-proportion of people who speak English the most at home by taking the ratio of the columns
-`most_at_home` and `city_pops`. Let's modify the `most_at_home` column directly; in this case
-we can just assign directly to the column.
-This is precisely what we did in {numref}`str-split`,
+Now we have a new column with the population for each city. Finally, we can convert all the numerical
+columns to proportions of people who speak English by taking the ratio of all the numerical columns
+with `city_pops`. Let's modify the `english_lang` column directly; in this case
+we can just assign directly to the data frame.
+This is similar to what we did in {numref}`str-split`,
 when we first read in the `"region_lang_top5_cities_messy.csv"` data and we needed to convert a few
-of the variables to numeric types.
-Note that it is again possible to instead use the `assign` function to produce a new data frame when modifying an existing column,
-although this is not commonly done.
+of the variables to numeric types. Here we assign to a range of columns simultaneously using `loc[]`.
+Note that it is again possible to instead use the `assign` function to produce a new data
+frame when modifying existing columns, although this is not commonly done.
+Note also that we use the `div` method with the argument `axis=0` to divide a range of columns in a data frame
+by the values in a single column—the basic division symbol `/` won't work in this case.
 
 ```{code-cell} ipython3
 :tags: ["output_scroll"]
-english_lang["most_at_home"] = english_lang["most_at_home"]/english_lang["city_pops"]
+english_lang.loc[:, "mother_tongue":"lang_known"] = english_lang.loc[
+    :,
+    "mother_tongue":"lang_known"
+].div(english_lang["city_pops"], axis=0)
 english_lang
 ```
 
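
The note about `div` with `axis=0` in the last hunk can be checked with a small stand-alone sketch. The data frame `df` and its numbers below are made up for illustration and are not the `english_lang` data. Only the column names are borrowed from the diff. The sketch suggests that `/` fails here because pandas matches the labels of the Series to the column names, not to the rows.

```python
import pandas as pd

# Hypothetical toy data (made-up numbers), reusing the column names from the diff.
df = pd.DataFrame({
    "region": ["A", "B"],
    "mother_tongue": [10.0, 30.0],
    "most_at_home": [20.0, 60.0],
    "lang_known": [40.0, 90.0],
    "city_pops": [100.0, 300.0],
})

# Label-based slice of the numeric columns, as in the commit.
cols = df.loc[:, "mother_tongue":"lang_known"]

# axis=0 matches the Series to the row index, so each row is divided
# by that row's own city_pops value.
proportions = cols.div(df["city_pops"], axis=0)

# Plain `/` matches the Series labels (0, 1) to the column names instead.
# Nothing matches, so every cell in the result is NaN.
misaligned = cols / df["city_pops"]

# Write the proportions back into the same range of columns.
df.loc[:, "mother_tongue":"lang_known"] = proportions
print(df)
```

The toy columns are floats from the start, so writing the proportions back with `loc` does not change any column's type.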