@@ -1631,9 +1631,8 @@ five_cities
The data frame above shows that the populations of the five cities in 2016 were
5928040 (Toronto), 4098927 (Montréal), 2463431 (Vancouver), 1392609 (Calgary), and 1321426 (Edmonton).
Next, we will add this information to a new data frame column called `city_pops`.
- Once again, we will illustrate how to do this using both regular column assignment
- and the `assign` method, starting with the latter.
- Once again we specify the new column name (`city_pops`) as the argument, followed by the equal symbol `=`,
+ Once again, we will illustrate how to do this using both the `assign` method and regular column assignment.
+ We specify the new column name (`city_pops`) as the argument, followed by the equals symbol `=`,
and finally the data in the column.
Note that the order of the rows in the `english_lang` data frame is Montréal, Toronto, Calgary, Edmonton, Vancouver.
So we will create a column called `city_pops` where we list the populations of those cities in that
@@ -1693,19 +1692,24 @@ the `merge` function, which lets you combine two data frames. We will show you a
example using `merge` at the end of the chapter!
```

- Now we have a new column with the population for each city. Finally, we calculate the
- proportion of people who speak English the most at home by taking the ratio of the columns
- `most_at_home` and `city_pops`. Let's modify the `most_at_home` column directly; in this case
- we can just assign directly to the column.
- This is precisely what we did in {numref}`str-split`,
+ Now we have a new column with the population for each city. Finally, we can convert all the numerical
+ columns to proportions of people who speak English by taking the ratio of each numerical column
+ to `city_pops`. Let's modify the `english_lang` data frame directly; in this case
+ we can just assign directly to the data frame.
+ This is similar to what we did in {numref}`str-split`,
when we first read in the `"region_lang_top5_cities_messy.csv"` data and we needed to convert a few
- of the variables to numeric types.
- Note that it is again possible to instead use the `assign` function to produce a new data frame when modifying an existing column,
- although this is not commonly done.
+ of the variables to numeric types. Here we assign to a range of columns simultaneously using `loc[]`.
+ Note that it is again possible to instead use the `assign` method to produce a new data
+ frame when modifying existing columns, although this is not commonly done.
+ Note also that we use the `div` method with the argument `axis=0` to divide a range of columns in a data frame
+ by the values in a single column; the basic division symbol `/` won't work in this case.

``` {code-cell} ipython3
:tags: ["output_scroll"]
- english_lang["most_at_home"] = english_lang["most_at_home"]/english_lang["city_pops"]
+ english_lang.loc[:, "mother_tongue":"lang_known"] = english_lang.loc[
+     :,
+     "mother_tongue":"lang_known"
+ ].div(english_lang["city_pops"], axis=0)
english_lang
```
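
To illustrate why the new text recommends `div` with `axis=0` over `/`, here is a minimal sketch on a hypothetical toy data frame. The name `toy`, the city labels `A` and `B`, and all of the numbers are made up for illustration and are not taken from the chapter's data. It also shows the `assign` alternative that the text mentions, assuming only two columns need converting.

```python
import pandas as pd

# Hypothetical toy data (not the chapter's data): two count columns
# plus a population column, with made-up city labels as the row index.
toy = pd.DataFrame(
    {
        "mother_tongue": [50.0, 30.0],
        "lang_known": [80.0, 60.0],
        "city_pops": [100.0, 120.0],
    },
    index=["A", "B"],
)

counts = toy.loc[:, "mother_tongue":"lang_known"]

# `/` matches the Series' index (row labels "A", "B") against the data
# frame's column labels. No labels match, so every entry is NaN.
with_slash = counts / toy["city_pops"]

# `div` with axis=0 matches on the row index instead, so each row is
# divided by that row's population.
with_div = counts.div(toy["city_pops"], axis=0)

# The `assign` alternative returns a new data frame and leaves `toy` unchanged.
toy_props = toy.assign(
    mother_tongue=toy["mother_tongue"] / toy["city_pops"],
    lang_known=toy["lang_known"] / toy["city_pops"],
)
```

With `assign`, each column has to be named explicitly, which may be one reason the `loc[]` plus `div` form is more convenient when converting a whole range of columns.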