Commit 5571290
nikomatsakis authored and mark-i-m committed
various nits from mark-i-m
1 parent 44c05c7 commit 5571290

1 file changed: +20 -18 lines changed

src/profiling/with_perf.md

Lines changed: 20 additions & 18 deletions
@@ -1,6 +1,6 @@
 # Profiling with perf
 
-sThis is a guide for how to profile rustc with perf.
+This is a guide for how to profile rustc with [perf](https://perf.wiki.kernel.org/index.php/Main_Page).
 
 ## Initial steps
 
@@ -11,7 +11,7 @@ sThis is a guide for how to profile rustc with perf.
 - leave everything else the defaults
 - Run `./x.py build` to get a full build
 - Make a rustup toolchain (let's call it `rust-prof`) pointing to that result
-- `rustup toolchain link` XXX
+- `rustup toolchain link <path-to-toolchain>`
 
 ## Gathering a perf profile
 
@@ -29,9 +29,11 @@ perf record -F99 --call-graph dwarf XXX
 ```
 
 The `-F99` tells perf to sample at 99 Hz, which avoids generating too
-much data for longer runs. The `--call-graph dwarf` tells perf to get
-call-graph information from debuginfo, which is accurate. The `XXX` is
-the command you want to profile. So, for example, you might do:
+much data for longer runs (why 99 Hz you ask? No particular reason, it
+just seems to work well for me). The `--call-graph dwarf` tells perf
+to get call-graph information from debuginfo, which is accurate. The
+`XXX` is the command you want to profile. So, for example, you might
+do:
 
 ```
 perf record -F99 --call-graph dwarf cargo +rust-prof rustc
@@ -42,6 +44,7 @@ to run `cargo`. But there are some things to be aware of:
 - You probably don't want to profile the time spend building
 dependencies. So something like `cargo build; cargo clean -p $C` may
 be helpful (where `$C` is the crate name)
+- Though usually I just do `touch src/lib.rs` and rebuild instead. =)
 - You probably don't want incremental messing about with your
 profile. So something like `CARGO_INCREMENTAL=0` can be helpful.
 
@@ -89,8 +92,7 @@ CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile ch
 Note that final command: it's a doozy! It uses the `cargo rustc`
 command, which executes rustc with (potentially) additional options;
 the `--profile check` and `--lib` options specify that we are doing a
-`cargo check` execution, and that this is a library (not an
-execution).
+`cargo check` execution, and that this is a library (not a binary).
 
 At this point, we can use `perf` tooling to analyze the results. For example:
 
@@ -110,7 +112,8 @@ can be helpful; it is covered below.
 
 ### Gathering NLL data
 
-If you want to profile an NLL run, you can just pass extra options to the `cargo rustc` command. The actual perf site just uses `-Zborrowck=mir`, which we can simulate like so:
+If you want to profile an NLL run, you can just pass extra options to
+the `cargo rustc` command, like so:
 
 ```bash
 touch src/lib.rs
@@ -128,26 +131,26 @@ simple but useful tool that lets you answer queries like:
 - "how much time was spent in function F" (no matter where it was called from)
 - "how much time was spent in function F when it was called from G"
 - "how much time was spent in function F *excluding* time spent in G"
-- "what fns does F call and how much time does it spend in them"
+- "what functions does F call and how much time does it spend in them"
 
 To understand how it works, you have to know just a bit about
 perf. Basically, perf works by *sampling* your process on a regular
 basis (or whenever some event occurs). For each sample, perf gathers a
 backtrace. `perf focus` lets you write a regular expression that tests
-which fns appear in that backtrace, and then tells you which
+which functions appear in that backtrace, and then tells you which
 percentage of samples had a backtrace that met the regular
 expression. It's probably easiest to explain by walking through how I
 would analyze NLL performance.
 
-## Installing `perf-focus`
+### Installing `perf-focus`
 
 You can install perf-focus using `cargo install`:
 
 ```
 cargo install perf-focus
 ```
 
-## Example: How much time is spent in MIR borrowck?
+### Example: How much time is spent in MIR borrowck?
 
 Let's say we've gathered the NLL data for a test. We'd like to know
 how much time it is spending in the MIR borrow-checker. The "main"
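
Note on the hunk above: the sampling model it describes (one backtrace per sample, a regular expression tested against each backtrace, and the matching share reported as a percentage) can be sketched in a few lines of shell. This is only an illustration under stated assumptions. The `samples` data, the `;`-separated frame layout, and the `percent_matching` helper are all hypothetical, and the plain awk regex used here is not the query syntax of `perf focus` itself.

```shell
#!/bin/sh
# Hypothetical sample data: one backtrace per line, frames separated by ';',
# outermost caller first. This is NOT the real `perf script` output format.
samples='main;typeck;solve_traits
main;do_mir_borrowck;solve_traits
main;do_mir_borrowck;liveness
main;codegen'

# Print the percentage of samples whose backtrace matches the regex in $1.
percent_matching() {
    printf '%s\n' "$samples" | awk -v re="$1" '
        $0 ~ re { hits++ }
        END { printf "%d%%\n", 100 * hits / NR }'
}

percent_matching 'do_mir_borrowck'                # 2 of 4 samples: prints 50%
percent_matching 'do_mir_borrowck.*solve_traits'  # 1 of 4 samples: prints 25%
```

The second call mirrors the "F when it was called from G" style of query: it only counts samples where both frames appear, in that order, in the same backtrace.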
@@ -175,7 +178,7 @@ samples where `do_mir_borrowck` was on the stack: in this case, 29%.
 currently executes `perf script` (perhaps there is a better
 way...). I've sometimes found that `perf script` outputs C++ mangled
 names. This is annoying. You can tell by running `perf script |
-head` yourself -- if you see named like `5rustc6middle` instead of
+head` yourself -- if you see names like `5rustc6middle` instead of
 `rustc::middle`, then you have the same problem. You can solve this
 by doing:
 
@@ -190,7 +193,7 @@ stdin, rather than executing `perf focus`. We should make this more
 convenient (at worst, maybe add a `c++filt` option to `perf focus`, or
 just always use it -- it's pretty harmless).
 
-## Example: How much time does MIR borrowck spend solving traits?
+### Example: How much time does MIR borrowck spend solving traits?
 
 Perhaps we'd like to know how much time MIR borrowck spends in the
 trait checker. We can ask this using a more complex regex:
@@ -215,7 +218,7 @@ If you're curious, you can find out exactly which samples by using the
 each sample. The `|` at the front of the line indicates the part that
 the regular expression matched.
 
-## Example: Where does MIR borrowck spend its time?
+### Example: Where does MIR borrowck spend its time?
 
 Often we want to do a more "explorational" queries. Like, we know that
 MIR borrowck is 29% of the time, but where does that time get spent?
@@ -258,7 +261,7 @@ altogether ("total") and the percent of time spent in **just that
 function and not some callee of that function** (self). Usually
 "total" is the more interesting number, but not always.
 
-### Absolute vs relative percentages
+### Relative percentages
 
 By default, all in perf-focus are relative to the **total program
 execution**. This is useful to help you keep perspective -- often as
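
Note on the hunk above: the "total" versus "self" distinction in its context lines can be made concrete with a small sketch. It assumes a made-up set of backtraces (one per line, `;`-separated, outermost caller first) and a hypothetical helper `total_and_self`. Neither is part of `perf` or `perf focus`; the point is only to show how the two numbers differ.

```shell
#!/bin/sh
# Hypothetical backtraces, one sample per line; the last frame is the
# function that was actually executing when the sample was taken.
samples='main;do_mir_borrowck
main;do_mir_borrowck;solve_traits
main;do_mir_borrowck;liveness
main;codegen'

# Print total and self percentages for the function named in $1.
#   total: the function appears anywhere in the backtrace
#   self:  the function is the innermost (last) frame
total_and_self() {
    printf '%s\n' "$samples" | awk -F';' -v name="$1" '
        {
            for (i = 1; i <= NF; i++) if ($i == name) { total++; break }
            if ($NF == name) self++
        }
        END { printf "total=%d%% self=%d%%\n", 100 * total / NR, 100 * self / NR }'
}

total_and_self do_mir_borrowck   # prints total=75% self=25%
```

Here `do_mir_borrowck` is on the stack in three of four samples but is the executing frame in only one, so most of its "total" time is really spent in its callees.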
@@ -270,8 +273,7 @@ are easily compared against one another.
 That said, sometimes it's useful to get relative percentages, so `perf
 focus` offers a `--relative` option. In this case, the percentages are
 listed only for samples that match (vs all samples). So for example we
-could find out get our percentages relative to the borrowck itself
-like so:
+could get our percentages relative to the borrowck itself like so:
 
 ```bash
 > perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
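
Note on the final hunk: the difference between the default percentages and `--relative` comes down to which sample count is used as the denominator. The arithmetic can be sketched as follows. The sample counts and the `percent` helper are hypothetical, chosen so that borrowck is 29% of the run as in the guide's example; they are not real measurements.

```shell
#!/bin/sh
# Hypothetical sample counts, for illustration only.
all_samples=1000        # every sample in the profile
borrowck_samples=290    # samples with do_mir_borrowck on the stack (29%)
callee_samples=58       # borrowck samples that were also inside one callee

# Print $1 as a percentage of $2, with one decimal place.
percent() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f%%\n", 100 * n / d }'
}

echo "absolute: $(percent "$callee_samples" "$all_samples")"       # vs all samples: 5.8%
echo "relative: $(percent "$callee_samples" "$borrowck_samples")"  # vs matching samples: 20.0%
```

The same callee looks small against the whole program but accounts for a fifth of the borrowck samples, which is why the two views answer different questions.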
