Rewrite is_ascii using slice::as_chunks #144837

Open
wants to merge 1 commit into
base: master

Conversation

Kmeakin
Contributor

@Kmeakin Kmeakin commented Aug 2, 2025

Generalize the x86-64+sse2 version of is_ascii to be architecture-neutral, and rewrite it using slice::as_chunks. The new version is both shorter (in terms of Rust source code) and smaller (in terms of produced assembly).

Compare the assembly generated before and after:
https://godbolt.org/z/MWKdnaYoK

@rustbot
Collaborator

rustbot commented Aug 2, 2025

r? @tgross35

rustbot has assigned @tgross35.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 2, 2025

@Kmeakin Kmeakin force-pushed the km/optimize-is-ascii branch from c43111d to 7cbbbc6 Compare August 2, 2025 18:08
@hanna-kruppe
Contributor

The simpler source code and shorter assembly seem to boil down to two changes:

  1. Using unaligned loads for every full chunk, instead of trying to align all loads except possibly the first and the last one.
  2. Always using the simple byte-by-byte loop for the last bytes.len() % CHUNK_SIZE bytes, instead of trying to handle it with an unaligned load that overlaps with the preceding chunk.

The first one seems quite reasonable in many cases. It probably causes a huge performance regression for targets that don't have efficient unaligned loads, but to be fair, those are becoming less common and less important over time.

The second change may be quite problematic for some common input sizes, though. Try benchmarking before vs. after on an input that's 2 * CHUNK_SIZE - 1 bytes long, or with random short input lengths that make the branches and iteration counts less predictable.

@okaneco
Contributor

okaneco commented Aug 2, 2025

There are some benchmarks in library/core/benches/ascii/is_ascii.rs and I added more in #130733, also a codegen test.

When I originally made that PR, new uses of const_eval_select seemed to be discouraged when making a function const, and then the situation was a little different by the time it was reviewed and merged.

However, the usize-aligned path is probably still needed for targets without SIMD like i586-unknown-linux-gnu, since it can do SWAR ASCII checks instead of checking one byte at a time.
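
As a minimal sketch of what a SWAR ("SIMD within a register") ASCII check means: each `usize`-sized word is tested against a mask with `0x80` in every byte, so several bytes are checked per operation without vector instructions. The function name `is_ascii_swar` is hypothetical, and for simplicity this version copies bytes into a word rather than reproducing the aligned-load path discussed above.

```rust
/// Illustrative SWAR-style check; not the standard library's implementation.
pub fn is_ascii_swar(bytes: &[u8]) -> bool {
    const WORD: usize = std::mem::size_of::<usize>();
    // 0x80 repeated in every byte of a usize (e.g. 0x8080_8080_8080_8080 on 64-bit).
    const HIGH_BITS: usize = usize::MAX / 255 * 0x80;

    let mut words = bytes.chunks_exact(WORD);
    for w in &mut words {
        // Copy the bytes into a word; byte order does not matter because
        // the mask is the same in every byte position.
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(w);
        if usize::from_ne_bytes(buf) & HIGH_BITS != 0 {
            return false;
        }
    }

    // Fewer than WORD bytes remain; check them individually.
    words.remainder().iter().all(|b| b.is_ascii())
}
```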
