Allow the global allocator to use thread-local storage and std::thread::current() #144465

orlp · 2025-07-25T20:39:24Z

Currently, the thread-local storage implementation uses the Global allocator in the places where it needs to allocate memory. This effectively means the global allocator cannot use thread-local variables. This is a shame, as an allocator is precisely one of the places where you'd really want to use thread-locals. This has also led to hacks such as #116402, where we detect re-entrance and abort.

So I've changed every place I could find where the TLS implementation allocates to use the System allocator instead. I also applied this change to the storage allocated for a Thread handle, so that it can also be used freely from the global allocator, e.g. for registering it in a central place or for parking primitives.
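To make the motivation concrete, here is a minimal sketch of the kind of allocator this change is meant to permit: a wrapper around System that keeps a per-thread allocation counter in a thread_local!. The names CountingAlloc, ALLOCS_ON_THIS_THREAD and allocs_on_this_thread are hypothetical. The sketch assumes the guarantee proposed here, namely that accessing the thread-local does not itself go through the global allocator; without it, platforms where every thread-local allocates could re-enter the allocator. A const-initialized Cell without a destructor is used because it needs no lazy initialization or destructor registration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Per-thread counter: const-initialized and without a destructor.
    static ALLOCS_ON_THIS_THREAD: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical allocator: forwards to `System` and counts allocations per thread.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `try_with` instead of `with`, so this can never panic inside the allocator.
        let _ = ALLOCS_ON_THIS_THREAD.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations made so far by the calling thread.
fn allocs_on_this_thread() -> usize {
    ALLOCS_ON_THIS_THREAD.with(|c| c.get())
}

fn main() {
    let before = allocs_on_this_thread();
    let s = String::from("hello");
    let after = allocs_on_this_thread();
    println!("{s}: {} allocation(s) on this thread", after - before);
}
```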

r? @joboet

Mark-Simulacrum · 2025-07-26T02:21:43Z

I think this merits some libs-api discussion (unless that's already happened? I didn't quickly find it). Landing this IMO implies at least an implicit guarantee (even if not necessarily stable without actual docs) that we don't introduce global allocator usage in thread local code. I think we should have some discussion and either commit to that or not. And we should discuss where we draw the line (e.g., is other runtime-like code in std supposed to do this? For example, environment variables or other bits that need allocation early, potentially even before main starts -- is that OK to call into the allocator for?)

OTOH, if we can make a weaker guarantee here while still serving the purpose, that may also be good to look at? For example, can we guarantee that a pattern like dhat's for ignoring re-entrant alloc/dealloc calls is safe? i.e., that we don't need allocations for non-Drop thread locals? Or if we can only do that for a limited set, perhaps std could define a public thread local for allocators to use?
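For reference, a minimal sketch of the general re-entrancy-guard idea being asked about (this is not dhat's actual code). A const-initialized thread-local flag marks that the current thread is already inside the allocator's bookkeeping; nested calls skip the bookkeeping but still allocate through System, so every pointer is freed by the allocator that produced it. GuardedAlloc, IN_HOOK, TRACKED_BYTES and guarded are hypothetical names, and the sketch relies on the thread-local itself being usable without allocating, which is exactly what would need to be guaranteed.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

thread_local! {
    // `true` while the current thread is running the allocator's bookkeeping.
    static IN_HOOK: Cell<bool> = const { Cell::new(false) };
}

static TRACKED_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Runs `f` unless this thread is already inside a guarded section.
/// Returns `true` if `f` ran, `false` for a re-entrant (or too-late) call.
fn guarded(f: impl FnOnce()) -> bool {
    IN_HOOK
        .try_with(|flag| {
            if flag.get() {
                return false;
            }
            flag.set(true);
            f();
            flag.set(false);
            true
        })
        .unwrap_or(false)
}

/// Hypothetical profiling allocator: always allocates with `System`, and only
/// records allocations that are not re-entrant.
struct GuardedAlloc;

unsafe impl GlobalAlloc for GuardedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Real bookkeeping might allocate (e.g. into a map); such nested
            // calls find IN_HOOK set and are served by `System` unrecorded.
            guarded(|| {
                TRACKED_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
            });
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Same backend regardless of re-entrance: no per-pointer tracking needed.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: GuardedAlloc = GuardedAlloc;

fn main() {
    let before = TRACKED_BYTES.load(Ordering::Relaxed);
    let v = vec![0u8; 1024];
    let after = TRACKED_BYTES.load(Ordering::Relaxed);
    println!("tracked {} bytes for a {}-byte Vec", after - before, v.len());
}
```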

orlp · 2025-07-26T10:00:12Z

@Mark-Simulacrum

> I think this merits some libs-api discussion (unless that's already happened? I didn't quickly find it). Landing this IMO implies at least an implicit guarantee (even if not necessarily stable without actual docs) that we don't introduce global allocator usage in thread local code.

Yes, this should probably be properly discussed.

> And we should discuss where we draw the line (e.g., is other runtime-like code in std supposed to do this? For example, environment variables

You make a good point here about environment variables. It is very common to allow configuring and debugging allocators through them. I'd like to at least guarantee that std::env::var_os does not allocate through the global allocator; I will look into how feasible that is.

> or other bits that need allocation early, potentially even before main starts -- is that OK to call into the allocator for?)

I think it's good to brainstorm a bit about what a global allocator would need, although I don't think an allocator needs all that much from std. Environment variables, thread locals, and getting a thread handle come to mind, and I guess spawning a thread is also rather important for creating background cleanup threads. I will investigate that as well.

> OTOH, if we can make a weaker guarantee here while still serving the purpose, that may also be good to look at?

I don't think there's much difference in constraint for the stdlib between guaranteeing this for all thread_local instances as opposed to only non-Drop thread locals, especially on platforms where every single thread-local already does an allocation regardless.

> For example, can we guarantee that a pattern like dhat's for ignoring re-entrant alloc/dealloc calls is safe?

Not without putting undue burden on the allocator. dhat doesn't actually change how allocation works, so regardless of re-entrance it calls System.alloc(layout) or System.dealloc(ptr, layout). If it actually changed its allocation method depending on re-entrance, it would need to keep track of which pointers were allocated normally and which were allocated re-entrantly, and call the appropriate dealloc function for each. This makes every deallocation significantly slower, re-entrant or not, in addition to adding extra complexity.
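To illustrate that cost: if re-entrant calls were served by a different backend, dealloc would have to recover which backend produced each pointer. One hypothetical way is a small header in front of every block, sketched below. Both "backends" are System here, and Origin, alloc_tagged and dealloc_tagged are illustrative names; the point is that every deallocation pays for the header read and padding, whether or not re-entrance ever happened.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::mem::{align_of, size_of};

/// Which (hypothetical) backend served an allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Origin {
    Primary,
    Reentrant,
}

/// Header bytes in front of each block: a multiple of both `align` and `usize`.
fn header_size(align: usize) -> usize {
    align.max(size_of::<usize>())
}

/// Layout of the underlying block: header followed by the caller's data.
fn padded(layout: Layout) -> Option<Layout> {
    let size = layout.size().checked_add(header_size(layout.align()))?;
    Layout::from_size_align(size, layout.align().max(align_of::<usize>())).ok()
}

/// Allocates `layout` and stores `origin` in the word just before the returned pointer.
unsafe fn alloc_tagged(layout: Layout, origin: Origin) -> *mut u8 {
    let block = match padded(layout) {
        Some(b) => b,
        None => return std::ptr::null_mut(),
    };
    unsafe {
        let base = System.alloc(block);
        if base.is_null() {
            return base;
        }
        let user = base.add(header_size(layout.align()));
        let tag: usize = match origin {
            Origin::Primary => 0,
            Origin::Reentrant => 1,
        };
        (user.sub(size_of::<usize>()) as *mut usize).write(tag);
        user
    }
}

/// Reads the tag to decide where the block must be returned. This read (and
/// the header padding) is paid on every deallocation.
unsafe fn dealloc_tagged(ptr: *mut u8, layout: Layout) -> Origin {
    unsafe {
        let tag = (ptr.sub(size_of::<usize>()) as *const usize).read();
        let origin = if tag == 0 { Origin::Primary } else { Origin::Reentrant };
        // A real allocator would dispatch to a different backend based on `origin`.
        let base = ptr.sub(header_size(layout.align()));
        System.dealloc(base, padded(layout).expect("layout was valid at allocation"));
        origin
    }
}

fn main() {
    let layout = Layout::from_size_align(24, 16).unwrap();
    unsafe {
        let p = alloc_tagged(layout, Origin::Reentrant);
        assert!(!p.is_null());
        println!("freed a block from {:?}", dealloc_tagged(p, layout));
    }
}
```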

orlp · 2025-07-26T11:07:54Z

I did some investigation into some of the problems of making std::thread::spawn global-allocator-free:

- You can't give the thread a name, because the interface for setting a name inherently takes a String. Perhaps an extra method static_name could be added that takes a &'static str, with the internal field changed to a Cow.
- If the stack size is not explicitly specified, the RUST_MIN_STACK environment variable gets parsed into an integer, so var_os would need to be global-allocator-free. ~~This is something I'd want regardless.~~ This is impossible, so any call would need to explicitly specify the stack size.
- no_spawn_hooks almost surely has to be set.
- The Arc containing the Packet needs to be made to use System.
- The Box containing the main function needs to be made to use System.

The above seems feasible to me, although I also had another thought: I'm not sure std::thread::spawn needs to be global-allocator-free at all. The only use case I can think of for spawning threads from the global allocator is spawning cleanup threads, and I think it's perfectly fine if this happens after the first initialization:

if !alloc_is_init() {
    init_alloc();
    // Now the allocator works and may be re-entrant.
    spawn_background_threads();
}

So I'm perfectly happy if we don't make std::thread::spawn global-allocator-free.
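On the RUST_MIN_STACK point from the list above: the sketch below spawns a thread with an explicit Builder::stack_size, which, per that investigation, means the environment variable is not consulted. It does not make spawning global-allocator-free: Builder::name still takes a String, and the closure and its result still go through the Box and Arc mentioned above. The thread name, stack size and spawn_cleanup_worker are arbitrary, hypothetical choices.

```rust
use std::io;
use std::thread;

/// Spawns a named worker with an explicit stack size and returns
/// (sum computed by the worker, whether it saw its own name).
fn spawn_cleanup_worker() -> io::Result<(u64, bool)> {
    let handle = thread::Builder::new()
        // `name` takes a String, which is allocated with the global allocator.
        .name(String::from("alloc-cleanup"))
        // Explicit size: the RUST_MIN_STACK environment variable is not read.
        .stack_size(256 * 1024)
        .spawn(|| {
            let saw_name = thread::current().name() == Some("alloc-cleanup");
            let sum: u64 = (1..=10).sum();
            (sum, saw_name)
        })?;
    Ok(handle.join().expect("cleanup worker panicked"))
}

fn main() {
    let (sum, saw_name) = spawn_cleanup_worker().expect("failed to spawn");
    println!("sum = {sum}, named correctly = {saw_name}");
}
```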

While investigating the above, I also found an additional blocker for making std::thread::current() global-allocator-free that I hadn't realized before: on platforms without 64-bit atomics, ThreadId::new uses a Mutex, so either Mutex would have to be global-allocator-free (likely unfeasible) or this implementation would have to change. This led me to ask:

It's possible to do this with a fairly simple spinlock, but are there platforms we support that have threads and Mutex but not an atomic we can use for a spinlock?

To answer my own question: std requires the existence of AtomicBool.
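A minimal sketch of such a spinlock fallback, assuming only AtomicBool is available: the flag guards a plain u64 counter. This is not the code of the actual commit; SpinIdCounter, IDS and next_id are hypothetical names, and the sketch reports exhaustion by returning None.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical ID generator for targets without 64-bit atomics:
/// an AtomicBool spinlock protects a plain u64 counter.
struct SpinIdCounter {
    locked: AtomicBool,
    next: UnsafeCell<u64>,
}

// SAFETY: `next` is only read or written while `locked` is held.
unsafe impl Sync for SpinIdCounter {}

impl SpinIdCounter {
    const fn new() -> Self {
        SpinIdCounter { locked: AtomicBool::new(false), next: UnsafeCell::new(1) }
    }

    /// Returns a fresh, never-repeated ID, or `None` once the u64 space is exhausted.
    fn next_id(&self) -> Option<u64> {
        // Acquire the lock; wait on a plain load to avoid hammering the cache line.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            while self.locked.load(Ordering::Relaxed) {
                hint::spin_loop();
            }
        }
        // SAFETY: we hold the lock.
        let slot = unsafe { &mut *self.next.get() };
        let id = *slot;
        let result = match id.checked_add(1) {
            Some(n) => {
                *slot = n;
                Some(id)
            }
            None => None,
        };
        self.locked.store(false, Ordering::Release);
        result
    }
}

static IDS: SpinIdCounter = SpinIdCounter::new();

fn main() {
    let a = IDS.next_id().unwrap();
    let b = IDS.next_id().unwrap();
    println!("{a} < {b}");
}
```

No allocation happens anywhere in next_id, which is what makes it usable from a global allocator.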

orlp · 2025-07-26T12:38:14Z

I've added a commit to use a spinlock for ThreadId if 64-bit atomics are unavailable, and with that I believe std::thread::current() should be safe to call in a global allocator.

I also investigated environment variables and... it's hopeless without a new API. The current API returns owned Strings or OsStrings, both of which allocate with the global allocator. Currently, if the global allocator wants to read environment variables, it has to use libc::getenv (and thus bypass Rust's environment lock). I don't think that's the end of the world.
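A minimal sketch of that workaround, assuming a Unix-like target where the C library's getenv is available. It is declared by hand rather than through the libc crate so the block is self-contained, and `unsafe extern` needs Rust 1.82 or later. It reads a decimal value without allocating. As noted, this bypasses Rust's environment lock, so it is only sound if nothing modifies the environment concurrently. env_usize and MY_ALLOC_ARENAS are hypothetical names.

```rust
use std::ffi::{c_char, CStr};

unsafe extern "C" {
    // C standard library function; assumed available (Unix-like target).
    fn getenv(name: *const c_char) -> *const c_char;
}

/// Reads a decimal `usize` from the environment without heap allocation.
/// `name` must be NUL-terminated, e.g. `b"MY_ALLOC_ARENAS\0"`.
fn env_usize(name: &[u8]) -> Option<usize> {
    let name = CStr::from_bytes_with_nul(name).ok()?;
    // SAFETY: assumes no other thread calls setenv/std::env::set_var meanwhile;
    // getenv does not take Rust's environment lock.
    let value = unsafe { getenv(name.as_ptr()) };
    if value.is_null() {
        return None;
    }
    // SAFETY: getenv returned a valid NUL-terminated string.
    let bytes = unsafe { CStr::from_ptr(value) }.to_bytes();
    // Neither UTF-8 validation nor integer parsing allocates.
    std::str::from_utf8(bytes).ok()?.trim().parse().ok()
}

fn main() {
    let arenas = env_usize(b"MY_ALLOC_ARENAS\0").unwrap_or(4);
    println!("using {arenas} arenas");
}
```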

joshtriplett · 2025-07-29T15:50:37Z

We talked about this in today's @rust-lang/libs-api meeting.

We were supportive of doing this, and we agreed that using TLS in an allocator seems desirable.

We also decided there's no point in doing this if we don't make it a guarantee of support. Thus, labeling this libs-api.

@rfcbot merge

Also, before merging this we'd like to see the documentation for TLS updated to make this guarantee.

rfcbot · 2025-07-29T15:50:38Z

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

joshtriplett · 2025-07-29T15:51:04Z

Separately, it would also be useful for the global allocator documentation to provide an explicit safelist of other things you're allowed to use.

rfcbot · 2025-07-29T15:54:10Z

🔔 This is now entering its final comment period, as per the review above. 🔔


orlp · 2025-07-29T17:53:20Z

@joshtriplett

> Also, before merging this we'd like to see the documentation for TLS updated to make this guarantee.

Done.

> Separately, it would also be useful for the global allocator documentation to provide an explicit safelist of other things you're allowed to use.

Where? On the std::alloc page where the #[global_allocator] attribute is described, or on GlobalAlloc?

RalfJung · 2025-08-01T11:09:02Z

So with this, for the first time, a program with a #[global_allocator] might still use the System allocator for some std operations the user has no control over? That seems really surprising and potentially breaking to me. The point of #[global_allocator] is to entirely replace which code std uses for allocations. We even have tests relying on that, using a custom #[global_allocator] to count how many allocations have been created. Those tests will now be less effective.

At the very least, this should be documented with that attribute.

matthieu-m · 2025-08-01T12:02:00Z

Much like Ralf, I am quite surprised at the "allocator fork" occurring here.

Now, I would like to note that there is precedent for this. One of the early pains I encountered in replacing the system allocator with a custom allocator was that the loader will use the system allocator for loading the code (and constants, etc...) regardless. Thus, to an extent, there are already allocations bypassing the global allocator.

Still, up until now, this was a well-defined set. Loaded sections of a library or binary would be allocated with the system allocator, and all further allocations would be made with the global allocator. The divide is clear. And one can (try to) write their own loader if they wish to change the status quo.

This change removes control from the hands of the (allocator) developer. It may be fine. It may be the beginning of a slippery slope.

For the point at hand, is there any reason that the thread-local destructors could not be registered in an intrusive linked list instead?

That is, each thread-local variable requiring a destructor would be accompanied by a thread-local:

struct Node {
    pointer: *mut u8,
    destructor: unsafe extern "C" fn(*mut u8),
    next: *mut Node,
}

And destruction would simply consist of popping the first destructor of the linked-list and executing it in a loop.

(This doesn't solve all problems identified in this PR, but it may solve the most salient one)
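A minimal, single-threaded sketch of the intrusive-list mechanics proposed here, reusing the Node layout above. register and run_destructors are hypothetical names, and the sketch deliberately leaves out the part that matters most for std, namely arranging for run_destructors to be called at thread exit; it also does nothing about platforms where the thread-local value itself has to be boxed.

```rust
use std::cell::Cell;
use std::ptr;

struct Node {
    pointer: *mut u8,
    destructor: unsafe extern "C" fn(*mut u8),
    next: *mut Node,
}

thread_local! {
    // Head of this thread's destructor list; needs no allocation itself.
    static HEAD: Cell<*mut Node> = const { Cell::new(ptr::null_mut()) };
}

/// Pushes `node` onto this thread's list.
/// SAFETY: `node` and the data behind `node.pointer` must stay valid until
/// `run_destructors` has processed the node.
unsafe fn register(node: *mut Node) {
    HEAD.with(|head| unsafe {
        (*node).next = head.get();
        head.set(node);
    });
}

/// Pops nodes one at a time (most recently registered first) and runs their
/// destructors; nodes registered by a running destructor are processed too.
unsafe fn run_destructors() {
    loop {
        let node = HEAD.with(|head| {
            let node = head.get();
            if !node.is_null() {
                // SAFETY: non-null nodes are valid per `register`'s contract.
                head.set(unsafe { (*node).next });
            }
            node
        });
        if node.is_null() {
            break;
        }
        unsafe { ((*node).destructor)((*node).pointer) };
    }
}

/// Example destructor: increments the u32 it is given.
unsafe extern "C" fn bump(p: *mut u8) {
    unsafe { *(p as *mut u32) += 1 };
}

fn main() {
    let mut hits: u32 = 0;
    let p = &mut hits as *mut u32 as *mut u8;
    let mut a = Node { pointer: p, destructor: bump, next: ptr::null_mut() };
    let mut b = Node { pointer: p, destructor: bump, next: ptr::null_mut() };
    unsafe {
        register(&mut a);
        register(&mut b);
        run_destructors();
    }
    println!("destructors run: {hits}");
}
```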

orlp · 2025-08-01T12:17:58Z

> So with this, for the first time, a program with a #[global_allocator] might still use the System allocator for some std operations the user has no control over? That seems really surprising and potentially breaking to me.

Well, yes and no. I think technically yes, this is the first time specifically System is explicitly called from the Rust standard library.

However, System essentially maps to libc::malloc, which is still called all the time by other libc functions and other things we use internally in std. Is there a reason you specifically want to single out this usage of malloc from the others?
For example, if I set a breakpoint on malloc on macOS, I can see that our thread-local implementation calls tlv_get_addr, which calls malloc internally. Why was that okay, while this would be breaking?

Note that this doesn't affect no_std applications which absolutely rely on #[global_allocator] and for which there is no system allocator available. The thread_local! and std::thread implementation is firmly in std.

orlp · 2025-08-01T12:20:58Z

> For the point at hand, is there any reason that the thread-local destructors could not be registered in an intrusive linked list instead?

On some platforms we can only install a key (that is, a pointer) into thread-local storage, which requires boxing of the thread-local variable. We must allocate on these platforms. See here:

rust/library/std/src/sys/thread_local/os.rs, line 101 (at commit 8410440):

let value =

We also require allocation for std::thread::current() as the thread handle may survive the thread it refers to, so it can't live on the thread's stack.

> This doesn't solve all problems identified in this PR, but it may solve the most salient one.

Well, considering that there are platforms which strictly require allocation for thread-locals, and that every platform requires allocation for std::thread::current(), I don't particularly agree.
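For context, on stable Rust the essence of "allocate this value with System instead of the global allocator", as needed on those key-based platforms, looks roughly like the sketch below; std, which can use unstable allocator APIs, need not do it by hand. system_box and system_drop are hypothetical names. Zero-sized values are special-cased because GlobalAlloc::alloc must not be called with a zero-size layout.

```rust
use std::alloc::{handle_alloc_error, GlobalAlloc, Layout, System};
use std::ptr::{self, NonNull};

/// Moves `value` into memory obtained directly from `System`,
/// bypassing whatever `#[global_allocator]` is registered.
fn system_box<T>(value: T) -> NonNull<T> {
    let layout = Layout::new::<T>();
    let p = if layout.size() == 0 {
        // Zero-sized values need no memory; an aligned dangling pointer is valid for them.
        NonNull::dangling()
    } else {
        // SAFETY: `layout` has non-zero size.
        let raw = unsafe { System.alloc(layout) } as *mut T;
        NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout))
    };
    // SAFETY: `p` is valid for writes of `T` and properly aligned.
    unsafe { ptr::write(p.as_ptr(), value) };
    p
}

/// Drops the value and returns its memory to `System`.
/// SAFETY: `p` must come from `system_box::<T>` and must not be used afterwards.
unsafe fn system_drop<T>(p: NonNull<T>) {
    let layout = Layout::new::<T>();
    unsafe {
        ptr::drop_in_place(p.as_ptr());
        if layout.size() != 0 {
            System.dealloc(p.as_ptr() as *mut u8, layout);
        }
    }
}

fn main() {
    let p = system_box(42u64);
    // SAFETY: `p` is live and points to an initialized u64.
    println!("{}", unsafe { *p.as_ptr() });
    unsafe { system_drop(p) };
}
```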

orlp · 2025-08-04T14:07:53Z

@RalfJung @matthieu-m Sorry, forgot to tag you earlier, awaiting your response.

RalfJung · 2025-08-04T14:25:19Z

Not much for me to respond to, it's up to t-libs-api to make a call here. I don't care strongly either way myself, but this seemed like a potential issue not mentioned in the discussion so I am curious what t-libs-api thinks.

The one point I do care strongly about is that this must be very clearly documented as part of this PR, that the Rust runtime will bypass the #[global_allocator] for some allocations. I don't think "C code or the linker already did the same" is sufficient justification for the documentation to remain silent on this point. Nobody expects a Rust attribute to change what the linker does. People very reasonably expect that the attribute controls what the Rust runtime does.


orlp · 2025-08-04T16:52:11Z

> The one point I do care strongly about is that this must be very clearly documented as part of this PR, that the Rust runtime will bypass the #[global_allocator] for some allocations.

@RalfJung I've added a commit which adds a footnote to #[global_allocator] stating that Rust's internals may still call System, also linking to a new section on global allocator re-entrance I've added to GlobalAlloc. This section explains the dangers of calling std functions from the global allocator and also specifies our current guarantees for global-allocation-free functions/methods which hopefully satisfies @joshtriplett.

RalfJung · 2025-08-04T16:56:30Z

Thanks! However, I don't think I agree with delegating this to a footnote, that seems a bit too easy to miss -- though I admit I don't know how exactly footnotes even render, we rarely use them (which is another argument against using them here, IMO).

orlp · 2025-08-04T17:07:49Z

@RalfJung It renders as such:

Amanieu · 2025-08-04T23:44:48Z

Note that libc will already often use the system allocator behind our backs, such as for pthread_create which we use for spawning threads.

Use System allocator for thread-local storage

97c26a0

rustbot assigned joboet Jul 25, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 25, 2025

orlp changed the title ~~Use System allocator for thread-local storage~~ Use System allocator for thread-local storage and std::thread::Thread Jul 25, 2025

orlp mentioned this pull request Jul 25, 2025

Allow the global alloc one TLS slot with a destructor #143761

Closed

Mark-Simulacrum added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jul 26, 2025

Use spinlock for ThreadId if 64-bit atomic unavailable

5a022e0

orlp changed the title ~~Use System allocator for thread-local storage and std::thread::Thread~~ Allow the global allocator to use thread-local storage and std::thread::current() Jul 26, 2025

Amanieu removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 29, 2025

joshtriplett added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 29, 2025

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Jul 29, 2025

Amanieu removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jul 29, 2025

rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Jul 29, 2025

orlp added 2 commits July 29, 2025 19:47

Add documentation guaranteeing global allocator use of TLS

f0161e9

Remove outdated part of comment claiming thread_local re-enters global allocator

e653658

Fix typo in doc comment

8410440

Add comments for guarantees given and footnote that System may still be called

4840fd6
