Make pointers in ValueStore stable across insertions #5576
Conversation
There's a lifetime-extension (or not) bug in here with the initializer_list, which I have fixed.

The mac fuzzer is still unhappy, though it does not reproduce locally. :/

Oh maybe it does in…
In general, looks good. I'm glad you got this working!
I've looked over most of this, but the Flatten and ValueStoreRange will take a little more time. I want to think about whether there are simpler alternatives with llvm's iterator support (and I see you're having lifetime issues). But here at least are comments on the rest, which I hope is helpful.

Ah, also, it might be good to check the compile_benchmark performance with/without this change. I think we're okay with the cost, but it may be good to have an idea if it's small (for opt only; only really asking about test performance for the NDEBUG stuff).
Because this is coming up on #5576
LG. I have a question about the chunk setup, but other than that, mostly small things. This seems to be at a reasonable state to merge, since the performance question seems to be addressed. Thanks, good work here!
    requires(IdHasValueType<IdT>)
constexpr auto PlatformChunkIndexBits() -> int32_t {
  static_assert(PlatformChunkCapacity<IdT>() > 0);
  return std::bit_width(uint32_t{PlatformChunkCapacity<IdT>() - 1});
I think we typically use `static_cast`, e.g.:

-  return std::bit_width(uint32_t{PlatformChunkCapacity<IdT>() - 1});
+  return std::bit_width(static_cast<uint32_t>(PlatformChunkCapacity<IdT>() - 1));
The style guide prefers this syntax when possible, for primitives specifically. It does bounds checking for you at compile time: if the value can't be represented, it doesn't compile.
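For illustration, a minimal sketch (not from the PR; the constants are hypothetical) of the difference:

#include <bit>
#include <cstdint>

// Brace initialization rejects a narrowing conversion at compile time;
// static_cast always compiles and silently truncates.
constexpr int64_t kFits = 7;                   // representable in uint32_t
constexpr int64_t kTooBig = int64_t{1} << 40;  // not representable

static_assert(std::bit_width(uint32_t{kFits - 1}) == 3);
// std::bit_width(uint32_t{kTooBig});              // error: narrowing
// std::bit_width(static_cast<uint32_t>(kTooBig)); // compiles; truncates to 0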
  static_assert(Capacity <= std::numeric_limits<int32_t>::max());

  ValueType* buf_;
  int32_t num_ = 0;
#5576 (comment) brought up removing this size -- should we discuss that further in a meeting? I'm fine with this for now, but I think the duplication is something we should remove.
Let's discuss. I think it makes our code much more prone to buffer overflow mistakes.
// A range over references to the values in a ValueStore, returned from
// `ValueStore::values()`. Hides the complex type name of the iterator
// internally to provide a type name (`ValueStoreRange<IdT>`) that can be
// referred to without auto and templates.
Out of curiosity, had you considered doing this directly on `ValueStore`, e.g.:

auto values() -> auto { ... }
using Range = decltype(MakeFlattenedRange(std::declval<const ValueStore<IdT>&>()));

And then instead of `ValueStoreRange<IdT>`, it's instead `ValueStore<IdT>::Range`.
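To make the suggestion concrete, a self-contained sketch under simplifying assumptions (`MakeFlattenedRange` here is a hypothetical stand-in for the PR's helper, and the store body is reduced to a single vector):

#include <utility>
#include <vector>

// Stand-in for the flattening helper; it returns some iterator-based
// range type we don't want to spell out at call sites.
template <typename ChunkVecT>
auto MakeFlattenedRange(const ChunkVecT& chunks) {
  return std::make_pair(chunks.begin(), chunks.end());
}

template <typename IdT>
class ValueStore {
 public:
  auto values() const -> auto { return MakeFlattenedRange(chunks_); }

  // The nested alias: callers can write `ValueStore<IdT>::Range` in
  // ordinary (non-template) signatures instead of `auto`.
  using Range =
      decltype(MakeFlattenedRange(std::declval<const std::vector<int>&>()));

 private:
  std::vector<int> chunks_;
};

// Usable in a header without auto or templates:
void Process(ValueStore<int>::Range values);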
I hadn't, but I think I prefer code using `ValueStoreRange` as a function argument when it's required over a nested type in this case, since it's a full class type. Kind of like ArrayRef for SmallVector. It's probably just a matter of taste.

Not using `auto` is most of the point of `ValueStoreRange`, so that it's possible to write a function receiving a range of values without making the function a template (like with ArrayRef before).
I don't understand the described benefit to `auto`: aren't you just moving where `auto` is written, adding a class to wrap it?
Without `ValueStoreRange` we could just return `auto`, which would be some untypeable llvm:: iterator. This means you can't really do:

// In .h
void f(ValueStoreRange<SemIR::InstId> ids);

// In .cpp
void f(ValueStoreRange<SemIR::InstId> ids) { ... }

Because the untypeable thing means you'd need to use a template to describe the range over values in a ValueStore, whereas ValueStoreRange gives a simple type name to use there.
I think you're misunderstanding what I'm suggesting; I really mean the code you noted should be replaceable with:

// In .h
void f(ValueStore<SemIR::InstId>::Range ids);

// In .cpp
void f(ValueStore<SemIR::InstId>::Range ids) { ... }
But, I don't want to hold up this PR. I might send a separate PR to see what you think.
Okay, right. I think I responded to that part here:
I hadn't, but I think I prefer code using ValueStoreRange as a function argument when it's required over a nested type in this case, since it's a full class type. Kind of like ArrayRef for SmallVector. It's probably just a matter of taste.
It could go either way, but I think I prefer referring to the class by its own name, rather than introducing an alias to it.
Undo changes that were meant to prevent use of a reference into `ValueStore` after being invalidated. After #5576, the `ValueStore` makes such references stable, so there's no need to worry about invalidation.
This avoids reallocating the backing buffer in ValueStore so that references into the ValueStore are never invalidated when adding new values. This works especially well since we never delete values from a ValueStore.
The strategy used is to allocate chunks of a fixed size, inserting into each chunk until it is full before allocating the next. The ValueStore starts with an initial allocated chunk in all cases, so that there is only a single indirection for adding and accessing values from this chunk. After it's full, additional chunks are allocated in a vector, so two indirections are required to add or access values in these chunks.
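A rough sketch of the scheme (not the PR's code: the real store keeps the first chunk inline for single-indirection access, and uses power-of-two capacities so indexing can use bit operations rather than division):

#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Fixed-size chunks are never reallocated, so a pointer returned by Add()
// stays valid as more values are added.
template <typename T, int32_t ChunkCapacity = 1024>
class ChunkedStore {
 public:
  auto Add(T value) -> T* {
    if (chunks_.empty() || num_in_last_ == ChunkCapacity) {
      // The last chunk is full: allocate a new one. Existing chunks, and
      // therefore existing pointers into them, are untouched.
      chunks_.push_back(std::make_unique<T[]>(ChunkCapacity));
      num_in_last_ = 0;
    }
    T* slot = &chunks_.back()[num_in_last_++];
    *slot = std::move(value);
    return slot;
  }

  auto Get(int32_t index) const -> const T& {
    return chunks_[index / ChunkCapacity][index % ChunkCapacity];
  }

 private:
  // Only this vector of chunk pointers may reallocate; the chunks don't.
  std::vector<std::unique_ptr<T[]>> chunks_;
  int32_t num_in_last_ = 0;
};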
This obviates the need for #5529 as we no longer need to worry about holding pointers into a ValueStore.
We introduce a Flatten operation for ranges. It flattens a "range over ranges over Ts" down to a "range over Ts". This allows us to make a range over the values in the ValueStore from a range over the chunks in the ValueStore. See https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.flatten for inspiration for this name choice. Flatten is used in one other case where we were writing two levels of for loops to do the same thing.
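The concept matches C++20's `std::views::join`, which can stand in as an illustration (the PR introduces its own Flatten rather than using this adaptor):

#include <iostream>
#include <ranges>
#include <vector>

auto main() -> int {
  // A "range over ranges over ints", like the range over chunks.
  std::vector<std::vector<int>> chunks = {{1, 2}, {3}, {4, 5, 6}};
  // Flattened to a "range over ints"; prints: 1 2 3 4 5 6
  for (int value : chunks | std::views::join) {
    std::cout << value << ' ';
  }
  return 0;
}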
The `array_ref()` accessor is changed to `values()`, and it's now a range (typed as a `ValueStoreRange`) over all values as references (like ArrayRef was, but without random access).

As pointers to a ValueStore can no longer be invalidated, we remove the ASAN poisoning feature and support from ValueStore.
This may cause a regression in our compile benchmark of up to 5%, though that is close to or within the noise of the benchmark. We can look at ways to optimize things further in the future. Perhaps by tuning the chunk size further, or by making later chunks larger than earlier chunks, or other strategies.