Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Make pointers in ValueStore stable across insertions #5576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jun 2, 2025

Conversation

danakj
Copy link
Contributor

@danakj danakj commented May 29, 2025

This avoids reallocating the backing buffer in ValueStore so that references into the ValueStore are never invalidated when adding new values. This works especially well since we never delete values from a ValueStore.

The strategy used is to allocate chunks of a fixed size, and inserting into each chunk until it is full before allocating the next. The ValueStore starts with an initial allocated chunk in all cases, so that there is only a single indirection for adding and accessing values from this chunk. After it's full, additional chunks are allocated in a vector, so two indirections are required to add or access values in these chunks.

This obviates the need for #5529 as we no longer need to worry about holding pointers into a ValueStore.

We introduce a Flatten operation for ranges. It flattens a "range over ranges over Ts" down to a "range over Ts". This allows us to make an range over the values in the ValueStore from a range over the chunks in the ValueStore. See https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.flatten for inspiration for this name choice. Flatten is used in one other case where we were writing two levels of for loops to do the same thing.

The array_ref() accessor is changed to values() and its now a range (typed as a ValueStoreRange) over all values as references (like ArrayRef was, but without random access).

As pointers to a ValueStore can no longer be invalidated, we remove the ASAN poisoning feature and support from ValueStore.

This may cause a regression in our compile benchmark of up to 5%, though that is close to or within the noise of the benchmark. We can look at ways to optimize things further in the future. Perhaps by tuning the chunk size further, or by making later chunks larger than earlier chunks, or other strategies.

@danakj
Copy link
Contributor Author

danakj commented May 29, 2025

There's a lifetime-extension (or not) bug in here with the initializer_list, which I have fixed.

@danakj
Copy link
Contributor Author

danakj commented May 29, 2025

The mac fuzzer is still unhappy, though it does not reproduce locally. :/

@danakj
Copy link
Contributor Author

danakj commented May 29, 2025

The mac fuzzer is still unhappy, though it does not reproduce locally. :/

Oh maybe it does in -c opt, looking more.

@jonmeow jonmeow mentioned this pull request May 29, 2025
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, looks good. I'm glad you got this working!

I've looked over most of this, but the Flatten and ValueStoreRange will take a little more time. I want to think if there are possibly simpler alternatives with llvm's iterator support (and I see you're having lifetime issues). But here at least are comments on the rest, which I hope is helpful.

toolchain/base/value_store.h Outdated Show resolved Hide resolved
toolchain/base/value_store.h Outdated Show resolved Hide resolved
toolchain/base/value_store.h Outdated Show resolved Hide resolved
toolchain/base/value_store.h Outdated Show resolved Hide resolved
toolchain/base/value_store_chunk.h Outdated Show resolved Hide resolved
toolchain/base/value_store_chunk.h Outdated Show resolved Hide resolved
toolchain/base/value_store.h Outdated Show resolved Hide resolved
toolchain/base/value_store_chunk.h Outdated Show resolved Hide resolved
toolchain/base/value_store_chunk.h Outdated Show resolved Hide resolved
toolchain/sem_ir/formatter.cpp Outdated Show resolved Hide resolved
@jonmeow
Copy link
Contributor

jonmeow commented May 29, 2025

Ah, also, it might be good to check the compile_benchmark performance with/without this change. I think we're okay with the cost, but it may be good to have an idea if it's small (for opt only, only really asking about test performance for the NDEBUG stuff)

toolchain/base/value_store_chunk.h Outdated Show resolved Hide resolved
toolchain/base/value_store.h Outdated Show resolved Hide resolved
github-merge-queue bot pushed a commit that referenced this pull request May 30, 2025
Because this is coming up on #5576
toolchain/base/value_store_chunk.h Outdated Show resolved Hide resolved
@danakj danakj force-pushed the stable-pointers branch from af6ae59 to e1416d5 Compare May 30, 2025 16:19
@danakj danakj requested a review from jonmeow June 2, 2025 16:06
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG. I ahve a question about the chunk setup, but other than that, mostly small things. This seems at a reasonable state to merge, since the performance question seems to be addressed. Thanks, good work here!

toolchain/base/value_store_chunk.h Show resolved Hide resolved
requires(IdHasValueType<IdT>)
constexpr auto PlatformChunkIndexBits() -> int32_t {
static_assert(PlatformChunkCapacity<IdT>() > 0);
return std::bit_width(uint32_t{PlatformChunkCapacity<IdT>() - 1});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we typically use static_cast, e.g.:

Suggested change
return std::bit_width(uint32_t{PlatformChunkCapacity<IdT>() - 1});
return std::bit_width(static_cast<uint32_t>(PlatformChunkCapacity<IdT>() - 1));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The style guide prefers this syntax when possible, for primitives specifically. It does bounds checking for you at compile time. It doesn't compile if it can't.

static_assert(Capacity <= std::numeric_limits<int32_t>::max());

ValueType* buf_;
int32_t num_ = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5576 (comment) brought up removing this size -- should we discuss that further in a meeting? I'm fine with this for now, but I think the duplication is something we should remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss. I think it makes our code much more prone to buffer overflow mistakes.

Comment on lines +171 to +174
// A range over references to the values in a ValueStore, returned from
// `ValueStore::values()`. Hides the complex type name of the iterator
// internally to provide a type name (`ValueStoreRange<IdT>`) that can be
// referred to without auto and templates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, had you considered doing this directly on ValueStore, e.g.:

  auto values() -> auto { ... }
  using Range = decltype(MakeFlattenedRange(std::declval<const ValueStore<IdT>&>()));

And then instead of ValueStoreRange<IdT>, it's instead ValueStore<IdT>::Range

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't but I think I prefer code using ValueStoreRange as a function argument when it's required over a tested type in this case, since it's a full class type. Kind of like ArrayRef for SmallVector. It's probably just a matter of taste.

Not using auto is most of the point of ValueStoreRange so that it's possible to write a function receiving a range of values without making the function a template (like with ArrayRef before).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the described benefit to auto: aren't you just moving where auto is written, adding a class to wrap it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without ValueStoreRange we could just return auto which would be some untypeable llvm:: iterator. This means you can't really do:

// In .h
void f(ValueStoreRange<SemIR::InstId> ids);

// In .cpp
void f(ValueStoreRange<SemIR::InstId> ids) { ... }

Cuz the untypeable thing means you need to use a template to describe the range over values in a ValueStore. Whereas ValueStoreRange gives a simple type name to use there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're misunderstanding what I'm suggesting, I really mean the noted should be replaceable with:

// In .h
void f(ValueStore<SemIR::InstId>::Range ids);

// In .cpp
void f(ValueStore<SemIR::InstId>::Range ids) { ... }

But, I don't want to hold up this PR. I might send a separate PR to see what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, right. I think I responded to that part here:

I hadn't but I think I prefer code using ValueStoreRange as a function argument when it's required over a tested type in this case, since it's a full class type. Kind of like ArrayRef for SmallVector. It's probably just a matter of taste.

It could go either way but I think I prefer naming the class by the class, rather than introducing an alias to it.

toolchain/base/value_store.h Show resolved Hide resolved
toolchain/base/value_store.h Outdated Show resolved Hide resolved
@danakj danakj force-pushed the stable-pointers branch from 3f71e92 to ede67cf Compare June 2, 2025 17:24
@danakj danakj enabled auto-merge June 2, 2025 18:25
@danakj danakj added this pull request to the merge queue Jun 2, 2025
Merged via the queue into carbon-language:trunk with commit 02fc484 Jun 2, 2025
8 checks passed
@danakj danakj deleted the stable-pointers branch June 2, 2025 20:07
github-merge-queue bot pushed a commit that referenced this pull request Jun 3, 2025
Undo changes that were meant to prevent use of a reference into
`ValueStore` after being invalidated. After #5576, the `ValueStore`
makes such references stable, so there's no need to worry about
invalidation.
bricknerb pushed a commit to bricknerb/carbon-lang that referenced this pull request Jun 11, 2025
…#5576)

This avoids reallocating the backing buffer in ValueStore so that
references into the ValueStore are never invalidated when adding new
values. This works especially well since we never delete values from a
ValueStore.

The strategy used is to allocate chunks of a fixed size, and inserting
into each chunk until it is full before allocating the next. The
ValueStore starts with an initial allocated chunk in all cases, so that
there is only a single indirection for adding and accessing values from
this chunk. After it's full, additional chunks are allocated in a
vector, so two indirections are required to add or access values in
these chunks.

This obviates the need for
carbon-language#5529 as we no longer
need to worry about holding pointers into a ValueStore.

We introduce a Flatten operation for ranges. It flattens a "range over
ranges over Ts" down to a "range over Ts". This allows us to make an
range over the values in the ValueStore from a range over the chunks in
the ValueStore. See
https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.flatten
for inspiration for this name choice. Flatten is used in one other case
where we were writing two levels of for loops to do the same thing.

The `array_ref()` accessor is changed to `values()` and its now a range
(typed as a `ValueStoreRange`) over all values as references (like
ArrayRef was, but without random access).

As pointers to a ValueStore can no longer be invalidated, we remove the
ASAN poisoning feature and support from ValueStore.

This may cause a regression in our compile benchmark of up to 5%, though
that is close to or within the noise of the benchmark. We can look at
ways to optimize things further in the future. Perhaps by tuning the
chunk size further, or by making later chunks larger than earlier
chunks, or other strategies.
bricknerb pushed a commit to bricknerb/carbon-lang that referenced this pull request Jun 11, 2025
Undo changes that were meant to prevent use of a reference into
`ValueStore` after being invalidated. After carbon-language#5576, the `ValueStore`
makes such references stable, so there's no need to worry about
invalidation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.