Conversation

GwenaelCaer

Hello,

In a use case with xarray, I needed to co-locate in-situ data with a Zarr datacube and, for each match, extract a mini-cube around the observation point. This selection goes through Dask's vindex, and I quickly ran into performance limitations.

[figure: pytcube_schema, schematic of extracting mini-cubes from the datacube around in-situ observation points]

After some research, I discovered that vindex uses point-wise indexing, which can become very slow when the index is large.

I therefore tried to optimize vindex for my use case. The idea behind the optimization is to split the indices by chunk, index each chunk individually with its share of the indices, and then reconstruct the final result from the resulting pieces.
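To make the idea concrete, here is a minimal 1-D NumPy sketch of the chunk-splitting step (a hypothetical illustration with made-up names, not the actual implementation in this PR):

import numpy as np

# Hypothetical sketch: split a flat index array by owning chunk, index
# each chunk with its local offsets, and write the pieces back into
# place, instead of sending every index to every chunk.
def vindex_1d_by_chunk(chunk_list, chunk_size, indices):
    result = np.empty(indices.shape, dtype=chunk_list[0].dtype)
    owners = indices // chunk_size              # which chunk serves each index
    for cid in np.unique(owners):
        mask = owners == cid                    # positions served by chunk `cid`
        local = indices[mask] - cid * chunk_size
        result[mask] = chunk_list[cid][local]   # index only inside the chunk
    return result

x = np.arange(20)
chunk_list = [x[i:i + 5] for i in range(0, 20, 5)]  # chunk_size = 5
idx = np.array([3, 12, 7, 19])
assert (vindex_1d_by_chunk(chunk_list, 5, idx) == x[idx]).all()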

The current version does not yet pass all tests, but it works for my use case. I’m new to Dask internals, so if anyone wants to help or has advice on improving it, I’d greatly appreciate it.

I’m sharing an example to test the function’s performance.

import dask.array as da
import numpy as np

shape = (1_000, 1_000, 1_000)
chunks = (100, 100, 100)
x = da.random.random(shape, chunks=chunks).astype('float32')

buffer = 30   # half-width of each mini-cube
n = 1_000     # number of observation points
indices = []
for idx, size in enumerate(x.shape):
    # random mini-cube centers, kept away from the array edges
    values = np.random.randint(buffer, size - (buffer + 1), n)
    # target shape (n, 1, ..., 2*buffer + 1, ..., 1): the window axis
    # sits at position idx + 1 so the three index arrays broadcast
    # against each other
    new_shape = [1] * len(x.shape)
    new_shape[idx] = 2 * buffer + 1
    new_shape = [n] + new_shape
    # one window of indices per point: value - buffer .. value + buffer
    new_array = np.array(
        [
            np.arange(value - buffer, value + buffer + 1)
            for value in values
        ]
    )
    indices.append(new_array.reshape(new_shape))
indices = tuple(indices)

r = x.vindex[indices]   # r.shape == (1000, 61, 61, 61): one mini-cube per point
rc = r.compute()

Thanks @keewis for your help and advice.

@GwenaelCaer changed the title to performance optimization of vindex for large indexes Sep 12, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      9 files  ±0        9 suites  ±0   3h 14m 2s ⏱️ -31s
 18 059 tests ±0   16 833 ✅ -9    1 217 💤 ±0    9 ❌ +9
161 605 runs  ±0  149 403 ✅ -81  12 121 💤 ±0   81 ❌ +81

For more details on these failures, see this check.

Results for commit 196b85e. ± Comparison against base commit 37835c4.

@dcherian
Contributor

Hi, thanks for digging so deeply into this.

I had trouble understanding the description, so here's what I have understood so far. In your example problem, indices is a list of 3 ndarrays with shapes (1000, 61, 1, 1), (1000, 1, 61, 1), and (1000, 1, 1, 61), so it's the case highlighted in this comment: https://github.com/dask/dask/pull/11625/files#r1901046262. The current algorithm operates on the full broadcast versions of these 3 arrays, so it scales badly.
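For scale, broadcasting those three index arrays materializes the full (1000, 61, 61, 61) shape for each axis; a quick check (assuming int64 indices):

import numpy as np

# Broadcasting the three index arrays materializes the full
# (n_points, 61, 61, 61) shape once per axis.
shapes = [(1000, 61, 1, 1), (1000, 1, 61, 1), (1000, 1, 1, 61)]
full = np.broadcast_shapes(*shapes)
print(full)                                # (1000, 61, 61, 61)
print(3 * np.prod(full) * 8 / 1e9, "GB")   # ~5.4 GB of int64 indices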

It would certainly be good to come up with a better algorithm for this case; I spent a bunch of time trying to do it but didn't succeed (at least not cleanly). However, I don't understand what you've changed. Could you please add some comments to the PR explaining the logic?

FYI, I developed #11625 against the benchmark in #10237. It would be good not to regress there.

import dask.array as da
import numpy as np
arr_1 = da.ones([6000, 6000], chunks=[3000, 3000])  # 36M elements in 4 chunks
idx = np.repeat(np.arange(0, 6000, 6), 1000)  # 1M index points
arr_2 = arr_1.vindex[idx, idx[::-1]]

In particular, writing code like mask = block_coords[dim] == coord inside a hot loop is not great for performance, which is why the current code does an argsort outside the loop.
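As a sketch of that pattern (illustrative only, not the actual dask code): one stable argsort up front, then one contiguous slice per chunk instead of a fresh boolean mask per chunk:

import numpy as np

# One stable argsort outside the loop replaces a boolean mask per chunk.
chunk_size = 3000
idx = np.repeat(np.arange(0, 6000, 6), 1000)
owners = idx // chunk_size
order = np.argsort(owners, kind="stable")         # sort once, outside the loop
sorted_owners = owners[order]
nchunks = sorted_owners.max() + 1
# run boundaries: one contiguous slice of `order` per chunk
bounds = np.searchsorted(sorted_owners, np.arange(nchunks + 1))
for cid in range(nchunks):
    members = order[bounds[cid]:bounds[cid + 1]]  # positions hitting chunk cid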

@keewis
Contributor

keewis commented Sep 12, 2025

Not sure whether he has already left for the weekend, but @GwenaelCaer has created a documentation website with a lot of detail on the optimization: https://odatis-public.gitlab-pages.ifremer.fr/fast_vindex/index.html

Does that answer your questions?

Edit: in particular, this might be interesting: https://odatis-public.gitlab-pages.ifremer.fr/fast_vindex/development/algorithm_step_by_step.html

@dcherian
Contributor

Ah thanks, that does help!

Conceptually, the trick is to completely avoid broadcasting, which I agree with. However, I'm not sure this implementation works well for the pointwise case (which is the case I heavily optimized for). Note that there is no advantage to be gained by not broadcasting in that case.
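To see why (using the indices from the #10237 benchmark): when the index arrays already share the flat pointwise shape, broadcasting is a no-copy view, so skipping it saves nothing:

import numpy as np

# In the pointwise case, broadcasting allocates no new data: both
# inputs already have the common shape, so the result is just a view.
idx = np.repeat(np.arange(0, 6000, 6), 1000)       # 1M index points
a, b = np.broadcast_arrays(idx, idx[::-1])
print(a.shape == idx.shape, a.flags["OWNDATA"])    # True False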

My conclusion earlier was that we'd need two algorithms to handle the two cases; in fact, I remember thinking it's probably better to ask the user to use oindex directly. IIRC that algorithm is already decent (but I may be wrong).

@keewis
Contributor

keewis commented Sep 12, 2025

My conclusion earlier was that we'd need two algorithms to handle the two cases

If I remember correctly, the proposed algorithm doesn't try to handle point-wise indexing; it is specialized for extracting regions (conceptually similar to what "vectorized multi-slices" would do). That is why I suggested adding the proposed algorithm as an optimization for that case and keeping the existing algorithm for the other cases (not sure if the code already does that).

in fact i remember thinking it's probably better to ask the user to use oindex directly.

Not sure oindex would work for this particular use case (cubes around potentially randomly distributed points), as the indices are only locally orthogonal.
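A toy NumPy illustration of what I mean (made-up values): indexing orthogonally with the stacked per-point ranges takes the full cross product, pulling in the unwanted off-diagonal cubes:

import numpy as np

# Two points, each with its own row/column range. Global orthogonal
# indexing mixes ranges across points; the hybrid keeps them paired.
x = np.arange(100).reshape(10, 10)
rows = np.array([[1, 2], [7, 8]])   # per-point row ranges
cols = np.array([[3, 4], [5, 6]])   # per-point column ranges
ortho = x[rows.ravel()[:, None], cols.ravel()[None, :]]   # (4, 4) cross product
wanted = x[rows[:, :, None], cols[:, None, :]]            # (2, 2, 2): one cube per point
print(ortho.shape, wanted.shape)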

@dcherian
Contributor

dcherian commented Sep 12, 2025

it is specialized for extracting regions

as the indices are only locally orthogonal.

Hmmm, I don't understand: the indices going in are broadcastable against each other. I'll need to spend some time thinking about it. This almost feels like a different indexing type.

That is why I suggested adding the proposed algorithm as an optimization for that case,

In that case, the first step may be to clearly write out a condition that triggers the optimization, and then we could explore how to share code.

@keewis
Contributor

keewis commented Sep 12, 2025

Hmmm I don't understand

It appears I might have confused myself. After clearing my head a bit, I think you're right that the arrays could be broadcast against each other (to get three (n_points, nx, ny, nz) arrays, which is exactly what we'd like to avoid to reduce the size in memory).

Fundamentally, the problem can be decomposed into n_points orthogonal indexing operations that we're trying to do at the same time to reduce the overhead.

In that case the first step may be to clearly write out a condition that triggers the optimization

Agreed, that makes sense to me.

@dcherian
Contributor

Fundamentally, the problem can be decomposed into n_points orthogonal indexing operations that we're trying to do at the same time to reduce the overhead.

Yes, I agree here. The challenge is figuring out when to do it, which is why .oindex may be a better place to put this kind of work (though I'm not sure whether this counts as oindexing; I think it does).

@keewis
Contributor

keewis commented Sep 12, 2025

I believe it's a hybrid between vectorized and orthogonal indexing: if we have the three arrays of shape (n_points, index_size), we want the last dimension to be orthogonal and the first dimension vectorized.
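In NumPy terms, a toy sketch of that hybrid (made-up sizes): the point axis is matched across the three index arrays, while each window axis gets its own output dimension:

import numpy as np

# Point axis vectorized, window axes orthogonal: each index array gets
# singleton dimensions for the other two window axes, so broadcasting
# pairs ranges within a point but never across points.
n_points, size = 4, 3
x = np.random.random((20, 20, 20))
starts = np.random.randint(0, 20 - size, (3, n_points, 1))
i, j, k = (s + np.arange(size) for s in starts)     # each (n_points, size)
cubes = x[i[:, :, None, None], j[:, None, :, None], k[:, None, None, :]]
print(cubes.shape)                                  # (4, 3, 3, 3)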

@GwenaelCaer
Author

Thank you @keewis and @dcherian for the very interesting discussion. My algorithm is designed for a specific case, a hybrid of vectorized and orthogonal indexing, and is therefore not suitable for point-wise indexing. It would be useful to identify a condition that indicates when it can actually be applied.
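As a purely hypothetical sketch of such a condition (nothing settled in this thread), the check could verify that each index array varies only along the leading point axis plus one distinct window axis:

import numpy as np

# Hypothetical trigger: every index array has shape
# (n_points, 1, ..., w, ..., 1) with its window axis in a distinct
# position, i.e. the hybrid vectorized-orthogonal pattern discussed above.
def is_hybrid_pattern(indices):
    window_axes = set()
    for arr in indices:
        nonsingleton = [ax for ax, s in enumerate(arr.shape[1:]) if s > 1]
        if len(nonsingleton) != 1:    # pointwise or fully broadcast: bail out
            return False
        window_axes.update(nonsingleton)
    return len(window_axes) == len(indices)  # one distinct window axis each

shapes = [(1000, 61, 1, 1), (1000, 1, 61, 1), (1000, 1, 1, 61)]
print(is_hybrid_pattern([np.empty(s, dtype=np.intp) for s in shapes]))  # True
print(is_hybrid_pattern([np.arange(10), np.arange(10)]))                # False (pointwise)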
