Conversation

GwenaelCaer

Hello,

In a use case with xarray, I needed to co-locate in-situ data with a Zarr datacube and, for each match, extract a mini-cube around the observation point. This selection goes through Dask's vindex, and I quickly ran into performance limitations.

[figure: pytcube_schema, schematic of extracting mini-cubes from the datacube around in-situ observation points]

After some research, I discovered that vindex uses point-wise indexing, which can become very slow when the index is large.

I therefore tried to optimize vindex for my use case. The idea behind the optimization is to split the indices by chunk, index each chunk individually with its share of the indices, and then reconstruct the final result from the resulting pieces.
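To make the idea concrete, here is a minimal 1-D NumPy sketch of the chunk-splitting step (a hypothetical illustration with made-up names, not the actual implementation in this PR):

import numpy as np

# Hypothetical sketch: split a flat index array by owning chunk, index
# each chunk with its local offsets, and write the pieces back into
# place, instead of sending every index to every chunk.
def vindex_1d_by_chunk(chunk_list, chunk_size, indices):
    result = np.empty(indices.shape, dtype=chunk_list[0].dtype)
    owners = indices // chunk_size              # which chunk serves each index
    for cid in np.unique(owners):
        mask = owners == cid                    # positions served by chunk `cid`
        local = indices[mask] - cid * chunk_size
        result[mask] = chunk_list[cid][local]   # index only inside the chunk
    return result

x = np.arange(20)
chunk_list = [x[i:i + 5] for i in range(0, 20, 5)]  # chunk_size = 5
idx = np.array([3, 12, 7, 19])
assert (vindex_1d_by_chunk(chunk_list, 5, idx) == x[idx]).all()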

The current version does not yet pass all tests, but it works for my use case. I’m new to Dask internals, so if anyone wants to help or has advice on improving it, I’d greatly appreciate it.

I’m sharing an example to test the function’s performance.

import dask.array as da
import numpy as np

shape = (1_000, 1_000, 1_000)
chunks = (100, 100, 100)
x = da.random.random(shape, chunks=chunks).astype('float32')

buffer = 30   # half-width of each mini-cube
n = 1_000     # number of observation points
indices = []
for idx, size in enumerate(x.shape):
    # random mini-cube centers, kept away from the array edges
    values = np.random.randint(buffer, size - (buffer + 1), n)
    # target shape (n, 1, ..., 2*buffer + 1, ..., 1): the window axis
    # sits at position idx + 1 so the three index arrays broadcast
    # against each other
    new_shape = [1] * len(x.shape)
    new_shape[idx] = 2 * buffer + 1
    new_shape = [n] + new_shape
    # one window of indices per point: value - buffer .. value + buffer
    new_array = np.array(
        [
            np.arange(value - buffer, value + buffer + 1)
            for value in values
        ]
    )
    indices.append(new_array.reshape(new_shape))
indices = tuple(indices)

r = x.vindex[indices]   # r.shape == (1000, 61, 61, 61): one mini-cube per point
rc = r.compute()

Thanks @keewis for your help and advice.

@GwenaelCaer changed the title to performance optimization of vindex for large indexes Sep 12, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      9 files  ±0        9 suites  ±0   3h 14m 2s ⏱️ -31s
 18 059 tests ±0   16 833 ✅ -9    1 217 💤 ±0    9 ❌ +9
161 605 runs  ±0  149 403 ✅ -81  12 121 💤 ±0   81 ❌ +81

For more details on these failures, see this check.

Results for commit 196b85e. ± Comparison against base commit 37835c4.

@dcherian
Contributor

Hi, thanks for digging so deeply into this.

I had trouble understanding the description, so here's what I have understood so far. In your example problem, indices is a list of 3 ndarrays with shapes (1000, 61, 1, 1), (1000, 1, 61, 1), and (1000, 1, 1, 61), so it's the case highlighted in this comment: https://github.com/dask/dask/pull/11625/files#r1901046262. The current algorithm operates on the full broadcast versions of these 3 arrays, so it scales badly.
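For scale, broadcasting those three index arrays materializes the full (1000, 61, 61, 61) shape for each axis; a quick check (assuming int64 indices):

import numpy as np

# Broadcasting the three index arrays materializes the full
# (n_points, 61, 61, 61) shape once per axis.
shapes = [(1000, 61, 1, 1), (1000, 1, 61, 1), (1000, 1, 1, 61)]
full = np.broadcast_shapes(*shapes)
print(full)                                # (1000, 61, 61, 61)
print(3 * np.prod(full) * 8 / 1e9, "GB")   # ~5.4 GB of int64 indices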

It would certainly be good to come up with a better algorithm for this case; I spent a bunch of time trying to do it but didn't succeed (at least not cleanly). However, I don't understand what you've changed. Could you please add some comments to the PR explaining the logic?

FYI, I developed #11625 against the benchmark in #10237. It would be good not to regress there.

import dask.array as da
import numpy as np
arr_1 = da.ones([6000, 6000], chunks=[3000, 3000])  # 36M elements in 4 chunks
idx = np.repeat(np.arange(0, 6000, 6), 1000)  # 1M index points
arr_2 = arr_1.vindex[idx, idx[::-1]]

In particular, writing code like mask = block_coords[dim] == coord inside a hot loop is not great for performance, which is why the current code does an argsort outside the loop.
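As a sketch of that pattern (illustrative only, not the actual dask code): one stable argsort up front, then one contiguous slice per chunk instead of a fresh boolean mask per chunk:

import numpy as np

# One stable argsort outside the loop replaces a boolean mask per chunk.
chunk_size = 3000
idx = np.repeat(np.arange(0, 6000, 6), 1000)
owners = idx // chunk_size
order = np.argsort(owners, kind="stable")         # sort once, outside the loop
sorted_owners = owners[order]
nchunks = sorted_owners.max() + 1
# run boundaries: one contiguous slice of `order` per chunk
bounds = np.searchsorted(sorted_owners, np.arange(nchunks + 1))
for cid in range(nchunks):
    members = order[bounds[cid]:bounds[cid + 1]]  # positions hitting chunk cid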

@keewis
Contributor

keewis commented Sep 12, 2025

Not sure whether he has already left for the weekend, but @GwenaelCaer has created a documentation website with a lot of detail on the optimization: https://odatis-public.gitlab-pages.ifremer.fr/fast_vindex/index.html

Does that answer your questions?

Edit: in particular, this might be interesting: https://odatis-public.gitlab-pages.ifremer.fr/fast_vindex/development/algorithm_step_by_step.html

@dcherian
Contributor

Ah thanks, that does help!

Conceptually, the trick is to completely avoid broadcasting, which I agree with. However, I'm not sure this implementation works well for the pointwise case (which is the case I heavily optimized for). Note that there is no advantage to be gained by not broadcasting in that case.
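To see why (using the indices from the #10237 benchmark): when the index arrays already share the flat pointwise shape, broadcasting is a no-copy view, so skipping it saves nothing:

import numpy as np

# In the pointwise case, broadcasting allocates no new data: both
# inputs already have the common shape, so the result is just a view.
idx = np.repeat(np.arange(0, 6000, 6), 1000)       # 1M index points
a, b = np.broadcast_arrays(idx, idx[::-1])
print(a.shape == idx.shape, a.flags["OWNDATA"])    # True False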

My conclusion earlier was that we'd need two algorithms to handle the two cases; in fact, I remember thinking it's probably better to ask the user to use oindex directly. IIRC that algorithm is already decent (but I may be wrong).

@keewis
Contributor

keewis commented Sep 12, 2025

My conclusion earlier was that we'd need two algorithms to handle the two cases

If I remember correctly, the proposed algorithm doesn't try to handle point-wise indexing; it is specialized for extracting regions (conceptually similar to what "vectorized multi-slices" would do). That is why I suggested adding the proposed algorithm as an optimization for that case and keeping the existing algorithm for the other cases (not sure if the code already does that).

in fact i remember thinking it's probably better to ask the user to use oindex directly.

Not sure oindex would work for this particular use case (cubes around potentially randomly distributed points), as the indices are only locally orthogonal.
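A toy NumPy illustration of what I mean (made-up values): indexing orthogonally with the stacked per-point ranges takes the full cross product, pulling in the unwanted off-diagonal cubes:

import numpy as np

# Two points, each with its own row/column range. Global orthogonal
# indexing mixes ranges across points; the hybrid keeps them paired.
x = np.arange(100).reshape(10, 10)
rows = np.array([[1, 2], [7, 8]])   # per-point row ranges
cols = np.array([[3, 4], [5, 6]])   # per-point column ranges
ortho = x[rows.ravel()[:, None], cols.ravel()[None, :]]   # (4, 4) cross product
wanted = x[rows[:, :, None], cols[:, None, :]]            # (2, 2, 2): one cube per point
print(ortho.shape, wanted.shape)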

@dcherian
Contributor

dcherian commented Sep 12, 2025

it is specialized for extracting regions

as the indices are only locally orthogonal.

Hmmm, I don't understand: the indices going in are broadcastable against each other. I'll need to spend some time thinking about it. This almost feels like a different indexing type.

That is why I suggested adding the proposed algorithm as an optimization for that case,

In that case, the first step may be to clearly write out a condition that triggers the optimization, and then we could explore how to share code.

@keewis
Contributor

keewis commented Sep 12, 2025

Hmmm I don't understand

It appears I might have confused myself. After clearing my head a bit, I think you're right that the arrays could be broadcast against each other (to get three (n_points, nx, ny, nz) arrays, which is exactly what we'd like to avoid to reduce the size in memory).

Fundamentally, the problem can be decomposed into n_points orthogonal indexing operations that we're trying to do at the same time to reduce the overhead.

In that case the first step may be to clearly write out a condition that triggers the optimization

Agreed, that makes sense to me.

@dcherian
Contributor

Fundamentally, the problem can be decomposed into n_points orthogonal indexing operations that we're trying to do at the same time to reduce the overhead.

Yes, I agree here. The challenge is figuring out when to do it, which is why .oindex may be a better place to put this kind of work (though I'm not sure whether this counts as oindexing; I think it does).

@keewis
Contributor

keewis commented Sep 12, 2025

I believe it's a hybrid between vectorized and orthogonal indexing: if we have the three arrays of shape (n_points, index_size), we want the last dimension to be orthogonal and the first dimension vectorized.
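In NumPy terms, a toy sketch of that hybrid (made-up sizes): the point axis is matched across the three index arrays, while each window axis gets its own output dimension:

import numpy as np

# Point axis vectorized, window axes orthogonal: each index array gets
# singleton dimensions for the other two window axes, so broadcasting
# pairs ranges within a point but never across points.
n_points, size = 4, 3
x = np.random.random((20, 20, 20))
starts = np.random.randint(0, 20 - size, (3, n_points, 1))
i, j, k = (s + np.arange(size) for s in starts)     # each (n_points, size)
cubes = x[i[:, :, None, None], j[:, None, :, None], k[:, None, None, :]]
print(cubes.shape)                                  # (4, 3, 3, 3)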

@GwenaelCaer
Author

Thank you @keewis and @dcherian for the very interesting discussion. My algorithm is designed for a specific case, a hybrid of vectorized and orthogonal indexing, and is therefore not suitable for point-wise indexing. It would be useful to identify a condition that indicates when it can actually be applied.
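As a purely hypothetical sketch of such a condition (nothing settled in this thread), the check could verify that each index array varies only along the leading point axis plus one distinct window axis:

import numpy as np

# Hypothetical trigger: every index array has shape
# (n_points, 1, ..., w, ..., 1) with its window axis in a distinct
# position, i.e. the hybrid vectorized-orthogonal pattern discussed above.
def is_hybrid_pattern(indices):
    window_axes = set()
    for arr in indices:
        nonsingleton = [ax for ax, s in enumerate(arr.shape[1:]) if s > 1]
        if len(nonsingleton) != 1:    # pointwise or fully broadcast: bail out
            return False
        window_axes.update(nonsingleton)
    return len(window_axes) == len(indices)  # one distinct window axis each

shapes = [(1000, 61, 1, 1), (1000, 1, 61, 1), (1000, 1, 1, 61)]
print(is_hybrid_pattern([np.empty(s, dtype=np.intp) for s in shapes]))  # True
print(is_hybrid_pattern([np.arange(10), np.arange(10)]))                # False (pointwise)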
