fix: Performance improvement for scattergl with many points. Issue #7065 #7301
+9
−2
Performance issues are always intriguing, so I tried to determine why the browser freezes when handling a large number of points. I believe the problem stems from two main factors.
The first factor is the hover effect, which triggers a function that calculates the distance between the mouse pointer and every point in the set. This certainly contributes to the performance problem. I believe the best improvement would be to filter the points with a window centered on the pointer coordinates and extending a fixed delta in each direction, so that the distance is calculated only for the subset of points inside that window. A further possible improvement, if the plot allows it, is to compare squared distances and thereby avoid computing square roots, as nearest-neighbor searches on a K-d tree commonly do.
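As a rough illustration of the window filter combined with squared-distance comparison, here is a minimal sketch. It is not the plotly.js hover code: the function name `findNearestInWindow`, its parameters, and the assumption that point coordinates are already in the same pixel space as the pointer are all hypothetical.

```javascript
// Hypothetical helper, not part of plotly.js.
// xs, ys: point coordinates, assumed to share the pointer's coordinate space.
// px, py: pointer position. delta: half-width of the square search window.
function findNearestInWindow(xs, ys, px, py, delta) {
    let bestIndex = -1;
    let bestDistSq = Infinity;
    const len = xs.length;

    for (let i = 0; i < len; i++) {
        // Cheap rejection: skip points outside the window around the pointer.
        const dx = xs[i] - px;
        if (dx > delta || dx < -delta) continue;
        const dy = ys[i] - py;
        if (dy > delta || dy < -delta) continue;

        // Compare squared distances; the ordering is the same as for
        // true distances, so no square root is needed.
        const distSq = dx * dx + dy * dy;
        if (distSq < bestDistSq) {
            bestDistSq = distSq;
            bestIndex = i;
        }
    }

    // -1 means no point fell inside the window.
    return bestIndex;
}
```

Note that this sketch still visits every point; it only skips the distance arithmetic for points outside the window. Avoiding the full scan would need some spatial index, which is beyond what this PR changes.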
The second factor involves a loop in the part of the code I modified, where "newDistance" is created for every point. This results in excessive overhead for the garbage collector, causing the browser to freeze.
I moved the variable outside the loop and used a precalculated value for the array length. I ran the tests in the development environment provided by Plotly, using the test_dashboard tool.
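The shape of that change can be sketched as follows. This is only an illustration of the pattern: the loop body, the function names `minDistanceInside` and `minDistanceOutside`, and their parameters are invented for the example and are not the code touched by this PR; only the name `newDistance` comes from it.

```javascript
// Before (hypothetical): newDistance is declared on every iteration
// and xs.length is read on every pass.
function minDistanceInside(xs, ys, px, py) {
    let best = Infinity;
    for (let i = 0; i < xs.length; i++) {
        const dx = xs[i] - px;
        const dy = ys[i] - py;
        const newDistance = Math.sqrt(dx * dx + dy * dy);
        if (newDistance < best) best = newDistance;
    }
    return best;
}

// After (hypothetical): newDistance is declared once outside the loop
// and the array length is cached before iterating.
function minDistanceOutside(xs, ys, px, py) {
    let best = Infinity;
    let newDistance;
    const len = xs.length;
    for (let i = 0; i < len; i++) {
        const dx = xs[i] - px;
        const dy = ys[i] - py;
        newDistance = Math.sqrt(dx * dx + dy * dy);
        if (newDistance < best) best = newDistance;
    }
    return best;
}
```

Both versions return the same result; any difference is in how much work the engine does per iteration, which is why it needs to be measured rather than assumed.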
The values are average times, measured in milliseconds and plotted on the vertical axis (Y-axis).
The red line shows the performance with the variable declared inside the loop, and the blue line with it declared outside.

Of course, the performance depends on the engine and the environment in which it runs, but I think this conveys the general trend.
Please let me know if this can serve as a first step toward improving performance.