Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

gh-131798: JIT: Optimize _POP_CALL_TWO_LOAD_CONST_INLINE_BORROW #134369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 22, 2025

Conversation

tomasr8
Copy link
Member

@tomasr8 tomasr8 commented May 20, 2025

Optimize _POP_CALL_TWO_LOAD_CONST_INLINE_BORROW

This requires adding some additional ops:

  • _POP_CALL_ONE_LOAD_CONST_INLINE_BORROW
  • _POP_CALL_LOAD_CONST_INLINE_BORROW
  • _POP_THREE
  • _POP_TWO

We can now optimize:

_LOAD_CONST_INLINE_BORROW
_LOAD_CONST_INLINE_BORROW
_POP_CALL_TWO_LOAD_CONST_INLINE_BORROW

into just

_POP_CALL_LOAD_CONST_INLINE_BORROW

i.e. we save two stack pushes and two stack pops.

PyStackRef_CLOSE(pop1);
}

tier2 pure op(_POP_CALL_LOAD_CONST_INLINE_BORROW, (ptr/4, callable, null -- value)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the pure annotations from these please? They don't get optimized to anything so there's not really a use for them.

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I think @brandtbucher brought this up before, but I think we should be making the uop buffer add-to rather than in-place. What I mean is that instead of adding a pop variant for every optimization we want to do:

...
_POP_X_LOAD_CONST_INLINE_BORROW

We should add:

...
_POP_TOP
_POP_TOP
...
_LOAD_CONST_INLINE_BORROW

This way:

  1. We don't have to add a new special pop uop for every single op. This saves on mental overhead, because a special pop variant of the uop for every uop we want to optimize is really tough to reason about.
  2. In the optimizer, we can just generalize it to POP_TOP X times, which is easier to write code to optimize for, rather than add a new entry to remove_unneeded_uops. As you probably noticed already, it's really tedious to do this.

This would require allowing us to add to the uop buffer, rather than overwriting it in-place. It might be a bigger change, but if you're up for it, it would greatly simplify all future optimizations, and I'd be grateful for someone cleaning that part up. It would best be a separate PR.

@bedevere-app
Copy link

bedevere-app bot commented May 21, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@tomasr8
Copy link
Member Author

tomasr8 commented May 21, 2025

Thanks for the PR. I think @brandtbucher brought this up before, but I think we should be making the uop buffer add-to rather than in-place.

We actually discussed this w/ Brandt yesterday and I totally agree with that. Brandt also mentioned that it is ok to add this for now but I don't mind closing this and focusing on changing the uop buffer right away.

Note that Brandt greatly simplified remove_unneeded_uops yesterday in #134373. Once I rebase this PR, it should end up being much simpler.

cc @brandtbucher

@brandtbucher
Copy link
Member

Yeah, 90% of the reason why I felt the urgency to move away from in-place modification was because the logic in the pass that removes matching pushes and pops was getting completely out-of-hand. That's not a concern anymore, so I think we can keep the momentum going on the current in-place optimizer work for the time being.

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented May 21, 2025

Yeah, 90% of the reason why I felt the urgency to move away from in-place modification was because the logic in the pass that removes matching pushes and pops was getting completely out-of-hand. That's not a concern anymore, so I think we can keep the momentum going on the current in-place optimizer work for the time being.

Ok. Can we please make sure to move away from the in-place optimizer after this PR for any new pop tops load const inline X instructions? The pushes and pops are no longer an issue, but the number of new uops variants we're adding is getting out of hand. I have no clue what some of these uops do unless I specifically look up their emission in the optimizer. We don't need this many uops and fewer uops is generally a better thing (less maintainer burden, less human error ...). If we can generalize to just a few replicate(7) POP_TOP_X instructions, that would be awesome.

@brandtbucher
Copy link
Member

Yep, I agree. It also allows us to specialize individual pops for specific known types, immortal values, non-escaping pops, etc. After the sprint ends tomorrow, we can start with larger cleanup efforts (which I agree we need, in the tracer too).

@Fidget-Spinner
Copy link
Member

Great! Feel free to dismiss my review then in the GH UI.

@tomasr8 tomasr8 force-pushed the jit-optimize-pop-call-two branch 4 times, most recently from 3d71a8e to 47995d0 Compare May 22, 2025 15:47
@tomasr8 tomasr8 force-pushed the jit-optimize-pop-call-two branch from 47995d0 to 90d38d7 Compare May 22, 2025 15:48
@tomasr8 tomasr8 marked this pull request as ready for review May 22, 2025 15:50
@brandtbucher
Copy link
Member

When I get back, I'll start working on making the optimizer operate on a copy of the stream. This is good for now though. Really cool to see entire calls start disappearing.

YvesDup pushed a commit to YvesDup/cpython that referenced this pull request May 22, 2025
@brandtbucher brandtbucher merged commit d1ea8ed into python:main May 22, 2025
64 checks passed
@brandtbucher
Copy link
Member

Looks great. @tomasr8: natural follow-ups would be support for tuples of types in isinstance calls, as well as extending this optimization to the other calls we specialize (and seeing if we should possibly specialize more of them).

@tomasr8 tomasr8 deleted the jit-optimize-pop-call-two branch May 22, 2025 18:06
@tomasr8
Copy link
Member Author

tomasr8 commented May 22, 2025

Looks great. @tomasr8: natural follow-ups would be support for tuples of types in isinstance calls, as well as extending this optimization to the other calls we specialize (and seeing if we should possibly specialize more of them).

on it!

lkollar pushed a commit to lkollar/cpython that referenced this pull request May 26, 2025
lkollar pushed a commit to lkollar/cpython that referenced this pull request May 26, 2025
lkollar pushed a commit to lkollar/cpython that referenced this pull request May 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.