-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU (1/N) #137010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
Can you please add some justification/explanation regarding those unroll patterns |
Thanks @fschlimb, I made the changes according to your feedback. I hope I addressed all of your concerns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be interesting to later see if we could generalize and reuse vector unrolling to achieve the same. For now, I think it's a good addition to xegpu infrastructure and we'll see in practice how it holds up.
I take it depends on #138701?
Yeah. these patterns are supposed to be companions to vector unrolling patterns. They share the same idea, one is for XeGPU ops only, and one is for vector ops. A pass are supposed to use both of them. |
} | ||
|
||
private: | ||
const char *const packAttrName = "__xetile_blocking_pack__"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xetile->xegpu I guess, here and in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. fixed it
/// provide a way to customize the native shape of the operation. | ||
struct UnrollOptions { | ||
using FilterConstraintFnType = std::function<LogicalResult(Operation *op)>; | ||
/// Callback function that indicates whether vector unrolling should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's place this comment above "using" to have uniform look :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it
Hi @fschlimb and @adam-smnk, do you have more suggestions? |
no, LGTM! |
Hi @adam-smnk, I am going to merge this first. If you have more suggestions, feel free to let me know. Thanks for your help! |
I'm seeing a regression on 32-bit platforms from this change:
|
auto shape = vecTy.getShape(); | ||
SmallVector<Value> results; | ||
for (SmallVector<int64_t> offsets : | ||
StaticTileOffsetRange(shape, blockSize)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR seems not handling xegpu.order attribute at this stage. load_nd on PVC support load with transpose, which takes order as [0,1].
StaticTileOffsetRange() can take extra loopOrder parameter which can be used to support this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR seems not handling xegpu.order attribute at this stage. load_nd on PVC support load with transpose, which takes order as [0,1]. StaticTileOffsetRange() can take extra loopOrder parameter which can be used to support this feature.
I feel this belongs to the handling of transpose op, not the handling of xegpu.order attribute.
Hi @mgorny, could you help to share your build instructions, so I can reproduce it in my local machine. |
|
A relative simple reproducer would be:
For us,
Though I suppose |
Thanks, it should have been fixed in #140567, please take a look. |
Similar to vector ops, XeGPU ops need to be unrolled into smaller shapes such that they can be dispatched into a hardware instruction. This PR marks the initial phase of a series dedicated to incorporating unroll patterns for XeGPU operations. In this installment, we introduce patterns for the following operations: