Running lowermodulelds multiple times needs to be thinlto only, breaking fortran at present #122891

Open
@JonChesterfield

Description


As of #85626 and #75333, the lowermodulelds codegen pass is run as part of LTO. That doesn't work: the pass was designed to run once, as part of codegen, where it can globally allocate variables to the LDS space.

There is a narrow exemption carved out to unblock thinlto: provided the IR module is carved up into independent modules prior to codegen, with no calls or references between them, running the allocator on each subgraph works, and a second run during codegen over the entire module is a no-op. There's a partial check to notice when that invariant is breached: if the input IR has some allocated variables and some non-allocated variables, the pass aborts. That's the error message which @ergawy reported to me for Fortran.
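To make the partial check concrete, here is a minimal Python sketch of the three cases described above. It is a toy model, not the pass's actual code: `LdsVariable`, its `allocated` flag, and `classify_module` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LdsVariable:
    name: str
    # Hypothetical flag: True once an earlier run of the pass placed this variable.
    allocated: bool


def classify_module(variables):
    """Decide what a run of the pass would do with this set of LDS variables."""
    states = {v.allocated for v in variables}
    if not states:
        return "nothing-to-do"  # module has no LDS variables at all
    if states == {True}:
        return "no-op"          # everything was placed by an earlier run
    if states == {False}:
        return "allocate"       # first run: place everything
    # Mixed state: the independent-modules invariant was breached.
    raise RuntimeError("module mixes allocated and non-allocated LDS variables")
```

Under this model, the second run over a fully allocated module returns `"no-op"`, while a module spliced together from one allocated piece and one unallocated piece raises the error.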

I think the pass should be added to the thinlto pipeline and removed from the full lto pipeline, and generally only run once during codegen except for the thinlto case.

Arguments could be made that the pass should cope with being run multiple times on various bits of IR that are then spliced together. The problem with running on subgraphs is that the reachability test can't be done: we don't know if a call to an external function accesses some visible LDS, or if a function can be called by an external kernel. A correct lowering then looks like a lot of table lookups and overallocation, at which point the user is likely to discover they've run out of LDS and/or have occupancy problems.
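The overallocation can be illustrated with a toy call graph. The following Python sketch is an assumption-laden model, not how the pass is implemented: `reachable_lds` and its arguments are hypothetical, and any callee without an entry in `calls` is treated as an external function whose body is unknown.

```python
def reachable_lds(kernel, calls, uses, visible_lds):
    """Return the LDS variables that `kernel` may access.

    calls:       dict mapping each known function to its list of callees.
                 A callee missing from this dict is external (body unknown).
    uses:        dict mapping a function to the LDS variables it uses directly.
    visible_lds: set of LDS variables an external function could reference.
    """
    seen, stack, result = set(), [kernel], set()
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        if fn not in calls:
            # External callee: conservatively assume it touches every visible variable.
            result |= visible_lds
            continue
        result |= uses.get(fn, set())
        stack.extend(calls[fn])
    return result


uses = {"helper": {"a"}}
visible = {"a", "b", "c"}

# Whole module: helper's body is known, so only "a" is reachable.
whole = reachable_lds("kernel", {"kernel": ["helper"], "helper": []}, uses, visible)

# Subgraph: helper lives in another module, so every visible variable must be assumed.
partial = reachable_lds("kernel", {"kernel": ["helper"]}, uses, visible)
```

In this model the whole-module run allocates one variable for the kernel and the subgraph run allocates three, which is the kind of overallocation that eats into LDS and occupancy.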

Tagging Joseph as well since I think he moved the openmp pipeline to use LTO at some point, in which case that's also vulnerable to this pattern. I suspect we're getting by at present because O0 doesn't get much use and most compilation flows are straightforward, e.g. they don't run opt on bits of IR manually.

Metadata

Labels

LTO (link time optimization: regular/full LTO or ThinLTO), backend:AMDGPU

