Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[MC][DebugInfo] Emit linetable entries with known offsets immediately #134677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
Loading
from

Conversation

jmorse
Copy link
Member

@jmorse jmorse commented Apr 7, 2025

DWARF linetable entries are usually emitted as a sequence of MCDwarfLineAddrFragment fragments containing the line-number difference and an MCExpr describing the instruction-range the linetable entry covers. These then get relaxed during assembly emission.

However, a large number of these instruction-range expressions are ranges within a fixed MCDataFragment, i.e. a range over fixed-size instructions that are not subject to relaxation at a later stage. Thus, we can compute the address-delta immediately, and not spend time and memory describing that computation so it can be deferred.

~

A slightly earlier+dirtier version of this patch scored a cool 5% reduction in max-rss on most compile-time tracker samples -- see [0], scroll down to stage1-releaselto-g and observe the improvement in ".link" jobs. The intermediate jobs are LTO bitcode compilations so don't codegen, thus only the .link jobs are representative. The three compiles that don't improve (tramp3d, mafft, SPASS) are due to the domination of variable-location storage for tramp3d, and LiveDebugValues hitting max-rss for the last two.

I'm not intensely familiar with the MCFragment layer so I'm not sure what the best test strategy is. A stage2reldeb build of clang produced an identical binary, but as a performance improvement it's not clear if this needs a functional test. If so, I imagine using -debug-only=mc-dump and examining what fragments are produced would be the way forwards?

Note that the RISCV opt-out comes from precedent set in MCObjectStreamer::emitAbsoluteSymbolDiff and friends, RISCV appears to want to opt out of these space optimisations.

[0] https://llvm-compile-time-tracker.com/compare.php?from=e2c43ba981620cf71ce3ccf004db7c0db4caf8a7&to=ca512762f92108bea21b2e2517f9e17f9d680a33&stat=max-rss&details=on

@llvmbot llvmbot added the mc Machine (object) code label Apr 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 7, 2025

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-mc

@llvm/pr-subscribers-debuginfo

Author: Jeremy Morse (jmorse)

Changes

DWARF linetable entries are usually emitted as a sequence of MCDwarfLineAddrFragment fragments containing the line-number difference and an MCExpr describing the instruction-range the linetable entry covers. These then get relaxed during assembly emission.

However, a large number of these instruction-range expressions are ranges within a fixed MCDataFragment, i.e. a range over fixed-size instructions that are not subject to relaxation at a later stage. Thus, we can compute the address-delta immediately, and not spend time and memory describing that computation so it can be deferred.

~

A slightly earlier+dirtier version of this patch scored a cool 5% reduction in max-rss on most compile-time tracker samples -- see [0], scroll down to stage1-releaselto-g and observe the improvement in ".link" jobs. The intermediate jobs are LTO bitcode compilations so don't codegen, thus only the .link jobs are representative. The three compiles that don't improve (tramp3d, mafft, SPASS) are due to the domination of variable-location storage for tramp3d, and LiveDebugValues hitting max-rss for the last two.

I'm not intensely familiar with the MCFragment layer so I'm not sure what the best test strategy is. A stage2reldeb build of clang produced an identical binary, but as a performance improvement it's not clear if this needs a functional test. If so, I imagine using -debug-only=mc-dump and examining what fragments are produced would be the way forwards?

Note that the RISCV opt-out comes from precedent set in MCObjectStreamer::emitAbsoluteSymbolDiff and friends, RISCV appears to want to opt out of these space optimisations.

[0] https://llvm-compile-time-tracker.com/compare.php?from=e2c43ba981620cf71ce3ccf004db7c0db4caf8a7&to=ca512762f92108bea21b2e2517f9e17f9d680a33&stat=max-rss&details=on


Full diff: https://github.com/llvm/llvm-project/pull/134677.diff

1 Files Affected:

  • (modified) llvm/lib/MC/MCObjectStreamer.cpp (+18)
diff --git a/llvm/lib/MC/MCObjectStreamer.cpp b/llvm/lib/MC/MCObjectStreamer.cpp
index e228418fea987..8d1eab03290a9 100644
--- a/llvm/lib/MC/MCObjectStreamer.cpp
+++ b/llvm/lib/MC/MCObjectStreamer.cpp
@@ -462,6 +462,24 @@ void MCObjectStreamer::emitDwarfAdvanceLineAddr(int64_t LineDelta,
                          Label, PointerSize);
     return;
   }
+
+  // If the two labels are within the same fragment and it's a plain data
+  // fragment, then the address-offset is already a fixed constant and is not
+  // relaxable. Emit the advance-line-addr data immediately to save time and
+  // memory.
+  // As per commit bbea64250f6548, RISCV always desires symbolic relocations.
+  MCFragment *LastFrag = LastLabel->getFragment();
+  if (!getAssembler().getContext().getTargetTriple().isRISCV() &&
+      LastFrag->getKind() == MCFragment::FT_Data &&
+      Label->getFragment() == LastFrag) {
+    uint64_t AddrDelta = Label->getOffset() - LastLabel->getOffset();
+    SmallString<16> Tmp;
+    MCDwarfLineAddr::encode(getContext(), Assembler->getDWARFLinetableParams(),
+                            LineDelta, AddrDelta, Tmp);
+    emitBytes(Tmp);
+    return;
+  }
+
   const MCExpr *AddrDelta = buildSymbolDiff(*this, Label, LastLabel, SMLoc());
   insert(getContext().allocFragment<MCDwarfLineAddrFragment>(LineDelta,
                                                              *AddrDelta));

@MaskRay
Copy link
Member

MaskRay commented Apr 11, 2025

The code does exhibit inefficiencies.
I've mitigated the issues and fixed correctness (RISC-V linker relaxation) in commits like 4948849 (2023-05) and https://reviews.llvm.org/D150004 (2023-06),
(Some general MCFragment performance changes help as well).

Simplifying the code would enhance maintainability and reduce potential errors, particularly for linker relaxation targets like RISC-V, LoongArch, and potentially Xtensa. (My familiarity with those targets is limited. Many non-RISC-V targets probably haven't investigated much in this area yet.)

If we are going to introduce some complexity - yeah, testing MCDataFragment should be reliable. Perhaps use MCDataFragment::isLinkerRelaxable.
It should be more efficient than if (AddrDelta->evaluateAsAbsolute(Res, getAssemblerPtr())) { I removed from https://reviews.llvm.org/D150004

BTW, have you tried moving the logic to llvm/lib/MC/MCDwarf.cpp ? The labels are concerns primarily of MCDwarf and ideally the called helper in MCObjectStreamer only deals with MCFragment (which MCDwarf should not need to know).

// fragment, then the address-offset is already a fixed constant and is not
// relaxable. Emit the advance-line-addr data immediately to save time and
// memory.
// As per commit bbea64250f6548, RISCV always desires symbolic relocations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use absoluteSymbolDiff and remove the isRISCV check. I've improved absoluteSymbolDiff and you may need to rebase

@jmorse jmorse force-pushed the perf/known-linetable-addr-offsets branch from e5bcdbf to da53ea3 Compare May 7, 2025 15:06
Copy link

github-actions bot commented May 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@jmorse
Copy link
Member Author

jmorse commented May 7, 2025

Thanks for the review, the simple bits first:

  • I've switched to using absoluteSymbolDiff, thanks,
  • I don't believe I can move the logic to MCDwarf as there's some deliberate indirection between MCAsmStreamer and MCObjectStreamer -- as far as I understand it textual assembly printing relies on the assembler to compute .debug_line, and so decisions about how to encode the linetable it are reasonably placed in MCObjectStreamer,
  • I've added a test using -debug-only=mc-dump that checks how the output is fragmented; for a very simple function with one block, we should be able to precompute all the address deltas in .debug_line, and thus emit .debug_line as a single MCDataFragment.

The most unclear thing to me right now is the RISCV behaviour, and in particular the test llvm/test/DebugInfo/RISCV/dwarf-riscv-relocs.ll . With todays trunk LLVM, it produces a .debug_line section that uses DW_LNS_fixed_advance_pc and a relocation to compute each address-delta in the linetable. With my patch applied, it instead uses a linetable special-opcode to advance the address in the linetable, because the address delta becomes a fixed value at assembly time. The only way to preserve this behaviour would be to add a target hook to let backends opt-out of the optimisation that I'm adding, such as requiresDiffExpressionRelocations that I've added in the most recent push (recovered from bbea642).

With that in mind, I'll signal the RISCV folks to see whether this is a behaviour they want to see preserved or whether it's a spurious test difference.

@jmorse
Copy link
Member Author

jmorse commented May 7, 2025

Ah -- actually I think I see what's going on with that test. It has portions in .debug_line that cross an isLinkerRelaxable boundary /and/ it has portions of .debug_line that are fixed-width. Some parts change, while some do not.

With that in mind, I'll remove the requiresDiffExpressionRelocations hook I added to this PR, pre-commit a version of dwarf-riscv-relocs.ll that uses llvm-dwarfdump -debug-line -v for more verbose output, then rebase so that this PR illustrates exactly what changes in that test. Right now the only visible difference is in the size of .debug_line, but the encoding is what's important.

jmorse added a commit that referenced this pull request May 8, 2025
This is a pre-commit for a patch (#134677) -- the test behaviour is not
being changed, instead I've added "-v" to the llvm-dwarfdump commandline
for --debug-line. This prints out how the linetable is encoded, not just
what the linetable means, and it's important to illustrate that in the
upcoming patch. I've separated out the llvm-dwarfdump line for .debug_info
so that it's not affected by adding -v.
jmorse added 6 commits May 8, 2025 10:45
DWARF linetable entries are usually emitted as a sequence of
MCDwarfLineAddrFragment fragments containing the line-number difference and
an MCExpr describing the instruction-range the linetable entry covers.
These then get relaxed during assembly emission.

However, a large number of these instruction-range expressions are ranges
within a fixed MCDataFragment, i.e. a range over fixed-size instructions
that are not subject to relaxation at a later stage. Thus, we can compute
the address-delta immediately, and not spend time and memory describing
that computation so it can be deferred.
See the comments on bbea642, the RISCV relocation model does
something differently, and thus we need to emit linetables in a fully
symbolic way.
…elocs

Specifically: bbea642 took away a hook for whether a target wanted
symbol differences to be expressed in relocations or not, and here I've
found a use case for it.
Turns out this isn't needed, each fragment records whether it's relaxable
@jmorse jmorse force-pushed the perf/known-linetable-addr-offsets branch from da53ea3 to ba944d4 Compare May 8, 2025 09:56
@jmorse
Copy link
Member Author

jmorse commented May 8, 2025

I believe this latest version now gets it right (TM) for RISCV as well -- observe the changes to the encoding of dwarf-riscv-relocs.ll. The assembly for this file reads:

00000000 <main>:
   0: ff010113      addi    sp, sp, -0x10
   4: 00112623      sw      ra, 0xc(sp)
   8: 00812423      sw      s0, 0x8(sp)
   c: 01010413      addi    s0, sp, 0x10
  10: 00000097      auipc   ra, 0x0
                    00000010:  R_RISCV_CALL_PLT     ext
                    00000010:  R_RISCV_RELAX        *ABS*
  14: 000080e7      jalr    ra <main+0x10>
  18: fe042a23      sw      zero, -0xc(s0)
  1c: 00000513      li      a0, 0x0
  20: 00c12083      lw      ra, 0xc(sp)
  24: 00812403      lw      s0, 0x8(sp)
  28: 01010113      addi    sp, sp, 0x10
  2c: 00008067      ret

i.e. there's a linker-relaxable relocation at address 0x10. The line table retains a DW_LNS_fixed_advance_pc opcode across that boundary, which is controlled by the relocation for .debug_line (which is in the test too). However for the latter section from 0x1c onwards, which isn't relaxable and is in one plain MCDataFragment, we switch to using DWARF linetable special opcodes instead, a smaller and more optimal encoding. I believe this is the correct behaviour: the linetable address-differences remain relocatable across boundaries that are relaxable, but the relative address-differences can be fixed for portions that aren't relaxable.

Thanks for the blog post at https://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation , that's educated me about what's going on here @MaskRay .

@@ -464,6 +464,19 @@ void MCObjectStreamer::emitDwarfAdvanceLineAddr(int64_t LineDelta,
Label, PointerSize);
return;
}

// If the two labels are within the same fragment and it's a plain data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the two labels are within the same data fragment, then the address-offset is ...

(a data fragment is a MCDataFragment. There is no special or plain data fragment)

@@ -0,0 +1,94 @@
; RUN: llc %s -o %t.o -filetype=obj
; RUN: llvm-dwarfdump --debug-line %t.o | FileCheck %s --check-prefix=LINES
; RUN: llc %s -o %t.o -filetype=obj -debug-only=mc-dump 2>&1 | FileCheck %s --check-prefix=FRAGMENTS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding -debug-only tests requires caution. They need REQUIRES: asserts

@jmorse
Copy link
Member Author

jmorse commented May 12, 2025

Reworded comment, added REQUIRES, thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.